WO2013149385A1 - Copy number variation detection method and system - Google Patents
Copy number variation detection method and system
- Publication number
- WO2013149385A1 (PCT/CN2012/073545)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- window
- cnv
- sample
- breakpoint
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C99/00—Subject matter not provided for in other groups of this subclass
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2535/00—Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
- C12Q2535/122—Massive parallel sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2537/00—Reactions characterised by the reaction format or use of a specific feature
- C12Q2537/10—Reactions characterised by the reaction format or use of a specific feature the purpose or use of
- C12Q2537/165—Mathematical modelling, e.g. logarithm, ratio
Definitions
- The invention relates to the field of bioinformatics, in particular to a copy number variation detection method and system.
- CNV is a type of genomic structural variation.
- In the narrow sense, CNV usually refers to a change in the copy number of a DNA fragment on a chromosome.
- The types and causes of genomic structural variation include: 1. deletions (terminal deletions, interstitial deletions); 2. translocations (reciprocal translocations, Robertsonian translocations); 3. inversions; 4. ring chromosomes; 5. dicentric chromosomes; 6. insertions, and so on.
- In the broad sense, CNV also includes structural variations such as chromosomal aneuploidy and partial aneuploidy.
- One technical problem solved by the present invention is to provide a copy number variation detection method and system capable of accurately detecting copy number variations, including microdeletions/microduplications.
- A copy number variation detection method comprising: acquiring reads of at least a part of the nucleic acid molecules in a test sample; determining, from the reads, sequence tags that align uniquely to the (genome) reference sequence; dividing the genome reference sequence into windows and counting the number of sequence tags falling into each window; performing GC correction on the number of sequence tags of each window and further correcting it with the expected number of sequence tags derived from the control sample set, to obtain the adjusted number of sequence tags; taking the start or end point of each window as a demarcation point, calculating the significance value of the difference between the two groups of adjusted sequence tag counts on either side, and selecting demarcation points with small significance values as candidate CNV breakpoints; for each candidate CNV breakpoint, taking the segment from the preceding candidate breakpoint to it and the segment from it to the following candidate breakpoint, and calculating the significance value of the difference between the two groups of adjusted sequence tag counts of the windows in these two segments; each time, removing the least significant candidate CNV breakpoint and updating the difference significance values of the two candidate CNV breakpoints adjacent to it, iterating until the difference significance values of all remaining candidate CNV breakpoints are less than the termination threshold, which determines the CNV breakpoints.
- The method further comprises the step of sequencing at least a portion of the nucleic acid molecules in the sample to obtain the reads.
- Each window contains the same number of reference unique reads, or each window has the same length.
- the termination threshold is obtained from a control sample set consisting of normal samples.
- Performing GC correction on the number of sequence tags of each window includes the following steps: grouping the windows by GC content; obtaining a correction coefficient from the average number of sequence tags within each group and the average number of sequence tags over all windows; and correcting the number of sequence tags of each window with the correction coefficient to obtain the GC-corrected number of sequence tags.
- The expected number of sequence tags derived from the control sample set is obtained by: calculating, for each control sample, the ratio of the GC-corrected number of sequence tags in each window to the sample's total number of tags; averaging the ratios of the corresponding window over all control samples; and calculating the expected number of sequence tags in each window of the test sample from this average and the total number of sequence tags of the test sample.
- The method further includes performing confidence filtering on the segments between CNV breakpoints, which includes the following steps: using the control set and the distribution of the adjusted number of sequence tags to determine a confidence interval within which the adjusted number of sequence tags is considered normal; when the mean adjusted number of sequence tags within a segment falls outside this confidence interval, the segment between the CNV breakpoints is considered truly abnormal.
- The adjusted number of sequence tags follows a normal distribution.
- the confidence interval is a 95% confidence interval
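The confidence filtering described above can be sketched as follows, assuming (as stated) that adjusted tag counts are normally distributed, so that the 95% interval is the control mean plus or minus 1.96 standard deviations. Which control values are pooled (for example, the adjusted counts of the same windows across all control samples) is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def normal_interval(control_values, z=1.96):
    """Two-sided interval mean +/- z*sd; z = 1.96 gives ~95% under normality."""
    mu = mean(control_values)
    sd = stdev(control_values)  # sample standard deviation (needs >= 2 values)
    return mu - z * sd, mu + z * sd


def segment_is_abnormal(segment_values, control_values, z=1.96):
    """True if the mean adjusted tag count of a segment lies outside the control interval."""
    low, high = normal_interval(control_values, z)
    segment_mean = mean(segment_values)
    return segment_mean < low or segment_mean > high


# Hypothetical adjusted counts (observed/expected ratios) from control samples.
controls = [0.98, 1.02, 1.00, 0.99, 1.01]
print(segment_is_abnormal([0.50, 0.52, 0.48], controls))  # True: a halved segment
```

Using the sample standard deviation of the controls keeps the sketch independent of any statistics package; a real pipeline might instead estimate the spread per window.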
- A single-chromosome or genome-wide-level cyclization process is performed when candidate CNV breakpoints are selected.
- The method further comprises: the test sample is a human sample, including amniotic fluid obtained by amniocentesis, chorionic villi obtained by chorionic villus biopsy, cord blood obtained by transabdominal umbilical vein puncture, spontaneously aborted fetal tissue, or human peripheral blood; and/or the genomic DNA of the test sample is obtained by salting-out, column chromatography, magnetic beads, or the SDS method; and/or the genomic DNA of the test sample is randomly fragmented by enzymatic digestion, nebulization, sonication, or HydroShear to obtain DNA fragments; and/or the genomic DNA fragments of the test sample are subjected to single-end or paired-end sequencing to obtain the reads of the DNA fragments.
- the method further comprises: adding a different tag sequence to the DNA fragment of each test sample to distinguish different test samples.
- A copy number variation detection system comprising: a read acquisition unit, configured to acquire reads of at least a part of the nucleic acid molecules in a test sample; a sequence tag determination unit, configured to determine, from the reads, sequence tags that align uniquely to the (genome) reference sequence; a tag counting unit, configured to divide the genome reference sequence into windows and count the number of sequence tags falling into each window; a tag count adjustment unit, configured to perform GC correction on the number of sequence tags of each window and correct it with the expected number of sequence tags derived from the control sample set, to obtain the adjusted number of sequence tags; and a candidate breakpoint selection unit, configured to take the start or end point of each window as a demarcation point, calculate the significance value of the difference between the two groups of adjusted sequence tag counts on either side, and select demarcation points with small significance values as candidate CNV breakpoints;
- a breakpoint determination unit, configured to calculate, for each candidate CNV breakpoint, the significance value of the difference between the two groups of adjusted sequence tag counts of the windows in the segment from the preceding candidate breakpoint to it and the segment from it to the following candidate breakpoint; each time, the least significant candidate CNV breakpoint is removed and the difference significance values of the two candidate CNV breakpoints adjacent to it are updated, iterating until the difference significance values of all candidate CNV breakpoints are less than the termination threshold, which determines the CNV breakpoints.
- Each window contains the same number of reference unique reads, or each window has the same length.
- the termination threshold is obtained from a control sample set consisting of normal samples.
- The tag count adjustment unit includes: a GC correction module, configured to group the windows by GC content, obtain a correction coefficient from the average number of sequence tags within each group and the average number of sequence tags over all windows, and correct the number of sequence tags of each window with the correction coefficient to obtain the GC-corrected number of sequence tags; and a window correction module, configured to calculate, for each control sample, the ratio of the GC-corrected number of sequence tags in each window to the total number of tags, average the ratios of the corresponding window over all control samples, calculate the expected number of sequence tags in each window of the test sample from this average and the total number of sequence tags of the test sample, and correct the GC-corrected number of sequence tags with this expected number to obtain the adjusted number of sequence tags.
- The system further includes a breakpoint filtering unit, configured to, after the breakpoint determination unit determines the CNV breakpoints, use the control set and the distribution of the number of sequence tags to determine the confidence interval within which the number of sequence tags is normal; when the mean number of sequence tags within a segment falls outside this confidence interval, the segment between the CNV breakpoints is considered abnormal.
- the adjusted number of sequence tags conforms to a normal distribution
- the confidence interval is a 95% confidence interval
- The candidate breakpoint selection unit performs a single-chromosome or genome-wide-level cyclization process when selecting candidate CNV breakpoints.
- The test sample is a human sample: amniotic fluid obtained by amniocentesis, chorionic villi obtained by chorionic villus biopsy, cord blood obtained by transabdominal umbilical vein puncture, spontaneously aborted fetal tissue, or human peripheral blood; and/or the genomic DNA of the test sample is obtained by salting-out, column chromatography, magnetic beads, or the SDS method; and/or the genomic DNA of the test sample is randomly fragmented by enzymatic digestion, nebulization, sonication, or HydroShear to obtain DNA fragments;
- the reads of the DNA fragments are obtained by single-end or paired-end sequencing of the genomic DNA fragments of the test sample.
- Test samples are distinguished by the different tag sequences added to their DNA fragments.
- An advantage of the copy number variation detection method and system of the present invention is that they make it clinically feasible to accurately detect copy number variations, including those in small microdeletion/microduplication regions.
- FIG. 1 is a flow chart showing an embodiment of the copy number variation detection method of the present invention.
- FIG. 2 is a flow chart showing another embodiment of the copy number variation detection method of the present invention.
- FIG. 3 is a flow chart showing still another embodiment of the copy number variation detection method of the present invention.
- FIG. 4 is a schematic flow chart of chromosome CNV analysis in an implementation of the present invention.
- FIG. 5 is a block diagram showing an embodiment of the copy number variation detection system of the present invention.
- FIG. 6 is a schematic structural diagram of another embodiment of the copy number variation detection system of the present invention.
- FIGS. 7A-7H show the detection results for eight samples in an application example of the present invention.
- Copy number variation refers to a change in the copy number of a nucleic acid segment larger than 1 kb compared with the nucleic acid sequence of a normal sample.
- The forms and causes of copy number variation include: deletions, such as microdeletions; insertions, such as microinsertions, microduplications, and duplications; inversions; translocations; and complex multi-site variations.
- Aneuploidy refers to an increase or decrease in the number of chromosomes in the genetic material compared with a normal sample, and further includes gains or losses of whole or partial chromosomes.
- the copy number variation in the present invention also includes the case of aneuploidy.
- Sequencing: the process of obtaining sequence information of the nucleic acids in a sample. Sequencing can be performed by a variety of methods, including but not limited to the dideoxy chain-termination method; preferred high-throughput methods include, but are not limited to, second-generation sequencing and single-molecule sequencing.
- Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46) include, but are not limited to, Illumina-Solexa (GA™, HiSeq2000™, etc.), ABI-SOLiD, and Roche-454 (pyrosequencing); single-molecule sequencing platforms (technologies) include, but are not limited to, Helicos True Single Molecule DNA sequencing, Pacific Biosciences single-molecule real-time (SMRT™) sequencing, and nanopore sequencing from Oxford Nanopore Technologies (Rusk N. Cheap third-generation sequencing. Nature Methods. 2009 Apr;6(4):244-245).
- Sequencing can be single-end or paired-end (Pair-end).
- the sequencing length can be 50 bp, 90 bp, or 100 bp.
- the sequencing platform is Illumina/Solexa
- The sequencing type is Pair-end sequencing, yielding 100 bp DNA sequence reads whose two ends have a defined (bidirectional) positional relationship.
- The sequencing depth can be determined according to the size of the chromosomal fragments to be detected in the test sample: the higher the sequencing depth, the higher the detection sensitivity and the smaller the deletions and duplications that can be detected.
- The sequencing depth can be 0.1-30×, i.e., the total amount of data is 0.1-30 times the length of the human genome; for example, in one embodiment of the invention, the sequencing depth is 0.1× (2.5 × 10^8 bp).
- Read: a nucleic acid sequence of fixed length (generally longer than 20 bp), such as a sequence output by a sequencer; through sequence alignment, it can be located to a specific region or position of the reference sequence.
- Sequence alignment refers to the process of comparing one or more nucleic acid sequences with a reference sequence; commonly, a short nucleic acid sequence (such as a read) is aligned to a reference genome sequence to determine its location on the reference genome.
- Sequence alignment can be performed with any sequence alignment program, such as ELAND (Efficient Local Alignment of Nucleotide Data), SOAP (Short Oligonucleotide Analysis Package), or BWA (Burrows-Wheeler Aligner).
- The criteria for a successful alignment are divided into exact alignment (100% match) and mismatch-tolerant alignment (less than 100% match).
- Sequence tag: a read that can be located to a unique position on a reference sequence (such as a reference genome sequence).
- Reference unique reads: sequences of fixed length that have a unique alignment position on a reference sequence (usually a reference genome).
- The process of obtaining reference unique reads includes, for example, cutting the reference genome into sequences of fixed length, aligning these sequences back to the reference genome, and selecting those that align uniquely as reference unique reads.
- The fixed length depends on the read length produced by the sequencer. Different sequencers produce reads of different lengths, and read lengths may also differ between sequencing runs; the choice of length is therefore subjective and empirical.
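A toy illustration of the reference unique reads idea: the reference is cut into every fixed-length fragment, and only fragments occurring exactly once are kept. As a simplifying assumption, uniqueness is tested by exact string matching on the forward strand rather than by re-aligning each fragment with an aligner such as SOAP or BWA (which would also consider the reverse strand and mismatches), so this only approximates the procedure described. The function name is hypothetical.

```python
from collections import Counter


def unique_kmer_positions(reference: str, k: int) -> list[int]:
    """0-based start positions of length-k fragments that occur exactly once.

    Simplification: exact matching on the forward strand stands in for
    aligning every fragment back to the reference with an aligner.
    """
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if counts[kmer] == 1]


# ACGT occurs at positions 0 and 4, so both are excluded.
print(unique_kmer_positions("ACGTACGTTT", 4))  # [1, 2, 3, 5, 6]
```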
- Tag sequence is a sequence of nucleic acids of a specific length that serves as an identifier.
- Each sample can be labeled with a different tag sequence so that samples can be distinguished during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 2008, March, Vol. 5, No. 3), thereby enabling simultaneous sequencing of multiple samples.
- The tag sequence is designed to distinguish different samples without affecting other functions of the DNA molecule to which it is added.
- GC correction: because sequencing batches exhibit a certain GC bias, copy number estimates deviate in high-GC and low-GC regions of the genome; correcting the sequencing data for GC content based on the control sample set, to obtain the corrected relative number of reads in each window, removes this bias and improves the accuracy of copy number variation detection.
- Average: the average referred to herein is generally the arithmetic mean or the median.
- Number of sequence tags: may be the raw count obtained by counting, or a count corrected by other parameters (for example, a ratio); in some cases it is used interchangeably with "copy rate".
- Test sample: in some cases also called the sample to be tested; a sample containing nucleic acid molecules that are suspected of carrying a variation.
- the type of the nucleic acid is not particularly limited and may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), preferably DNA.
- DNA deoxyribonucleic acid
- RNA ribonucleic acid
- Control sample: a sample considered normal relative to the test sample; normally, "normal" means that the phenotype is normal.
- Control sample set: a set composed of control samples. In one embodiment of the invention, the set is required to contain more than 30 control samples.
- Sequencing technology is increasingly widely used in the detection of chromosomal variation.
- The present disclosure designs a high-throughput sequencing approach for genome-wide copy number variation screening, offering high throughput, high specificity, and accurate localization.
- In outline: a sample is obtained from the subject, DNA is extracted, high-throughput sequencing is performed, and the resulting data are analyzed to obtain the test result.
- Fig. 1 is a flow chart showing an embodiment of a copy number variation detecting method of the present invention.
- Reads of at least a portion of the nucleic acid molecules in the test sample are obtained; some or all of the nucleic acid molecules in the sample can be sequenced to obtain the reads.
- For example, the genomic DNA of the test sample is randomly fragmented to obtain DNA fragments, which are sequenced to obtain reads of a certain length. The read lengths may vary within a certain range, and fixed-length reads can be obtained by truncation.
- The DNA fragments can range from 50 bp to 1500 bp in length, for example 50-150 bp, 150-350 bp, 350-500 bp, 500-700 bp, 700-1000 bp, or 1000-1500 bp.
- For example, 50 bp, 90 bp, 100 bp, 150 bp, 300 bp, 350 bp, 500 bp, 700 bp, 1000 bp, or 1500 bp can be selected.
- Read lengths differ between sequencers.
- When sequence tags are selected from the reads, a segment of 20 bp or more of each read is generally used for alignment, preferably more than 26 bp.
- Step 104: determine, from the reads, the sequence tags that align uniquely to the (genome) reference sequence. For example, all or part of each read is aligned with the genome reference sequence to obtain the position of the read on the genome, including its position on a specific chromosome.
- the human genome reference sequence can be the human genome reference sequence in the NCBI database.
- In one embodiment, the human genome reference sequence is version 36 in the NCBI database (hg18; NCBI Build 36), and the alignment software is SOAPaligner/soap2. DNA fragment reads that align uniquely to the human genome reference sequence are selected; these are the sequence tags uniquely aligned to the (genome) reference sequence.
- the genome reference sequence is divided into windows, and the number of sequence tags falling into each window is counted.
- the window is determined by the following method:
- the reference genome is split into fragments of the same length as the reads of the sample to be tested, which are aligned with the same alignment software and the same parameters, and the positions on the chromosomes that can be uniquely aligned are retained;
- each window is defined by a certain number of uniquely alignable positions; adjacent windows may optionally overlap (sliding windows).
- the number of sites that can be uniquely aligned in the window is related to the amount of sequencing data of the sample to be tested.
- The expected number of reads of the sample to be tested falling in each window should be more than 300, so that the number of reads in a window follows a Poisson distribution. For example, suppose the genome has N uniquely alignable positions, the sample to be tested has n valid reads, and the expected number of reads per window is E; since each position receives on average n/N reads, then
- each window of the reference genome should contain $X = E \cdot N / n$ uniquely alignable positions.
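A sketch of the window sizing rule: with N uniquely alignable positions and n valid reads, a window of X = E·N/n positions expects E reads. Rounding X up and using non-overlapping windows (the text also allows overlapping ones) are assumptions; the function names and numbers are illustrative only.

```python
import math


def sites_per_window(n_unique_sites: int, n_reads: int, expected_per_window: int = 300) -> int:
    """X = E * N / n, rounded up so each window expects at least E reads."""
    return math.ceil(expected_per_window * n_unique_sites / n_reads)


def build_windows(unique_positions, sites_per_win):
    """Group sorted uniquely alignable positions into consecutive, non-overlapping windows.

    Each window is returned as (start, end) with `end` exclusive; the last window
    may contain fewer than `sites_per_win` positions.
    """
    windows = []
    for i in range(0, len(unique_positions), sites_per_win):
        chunk = unique_positions[i:i + sites_per_win]
        windows.append((chunk[0], chunk[-1] + 1))
    return windows


# Hypothetical numbers: 2.5e9 alignable positions, 1.5e7 valid reads, E = 300.
print(sites_per_window(2_500_000_000, 15_000_000))  # 50000
print(build_windows([0, 1, 2, 5, 6, 9, 10], 3))     # [(0, 3), (5, 10), (10, 11)]
```

Defining windows by a fixed count of alignable positions rather than a fixed length means windows spanning repetitive regions become physically longer, which keeps the expected read count per window roughly constant.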
- Step 108: perform GC correction on the number of sequence tags of each window and further correct it with the expected number of sequence tags derived from the control sample set, to obtain the adjusted number of sequence tags.
- Performing GC correction on the number of sequence tags of each window includes the following steps: grouping the windows by GC content; obtaining a correction coefficient from the average number of sequence tags within each group and the average number of sequence tags over all windows; and correcting the number of sequence tags of each window with the correction coefficient to obtain the GC-corrected number of sequence tags. The expected number of sequence tags derived from the control sample set is obtained by: calculating, for each control sample, the ratio of the GC-corrected number of sequence tags in each window to the total number of tags; averaging the ratios of the corresponding window over all control samples; and calculating the expected number of sequence tags in each window of the test sample from this average and the total number of sequence tags of the test sample.
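The two corrections of step 108 can be sketched as below. Several details are assumptions not fixed by the text: GC content is grouped into equal-width bins, the correction coefficient is (mean over all windows) / (mean within the GC group), the sample total is supplied by the caller, and the final adjustment is the ratio of observed to expected counts. Means are arithmetic, although the text also allows the median. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(counts, gc_content, n_bins=20):
    """Scale each window's tag count by (mean of all windows) / (mean of its GC group)."""
    overall = mean(counts)
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in gc_content]
    groups = defaultdict(list)
    for b, c in zip(bins, counts):
        groups[b].append(c)
    group_mean = {b: mean(v) for b, v in groups.items()}
    # A group whose mean is 0 has only zero counts, so its windows stay at 0.
    return [c * overall / group_mean[b] if group_mean[b] else 0.0
            for b, c in zip(bins, counts)]


def expected_counts(control_corrected, test_total):
    """Per-window expected tag count for the test sample.

    control_corrected: one list of GC-corrected window counts per control sample.
    Each control's window counts are divided by that control's total, the ratios
    are averaged per window across controls, then scaled by the test total.
    """
    ratios = [[c / sum(sample) for c in sample] for sample in control_corrected]
    n_windows = len(control_corrected[0])
    return [mean(r[i] for r in ratios) * test_total for i in range(n_windows)]


def adjusted_counts(test_corrected, expected):
    """Observed / expected per window (assumed form of the final adjustment)."""
    return [o / e for o, e in zip(test_corrected, expected)]


controls = [[10, 30], [20, 20]]             # hypothetical GC-corrected counts of two controls
expected = expected_counts(controls, 100)   # test sample assumed to have 100 tags in total
print(expected)                              # [37.5, 62.5]
print(adjusted_counts([45.0, 55.0], expected))
```

Expressing the adjusted value as a ratio makes a normal window sit near 1.0, a heterozygous deletion near 0.5, and a duplication near 1.5, which is convenient for the later significance tests.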
- Step 110: taking the start or end point of each window as a demarcation point, calculate the significance value of the difference between the two groups of adjusted sequence tag counts on either side, and select demarcation points with small significance values (i.e., significant differences) as candidate CNV breakpoints. For example, over the whole genome, a predetermined number of windows are selected as candidate CNV breakpoints according to the P value representing the significance of the copy number difference between the two sides of each window, and the difference significance value (P value) of each candidate CNV breakpoint is recorded.
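One way to realize the candidate breakpoint scan of step 110 is sketched below. The text does not name the significance test or how far each side extends, so this sketch assumes a two-sided two-sample z-test (a normal approximation to Welch's t-test, using only the standard library) on a fixed number of flanking windows; `flank`, like the function names, is a hypothetical parameter. A boundary b splits windows [b - flank, b) from [b, b + flank).

```python
import math
from statistics import mean, variance


def two_sample_p(left, right):
    """Two-sided p-value of a two-sample z-test (normal approximation to Welch's t-test)."""
    if len(left) < 2 or len(right) < 2:
        return 1.0  # too few windows to estimate variance: treat as not significant
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    diff = mean(left) - mean(right)
    if se == 0:
        return 0.0 if diff else 1.0
    return math.erfc(abs(diff / se) / math.sqrt(2))


def candidate_breakpoints(values, n_candidates, flank=10):
    """Score every window boundary and keep the n_candidates with the smallest p-values.

    values: adjusted tag counts of consecutive windows on one chromosome.
    Returns (boundary, p_value) pairs in genomic order.
    """
    scored = []
    for b in range(1, len(values)):
        left = values[max(0, b - flank):b]
        right = values[b:b + flank]
        scored.append((two_sample_p(left, right), b))
    best = sorted(scored)[:n_candidates]
    return sorted((b, p) for p, b in best)


# Hypothetical adjusted counts: windows 4-7 carry a one-copy loss.
values = [1.0, 1.01, 0.99, 1.0, 0.5, 0.51, 0.49, 0.5, 1.0, 0.99, 1.01, 1.0]
print([b for b, _ in candidate_breakpoints(values, 2, flank=4)])  # [4, 8]
```

A local flank keeps small CNVs detectable; comparing whole chromosome arms on either side of b would dilute a short deletion.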
- Step 112: for each candidate CNV breakpoint, calculate the significance value of the difference between the two groups of adjusted sequence tag counts of the windows in the segment from the preceding candidate CNV breakpoint to it and the segment from it to the following candidate CNV breakpoint. Each time, remove the least significant candidate CNV breakpoint and update the difference significance values of the two candidate CNV breakpoints adjacent to it; iterate until the difference significance values of all candidate CNV breakpoints are less than the termination threshold, which determines the CNV breakpoints.
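The iterative elimination of step 112 can be sketched as a backward-elimination loop. The significance test is the same assumed z-test as in a scan sketch (repeated here so the block is self-contained); treating the chromosome ends as the outer boundaries of the first and last segments, and the names `eliminate_breakpoints` and `threshold`, are assumptions. Only the two neighbours of a removed breakpoint are re-scored, as the text describes.

```python
import math
from statistics import mean, variance


def two_sample_p(left, right):
    """Two-sided p-value of a two-sample z-test (normal approximation to Welch's t-test)."""
    if len(left) < 2 or len(right) < 2:
        return 1.0  # too few windows to estimate variance: treat as not significant
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    diff = mean(left) - mean(right)
    if se == 0:
        return 0.0 if diff else 1.0
    return math.erfc(abs(diff / se) / math.sqrt(2))


def eliminate_breakpoints(values, candidates, threshold):
    """Remove the least significant candidate until all p-values are below threshold.

    values: adjusted tag counts of consecutive windows (one chromosome).
    candidates: sorted window boundaries b with 0 < b < len(values).
    """
    bps = list(candidates)

    def score(k):
        # Segment from the previous breakpoint (or chromosome start) to bps[k],
        # versus the segment from bps[k] to the next breakpoint (or chromosome end).
        start = bps[k - 1] if k > 0 else 0
        end = bps[k + 1] if k + 1 < len(bps) else len(values)
        return two_sample_p(values[start:bps[k]], values[bps[k]:end])

    pvals = [score(k) for k in range(len(bps))]
    while bps and max(pvals) >= threshold:
        k = pvals.index(max(pvals))      # least significant candidate
        del bps[k], pvals[k]
        for j in (k - 1, k):             # its former neighbours now border a merged segment
            if 0 <= j < len(bps):
                pvals[j] = score(j)
    return bps


values = [1.0, 1.01, 0.99, 1.0, 0.5, 0.51, 0.49, 0.5, 1.0, 0.99, 1.01, 1.0]
print(eliminate_breakpoints(values, [2, 4, 6, 8, 10], threshold=0.01))  # [4, 8]
```

Re-scoring only the neighbours is what makes the loop cheap: removing one breakpoint merges two segments, and no other candidate's flanking segments change.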
- the termination threshold is usually preset. For example, the termination threshold is obtained by analyzing the control sample set consisting of normal samples.
- The iterative selection of CNV breakpoints completes the CNV detection and allows smaller copy number variations, including microdeletion/microduplication regions, to be detected accurately.
- The test sample may be amniotic fluid obtained by amniocentesis, chorionic villi obtained by chorionic villus biopsy, cord blood obtained by transabdominal umbilical vein puncture, spontaneously aborted fetal tissue, or human peripheral blood, from which genomic DNA is obtained.
- Genomic DNA can be extracted by conventional methods such as salting-out, column chromatography, magnetic beads, or the SDS method.
- Column chromatography is preferred.
- In column chromatography, blood, tissue, or cells are treated with cell lysis buffer and proteinase K to release the DNA, which binds to a silica membrane under high-salt conditions and is then eluted from the membrane under low-salt, high-pH conditions. See the Tiangen TIANamp Micro DNA Kit (DP316) data sheet for the specific principles and methods.
- A different tag sequence (index), which may be 4 bp to 12 bp long, may be added to the DNA fragments of each sample to distinguish test samples during sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 2008, March, Vol. 5, No. 3).
- Multiple test samples can thus be processed simultaneously, which improves efficiency and reduces detection cost.
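A minimal demultiplexing sketch for the indexed samples described above. It assumes the index occupies the first k bases of each read (in practice its position depends on the library design and sequencer) and tolerates one mismatch, which is only safe when every pair of indexes differs at three or more positions. The sample names, index sequences, and function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Split reads by a leading index sequence.

    sample_indexes maps a sample name to its index; all indexes share one length k.
    Reads whose first k bases match exactly one index (within max_mismatches) are
    assigned to that sample with the index trimmed; all others go to "unassigned".
    """
    k = len(next(iter(sample_indexes.values())))
    assigned = {name: [] for name in sample_indexes}
    assigned["unassigned"] = []
    for read in reads:
        hits = [] if len(read) < k else [
            name for name, index in sample_indexes.items()
            if hamming(read[:k], index) <= max_mismatches
        ]
        if len(hits) == 1:
            assigned[hits[0]].append(read[k:])
        else:
            assigned["unassigned"].append(read)  # no match, or ambiguous match
    return assigned


indexes = {"sample_1": "ACGTAC", "sample_2": "TTGCAA"}  # hypothetical 6 bp indexes
reads = ["ACGTACGGGG", "ACGTTCGGGG", "GGGGGGAAAA"]
print(demultiplex(reads, indexes))
```

Sending ambiguous reads to "unassigned" rather than to the nearest index avoids cross-sample contamination, which matters when per-window tag counts are compared against controls.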
- Fig. 2 is a flow chart showing another embodiment of the copy number variation detecting method of the present invention.
- the genomic DNA molecule of the test sample is randomly interrupted to obtain a DNA fragment.
- Randomly interrupted DNA molecules can be digested, nebulized, sonicated, or HydroShear.
- an ultrasonic method such as Covaris' S-series (based on AFA technology, when the acoustic energy released by the sensor can pass through the DNA sample, the dissolved gas forms a bubble. When the energy is removed, the bubble bursts and produces The ability to break DNA molecules.
- Step 204: sequence the DNA fragments to obtain sequencing reads.
- The reads obtained by sequencing may vary in length within a certain range. Fixed-length reads can be obtained from them by truncation.
- Here, the DNA fragment sequencing reads are fixed-length reads.
- The sequencing method can be a high-throughput platform such as Illumina/HiSeq2000, ABI/SOLiD, or Roche/454.
- Sequencing can be single-end or paired-end, with read lengths of 50 bp to 1500 bp.
- In one embodiment, the platform is Illumina/HiSeq2000 with paired-end sequencing. This yields pairs of 100 bp reads whose relative positions and orientations are known.
- The sequencing depth can be chosen according to the size of the chromosomal variants to be detected. The greater the depth, the higher the detection sensitivity and the smaller the deletions and duplications that can be detected.
- For a human test sample, the sequencing amount is between 2 and 900 × 10^8 reads.
- Step 206: align the reads to the genome reference sequence to obtain the position of each read on the genome.
- In step 206, reads that align uniquely to the genome reference sequence are selected as sequence tags.
- Step 208: count the uniquely aligned sequence tags falling into each window of the genome. For each test sample, the number of sequence tags in each window is denoted $n_{i,j}$. The subscripts i and j denote the window number and the sample number, respectively; this convention is used throughout.
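The minimal Python sketch below illustrates step 208. It keeps only uniquely aligned reads and counts them per window. For illustration only, it assumes that alignments have already been parsed into (chromosome, 1-based position, number of hits) tuples and that windows have a fixed length in base pairs. The text also allows windows that each contain an equal number of reference sequence tags, which this sketch does not implement. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_tags_per_window(
    alignments: Iterable[Tuple[str, int, int]],
    window_size: int,
) -> Dict[Tuple[str, int], int]:
    """Count uniquely aligned reads (sequence tags) per fixed-size window.

    `alignments` yields (chromosome, 1-based start position, number_of_hits).
    Reads with more than one hit are discarded. Windows are keyed by
    (chromosome, window_index), where window_index = (pos - 1) // window_size.
    """
    counts: Counter = Counter()
    for chrom, pos, n_hits in alignments:
        if n_hits != 1:  # keep unique alignments only
            continue
        counts[(chrom, (pos - 1) // window_size)] += 1
    return dict(counts)
```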
- Step 210: determine the average GC content of each window on the genome and derive a correction coefficient from it. Use the coefficient to correct the number of sequence tags in each window. This step corrects each window's tag count for its GC content and may be called batch correction or GC correction.
- The average GC content of the sequence tags falling in each window is computed as the total number of G and C bases in those tags divided by their total length.
- The windows are then grouped by GC content: windows with the same GC content form one group.
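The per-window GC statistic and the grouping could be sketched as follows. Two details are assumptions rather than part of the source. First, tags are plain sequence strings. Second, "the same GC content" is implemented by rounding to a fixed number of decimals, because exact floating-point equality would put almost every window in its own group. Names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def window_gc(tags: Sequence[str]) -> float:
    """Average GC content of the tags in one window: total G+C bases
    divided by total tag length. Returns NaN for a window without tags."""
    total_len = sum(len(t) for t in tags)
    if total_len == 0:
        return float("nan")
    gc = sum(t.upper().count("G") + t.upper().count("C") for t in tags)
    return gc / total_len


def group_by_gc(gc_by_window: Dict[int, float], precision: int = 3) -> Dict[float, List[int]]:
    """Group window ids whose GC content is equal after rounding to
    `precision` decimals (an assumed discretisation). NaN windows are skipped."""
    groups: Dict[float, List[int]] = defaultdict(list)
    for win, gc in gc_by_window.items():
        if math.isnan(gc):
            continue
        groups[round(gc, precision)].append(win)
    return dict(groups)
```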
- Step 212: divide the corrected sequence tag count of each window by the expected count for that window. The result is the adjusted value of each window, i.e., its copy rate.
- The expected count for each window is derived from a control set of normal samples. This step corrects each window's tag count against normal-sample data and may be called window correction.
- The relative tag proportion of a window is defined as the number of sequence tags in the window divided by the genome-wide total number of tags of that sample: $p_{i,j} = n_{i,j} / N_j$.
- The copy rate of each window equals its corrected tag count $c_{i,j}$ divided by its expected value. The expected value is the genome-wide total tag count of the test sample multiplied by the mean relative proportion of that window over the control set: $r_{i,j} = c_{i,j} / (\bar{p}_i \times N_j)$. Here $\bar{p}_i$ is the mean of $p_{i,k}$ over the control samples k.
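The window correction can be sketched in a few lines. The sketch assumes that GC-corrected counts are already available as equal-length lists, with one entry per window and the same window order in every sample. It computes each control sample's relative proportion of tags per window and averages that proportion over the control set. It then multiplies the average by the test sample's total tag count to get the expected count. Finally, it divides the test sample's corrected count by that expected count. Function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def relative_proportions(counts: Sequence[float]) -> List[float]:
    """Tag count of each window divided by the sample's genome-wide total."""
    total = sum(counts)
    return [c / total for c in counts]


def copy_rates(test_counts: Sequence[float], control_counts: Sequence[Sequence[float]]) -> List[float]:
    """Copy rate per window = corrected count / (mean control proportion * test total).

    test_counts:    GC-corrected tag counts of the test sample, one per window
    control_counts: GC-corrected counts of each control sample, same window order
    Windows with a zero mean control proportion get NaN.
    """
    control_p = [relative_proportions(c) for c in control_counts]
    mean_p = [mean(col) for col in zip(*control_p)]  # average over control samples
    n_total = sum(test_counts)
    return [c / (p * n_total) if p > 0 else float("nan") for c, p in zip(test_counts, mean_p)]
```

Normalising by each sample's own total means that the copy rates do not depend on how deeply the test sample was sequenced relative to the controls.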
- The control samples should be prepared with the same library construction method, sequencing reagents, and sequencing type as the test sample. This improves the correction of the test sample.
- The control set should consist of more than 30 normal samples.
- Step 214: for every window in the genome, compute a p-value that measures how significantly the copy rates on its two sides differ. Across the whole genome, select a predetermined number of windows with the most significant differences as candidate CNV breakpoints, and record the p-value of each candidate.
- Candidate CNV breakpoint selection: for every window in the genome, the copy rates of a fixed number of windows on its left are compared with those of the same number of windows on its right. This number is usually more than 30, or at least the minimum sample size required by the test, so that the test is statistically meaningful. All windows are ranked genome-wide by the significance of this difference (p-value from smallest to largest). A fixed number of them (for example, 1% of all windows) are selected as candidate CNV breakpoints, i.e., the boundary points of CNV segments.
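The text does not name the two-sample test used here. The patent's non-patent citations include Wald and Wolfowitz (1940), so the sketch below assumes the Wald-Wolfowitz runs test with its normal approximation as a plausible stand-in. It is not confirmed to be the authors' method. The sketch scores every window boundary that has `flank` windows on both sides and keeps the given fraction with the smallest p-values. Boundaries near chromosome ends are simply skipped; circularization is what handles them in the method. Names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def runs_test_p(left: Sequence[float], right: Sequence[float]) -> float:
    """Lower-tail p-value of the Wald-Wolfowitz runs test (normal approximation).

    Both samples are pooled and sorted. A "run" is a maximal stretch of
    consecutive values coming from the same side. Too few runs means the two
    sides are separated, i.e. their copy rates differ. Ties are broken by side
    label, which is acceptable only when ties are rare (continuous copy rates).
    """
    n1, n2 = len(left), len(right)
    pooled = sorted([(v, 0) for v in left] + [(v, 1) for v in right])
    labels = [side for _, side in pooled]
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mu) / math.sqrt(var)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)


def candidate_breakpoints(
    rates: Sequence[float], flank: int, fraction: float = 0.01
) -> List[Tuple[int, float]]:
    """Score each boundary b (between windows b-1 and b) that has `flank`
    windows on both sides, keep the `fraction` most significant boundaries,
    and return them as (boundary, p-value) sorted by position."""
    if flank < 2:
        raise ValueError("flank must be at least 2 windows per side")
    scored = []
    for b in range(flank, len(rates) - flank + 1):
        p = runs_test_p(rates[b - flank:b], rates[b:b + flank])
        scored.append((p, b))
    scored.sort()
    keep = max(1, int(len(scored) * fraction))
    return sorted((b, p) for p, b in scored[:keep])
```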
- Step 216: repeatedly remove the least significant candidate CNV breakpoint and update the p-values of the two candidate breakpoints on either side of it. Iterate until the p-values of all remaining candidates are below the termination p-value, i.e., the termination threshold.
- The termination p-value is obtained from the control set.
- Iterative merging: in each iteration, the least significant candidate breakpoint is removed and the p-values of its left and right neighbouring breakpoints are updated. This continues until all p-values are below the termination p-value.
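A minimal sketch of the iterative merging loop follows. Each breakpoint's p-value compares the segment to its left with the segment to its right, where each segment is bounded by the neighbouring breakpoints. Removing a breakpoint therefore changes only its two neighbours. The two-sample test is passed in as a function. For a self-contained example, a Welch mean-difference statistic with a normal approximation is used as an assumed stand-in, since the text does not specify the test. The optional `target_segments` stop covers the alternative stopping rule (e.g., 24 segments). The function also records the p-value removed at each step, which is what the control-set threshold derivation needs. All names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Callable, List, Optional, Sequence, Tuple

TestFn = Callable[[Sequence[float], Sequence[float]], float]


def welch_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value for a difference in means, using the Welch statistic
    with a normal approximation."""
    if len(a) < 2 or len(b) < 2:
        return 1.0  # too few windows to test: treat as not significant
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 1.0 if diff == 0.0 else 0.0
    return math.erfc(abs(diff) / se / math.sqrt(2.0))


def merge_breakpoints(
    rates: Sequence[float],
    breakpoints: Sequence[int],
    test: TestFn,
    p_stop: float,
    target_segments: Optional[int] = None,
) -> Tuple[List[int], List[float]]:
    """Iteratively drop the least significant breakpoint.

    Stops when every p-value is below `p_stop` or, if `target_segments` is
    given, when that many segments remain. Returns the surviving breakpoints
    and the p-value removed at each iteration.
    """
    bps = sorted(breakpoints)
    history: List[float] = []

    def p_at(k: int) -> float:
        lo = bps[k - 1] if k > 0 else 0
        hi = bps[k + 1] if k + 1 < len(bps) else len(rates)
        return test(rates[lo:bps[k]], rates[bps[k]:hi])

    pvals = [p_at(k) for k in range(len(bps))]
    while bps:
        if target_segments is not None:
            if len(bps) + 1 <= target_segments:
                break
        elif max(pvals) < p_stop:
            break
        k = max(range(len(pvals)), key=pvals.__getitem__)  # least significant
        history.append(pvals[k])
        del bps[k]
        del pvals[k]
        # Only the two breakpoints that bordered the removed one change.
        for j in (k - 1, k):
            if 0 <= j < len(bps):
                pvals[j] = p_at(j)
    return bps, history
```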
- To obtain the threshold, the control samples go through the same iterative merging. The maximum p-value merged in each iteration is recorded, and merging continues until each sample is a single segment.
- On the resulting curve of maximum p-values, locate the point where the value changes most sharply, i.e., where the change in slope is most pronounced (the point of largest curvature).
- The maximum p-value at that point, or at the merge step just before it, is used as the termination threshold.
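The "point of largest curvature" on the recorded curve of maximum p-values can be located with discrete derivatives, as in this sketch. It uses central differences for the slope and the second difference for the change in slope, combined as |y''| / (1 + y'^2)^1.5 with the iteration number as the x-axis. Applying this to raw (not log-transformed) p-values is an assumption. The function name is illustrative.

```python
from typing import Sequence, Tuple


def sharpest_change(max_p: Sequence[float]) -> Tuple[int, float]:
    """Return (iteration index, max p-value) at the point of largest discrete
    curvature |y''| / (1 + y'^2)**1.5 along the curve of per-iteration maxima."""
    if len(max_p) < 3:
        raise ValueError("need at least three iterations")
    best_i, best_k = 1, -1.0
    for i in range(1, len(max_p) - 1):
        d1 = (max_p[i + 1] - max_p[i - 1]) / 2.0           # slope
        d2 = max_p[i + 1] - 2.0 * max_p[i] + max_p[i - 1]  # change in slope
        k = abs(d2) / (1.0 + d1 * d1) ** 1.5
        if k > best_k:
            best_i, best_k = i, k
    return best_i, max_p[best_i]
```

For a curve such as `[0.9, 0.88, 0.86, 0.84, 0.82, 0.2, 0.05, 0.01]`, the knee is at index 4. The value there (0.82), or the value one step earlier, would be taken as the threshold.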
- Alternatively, iterative merging of the control samples can be stopped when the number of segments equals the expected number. For example, in whole-genome analysis, merging of a control sample stops when 24 segments remain. The termination p-value is then obtained by averaging the p-values computed at that point.
- Single-chromosome circularization: a window near the start of a chromosome may have too few valid windows on its left for the statistical test. In that case, the missing windows are taken from the end of the same chromosome. Likewise, a window near the end that has too few valid windows on its right borrows windows from the start of the chromosome. This allows windows at both ends of a chromosome to be tested.
- Genome-wide circularization: when the start of a chromosome has too few valid windows, windows are taken from the end of the previous chromosome. When the end has too few, windows are taken from the start of the next chromosome. Chromosome 1 is joined to chromosome Y.
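Both circularization schemes amount to modular indexing over a ring of windows, as this sketch shows. For single-chromosome circularization, pass the windows of one chromosome. For genome-wide circularization, pass all chromosomes concatenated in order (chr1 through chrY), so that chrY wraps back to chr1. Treating all windows as valid, with no masking of unusable windows, is a simplifying assumption. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def genome_ring(chromosomes: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-chromosome window values (chr1 ... chrY) into one ring."""
    return [v for chrom in chromosomes for v in chrom]


def circular_flanks(values: Sequence[float], boundary: int, k: int) -> Tuple[List[float], List[float]]:
    """Return the k windows left of `boundary` and the k windows starting at
    `boundary`, wrapping around the ends of `values` as if it were circular."""
    n = len(values)
    if 2 * k > n:
        raise ValueError("flanks would overlap on a ring this small")
    left = [values[(boundary - k + i) % n] for i in range(k)]
    right = [values[(boundary + i) % n] for i in range(k)]
    return left, right
```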
- The method further comprises confidence filtering of the segments between CNV breakpoints. A normal confidence interval for the copy rate is determined from the distribution of copy rates. A segment whose mean copy rate lies outside this interval is considered abnormal.
- The copy rate follows a normal distribution, and a 95% confidence interval is used. This step filters the segmentation results to retain reliable calls. A segment whose mean copy rate r is below the lower threshold or above the upper threshold is reported as a positive result.
- Threshold selection: compute the distribution of window copy rates in each control sample. Reads fall into windows randomly, so by the central limit theorem the copy rate r is approximately normally distributed. At a significance level of 0.05, take the corresponding lower and upper quantiles for each control sample. Their means over the control set are used as the lower and upper thresholds for calling copy-rate variation.
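The threshold selection can be sketched with Python's standard `statistics.NormalDist`. A normal distribution is fitted to each control sample's window copy rates, and the quantiles are taken at a significance level of 0.05. The sketch assumes these are two-sided (the 2.5% and 97.5% points). The lower and upper quantiles are then averaged over the control set. The function name is illustrative.

```python
from statistics import NormalDist, fmean
from typing import Sequence, Tuple


def copy_rate_thresholds(control_rates: Sequence[Sequence[float]], alpha: float = 0.05) -> Tuple[float, float]:
    """Fit a normal distribution to each control sample's window copy rates,
    take the two-sided alpha quantiles, and average them over the control set."""
    lows, highs = [], []
    for rates in control_rates:
        dist = NormalDist.from_samples(rates)  # sample mean and sample stdev
        lows.append(dist.inv_cdf(alpha / 2))
        highs.append(dist.inv_cdf(1 - alpha / 2))
    return fmean(lows), fmean(highs)
```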
- Batch correction and window correction improve the accuracy of the detection results.
- Accuracy can also be increased by enlarging the reference (control) set, which reduces the demand on the amount of starting DNA.
- Figure 3 is a flow chart showing still another embodiment of the method for detecting genomic copy number variation of the present invention.
- It comprises a process for the control set of normal samples (3A) and a process for the test sample (3B).
- Process 3A includes:
- Step 310A: extract DNA from the control samples.
- Step 311A: randomly fragment the control-sample DNA and sequence the fragments to obtain reads.
- Step 312A: align the control-sample reads to the reference genome.
- Step 313A: count the uniquely aligned reads (sequence tags) in each window.
- Step 314A: perform batch correction on the control samples.
- Step 315A: derive the expected tag count of each window from the control samples, for use in window correction of the test sample.
- Step 316A: breakpoint selection and segmentation. Candidate CNV breakpoints are selected. In each iteration, the least significant candidate is removed and the p-values of the two candidates on either side of it are updated. The iteration stops when a predetermined number of segments (e.g., 24) remains.
- Step 317A: determine the termination threshold.
- Process 3B includes:
- Step 310B: extract DNA from the test sample.
- Step 311B: randomly fragment the test-sample DNA and sequence the fragments to obtain reads.
- Step 312B: align the test-sample reads to the reference genome.
- Step 313B: count the uniquely aligned reads (sequence tags) in each window.
- Step 314B: perform batch correction on the test sample.
- Step 315B: perform window correction on the test sample using the expected window counts obtained from the control samples.
- Step 316B: breakpoint selection and segmentation.
- Step 317B: filter the results.
- The control samples should be prepared with the same library construction method, sequencing reagents, and sequencing type as the test sample. This improves how well the control samples correct the test sample.
- The control set should consist of more than 30 normal samples.
- FIG. 4 shows a simplified flow chart of chromosome CNV analysis in one implementation of the invention.
- Step 401, DNA extraction and sequencing: genomic DNA is extracted according to the Tiangen DP327-02 kit manual. A library is then constructed following the standard Illumina/HiSeq2000 library construction protocol. In this process, sequencing adapters are ligated to DNA molecules of about 500 bp. Each sample is labeled with a different tag sequence (index) so that the data of multiple samples in one sequencing run can be distinguished.
- Step 402, sequence alignment: samples are sequenced on Illumina/HiSeq2000; other or similar methods such as ABI/SOLiD can achieve the same or similar results. Each sample yields reads of a certain length. The reads are aligned with SOAP2 to the standard human genome reference sequence from the NCBI database to obtain the position of each read on the genome.
- Only reads that align uniquely to the human genome reference sequence (sequence tags) are kept as valid data for the subsequent CNV analysis.
- Step 403: PSCC analysis, using the PSCC bioinformatics method.
- Step 404: perform CNV analysis on the copy number segments obtained in step 403. Test-sample copy rates below 0.7 and above 1.3 serve as the detection thresholds for deletions and duplications, respectively. Identify the copy number variant segments across the whole genome, and then visualize the results.
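Applying these thresholds to segmented copy rates could look like the sketch below. The windows are split at the breakpoints, the mean copy rate of each segment is computed, and the segment is labelled by comparison with 0.7 and 1.3. For orientation, a one-copy loss or gain in a diploid region would give a copy rate near 0.5 or 1.5, respectively. The `Segment` type and function name are illustrative.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Sequence


@dataclass
class Segment:
    start: int        # first window index (inclusive)
    end: int          # last window index (exclusive)
    mean_rate: float  # mean copy rate of the windows in the segment
    call: str         # "deletion", "duplication" or "normal"


def call_segments(
    rates: Sequence[float],
    breakpoints: Sequence[int],
    low: float = 0.7,
    high: float = 1.3,
) -> List[Segment]:
    """Split window copy rates at the breakpoints and label each segment by
    comparing its mean copy rate with the deletion/duplication thresholds.
    Breakpoints must be distinct and strictly inside (0, len(rates))."""
    edges = [0, *sorted(breakpoints), len(rates)]
    segments = []
    for s, e in zip(edges, edges[1:]):
        m = fmean(rates[s:e])
        if m < low:
            call = "deletion"
        elif m > high:
            call = "duplication"
        else:
            call = "normal"
        segments.append(Segment(s, e, m, call))
    return segments
```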
- The software implementing this algorithm is PSCC, a set of programs for genome-wide copy number variation detection developed by the Shenzhen Huada Gene Research Institute (BGI-Shenzhen). PSCC batch-corrects data generated by next-generation sequencing. It then corrects, normalizes, and segments the data against the control set to estimate the extent and magnitude of copy number variation in the sample.
- At low sequencing depth (50 M short reads), it can detect a single copy number variant (CNV) fragment of about 0.5 Mb.
- Fig. 5 is a view showing the configuration of an embodiment of the genome copy number variation detecting system of the present invention.
- The system includes a read acquisition unit 51, which acquires reads of at least part of the nucleic acid molecules in the test sample. It also includes a sequence tag determining unit 52, which determines the sequence tags that align uniquely to the (genome) reference sequence.
- A tag counting unit 53 divides the genome reference sequence into windows and counts the sequence tags falling into each window. A tag count adjusting unit 54 applies GC correction to each window's tag count and corrects it against the expected tag count derived from the control sample set, giving the adjusted tag count. A candidate breakpoint selection unit 55 takes the start or end of each window as a dividing point and calculates the significance of the difference between the adjusted tag counts on its two sides.
- Each window may contain the same number of reference sequence tags.
- When selecting candidate CNV breakpoints, the candidate breakpoint selection unit 55 may apply single-chromosome circularization or genome-wide circularization.
- In this system, the sequence tag determining unit identifies from the reads the sequence tags that align uniquely to the (genome) reference sequence. The tag count adjusting unit corrects the number of sequence tags in each window. The candidate breakpoint selection unit and the breakpoint determining unit then iterate over significance tests of the differences to select CNV breakpoints. This completes CNV detection and allows copy number variations, including small microdeletion/microduplication regions, to be detected accurately.
- Fig. 6 is a block diagram showing another embodiment of the copy number variation detecting system of the present invention.
- The system includes a read acquisition unit 51, a sequence tag determining unit 52, a tag counting unit 53, a tag count adjusting unit 64, a candidate breakpoint selection unit 55, and a breakpoint determining unit 56.
- Units 51, 52, 53, 55, and 56 are as described in detail for Fig. 5 and, for brevity, are not described again here.
- The tag count adjusting unit 64 includes a GC correction module 641 and a window correction module 642.
- The GC correction module 641 groups the windows by GC content. For each group, it derives a correction coefficient from the average tag count of the windows in the group and the average tag count of all windows. It then applies the coefficient to each window's tag count to obtain the GC-corrected tag count.
- The window correction module 642 first calculates, for each window of each control sample, the ratio of its GC-corrected tag count to the sample's total tag count. It averages this ratio over all control samples for each window and multiplies the average by the total tag count of the test sample to obtain the expected tag count of each window. Finally, it divides the test sample's GC-corrected tag count by this expected count to obtain the adjusted tag count, also called the copy rate.
- The system further includes a breakpoint filtering unit 67. After the breakpoint determining unit has determined the CNV breakpoints, unit 67 determines a normal confidence interval of the copy rate from the distribution of copy rates. When the mean copy rate within a segment lies outside the confidence interval, the segment between the CNV breakpoints is considered abnormal.
- the number of sequence tags conforms to a normal distribution, and the confidence interval is
- The test sample is a human sample.
- The genomic DNA of the test sample is obtained from amniotic fluid (by amniocentesis), chorionic villi (by chorionic villus biopsy), cord blood (by transabdominal umbilical vein puncture), spontaneously aborted fetal tissue, or human peripheral blood.
- The genomic DNA of the test sample is extracted by salting out, column chromatography, magnetic beads, or SDS.
- The genomic DNA of the sample is randomly fragmented by enzymatic digestion, nebulization, sonication, or HydroShear.
- The reads are obtained by single-end or paired-end sequencing of the genomic DNA fragments of the test sample.
- Different test samples can be distinguished by different tag sequences added to the DNA fragments of the test sample.
- DNA was extracted from 8 samples (hereinafter Sample1, Sample2, Sample3 ... Sample8). Libraries were built from the extracted DNA following a modified Illumina/HiSeq2000 standard library protocol, using DNA molecules concentrated around 500 bp.
- Each sample was labeled with a different tag sequence, hybridized to the complementary adapters on the flow cell surface, and clustered under appropriate conditions. It was then sequenced paired-end on an Illumina HiSeq2000.
- The sequenced DNA fragments are 100 bp long.
- About 100 ng of DNA (quantified with the Quant-IT dsDNA HS Assay kit) from the above amniotic fluid samples was used to construct libraries with a modified Illumina/HiSeq2000 standard procedure. For the specific process, see the prior art (the Illumina/Solexa standard library preparation instructions provided at http://www.illumina.com/).
- The library insert size was determined to be about 500 bp with a 2100 Bioanalyzer (Agilent). The libraries were accurately quantified by qPCR before sequencing.
- Sequencing: in this example, the DNA of the 8 samples was processed and sequenced following the official Illumina/Solexa ClusterStation and HiSeq2000 (PE sequencing) instructions, yielding about 5 G of data per sample. The samples were then separated according to their index tags.
- Alignment: the reads were aligned with SOAP2 to version 36 of the human genome reference sequence (hg18; NCBI Build 36) from the NCBI database. This gives the position of each read on the genome.
- GC correction: the windows are grouped by the average GC content of the reads falling into them. For each group g, the correction coefficient $c_g$ is obtained by dividing the genome-wide median (or arithmetic mean) of the window read counts by the median (or arithmetic mean) of the group.
- The raw read count $n_{i,j}$ of each window is multiplied by the coefficient of its group to obtain the corrected tag count of the window, $c_{i,j} = n_{i,j} \times c_g$.
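The group-coefficient correction can be sketched as follows. Windows are grouped by GC content, and each group's coefficient is the genome-wide median count divided by the group's median count. Each raw count is multiplied by its group's coefficient, so after correction every group is centred on the genome-wide median. The text allows either the median or the arithmetic mean; this sketch uses the median. Group labels are assumed to be precomputed. The function name is illustrative.

```python
from statistics import median
from typing import Dict, Hashable, List


def gc_correct(counts: Dict[Hashable, float], gc_group: Dict[Hashable, Hashable]) -> Dict[Hashable, float]:
    """GC-correct per-window tag counts.

    counts:   window -> raw tag count
    gc_group: window -> GC group label
    Each group's coefficient is median(all windows) / median(windows in group);
    a group whose median is zero gets coefficient 0. Returns window -> raw
    count multiplied by its group's coefficient.
    """
    genome_median = median(list(counts.values()))
    members: Dict[Hashable, List[float]] = {}
    for win, n in counts.items():
        members.setdefault(gc_group[win], []).append(n)
    coef = {}
    for g, vals in members.items():
        m = median(vals)
        coef[g] = genome_median / m if m > 0 else 0.0
    return {win: n * coef[gc_group[win]] for win, n in counts.items()}
```

For example, if high-GC windows average 100 tags, low-GC windows average 50, and the genome-wide median is 75, both groups are rescaled to 75.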
- Data correction: the sequencing data of 90 individuals from the YH population were used as the control set. The relative tag proportion $p_{i,j}$ of a window was defined as the number of sequence tags in the window divided by the genome-wide total number of sequence tags of the sample.
- The copy rate of each window equals its corrected tag count divided by its expected value: $r_{i,j} = c_{i,j} / (\bar{p}_i \times N_j)$. Here $N_j$ is the genome-wide total tag count of the test sample, and $\bar{p}_i$ is the mean relative proportion of window i over the control samples.
- Thresholds and filtering: the segmentation results are filtered. A segment whose mean copy rate r is below 0.7 or above 1.3 is reported as a positive result.
- Table 1 lists the CNV results for the 8 samples, where chr denotes the chromosome and T7 denotes trisomy of chromosome 3. XYY has no sex chromosome trisomy abnormalities. Figs. 7A-7H are schematic views of the detection results for the eight samples.
- The detected variants range from microdeletions as small as 0.4 Mb to whole-chromosome aneuploidies. The method detects and locates all of them accurately, demonstrating excellent detection efficiency and precision.
- The invention performs genome-wide copy number variation analysis on the applicable population. This supports genetic counseling and provides a basis for clinical decisions, enabling accurate pathological diagnosis of patients with microdeletion syndromes.
- the population to which the present invention is applicable may be all microdeletion patients or potential microdeletion carriers, and the applicable population is merely illustrative of the invention and should not be construed as limiting the scope of the invention.
Priority Applications (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12873786.3A EP2835752B8 (en) | 2012-04-05 | 2012-04-05 | Method and system for detecting copy number variation |
US14/389,898 US20150056619A1 (en) | 2012-04-05 | 2012-04-05 | Method and system for determining copy number variation |
SG11201406250SA SG11201406250SA (en) | 2012-04-05 | 2012-04-05 | Method and system for detecting copy number variation |
PCT/CN2012/073545 WO2013149385A1 (zh) | 2012-04-05 | 2012-04-05 | 一种拷贝数变异检测方法和系统 |
RU2014144349A RU2014144349A (ru) | 2012-04-05 | 2012-04-05 | Способ и система детекции вариации числа копий |
KR1020147031062A KR101795124B1 (ko) | 2012-04-05 | 2012-04-05 | 복제 수 변이를 검측하기 위한 방법 및 시스템 |
JP2015503724A JP5972448B2 (ja) | 2012-04-05 | 2012-04-05 | コピー数変異を検出する方法及びシステム |
CN201280066929.3A CN104221022B (zh) | 2012-04-05 | 2012-04-05 | 一种拷贝数变异检测方法和系统 |
AU2012376134A AU2012376134B2 (en) | 2012-04-05 | 2012-04-05 | Method and system for detecting copy number variation |
IL234875A IL234875B (en) | 2012-04-05 | 2014-09-29 | A method for detecting a copy number change |
US15/881,902 US11371074B2 (en) | 2012-04-05 | 2018-01-29 | Method and system for determining copy number variation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/073545 WO2013149385A1 (zh) | 2012-04-05 | 2012-04-05 | 一种拷贝数变异检测方法和系统 |
Related Child Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/389,898 A-371-Of-International US20150056619A1 (en) | 2012-04-05 | 2012-04-05 | Method and system for determining copy number variation |
US15/881,902 Continuation US11371074B2 (en) | 2012-04-05 | 2018-01-29 | Method and system for determining copy number variation |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013149385A1 true WO2013149385A1 (zh) | 2013-10-10 |
Family
ID=49299922
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/073545 WO2013149385A1 (zh) | 2012-04-05 | 2012-04-05 | 一种拷贝数变异检测方法和系统 |
Country Status (10)
Country | Link |
---|---|
US (2) | US20150056619A1 (zh) |
EP (1) | EP2835752B8 (zh) |
JP (1) | JP5972448B2 (zh) |
KR (1) | KR101795124B1 (zh) |
CN (1) | CN104221022B (zh) |
AU (1) | AU2012376134B2 (zh) |
IL (1) | IL234875B (zh) |
RU (1) | RU2014144349A (zh) |
SG (1) | SG11201406250SA (zh) |
WO (1) | WO2013149385A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104560697A (zh) * | 2015-01-26 | 2015-04-29 | 上海美吉生物医药科技有限公司 | 一种基因组拷贝数不稳定性的检测装置 |
CN104694384A (zh) * | 2015-03-20 | 2015-06-10 | 上海美吉生物医药科技有限公司 | 线粒体dna拷贝数变异性的检测装置 |
CN105243299A (zh) * | 2015-09-30 | 2016-01-13 | 深圳华大基因科技服务有限公司 | 一种检测cnv的精确断点及断点周围特征的方法及装置 |
WO2016045106A1 (zh) * | 2014-09-26 | 2016-03-31 | 深圳华大基因股份有限公司 | 单细胞染色体的cnv分析方法和检测装置 |
CN107208153A (zh) * | 2015-01-13 | 2017-09-26 | 香港中文大学 | 血浆线粒体dna分析的应用 |
TWI607332B (zh) * | 2016-12-21 | 2017-12-01 | 國立臺灣師範大學 | Correlation between persistent organic pollutants and microRNAs station |
WO2019157791A1 (zh) * | 2018-02-14 | 2019-08-22 | 南京世和基因生物技术有限公司 | 一种拷贝数变异的检测方法、装置以及计算机可读介质 |
CN113362891A (zh) * | 2014-09-12 | 2021-09-07 | 伊鲁米纳剑桥有限公司 | 用短读测序数据检测重复扩增 |
CN115132271A (zh) * | 2022-09-01 | 2022-09-30 | 北京中仪康卫医疗器械有限公司 | 一种基于批次内校正的cnv检测方法 |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105224543A (zh) * | 2014-05-30 | 2016-01-06 | 国际商业机器公司 | 用于处理时间序列的方法和装置 |
CN104745718B (zh) * | 2015-04-23 | 2018-02-16 | 北京中仪康卫医疗器械有限公司 | 一种检测人类胚胎染色体微缺失和微重复的方法 |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
KR101848438B1 (ko) | 2015-10-29 | 2018-04-13 | 바이오코아 주식회사 | 디지털 pcr을 이용한 산전진단 방법 |
AU2016355983B2 (en) * | 2015-11-18 | 2021-12-23 | Sophia Genetics S.A. | Methods for detecting copy-number variations in next-generation sequencing |
NZ745249A (en) | 2016-02-12 | 2021-07-30 | Regeneron Pharma | Methods and systems for detection of abnormal karyotypes |
CN105760712B (zh) * | 2016-03-01 | 2019-03-26 | 西安电子科技大学 | 一种基于新一代测序的拷贝数变异检测方法 |
EP3967324A1 (en) | 2016-07-20 | 2022-03-16 | BioNTech SE | Selecting neoepitopes as disease-specific targets for therapy with enhanced efficacy |
CN106520940A (zh) * | 2016-11-04 | 2017-03-22 | 深圳华大基因研究院 | 一种染色体非整倍体和拷贝数变异检测方法及其应用 |
US11993811B2 (en) * | 2017-01-31 | 2024-05-28 | Myriad Women's Health, Inc. | Systems and methods for identifying and quantifying gene copy number variations |
WO2018161245A1 (zh) * | 2017-03-07 | 2018-09-13 | 深圳华大基因研究院 | 一种染色体变异的检测方法及装置 |
CN109097457A (zh) * | 2017-06-20 | 2018-12-28 | 深圳华大智造科技有限公司 | 确定核酸样本中预定位点突变类型的方法 |
CA3085739A1 (en) * | 2017-12-14 | 2019-06-20 | Ancestry.Com Dna, Llc | Detection of deletions and copy number variations in dna sequences |
CN109979529B (zh) * | 2017-12-28 | 2021-01-08 | 北京安诺优达医学检验实验室有限公司 | Cnv检测装置 |
CN109979535B (zh) * | 2017-12-28 | 2021-03-02 | 浙江安诺优达生物科技有限公司 | 一种胚胎植入前遗传学筛查装置 |
CN108256289B (zh) * | 2018-01-17 | 2020-10-16 | 湖南大地同年生物科技有限公司 | 一种基于目标区域捕获测序基因组拷贝数变异的方法 |
KR102036609B1 (ko) * | 2018-02-12 | 2019-10-28 | 바이오코아 주식회사 | 디지털 pcr을 이용한 산전진단 방법 |
CN108415886B (zh) * | 2018-03-07 | 2019-04-05 | 清华大学 | 一种基于生产工序的数据标签纠错方法及装置 |
CN108664766B (zh) * | 2018-05-18 | 2020-01-31 | 广州金域医学检验中心有限公司 | 拷贝数变异的分析方法、分析装置、设备及存储介质 |
WO2021114139A1 (zh) * | 2019-12-11 | 2021-06-17 | 深圳华大基因股份有限公司 | 一种基于血液循环肿瘤dna的拷贝数变异检测方法和装置 |
CN111261225B (zh) * | 2020-02-06 | 2022-08-16 | 西安交通大学 | 一种基于二代测序数据的反转相关复杂变异检测方法 |
CN113496761B (zh) * | 2020-04-03 | 2023-09-19 | 深圳华大生命科学研究院 | 确定核酸样本中cnv的方法、装置及应用 |
DE102020116178A1 (de) * | 2020-06-18 | 2021-12-23 | Analytik Jena Gmbh | Verfahren zum Erkennen einer Amplifikationsphase in einer Amplifikation |
CN111968701B (zh) * | 2020-08-27 | 2022-10-04 | 北京吉因加科技有限公司 | 检测指定基因组区域体细胞拷贝数变异的方法和装置 |
CN114220481B (zh) * | 2021-11-25 | 2023-09-08 | 深圳思勤医疗科技有限公司 | 基于全基因组测序完成待测样本的核型分析的方法、系统和计算机可读介质 |
CN114999573B (zh) * | 2022-04-14 | 2023-07-07 | 哈尔滨因极科技有限公司 | 一种基因组变异检测方法及检测系统 |
CN114758720B (zh) * | 2022-06-14 | 2022-09-02 | 北京贝瑞和康生物技术有限公司 | 用于检测拷贝数变异的方法、设备和介质 |
CN114864000B (zh) * | 2022-07-05 | 2022-09-09 | 北京大学第三医院(北京大学第三临床医学院) | 一种动态鉴定人类单细胞染色体拷贝数的方法 |
CN116386718B (zh) * | 2023-05-30 | 2023-08-01 | 北京华宇亿康生物工程技术有限公司 | 检测拷贝数变异的方法、设备和介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004044225A2 (en) * | 2002-11-11 | 2004-05-27 | Affymetrix, Inc. | Methods for identifying dna copy number changes |
CN101449161A (zh) * | 2006-05-03 | 2009-06-03 | 人口诊断公司 | 评估遗传病缺陷的方法 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7979215B2 (en) * | 2007-07-30 | 2011-07-12 | Agilent Technologies, Inc. | Methods and systems for evaluating CGH candidate probe nucleic acid sequences |
US20120178635A1 (en) * | 2009-08-06 | 2012-07-12 | University Of Virginia Patent Foundation | Compositions and methods for identifying and detecting sites of translocation and dna fusion junctions |
JP2011078409A (ja) * | 2009-09-10 | 2011-04-21 | Fujifilm Corp | アレイ比較ゲノムハイブリダイゼーション法による核酸変異解析法 |
2012
- 2012-04-05 JP JP2015503724A patent/JP5972448B2/ja active Active
- 2012-04-05 KR KR1020147031062A patent/KR101795124B1/ko active IP Right Grant
- 2012-04-05 SG SG11201406250SA patent/SG11201406250SA/en unknown
- 2012-04-05 CN CN201280066929.3A patent/CN104221022B/zh active Active
- 2012-04-05 WO PCT/CN2012/073545 patent/WO2013149385A1/zh active Application Filing
- 2012-04-05 AU AU2012376134A patent/AU2012376134B2/en active Active
- 2012-04-05 EP EP12873786.3A patent/EP2835752B8/en active Active
- 2012-04-05 US US14/389,898 patent/US20150056619A1/en not_active Abandoned
- 2012-04-05 RU RU2014144349A patent/RU2014144349A/ru not_active Application Discontinuation

2014
- 2014-09-29 IL IL234875A patent/IL234875B/en active IP Right Grant

2018
- 2018-01-29 US US15/881,902 patent/US11371074B2/en active Active
Non-Patent Citations (5)
Title |
---|
METZKER ML: "Sequencing technologies-the next generation", NAT REV GENET., vol. 11, no. 1, January 2010 (2010-01-01), pages 31 - 46 |
MICAH HAMADY; JEFFREY J WALKER; J KIRK HARRIS ET AL.: "Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex", NATURE METHODS, vol. 5, no. 3, March 2008 (2008-03-01) |
RUSK, NICOLE: "Cheap Third-Generation Sequencing", NATURE METHODS, vol. 6, no. 4, 1 April 2009 (2009-04-01), pages 2446 |
See also references of EP2835752A4 |
WALD, A.; WOLFOWITZ, J.: "On a test whether two samples are from the same population", THE ANNALS OF MATHEMATICAL STATISTICS, vol. 11, 1940, pages 147 - 162 |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113362891A (zh) * | 2014-09-12 | 2021-09-07 | 伊鲁米纳剑桥有限公司 | Detecting repeat expansions with short read sequencing data |
CN106795551B (zh) * | 2014-09-26 | 2020-11-20 | 深圳华大基因股份有限公司 | CNV analysis method and detection device for single-cell chromosomes |
WO2016045106A1 (zh) * | 2014-09-26 | 2016-03-31 | 深圳华大基因股份有限公司 | CNV analysis method and detection device for single-cell chromosomes |
CN106795551A (zh) * | 2014-09-26 | 2017-05-31 | 深圳华大基因股份有限公司 | CNV analysis method and detection device for single-cell chromosomes |
CN107208153A (zh) * | 2015-01-13 | 2017-09-26 | 香港中文大学 | Applications of plasma mitochondrial DNA analysis |
US11242559B2 (en) | 2015-01-13 | 2022-02-08 | The Chinese University Of Hong Kong | Method of nuclear DNA and mitochondrial DNA analysis |
CN104560697A (zh) * | 2015-01-26 | 2015-04-29 | 上海美吉生物医药科技有限公司 | Device for detecting genomic copy number instability |
CN104694384B (zh) * | 2015-03-20 | 2017-02-08 | 上海美吉生物医药科技有限公司 | Device for detecting mitochondrial DNA copy number variability |
CN104694384A (zh) * | 2015-03-20 | 2015-06-10 | 上海美吉生物医药科技有限公司 | Device for detecting mitochondrial DNA copy number variability |
CN105243299A (zh) * | 2015-09-30 | 2016-01-13 | 深圳华大基因科技服务有限公司 | Method and device for detecting precise CNV breakpoints and the features surrounding them |
TWI607332B (zh) * | 2016-12-21 | 2017-12-01 | 國立臺灣師範大學 | Correlation between persistent organic pollutants and microRNAs station |
WO2019157791A1 (zh) * | 2018-02-14 | 2019-08-22 | 南京世和基因生物技术有限公司 | Method and device for detecting copy number variation, and computer-readable medium |
CN115132271A (zh) * | 2022-09-01 | 2022-09-30 | 北京中仪康卫医疗器械有限公司 | CNV detection method based on intra-batch correction |
CN115132271B (zh) * | 2022-09-01 | 2023-07-04 | 北京中仪康卫医疗器械有限公司 | CNV detection method based on intra-batch correction |
Also Published As
Publication number | Publication date |
---|---|
US11371074B2 (en) | 2022-06-28 |
AU2012376134A1 (en) | 2014-11-06 |
RU2014144349A (ru) | 2016-05-27 |
EP2835752A4 (en) | 2015-11-18 |
CN104221022A (zh) | 2014-12-17 |
AU2012376134B2 (en) | 2016-03-03 |
US20180148765A1 (en) | 2018-05-31 |
JP2015512264A (ja) | 2015-04-27 |
CN104221022B (zh) | 2017-11-21 |
US20150056619A1 (en) | 2015-02-26 |
KR101795124B1 (ko) | 2017-12-01 |
EP2835752A1 (en) | 2015-02-11 |
SG11201406250SA (en) | 2014-11-27 |
JP5972448B2 (ja) | 2016-08-17 |
EP2835752B8 (en) | 2018-12-26 |
KR20140140122A (ko) | 2014-12-08 |
IL234875B (en) | 2019-03-31 |
EP2835752B1 (en) | 2018-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013149385A1 (zh) | | Method and system for detecting copy number variation |
AU2019250200B2 (en) | | Error Suppression In Sequenced DNA Fragments Using Redundant Reads With Unique Molecular Indices (UMIs) |
EP3191993B1 (en) | | Detecting repeat expansions with short read sequencing data |
CN106715711B (zh) | | Method for determining probe sequences and method for detecting genomic structural variation |
JP2020108377A (ja) | | Method for non-invasively calculating the risk of fetal sex chromosome aneuploidy |
WO2013097062A1 (zh) | | Method for detecting genetic variation |
US20200286586A1 (en) | | Sequence-graph based tool for determining variation in short tandem repeat regions |
US20220254442A1 (en) | | Methods and systems for visualizing short reads in repetitive regions of the genome |
RU2825664C2 (ru) | | Sequence-graph based tool for determining variation in short tandem repeat regions |
RU2799654C2 (ru) | | Sequence-graph based tool for determining variation in short tandem repeat regions |
TWI564742B (zh) | | Methods for determining the aneuploidy of fetal chromosomes, systems and computer-readable media |
WO2014153755A1 (zh) | | Method, system and computer-readable medium for determining fetal chromosomal aneuploidy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12873786; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 14389898; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 2015503724; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 20147031062; Country of ref document: KR; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2014144349; Country of ref document: RU; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: 2012873786; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2012376134; Country of ref document: AU; Date of ref document: 20120405; Kind code of ref document: A |