WO2014040206A1 - Method for detecting copy number variation using genome sequencing reads - Google Patents


Info

Publication number
WO2014040206A1
Authority
WO
WIPO (PCT)
Prior art keywords
window
depth
sequencing
coverage
missing
Prior art date
Application number
PCT/CN2012/001261
Other languages
English (en)
French (fr)
Inventor
张帆
罗锐邦
李娜
李英睿
王俊
汪建
杨焕明
Original Assignee
深圳华大基因研究院
深圳华大基因科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因研究院, 深圳华大基因科技有限公司
Priority to CN201280075581.4A (CN104603284B)
Priority to PCT/CN2012/001261 (WO2014040206A1)
Publication of WO2014040206A1
Priority to HK15109609.7A (HK1208891A1)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • CNV: copy number variation.
  • CNV refers to genomic DNA segments longer than 1 kb whose copy number differs from that of the reference genome. The difference can be an increase in copy number (insertions and duplications) or a decrease in copy number (deletions and null genotypes).
  • CNVs are widespread in the human genome and together account for more than 10% of its sequence.
  • CNV detection currently relies mainly on comparative genomic hybridization (CGH), in which genomic DNA from a test sample and from a reference sample is hybridized simultaneously to DNA probes on a microarray, directly revealing where the test sample's genomic DNA varies and how its copy number has changed.
  • CGH is costly, has low resolution, and has low sensitivity for CNVs of 10-25 kb.
  • Other CNV detection techniques include quantitative fluorescent PCR, which measures only one CNV per reaction and therefore needs many repeated reactions; fluorescence in situ hybridization, whose probes are unstable, whose procedure is cumbersome, and whose hybridization is not 100% efficient; direct sequencing, which can detect insertions, rearrangements, and breakpoints but has low efficiency and low coverage; and multiplex ligation-dependent probe amplification, which measures several CNVs in one reaction but covers only small regions and is limited in the CNV sizes it can detect. All of these techniques share the drawback of high cost.
  • Current CNV detection methods based on high-throughput sequencing mainly rely on paired-end read mapping (PEM).
  • PEM: paired-end read mapping.
  • The limitation of PEM is that several types of CNV, including large insertions and other large variants in complex genomic regions, are difficult to detect, and insertions longer than the average library insert size are poorly detected.
  • SUMMARY OF THE INVENTION: To overcome the limitations of existing CNV detection described above (low sensitivity, limits on detectable length, cumbersome operation, and high cost), the present invention provides a method for detecting copy number variation. The method estimates local copy number by analyzing the genomic sequence together with the coverage depth of the sequencing reads at the corresponding positions.
  • The method for detecting copy number variation based on read coverage depth comprises a series of steps, including:
  • Windows are classified into four types, including: a. Normal window: a window whose coverage depth equals the average coverage depth of the sequence.
  • c. Deletion window: a window whose coverage depth is significantly lower than the average coverage depth.
  • d. N window: a window with essentially no coverage depth.
  • SD is the standard deviation of the sequence's average coverage depth, which is obtained by averaging the coverage depths of all sites.
  • Preferably, a step 1') is included: assessing whether the sequencing results are acceptable, re-sequencing if they are not, and removing adapter sequences if adapters were introduced during sequencing.
  • The method of the present invention can sensitively detect CNVs located in complex regions rich in structural variation; it imposes no limit on detectable length, is simple to operate, and is consequently inexpensive. Other techniques in the field do not achieve these advantages.
  • BEST MODE FOR CARRYING OUT THE INVENTION: The present invention is described more fully below, with exemplary embodiments described in detail.
  • The method for detecting copy number variation of the present invention may include the following steps:
  • Step 1: The target sample is sequenced, and the sequencing result is preferably assessed for acceptability; if it is unacceptable, the sample is re-sequenced. If adapters were used during sequencing, the introduced adapter sequences are also removed.
  • Many sequencing methods exist; methods that can provide sequencing data for the present invention include, for example, 454 sequencing and Illumina sequencing.
  • Read length is typically 90 bp or 100 bp.
  • For example, Illumina sequencing generally produces 90 bp paired-end reads.
  • The reads used in the method of the present invention may be 100 bp long, and are preferably 90 bp long.
  • The sequencing depth can be 10×, preferably 20×, and most preferably 30× or more.
  • For example, a sequencing depth of 35× can be used.
  • Evaluation of the sequencing results mainly covers two aspects. The first is whether the contents of complementary bases are balanced, for example whether the proportions of G and C bases are close; a G/C ratio range of three times above and below the mean is commonly used in the art, and a difference outside this range indicates that the sequencing result is unacceptable. The second is base quality and the content of N (bases left undetermined in the sequencing result); low-quality bases indicate that the sequencing result is unacceptable.
  • Step 2: The reads obtained above are aligned to the reference genome sequence, duplicates and redundancy are preferably removed from the alignment results, and the sequence information and coverage depth of each site are tallied, the coverage depth being the number of reads in the alignment results that cover the site.
  • "Duplicate", as applied to alignment results, means that a read that should have been sequenced only once was sequenced several times because of PCR, so that several reads show exactly the same sequence content.
  • The reference sequence is usually one that has already been determined, for example from a public or a commercial database.
  • For human samples, the reference sequence can be the human genome hg18 or hg19.
  • More databases currently relate to hg19, and hg19 contains more determined bases than hg18, so the sample alignment rate is relatively higher; hg19 is therefore preferred.
  • The sequence information of a site is the set of reads in the alignment results that contain the site, and the coverage depth of a site is the number of such reads.
  • Sequence alignment can be performed with any alignment program, such as the Short Oligonucleotide Analysis Package (SOAP) or the Burrows-Wheeler Aligner (BWA).
  • SOAP: Short Oligonucleotide Analysis Package.
  • BWA: Burrows-Wheeler Aligner.
  • Reads are aligned to the reference genome sequence to obtain their positions on the reference genome.
  • Alignment can use the default parameters provided by the program, or those skilled in the art can select parameters as needed.
  • The alignment results can also be filtered, for example by removing reads that align to multiple positions, because they cannot provide a unique alignment position, and by removing repeated reads, because they may have been introduced by errors in earlier experimental steps.
  • After alignment, the coverage depth of each site can be calculated by any method known in the art, based on the number of aligned reads covering the site. For example, the coverage calculation program of the Short Oligonucleotide Analysis Package (SOAP coverage) can compute the coverage depth of every site of the reference genome.
  • Step 3: The coverage depths of all sites are averaged to obtain the average coverage depth of the sequence, and the coverage depth of every window of preset length on the reference sequence is calculated in the same way.
  • Windows are classified into four types, including:
  • Normal window: a window whose coverage depth equals the average depth of the sequence.
  • N window: a window with essentially no coverage depth.
  • The window may be 70-100 bp, 100 bp, 100-200 bp, or 50-300 bp, preferably 50-150 bp, and most preferably about 100 bp.
  • A large window (for example 1000 bp) cannot give the precise position of a CNV breakpoint, and large windows cannot accurately detect short CNVs.
  • At the commonly used sequencing depth of 30×, the number of reads per 100 bp window follows a distribution very close to normal, so the calculations can assume normally distributed data; read counts in smaller windows are not normally distributed.
  • The coverage depth of a window is the sum of the read counts covering each of its sites, divided by the number of sites in the window.
  • A window whose coverage depth is "the same as" the average coverage depth of the sequence is one in which the two are substantially equal.
  • For example, in some embodiments the two coverage depths do not differ significantly in the statistical sense.
  • Alternatively, this equality may be defined in other ways.
  • For example, in some embodiments the two coverage depths differ by no more than 1-fold (100%), 75%, 50%, or 20%, such as no more than 10% or 5%.
  • "Significantly greater than the average coverage depth" means that the window's coverage depth is at least 1.2, 1.5, 2, 4, or 8 times the average coverage depth.
  • "Significantly lower than the average coverage depth" means that the average coverage depth is at least 1.2, 1.5, 2, 4, or 8 times the window's coverage depth.
  • The N window is a window with essentially no coverage depth, and is preferably a window with no coverage depth at all.
  • "Essentially no coverage depth" means a coverage depth of less than 50%, 20%, 10%, 5%, or 2% of the average depth.
  • In particular, with paired-end sequencing the relative positions of assembled contigs can be determined even though the sequence between them is unknown; two contigs with known relative positions can be joined, with the unknown intervening bases written as N, and the windows over such a region are N windows.
  • In some embodiments, a normal window can be defined as any window that is neither a variant window (duplication or deletion window) nor an N window.
  • In one embodiment, this step is performed as follows: to assess coverage depth, a window of preset length, such as 100 bp, is slid along the reference sequence, and the number of reads aligned to each window is counted. The coverage depth of a window is the sum of the read counts covering each of its sites divided by the number of sites in the window, for example 100.
  • For Illumina reads, sequencing coverage depth is affected by GC content, so the read count of each window is preferably adjusted according to the coverage bias observed at the window's GC proportion.
  • For Illumina reads, subsequent analysis can then be based on these GC-adjusted data.
  • Step 4: Three or more consecutive windows that satisfy any of the merging conditions are merged, and the merged region is tested for whether it is a deletion CNV or a duplication CNV.
  • SD is the standard deviation of the sequence's average coverage depth, which is obtained by averaging the coverage depths of all sites.
  • Whether a merged region is a deletion CNV or a duplication CNV can be tested with any method available in the art, for example the event-wise testing algorithm (Seungtai Yoon, Zhenyu Xuan, Vladimir Makarov, Kenny Ye, Jonathan Sebat. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009 Sep; 19(9): 1586-1592).
  • Event-wise testing is a CNV detection algorithm for read data based mainly on significance testing. In this algorithm, increases and decreases in copy number are reflected by increases and decreases in window coverage depth, so the algorithm can find windows of preset length (for example 100 bp) in which the number of reads is significantly increased or decreased.
  • In one specific embodiment, whether a region belongs to a deletion or a duplication CNV is determined as follows.
  • If a region shows no evidence of CNV, its windows are defined as normal windows.
  • Deletion CNVs and duplication CNVs can be detected separately.
  • The same formula is used to test for both duplications and deletions, but the duplication test uses piUpper and the deletion test uses piLower.
  • For each test, first find two windows showing the CNV, then add one window at a time and re-test whether the extended region still shows the CNV. Note that the threshold $(\mathrm{FPR}/(L/l))^{1/l}$ increases as l increases.
  • When $(\mathrm{FPR}/(L/l))^{1/l}$ exceeds 0.5, the repetition of the above steps stops at N-1.
  • Step 5: False positives are reduced among the merged regions judged to be deletion or duplication CNVs; the resulting regions are the CNV regions in which insertions or deletions have occurred.
  • False positives can be reduced with filter conditions, for example by filtering out variant regions whose median read coverage depth lies between 0.75 and 1.25 times the average coverage depth.
  • Optionally, the significance of each variant region is also checked with a Z-value test, using a significance level of $10^{-6}$ as the threshold for filtering merged regions. A significance level of $10^{-6}$ closely matches the significance level obtained when testing two variant regions with FPR = 0.05; moreover, based on manual expectations for many CNVs across all significance levels, the $10^{-6}$ threshold is considered reasonable.
  • In this example, the method of the present invention was used to detect copy number variation in the sequencing data of sample NA19238 from the Copy Number Variation Project.
  • The Copy Number Variation Project was initiated by the Sanger Institute to study the effects of copy number variation on human health.
  • The sample analyzed in this example, NA19238 (Yoruba, Nigeria), is one of the project's samples; its known CNVs were obtained by genome-wide array comparative genomic hybridization. (Data are available from ftp://ftp.sanger.ac.uk/pub/cnv_project/)
  • Step 1: Already-aligned high-depth sequencing data for NA19238 were downloaded.
  • The alignment results are in SAM format, and the hg19 reference genome used for alignment is available at http://hgdownload.cse.ucsc.edu/downloads.html#human.
  • Step 2 is performed with the SOAP program, following the program's instructions (Short Oligonucleotide Analysis Package, http://soap.genomics.org.cn).
  • Step 3: The copy number variation of the sample is detected with the method of the present invention.
  • The coverage depth file obtained in step 2 and the hg19 reference genome sequence file are required as input files. Running the program produces an output file recording the copy number of each window and whether the window carries a duplication or deletion copy number variation.
  • The program stores the reference genome sequence file and the per-site coverage depth file as a hash table of sequences and a hash table of coverage depths, respectively.
  • It then computes the coverage depth and significance of each window.
  • Variation in local window coverage depth is detected by finding windows whose average coverage depth differs markedly from the genome-wide average coverage depth.
  • Specifically, to assess coverage depth, a 100 bp window is slid along the reference sequence and the number of reads aligned to each window is counted. The coverage depth of a window is the sum of the read counts covering each of its sites divided by the number of sites in the window.
  • Step 4: Consecutive windows carrying the same variation, or windows satisfying the merging conditions below, are merged, and the resulting segment information is summarized and printed. For CNV, four window states are defined: deletion window, duplication window, normal window, and N window.
  • Merging is assessed over every three adjacent windows. The merging conditions are: consecutive duplication windows or consecutive deletion windows; deletion windows separated by an N window (deletion window + N window + deletion window), with no more than one consecutive N window; duplication windows separated by an N window (duplication window + N window + duplication window), with no more than one consecutive N window; deletion windows separated by a normal window (deletion window + normal window + deletion window), where the normal window's coverage depth minus 3 SD falls within the coverage depth range of the deletion windows, with no more than one consecutive normal window; and duplication windows separated by a normal window (duplication window + normal window + duplication window), where the normal window's coverage depth plus 3 SD falls within the coverage depth range of the duplication windows, with no more than one consecutive normal window.
  • If a region shows no evidence of CNV, its windows are classed as normal windows.
  • Deletion CNVs and duplication CNVs can be detected separately.
  • The same formula is used to test for both duplications and deletions, but the duplication test uses piUpper and the deletion test uses piLower. For each test, first find two windows showing the CNV, then add one window at a time and re-test whether the extended region still shows the CNV. Note that the threshold $(\mathrm{FPR}/(L/l))^{1/l}$ increases as l increases. When $(\mathrm{FPR}/(L/l))^{1/l}$ exceeds 0.5, the repetition of the above steps stops at N-1.
  • Step 5: Filter the copy number variation results obtained in step 4 and compile statistics.
  • Filtering criteria: (1) a copy number variation event must span more than 10 windows, i.e. the CNV region must be longer than 1 kb; (2) the ratio of the event's median coverage to the genome-wide median coverage must not lie between 0.75 and 1.25. After filtering, the number and total length of copy number events were tallied by event type (duplication and deletion).
  • Step 6: The copy number variations detected by the present invention were compared with the array-based CNV scan results for the same sample.
  • Using the genomic start and end positions of each variation event, the CNVs on which the present invention and the array agree were identified, and their total concordant length and concordance ratio were computed. The results show that the copy number variations detected by the present invention are genuine.

Abstract

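The present invention provides a method for detecting copy number variation. The method sequences a target sample and evaluates copy number variation by analyzing the coverage depth of the sequencing reads.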

Description

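Method for detecting copy number variation using genome sequencing reads
Technical Field
The present invention relates to the field of genome sequence analysis, and more specifically to a method for detecting copy number variation using genome sequencing reads.
Background
Copy number variations (CNVs) are DNA segments longer than 1 kb in the genome whose copy number differs from that of the reference genome. The difference can be an increase in copy number (insertions and duplications) or a decrease in copy number (deletions and null genotypes). CNVs are widespread in the human genome and account for more than 10% of its sequence.
At present, CNV detection mainly uses comparative genomic hybridization, in which genomic DNA from a test sample and from a reference sample is hybridized simultaneously to DNA probes on a microarray chip, directly yielding the positions of genomic DNA variation in the test sample and the changes in copy number. This technique is costly, has low resolution, and has low sensitivity for CNVs of 10-25 kb. Other techniques used for CNV detection include the following: quantitative fluorescent PCR, whose drawback is that one reaction measures only one CNV, so many repetitions are needed; fluorescence in situ hybridization, whose drawbacks are unstable probes, cumbersome operation, and hybridization that is not 100% efficient; direct sequencing, which can detect insertions, rearrangements, and breakpoints but has low efficiency and low coverage; and multiplex ligation-dependent probe amplification, which can measure several CNVs simultaneously in one reaction but covers only small regions and is limited in the size of CNV it can detect. A further drawback common to all of these techniques is their relatively high cost.
Current CNV detection methods based on high-throughput sequencing results rely mainly on paired-end read mapping (PEM). The limitation of PEM, however, is that several types of CNV, including large insertions and other large variants in complex genomic regions, are difficult to detect, and insertions longer than the average library insert size are poorly detected.
Summary of the Invention
To overcome the limitations of CNV detection described above (low sensitivity, limits on detectable length, cumbersome operation, and high cost), the present invention provides a method for detecting copy number variation. The method evaluates local copy number by analyzing the genomic sequence and the coverage depth of the sequencing reads at the corresponding positions.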
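This method for detecting copy number variation based on read coverage depth comprises the steps of:
1) sequencing a target sample to obtain sequencing reads;
2) aligning the reads obtained above to a reference genome sequence, preferably removing duplicates and redundancy from the alignment results, and obtaining the coverage depth of each site of the reference sequence, i.e. the number of reads in the alignment results that cover the site;
3) based on the coverage depth of each site, averaging the coverage depths of all sites to obtain the average coverage depth of the sequence, and calculating in the same way the coverage depth of every window of preset length on the reference sequence, each window being assigned to one of the following four types:
a. normal window: a window whose coverage depth is the same as the average coverage depth of the sequence;
b. duplication window: a window whose coverage depth is significantly greater than the average coverage depth;
c. deletion window: a window whose coverage depth is significantly lower than the average coverage depth;
d. N window: a window with essentially no coverage depth;
4) merging three or more consecutive windows that satisfy any of the following conditions, and determining whether the merged region is a deletion CNV or a duplication CNV:
i. consecutive duplication windows or consecutive deletion windows;
ii. deletion windows separated by an N window, e.g. deletion window + N window + deletion window, where no more than one consecutive N window may occur;
iii. duplication windows separated by an N window, e.g. duplication window + N window + duplication window, where no more than one consecutive N window may occur;
iv. deletion windows separated by a normal window, e.g. deletion window + normal window + deletion window, where the coverage depth of the normal window minus 3 SD falls within the coverage depth range of the deletion windows, and no more than one consecutive normal window may occur;
v. duplication windows separated by a normal window, e.g. duplication window + normal window + duplication window, where the coverage depth of the normal window plus 3 SD falls within the coverage depth range of the duplication windows, and no more than one consecutive normal window may occur;
where SD is the standard deviation of the average coverage depth of the sequence obtained by averaging the coverage depths of all sites;
5) reducing false positives among the merged regions judged above to be deletion or duplication CNVs; the resulting regions are the CNV regions in which insertions or deletions have occurred.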
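Between steps 1) and 2), a step 1') is preferably included: assessing whether the sequencing results are acceptable, re-sequencing if they are not, and removing adapter sequences if adapters were introduced during sequencing.
The method of the present invention can sensitively detect CNVs located in complex regions rich in structural variation, and it has no limit on detectable length, is simple to operate, and is consequently low in cost; other techniques in the field do not achieve these advantages.
Detailed Description
The present invention is described more fully below, with exemplary embodiments described in detail.
The object of the present invention is to provide a method for detecting copy number variation that uses sequencing data and bioinformatic methods to detect the regions where copy number differs between a target sample and a reference genome.
The method for detecting copy number variation of the present invention may include the following steps:
Step 1: The target sample is sequenced, and the sequencing result is preferably assessed for acceptability; if it is unacceptable, the sample is re-sequenced. If adapters were used during sequencing, the introduced adapters are also removed.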
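There are many sequencing methods; methods that can provide the sequencing data for the present invention include, for example, 454 sequencing and Illumina sequencing. Read length is generally 90 bp or 100 bp; for example, Illumina sequencing generally yields 90 bp reads using paired-end sequencing. The reads used in the method of the present invention may be 100 bp long, and are preferably 90 bp long.
For the present invention, the sequencing depth can be 10×, preferably 20×, and most preferably 30× or more. For example, a sequencing depth of 35× can be used.
Methods for evaluating reads are known in the art. For example, evaluation of the sequencing results can mainly cover two aspects: whether the contents of complementary bases are balanced, for example whether the proportions of G and C bases are close (a G/C ratio range of three times above and below the mean is commonly used in the art, and a difference outside this range indicates that the sequencing result is unacceptable); and base quality and the content of N (bases left undetermined in the sequencing result), where low-quality bases indicate that the sequencing result is unacceptable.
Step 2: The reads obtained above are aligned to the reference genome sequence, duplicates and redundancy are preferably removed from the alignment results, and the sequence information and coverage depth of each site are tallied, the coverage depth being the number of reads in the alignment results that cover the site.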
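As used herein for alignment results, "duplicate" means that a read that should have been sequenced only once was sequenced several times because of PCR, so that several reads show exactly identical sequence content.
As used herein for alignment results, "redundancy" refers to the artificial sequences that the experimental technique requires to be added to both ends of the real reads during sequencing.
The reference sequence is usually one whose sequence has already been determined, for example from a public database or a commercial database. For human samples, for example, the reference sequence can be the human genome hg18 or hg19. At present more databases relate to hg19, and hg19 contains more determined bases than hg18, so the sample alignment rate is relatively higher; hg19 is therefore preferred.
The sequence information of a site is the set of reads in the alignment results that contain the site, and the coverage depth of a site is the number of such reads.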
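Sequence alignment can be performed with any alignment program, such as the Short Oligonucleotide Analysis Package (SOAP) or the Burrows-Wheeler Aligner (BWA), to align the reads to the reference genome sequence and obtain their positions on the reference genome. Alignment can use the default parameters provided by the program, or those skilled in the art can select parameters as needed.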
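In addition, the alignment results can be filtered, for example by removing reads whose alignments fall at multiple positions, because these cannot provide a unique alignment position, and by removing repeated reads, because these may have been introduced by errors in earlier experimental steps, for example sequencing errors; removing such reads makes the detection results more accurate.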
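The filtering described above can be sketched as follows. This is a minimal sketch, not the patent's implementation: the `Alignment` record and its `n_hits` field are hypothetical stand-ins for what an aligner such as SOAP or BWA reports, and treating reads with the same chromosome, start, end, and strand as PCR duplicates is a common proxy assumed here rather than a rule stated in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record (not an actual SAM/SOAP parser)."""
    read_id: str
    chrom: str
    start: int    # 0-based leftmost aligned position
    end: int      # exclusive end position
    strand: str   # "+" or "-"
    n_hits: int   # number of equally good alignment positions reported


def filter_alignments(alns: Iterable[Alignment]) -> List[Alignment]:
    """Drop multi-mapped reads, then keep one read per alignment footprint."""
    seen = set()
    kept: List[Alignment] = []
    for aln in alns:
        if aln.n_hits != 1:
            continue  # no unique alignment position
        key = (aln.chrom, aln.start, aln.end, aln.strand)
        if key in seen:
            continue  # same footprint as an earlier read: treated as a PCR duplicate
        seen.add(key)
        kept.append(aln)
    return kept
```

Keeping the first read seen for each footprint is an arbitrary choice; a real pipeline might instead keep the read with the highest base quality.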
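After alignment, the coverage depth of each site can be calculated by any method known in the art, based on the number of aligned reads covering the site. For example, the coverage calculation program of the Short Oligonucleotide Analysis Package (SOAP coverage) can be used to calculate the coverage depth of every site of the reference genome.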
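Per-site coverage depth (the number of aligned reads covering each site) can be computed in one pass with a difference array. The sketch below assumes reads are given as 0-based, half-open `(start, end)` intervals on a single chromosome; it illustrates the quantity that SOAP coverage reports, not that program's code.

```python
from typing import Iterable, List, Tuple


def site_depths(chrom_len: int, reads: Iterable[Tuple[int, int]]) -> List[int]:
    """Number of reads covering each site of one chromosome.

    reads: 0-based, half-open (start, end) alignment intervals.
    """
    diff = [0] * (chrom_len + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, chrom_len)  # clip to the chromosome
        if start < end:
            diff[start] += 1  # coverage rises where the read starts
            diff[end] -= 1    # and falls just past where it ends
    depths: List[int] = []
    running = 0
    for k in range(chrom_len):
        running += diff[k]
        depths.append(running)
    return depths
```

For example, `site_depths(10, [(0, 4), (2, 6), (8, 12)])` gives `[1, 1, 2, 2, 1, 1, 0, 0, 1, 1]`; the third read is clipped at the chromosome end.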
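Step 3: Based on the coverage depth of each site, the coverage depths of all sites are averaged to obtain the average coverage depth of the sequence, and the coverage depth of every window of preset length on the reference sequence is calculated in the same way. Each window is assigned to one of the following four types:
a. normal window: a window whose coverage depth is the same as the average depth of the sequence;
b. duplication window: a window whose coverage depth is significantly greater than the average depth;
c. deletion window: a window whose coverage depth is significantly lower than the average depth;
d. N window: a window with essentially no coverage depth.
In the present invention, the window may be 70-100 bp, 100 bp, 100-200 bp, or 50-300 bp, preferably 50-150 bp, and most preferably about 100 bp. A large window (for example 1000 bp) cannot provide the precise position of a CNV breakpoint, and large windows cannot accurately detect short CNVs. At the commonly used sequencing depth of 30×, the number of reads per 100 bp window follows a distribution very close to normal, so the calculations can assume normally distributed data; the read counts of smaller windows do not follow a normal distribution.
In the present invention, the read coverage depth of a window is calculated as the sum of the read counts covering each site of the window divided by the number of sites in the window.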
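The averages described above can be sketched as below. Two assumptions are made here that the description leaves open: windows tile the sequence without overlap (the description says the window is "slid" but gives no step size), and SD is taken as the population standard deviation of the per-site depths, which is one reading of the definition given with the merging rules.

```python
import statistics
from typing import List, Sequence, Tuple


def depth_summary(depths: Sequence[int]) -> Tuple[float, float]:
    """Average coverage depth over all sites and its (population) standard deviation."""
    return statistics.fmean(depths), statistics.pstdev(depths)


def window_depths(depths: Sequence[int], window: int = 100) -> List[float]:
    """Coverage depth of each window: sum of per-site depths / number of sites.

    Windows tile the sequence without overlap; a shorter final window is
    divided by its own number of sites.
    """
    result: List[float] = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        result.append(sum(chunk) / len(chunk))
    return result
```

With the per-site depths `[1, 1, 2, 2, 1, 1, 0, 0, 1, 1]` and 4-site windows, the window depths are `[1.5, 0.5, 1.0]` and the average depth is 1.0.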
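In the present invention, a window whose coverage depth is the same as the average coverage depth of the sequence is a window in which the two are substantially the same. For example, in some embodiments the two coverage depths do not differ significantly in the statistical sense. Alternatively, this can be defined in other ways; for example, in some embodiments the two coverage depths differ by no more than 1-fold (100%), 75%, 50%, or 20%, such as no more than 10% or 5%.
In the present invention, a coverage depth significantly greater than the average coverage depth means that the former is at least 1.2, 1.5, 2, 4, or 8 times the latter.
In the present invention, a coverage depth significantly lower than the average coverage depth means that the latter is at least 1.2, 1.5, 2, 4, or 8 times the former.
In the present invention, an N window is a window with essentially no coverage depth, and preferably a window with no coverage depth at all. "Essentially no coverage depth" means a coverage depth below 50%, 20%, 10%, 5%, or 2% of the average depth. In particular, for paired-end sequencing, the relative positions of assembled contigs can be determined even though the specific sequence between them is unknown; two contigs with known relative positions can be joined, with the unknown bases in between written as N, and the windows over such a region are N windows.
In some embodiments, a normal window can be defined as any window other than the variant windows (duplication and deletion windows) and the N windows.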
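The four window types can be assigned with a few comparisons against the average depth. In this sketch the cutoffs are parameters: `fold=2.0` picks one of the listed "significantly greater/lower" multiples and `n_frac=0.1` picks one of the listed "essentially no coverage" fractions; both defaults are assumptions chosen for illustration. Checking N windows first and treating everything left over as normal gives each window exactly one type.

```python
from enum import Enum
from typing import List, Sequence


class WindowType(Enum):
    NORMAL = "normal"
    DUP = "duplication"
    DEL = "deletion"
    N = "N"


def classify_windows(win_depths: Sequence[float], mean_depth: float,
                     fold: float = 2.0, n_frac: float = 0.1) -> List[WindowType]:
    """Assign each window exactly one of the four types.

    fold:   a depth at least `fold` times above (or below) the mean is significant.
    n_frac: a depth below n_frac * mean counts as essentially no coverage.
    """
    types: List[WindowType] = []
    for d in win_depths:
        if d < n_frac * mean_depth:
            types.append(WindowType.N)       # checked first: near-zero depth
        elif d >= fold * mean_depth:
            types.append(WindowType.DUP)     # window depth >= fold x mean
        elif d * fold <= mean_depth:
            types.append(WindowType.DEL)     # mean >= fold x window depth
        else:
            types.append(WindowType.NORMAL)  # neither variant nor N window
    return types
```

With an average depth of 30, window depths `[30, 65, 12, 1, 20]` are classified as normal, duplication, deletion, N, and normal.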
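In the present invention, the above criteria are preferably set so that every window is assigned to one, and only one, of the four window types. In one embodiment, this step is as follows: to assess coverage depth, a window of preset length, for example 100 bp, is slid along the reference sequence, and the number of reads aligned to each window is counted. The coverage depth of a window is the sum of the read counts covering each site divided by the number of sites in the window, for example 100.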
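For Illumina data the description corrects each window's read count for GC bias with $\tilde{r}_i = r_i \cdot m / m_{GC}$, where $m_{GC}$ is the median count of all windows with the same G+C proportion as window i and $m$ is the overall median (the formula is given in the paragraph that follows). A minimal sketch of that adjustment: grouping windows by their G+C proportion rounded to two decimals is an assumption made here so that "the same G+C proportion" forms usable strata.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def gc_correct(counts: Sequence[float], gc: Sequence[float],
               ndigits: int = 2) -> List[float]:
    """GC-adjust window read counts: r~_i = r_i * m / m_GC.

    counts:  read count r_i of each window.
    gc:      G+C proportion of each window (0-1), rounded to `ndigits`
             to group windows with 'the same' GC proportion.
    """
    m = statistics.median(counts)                    # overall median
    strata: Dict[float, List[float]] = defaultdict(list)
    for r, g in zip(counts, gc):
        strata[round(g, ndigits)].append(r)
    m_gc = {k: statistics.median(v) for k, v in strata.items()}
    corrected: List[float] = []
    for r, g in zip(counts, gc):
        denom = m_gc[round(g, ndigits)]
        # leave the count unchanged if its GC stratum has a zero median
        corrected.append(r * m / denom if denom > 0 else float(r))
    return corrected
```

For counts `[10, 20, 30, 40]` with GC proportions `[0.4, 0.4, 0.6, 0.6]`, the overall median is 25 and the stratum medians are 15 and 35, so the first window becomes 10 × 25 / 15 ≈ 16.67.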
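In addition, for Illumina reads, sequencing coverage depth is affected by GC content, so the read count of each window is preferably adjusted according to the coverage bias observed at a given GC proportion. For example, the adjustment formula is $\tilde{r}_i = r_i \cdot m / m_{GC}$, where $\tilde{r}_i$ is the corrected read count, $r_i$ is the read count of the i-th window, $m_{GC}$ is the median read count of all windows with the same G+C proportion as the i-th window, and $m$ is the overall median of all windows. For Illumina reads, subsequent analysis can be based on these GC-adjusted data.
Step 4: Three or more consecutive windows satisfying any of the following conditions are merged, and the merged region is tested for whether it is a deletion CNV or a duplication CNV:
i. consecutive duplication windows or consecutive deletion windows;
ii. deletion windows separated by an N window, e.g. deletion window + N window + deletion window, where no more than one consecutive N window may occur;
iii. duplication windows separated by an N window, e.g. duplication window + N window + duplication window, where no more than one consecutive N window may occur;
iv. deletion windows separated by a normal window, e.g. deletion window + normal window + deletion window, where the coverage depth of the normal window minus 3 SD falls within the coverage depth range of the deletion windows, and no more than one consecutive normal window may occur;
v. duplication windows separated by a normal window, e.g. duplication window + normal window + duplication window, where the coverage depth of the normal window plus 3 SD falls within the coverage depth range of the duplication windows, and no more than one consecutive normal window may occur;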
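SD is the standard deviation of the average coverage depth of the sequence obtained by averaging the coverage depths of all sites. Whether a merged region is a deletion CNV or a duplication CNV can be tested with any method available in the art, for example the event-wise testing algorithm (Seungtai Yoon, Zhenyu Xuan, Vladimir Makarov, Kenny Ye, Jonathan Sebat. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009 Sep; 19(9): 1586-1592). Event-wise testing is a newer CNV detection algorithm for read data that is based mainly on significance testing. In this algorithm, increases and decreases in copy number are reflected by increases and decreases in window coverage depth, so the algorithm can find windows of preset length (for example 100 bp) in which the number of reads is significantly increased or decreased.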
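In one specific embodiment, whether a window belongs to a deletion CNV or a duplication CNV is determined as follows.
First, the read count of the i-th window (i a natural number) is converted into a Z value $z_i$: the mean read count over all windows (the total number of reads divided by the number of windows) is subtracted from the number of reads whose start positions fall in the window, and the difference is divided by the standard deviation of the window read counts. The Z value is further converted into an upper-tail probability $p_i^{\mathrm{Upper}} = P(Z > z_i)$ (piUpper) and a lower-tail probability $p_i^{\mathrm{Lower}} = P(Z < z_i)$ (piLower), where $P(Z > z_i)$ is the probability that Z exceeds $z_i$ and $P(Z < z_i)$ is the probability that Z is below $z_i$. A region A of l consecutive windows is considered to carry a duplication if $\max\{p_i^{\mathrm{Upper}} : i \in A\} < (\mathrm{FPR}/(L/l))^{1/l}$, and a deletion if $\max\{p_i^{\mathrm{Lower}} : i \in A\} < (\mathrm{FPR}/(L/l))^{1/l}$. Here FPR is the sum of the false-positive values for duplications or deletions over the reference genome; the type I error is the FPR (false positive rate), i.e. the proportion of detected CNVs that are not real, and the type II error is the FNR (false negative rate), i.e. the proportion of real CNVs that are not detected. L is the total number of windows in the reference genome, and l is the number of windows in region A, with $1 \le l \le L$.
Clearly, if all probes in A come from the normal state, i.e. the probability that A is a CNV is less than $\mathrm{FPR}/(L/l)$, the window is defined as a normal window.
Note that FPR is not divided by the total number of windows minus 1 (L-1) but by l, because the former is too conservative in controlling the type I error; this takes into account that tests on overlapping windows are not independent.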
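The event-wise test above can be sketched with the standard normal tail functions from Python's `math` module. This is a minimal illustration of the stated formulas, not the authors' program: the input is assumed to be one read count per window (reads whose start falls in the window), and the window standard deviation is taken as the population SD.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def tail_probs(counts: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Z-transform window read counts; return (P(Z > z_i), P(Z < z_i)) per window."""
    mu = statistics.fmean(counts)    # total reads / number of windows
    sd = statistics.pstdev(counts)   # SD of window read counts
    upper: List[float] = []
    lower: List[float] = []
    for r in counts:
        z = (r - mu) / sd
        upper.append(0.5 * math.erfc(z / math.sqrt(2)))   # 1 - Phi(z)
        lower.append(0.5 * math.erfc(-z / math.sqrt(2)))  # Phi(z)
    return upper, lower


def ewt_threshold(fpr: float, total_windows: int, width: int) -> float:
    """Cutoff (FPR / (L / l)) ** (1 / l) for a region of l = width windows."""
    return (fpr / (total_windows / width)) ** (1.0 / width)


def region_is_cnv(p: Sequence[float], start: int, width: int, fpr: float) -> bool:
    """True if max p_i over the region is below the cutoff.

    Pass the upper-tail list to test for duplications, the lower-tail list for deletions.
    """
    return max(p[start:start + width]) < ewt_threshold(fpr, len(p), width)
```

For counts `[10, 10, 10, 10, 30]` the last window has z = 2 and an upper-tail probability of about 0.0228. With FPR = 0.05 and L = 1000, the cutoff is 5e-5 for a single window but 0.01 for a two-window region, which illustrates why the threshold grows with l.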
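In addition, deletion CNVs and duplication CNVs can be detected separately. The above formula is used for both, but the duplication test uses the piUpper threshold described above and the deletion test uses piLower. For each test, first find two windows showing the CNV, then add one window at a time and repeat the test of whether the extended region still shows the CNV. Note that the threshold $(\mathrm{FPR}/(L/l))^{1/l}$ increases as l increases. When $(\mathrm{FPR}/(L/l))^{1/l}$ exceeds 0.5, repetition of the above steps stops at N-1.
Step 5: False positives are reduced among the merged regions judged above to be deletion or duplication CNVs; the resulting regions are the CNV regions in which insertions or deletions have occurred.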
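False positives can be reduced by applying filter conditions, for example by filtering out variant regions whose median read coverage depth lies between 0.75 and 1.25 times the average coverage depth. Optionally, the significance of each variant region is also tested with a Z-value test, using a significance level of $10^{-6}$ as the threshold for filtering merged regions. The significance level of $10^{-6}$ closely matches the significance level obtained when testing two variant regions with FPR = 0.05; moreover, based on manual expectations for many CNVs across all significance levels, the $10^{-6}$ threshold is considered reasonable.
Examples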
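The present invention is described in detail below with reference to examples. It should be understood, however, that the following examples merely illustrate embodiments of the invention and do not limit its scope.
In this example, the method of the present invention was used to detect copy number variation in the sequencing data of sample NA19238 from the Copy Number Variation Project. The Copy Number Variation Project was initiated by the Sanger Institute to study the effects of copy number variation on human health. The sample analyzed in this example is NA19238 (Yoruba, Nigeria), one of the project's samples; its known CNVs were obtained by genome-wide array comparative genomic hybridization. (Data are available from ftp://ftp.sanger.ac.uk/pub/cnv_project/)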
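Step 1: Already-aligned high-depth sequencing data for NA19238 were downloaded. The alignment results are in SAM format (see http://samtools.sourceforge.net/SAM1.pdf) and comprise alignments to 24 chromosomes (22 autosomes plus the X and Y chromosomes). Alignment was performed with BWA (http://bio-bwa.sourceforge.net/bwa.shtml) against the hg19 reference genome (download: http://hgdownload.cse.ucsc.edu/downloads.html#human).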
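Step 2: This step is performed with the SOAP program, following the program's instructions (Short Oligonucleotide Analysis Package, http://soap.genomics.org.cn).
Based on the alignment results, the coverage depth of every site on the reference genome is tallied and stored in a coverage depth file. Coverage depth is computed with the SOAPcoverage program (version 2.7.7, download: http://soap.genomics.org.cn/down/soap.coverage.tar.gz), giving the coverage depth of every site.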
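Step 3: The copy number variation of the sample is detected with the method of the present invention. The coverage depth file obtained in step 2 and the hg19 reference genome sequence file are required as input files. After running, the program writes a result file recording the copy number of each window and whether the window carries a duplication or deletion copy number variation; the columns of the file are described below.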
[Table: description of the result-file columns (image not reproduced)]
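The program stores the reference genome sequence file and the per-site coverage depth file as a hash sequence table and a hash coverage depth table, respectively.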
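Python dictionaries serve as the hash tables described above. The sketch below loads a FASTA reference into `{name: sequence}` and per-site depths into `{chrom: {position: depth}}`. The three-column `chrom pos depth` text layout is a hypothetical format chosen for illustration; it is not the documented output format of SOAPcoverage.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Reference sequence table: {sequence name: upper-case sequence}."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""   # name = first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {k: "".join(v) for k, v in chunks.items()}


def read_depth_table(lines: Iterable[str]) -> Dict[str, Dict[int, int]]:
    """Depth table {chrom: {position: depth}} from hypothetical 'chrom pos depth' lines."""
    table: Dict[str, Dict[int, int]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue   # skip blank or malformed lines
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[2])
        table.setdefault(chrom, {})[pos] = depth
    return table
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `read_fasta(open("hg19.fa"))`.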
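The coverage depth and significance of each window are then computed. Variation detection is performed on the coverage depth of local windows to find windows whose average coverage depth differs markedly from the genome-wide average coverage depth.
Specifically, to assess coverage depth, a window with a preset length of 100 bp is slid along the reference sequence, and the number of reads aligned to each window is counted. The coverage depth of a window is the sum of the read counts covering each of its sites divided by the number of sites in the window.
Step 4: Consecutive windows with the same variation, or windows satisfying the following merging conditions, are merged, and the segment information is finally summarized and printed. For CNV, we define four states: deletion window, duplication window, normal window, and N window. Merging is assessed over every three adjacent windows: consecutive duplication windows or consecutive deletion windows; deletion windows separated by an N window, e.g. deletion window + N window + deletion window, with no more than one consecutive N window; duplication windows separated by an N window, e.g. duplication window + N window + duplication window, with no more than one consecutive N window; deletion windows separated by a normal window, e.g. deletion window + normal window + deletion window, where the coverage depth of the normal window minus 3 SD falls within the coverage depth range of the deletion windows, with no more than one consecutive normal window; and duplication windows separated by a normal window, e.g. duplication window + normal window + duplication window, where the coverage depth of the normal window plus 3 SD falls within the coverage depth range of the duplication windows, with no more than one consecutive normal window.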
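The merging rules above can be sketched as a left-to-right scan. Assumptions made here, not stated in the text: window types are plain strings (`"DEL"`, `"DUP"`, `"NORMAL"`, `"N"`); the rule that a bridged normal window's depth minus (or plus) 3 SD "falls within the coverage depth range" of the flanking windows is read one-sidedly (at most the larger flanking depth for deletions, at least the smaller flanking depth for duplications); and only regions of at least three windows are reported.

```python
from typing import List, Sequence, Tuple


def _gap_ok(types: Sequence[str], depths: Sequence[float], g: int,
            kind: str, sd: float) -> bool:
    """Can window g, flanked by two `kind` windows, be bridged (rules ii-v)?"""
    if types[g] == "N":
        return True
    if types[g] != "NORMAL":
        return False
    flank = (depths[g - 1], depths[g + 1])
    if kind == "DEL":
        return depths[g] - 3 * sd <= max(flank)   # rule iv (one-sided reading)
    return depths[g] + 3 * sd >= min(flank)       # rule v (one-sided reading)


def merge_windows(types: Sequence[str], depths: Sequence[float], sd: float,
                  min_windows: int = 3) -> List[Tuple[int, int, str]]:
    """Merge DEL/DUP runs, bridging single N or qualifying normal windows.

    Returns (first_window, last_window, kind) with inclusive indices.
    """
    regions: List[Tuple[int, int, str]] = []
    i, n = 0, len(types)
    while i < n:
        kind = types[i]
        if kind not in ("DEL", "DUP"):
            i += 1
            continue
        j = i
        while True:
            if j + 1 < n and types[j + 1] == kind:
                j += 1                       # rule i: same variant type
            elif (j + 2 < n and types[j + 2] == kind
                  and _gap_ok(types, depths, j + 1, kind, sd)):
                j += 2                       # rules ii-v: one bridged window only
            else:
                break
        if j - i + 1 >= min_windows:
            regions.append((i, j, kind))
        i = j + 1
    return regions
```

Because a gap window is accepted only when the very next window is of the same variant type, two consecutive N or normal windows always end a region, matching the "no more than one consecutive" restriction.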
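Next, the merged windows are tested for whether they are deletion-region or duplication-region CNVs.
First, the read count of the i-th window (i a natural number) is converted into a Z value $z_i$: the mean read count over all windows (the total number of reads divided by the number of windows) is subtracted from the number of reads whose start positions fall in the window, and the difference is divided by the standard deviation of the window read counts. The Z value is further converted into an upper-tail probability $p_i^{\mathrm{Upper}} = P(Z > z_i)$ (piUpper) and a lower-tail probability $p_i^{\mathrm{Lower}} = P(Z < z_i)$ (piLower), where $P(Z > z_i)$ is the probability that Z exceeds $z_i$ and $P(Z < z_i)$ is the probability that Z is below $z_i$. A region A of l consecutive windows is considered to carry a duplication if $\max\{p_i^{\mathrm{Upper}} : i \in A\} < (\mathrm{FPR}/(L/l))^{1/l}$, and a deletion if $\max\{p_i^{\mathrm{Lower}} : i \in A\} < (\mathrm{FPR}/(L/l))^{1/l}$. Here FPR is the sum of the false-positive values for duplications or deletions over the reference genome; the type I error is the FPR (false positive rate), i.e. the proportion of detected CNVs that are not real, and the type II error is the FNR (false negative rate), i.e. the proportion of real CNVs that are not detected. L is the total number of windows in the reference genome, and l is the number of windows in region A, with $1 \le l \le L$.
Clearly, if all probes in A come from the normal state, i.e. the probability that A is a CNV is less than $\mathrm{FPR}/(L/l)$, the window is a normal window. In addition, deletion CNVs and duplication CNVs can be detected separately. The above formula is used for both, but the duplication test uses the piUpper threshold described above and the deletion test uses piLower. For each test, first find two windows showing the CNV, then add one window at a time and repeat the test of whether the extended region still shows the CNV. Note that the threshold $(\mathrm{FPR}/(L/l))^{1/l}$ increases as l increases. When $(\mathrm{FPR}/(L/l))^{1/l}$ exceeds 0.5, repetition of the above steps stops at N-1.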
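The seed-and-extend search described above can be sketched as follows. It takes one tail-probability list (upper tail for duplications, lower tail for deletions), seeds a region at two passing windows, and adds one window at a time while the whole region stays below $(\mathrm{FPR}/(L/l))^{1/l}$. Reading "stop at N-1 once the threshold exceeds 0.5" as a cap on region size (the largest l whose threshold is still at most 0.5) is an interpretation made for this sketch.

```python
from typing import List, Sequence, Tuple


def ewt_threshold(fpr: float, total_windows: int, width: int) -> float:
    """(FPR / (L / l)) ** (1 / l) for a region of l = width windows."""
    return (fpr / (total_windows / width)) ** (1.0 / width)


def grow_regions(p: Sequence[float], fpr: float = 0.05) -> List[Tuple[int, int]]:
    """Seed-and-extend CNV search on one tail-probability list.

    Returns (start, end) window index pairs, end exclusive.
    """
    total = len(p)
    # largest region size whose cutoff is still <= 0.5 (interpretation of "stop at N-1")
    max_width = 2
    while max_width + 1 <= total and ewt_threshold(fpr, total, max_width + 1) <= 0.5:
        max_width += 1
    regions: List[Tuple[int, int]] = []
    i = 0
    while i + 2 <= total:
        if max(p[i:i + 2]) < ewt_threshold(fpr, total, 2):      # seed: two windows
            width = 2
            while (width + 1 <= max_width and i + width + 1 <= total
                   and max(p[i:i + width + 1]) < ewt_threshold(fpr, total, width + 1)):
                width += 1                                      # add one window, re-test
            regions.append((i, i + width))
            i += width
        else:
            i += 1
    return regions
```

Since the cutoff rises with l, windows already in a region keep passing as it grows; only the newly added window can end the extension. With 20 windows and FPR = 0.05 the cap works out to 6 windows, so a long uninterrupted signal is reported as several adjacent regions.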
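Step 5: Filter the copy number variation results obtained in step 4 and compile statistics
False positives are reduced among the merged regions judged above to be deletion or duplication CNVs. This can be achieved with filter conditions.
Filtering criteria: (1) a copy number variation event must span more than 10 windows, i.e. the CNV region must be longer than 1 kb; (2) the ratio of the event's median coverage to the genome-wide median coverage must not lie between 0.75 and 1.25. After the results obtained with the present invention were filtered, the number and total length of copy number events were tallied by event type (duplication and deletion), with the following results: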
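                                Duplication    Deletion       Total
Number of CNV events            10892          24007          34899
Total CNV length (bp)           176381800      447025700      623407500
Mean CNV length (bp)            16194          18621          17863

Step 6: compare the CNV results detected by the invention with the array-based CNV scan of the same sample. Using the start and end positions of each variant event on the genome, the CNVs on which the invention and the array agree were identified; the concordant lengths and ratios are shown in the table below. The high concordance supports that the CNVs detected by the invention are real.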
                                                   Total       Duplication    Deletion
Total length of array-detected CNV events (bp)     9295697     7100675        2195022
Length concordant with the invention (bp)          8050496     6171784        1878712
Concordance                                        86.60%      86.92%         85.59%
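The concordance figures can be reproduced in outline by interval arithmetic. This sketch assumes "concordant length" means the number of bases in array-detected CNVs that are also covered by a call from the method, with intervals given as 1-based inclusive `(chrom, start, end)` tuples; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or adjacent intervals on the same chromosome."""
    out: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2] + 1:
            out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
        else:
            out.append((chrom, start, end))
    return out


def total_length(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by the intervals."""
    return sum(end - start + 1 for _, start, end in merge(intervals))


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both sets (quadratic, fine for a sketch)."""
    ma, mb = merge(a), merge(b)
    total = 0
    for ca, sa, ea in ma:
        for cb, sb, eb in mb:
            if ca == cb:
                total += max(0, min(ea, eb) - max(sa, sb) + 1)
    return total


def concordance(array_calls: List[Interval], seq_calls: List[Interval]) -> float:
    """Fraction of array-detected CNV bases also called from sequencing."""
    return overlap_length(array_calls, seq_calls) / total_length(array_calls)
```

Running it once per event type (duplications, deletions) and once on all events would give the three columns of a table like the one above.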

Claims

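1. A method for detecting copy number variation, comprising the steps of:
1) sequencing a target sample to obtain sequencing reads;
2) aligning the obtained reads to a reference genome sequence to obtain the coverage depth of each site of the reference sequence, i.e., the number of reads in the alignment result that cover the site;
3) averaging the coverage depths of all sites to obtain the average coverage depth of the sequence, and calculating in the same way the coverage depth of every window of a preset segment length on the reference sequence, each window being defined as one of four types: normal window, duplication window, deletion window and N window, wherein a normal window is a window whose coverage depth equals the average coverage depth of the sequence, a duplication window is a window whose coverage depth is significantly greater than the average coverage depth, a deletion window is a window whose coverage depth is significantly less than the average coverage depth, and an N window is a window with essentially no coverage depth;
4) merging three or more consecutive windows that satisfy any one of the following conditions, and determining whether the merged region belongs to either of the two CNV types, deletion or duplication:
i. consecutive duplication windows or consecutive deletion windows;
ii. deletion windows separated by an N window, e.g., deletion window + N window + deletion window, wherein no more than one N window may occur consecutively;
iii. duplication windows separated by an N window, e.g., duplication window + N window + duplication window, wherein no more than one N window may occur consecutively;
iv. deletion windows separated by a normal window, e.g., deletion window + normal window + deletion window, wherein the coverage depth of the normal window minus 3 SD falls within the coverage depth range of the deletion windows, and no more than one normal window may occur consecutively;
v. duplication windows separated by a normal window, e.g., duplication window + normal window + duplication window, wherein the coverage depth of the normal window plus 3 SD falls within the coverage depth range of the duplication windows, and no more than one normal window may occur consecutively;
wherein SD is the standard deviation of the coverage depths of all sites, whose mean is the average coverage depth of the sequence;
5) reducing false positives in the merged regions, the resulting regions being the copy number variation regions in which an insertion or deletion has occurred.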
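2. The method according to claim 1, further comprising, between steps 1) and 2), a step 1'): evaluating whether the sequencing result is acceptable and re-sequencing if it is not, and, if adapter sequences were introduced during sequencing, removing the adapter sequences.
3. The method according to claim 1 or 2, wherein step 2) further comprises, after aligning the reads to the reference genome sequence: removing duplicates and redundancy from the alignment result.
4. The method according to any one of the preceding claims, wherein the sequencing in step 1) is 454 sequencing or Illumina sequencing.
5. The method according to any one of the preceding claims, wherein the sequencing depth in step 1) is 10×, 20×, 30× or 35×.
6. The method according to any one of the preceding claims, wherein the preset segment length in step 3) is 70–100 bp, 100 bp, 100–200 bp, 50–300 bp or 50–150 bp.
7. The method according to any one of the preceding claims, wherein in step 3) a window whose coverage depth equals the average coverage depth of the sequence is a window whose coverage depth differs from it by no more than 20%, 10% or 5%.
8. The method according to any one of the preceding claims, wherein in step 3) a coverage depth significantly greater than the average coverage depth means a coverage depth greater than 2, 4 or 8 times the average coverage depth, and a coverage depth significantly less than the average coverage depth means a coverage depth less than the average coverage depth by a factor of 2, 4 or 8.
9. The method according to any one of the preceding claims, wherein the reduction of false positives in step 5) is performed by filtering.
10. The method according to claim 9, wherein the filtering criterion is that the median read count is between 0.75 and 1.25 times the overall median, or a significance level of 10^-6.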
PCT/CN2012/001261 2012-09-12 2012-09-12 Method for detecting copy number variation using genome sequencing reads WO2014040206A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
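CN201280075581.4A CN104603284B (zh) 2012-09-12 2012-09-12 Method for detecting copy number variation using genome sequencing reads
PCT/CN2012/001261 WO2014040206A1 (zh) 2012-09-12 2012-09-12 Method for detecting copy number variation using genome sequencing reads
HK15109609.7A HK1208891A1 (zh) 2012-09-12 2015-09-30 Method for detecting copy number variation using genome sequencing reads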

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/001261 WO2014040206A1 (zh) 2012-09-12 2012-09-12 Method for detecting copy number variation using genome sequencing reads

Publications (1)

Publication Number Publication Date
WO2014040206A1 (zh) 2014-03-20

Family

ID=50277463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/001261 WO2014040206A1 (zh) 2012-09-12 2012-09-12 Method for detecting copy number variation using genome sequencing reads

Country Status (3)

Country Link
CN (1) CN104603284B (zh)
HK (1) HK1208891A1 (zh)
WO (1) WO2014040206A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104313136A (zh) * 2014-09-30 2015-01-28 江苏亿康基因科技有限公司 Non-invasive method and system for early detection and differential diagnosis of human liver cancer
CN106055923A (zh) * 2016-05-13 2016-10-26 万康源(天津)基因科技有限公司 Method for analyzing gene copy number variation
EP3293270A4 (en) * 2015-05-06 2018-03-28 Annoroad Gene Technology Reagent kit, apparatus, and method for detecting chromosome aneuploidy
WO2018119438A1 (en) * 2016-12-22 2018-06-28 Grail, Inc. Base coverage normalization and use thereof in detecting copy number variation
CN113724791A (zh) * 2021-09-09 2021-11-30 天津华大医学检验所有限公司 Method, device and application for analysis of NGS data of the CYP21A2 gene

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
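KR101828052B1 (ko) * 2015-06-24 2018-02-09 사회복지법인 삼성생명공익재단 Method and apparatus for analyzing copy number variation (CNV) of genes
CN105760712B (zh) * 2016-03-01 2019-03-26 西安电子科技大学 Copy number variation detection method based on next-generation sequencing
CN107423534B (zh) * 2016-05-24 2021-08-06 郝柯 Method and system for detecting genomic copy number variation
CN110268044B (zh) * 2017-03-07 2022-08-02 深圳华大生命科学研究院 Method and device for detecting chromosomal variation
CN108256289B (zh) * 2018-01-17 2020-10-16 湖南大地同年生物科技有限公司 Method for genomic copy number variation based on target-region capture sequencing
CN111755066B (zh) * 2019-03-27 2022-10-18 欧蒙医学诊断(中国)有限公司 Method for detecting copy number variation and device implementing the method
CN111710362B (zh) * 2020-08-20 2021-06-15 上海思路迪医学检验所有限公司 Capture probe design method based on next-generation sequencing and its application
CN117334249A (zh) * 2023-05-30 2024-01-02 上海品峰医疗科技有限公司 Method, device and medium for detecting copy number variation based on amplicon sequencing data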

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101914628A (zh) * 2010-09-02 2010-12-15 深圳华大基因科技有限公司 Method and system for detecting polymorphic sites in target regions of a genome

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHIANG, D.Y. ET AL.: "High-resolution mapping of copy-number alterations with massively parallel sequencing", NATURE METHODS, vol. 6, no. 1, 30 November 2008 (2008-11-30), pages 99 - 103, XP055065796, DOI: doi:10.1038/nmeth.1276 *
NORD, A.S. ET AL.: "Accurate and exact CNV identification from targeted high-throughput sequence data", BMC GENOMICS, vol. 12, no. 184, 12 April 2011 (2011-04-12), pages 1 - 10 *
YOON, S. ET AL.: "Sensitive and accurate detection of copy number variants using read depth of coverage", GENOME RESEARCH, vol. 19, 5 August 2009 (2009-08-05), pages 1586 - 1592, XP055167321, DOI: doi:10.1101/gr.092981.109 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104313136A (zh) * 2014-09-30 2015-01-28 江苏亿康基因科技有限公司 Non-invasive method and system for early detection and differential diagnosis of human liver cancer
EP3293270A4 (en) * 2015-05-06 2018-03-28 Annoroad Gene Technology Reagent kit, apparatus, and method for detecting chromosome aneuploidy
CN106055923A (zh) * 2016-05-13 2016-10-26 万康源(天津)基因科技有限公司 Method for analyzing gene copy number variation
WO2018119438A1 (en) * 2016-12-22 2018-06-28 Grail, Inc. Base coverage normalization and use thereof in detecting copy number variation
CN113724791A (zh) * 2021-09-09 2021-11-30 天津华大医学检验所有限公司 Method, device and application for analysis of NGS data of the CYP21A2 gene
CN113724791B (zh) * 2021-09-09 2024-03-12 天津华大医学检验所有限公司 Method, device and application for analysis of NGS data of the CYP21A2 gene

Also Published As

Publication number Publication date
HK1208891A1 (zh) 2016-03-18
CN104603284A (zh) 2015-05-06
CN104603284B (zh) 2016-08-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12884700

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 07/09/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 12884700

Country of ref document: EP

Kind code of ref document: A1