WO2018107481A1 - Gene tag for nucleic acid sample identification, kit, and application thereof - Google Patents

Gene tag for nucleic acid sample identification, kit, and application thereof

Info

Publication number
WO2018107481A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
gene
sequencing
gene tag
sample
Prior art date
Application number
PCT/CN2016/110457
Other languages
English (en)
French (fr)
Inventor
张东
Original Assignee
深圳华大基因股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳华大基因股份有限公司
Priority to PCT/CN2016/110457 (WO2018107481A1, zh)
Priority to CN201680091177.4A (CN109996877A, zh)
Publication of WO2018107481A1 (zh)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • The present application relates to the field of nucleic acid sample processing, and in particular to a gene tag for nucleic acid sample identification, a kit, and applications thereof.
  • High-throughput sequencing technology is widely used in life science research.
  • The basic process is to extract nucleic acid (DNA or RNA) from biological tissue and then convert it, by molecular biology techniques, into molecules with a specific sequence structure, i.e., a sequencing library.
  • This process is called library construction. After passing quality control, the sequencing library is subjected to high-throughput sequencing.
  • If two samples prepared at the same time are from different species, the sequencing data can be aligned against reference genomes, and the samples can be identified from the similarity and alignment rate.
  • If the two samples are of the same species, identification by alignment to reference data does not work.
  • Samples handled side by side usually belong to the same batch and are likely to be of the same species; reference alignment can therefore hardly guarantee that the two samples are correctly assigned.
  • For two samples of the same species, a 6-10 bp tag sequence can be added to the adapter or primer during the adapter-ligation or PCR step of library construction to distinguish each sample at sequencing.
  • However, this method can only distinguish samples after adapter ligation or PCR; it cannot tell whether samples were swapped or cross-contaminated before adapter ligation.
  • Alternatively, 21 high-frequency SNP loci of the sample can be selected and multiplex PCR primers designed for them; the PCR products are then analyzed by mass spectrometry and compared with the sequencing data to confirm that the samples were not swapped.
  • However, this method can currently be applied only to DNA sequencing, and it is costly because multiplex PCR and mass spectrometry must be performed separately. More importantly, samples taken from different parts of the same tissue, such as a tumor and its adjacent tissue, can hardly be distinguished by genomic SNPs, so the method cannot serve as a sample identifier for them.
  • The purpose of the present application is to provide a novel gene tag for nucleic acid sample identification, a kit comprising the gene tag, and applications of the gene tag.
  • One aspect of the present application discloses a gene tag for nucleic acid sample identification. The gene tag is a nucleic acid longer than 130 bp, with a polyA tail at the 3' end of the nucleic acid sequence and at least one index sequence inserted at a random position in the sequence.
  • The gene tag of the present application is a nucleic acid longer than 130 bp.
  • In use, the gene tag is added to the sample DNA or RNA; detecting the specific sequence of the added gene tag, or its index sequence, then reveals which sample is being examined.
  • The index sequence may be a conventional index sequence of the sequencing platform.
  • Index sequences differ between sequencing platforms, and the present application does not limit the specific index sequence.
  • The polyA tail is designed mainly for library construction methods that require polyA capture.
  • The gene tag of the present application is used to label samples so that different samples can be distinguished.
  • The nucleic acid sequence of the gene tag must not be homologous to the labeled sample. A random sequence can usually be used, with its specificity confirmed by BLAST alignment.
  • One implementation uses sequences from ERCC mRNA molecules. ERCC, expanded in the Example as External RNA Controls Consortium, can itself be added to sequencing samples as an internal reference.
  • A non-repeating sequence was selected from each of eight ERCC mRNA molecules as a gene tag, giving a set of eight gene tags composed of eight nucleic acids, which is described in detail in the subsequent scheme.
  • Preferably, the gene tag is a nucleic acid 130 bp to 160 bp in length.
  • More preferably, the gene tag is a 160 bp nucleic acid.
  • Although the gene tag only needs to be longer than 130 bp, choosing a short 160 bp fragment is economical and technically easy in terms of DNA synthesis technology and cost.
  • Preferably, the index sequence is 6-10 bp long.
  • Preferably, six index sequences are inserted evenly into the nucleic acid sequence.
  • Inserting multiple index sequences evenly into the nucleic acid sequence of the gene tag further improves its recognizability.
  • Preferably, the polyA tail is 24 bp long.
  • Preferably, the nucleic acid is single-stranded DNA, double-stranded DNA, or RNA.
  • If the labeled sample is RNA, the nucleic acid of the gene tag is an RNA sequence; if the labeled sample is DNA, the nucleic acid of the gene tag is a single-stranded or double-stranded DNA sequence.
  • Preferably, the gene tag consists of nucleic acids having the sequences shown in Seq ID No. 1 to Seq ID No. 8.
  • The eight nucleic acids shown in Seq ID No. 1 to Seq ID No. 8 were designed in one implementation of the present application to illustrate the gene tag; they can be used in sequencing or elsewhere to label samples. The gene tag is not limited to these eight nucleic acids, however: other random sequences may be used, the number and length of the nucleic acids may vary with the samples to be labeled, and the index sequence may vary with the sequencing platform.
  • Another aspect of the present application discloses the use of the gene tag of the present application in nucleic acid sequencing.
  • The gene tag is designed to address the tendency of samples to be swapped or cross-contaminated during nucleic acid sequencing. Once the tag has been added, the tag sequences found during sequencing data analysis show whether a sample was swapped or cross-contaminated, so the tag serves as an identifier.
  • A further aspect of the present application discloses a kit for nucleic acid sample identification, the kit containing the gene tag of the present application.
  • The gene tag can be added to the sample to be tested as an independent nucleic acid and so act as a label. The nucleic acid of the gene tag can therefore be freeze-dried into a powder or formulated as a high-concentration nucleic acid solution and supplied as a kit for ease of use and transport; this makes it easy to apply in nucleic acid sequencing or wherever nucleic acid samples need to be identified.
  • A further aspect of the present application discloses a nucleic acid sequencing method comprising adding the gene tag of the present application to the original DNA or RNA sample and then performing library construction and on-machine sequencing.
  • The nucleic acid sequencing method is a specific application of the gene tag: the tag is added to the original DNA or RNA sample to identify it, which guards against sample swaps during subsequent library construction and sequencing and also makes it possible to check for cross-contamination between samples.
  • For example, if the same gene tag is detected in two samples, the two samples were cross-contaminated during library construction or sequencing; the sequencing results obtained are then inaccurate, and the samples must be re-sequenced.
  • As another example, if the recorded sample information does not match the detected gene tag, and the detected gene tag corresponds to another sample, a sample swap has occurred, and the sample information must be corrected according to the detected gene tag.
  • The gene tag is added to the nucleic acid sample as a label. Different samples receive different gene tags whose sequences are known, so detecting the nucleic acid sequence of the gene tag identifies the sample.
  • Although the gene tag was developed for nucleic acid sequencing, it is not limited to that use and can be applied wherever nucleic acid samples need to be identified.
  • For special purposes, the nucleic acid sequence of the gene tag may also carry modifications, such as fluorescent modification, to improve its detectability; this is not specifically limited here.
  • The gene tag for nucleic acid sample identification can be conveniently added to a nucleic acid sample, and detecting its specific sequence effectively distinguishes different nucleic acid samples, avoiding problems such as sample swaps and cross-contamination. Applying the gene tag to nucleic acid sequencing better safeguards sequencing quality and keeps swaps or cross-contamination from affecting the sequencing results.
  • Figure 1 shows the base distribution of the sequencing results in the example of the present application.
  • The present application provides a gene tag for nucleic acid sample identification. In use, the tag is added directly to the original or freshly processed nucleic acid sample, and the tag added to each sample is recorded. After library construction and sequencing, the gene tag detected in the sequencing results shows exactly which nucleic acid sample the results belong to, which effectively avoids sample swaps and also makes cross-contamination easy to spot, thereby safeguarding sequencing quality.
  • In the sequences shown in Seq ID No. 1 to Seq ID No. 8, the underlined bold portions are the index sequences.
  • The eight gene tags of this example were synthesized by the Thermofisher Hong Kong branch and then dissolved in water and diluted to 15 nM for use.
  • This example uses an RNA standard, Universal Human Reference RNA (UHRR), for the library construction and sequencing test.
  • 4 μL of 15 nM gene tag P1 was added to 200 ng of the RNA standard (brand: Agilent, Cat. No. 740000, Universal Human Reference RNA), which was then used for library construction and sequencing.
  • The library was built with the kit TruSeq_RNA_SamplePrep_v2kit (Cat. No. RS-122-2001/RS-122-2002).
  • The specific steps follow the kit manual TruSeq_RNA_SamplePrep_v2_Guide_15026495_A (version 2).
  • The library construction procedure is described below; all amounts are for one reaction.
  • The tagged RNA standard was purified with mRNA Purification Beads. Specifically, 50 μL of RNA Purification Beads was added to the tagged RNA standard, which was then held at 65 °C for 5 min, on ice for 1 min, and at room temperature for 5 min. The tube was placed on a magnetic stand for a further 5 min at room temperature, the supernatant was removed, and the beads were retained. 150 μL of Bead Washing Buffer was added for one wash; after another 5 min on the magnetic stand at room temperature, the wash solution was removed.
  • 50 μL of Elution Buffer was then added, and the sample was treated at 80 °C for 2 min, placed on ice for 1 min, and left on the magnetic stand for 5 min at room temperature before the supernatant was collected. The supernatant underwent one more round of bead binding, washing, and elution to give the purified RNA sample.
  • After purification, the RNA sample was fragmented with the Elute, Prime, Fragment Mix (Fragment Mix for short). Specifically, 19.5 μL of Fragment Mix was added to the purified RNA sample, which was treated at 94 °C for 8 min to give the fragmented RNA solution.
  • cDNA synthesis comprises first-strand synthesis and second-strand synthesis, as follows:
  • The first-strand reaction contained 17 μL of the fragmented RNA solution, 1 μL of reverse transcriptase (SuperScript II), and 7 μL of First Strand Master Mix, mixed before the reaction.
  • The reaction conditions were 25 °C for 10 min, 42 °C for 50 min, and 70 °C for 15 min, followed by a hold at 4 °C.
  • The second-strand reaction contained 25 μL of first-strand product and 25 μL of Second Strand Master Mix, mixed before the reaction.
  • The reaction ran at 16 °C for 1 h with intermittent shaking at 350 rpm (15 s of shaking, then 2 min at rest).
  • Purification of the second-strand product: 90 μL of purification beads (Ampure XP Beads, 1.8x volume) was added to the 50 μL second-strand product and mixed. The mixture stood at room temperature for 5 min and then on a magnetic stand at room temperature for 5 min, and the supernatant was discarded. The beads were washed twice with 200 μL of 80% ethanol and dried, and the product was recovered in 60 μL of Resuspension Buffer.
  • End repair: 40 μL of End Repair Mix was added to the 60 μL purified second-strand product and reacted at 30 °C for 30 min. The product was then purified with 1.6 volumes of purification beads (Ampure XP Beads 1.6x), recovering 17.5 μL of end-repaired product; the purification followed the second-strand product purification.
  • Adding the polyA tail (A-tailing): 12.5 μL of A-Tailing Mix was added to the 17.5 μL end-repaired product and reacted at 37 °C for 30 min, giving about 30 μL of reaction product.
  • Adapter ligation: 2.5 μL of Ligation Mix, 3 μL of Resuspension Buffer, and 2 μL of Adaptor Index were added to the 30 μL A-tailing reaction product; after mixing, the reaction ran at 30 °C for 10 min, and 5 μL of Stop Ligation Buffer was then added to end it.
  • The ligation product was purified with 1 volume of purification beads (Ampure XP Beads 1x) and recovered in 50 μL of Resuspension Buffer after the first purification.
  • The once-purified product was purified a second time and recovered in Resuspension Buffer to give 20 μL of product.
  • Both purifications followed the second-strand product purification.
  • The purified ligation product was amplified by PCR.
  • The reaction contained 20 μL of purified product, 5 μL of PCR Primer Cocktail, and 25 μL of PCR Master Mix, mixed before the reaction.
  • The reaction conditions were 98 °C for 30 s, then 12 cycles of 98 °C for 30 s, 60 °C for 30 s, and 70 °C for 30 s, then 72 °C for 5 min, followed by a hold at 10 °C.
  • The PCR product was purified with 1 volume of purification beads (Ampure XP Beads 1x) and recovered in Resuspension Buffer to give 30 μL of product.
  • The purification followed the second-strand product purification.
  • Sequencing was performed on Illumina's Hiseq4000 platform in PE100 mode, with 1 G of data generated for the library.
  • The P1 sequence accounts for only 0.0033% of the sample data, so it does not waste data and does not interfere with the use of ERCC internal references in the sample.
  • The base distribution plot output directly by the sequencing platform is shown in Fig. 1.
  • Only a single fragment was added in this example. Because of a limitation of Illumina's sequencing principle, too high a proportion of fragments with the same sequence and the same length causes fluctuations in base composition. In practice, the single-fragment gene tag added in this example did not cause severe base fluctuation and therefore does not affect sequencing.
  • The likely reasons are that the added gene tag makes up a small proportion of the nucleic acid in the sequencing sample and that reverse transcription starts at different sites, so that after library construction copies of the same fragment differ in length and are no longer read in register during sequencing; base fluctuation therefore does not occur.
  • Overall, this example adds a designed gene tag to the sample to be tested, which effectively distinguishes nucleic acid samples without affecting the sequencing results.
  • Besides labeling nucleic acid samples with gene tags of different sequences, the eight gene tags designed in this example can also distinguish samples when the same gene tag is added to different nucleic acid samples at different concentrations.
  • In that case the nucleic acid sample is identified by the concentration of the added gene tag.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

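The present application discloses a gene tag for nucleic acid sample identification, a kit, and applications thereof. The gene tag for nucleic acid sample identification is a nucleic acid longer than 130 bp, with a polyA tail at the 3' end of the nucleic acid sequence and at least one index sequence inserted at a random position in the sequence.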

Description

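Gene tag for nucleic acid sample identification, kit, and application thereof
Technical Field
The present application relates to the field of nucleic acid sample processing, and in particular to a gene tag for nucleic acid sample identification, a kit, and applications thereof.
Background
High-throughput sequencing is widely used in life science research. Its basic process is to extract nucleic acid (DNA or RNA) from biological tissue and then convert it, by molecular biology techniques, into molecules with a specific sequence structure, namely a sequencing library; this process is called library construction. After passing quality control, the sequencing library is subjected to high-throughput sequencing.
As sequencing becomes cheaper and gene sequencing is applied more widely, more people take part in sequencing research and more samples are prepared at the same time. Library construction therefore becomes more complex and operator errors become more likely; sample swaps and cross-contamination in particular occur easily during library construction.
In general, if two samples prepared at the same time are from different species, the sequencing data can be aligned against reference genomes by bioinformatic analysis, and the samples can be identified from the similarity and alignment rate. If the two samples are of the same species, however, identification by alignment to reference data does not work. Samples handled side by side usually belong to the same batch and are likely to be of the same species, so reference alignment can hardly guarantee that the two samples are correctly assigned.
For two samples of the same species, a 6-10 bp tag sequence can be added to the adapter or primer during the adapter-ligation or PCR step of library construction to distinguish each sample at sequencing. However, this method can only distinguish samples after adapter ligation or PCR; it cannot tell whether samples were swapped or cross-contaminated before adapter ligation.
Alternatively, 21 high-frequency SNP loci of the sample can be selected and multiplex PCR primers designed for them; the PCR products are then analyzed by mass spectrometry and compared with the sequencing data to confirm that the samples were not swapped. However, this method can currently be applied only to DNA sequencing, and it is costly because multiplex PCR and mass spectrometry must be performed separately. More importantly, samples taken from different parts of the same tissue, such as a tumor and its adjacent tissue, can hardly be distinguished by genomic SNPs, so the method cannot serve as a sample identifier for them.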
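Summary of the Invention
The purpose of the present application is to provide a novel gene tag for nucleic acid sample identification, a kit comprising the gene tag, and applications of the gene tag.
To achieve this purpose, the present application adopts the following technical solutions:
One aspect of the present application discloses a gene tag for nucleic acid sample identification. The gene tag is a nucleic acid longer than 130 bp, with a polyA tail at the 3' end of the nucleic acid sequence and at least one index sequence inserted at a random position in the sequence.
It should be noted that, in principle, any fragment longer than 130 bp can serve as the gene tag of the present application, with no upper limit. Subsequent library construction includes a DNA fragmentation step, so even a very long sequence will be fragmented; distinguishing samples only requires detecting the known sequence of the gene tag, and whether the tag itself has been fragmented makes no difference. The gene tag of the present application is therefore a nucleic acid longer than 130 bp.
It should be noted that, in use, the gene tag is added to the sample DNA or RNA; detecting the specific sequence of the added gene tag, or its index sequence, then reveals which sample is being examined. The index sequence may be a conventional index sequence of the sequencing platform; index sequences differ between sequencing platforms, and the present application does not limit the specific index sequence. In addition, the polyA tail of the gene tag is designed mainly for library construction methods that require polyA capture.
It will be understood that the gene tag is used to label samples so that different samples can be distinguished, so its nucleic acid sequence must not be homologous to the labeled sample. A random sequence can usually be used, with its specificity confirmed by BLAST alignment. One implementation of the present application, however, uses sequences from ERCC mRNA molecules. ERCC, expanded in the Example as External RNA Controls Consortium, can itself be added to sequencing samples as an internal reference; in this implementation, a non-repeating sequence was therefore selected from each of eight ERCC mRNA molecules as a gene tag, giving a set of eight gene tags composed of eight nucleic acids, which is described in detail in the subsequent scheme.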
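The no-homology requirement can be pre-screened before running a full BLAST alignment. The following is a minimal sketch, not the applicants' procedure: it flags a candidate tag that shares any exact k-mer with a reference sequence on either strand. The function names and the default `k=20` are illustrative assumptions, and an exact k-mer test is much cruder than BLAST, so a clean result here would still need confirming by alignment.

```python
def normalize(seq):
    """Upper-case the sequence and read RNA 'U' as 'T' so DNA and RNA compare."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return normalize(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = normalize(seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(tag, reference, k=20):
    """True if the tag, on either strand, shares an exact k-mer with the reference.

    k=20 is an arbitrary illustrative threshold, not a value from the source.
    """
    tag_kmers = kmers(tag, k) | kmers(reverse_complement(tag), k)
    return not tag_kmers.isdisjoint(kmers(reference, k))
```

A candidate for which `shares_kmer` returns `True` against the sample's reference would be discarded or redesigned; one that returns `False` would go on to the BLAST check described above.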
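Preferably, the gene tag is a nucleic acid 130 bp to 160 bp in length. More preferably, the gene tag is a nucleic acid 160 bp in length.
It should be noted that, although the gene tag only needs to be longer than 130 bp as explained above, choosing a short 160 bp fragment is economical and technically easy in terms of DNA synthesis technology and cost.
Preferably, the index sequence is 6-10 bp long.
Preferably, six index sequences are inserted evenly into the nucleic acid sequence.
It should be noted that multiple index sequences are inserted evenly into the nucleic acid sequence of the gene tag to further improve its recognizability.
Preferably, the polyA tail is 24 bp long.
Preferably, the nucleic acid is single-stranded DNA, double-stranded DNA, or RNA.
It should be noted that, if the labeled sample is RNA, the nucleic acid of the gene tag is an RNA sequence; if the labeled sample is DNA, the nucleic acid of the gene tag is a single-stranded or double-stranded DNA sequence.
Preferably, the gene tag consists of nucleic acids having the sequences shown in Seq ID No. 1 to Seq ID No. 8.
It should be noted that the eight nucleic acids shown in Seq ID No. 1 to Seq ID No. 8 were designed in one implementation of the present application to illustrate the gene tag; they can be used in sequencing or elsewhere to label samples. The gene tag is not limited to these eight nucleic acids, however: other random sequences may be used, the number and length of the nucleic acids may vary with the samples to be labeled, and the index sequence may vary with the sequencing platform.
Another aspect of the present application discloses the use of the gene tag of the present application in nucleic acid sequencing.
It should be noted that the gene tag is designed to address the tendency of samples to be swapped or cross-contaminated during nucleic acid sequencing. Once the tag has been added, the tag sequences found during sequencing data analysis show whether a sample was swapped or cross-contaminated, so the tag serves as an identifier.
A further aspect of the present application discloses a kit for nucleic acid sample identification, the kit containing the gene tag of the present application.
It should be noted that the gene tag can be added to the sample to be tested as an independent nucleic acid and so act as a label. The nucleic acid of the gene tag can therefore be freeze-dried into a powder or formulated as a high-concentration nucleic acid solution and supplied as a kit for ease of use and transport; this makes it easy to apply in nucleic acid sequencing or wherever nucleic acid samples need to be identified.
Based on the gene tag of the present application, a further aspect discloses a nucleic acid sequencing method comprising adding the gene tag to the original DNA or RNA sample and then performing library construction and on-machine sequencing.
It should be noted that the nucleic acid sequencing method is a specific application of the gene tag: the tag is added to the original DNA or RNA sample to identify it, which guards against sample swaps during subsequent library construction and sequencing and also makes it possible to check for cross-contamination between samples. For example, if the same gene tag is detected in two samples, the two samples were cross-contaminated during library construction or sequencing; the sequencing results obtained are then inaccurate, and the samples must be re-sequenced. As another example, if the sample information recorded during library construction or sequencing does not match the detected gene tag, and the detected gene tag corresponds to another sample, a sample swap has occurred, and the sample information must be corrected according to the detected gene tag.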
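The two checks described above (a tag shared between samples suggests cross-contamination; a recorded tag replaced by another sample's tag suggests a swap) can be written as a small decision rule. This is a hedged sketch rather than the applicants' analysis pipeline: the function name, the input layout, and the `min_reads` noise threshold are all assumptions made for illustration.

```python
def check_samples(expected, detected, min_reads=10):
    """Classify each sample from the tags detected in its sequencing data.

    expected: dict mapping sample name -> tag name recorded when the tag was added.
    detected: dict mapping sample name -> {tag name: supporting read count}.
    min_reads: hypothetical noise threshold; tags with fewer reads are ignored.
    """
    report = {}
    for sample, tag in expected.items():
        counts = detected.get(sample, {})
        present = {name for name, n in counts.items() if n >= min_reads}
        if present == {tag}:
            report[sample] = "ok"
        elif tag in present:
            # The recorded tag plus at least one tag that belongs elsewhere.
            report[sample] = "possible cross-contamination"
        elif present:
            # Only tags recorded for other samples were found.
            report[sample] = "possible swap"
        else:
            report[sample] = "tag not detected"
    return report
```

For example, a sample recorded with P2 whose data contain both P2 and P1 would be reported as possibly cross-contaminated, while a sample recorded with P3 whose data contain only P4 would be reported as a possible swap.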
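It will be understood that the gene tag is added to the nucleic acid sample as a label. Different samples receive different gene tags whose sequences are known, so detecting the nucleic acid sequence of the gene tag identifies the sample. Although the gene tag was developed for nucleic acid sequencing, it is therefore not limited to that use and can be applied wherever nucleic acid samples need to be identified. In addition, for special purposes, the nucleic acid sequence of the gene tag may carry modifications, such as fluorescent modification, to improve its detectability; this is not specifically limited here.
Owing to the above technical solutions, the beneficial effects of the present application are as follows:
The gene tag for nucleic acid sample identification can be conveniently added to a nucleic acid sample, and detecting its specific sequence effectively distinguishes different nucleic acid samples, avoiding problems such as sample swaps and cross-contamination. Applying the gene tag to nucleic acid sequencing better safeguards sequencing quality and keeps swaps or cross-contamination from affecting the sequencing results.
Brief Description of the Drawings
Figure 1 shows the base distribution of the sequencing results in the example of the present application.
Detailed Description
As nucleic acid sequencing becomes more efficient and its cost falls, more and more research involves a sequencing step. The number of samples to be processed, and in particular to be built into libraries, keeps growing, and samples are easily swapped or cross-contaminated along the way. Cross-contamination makes the sequencing results unusable; if it goes unnoticed, researchers receive wrong sequencing results, which affects later work. A sample swap is even harder to detect when every other procedure has been carried out correctly, and the results delivered to the researcher may then come from a completely different species or contradict the expected outcome. The present application therefore provides a gene tag for nucleic acid sample identification. In use, the tag is added directly to the original or freshly processed nucleic acid sample, and the tag added to each sample is recorded. After library construction and sequencing, the gene tag detected in the sequencing results shows exactly which nucleic acid sample the results belong to, which effectively avoids sample swaps and also makes cross-contamination easy to spot, thereby safeguarding sequencing quality.
The present application is described in further detail below through a specific example and the accompanying drawing. The example only further illustrates the present application and should not be construed as limiting it.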
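Example
I. Gene tag design and synthesis
This example selected eight mRNA molecules of the External RNA Controls Consortium (ERCC) reference set and took the 124 bases at the 3' end of each, which ensures that the sequences of the molecules do not repeat; the 124 bases at the 3' end of each of the eight mRNA molecules include a 24 bp polyA tail. Eight index sequences were selected and inserted into the eight mRNA fragments, one index sequence per fragment, with the index inserted six times at even spacing in each fragment to give the gene tags of this example. The nucleic acid sequences of the eight gene tags P1-P8 are shown in Seq ID No. 1 to Seq ID No. 8, respectively.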
P1(Seq ID No.1):
Figure PCTCN2016110457-appb-000001
P2(Seq ID No.2):
Figure PCTCN2016110457-appb-000002
P3(Seq ID No.3):
Figure PCTCN2016110457-appb-000003
P4(Seq ID No.4):
Figure PCTCN2016110457-appb-000004
P5(Seq ID No.5):
Figure PCTCN2016110457-appb-000005
Figure PCTCN2016110457-appb-000006
P6(Seq ID No.6):
Figure PCTCN2016110457-appb-000007
P7(Seq ID No.7):
Figure PCTCN2016110457-appb-000008
P8(Seq ID No.8):
Figure PCTCN2016110457-appb-000009
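In the sequences shown in Seq ID No. 1 to Seq ID No. 8, the underlined bold portions are the index sequences. The eight gene tags of this example were synthesized by the Thermofisher Hong Kong branch and then dissolved in water and diluted to 15 nM for use.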
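The design rule in this section (a 124-base 3' fragment that includes a 24 bp polyA tail, with one index inserted six times at even spacing) can be sketched in code. The sketch below is an illustration under stated assumptions, not the applicants' design tool: the actual sequences are given only as figures, so the body and index used with it are placeholders, and placing the six copies at equal steps within the 100-base non-polyA body is one possible reading of "evenly". With a 6 bp index, the arithmetic gives 100 + 6 × 6 + 24 = 160 bases, which matches the preferred 160 bp length.

```python
def build_tag(body, index, copies=6, polya_len=24):
    """Insert `copies` copies of `index` at evenly spaced cut points in `body`,
    then append a polyA tail of `polya_len` bases.

    body: the non-polyA part of the fragment (100 bases in the example's arithmetic).
    index: the index sequence (a 6 bp index is assumed in the usage note).
    """
    step = len(body) // (copies + 1)          # spacing between cut points
    pieces, previous = [], 0
    for i in range(1, copies + 1):
        cut = step * i
        pieces.append(body[previous:cut])     # body segment before this index copy
        pieces.append(index)
        previous = cut
    pieces.append(body[previous:])            # remaining body after the last copy
    return "".join(pieces) + "A" * polya_len
```

Calling `build_tag` with any 100-base body and a 6-base index returns a 160-base tag that ends in 24 A's and contains six inserted index copies.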
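II. Library construction
This example uses an RNA standard, Universal Human Reference RNA (UHRR), for the library construction and sequencing test. 4 μL of 15 nM gene tag P1 was added to 200 ng of the RNA standard (brand: Agilent, Cat. No. 740000, Universal Human Reference RNA), which was then used for library construction and sequencing.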
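As a worked check of the spike-in amount, 4 μL of a 15 nM solution contains 4 × 10⁻⁶ L × 15 × 10⁻⁹ mol/L = 6 × 10⁻¹⁴ mol, i.e. 60 fmol, or about 3.6 × 10¹⁰ tag molecules. The short sketch below reproduces that arithmetic; the function name is illustrative and not part of the source.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def spike_in_amount(volume_ul, conc_nm):
    """Return (moles, molecule count) for a volume in microlitres at a nanomolar concentration."""
    moles = (volume_ul * 1e-6) * (conc_nm * 1e-9)
    return moles, moles * AVOGADRO


moles, molecules = spike_in_amount(4, 15)
print(f"{moles * 1e15:.0f} fmol, about {molecules:.2e} molecules")
```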
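The library was built with the kit TruSeq_RNA_SamplePrep_v2kit (Cat. No. RS-122-2001/RS-122-2002), following the kit manual TruSeq_RNA_SamplePrep_v2_Guide_15026495_A (version 2). The procedure is described below; all amounts are for one reaction.
(1) mRNA purification and fragmentation
The tagged RNA standard was purified with mRNA Purification Beads. Specifically, 50 μL of RNA Purification Beads was added to the tagged RNA standard, which was then held at 65 °C for 5 min, on ice for 1 min, and at room temperature for 5 min. The tube was placed on a magnetic stand for a further 5 min at room temperature, the supernatant was removed, and the beads were retained. 150 μL of Bead Washing Buffer was added for one wash; after another 5 min on the magnetic stand at room temperature, the wash solution was removed. 50 μL of Elution Buffer was then added, and the sample was treated at 80 °C for 2 min, placed on ice for 1 min, and left on the magnetic stand for 5 min at room temperature before the supernatant was collected. The supernatant underwent one more round of bead binding, washing, and elution to give the purified RNA sample.
After purification, the RNA sample was fragmented with the Elute, Prime, Fragment Mix (Fragment Mix for short). Specifically, 19.5 μL of Fragment Mix was added to the purified RNA sample, which was treated at 94 °C for 8 min to give the fragmented RNA solution.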
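(2) cDNA synthesis and purification
cDNA synthesis comprises first-strand synthesis and second-strand synthesis, as follows:
The first-strand reaction contained 17 μL of the fragmented RNA solution, 1 μL of reverse transcriptase (SuperScript II), and 7 μL of First Strand Master Mix, mixed before the reaction. The reaction conditions were 25 °C for 10 min, 42 °C for 50 min, and 70 °C for 15 min, followed by a hold at 4 °C.
The second-strand reaction contained 25 μL of first-strand product and 25 μL of Second Strand Master Mix, mixed before the reaction. The reaction ran at 16 °C for 1 h with intermittent shaking at 350 rpm (15 s of shaking, then 2 min at rest).
Purification of the second-strand product: 90 μL of purification beads (Ampure XP Beads, 1.8x volume) was added to the 50 μL second-strand product and mixed. The mixture stood at room temperature for 5 min and then on a magnetic stand at room temperature for 5 min, and the supernatant was discarded. The beads were washed twice with 200 μL of 80% ethanol and dried, and the product was recovered in 60 μL of Resuspension Buffer.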
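The volumes in this step are internally consistent, which a few lines of code can confirm: the first-strand components sum to the 25 μL carried into second-strand synthesis, the second-strand reaction sums to the 50 μL that is purified, and 1.8 volumes of beads for 50 μL is the stated 90 μL. The dictionary layout and function names below are illustrative assumptions; only the numbers come from the text.

```python
# Component volumes in microlitres, as stated in step (2).
REACTIONS = {
    "first strand": {
        "fragmented RNA": 17,
        "SuperScript II": 1,
        "First Strand Master Mix": 7,
    },
    "second strand": {
        "first-strand product": 25,
        "Second Strand Master Mix": 25,
    },
}


def total_volume(components):
    """Sum of the component volumes of one reaction, in microlitres."""
    return sum(components.values())


def bead_volume(sample_ul, ratio):
    """Bead volume for a clean-up at `ratio` volumes of beads per sample volume."""
    return sample_ul * ratio


for name, components in REACTIONS.items():
    print(name, total_volume(components), "uL")
print("beads for 1.8x clean-up:", bead_volume(50, 1.8), "uL")
```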
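(3) End repair
End repair: 40 μL of End Repair Mix was added to the 60 μL purified second-strand product and reacted at 30 °C for 30 min. The product was then purified with 1.6 volumes of purification beads (Ampure XP Beads 1.6x), recovering 17.5 μL of end-repaired product; the purification followed the second-strand product purification.
(4) Adding the polyA tail (A-tailing)
12.5 μL of A-Tailing Mix was added to the 17.5 μL end-repaired product and reacted at 37 °C for 30 min, giving about 30 μL of reaction product.
(5) Adapter ligation
2.5 μL of Ligation Mix, 3 μL of Resuspension Buffer, and 2 μL of Adaptor Index were added to the 30 μL A-tailing reaction product; after mixing, the reaction ran at 30 °C for 10 min. 5 μL of Stop Ligation Buffer was then added to end the reaction.
(6) Sample purification
The ligation product was purified with 1 volume of purification beads (Ampure XP Beads 1x) and recovered in 50 μL of Resuspension Buffer after the first purification. The once-purified product was purified a second time and recovered in Resuspension Buffer to give 20 μL of product. Both purifications followed the second-strand product purification.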
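(7) PCR amplification and purification
The purified ligation product was amplified by PCR. The reaction contained 20 μL of purified product, 5 μL of PCR Primer Cocktail, and 25 μL of PCR Master Mix, mixed before the reaction.
The reaction conditions were 98 °C for 30 s, then 12 cycles of 98 °C for 30 s, 60 °C for 30 s, and 70 °C for 30 s, then 72 °C for 5 min, followed by a hold at 10 °C.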
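The thermal program above can be held as plain data, which makes it easy to compute the programmed hold time: 30 s + 12 × (30 + 30 + 30) s + 300 s = 1410 s, or 23.5 min, not counting temperature ramping or the final 10 °C hold. The representation below is an illustrative sketch; the variable and function names are not from the source.

```python
# (temperature in degrees C, hold time in seconds), as stated in the text.
INITIAL = [(98, 30)]
CYCLE = [(98, 30), (60, 30), (70, 30)]
FINAL = [(72, 300)]
N_CYCLES = 12


def hold_seconds(initial, cycle, n_cycles, final):
    """Total programmed hold time in seconds, excluding ramping and the final hold."""
    def seconds(steps):
        return sum(duration for _temperature, duration in steps)

    return seconds(initial) + n_cycles * seconds(cycle) + seconds(final)


total = hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(total, "s =", total / 60, "min")
```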
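The PCR product was purified with 1 volume of purification beads (Ampure XP Beads 1x) and recovered in Resuspension Buffer to give 30 μL of product; the purification followed the second-strand product purification.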
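III. Nucleic acid sequencing and analysis of results
Sequencing was performed on Illumina's Hiseq4000 platform in PE100 mode, with 1 G of data generated for the library.
(1) The sequencing results were compared with the P1 sequence, and the reads containing the gene tag were picked out.
(2) The sequencing results were aligned against the ERCC reference data.
The results show that the nucleic acid sequence of gene tag P1 is clearly detectable in the sequencing results, and the sequencing results for the RNA standard also match its sequence, as expected.
Moreover, the P1 sequence accounts for only 0.0033% of the sample data, so it does not waste data and does not interfere with the use of ERCC internal references in the sample.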
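Step (1) and the reported proportion can be illustrated with a simple read filter. The sketch below is an assumption-laden stand-in for whatever comparison the applicants ran: it counts a read as tag-derived if it shares an exact k-mer with the tag on either strand, and it ignores sequencing errors. The function names and `k=25` are illustrative. For scale, if "1 G" is read as 1 Gb of bases and the percentage is by bases (both assumptions), 0.0033% corresponds to roughly 33,000 bases, or about 330 reads of 100 nt.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tagged_fraction(reads, tag, k=25):
    """Fraction of reads sharing an exact k-mer with the tag on either strand.

    k=25 is an illustrative choice; mismatches are not tolerated in this sketch.
    """
    if not reads:
        return 0.0
    tag_kmers = kmers(tag, k) | kmers(reverse_complement(tag), k)
    hits = sum(1 for read in reads if not kmers(read, k).isdisjoint(tag_kmers))
    return hits / len(reads)
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to the 0.0033% reported for P1.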
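In addition, the base distribution plot output directly by the sequencing platform is shown in Figure 1. Only a single fragment was added in this example, and because of a limitation of Illumina's sequencing principle, too high a proportion of fragments with the same sequence and the same length causes fluctuations in base composition. In practice, the single-fragment gene tag added in this example did not cause severe base fluctuation and therefore does not affect sequencing. The likely reasons are that the added gene tag makes up a small proportion of the nucleic acid in the sequencing sample and that reverse transcription starts at different sites, so that after library construction copies of the same fragment differ in length and are no longer read in register during sequencing; base fluctuation therefore does not occur.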
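A base distribution plot of the kind shown in Figure 1 reports, for each sequencing cycle, the fraction of reads showing each base. The sketch below computes that table from a list of reads; it is a generic illustration with a hypothetical function name, not the sequencing platform's own software.

```python
from collections import Counter


def base_composition(reads):
    """Per-cycle base fractions: one dict {'A','C','G','T' -> fraction} per read position."""
    if not reads:
        return []
    composition = []
    for cycle in range(max(len(read) for read in reads)):
        # Only reads long enough to have a base at this cycle are counted.
        counts = Counter(read[cycle].upper() for read in reads if len(read) > cycle)
        total = sum(counts.values())
        composition.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return composition
```

If many reads were identical and read in register, one base would dominate each cycle in this table; the explanation given above is that differing start sites put copies of the tag out of register, so no single base dominates.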
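Overall, therefore, this example adds a designed gene tag to the sample to be tested, which effectively distinguishes nucleic acid samples without affecting the sequencing results.
The same test was also carried out for each of the seven other gene tags designed in this example, P2 to P8. The results were comparable to those for P1, showing that the gene tags designed in this example can be used effectively to distinguish nucleic acid samples without interfering with normal sequencing.
In addition, besides labeling nucleic acid samples with gene tags of different sequences, the eight gene tags designed in this example can also distinguish samples when the same gene tag is added to different nucleic acid samples at different concentrations; that is, the nucleic acid sample is identified by the concentration of the added gene tag.
The above is a further detailed description of the present application in connection with specific embodiments, and the specific implementation of the present application is not to be regarded as limited to this description. A person of ordinary skill in the art to which the present application belongs may make simple deductions or substitutions without departing from the concept of the present application, and these shall all be regarded as falling within the scope of protection of the present application.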

Claims (10)

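  1. A gene tag for nucleic acid sample identification, characterized in that the gene tag is a nucleic acid longer than 130 bp, the 3' end of the nucleic acid sequence has a polyA tail, and at least one index sequence is inserted at a random position in the nucleic acid sequence.
  2. The gene tag according to claim 1, characterized in that the gene tag is a nucleic acid 130 bp to 160 bp in length.
  3. The gene tag according to claim 1, characterized in that the index sequence is 5 to 10 bp in length.
  4. The gene tag according to claim 1, characterized in that 1 to 15 index sequences are inserted evenly into the nucleic acid sequence.
  5. The gene tag according to claim 1, characterized in that the polyA tail is 15 to 70 bp in length.
  6. The gene tag according to claim 1, characterized in that the nucleic acid is single-stranded DNA, double-stranded DNA, or RNA.
  7. The gene tag according to any one of claims 1-6, characterized in that the gene tag consists of nucleic acids having the sequences shown in Seq ID No. 1 to Seq ID No. 8.
  8. Use of the gene tag according to any one of claims 1-7 in nucleic acid sequencing.
  9. A kit for nucleic acid sample identification, characterized in that the kit contains the gene tag according to any one of claims 1-7.
  10. A nucleic acid sequencing method, characterized by comprising adding the gene tag according to any one of claims 1-7 to an original DNA or RNA sample and then performing library construction and on-machine sequencing.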
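The numeric limits in claims 1 to 5 can be checked mechanically for a candidate tag. The sketch below is only an illustration of those ranges, not a legal test of claim scope: the function name is hypothetical, the polyA length is estimated as the run of A's at the 3' end (which over-counts if the body happens to end in A), and index copies are counted as exact, non-overlapping matches.

```python
def check_claims(tag, index):
    """Check a candidate tag sequence against the numeric ranges of claims 1-5."""
    tag, index = tag.upper(), index.upper()
    polya_len = len(tag) - len(tag.rstrip("A"))   # run of A's at the 3' end
    n_index = tag.count(index)                    # exact, non-overlapping copies
    return {
        "claim 1: longer than 130 bp": len(tag) > 130,
        "claim 1: polyA tail and at least one index": polya_len > 0 and n_index >= 1,
        "claim 2: 130-160 bp": 130 <= len(tag) <= 160,
        "claim 3: index 5-10 bp": 5 <= len(index) <= 10,
        "claim 4: 1-15 index copies": 1 <= n_index <= 15,
        "claim 5: polyA 15-70 bp": 15 <= polya_len <= 70,
    }
```

A 160-base tag with six copies of a 6-base index and a 24-base polyA tail, as in the example, satisfies every entry.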
PCT/CN2016/110457 2016-12-16 2016-12-16 Gene tag for nucleic acid sample identification, kit, and application thereof WO2018107481A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
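PCT/CN2016/110457 WO2018107481A1 (zh) 2016-12-16 2016-12-16 Gene tag for nucleic acid sample identification, kit, and application thereof
CN201680091177.4A CN109996877A (zh) 2016-12-16 2016-12-16 Gene tag for nucleic acid sample identification, kit, and application thereof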

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/110457 WO2018107481A1 (zh) 2016-12-16 2016-12-16 Gene tag for nucleic acid sample identification, kit, and application thereof

Publications (1)

Publication Number Publication Date
WO2018107481A1 (zh) 2018-06-21

Family

ID=62557880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110457 WO2018107481A1 (zh) 2016-12-16 2016-12-16 Gene tag for nucleic acid sample identification, kit, and application thereof

Country Status (2)

Country Link
CN (1) CN109996877A (zh)
WO (1) WO2018107481A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
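CN110656157A (zh) * 2019-10-16 2020-01-07 重庆市人口和计划生育科学技术研究院 Quality control product for tracing high-throughput sequencing samples, and design and use methods thereof
CN111304309A (zh) * 2020-03-06 2020-06-19 上海韦翰斯生物医药科技有限公司 Method for detecting tag sequence contamination on a sequencing platform
CN112251501A (zh) * 2020-10-28 2021-01-22 深圳人体密码基因科技有限公司 Internal reference gene set and screening method therefor, universal primer set, kit, reaction system, and application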

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101720359A (zh) * 2007-06-01 2010-06-02 454生命科学公司 System and method for identifying individual samples from a multiplex mixture

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
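CN102653784B (zh) * 2011-03-03 2015-01-21 深圳华大基因科技服务有限公司 Tags for multiplex nucleic acid sequencing and methods of use thereof
CN105349617A (zh) * 2014-08-19 2016-02-24 复旦大学 Quality control method and device for high-throughput RNA sequencing data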

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101720359A (zh) * 2007-06-01 2010-06-02 454生命科学公司 System and method for identifying individual samples from a multiplex mixture

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
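CN110656157A (zh) * 2019-10-16 2020-01-07 重庆市人口和计划生育科学技术研究院 Quality control product for tracing high-throughput sequencing samples, and design and use methods thereof
CN110656157B (zh) * 2019-10-16 2023-09-08 重庆市人口和计划生育科学技术研究院 Quality control product for tracing high-throughput sequencing samples, and design and use methods thereof
CN111304309A (zh) * 2020-03-06 2020-06-19 上海韦翰斯生物医药科技有限公司 Method for detecting tag sequence contamination on a sequencing platform
CN112251501A (zh) * 2020-10-28 2021-01-22 深圳人体密码基因科技有限公司 Internal reference gene set and screening method therefor, universal primer set, kit, reaction system, and application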

Also Published As

Publication number Publication date
CN109996877A (zh) 2019-07-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16923793

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16923793

Country of ref document: EP

Kind code of ref document: A1