WO2017113655A1 - Primer set, anchor primer, kit, library construction and gene sequencing method - Google Patents

Primer set, anchor primer, kit, library construction and gene sequencing method

Info

Publication number
WO2017113655A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
tested
primer
snp site
region
Prior art date
Application number
PCT/CN2016/086973
Other languages
English (en)
French (fr)
Inventor
盛司潼
钟茂春
Original Assignee
深圳市华因康高通量生物技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市华因康高通量生物技术研究院
Publication of WO2017113655A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present invention relates to the field of molecular biology, and more particularly to a primer set, an anchor primer, a library construction method, and a gene sequencing method.
  • Second-generation high-throughput sequencing technologies include ligation sequencing and sequencing by synthesis.
  • the ligation sequencing method relies on the fidelity of a ligase in ligation reactions between nucleic acid fragments. Using the nucleic acid fragment to be sequenced as a template, an anchor primer (also called a sequencing primer, which is complementary to a strand of the nucleic acid fragment to be sequenced) is ligated to an oligonucleotide probe that carries a fluorescent label at a specific position, and the fluorescent label on the ligation product is detected to obtain the sequence information corresponding to the labeled position of the probe.
  • the sequencing-by-synthesis method relies on the fidelity of a polymerase during extension of a nucleic acid strand. Using the nucleic acid fragment to be sequenced as a template, the anchor primer binds complementarily to the fragment, and the signal generated during extension is detected to determine the sequence information at the corresponding position on the nucleic acid fragment to be sequenced.
  • the second generation of high-throughput sequencing technology is currently used for SNP typing because of its high throughput and low cost.
  • a prior-art SNP typing method using second-generation high-throughput sequencing technology comprises the following steps: A, obtaining the nucleic acid sequence containing the SNP site to be tested by primer amplification; B, constructing a sequencing library from the product of step A; C, anchoring the anchor primer to the sequencing library molecules and detecting the sequence information of the SNP site to be tested by high-throughput sequencing.
  • the anchor primer is generally designed in the vicinity of the SNP locus to be tested.
  • the object of the present invention is to provide a new and widely applicable primer set, anchor primer, kit, library construction method and gene sequencing method, aiming to solve the technical problem of non-specific anchoring of anchor primers in existing high-throughput gene sequencing technologies.
  • the present invention provides a primer set comprising a first primer pair. The upstream primer of the first primer pair consists of a first universal region and a first complementary region; the first complementary region is joined to the 3' end of the first universal region and is completely complementary to a first sequence. The first sequence is a sequence within the sequence containing the SNP site to be tested, located on the 3' side of the SNP site to be tested, and the 5' end of the first sequence is 1 to 7 bp from the SNP site to be tested. The first universal region is not complementary to a second sequence; the second sequence is the sequence joined to the 3' end of the first sequence on the sequence containing the SNP site to be tested.
  • the first complementary region sequence is between 6 and 12 bp in length.
  • the first universal region sequence has a length between 6 and 16 bp.
  • the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested.
  • the primer set further comprises a second primer pair, wherein the first primer pair and the second primer pair are respectively an inner primer pair and an outer primer pair for amplifying the sequence of the SNP site to be tested.
  • the downstream primer in the first primer pair is composed of a second universal region and a second complementary region;
  • the second complementary region is connected to the 3' end of the second universal region; the second complementary region is a sequence in the sequence of the SNP site to be tested, and is at the 5' end of the SNP site to be tested;
  • the second universal region is different from the third sequence; the third sequence is a sequence connected to the 5' end of the second complementary region on the sequence of the SNP site to be tested.
  • the first universal region contains U (uracil).
  • the present invention also provides an anchoring primer which is completely complementary to a fourth sequence;
  • the fourth sequence is a single-stranded nucleic acid molecule and is a sequence in the amplification product of the first primer pair of any of the above primer sets; the fourth sequence comprises the first sequence and a first normalization region, the first normalization region being joined to the 3' end of the first sequence, and the 5' end of the fourth sequence is 1 to 7 bp from the SNP site to be tested; or the fourth sequence comprises the first complementary region and a second normalization region, the second normalization region being joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 7 bp from the SNP site to be tested.
  • the fourth sequence comprises the first sequence and a first normalization region, the first normalization region being joined to the 3' end of the first sequence, and the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested; or the fourth sequence comprises the first complementary region and a second normalization region, the second normalization region being joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 3 bp from the SNP site to be tested.
  • the first complementary region sequence is between 6 and 12 bp in length.
  • the length of the first normalization region is between 8 and 14 bp.
  • the present invention also provides a kit comprising any of the above primer sets.
  • the kit further comprises any of the anchor primers described above.
  • the present invention also provides another kit, comprising any of the above anchor primers.
  • the kit further comprises any one of the above primer sets.
  • the present invention also provides a library construction method comprising the following steps:
  • step A: the sample to be tested is subjected to PCR amplification to obtain an amplification product containing the SNP site to be tested;
  • step B: the amplification product containing the SNP site to be tested is ligated to a linker to form a library molecule to be sequenced.
  • the step B is:
  • the amplification product containing the SNP site to be tested is directly ligated to the linker to form a library molecule to be sequenced, and is immobilized on the solid phase carrier by microspheres.
  • step B comprises the following steps:
  • step B1: the amplification product containing the SNP site to be tested is directly ligated to a linker immobilized on the microsphere to form a library molecule to be sequenced immobilized on the microsphere;
  • step B2: the microspheres obtained in step B1 are addressably immobilized on a solid phase carrier.
  • the ligation reaction in step B1 is carried out in a cleavage-ligation reaction system comprising: a ligase, a cleavage agent, a first linker immobilized on the microsphere, and a ligation buffer;
  • the first universal region contains U (uracil);
  • the cleavage agent specifically cleaves at U, so that the amplification product containing the SNP site to be tested forms a first cohesive end;
  • the first linker is a nucleic acid molecule comprising a second cohesive end that is fully complementary to the first cohesive end.
  • the ligation buffer in step B1 contains PEG.
  • the present invention also provides a gene sequencing method comprising the following steps:
  • step A: the sample to be tested is subjected to PCR amplification to obtain an amplification product containing the SNP site to be tested;
  • step B: the amplification product containing the SNP site to be tested is ligated to the linker to form a library molecule to be sequenced;
  • step C: the library molecules to be sequenced are sequenced to obtain sequence information of the SNP site to be tested.
  • the sequencing method in step C is a ligation sequencing method, and a step D is further included between steps B and C: using any of the above anchor primers, a ligation sequencing reaction is performed on the library molecules to be sequenced with non-fluorescently labeled oligonucleotide probes in place of fluorescently labeled oligonucleotide probes.
  • the step D comprises the following steps:
  • the present invention specifically designs the amplification primers for the SNP site to be tested, so that the product amplified with these primers contains the first universal region near the SNP site to be tested, which reduces the sequence complexity of the regions near different SNP sites to be tested.
  • the primer set, anchor primer, kit, library construction method and gene sequencing method of the present invention are applicable to the detection of many different SNP loci, especially the simultaneous detection of multiple SNP loci; they can improve the detection efficiency of SNP loci and avoid inaccurate sequencing results or sequencing failure caused by the particularity of the sequence near the SNP site to be tested.
  • FIG. 1 is a view showing the relationship between the upstream primer in the first primer pair and the sequence of the SNP site to be tested in the first exemplary embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing the relationship between a downstream primer in a first primer pair and a sequence of a SNP site to be tested in an embodiment of the present invention.
  • Fig. 3 is a view showing the relationship between the anchor primer and the amplification product of the first primer pair in the second exemplary embodiment of the present invention.
  • Figure 4 is a schematic view showing the structure of a first linker in the first embodiment of the present invention.
  • Figure 5 is a diagram showing the correspondence relationship between the tag sequence of the first linker and the sample source and SNP site in the first embodiment of the present invention.
  • Figure 6 is a flow chart showing the sequencing of the experimental group in the first embodiment of the present invention.
  • Figure 7 is a flow chart showing the sequencing of the first comparative experiment in the first embodiment of the present invention.
  • Figure 8 is a comparison diagram of preliminary experimental results of Sample 2 in the first embodiment of the present invention.
  • Figure 9 is a result of sequencing of rs671 and rs1801253 sites in the third comparative experiment of the present invention.
  • Figure 10 is a result of sequencing of rs671 and rs1801253 sites in the first embodiment of the present invention.
  • Figure 11 is a result of sequencing of the rs1799853 site in the third comparative experiment of the present invention.
  • Figure 12 is a result of sequencing of the rs1799853 site in the first embodiment of the present invention.
  • the present invention proposes a first exemplary embodiment, a primer set comprising a first primer pair. As shown in FIG. 1, the upstream primer in the first primer pair is composed of a first universal region and a first complementary region; the first complementary region is joined to the 3' end of the first universal region and is completely complementary to the first sequence; the first sequence is a sequence within the sequence containing the SNP site to be tested, located on the 3' side of the SNP site to be tested, and the 5' end of the first sequence is 1 to 7 bp from the SNP site to be tested; the first universal region is not complementary to the second sequence; the second sequence is the sequence joined to the 3' end of the first sequence on the sequence containing the SNP site to be tested.
  • the primer set can be used to amplify the sequence containing the SNP site to be tested, and the resulting amplification product can be used to construct a sequencing library containing the SNP site to be tested, so that the SNP site can be detected by high-throughput sequencing.
  • through the upstream primer of the first primer pair, the present invention adds a sequence containing the first universal region near the SNP site to be tested in the amplification product obtained with the first primer pair.
  • the sequence of the first universal region is neither a sequence near the SNP site to be tested in the template molecule of the sample to be tested nor a complementary sequence near the SNP site to be tested in the template molecule of the sample to be tested.
  • the first universal region adds a normalization sequence to one end of the amplification product for the SNP site to be tested.
  • the first universal region is not complementary to the second sequence, has no repeating sequence, and does not itself form a hairpin.
  • this scheme is especially suitable for samples in which the template molecule has a complex sequence structure near the SNP site to be tested. It effectively simplifies the sequence structure near the SNP site to be tested, reduces the design difficulty of the anchor primer, and ensures that the anchor primer is anchored accurately at the target position, avoiding inaccurate sequencing results or sequencing failure caused by the particularity of the sequence near the SNP site to be tested. It is suitable for the detection of various types of SNP sites to be tested.
  • the first universal region sequence may be the same for different SNP sites to be tested.
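The constraints stated above for the first universal region (not complementary to the second sequence, no repeating sequence, no self-hairpin) can be screened with a short script. The following is only a minimal sketch: the function names, the k-mer size used to detect repeats, and the stem and loop lengths used to flag a possible hairpin are illustrative assumptions that do not come from the patent, and the complementarity test only checks full antiparallel pairing with the start of the second sequence.

```python
# Hypothetical screening helper; thresholds are assumptions, not patent values.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and read U as T for comparison purposes."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(normalize(seq)))


def has_repeat(seq: str, unit_len: int = 3) -> bool:
    """True if any k-mer of length unit_len occurs more than once."""
    seq = normalize(seq)
    seen = set()
    for i in range(len(seq) - unit_len + 1):
        kmer = seq[i:i + unit_len]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def may_form_hairpin(seq: str, stem_len: int = 4, min_loop: int = 3) -> bool:
    """True if a stem_len-mer has its reverse complement at least min_loop bases downstream."""
    seq = normalize(seq)
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        stem = seq[i:i + stem_len]
        if seq.find(reverse_complement(stem), i + stem_len + min_loop) != -1:
            return True
    return False


def is_valid_universal_region(universal: str, second_sequence: str) -> bool:
    """Apply the three stated constraints to a candidate first universal region."""
    universal = normalize(universal)
    # The universal region sits 5' of the complementary region in the primer, so it
    # would pair (antiparallel) with the start of the second sequence on the template.
    facing = normalize(second_sequence)[:len(universal)]
    complementary = reverse_complement(universal) == facing
    return not (complementary or has_repeat(universal) or may_form_hairpin(universal))
```

For example, `is_valid_universal_region("GATCAGTCCA", "TTTTTTTTTTTT")` passes all three checks, while the same candidate is rejected against a second sequence that begins with its reverse complement.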
  • the 5' end of the first sequence is 1 to 7 bp from the SNP site to be tested, so that when high-throughput gene sequencing is performed on the library molecules obtained with the first primer pair, the distance between the end of the anchor primer and the SNP site to be tested is between 1 and 7 bp.
  • the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested.
  • this scheme effectively controls the spacing between the SNP site to be tested and the first universal sequence, so that the anchor primer contains a first universal sequence of sufficient length or a complementary sequence of the first universal sequence.
  • the present invention provides an embodiment based on the first exemplary embodiment, wherein the sequence of the first complementary region is between 6 and 12 bp in length.
  • this scheme ensures specific binding between the upstream primer of the first primer pair and the template sequence containing the SNP site to be tested, avoids non-specific amplification, and ensures sufficient discrimination between the sequences near different SNP sites. More preferably, the first complementary region sequence is between 7 and 10 bp in length.
  • the present invention proposes another embodiment based on the above embodiments, wherein the first universal region sequence length is between 6 and 16 bp.
  • this length range largely avoids the possibility that a first universal sequence that is too long or too short forms a secondary structure, either internally or with other sequences, and thus reduces the design difficulty of the anchor primer.
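The upstream primer described in the preceding embodiments can be assembled mechanically from the SNP-containing strand. The sketch below is an illustration under stated assumptions, not the patent's procedure: the function name and arguments are hypothetical, the strand is given 5' to 3' as a plain string, and a distance of 1 bp is taken to mean that the first sequence starts at the base immediately 3' of the SNP site.

```python
# Hypothetical helper; the distance convention is an assumption (see lead-in).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_upstream_primer(snp_strand: str, snp_index: int, distance: int,
                           comp_len: int, universal: str) -> str:
    """Return the primer 5'-[first universal region][first complementary region]-3'."""
    if not 1 <= distance <= 7:
        raise ValueError("the first sequence must start 1 to 7 bp from the SNP site")
    if not 6 <= comp_len <= 12:
        raise ValueError("the first complementary region must be 6 to 12 bp long")
    if not 6 <= len(universal) <= 16:
        raise ValueError("the first universal region must be 6 to 16 bp long")

    # The first sequence lies on the 3' side of the SNP site on the SNP-containing strand.
    start = snp_index + distance
    first_sequence = snp_strand[start:start + comp_len]
    if len(first_sequence) < comp_len:
        raise ValueError("strand too short for the requested first sequence")

    # The complementary region pairs with the first sequence, so it is its reverse
    # complement; it is joined to the 3' end of the universal region.
    return universal + reverse_complement(first_sequence)


# Toy data only: SNP at index 5, first sequence starts 2 bp downstream.
primer = design_upstream_primer("AAAAACGGTACCGATTGCAAGGTT", 5, 2, 8, "GATCAGTCCA")
print(primer)  # GATCAGTCCAATCGGTAC
```

Because the universal region is passed in unchanged, the same universal sequence can be reused for every SNP site, which is the point the text makes about normalizing the region next to the site.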
  • the primer set further includes a second primer pair, wherein the first primer pair and the second primer pair are respectively an inner primer pair and an outer primer pair for amplifying the sequence containing the SNP site to be tested.
  • nested PCR amplification of the sequence containing the SNP site to be tested can be performed with the inner primer pair and the outer primer pair, thereby improving the purity of the target molecule in the amplification product of the first primer pair.
  • the present invention proposes another specific embodiment. As shown in FIG. 2, the downstream primer in the first primer pair is composed of a second universal region and a second complementary region; the second complementary region is joined to the 3′ end of the second universal region; the second complementary region is a sequence within the sequence containing the SNP site to be tested, located on the 5' side of the SNP site to be tested; the second universal region is different from the third sequence; the third sequence is the sequence joined to the 5' end of the second complementary region on the sequence containing the SNP site to be tested.
  • the sequence of the second universal region is neither a sequence near the SNP site to be tested in the template molecule of the sample to be tested nor a complementary sequence near the SNP site to be tested in the template molecule of the sample to be tested. There is no repeating sequence in the second universal region, and it does not itself form a hairpin.
  • the amplification product obtained with the first primer pair has a universal sequence on each side of the SNP site to be tested, and the first universal region and the second universal region sequences may be the same for different SNP sites to be tested, thereby reducing the sequence complexity of the regions near different SNP sites to be tested. At the same time, the amplification efficiency for each SNP site to be tested is more consistent.
  • the present invention provides an embodiment in which the first universal region contains a cleavable site.
  • the cleavable site can be specifically cleaved by the cleavage agent such that the amplification product obtained based on the first primer pair forms a first cohesive end.
  • the cleavable site is a ribonucleotide, an RNA sequence or a restriction enzyme cleavage site.
  • the cleavage agent is RNase H or a restriction enzyme.
  • the cleavage agent is preferably a USER enzyme.
  • the cleavable site is from 1 to 6 bases from the 5' end of the strand.
  • within this range, complementary pairing between the first cohesive end and the second cohesive end of the linker is more efficient; more preferably the distance is 4 or 5 bases, at which the efficiency of the ligation reaction is highest.
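How cleavage at a U in the first universal region yields a cohesive end can be illustrated with a simplified model. The sketch assumes, beyond what the text states, that cleavage removes the U together with the bases 5' of it from the primer-derived strand, leaving the opposite strand single-stranded over that stretch, and that "n bases from the 5' end" means n bases precede the U. All function names are hypothetical.

```python
# Simplified, hypothetical model of cohesive-end formation after cleavage at U.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def cohesive_end_after_u_cleavage(primer_strand: str):
    """Return (remaining primer-derived strand, exposed overhang on the opposite strand)."""
    strand = primer_strand.upper()
    if strand.count("U") != 1:
        raise ValueError("this sketch expects exactly one U in the strand")
    u_index = strand.index("U")
    if not 1 <= u_index <= 6:
        raise ValueError("U is expected 1 to 6 bases from the 5' end")
    removed = strand[:u_index + 1]      # U plus the bases 5' of it are lost
    remaining = strand[u_index + 1:]
    overhang = reverse_complement(removed)  # first cohesive end, written 5'->3'
    return remaining, overhang


def linker_end_matches(first_cohesive_end: str, second_cohesive_end: str) -> bool:
    """True if the linker's cohesive end is fully complementary to the product's."""
    return reverse_complement(first_cohesive_end) == second_cohesive_end.upper().replace("U", "T")


# Toy data only: U placed 4 bases from the 5' end of the primer-derived strand.
remaining, overhang = cohesive_end_after_u_cleavage("GATCUGTCCAATCGGTAC")
print(remaining, overhang)                      # GTCCAATCGGTAC AGATC
print(linker_end_matches(overhang, "GATCT"))    # True
```

In this model the position of the U fixes the overhang length, which is why the text can tie the preferred distance (4 or 5 bases) to ligation efficiency.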
  • the present invention proposes a second exemplary embodiment, as shown in FIG. 3, an anchoring primer, which is completely complementary to the fourth sequence;
  • the fourth sequence is a single-stranded nucleic acid molecule and is a sequence in the amplification product of the first primer pair of any of the above primer sets. As shown in FIG. 3a, the fourth sequence includes the first sequence and a first normalization region, the first normalization region is joined to the 3' end of the first sequence, and the 5' end of the fourth sequence is 1 to 7 bp from the SNP site to be tested; or, as shown in FIG. 3, the fourth sequence includes the first complementary region and a second normalization region, the second normalization region is joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 7 bp from the SNP site to be tested.
  • the first normalization region is the sequence in the first universal region that is joined to the 3′ end of the first sequence; the second normalization region is the sequence in the first universal region that is joined to the 5' end of the first complementary region.
  • the anchor primers in this scheme are designed based on a sequencing library constructed from the amplification products of the first primer pair of the primer set in any of the above schemes. For different SNP sites to be tested, the anchor primers share a common first normalization sequence or second normalization sequence, so they can be anchored accurately at the target position, avoiding inaccurate sequencing results or sequencing failure caused by the particularity of the sequence near the SNP site to be tested. In addition, the binding of the anchor primers to their target positions is more consistent and normalized, which reduces the differences in binding between different anchor primers and their target positions, thereby reducing the differences in the detection signals of the SNP sites to be tested and improving the accuracy of sequencing.
  • the fourth sequence includes the first sequence and a first normalization region, the first normalization region is joined to the 3' end of the first sequence, and the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested; or the fourth sequence includes the first complementary region and a second normalization region, the second normalization region is joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 3 bp from the SNP site to be tested.
  • the distance between the extending end of the anchor primer and the SNP site to be tested is 1 to 3 bp. When sequencing is performed with the ligation sequencing technique, this ensures that the SNP site to be tested can be detected with a single ligation between the anchor primer and a probe fluorescently labeled at a specific position, which improves the efficiency of high-throughput gene sequencing; moreover, 1 to 3 bp lies in the high-fidelity part of the fidelity range (1 to 7 bp) of T4 ligase.
  • when sequencing is performed with a sequencing-by-synthesis technique, the fourth sequence includes the first complementary region and a second normalization region, the second normalization region is joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 bp from the SNP site to be tested. In this case only 1 bp needs to be extended, that is, a single reaction cycle detects the SNP site to be tested, so detection efficiency is high.
  • the first complementary region sequence is between 6 and 12 bp in length. This scheme further ensures the specificity of binding of the anchor primer to the target position near the SNP site to be tested. More preferably, the first complementary region sequence is between 7 and 10 bp in length.
  • the first normalization region has a length between 8 and 14 bp. This scheme ensures consistent binding efficiency between the anchor primers designed for different SNP sites to be tested and their target positions.
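The anchor primer described above can be derived by taking the reverse complement of the fourth sequence. The sketch below covers only the variant in which the fourth sequence is a normalization region followed by the first complementary region. The function names are hypothetical, applying the 8 to 14 bp range to this normalization region is an assumption (the text states it for the first normalization region), and the two-level classification of the distance to the SNP site simply restates the 1 to 3 bp and 1 to 7 bp ranges given in the text.

```python
# Hypothetical helpers illustrating the anchor-primer relationship described above.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_anchor_primer(normalization_region: str, complementary_region: str) -> str:
    """Anchor primer fully complementary to [normalization region][complementary region]."""
    if not 8 <= len(normalization_region) <= 14:
        raise ValueError("the normalization region is expected to be 8 to 14 bp long")
    if not 6 <= len(complementary_region) <= 12:
        raise ValueError("the first complementary region is expected to be 6 to 12 bp long")
    fourth_sequence = normalization_region + complementary_region
    return reverse_complement(fourth_sequence)


def classify_distance_to_snp(distance_bp: int) -> str:
    """Classify the anchor-end-to-SNP distance using the ranges stated in the text."""
    if 1 <= distance_bp <= 3:
        return "preferred"   # high-fidelity part of the ligase fidelity range
    if 4 <= distance_bp <= 7:
        return "allowed"     # still within the stated 1 to 7 bp range
    raise ValueError("distance must be between 1 and 7 bp")


# Toy data only.
anchor = design_anchor_primer("TCAGTCCA", "ATCGGTAC")
print(anchor)                        # GTACCGATTGGACTGA
print(classify_distance_to_snp(2))   # preferred
```

Since the normalization region is shared across SNP sites, anchor primers built this way differ only in the short site-specific part, which is what keeps their binding behavior comparable.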
  • the present invention proposes a third exemplary embodiment, a kit comprising any of the above primer sets.
  • the kit in the present scheme may be an amplification kit, a library construction kit, or a sequencing kit.
  • when the kit is an amplification kit, it may further include other reagents required for PCR amplification; when the kit is a library construction kit, it may include other reagents required for library construction and may also include the reagents required for PCR amplification; when the kit is a sequencing kit, it may include other reagents required for sequencing and may also include the reagents required for PCR amplification and for library construction.
  • the kit further comprises any of the anchor primers described above.
  • the present invention proposes a fourth exemplary embodiment, a kit comprising any of the anchor primers described above.
  • the kit in the present scheme is generally a sequencing kit, which may include other reagents required for the sequencing process, and may also include reagents required for PCR amplification, and may also include reagents required for library construction.
  • the kit further comprises any of the primer sets described above.
  • the present invention proposes a fifth exemplary embodiment, a library construction method, comprising the following steps:
  • step A: the sample to be tested is subjected to PCR amplification to obtain an amplification product containing the SNP site to be tested;
  • step B: the amplification product containing the SNP site to be tested is ligated to a linker to form a library molecule to be sequenced.
  • the library molecules to be sequenced in this scheme are obtained from the amplification products produced by PCR amplification of the sample to be tested with any one of the above primer sets; a first universal region is therefore added near the SNP site to be tested.
  • the first universal region is not complementary to the second sequence, has no repeating sequence, and does not itself form a hairpin.
  • the sequence structure near the SNP site to be tested is effectively simplified and the design difficulty of the anchor primer is reduced. This is especially suitable for samples in which the template molecule has a complex sequence structure near the SNP site to be tested, avoids inaccurate sequencing results or sequencing failure caused by the particularity of the sequence near the SNP site to be tested, and is applicable to the detection of various types of SNP sites to be tested.
  • the first universal region sequence may be the same for different SNP sites to be tested.
  • only one end of the amplification product containing the SNP site to be tested is ligated to a linker to form a library molecule to be sequenced.
  • the terminus can be either end of the amplification product containing the SNP site to be tested.
  • both ends of the amplification product containing the SNP site to be tested are ligated to a linker to form a library molecule to be sequenced.
  • the sequences of the linkers ligated at the two ends may be the same or different.
  • the sequences of the linkers ligated at the two ends are different, so that the library molecules to be sequenced can be amplified based on the two different linkers to verify whether the structure of the library molecules is correct, or single-molecule amplification of the library molecules can be performed based on the two different linkers, followed by second-generation high-throughput gene sequencing of the single-molecule amplification products to obtain sequence information of the SNP site to be tested.
  • the present invention provides another embodiment, and the step B is:
  • the amplification product containing the SNP site to be tested is directly ligated to the linker to form a library molecule to be sequenced, and is immobilized on the solid phase carrier by microspheres.
  • the amplification product containing the SNP site to be tested is directly connected to the linker, which can effectively reduce the experimental steps and improve the experimental efficiency.
  • step B may have various embodiments, which will be described below through various embodiments.
  • the step B comprises the following steps:
  • step B1: the amplification product containing the SNP site to be tested is directly ligated to a linker immobilized on the microsphere to form a library molecule to be sequenced immobilized on the microsphere;
  • step B2: the microspheres obtained in step B1 are addressably immobilized on a solid phase carrier.
  • the step B comprises the following steps:
  • step B1: the amplification product containing the SNP site to be tested is directly ligated to the linker, and the ligation product is then immobilized on the microsphere to obtain the library molecule to be sequenced immobilized on the microsphere;
  • step B2: the microspheres obtained in step B1 are addressably immobilized on a solid phase carrier.
  • the linker contains a modification label for immobilizing it on the microsphere.
  • the modification label may be biotin, avidin, streptavidin, antigen, antibody, receptor, ligand, polyhistidine, nano gold, iodoacetyl, thiol, amino, aldehyde, carboxyl, isothiocyanato, silane or acrylamide, all of which are capable of specifically binding to a corresponding group or molecule.
  • addressable immobilization means that the position information can be determined. That is, the library molecules to be sequenced immobilized at each specific position on the solid phase carrier can be clearly distinguished from the library molecules to be sequenced immobilized at other specific positions.
  • the linker may take a variety of forms including, but not limited to, a blunt-end linker, an overhanging-end linker, a bifurcated linker, or a linker containing a stem-loop structure.
  • the blunt end linker refers to a double stranded nucleic acid linker that is fully complementary paired between the double strands.
  • the 5' ends at both ends of the blunt-end linker do not contain a phosphate group, which avoids linker self-ligation during the ligation process, reduces interference with subsequent sequencing experiments, and improves the accuracy of gene detection.
  • the overhanging-end linker refers to a double-stranded nucleic acid linker in which at least one end of the double-stranded nucleic acid molecule has an overhanging nucleotide sequence and the remaining nucleotides are completely complementary.
  • the overhanging-end linker can have a single overhanging end or two overhanging ends, which can be on one nucleotide strand or on different nucleotide strands.
  • the overhanging-end linker avoids linker self-ligation during the ligation process, reduces interference with subsequent sequencing experiments, and improves the accuracy of gene detection.
  • the linker is preferably a single overhanging-end linker in which the overhanging end is the 3' end of the strand on which it is located and the overhanging base is T; the linker can be ligated directly to A-tailed PCR amplification products produced by Taq enzyme, which improves ligation efficiency.
  • The forked linker comprises a complementary region and a forked region; the two strands of the complementary region are base-paired, and the number of paired nucleotides is not limited.
  • The end of the complementary region can be a blunt end or an overhang.
  • The forked linker prevents linker self-ligation during ligation, reduces interference with subsequent sequencing, and improves the accuracy of SNP genotyping.
  • In a specific implementation, the linker is preferably a T-end forked linker in which the 3' end of the complementary region is an overhang whose last base is T; such a linker can be ligated directly to the A-tailed PCR products produced by Taq polymerase, which improves ligation efficiency.
  • The linker with a stem-loop structure has several embodiments.
  • In one embodiment, the linker is a single-stranded nucleic acid molecule comprising, in order, a first complementary pairing region, a stem-loop region, and a second complementary pairing region; the first complementary pairing region is fully complementary to the second.
  • In another embodiment, the stem-loop linker also carries an overhang, which can be located at the 3' end of the single-stranded nucleic acid molecule. The overhang prevents linker self-ligation, reduces interference with subsequent sequencing, and improves the accuracy of SNP genotyping.
  • The overhang is preferably T; such a linker can be ligated directly to the A-tailed PCR products produced by Taq polymerase, which improves ligation efficiency.
  • Of the two embodiments above, the one in which the linker is immobilized on the microspheres in advance eliminates the step of attaching the ligation product to the microspheres, so the experiment is more efficient.
  • Because the density of the linker-bearing microspheres may differ from that of the other components of the step B reaction system, shaking the whole reaction system periodically during ligation effectively improves the efficiency with which the linkers immobilized on the microspheres are ligated to the amplification product obtained in step A, and avoids the low ligation efficiency caused by stratification of the components of the reaction system.
  • Because the amplification product containing the SNP site to be tested obtained in step A is ligated directly to the linker in step B, that is, without purification, the reaction volume is relatively large, the DNA concentration is relatively low, and the ligation efficiency is limited.
  • In a preferred embodiment, PEG is therefore added to the reaction system of step B.
  • PEG raises the effective concentration of the reacting molecules, increasing the probability of contact between the linker and the corresponding amplification product, and it also raises the density of the reaction system, preventing the linker-bearing microspheres from settling. In both ways it effectively increases the efficiency with which linkers immobilized on the microspheres are ligated to the corresponding amplification product.
  • The specific PEG concentration can be calculated from the density of the microspheres in step B and the density of the original reaction system, and is suitably chosen so that the density of the reaction system after PEG addition is substantially the same as that of the microspheres.
  • The present invention further provides an embodiment in which the ligation reaction in step B1 is carried out in a cleavage-ligation reaction system comprising a ligase, a cleavage agent, a first linker immobilized on microspheres, and a ligation buffer. The first universal region contains a cleavable site; the cleavable site is specifically cleaved by the cleavage agent so that the amplification product obtained with the first primer pair forms a first sticky end; the first linker is a nucleic acid molecule containing a second sticky end that is fully complementary to the first sticky end.
  • In this scheme, once the cleavage agent has cut the amplification product at the cleavable site in the first universal region to expose the first sticky end, the product can be ligated directly to the first linker carrying the second sticky end, which improves ligation efficiency.
  • the cleavable site is a ribonucleotide, an RNA sequence or a restriction enzyme cleavage site.
  • the cleavage agent is RNase H, RNase H or a restriction enzyme.
  • When the cleavable site is U, the cleavage agent is preferably USER enzyme.
  • The present invention provides a sixth exemplary embodiment, a gene sequencing method comprising the following steps:
  • A. using any of the primer sets described above, subjecting the sample to be tested to PCR amplification to obtain an amplification product containing the SNP site to be tested;
  • B. ligating the amplification product containing the SNP site to be tested to a linker to form library molecules to be sequenced;
  • C. using any of the anchor primers described above, sequencing the library molecules to be sequenced to obtain the sequence information of the SNP site to be tested.
  • Steps A and B are the library construction steps and can be carried out with any of the library construction methods described above.
  • The following mainly elaborates on step C.
  • The sequencing method in step C may be ligation sequencing or sequencing by synthesis.
  • During sequencing, the anchor primer anchors accurately at the target position, which avoids the inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site to be tested. It also makes the binding of each anchor primer to its target position more uniform (normalized), reducing differences in binding between anchor primers and hence differences in the detection signals of the SNP sites, which improves sequencing accuracy.
  • In one embodiment, the sequencing method in step C is ligation sequencing, and a step D is included between steps B and C: using the anchor primer of any one of claims 7 to 11, one ligation sequencing reaction is performed on the library molecules to be sequenced with a non-fluorescently labeled oligonucleotide probe in place of the fluorescently labeled oligonucleotide probe.
  • The non-fluorescently labeled oligonucleotide probe has exactly the same sequence as the fluorescently labeled oligonucleotide probe used in ligation sequencing and differs only in lacking the fluorescent label; that is, its sequence is (N-N-N…-N)n, where N is A, G, C, or T and n is a positive integer.
  • Each ligation sequencing reaction comprises the following steps: anchoring of the anchor primer; rinsing (to remove excess unanchored primer); probe ligation; rinsing (to remove excess probe, ligase, and so on); imaging (to obtain the sequence information at the position corresponding to the fluorescent label on the probe); and denaturing elution of the ligation product (so that the anchor primer can anchor in the next ligation sequencing reaction).
  • In the ligation sequencing reaction of step D, because a non-fluorescently labeled oligonucleotide probe is used, the imaging step can be omitted to shorten the procedure and improve efficiency. If imaging is performed anyway, it confirms that the probe used in that round indeed carries no fluorescent label.
  • In theory, the anchor primer and the non-fluorescently labeled oligonucleotide probe of step D are eluted and removed, so they cannot act as a blocker. Nevertheless, the inventors found in comparative experiments that adding, before ligation sequencing, one ligation sequencing reaction in which a non-fluorescently labeled oligonucleotide probe replaces the fluorescently labeled probe effectively reduces the erroneous signals that appear in subsequent ligation sequencing. This reduces interfering signals in the analysis from raw sequencing results to final sequence information and improves the accuracy of SNP site detection.
  • The reason may be that sites other than the target binding sites of the anchor primer and/or of the probe are blocked by step D, so that in subsequent ligation sequencing more of the anchor primers and/or probes bind accurately at their target binding sites. This reduces erroneous signals, raises the proportion of correct signals during gene detection, and improves the accuracy of SNP site detection.
  • Preferably, n is between 6 and 10, more preferably between 7 and 9.
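To give a feel for what a fully degenerate probe implies, the sketch below counts and enumerates the sequences in such a pool. It assumes that n is the number of degenerate positions (the notation above is ambiguous on this point); the function names are hypothetical and are not taken from the patent.

```python
from itertools import product


def probe_pool_size(n: int) -> int:
    """Number of distinct sequences in a fully degenerate probe of n positions."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 4 ** n


def enumerate_probes(n: int):
    """Lazily yield every sequence of length n over A, C, G, T."""
    return ("".join(bases) for bases in product("ACGT", repeat=n))


if __name__ == "__main__":
    # Pool sizes across the preferred range of n (6 to 10).
    for n in range(6, 11):
        print(n, probe_pool_size(n))
```

Under this assumption the pool grows by a factor of four per added position, so the preferred range of 7 to 9 spans 16,384 to 262,144 distinct sequences.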
  • More preferably, the non-fluorescently labeled oligonucleotide probe is identical in sequence to the fluorescently labeled oligonucleotide probe, the only difference being the presence or absence of the fluorescent label.
  • This scheme effectively reduces the difficulty of designing the non-fluorescently labeled oligonucleotide probe.
  • In one embodiment, step D comprises the following steps: D1, anchoring the anchor primer on the library molecules to be sequenced; D2, adding the non-fluorescently labeled oligonucleotide probe, ligase, and the corresponding buffer, and carrying out the ligation reaction; D3, removing the ligation product by denaturation.
  • The ligation product in step D3 is the ligation product of the anchor primer and the non-fluorescently labeled oligonucleotide probe; it is a single-stranded nucleic acid molecule that pairs complementarily with the library molecule to be sequenced.
  • Denaturation can be achieved physically, for example by raising the temperature of the double-stranded nucleic acid (the library molecule to be sequenced paired with the ligation product), or chemically, for example by changing the pH of its environment. The chemical method only requires adding an acidic or alkaline reagent and needs no additional heating component, so it is simpler to automate.
  • Preferably, step D3 is rinsing with 0.05 M to 0.15 M NaOH.
  • In the first specific example, blood genomic DNA from 10 normal individuals was used as template to simultaneously detect the CYP2C9 gene *2 (rs1799853) site, the ALDH2 gene rs671 site, and the ADRB1 gene rs1801253 site.
  • For each of these sites, one inner primer pair and one outer primer pair were designed. The rs1799853 inner primer pair is SEQ ID NO: 1 and SEQ ID NO: 2; the rs1799853 outer primer pair is SEQ ID NO: 3 and SEQ ID NO: 4; the rs671 inner primer pair is SEQ ID NO: 5 and SEQ ID NO: 6; the rs671 outer primer pair is SEQ ID NO: 7 and SEQ ID NO: 8; the rs1801253 inner primer pair is SEQ ID NO: 9 and SEQ ID NO: 10; the rs1801253 outer primer pair is SEQ ID NO: 11 and SEQ ID NO: 12.
  • Using the blood genomic DNA as template, the sequence containing each SNP site was amplified with the corresponding outer primer pair. The reaction system was: F primer (10 µM), 0.4 µL; R primer (10 µM), 0.4 µL; dNTPs (2.5 mM each), 2 µL; blood genomic DNA, 50 ng; Ex Taq (5 U/µL), 0.1 µL; 10× Ex Taq Buffer, 2 µL; ddH2O to 20 µL.
  • The PCR conditions were: 94 °C for 5 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; then 72 °C for 3 min.
  • the resulting product is the first PCR amplification product.
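The arithmetic behind the 20 µL first-round reaction can be checked with a short sketch. The stock concentrations and volumes are the ones listed for the outer-primer PCR; the template volume (1 µL, corresponding to 50 ng at 50 ng/µL) is an assumption, since only the mass of template is given, and the function names are hypothetical.

```python
TOTAL_UL = 20.0

# name -> (stock concentration, volume in µL), as listed for the first-round PCR
FIRST_PCR = {
    "F primer (uM)": (10.0, 0.4),
    "R primer (uM)": (10.0, 0.4),
    "dNTPs (mM each)": (2.5, 2.0),
    "Ex Taq (U/uL)": (5.0, 0.1),
    "Ex Taq Buffer (x)": (10.0, 2.0),
}


def water_volume(mix, template_ul, total_ul=TOTAL_UL):
    """Volume of ddH2O needed to bring the reaction to total_ul."""
    used = sum(volume for _, volume in mix.values()) + template_ul
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


def final_concentrations(mix, total_ul=TOTAL_UL):
    """Final concentration of each component after dilution to total_ul."""
    return {name: stock * volume / total_ul for name, (stock, volume) in mix.items()}


if __name__ == "__main__":
    # Assumed template volume: 1 µL (50 ng at 50 ng/µL).
    print("ddH2O (uL):", round(water_volume(FIRST_PCR, template_ul=1.0), 2))
    for name, value in final_concentrations(FIRST_PCR).items():
        print(name, round(value, 3))
```

With these inputs each primer ends up at 0.2 µM, each dNTP at 0.25 mM, and the buffer at 1×.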
  • The sequence containing each SNP site was then amplified with the corresponding inner primer pair. The reaction system was: F primer (10 µM), 0.4 µL; R primer (10 µM), 0.4 µL; dNTPs (2.5 mM each), 2 µL; first PCR amplification product, 0.2 µL; Ex Taq (5 U/µL), 0.1 µL; 10× Ex Taq Buffer, 2 µL; ddH2O to 20 µL.
  • The PCR conditions were: 95 °C for 5 min; 5 cycles of 94 °C for 20 s, 40 °C for 20 s, and 57 °C for 25 s; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; then 72 °C for 3 min.
  • the resulting product is the second amplification product.
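The two-stage cycling program of the second-round PCR can be written down as data, which makes the structure easier to see. The sketch below encodes the listed temperatures and hold times and sums the programmed hold time; it ignores ramp times, and the data layout and function name are hypothetical. The low 40 °C annealing step in the first five cycles presumably accommodates the short gene-specific 3' part of the inner primers, but that reading is an inference rather than something stated in the text.

```python
# Each stage: (number of cycles, [(temperature in °C, hold time in s), ...])
SECOND_PCR = [
    (1, [(95, 300)]),                      # initial denaturation
    (5, [(94, 20), (40, 20), (57, 25)]),   # low-stringency cycles
    (30, [(94, 20), (57, 20), (72, 25)]),  # standard cycles
    (1, [(72, 180)]),                      # final extension
]


def total_hold_seconds(program):
    """Sum of programmed hold times, ignoring temperature ramping."""
    return sum(cycles * sum(seconds for _, seconds in steps)
               for cycles, steps in program)


if __name__ == "__main__":
    seconds = total_hold_seconds(SECOND_PCR)
    print(f"{seconds} s, about {seconds / 60:.1f} min of programmed hold time")
```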
  • Agarose gel electrophoresis confirmed that the second amplification products of all samples contained the target molecule; the gel showed a single band with no extraneous bands, indicating that the second PCR amplification had yielded the desired product.
  • In this specific example, a tag sequence was designed into the first linker.
  • the basic structure of the first linker is shown in Figure 4 (SEQ ID NO: 13, SEQ ID NO: 14).
  • the NNNN in the linker is a tag sequence, and the correspondence between the tag sequence and the sample source and SNP site is as shown in FIG. 5.
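A four-base tag is comfortably large enough for this experiment: NNNN gives 256 possible tags, while 10 samples and 3 SNP sites need only 30. The sketch below enumerates the tags and assigns one to each sample/site combination. The assignment order and the sample names are purely illustrative; the actual correspondence is the one given in FIG. 5, which is not reproduced here.

```python
from itertools import product

SAMPLES = [f"sample{i}" for i in range(1, 11)]
SITES = ["rs1799853", "rs671", "rs1801253"]


def assign_tags(samples, sites, tag_len=4):
    """Map each (sample, site) pair to a unique tag of tag_len bases."""
    combos = [(sample, site) for sample in samples for site in sites]
    if len(combos) > 4 ** tag_len:
        raise ValueError("not enough distinct tags for all combinations")
    tags = ("".join(bases) for bases in product("ACGT", repeat=tag_len))
    # zip stops after the last combination, so unused tags are simply left over.
    return dict(zip(combos, tags))


if __name__ == "__main__":
    tags = assign_tags(SAMPLES, SITES)
    print(len(tags), "combinations tagged out of", 4 ** 4, "possible tags")
```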
  • Each first linker was bound separately to streptavidin-modified MyOne magnetic beads (Invitrogen) so that it was immobilized on the bead surface. The reaction system and procedure were as follows: 200 ng of first linker was mixed with 4 µL of MyOne magnetic beads (about 4 × 10^7 beads) by vortexing, reacted for 30 min, washed twice with an appropriate amount of TE buffer (10 mM Tris-HCl, pH 8.0; 1 mM EDTA), and centrifuged.
  • The beads obtained were resuspended in 4 µL of binding buffer (10 mM Tris-HCl, pH 7.5; 1 mM EDTA; 1 M NaCl; 0.01% Triton X-100) to give magnetic beads carrying the immobilized first linker.
  • The following reaction system was prepared for each sample: amplification product obtained in step 1, 20 µL; USER enzyme (1 U/µL, NEB, Cat# M5505S), 10 µL; buffer, 8 µL; T4 DNA ligase, 2 µL; magnetic beads carrying the immobilized first linker, 0.4 µL; ddH2O to 40 µL.
  • the buffer is a solution containing 400 mM Tris, 100 mM MgCl2, 100 mM DTT, 5 mM ATP, 25% PEG 6000, pH 7.8.
  • the reaction was carried out at 25 ° C for 20 minutes to obtain a ligation product, that is, a library molecule to be sequenced.
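The working concentrations in the cleavage-ligation reaction follow from diluting 8 µL of the buffer into the reaction. The sketch below does that arithmetic, taking the stated final volume of 40 µL at face value (an assumption; the listed component volumes add up to slightly more than 40 µL). The dictionary keys and the function name are hypothetical.

```python
# Composition of the buffer stock as listed (pH 7.8)
BUFFER_STOCK = {
    "Tris (mM)": 400.0,
    "MgCl2 (mM)": 100.0,
    "DTT (mM)": 100.0,
    "ATP (mM)": 5.0,
    "PEG 6000 (%)": 25.0,
}


def working_concentrations(stock, buffer_ul=8.0, total_ul=40.0):
    """Concentration of each buffer component after dilution into the reaction."""
    return {name: value * buffer_ul / total_ul for name, value in stock.items()}


if __name__ == "__main__":
    for name, value in working_concentrations(BUFFER_STOCK).items():
        print(name, value)
```

Under this assumption the buffer is used at one fifth of its stock strength, giving 80 mM Tris, 20 mM MgCl2, 20 mM DTT, 1 mM ATP, and 5% PEG 6000 in the reaction.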
  • The product obtained in step 2 was spotted onto an isothiocyanate-modified slide and fixed at 37 °C for 1 h, completing the addressable immobilization of the bead-bound library molecules to be sequenced.
  • The library molecules to be sequenced, immobilized on the solid-phase carrier in step 2, were then sequenced.
  • Sequencing was performed by ligation sequencing on a Pstar II A high-throughput gene sequencer from Shenzhen Huayinkang Gene Technology Co., Ltd. For the rs1799853 site, the anchor primer used was SEQ ID NO: 15; for the rs671 site, SEQ ID NO: 16; for the rs1801253 site, SEQ ID NO: 17.
  • The tag sequence on the first linker was also read; the anchor primer for reading the tag sequence is SEQ ID NO: 18.
  • SEQ ID NOS: 15-18 are all phosphorylated.
  • The anchor-primer mixture for the SNP sites to be tested is an equimolar mixture of the three anchor primers SEQ ID NOS: 15-17.
  • Base1 is used to block the non-specific binding sites of the anchor primer or probe on the library molecules to be sequenced
  • Base2 is used to detect the SNP site to be tested
  • Base3-6 is used to detect the tag sequence.
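The six rounds can be laid out as a simple schedule, which makes the role of the blocking round explicit. This is only an illustrative sketch: the class, field names, and the decision to skip imaging in Base1 are assumptions based on the description of step D (unlabeled probe, imaging optional); they are not taken from FIG. 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    name: str
    purpose: str
    labeled_probe: bool  # True if a fluorescently labeled probe is used
    imaging: bool        # True if the round is imaged


def build_schedule(tag_rounds: int = 4):
    """Base1 blocks, Base2 reads the SNP site, later rounds read the tag."""
    rounds = [
        Round("Base1", "block non-specific binding sites", False, False),
        Round("Base2", "read SNP site", True, True),
    ]
    rounds += [
        Round(f"Base{i}", "read tag sequence", True, True)
        for i in range(3, 3 + tag_rounds)
    ]
    return rounds


if __name__ == "__main__":
    for r in build_schedule():
        print(r.name, "-", r.purpose, "- imaged" if r.imaging else "- not imaged")
```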
  • FIG. 8 shows the preliminary results for the rs1799853, rs671, and rs1801253 SNP sites of sample 2 in the experimental group and the first comparison group of the above specific example.
  • R A/G
  • Y C/T
  • M A/C
  • K G/T
  • S C/G
  • W A/T
  • H A/C/T
  • B C/G/T
  • V A/C/G
  • D A/G/T
  • N A/C/G/T
  • n indicates that the signal is too weak to determine which of A, C, G, or T is present.
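The code table above is the standard IUPAC nucleotide ambiguity notation, and it maps naturally onto a lookup table. The sketch below expands a code into the set of bases it stands for and treats the lowercase n as "no call"; the function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def expand(code: str):
    """Return the set of bases for a code, or None for a weak-signal 'n'."""
    if code == "n":
        return None
    return set(IUPAC[code])


def is_two_base_call(code: str) -> bool:
    """True if the code denotes exactly two bases (e.g. a heterozygous call)."""
    bases = expand(code)
    return bases is not None and len(bases) == 2


if __name__ == "__main__":
    for code in ("R", "Y", "B", "N", "n"):
        print(code, expand(code))
```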
  • The inventors also verified the second amplification products by Sanger sequencing.
  • The most frequent SNP-site sequence types obtained in the experimental group and in the first comparison group agreed exactly with the Sanger sequencing results.
  • The inventors also carried out a third comparative experiment in which the normal human blood genomic DNA used in the above specific example served as template to simultaneously detect the CYP2C9 gene *2 (rs1799853) site, the ALDH2 gene rs671 site, and the ADRB1 gene rs1801253 site.
  • The rs1799853 inner primer pair is SEQ ID NO: 19 and SEQ ID NO: 20; the rs1799853 outer primer pair is SEQ ID NO: 3 and SEQ ID NO: 4; the rs671 inner primer pair is SEQ ID NO: 21 and SEQ ID NO: 22; the rs671 outer primer pair is SEQ ID NO: 7 and SEQ ID NO: 8; the rs1801253 inner primer pair is SEQ ID NO: 23 and SEQ ID NO: 24; the rs1801253 outer primer pair is SEQ ID NO: 11 and SEQ ID NO: 12.
  • Using the blood genomic DNA as template, the sequence containing each SNP site was amplified with the corresponding outer primer pair. The reaction system was: F primer (10 µM), 0.4 µL; R primer (10 µM), 0.4 µL; dNTPs (2.5 mM each), 2 µL; blood genomic DNA, 50 ng; Ex Taq (5 U/µL), 0.1 µL; 10× Ex Taq Buffer, 2 µL; ddH2O to 20 µL.
  • The PCR conditions were: 94 °C for 5 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; then 72 °C for 3 min.
  • the resulting product is the first PCR amplification product.
  • The sequence containing each SNP site was then amplified with the corresponding inner primer pair. The reaction system was: F primer (10 µM), 0.4 µL; R primer (10 µM), 0.4 µL; dNTPs (2.5 mM each), 2 µL; first PCR amplification product, 0.2 µL; Ex Taq (5 U/µL), 0.1 µL; 10× Ex Taq Buffer, 2 µL; ddH2O to 20 µL.
  • The PCR conditions were: 95 °C for 3 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 30 s; then 72 °C for 3 min.
  • the resulting product is the second amplification product.
  • Agarose gel electrophoresis confirmed that the second amplification products of all samples contained the target molecule; the gel showed a single band with no extraneous bands, indicating that the second PCR amplification had yielded the desired product.
  • The sequencing library was constructed, and the library molecules to be sequenced were addressably immobilized, in the same way as in the previous example.
  • the molecules of the library to be sequenced immobilized on the solid phase carrier obtained in the second step are sequenced.
  • Sequencing was again performed by ligation sequencing on a Pstar II A high-throughput gene sequencer from Shenzhen Huayinkang Gene Technology Co., Ltd.
  • For the rs1799853 site, the anchor primer used was SEQ ID NO: 25; for the rs671 site, SEQ ID NO: 26; for the rs1801253 site, SEQ ID NO: 27.
  • the tag sequence on the first linker is also detected, and the anchor primer for detecting the tag sequence is SEQ ID NO: 18.
  • SEQ ID NOS: 18 and 25-27 are all phosphorylated.
  • the sequencing is performed according to the sequence of FIG. 6; in addition, the products obtained in the second step of the specific embodiment are replaced with the anchor primers in the order of FIG. Sequencing, as a control.
  • Sequencing failed in both the experimental group and the control group, possibly because complex structures in the sequences near the rs671 and rs1801253 sites prevented the anchor primers from anchoring; the anchor primers therefore could not be ligated to the probes, and no signal was seen on imaging.
  • For the rs1799853 site, the fluorescence signals of the experimental and control groups in this example were markedly weaker than those of the experimental and control groups, respectively, in the previous specific example.
  • The reason may be that the anchor primer binds the library molecules to be sequenced less well in this example than in the previous one.
  • The inner primer pairs of the above example were used directly for amplification: the rs1799853 inner primer pair is SEQ ID NO: 1 and SEQ ID NO: 2; the rs671 inner primer pair is SEQ ID NO: 5 and SEQ ID NO: 6; the rs1801253 inner primer pair is SEQ ID NO: 9 and SEQ ID NO: 10.
  • Samples 1-10 were amplified by PCR and then taken through the subsequent steps 2 and 3; the results were essentially consistent with those of the first specific example.
  • In the experimental group, the most frequent SNP-site sequence type accounted for more than 93% of detections, whereas in the control group it accounted for about 80%.
  • Compared with the previous example, the number of detections and the total number of detections were lower in both the experimental group and the control group, and the proportion of the most frequent SNP-site sequence type decreased in both.
  • In the examples above, the cleavable site used in the primers is U in every case and is designed on the upstream primer; the fragmentation sequence (corresponding to the first sticky end) is CGGU.
  • The cleavable site can also be designed on the downstream primer.
  • The fragmentation sequence can be designed as needed, provided it meets the requirements of the subsequent ligation, for example GCCU, AAAU, CGCU, GCCCU, CGGCU, or TTTTU. When the fragmentation sequence changes, the corresponding sequence of the first linker must be adjusted accordingly.
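How the first linker has to follow the fragmentation sequence can be sketched in a few lines. The sketch rests on assumptions that go beyond the text: that the fragmentation sequence sits at the very 5' end of the upstream primer and ends at the U, and that after uracil excision the short 5' fragment dissociates, leaving the opposite strand with a 3' overhang. The function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence; U is treated as T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper().replace("U", "T")))


def exposed_overhang(fragmentation_seq: str) -> str:
    """3' overhang (written 5'->3') left on the opposite strand after cleavage at U."""
    if not fragmentation_seq.upper().endswith("U"):
        raise ValueError("the fragmentation sequence is expected to end in U")
    return revcomp(fragmentation_seq)


def linker_overhang(fragmentation_seq: str) -> str:
    """Overhang (5'->3') the first linker needs in order to pair with the product."""
    return revcomp(exposed_overhang(fragmentation_seq))


if __name__ == "__main__":
    for frag in ("CGGU", "GCCU", "AAAU", "GCCCU"):
        print(frag, exposed_overhang(frag), linker_overhang(frag))
```

Under these assumptions the linker overhang is simply the fragmentation sequence with U read as T, which is why changing the fragmentation sequence forces a matching change in the first linker.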
  • The method of the invention can detect multiple SNP sites in multiple samples simultaneously and give accurate, reliable results; it effectively reduces erroneous signals in ligation sequencing results and improves the accuracy of SNP site detection.

Abstract

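Provided are a primer set, an anchor primer, a library construction method, and a gene sequencing method. The primer set comprises a first primer pair, in which the upstream primer consists of a first universal region and a first complementary region. The first complementary region is attached to the 3' end of the first universal region and is fully complementary to a first sequence; the first sequence is a segment of the sequence containing the SNP site to be tested, lies on the 3' side of that SNP site, and has its 5' end 1 to 7 bp from the SNP site. The first universal region is not complementary to a second sequence; the second sequence is the segment of the sequence containing the SNP site that adjoins the 3' end of the first sequence.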

Description

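Primer set, anchor primer, kit, library construction method, and gene sequencing method
Technical Field
The present invention relates to the field of molecular biology and, more specifically, to a primer set, an anchor primer, a library construction method, and a gene sequencing method.
Background Art
Second-generation high-throughput sequencing technologies include ligation sequencing and sequencing by synthesis. Ligation sequencing relies on the fidelity of a ligase when joining nucleic acid fragments: with the nucleic acid fragment to be sequenced as template, an anchor primer (also called a sequencing primer, complementary to the strand containing the fragment to be sequenced) is ligated to an oligonucleotide probe (which carries a fluorescent label at a specific position), and detecting the fluorescent label on the ligation product reveals the sequence corresponding to the labeled position of the probe. Sequencing by synthesis relies on the fidelity of a polymerase when extending a nucleic acid strand: with the fragment to be sequenced as template, the anchor primer binds complementarily to the fragment, and the signal generated during extension is detected to determine the sequence at the corresponding position of the fragment. Because of its high throughput and low cost, second-generation high-throughput sequencing is currently used for SNP genotyping.
One prior-art SNP genotyping method based on second-generation high-throughput sequencing comprises the following steps: A, amplifying with primers to obtain the nucleic acid sequence of the SNP site to be tested; B, constructing a sequencing library from the product of step A; C, anchoring an anchor primer on the sequencing library molecules and detecting the sequence information of the SNP site by high-throughput sequencing. To speed up detection, reduce the number of sequencing cycles, and improve detection efficiency, the anchor primer is generally designed near the SNP site to be tested. However, because of peculiarities of the sequence containing the SNP site, for example special structures near the site such as repeats or hairpins, an anchor primer designed this way often cannot anchor accurately at the intended position, which leads to inaccurate sequencing results or sequencing failure and prevents the method from being applied on a large scale.
A new, broadly applicable primer set, anchor primer, kit, library construction method, and gene sequencing method are therefore needed.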
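Summary of the Invention
The object of the present invention is to provide a new, broadly applicable primer set, anchor primer, kit, library construction method, and gene sequencing method, so as to solve the technical problem of non-specific anchoring of anchor primers in existing high-throughput gene sequencing technology.
To achieve this object, the invention provides a primer set comprising a first primer pair. The upstream primer of the first primer pair consists of a first universal region and a first complementary region; the first complementary region is attached to the 3' end of the first universal region and is fully complementary to a first sequence; the first sequence is a segment of the sequence containing the SNP site to be tested, lies on the 3' side of the SNP site, and has its 5' end 1 to 7 bp from the SNP site; the first universal region is not complementary to a second sequence; the second sequence is the segment of the sequence containing the SNP site that adjoins the 3' end of the first sequence.
Preferably, the first complementary region is 6 to 12 bp long.
Preferably, the first universal region is 6 to 16 bp long.
Preferably, the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested.
Preferably, the primer set further comprises a second primer pair; the first and second primer pairs are, respectively, the inner and outer primer pairs for amplifying the sequence containing the SNP site to be tested.
Preferably, the downstream primer of the first primer pair consists of a second universal region and a second complementary region;
the second complementary region is attached to the 3' end of the second universal region; the second complementary region is a segment of the sequence containing the SNP site to be tested and lies on the 5' side of the SNP site; the second universal region differs from a third sequence; the third sequence is the segment of the sequence containing the SNP site that adjoins the 5' end of the second complementary region.
Preferably, the first universal region contains U.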
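To better achieve the object of the invention, the invention further provides an anchor primer that is fully complementary to a fourth sequence. The fourth sequence is a single-stranded nucleic acid molecule and is a segment of the amplification product of the first primer pair of any of the primer sets described above. The fourth sequence comprises the first sequence and a first normalization region, the first normalization region being attached to the 3' end of the first sequence and the 5' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested; or the fourth sequence comprises the first complementary region and a second normalization region, the second normalization region being attached to the 5' end of the first complementary region and the 3' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested.
Preferably, the fourth sequence comprises the first sequence and the first normalization region, the first normalization region is attached to the 3' end of the first sequence, and the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested; or
the fourth sequence comprises the first complementary region and the second normalization region, the second normalization region is attached to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 3 bp from the SNP site to be tested.
Preferably, the first complementary region is 6 to 12 bp long.
Preferably, the first normalization region is 8 to 14 bp long.
To better achieve the object of the invention, the invention further provides a kit comprising any of the primer sets described above.
Preferably, the kit further comprises any of the anchor primers described above.
To better achieve the object of the invention, the invention further provides another kit comprising any of the anchor primers described above.
Preferably, the kit further comprises any of the primer sets described above.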
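To better achieve the object of the invention, the invention further provides a library construction method comprising the following steps:
A. using any of the primer sets described above, subjecting the sample to be tested to PCR amplification to obtain an amplification product containing the SNP site to be tested;
B. ligating the amplification product containing the SNP site to be tested to a linker to form library molecules to be sequenced.
Preferably, step B is:
ligating the amplification product containing the SNP site to be tested directly to the linker to form library molecules to be sequenced, and addressably immobilizing them on a solid-phase carrier by means of microspheres.
More preferably, step B comprises the following steps:
B1. ligating the amplification product containing the SNP site to be tested directly to a linker immobilized on microspheres to form library molecules to be sequenced immobilized on the microspheres;
B2. addressably immobilizing the microspheres obtained in step B1 on a solid-phase carrier.
More preferably, the ligation reaction in step B1 is carried out in a cleavage-ligation reaction system comprising a ligase, a cleavage agent, a first linker immobilized on microspheres, and a ligation buffer;
the first universal region contains U;
the cleavage agent specifically cleaves at U so that the amplification product containing the SNP site to be tested forms a first sticky end;
the first linker is a nucleic acid molecule containing a second sticky end that is fully complementary to the first sticky end.
More preferably, the ligation buffer in step B1 contains PEG.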
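To better achieve the object of the invention, the invention further provides a gene sequencing method comprising the following steps:
A. using any of the primer sets described above, subjecting the sample to be tested to PCR amplification to obtain an amplification product containing the SNP site to be tested;
B. ligating the amplification product containing the SNP site to be tested to a linker to form library molecules to be sequenced;
C. using any of the anchor primers described above, sequencing the library molecules to be sequenced to obtain the sequence information of the SNP site to be tested.
Preferably, the sequencing method in step C is ligation sequencing, and a step D is included between steps B and C: using any of the anchor primers described above, one ligation sequencing reaction is performed on the library molecules to be sequenced with a non-fluorescently labeled oligonucleotide probe in place of the fluorescently labeled oligonucleotide probe.
Preferably, step D comprises the following steps:
D1. anchoring the anchor primer on the library molecules to be sequenced;
D2. adding the non-fluorescently labeled oligonucleotide probe, ligase, and the corresponding buffer, and carrying out the ligation reaction;
D3. removing the ligation product by denaturation.
As can be seen from the above, the invention applies a special design to the amplification primers for the SNP site to be tested, so that every product amplified with them carries the first universal region near the SNP site, which reduces the complexity of the regions near different SNP sites. An anchor primer can then be designed on the basis of the first universal region and the sequence between it and the SNP site, so that it anchors accurately at the target position. The primer set, anchor primer, kit, library construction method, and gene sequencing method of the invention are suitable for detecting many different SNP sites, and especially for detecting multiple SNP sites simultaneously; while maintaining detection efficiency, they avoid the inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site to be tested.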
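Brief Description of the Drawings
FIG. 1 is a schematic diagram of the relationship between the upstream primer of the first primer pair and the sequence containing the SNP site to be tested in the first exemplary embodiment of the invention.
FIG. 2 is a schematic diagram of the relationship between the downstream primer of the first primer pair and the sequence containing the SNP site to be tested in one embodiment of the invention.
FIG. 3 is a schematic diagram of the relationship between the anchor primer and the amplification product of the first primer pair in the second exemplary embodiment of the invention.
FIG. 4 is a schematic diagram of the structure of the first linker in the first specific example of the invention.
FIG. 5 shows the correspondence between the tag sequence of the first linker and the sample source and SNP site in the first specific example of the invention.
FIG. 6 is the sequencing workflow of the experimental group in the first specific example of the invention.
FIG. 7 is the sequencing workflow of the first comparative experiment in the first specific example of the invention.
FIG. 8 is a comparison of the preliminary results for sample 2 in the first specific example of the invention.
FIG. 9 shows the sequencing images for the rs671 and rs1801253 sites in the third comparative experiment of the invention.
FIG. 10 shows the sequencing images for the rs671 and rs1801253 sites in the first specific example of the invention.
FIG. 11 shows the sequencing images for the rs1799853 site in the third comparative experiment of the invention.
FIG. 12 shows the sequencing images for the rs1799853 site in the first specific example of the invention.
Detailed Description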
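To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
The invention proposes a first exemplary embodiment, a primer set comprising a first primer pair. As shown in FIG. 1, the upstream primer of the first primer pair consists of a first universal region and a first complementary region; the first complementary region is attached to the 3' end of the first universal region and is fully complementary to a first sequence; the first sequence is a segment of the sequence containing the SNP site to be tested, lies on the 3' side of the SNP site, and has its 5' end 1 to 7 bp from the SNP site; the first universal region is not complementary to a second sequence; the second sequence is the segment of the sequence containing the SNP site that adjoins the 3' end of the first sequence.
It should be noted that the primer set can be used to amplify the sequence containing the SNP site to be tested; the resulting amplification product can be used to construct a sequencing library containing the SNP site, which is then analyzed by high-throughput sequencing. Through the special design of the upstream primer of the first primer pair, the amplification product of the first primer pair gains a sequence containing the first universal region near the SNP site. The sequence of the first universal region is neither the sequence near the SNP site in the template molecule of the sample nor the complement of that sequence. The first universal region thus adds a normalizing sequence to one end of the amplification product for each SNP site. The first universal region is not complementary to the second sequence, contains no repeats, and does not form a hairpin by itself. The scheme is particularly suited to samples in which the sequence near the SNP site on the template molecule is structurally complex: it simplifies the sequence structure near the SNP site, makes anchor primers easier to design, helps ensure that the anchor primer anchors accurately at the target position, avoids the inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site, and is applicable to all types of SNP sites. The first universal region may have the same sequence for different SNP sites.
In addition, because the 5' end of the first sequence is 1 to 7 bp from the SNP site to be tested, when library molecules derived from the amplification product of the first primer pair undergo high-throughput sequencing, the end of the anchor primer lies 1 to 7 bp from the SNP site. With ligation sequencing using T4 ligase, this ensures that the SNP site can be read with a single ligation of the anchor primer to a probe carrying a fluorescent label at a specific position, which improves the efficiency of high-throughput sequencing.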
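The design rules for the upstream primer can be sketched in code. The following is a minimal illustration under stated assumptions, not the inventors' design procedure: the template, the universal region `CGGUACTGAC`, the function names, and the convention that a gap of 1 means "immediately adjacent to the SNP" are all hypothetical choices made for the example, and the complementarity screen only rejects a perfectly complementary second sequence.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence; U is treated as T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper().replace("U", "T")))


def design_upstream_primer(template: str, snp_index: int, universal: str,
                           gap: int = 2, comp_len: int = 8) -> str:
    """Return universal region + first complementary region, written 5'->3'.

    template  : strand containing the SNP, written 5'->3' (hypothetical input)
    snp_index : 0-based position of the SNP in the template
    gap       : distance from the SNP to the 5' end of the first sequence
                (assumed convention: gap=1 means directly adjacent)
    """
    if not 1 <= gap <= 7:
        raise ValueError("the first sequence must start 1 to 7 bp from the SNP")
    if not 6 <= comp_len <= 12:
        raise ValueError("the first complementary region should be 6 to 12 nt")
    if not 6 <= len(universal) <= 16:
        raise ValueError("the first universal region should be 6 to 16 nt")

    start = snp_index + gap
    first_seq = template[start:start + comp_len]
    if len(first_seq) < comp_len:
        raise ValueError("template too short for the requested design")

    # Second sequence: the stretch adjoining the 3' end of the first sequence.
    second_seq = template[start + comp_len:start + comp_len + len(universal)]
    # Crude screen only: a real design would also check partial complementarity.
    if len(second_seq) == len(universal) and \
            revcomp(second_seq) == universal.upper().replace("U", "T"):
        raise ValueError("universal region is complementary to the second sequence")

    return universal + revcomp(first_seq)


if __name__ == "__main__":
    template = "GATTACAGGCTCAGTCCATGGTACCTGAAGCTTGCA"  # hypothetical, SNP at index 10
    print(design_upstream_primer(template, 10, "CGGUACTGAC"))
```

Because the complementary region sits at the 3' end of the primer and pairs with the part of the template closest to the SNP, extension from this primer runs straight through the SNP site.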
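In one embodiment of the invention, the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested.
This scheme keeps the distance between the SNP site and the first universal sequence under control, so that the anchor primer contains a sufficiently long stretch of the first universal sequence or of its complement. When several SNP sites are tested simultaneously, this both ensures that each anchor primer anchors accurately at its own target position and makes the binding of the anchor primers to their target positions more uniform (normalized), reducing differences in binding between anchor primers and hence differences in the detection signals of the SNP sites, which improves sequencing accuracy.
To ensure the purity of the product amplified with the first primer pair and to avoid non-specific amplification, the invention proposes, on the basis of the first exemplary embodiment, an embodiment in which the first complementary region is 6 to 12 bp long.
This scheme ensures specific binding of the upstream primer of the first primer pair to the template sequence containing the SNP site, avoiding non-specific amplification, and also ensures sufficient discrimination between the sequences near different SNP sites. More preferably, the first complementary region is 7 to 10 bp long.
To further reduce the sequence complexity of the regions near different SNP sites, the invention proposes, on the basis of the above embodiments, another embodiment in which the first universal region is 6 to 16 bp long.
In this scheme, the first universal sequence largely avoids the possibility, arising from a sequence that is too long or too short, of secondary structure forming within the first universal sequence or between it and other sequences, which makes anchor primers easier to design.
On the basis of any of the above embodiments, the invention proposes another embodiment in which the primer set further comprises a second primer pair; the first and second primer pairs are, respectively, the inner and outer primer pairs for amplifying the sequence containing the SNP site to be tested.
In this scheme, the inner and outer primer pairs can be used for nested PCR amplification of the sequence containing the SNP site, which increases the purity of the target molecule in the amplification product of the first primer pair.
To further reduce structural differences between the products amplified with the above primer sets, the invention proposes, on the basis of any of the above embodiments, another specific embodiment. As shown in FIG. 2, the downstream primer of the first primer pair consists of a second universal region and a second complementary region; the second complementary region is attached to the 3' end of the second universal region; the second complementary region is a segment of the sequence containing the SNP site to be tested and lies on the 5' side of the SNP site; the second universal region differs from a third sequence; the third sequence is the segment of the sequence containing the SNP site that adjoins the 5' end of the second complementary region.
It should be noted that the sequence of the second universal region is neither the sequence near the SNP site in the template molecule of the sample nor the complement of that sequence. The second universal region contains no repeats and does not form a hairpin by itself. The product amplified with the first primer pair of this scheme carries a universal sequence on each side of the SNP site, and for different SNP sites both the first and the second universal regions may have the same sequence; this reduces the sequence complexity of the regions near different SNP sites and also makes the amplification efficiency more uniform across SNP sites.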
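On the basis of any of the above embodiments, and so that the amplification product obtained with the first primer pair can be converted into a sequencing library more quickly, the invention proposes an embodiment in which the first universal region contains a cleavable site. The cleavable site can be specifically cleaved by a cleavage agent, so that the amplification product obtained with the first primer pair forms a first sticky end.
In this scheme, once the amplification product has been cut at the cleavable site in the first universal region to expose the first sticky end, it can be ligated directly to a linker molecule carrying a second sticky end complementary to the first, which improves ligation efficiency.
Preferably, the cleavable site is a ribonucleotide, an RNA sequence, or a restriction enzyme cleavage site. Correspondingly, the cleavage agent is RNase H, RNase H, or a restriction enzyme.
When the cleavable site is U, the cleavage agent is preferably USER enzyme.
Preferably, the cleavable site is 1 to 6 bases from the 5' end of its strand. In this case the complementary pairing and ligation of the first sticky end with the second sticky end of the linker is relatively efficient; more preferably the distance is 4 or 5 bp, at which the ligation efficiency is highest.
To better achieve the object of the invention, the invention proposes a second exemplary embodiment, shown in FIG. 3: an anchor primer that is fully complementary to a fourth sequence. The fourth sequence is a single-stranded nucleic acid molecule and is a segment of the amplification product of the first primer pair of any of the primer sets described above. As shown in FIG. 3a, the fourth sequence comprises the first sequence and a first normalization region, the first normalization region being attached to the 3' end of the first sequence and the 5' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested; or, as shown in FIG. 3b, the fourth sequence comprises the first complementary region and a second normalization region, the second normalization region being attached to the 5' end of the first complementary region and the 3' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested.
It should be noted that the first normalization region is the segment of the complement of the first universal region that adjoins the 3' end of the first sequence, and the second normalization region is the segment of the first universal region that adjoins the 5' end of the first complementary region. The anchor primers of this scheme are designed for sequencing libraries built from the amplification product of the first primer pair of any of the above primer sets. For different SNP sites, these anchor primers share a common first or second normalization sequence; they anchor accurately at the target position, avoiding the inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site, and they make the binding of the anchor primers to their target positions more uniform (normalized), reducing differences in binding between anchor primers and hence differences in the detection signals of the SNP sites, which improves sequencing accuracy.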
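The two anchor primer variants can be derived mechanically from the first sequence and the first universal region. The sketch below is an illustration under assumptions: the input sequences are hypothetical, the same normalization length is used for both variants (the text gives a length range only for the first normalization region), and U in the universal region is read as T.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence; U is treated as T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper().replace("U", "T")))


def anchor_primers(first_seq: str, universal: str, norm_len: int = 8):
    """Return (anchor_a, anchor_b), both written 5'->3'.

    anchor_a pairs with: first sequence + first normalization region
    anchor_b pairs with: second normalization region + first complementary region
    """
    if not 8 <= norm_len <= 14:
        raise ValueError("normalization region is expected to be 8 to 14 nt")
    universal_dna = universal.upper().replace("U", "T")
    if norm_len > len(universal_dna):
        raise ValueError("normalization region cannot exceed the universal region")

    first_comp = revcomp(first_seq)            # first complementary region
    norm1 = revcomp(universal_dna)[:norm_len]  # adjoins the 3' end of the first sequence
    norm2 = universal_dna[-norm_len:]          # adjoins the 5' end of the complementary region

    fourth_a = first_seq + norm1
    fourth_b = norm2 + first_comp
    return revcomp(fourth_a), revcomp(fourth_b)


if __name__ == "__main__":
    a, b = anchor_primers("AGTCCATG", "CGGUACTGAC")  # hypothetical sequences
    print("variant a:", a)
    print("variant b:", b)
```

In this construction the two fourth sequences are reverse complements of each other, so each anchor primer equals the other variant's target: variant a has its 3' end next to the SNP site, variant b its 5' end.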
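On the basis of the above embodiment, the invention proposes another embodiment in which the fourth sequence comprises the first sequence and the first normalization region, the first normalization region is attached to the 3' end of the first sequence, and the 5' end of the first sequence is 1 to 3 bp from the SNP site to be tested; or the fourth sequence comprises the first complementary region and the second normalization region, the second normalization region is attached to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 to 3 bp from the SNP site to be tested.
In this scheme, the end of the anchor primer used for extension is 1 to 3 bp from the SNP site to be tested. With ligation sequencing, this ensures that the SNP site can be read with a single ligation of the anchor primer to a probe carrying a fluorescent label at a specific position, improving the efficiency of high-throughput sequencing; moreover, 1 to 3 bp lies in the high-fidelity part of the fidelity range of T4 ligase (1 to 7 bp), which effectively improves detection accuracy. With sequencing by synthesis, the fourth sequence comprises the first complementary region and the second normalization region, the second normalization region is attached to the 5' end of the first complementary region, and the 3' end of the fourth sequence is 1 bp from the SNP site; only 1 bp then needs to be extended, that is, a single reaction cycle suffices to read the SNP site, so detection is efficient.
In one embodiment of the invention, the first complementary region is 6 to 12 bp long. This further ensures the specificity with which the anchor primer binds its target position near the SNP site. More preferably, the first complementary region is 7 to 10 bp long.
In one embodiment of the invention, the first normalization region is 8 to 14 bp long. This ensures uniform binding efficiency between the anchor primers designed for different SNP sites and their target positions.
To better achieve the object of the invention, the invention proposes a third exemplary embodiment, a kit comprising any of the primer sets described above.
The kit of this scheme may be an amplification kit, a library construction kit, or a sequencing kit. An amplification kit may also include the other reagents required for PCR amplification; a library construction kit may also include the other reagents required for library construction as well as the reagents required for PCR amplification; a sequencing kit may include the other reagents required for sequencing, the reagents required for PCR amplification, and the reagents required for library construction.
In one embodiment of the invention, the kit further comprises any of the anchor primers described above.
To better achieve the object of the invention, the invention proposes a fourth exemplary embodiment, a kit comprising any of the anchor primers described above.
The kit of this scheme is generally a sequencing kit; it may include the other reagents required for sequencing, the reagents required for PCR amplification, and the reagents required for library construction.
In one embodiment of the invention, the kit further comprises any of the primer sets described above.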
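To better achieve the object of the invention, the invention proposes a fifth exemplary embodiment, a library construction method comprising the following steps:
A. using any of the primer sets described above, subjecting the sample to be tested to PCR amplification to obtain an amplification product containing the SNP site to be tested;
B. ligating the amplification product containing the SNP site to be tested to a linker to form library molecules to be sequenced.
It should be noted that the library molecules to be sequenced in this scheme are derived from the amplification product obtained by PCR amplification of the sample with any of the primer sets described above, so a sequence containing the first universal region has been added near the SNP site to be tested. The first universal region is not complementary to the second sequence, contains no repeats, and does not form a hairpin by itself. The scheme therefore simplifies the sequence structure near the SNP site and makes anchor primers easier to design; it is particularly suited to samples in which the sequence near the SNP site on the template molecule is structurally complex, avoids the inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site, and is applicable to all types of SNP sites. The first universal region may have the same sequence for different SNP sites.
The ligation of the amplification product containing the SNP site to the linker in step B is elaborated below from several angles.
In one embodiment of the invention, only one end of the amplification product containing the SNP site is ligated to a linker to form the library molecules to be sequenced. The ligated end may be either end of the amplification product.
In another embodiment of the invention, both ends of the amplification product containing the SNP site are ligated to linkers to form the library molecules to be sequenced.
It should be noted that the linkers ligated to the two ends may have the same or different sequences.
Preferably, the linkers at the two ends differ in sequence. The library molecules can then be amplified from the two different linkers to verify that their structure is correct, or amplified from the two linkers as single molecules, after which the single-molecule amplification products undergo second-generation high-throughput sequencing to obtain the sequence information of the SNP site to be tested.
On the basis of any of the above embodiments, the invention proposes another embodiment in which step B is:
ligating the amplification product containing the SNP site to be tested directly to the linker to form library molecules to be sequenced, and addressably immobilizing them on a solid-phase carrier by means of microspheres.
In this scheme, the amplification product containing the SNP site is ligated directly to the linker, which reduces the number of experimental steps and improves efficiency.
It should be noted that step B can be implemented in several ways, which are illustrated by the following embodiments.
In one embodiment of the invention, step B comprises the following steps:
B1. ligating the amplification product containing the SNP site to be tested directly to a linker immobilized on microspheres to form library molecules to be sequenced immobilized on the microspheres;
B2. addressably immobilizing the microspheres obtained in step B1 on a solid-phase carrier.
In one embodiment of the invention, step B comprises the following steps:
B1. ligating the amplification product containing the SNP site to be tested directly to a linker and then immobilizing the ligation product on microspheres to obtain library molecules to be sequenced immobilized on the microspheres;
B2. addressably immobilizing the microspheres obtained in step B1 on a solid-phase carrier.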
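It should be noted that in the two embodiments above the linker carries a modification label that serves to immobilize it on the microspheres. The modification label may be biotin, avidin, streptavidin, an antigen, an antibody, a receptor, a ligand, polyhistidine, nanogold, iodoacetyl, thiol, amino, aldehyde, carboxyl, isothiocyanate, silane, or acrylamide, each of which binds specifically to a corresponding group or molecule.
Addressable immobilization means immobilization for which position information can be determined. That is, the library molecules to be sequenced that are immobilized at each specific position on the carrier can be clearly distinguished from those immobilized at any other position.
In the above embodiments, the linker may take a variety of forms including, but not limited to, a blunt-end linker, an overhang linker, a forked linker, or a linker containing a stem-loop structure.
The blunt-end linker is a double-stranded nucleic acid linker whose two strands are fully complementary. Preferably, the 5' ends of both strands of the blunt-end linker carry no phosphate group, which prevents linker self-ligation during ligation, reduces interference with subsequent sequencing, and improves the accuracy of gene detection.
The overhang linker is a double-stranded nucleic acid linker in which at least one end carries an overhanging nucleotide sequence while the remaining nucleotides are fully complementary. It may carry a single overhang or two overhangs; the two overhangs may be on the same nucleotide strand or on different strands. The overhang linker prevents linker self-ligation during ligation, reduces interference with subsequent sequencing, and improves the accuracy of gene detection. In a specific implementation of this embodiment, the linker is preferably a single-overhang linker whose overhang is the 3' end of its strand and consists of the base T; such a linker can be ligated directly to the A-tailed PCR products produced by Taq polymerase, which improves ligation efficiency.
The forked linker comprises a complementary region and a forked region; the two strands of the complementary region are base-paired, and the number of paired nucleotides is not limited. The end of the complementary region can be a blunt end or an overhang. The forked linker prevents linker self-ligation during ligation, reduces interference with subsequent sequencing, and improves the accuracy of SNP genotyping. In a specific implementation of this embodiment, the linker is preferably a T-end forked linker in which the 3' end of the complementary region is an overhang whose last base is T; such a linker can be ligated directly to the A-tailed PCR products produced by Taq polymerase, which improves ligation efficiency.
The linker with a stem-loop structure has several embodiments. In one embodiment, the linker is a single-stranded nucleic acid molecule comprising, in order, a first complementary pairing region, a stem-loop region, and a second complementary pairing region; the first complementary pairing region is fully complementary to the second. In another embodiment, the stem-loop linker also carries an overhang, which can be located at the 3' end of the single-stranded nucleic acid molecule. The overhang prevents linker self-ligation, reduces interference with subsequent sequencing, and improves the accuracy of SNP genotyping. The overhang is preferably T; such a linker can be ligated directly to the A-tailed PCR products produced by Taq polymerase, which improves ligation efficiency.
Of the two embodiments above, the one in which the linker is immobilized on the microspheres in advance eliminates the step of attaching the ligation product to the microspheres, so the experiment is more efficient.
Because the density of the linker-bearing microspheres may differ from that of the other components of the step B reaction system, shaking the whole reaction system periodically during ligation effectively improves the efficiency with which the linkers immobilized on the microspheres are ligated to the amplification product obtained in step A, and avoids the low ligation efficiency caused by stratification of the components of the reaction system.
In addition, because the amplification product containing the SNP site obtained in step A is ligated directly to the linker in step B, that is, without purification, the reaction volume is relatively large, the DNA concentration is relatively low, and the ligation efficiency is limited. In a preferred embodiment of the invention, PEG is therefore added to the reaction system of step B.
PEG raises the effective concentration of the reacting molecules, increasing the probability of contact between the linker and the corresponding amplification product, and it also raises the density of the reaction system, preventing the linker-bearing microspheres from settling. In both ways it effectively increases the efficiency with which linkers immobilized on the microspheres are ligated to the corresponding amplification product. The specific PEG concentration can be calculated from the density of the microspheres in step B and the density of the original reaction system, and is suitably chosen so that the density of the reaction system after PEG addition is substantially the same as that of the microspheres.
On the basis of the above embodiments, the invention proposes a further embodiment in which the ligation reaction in step B1 is carried out in a cleavage-ligation reaction system comprising a ligase, a cleavage agent, a first linker immobilized on microspheres, and a ligation buffer; the first universal region contains a cleavable site; the cleavable site can be specifically cleaved by the cleavage agent so that the amplification product obtained with the first primer pair forms a first sticky end; the first linker is a nucleic acid molecule containing a second sticky end that is fully complementary to the first sticky end.
In this scheme, once the cleavage agent has cut the amplification product at the cleavable site in the first universal region to expose the first sticky end, the product can be ligated directly to the first linker carrying the second sticky end, which improves ligation efficiency.
Preferably, the cleavable site is a ribonucleotide, an RNA sequence, or a restriction enzyme cleavage site. Correspondingly, the cleavage agent is RNase H, RNase H, or a restriction enzyme.
When the cleavable site is U, the cleavage agent is preferably USER enzyme.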
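To better achieve the object of the invention, a sixth typical embodiment is provided: a gene sequencing method comprising the following steps:
A. Using any of the primer sets described above, perform PCR amplification on the sample to be tested to obtain an amplification product containing the SNP site to be tested;
B. Ligate the amplification product containing the SNP site to be tested to an adapter to form library molecules to be sequenced;
C. Using any of the anchor primers described above, sequence the library molecules to obtain sequence information for the SNP site to be tested.
It should be noted that steps A and B are library construction steps and may be carried out by any of the library construction methods described above. The following mainly elaborates on step C. The sequencing method in step C may be sequencing by ligation or sequencing by synthesis.
In this scheme, the anchor primer anchors accurately at the target position during sequencing, which avoids inaccurate results or sequencing failure caused by peculiarities of the sequence near the SNP site. It also makes the binding of the different anchor primers to their target positions more uniform (normalized), which reduces the differences among the detection signals of the SNP sites and improves sequencing accuracy.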
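In one embodiment of the invention, the sequencing method in step C is sequencing by ligation, and a step D is included between steps B and C: using any of the anchor primers of claims 7 to 11, perform one ligation sequencing reaction on the library molecules with oligonucleotide probes lacking a fluorescent label in place of the fluorescently labeled oligonucleotide probes.
It should be noted that the unlabeled oligonucleotide probes have exactly the same sequences as the fluorescently labeled probes used in sequencing by ligation, differing only in the absence of the fluorescent label; that is, the unlabeled probe sequence is (N-N-N...-N)n, where N is A, G, C, or T and n is a positive integer.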
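To make the size of such a degenerate probe pool concrete, the following minimal sketch counts and enumerates the sequences represented by a fully degenerate probe of length n. The function names are illustrative only, and the range 6 to 10 is the preferred range for n given in this description; nothing here is part of the patented method itself.

```python
from itertools import product

BASES = "AGCT"  # the four bases that each degenerate position N may take


def probe_pool_size(n: int) -> int:
    """Return the number of distinct sequences in a fully degenerate n-mer."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return len(BASES) ** n


def enumerate_probes(n: int):
    """Yield every sequence of a fully degenerate n-mer (practical for small n only)."""
    for combo in product(BASES, repeat=n):
        yield "".join(combo)


if __name__ == "__main__":
    # Pool sizes over the preferred range of n
    for n in range(6, 11):
        print(f"n={n}: {probe_pool_size(n)} sequences")
```

The pool grows fourfold with each added position, which is one reason an exhaustive enumeration is only sensible for short probes.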
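Each ligation sequencing reaction comprises the following steps: anchor primer annealing; washing (to remove excess unanchored primer); probe ligation; washing (to remove excess probe, ligase, and so on); imaging (to obtain the sequence information at the position corresponding to the fluorescent label on the probe); and denaturing elution of the ligation product (so that the anchor primer can anneal in the next ligation sequencing reaction). Because the ligation sequencing reaction in step D uses unlabeled oligonucleotide probes, the imaging step may be omitted, which reduces the number of steps and improves efficiency. If imaging is nevertheless performed, it can confirm that the probes used in that round were indeed unlabeled.
In theory, the ligation product of the anchor primer and the unlabeled oligonucleotide probe formed in step D is eluted and so cannot act as a block. Nevertheless, the inventors found in comparative experiments that adding, before ligation sequencing, one ligation sequencing reaction in which unlabeled probes replace the labeled probes effectively reduces erroneous signals in the subsequent ligation sequencing, thereby reducing interfering signals in the analysis from raw sequencing results to final sequence information and improving the accuracy of SNP detection. A possible explanation is that the non-target binding sites of the anchor primer and/or of the probes are blocked after step D, so that in the subsequent ligation sequencing the anchor primer and/or probes bind to their target sites in greater number and more accurately, which reduces erroneous signals, raises the proportion of correct signals, and improves the accuracy of SNP detection.
Preferably, n is between 6 and 10, more preferably between 7 and 9.
More preferably, the unlabeled oligonucleotide probes have exactly the same sequences as the fluorescently labeled oligonucleotide probes used in step C, differing only in the absence of the fluorescent label. This effectively reduces the difficulty of designing the unlabeled probes.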
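In one embodiment of the invention, step D comprises the following steps:
D1. Anneal the anchor primer to the library molecules to be sequenced;
D2. Add the unlabeled oligonucleotide probes, a ligase, and the corresponding buffer, and carry out the ligation reaction;
D3. Remove the ligation product by denaturation.
Regarding the denaturing removal in step D3: the ligation product in step D3 is the product of ligating the anchor primer to the unlabeled oligonucleotide probe; it is a single-stranded nucleic acid molecule that base-pairs with the library molecule. Denaturation can be achieved physically, for example by raising the temperature of the double-stranded nucleic acid (library molecule plus ligation product), or chemically, for example by changing its pH. The chemical approach only requires adding an acidic or alkaline reagent and needs no additional heating component, so it is simpler to automate.
Preferably, step D3 is: wash with 0.05 M to 0.15 M NaOH.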
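To further illustrate the technical effects and advantages of the technical solutions described herein, the following specific examples are given.
In the first specific example, blood genomic DNA from 10 healthy individuals was used as template to detect simultaneously the *2 site (rs1799853) of the CYP2C9 gene, the rs671 site of the ALDH2 gene, and the rs1801253 site of the ADRB1 gene.
For each of these sites, one inner primer pair and one outer primer pair were designed, as follows:
rs1799853 inner primer pair: SEQ ID NO: 1 and SEQ ID NO: 2; rs1799853 outer primer pair: SEQ ID NO: 3 and SEQ ID NO: 4; rs671 inner primer pair: SEQ ID NO: 5 and SEQ ID NO: 6; rs671 outer primer pair: SEQ ID NO: 7 and SEQ ID NO: 8; rs1801253 inner primer pair: SEQ ID NO: 9 and SEQ ID NO: 10; rs1801253 outer primer pair: SEQ ID NO: 11 and SEQ ID NO: 12.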
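I. Obtaining amplification products containing the SNP sites to be tested.
1. First-round PCR amplification.
Using blood genomic DNA as template and the outer primer pairs described above, the sequence containing each SNP site was amplified separately. Reaction system: F primer (10 μM), 0.4 μL; R primer (10 μM), 0.4 μL; dNTPs (2.5 mM each), 2 μL; blood genomic DNA, 50 ng; Ex Taq (5 U/μL), 0.1 μL; 10× Ex Taq Buffer, 2 μL; ddH2O to 20 μL.
PCR conditions: 94 °C for 5 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; 72 °C for 3 min.
The resulting product is the first-round PCR product.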
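As a quick check on the first-round reaction system, the sketch below converts the listed stock concentrations and volumes into final concentrations in the 20 μL reaction using C_final = C_stock × V_added / V_total. The function and dictionary key names are illustrative and not taken from the source.

```python
TOTAL_UL = 20.0  # final reaction volume stated for the first-round PCR


def final_concentration(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Dilution formula: final = stock * added volume / total volume."""
    return stock * volume_ul / total_ul


def first_round_mix() -> dict:
    """Final amounts for the components listed in the first-round reaction system."""
    return {
        "F primer (uM)": final_concentration(10.0, 0.4),
        "R primer (uM)": final_concentration(10.0, 0.4),
        "each dNTP (mM)": final_concentration(2.5, 2.0),
        "Ex Taq buffer (x)": final_concentration(10.0, 2.0),
        "Ex Taq (U per reaction)": 5.0 * 0.1,  # 5 U/uL times 0.1 uL
    }


if __name__ == "__main__":
    for name, value in first_round_mix().items():
        print(f"{name}: {value:g}")
```

Under these numbers each primer is at 0.2 μM, each dNTP at 0.25 mM, the buffer at 1×, and each reaction contains 0.5 U of Ex Taq.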
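2. Second-round PCR amplification.
Using the first-round PCR product as template and the inner primer pairs described above, the sequence containing each SNP site was amplified separately. Reaction system: F primer (10 μM), 0.4 μL; R primer (10 μM), 0.4 μL; dNTPs (2.5 mM each), 2 μL; first-round PCR product, 0.2 μL; Ex Taq (5 U/μL), 0.1 μL; 10× Ex Taq Buffer, 2 μL; ddH2O to 20 μL.
PCR conditions: 95 °C for 5 min; 5 cycles of 94 °C for 20 s, 40 °C for 20 s, and 57 °C for 25 s; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; 72 °C for 3 min.
The resulting product is the second-round amplification product.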
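The two cycling programs of this example can be written down as data and their hold times summed, as in the sketch below. Each stage is a pair of a cycle count and a list of hold times in seconds; this representation is an illustrative choice, and the totals exclude ramp times between temperatures, which depend on the thermocycler.

```python
def program_seconds(stages) -> int:
    """Sum hold times over all stages; each stage is (cycles, [hold seconds, ...])."""
    return sum(cycles * sum(holds) for cycles, holds in stages)


# First-round PCR: 5 min; 30 x (20 s, 20 s, 25 s); 3 min
FIRST_ROUND = [(1, [300]), (30, [20, 20, 25]), (1, [180])]

# Second-round PCR: 5 min; 5 x (20 s, 20 s, 25 s); 30 x (20 s, 20 s, 25 s); 3 min
SECOND_ROUND = [(1, [300]), (5, [20, 20, 25]), (30, [20, 20, 25]), (1, [180])]

if __name__ == "__main__":
    for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
        seconds = program_seconds(program)
        print(f"{name}: {seconds} s ({seconds / 60:.1f} min)")
```

Excluding ramping, the first round holds for 2430 s (40.5 min) and the second for 2755 s (about 45.9 min).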
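After the second round of amplification, agarose gel electrophoresis showed that the second-round products of all samples contained the target molecule, with a single band and no nonspecific bands, indicating that the second-round PCR had successfully produced the required amplification products.
II. Construction of the library molecules to be sequenced.
1. Immobilizing the first adapter on microbeads.
So that different sample sources can be distinguished during simultaneous detection, a tag sequence was designed into the first adapter in this example. The basic structure of the first adapter is shown in Figure 4 (SEQ ID NO: 13, SEQ ID NO: 14); NNNN in the adapter is the tag sequence, and the correspondence between tag sequences, sample sources, and SNP sites is shown in Figure 5.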
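The idea of a four-base tag that identifies each combination of sample and SNP site can be sketched as follows. The actual tag assignment is the one given in Figure 5, which is not reproduced in this text; the tags generated here are purely hypothetical placeholders, as are the sample names and the function name.

```python
from itertools import product

SAMPLES = [f"sample{i}" for i in range(1, 11)]  # 10 samples, names are placeholders
SNPS = ["rs1799853", "rs671", "rs1801253"]       # the three sites of this example


def assign_tags(samples, snps, tag_length: int = 4) -> dict:
    """Give every (sample, SNP) pair a distinct tag of the given length.

    Tags are taken in alphabetical order, which is a hypothetical scheme and
    not the mapping of Figure 5.
    """
    pairs = [(sample, snp) for sample in samples for snp in snps]
    if len(pairs) > 4 ** tag_length:
        raise ValueError("not enough distinct tags of this length")
    tags = ("".join(bases) for bases in product("ACGT", repeat=tag_length))
    return dict(zip(pairs, tags))


if __name__ == "__main__":
    mapping = assign_tags(SAMPLES, SNPS)
    print(len(mapping), "of", 4 ** 4, "possible tags used")
```

A four-base tag allows 256 distinct values, so the 30 combinations of this example (10 samples by 3 sites) leave ample room to avoid tags that perform poorly in practice.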
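Each first adapter was bound separately to streptavidin-modified Myone magnetic beads (Invitrogen), so that each first adapter was immobilized on the bead surface. Reaction system and procedure: 200 ng of first adapter was mixed with 4 μL of Myone beads (about 4×10^7 beads) by vortexing and reacted for 30 min; the beads were washed twice with an appropriate volume of TE buffer (10 mM Tris-HCl, pH 8.0; 1 mM EDTA), separated by centrifugation, and resuspended for storage in 4 μL of binding buffer (10 mM Tris-HCl, pH 7.5; 1 mM EDTA; 1 M NaCl; 0.01% Triton X-100), giving magnetic beads carrying the immobilized first adapter.
2. Ligation of the second-round PCR product to the first adapter immobilized on magnetic beads.
The following reaction systems were set up according to the correspondence in Table 2: amplification product from step I, 20 μL; USER enzyme (1 U/μL, NEB, Cat#M5505S), 10 μL; buffer, 8 μL; T4 DNA ligase, 2 μL; magnetic beads carrying the first adapter, 0.4 μL; ddH2O to 40 μL.
The buffer is a solution containing 400 mM Tris, 100 mM MgCl2, 100 mM DTT, 5 mM ATP, and 25% PEG 6000, at pH 7.8.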
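The working concentrations of the buffer components in the cleavage-ligation reaction follow from the 8 μL of buffer added to a nominal 40 μL reaction. The sketch below works this out and also sums the listed component volumes; it assumes the nominal 40 μL as the final volume, and all names are illustrative.

```python
BUFFER_STOCK = {
    "Tris (mM)": 400.0,
    "MgCl2 (mM)": 100.0,
    "DTT (mM)": 100.0,
    "ATP (mM)": 5.0,
    "PEG 6000 (%)": 25.0,
}
BUFFER_UL = 8.0
NOMINAL_TOTAL_UL = 40.0  # assumption: the stated final volume is used as the total

# Listed volumes: amplification product, USER enzyme, buffer, T4 DNA ligase, beads
LISTED_VOLUMES_UL = [20.0, 10.0, 8.0, 2.0, 0.4]


def final_concentrations(stock: dict, buffer_ul: float, total_ul: float) -> dict:
    """Dilute every buffer component by the ratio buffer volume / total volume."""
    return {name: value * buffer_ul / total_ul for name, value in stock.items()}


if __name__ == "__main__":
    for name, value in final_concentrations(BUFFER_STOCK, BUFFER_UL, NOMINAL_TOTAL_UL).items():
        print(f"{name}: {value:g}")
    print(f"sum of listed volumes: {sum(LISTED_VOLUMES_UL):.1f} uL")
```

With these assumptions the buffer is used at one fifth of its stock strength: 80 mM Tris, 20 mM MgCl2, 20 mM DTT, 1 mM ATP, and 5% PEG 6000. The listed components already add up to 40.4 μL, slightly above the nominal 40 μL, so the figures should be read as approximate.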
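The reaction was run at 25 °C for 20 minutes to give the ligation product, that is, the library molecules to be sequenced.
After the reaction, the 30 tubes of ligation product were pooled into one tube and centrifuged at 2500 g for 3 min; the beads were captured with a magnet, the supernatant was removed, and the beads were washed twice with 50 μL TE and finally suspended in 50 μL TE, so that the library molecules to be sequenced were immobilized on the magnetic beads.
3. Addressable immobilization of the bead-bound library molecules.
The product of step 2 was spotted onto an isothiocyanate-modified sample slide (glass slide) and immobilized at 37 °C for 1 h, completing the addressable immobilization of the bead-bound library molecules.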
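III. Sequencing.
The library molecules immobilized on the solid-phase support in step II were sequenced.
In this example, sequencing was performed by ligation on the Pstar II A high-throughput gene sequencer from 深圳华因康基因科技有限公司. During sequencing,
the anchor primer used for the rs1799853 site was SEQ ID NO: 15; for the rs671 site, SEQ ID NO: 16; and for the rs1801253 site, SEQ ID NO: 17.
In addition, to distinguish different sample sources, the tag sequence on the first adapter must also be read; the anchor primer used to read the tag sequence was SEQ ID NO: 18.
It should be noted that the 5' ends of SEQ ID NO: 15-18 are all phosphorylated.
The overall sequencing order is shown in Figure 6. The SNP-site anchor mixture is an equimolar mixture of the three anchor primers SEQ ID NO: 15-17. Base1 is used to block sites on the library molecules where the anchor primers or probes bind nonspecifically, Base2 is used to read the SNP sites to be tested, and Base3-6 are used to read the tag sequence.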
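In addition, a first comparison experiment was carried out in this example: the same product from step II was sequenced in the order shown in Figure 7, that is, without Base1 of Figure 6.
Figure 8 shows the preliminary results for the SNP sites rs1799853, rs671, and rs1801253 of sample 2 in the experimental group and the first comparison group. Here R=A/G, Y=C/T, M=A/C, K=G/T, S=C/G, W=A/T, H=A/C/T, B=C/G/T, V=A/C/G, D=A/G/T, N=A/C/G/T; n indicates that the signal was too weak to determine which of A, C, G, and T was present.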
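The ambiguity codes listed for Figure 8 can be held in a small lookup table, as in this sketch. It covers only the codes named in the text plus the four plain bases, and it treats the lowercase n (signal too weak to call) as an empty set; the function names are illustrative.

```python
AMBIGUITY_CODES = {
    "A": set("A"), "C": set("C"), "G": set("G"), "T": set("T"),
    "R": set("AG"), "Y": set("CT"), "M": set("AC"), "K": set("GT"),
    "S": set("CG"), "W": set("AT"),
    "H": set("ACT"), "B": set("CGT"), "V": set("ACG"), "D": set("AGT"),
    "N": set("ACGT"),
}


def expand(code: str) -> set:
    """Return the bases a code stands for; lowercase 'n' means no call was possible."""
    if code == "n":
        return set()
    return set(AMBIGUITY_CODES[code])


def is_compatible(code: str, base: str) -> bool:
    """True if the given base is one of the bases allowed by the code."""
    return base in expand(code)


if __name__ == "__main__":
    print(sorted(expand("R")))      # bases covered by R
    print(is_compatible("Y", "A"))  # A is not a pyrimidine
```

Keeping the uppercase N (any base) distinct from the lowercase n (no call) matters here, so the lookup deliberately does not normalize case.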
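The results show that, for the same sample, the most frequent SNP sequence type was the same in the experimental group and the first comparison group, but its share differed markedly: in the experimental group the most frequent SNP sequence type accounted for more than 95% of detections in every case, whereas in the first comparison group it accounted for roughly 85%.
The inventors also verified the second-round amplification products by Sanger sequencing as a second comparison experiment; the most frequent SNP sequence types in both the experimental group and the first comparison group agreed completely with the Sanger results.
To demonstrate the technical effect of the primer sets and anchor primers of the invention, the inventors also performed a third comparison experiment, again using the blood genomic DNA of the healthy individuals from the above example as template and simultaneously detecting the *2 site (rs1799853) of the CYP2C9 gene, the rs671 site of the ALDH2 gene, and the rs1801253 site of the ADRB1 gene.
For each of these sites, one inner primer pair and one outer primer pair were designed, as follows:
rs1799853 inner primer pair: SEQ ID NO: 19 and SEQ ID NO: 20; rs1799853 outer primer pair: SEQ ID NO: 3 and SEQ ID NO: 4; rs671 inner primer pair: SEQ ID NO: 21 and SEQ ID NO: 22; rs671 outer primer pair: SEQ ID NO: 7 and SEQ ID NO: 8; rs1801253 inner primer pair: SEQ ID NO: 23 and SEQ ID NO: 24; rs1801253 outer primer pair: SEQ ID NO: 11 and SEQ ID NO: 12.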
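I. Obtaining amplification products containing the SNP sites to be tested.
1. First-round PCR amplification.
Using blood genomic DNA as template and the outer primer pairs described above, the sequence containing each SNP site was amplified separately. Reaction system: F primer (10 μM), 0.4 μL; R primer (10 μM), 0.4 μL; dNTPs (2.5 mM each), 2 μL; blood genomic DNA, 50 ng; Ex Taq (5 U/μL), 0.1 μL; 10× Ex Taq Buffer, 2 μL; ddH2O to 20 μL.
PCR conditions: 94 °C for 5 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 25 s; 72 °C for 3 min.
The resulting product is the first-round PCR product.
2. Second-round PCR amplification.
Using the first-round PCR product as template and the inner primer pairs described above, the sequence containing each SNP site was amplified separately. Reaction system: F primer (10 μM), 0.4 μL; R primer (10 μM), 0.4 μL; dNTPs (2.5 mM each), 2 μL; first-round PCR product, 0.2 μL; Ex Taq (5 U/μL), 0.1 μL; 10× Ex Taq Buffer, 2 μL; ddH2O to 20 μL.
PCR conditions: 95 °C for 3 min; 30 cycles of 94 °C for 20 s, 57 °C for 20 s, and 72 °C for 30 s; 72 °C for 3 min.
The resulting product is the second-round amplification product.
After the second round of amplification, agarose gel electrophoresis showed that the second-round products of all samples contained the target molecule, with a single band and no nonspecific bands, indicating that the second-round PCR had successfully produced the required amplification products.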
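II. Construction of the library molecules to be sequenced.
The sequencing library was constructed and the library molecules were addressably immobilized by the same method as in the preceding specific example.
III. Sequencing.
The library molecules immobilized on the solid-phase support in step II were sequenced.
In the third comparison experiment, sequencing was likewise performed by ligation on the Pstar II A high-throughput gene sequencer from 深圳华因康基因科技有限公司. During sequencing, the anchor primer used for the rs1799853 site was SEQ ID NO: 25; for the rs671 site, SEQ ID NO: 26; and for the rs1801253 site, SEQ ID NO: 27.
In addition, to distinguish different sample sources, the tag sequence on the first adapter must also be read; the anchor primer used to read the tag sequence was SEQ ID NO: 18.
It should be noted that the 5' ends of SEQ ID NO: 18 and SEQ ID NO: 25-27 are all phosphorylated.
Then, following the method of the preceding specific example with the anchor primers replaced, sequencing was carried out in the order of Figure 6; in addition, the product of step II of this example was also sequenced, with the anchor primers replaced, in the order of Figure 7 as a control.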
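The imaging results for the rs671 and rs1801253 sites are shown in Figure 9 (experimental and control groups). As Figure 9 shows, no fluorescence was produced at either site in either sequencing run, and sequencing failed; this differs markedly from the imaging results for rs671 and rs1801253 in the preceding specific example (Figure 10, experimental and control groups). The imaging result for the rs1799853 site in the experimental group is shown in Figure 11; compared with the experimental-group result for rs1799853 in the preceding specific example (Figure 12), it also differs markedly, with a clearly weaker fluorescence signal. The control-group imaging result for rs1799853, compared with the control-group result of the preceding specific example, likewise showed a clearly weaker fluorescence signal, as in the experimental group.
The inventors' analysis suggests that sequencing of rs671 and rs1801253 failed in both the experimental and control groups of this example because the sequences near these two sites contain complex structures that prevent the anchor primer from annealing, so that the anchor primer cannot be ligated to the probe and no signal is seen on imaging.
For the rs1799853 site, the fluorescence signals of the experimental and control groups in this example were both clearly weaker than those of the corresponding groups in the preceding specific example. The likely reason is that the anchor primer binds the library molecules less well in this example than in the preceding one.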
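In the second specific example of the invention, the inner primer pairs of the above example were used directly: SEQ ID NO: 1 and SEQ ID NO: 2 for rs1799853; SEQ ID NO: 5 and SEQ ID NO: 6 for rs671; and SEQ ID NO: 9 and SEQ ID NO: 10 for rs1801253. Samples 1-10 were amplified by PCR and then taken through steps II and III. The results were broadly consistent with those of the first specific example: in the experimental group the most frequent SNP sequence type accounted for more than 93% of detections in every case, whereas in the control group it accounted for roughly 80%. Overall, both the detection counts and the total detection counts of the experimental and control groups were lower than in the preceding example, and the proportion accounted for by the most frequent SNP sequence type decreased in both groups.
It should be noted that in the primers used in the above specific examples the cleavable site was U in every case and was placed in the upstream primer, and the cleavage sequence (corresponding to the first sticky end) was CGGU. The cleavable site may of course be placed in the downstream primer instead. The cleavage sequence can be designed as needed, provided it meets the requirements of the subsequent ligation, for example GCCU, AAAU, CGCU, GCCCU, CGGCU, or TTTTU. When the cleavage sequence is changed, the corresponding sequence of the first adapter must be adjusted accordingly.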
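Because the first adapter has to be redesigned whenever the cleavage sequence changes, a small helper can make the dependency explicit. The sketch below rests on two assumptions that are not stated in the source: that a cleavage sequence is a run of A, C, G, or T ending in a single U, and that cleavage at U releases the short fragment so that the strand paired with the cleavage sequence is left single-stranded. It only computes that paired strand by reverse complement (reading U as T) and does not model the enzyme chemistry or the actual adapter sequences (SEQ ID NO: 13 and 14).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_valid_cleavage_sequence(seq: str) -> bool:
    """Assumed rule: one or more of A/C/G/T followed by exactly one terminal U."""
    seq = seq.upper()
    return len(seq) >= 2 and seq.endswith("U") and set(seq[:-1]) <= set("ACGT")


def paired_strand(seq: str) -> str:
    """Reverse complement of the cleavage sequence, written 5' to 3', reading U as T."""
    if not is_valid_cleavage_sequence(seq):
        raise ValueError(f"unexpected cleavage sequence: {seq!r}")
    dna = seq.upper().replace("U", "T")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


if __name__ == "__main__":
    for cleavage in ["CGGU", "GCCU", "AAAU", "CGCU", "GCCCU", "CGGCU", "TTTTU"]:
        print(cleavage, "->", paired_strand(cleavage))
```

Under these assumptions, changing CGGU to another cleavage sequence changes the exposed strand, which is why the matching region of the first adapter must change with it.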
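In addition, in the above specific examples the library molecules were not amplified by single-molecule amplification after library construction but were taken directly into the subsequent sequencing step, which simplifies the procedure. The inventors also found experimentally that amplifying the library molecules by single-molecule amplification before sequencing gives essentially the same results as above, with somewhat higher signal on imaging.
In summary, the method of the invention can efficiently detect multiple SNP sites in multiple samples at the same time and give accurate, reliable results; it also effectively reduces erroneous signals in the results of sequencing by ligation and improves the accuracy of SNP detection.
It should be noted that the above are merely preferred embodiments of the invention and do not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.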

Claims (19)

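  1. A primer set comprising a first primer pair, characterized in that the upstream primer of the first primer pair consists of a first universal region and a first complementary region;
    the first complementary region is joined to the 3' end of the first universal region, and the first complementary region is fully complementary to a first sequence;
    the first sequence is a segment of the sequence in which the SNP site to be tested is located, lies on the 3' side of the SNP site to be tested, and has its 5' end 1 to 7 bp from the SNP site to be tested;
    the first universal region is not complementary to a second sequence;
    the second sequence is a segment of the sequence in which the SNP site to be tested is located that is joined to the 3' end of the first sequence.
  2. The primer set according to claim 1, characterized in that the first complementary region is 6 to 12 bp long.
  3. The primer set according to claim 1, characterized in that the first universal region is 6 to 16 bp long.
  4. The primer set according to claim 1, characterized in that the primer set further comprises a second primer pair, the first primer pair and the second primer pair being respectively the inner primer pair and the outer primer pair for amplifying the sequence in which the SNP site to be tested is located.
  5. The primer set according to claim 1, characterized in that the downstream primer of the first primer pair consists of a second universal region and a second complementary region;
    the second complementary region is joined to the 3' end of the second universal region;
    the second complementary region is a segment of the sequence in which the SNP site to be tested is located and lies on the 5' side of the SNP site to be tested;
    the second universal region differs from a third sequence;
    the third sequence is a segment of the sequence in which the SNP site to be tested is located that is joined to the 5' end of the second complementary region.
  6. The primer set according to claim 1, characterized in that the first universal region contains U.
  7. An anchor primer, characterized in that the anchor primer is fully complementary to a fourth sequence; the fourth sequence is a single-stranded nucleic acid molecule and is a segment of the amplification product of the first primer pair of the primer set of any one of claims 1 to 6;
    the fourth sequence comprises the first sequence and a first normalization region, the first normalization region being joined to the 3' end of the first sequence, and the 5' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested;
    or the fourth sequence comprises the first complementary region and a second normalization region, the second normalization region being joined to the 5' end of the first complementary region, and the 3' end of the fourth sequence being 1 to 7 bp from the SNP site to be tested.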
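  8. The anchor primer according to claim 7, characterized in that the first complementary region is 6 to 12 bp long.
  9. The anchor primer according to claim 7, characterized in that the first normalization region is 8 to 14 bp long.
  10. A kit, characterized in that it comprises the primer set of any one of claims 1 to 6.
  11. A kit, characterized in that it comprises the anchor primer of any one of claims 7 to 9.
  12. A library construction method, characterized in that it comprises the following steps:
    A. using the primer set of any one of claims 1 to 6, performing PCR amplification on the sample to be tested to obtain an amplification product containing the SNP site to be tested;
    B. ligating the amplification product containing the SNP site to be tested to an adapter to form library molecules to be sequenced.
  13. The library construction method according to claim 12, characterized in that step B is:
    ligating the amplification product containing the SNP site to be tested directly to an adapter to form library molecules to be sequenced, and addressably immobilizing them on a solid-phase support by means of microbeads.
  14. The library construction method according to claim 13, characterized in that step B comprises the following steps:
    B1. ligating the amplification product containing the SNP site to be tested directly to an adapter immobilized on a microbead, forming a library molecule to be sequenced that is immobilized on the microbead;
    B2. addressably immobilizing the microbeads obtained in step B1 on a solid-phase support.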
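  15. The library construction method according to claim 14, characterized in that the ligation reaction of step B1 is carried out in a cleavage-ligation reaction system comprising a ligase, a cleavage agent, a first adapter immobilized on microbeads, and a ligation buffer;
    the first universal region contains U;
    the cleavage agent specifically cuts at U and causes the amplification product containing the SNP site to be tested to form a first sticky end;
    the first adapter is a nucleic acid molecule containing a second sticky end that is fully complementary to the first sticky end.
  16. The library construction method according to claim 15, characterized in that the ligation buffer in step B1 contains PEG.
  17. A gene sequencing method, characterized in that it comprises the following steps:
    A. using the primer set of any one of claims 1 to 6, performing PCR amplification on the sample to be tested to obtain an amplification product containing the SNP site to be tested;
    B. ligating the amplification product containing the SNP site to be tested to an adapter to form library molecules to be sequenced;
    C. using the anchor primer of any one of claims 7 to 9, sequencing the library molecules to be sequenced to obtain sequence information for the SNP site to be tested.
  18. The gene sequencing method according to claim 17, characterized in that the sequencing method in step C is sequencing by ligation; a step D is further included between steps B and C, in which, using the anchor primer of any one of claims 7 to 9, one ligation sequencing reaction is performed on the library molecules to be sequenced with oligonucleotide probes lacking a fluorescent label in place of the fluorescently labeled oligonucleotide probes.
  19. The gene sequencing method according to claim 18, characterized in that step D comprises the following steps:
    D1. annealing the anchor primer to the library molecules to be sequenced;
    D2. adding the unlabeled oligonucleotide probes, a ligase, and the corresponding buffer, and carrying out the ligation reaction;
    D3. removing the ligation product by denaturation.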
PCT/CN2016/086973 2015-12-30 2016-06-24 Primer set, anchor primer, kit, library construction and gene sequencing method WO2017113655A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511007119.2A CN106929507A (zh) 2015-12-30 2015-12-30 Primer set, anchor primer, kit, library construction and gene sequencing method
CN201511007119.2 2015-12-30

Publications (1)

Publication Number Publication Date
WO2017113655A1 true WO2017113655A1 (zh) 2017-07-06

Family

ID=59224369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086973 WO2017113655A1 (zh) 2015-12-30 2016-06-24 Primer set, anchor primer, kit, library construction and gene sequencing method

Country Status (2)

Country Link
CN (1) CN106929507A (zh)
WO (1) WO2017113655A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
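CN111139315A (zh) * 2020-04-03 2020-05-12 杭州启棣生物技术有限公司 Method for high-throughput detection of respiratory viruses using next-generation sequencing, and application thereof
CN114807331A (zh) * 2022-05-12 2022-07-29 中国海洋大学 Nanopore sequencing method for short-chain DNA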

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108018348A (zh) * 2016-11-01 2018-05-11 广州康昕瑞基因健康科技有限公司 ALDH2 gene detection kit and detection method
CN108193284A (zh) * 2018-01-15 2018-06-22 武汉爱基百客生物科技有限公司 Efficient and rapid method for constructing a normalized full-length cDNA library

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103937896A (zh) * 2014-04-25 2014-07-23 深圳华因康基因科技有限公司 SNP genotyping method and kit
CN103981258A (zh) * 2014-04-25 2014-08-13 深圳华因康基因科技有限公司 Method and kit for detecting SNP sites of susceptibility genes

Also Published As

Publication number Publication date
CN106929507A (zh) 2017-07-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880460

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880460

Country of ref document: EP

Kind code of ref document: A1