CN111020019B - Method for gene fusion detection based on nanopore technology - Google Patents

Method for gene fusion detection based on nanopore technology

Info

Publication number
CN111020019B
Authority
CN
China
Prior art keywords
gene
dna
fusion
sequence
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010149484.1A
Other languages
Chinese (zh)
Other versions
CN111020019A (en)
Inventor
王伟伟
宋蕾
孙雪
田埂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geneis Beijing Co ltd
Original Assignee
Geneis Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geneis Beijing Co ltd
Priority to CN202010149484.1A
Publication of CN111020019A
Application granted
Publication of CN111020019B
Active legal status
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for gene fusion detection based on nanopore technology. The method uses probes to capture and enrich long fragments that carry fusions within repetitive sequence regions, and then sequences the enriched long fragments by nanopore technology. The method overcomes the analytical difficulty, missed detection, and misjudgment of gene fusions that arise from read-length limitations and from fusion breakpoints located in repetitive sequences. It achieves accurate localization and rapid real-time detection of fusion genes, greatly improves the accuracy of the detection result, and establishes a rapid fusion gene detection method based on nanopore sequencing at the clinical level.

Description

Method for gene fusion detection based on nanopore technology
Technical Field
The invention relates to gene sequencing, in particular to a method for gene fusion detection based on a nanopore technology.
Background
Next-generation sequencing (NGS) for detecting somatic mutations is widely used in molecular testing for tumor diagnosis and treatment, including sequencing the DNA/RNA of specific genes to find mutations relevant to clinical diagnosis and treatment. Gene fusion accompanies many diseases, so detecting fusion genes is important for their diagnosis and treatment.
When testing somatic cells for abnormalities, most researchers currently choose a targeted sequencing scheme, which markedly reduces both the sequencing cost and the burden of data analysis. However, the breakpoints of fusion genes generally lie in intron regions, which are usually long and structurally complex, so when DNA is tested, probe design is difficult and sequencing cost is high. Most common tumor somatic testing schemes are based on targeted DNA sequencing on second-generation platforms, or combine targeted DNA sequencing with auxiliary methods such as qPCR and FISH for fusion detection. For third-generation sequencing, such as nanopore sequencing, the only existing protocol is targeted sequencing after PCR enrichment.
Disclosure of Invention
In view of the fact that the prior art performs third-generation targeted sequencing only after PCR enrichment, the inventors, after in-depth research, overcame the problem that probes capture long fragments inefficiently, particularly long fragments in non-unique sequence regions, to the point that the captured material cannot be used for downstream detection, and provide a method for gene fusion detection that combines probe capture of long fragments with nanopore technology. Specifically, the present invention includes the following.
A method for gene fusion detection within a genome non-unique sequence based on nanopore technology, comprising the steps of:
(1) enrichment of the target fusion region: obtaining capture fragments 1-3 kb in length from DNA derived from a biological sample; mixing them with human Cot-1 DNA; evaporating to dryness; redissolving in an optimized hybridization solution; incubating at room temperature; holding at 93-97 ℃ for 5-15 minutes; adding a probe set composed of a plurality of probes; cooling slowly to the hybridization temperature at 0.1 ℃ per minute to improve the specificity of target capture and the binding stability of long fragments; hybridizing at 63-67 ℃ for 4-16 hours; mixing the product with streptavidin magnetic beads and incubating for 45 minutes; washing the magnetic beads with a washing solution; enriching by synthesis using random primers, dNTPs, and an enzyme with second-strand synthesis activity that uses single-stranded DNA as the template, so as to preserve the integrity and fidelity of the long double-stranded fragments and the proportion of long fragments in the overall fragment distribution; and purifying with adsorption magnetic beads followed by quality control to obtain enriched DNA;
(2) library construction and sequencing: constructing a library from the enriched DNA, ligating barcode adapters to the ends, enriching with barcode primers, purifying, and performing quality control, then passing the target sequence or a part of it through a nanopore located in a chip, wherein the chip is arranged near an electrode that can detect the current passing through the nanopore; wherein the probe set consists of a plurality of probes, each probe comprises a first sequence of length x directly connected to a second sequence of length y, the first sequence is complementary to a first region located in a first gene, the second sequence is complementary to a second region located in a second gene, and the first region and/or the second region is located in a genome non-unique sequence; and wherein, with R1 denoting the number of repetitions of the first region and R2 the number of repetitions of the second region, x and y are each natural numbers between 40 and 150, x + y is 100 to 300, x is greater than y when 1/R2 ≤ R1/R2 < 1, x = y when R1/R2 = 1, and x is less than y when 1 < R1/R2 ≤ R1.
According to the method for gene fusion detection in genome non-unique sequences based on the nanopore technology, the genome non-unique sequence is preferably a repetitive sequence or a highly homologous sequence.
According to the method for gene fusion detection in genome non-unique sequences based on the nanopore technology, the highly homologous sequence is preferably a pseudogene sequence or a gene family sequence.
According to the method for detecting gene fusion in genome non-unique sequences based on the nanopore technology, preferably, DNA derived from a biological sample is sheared to a length of 1-3 kb by adjusting the ultrasonic frequency and energy and/or optimizing the shearing time and reaction system, and the sheared fragments are used directly as the fragments for capture; alternatively, by adjusting the ultrasonic frequency and energy and/or optimizing the shearing time and reaction system, DNA from the biological sample is sheared, the sheared fragments are ligated to adapters to construct a library, and the fragments in the library are used as the fragments for capture.
According to the method for detecting gene fusion in a genome non-unique sequence based on the nanopore technology, the DNA derived from the biological sample is preferably DNA directly extracted from the biological sample or cDNA obtained by reverse transcription of RNA extracted from the biological sample.
According to the method for gene fusion detection in genome non-unique sequences based on nanopore technology, the first region and/or the second region are/is preferably located in an intron region.
According to the method for gene fusion detection in genome non-unique sequences based on the nanopore technology, the first region and the second region are preferably both located in a repetitive sequence.
According to the method for detecting gene fusion in a genome non-unique sequence based on the nanopore technology, preferably, when the first region is located in a repetitive sequence and the second region is not, x is adjusted to between 40 and (x + y)/2; when the second region is located in a repetitive sequence and the first region is not, y is adjusted to between 40 and (x + y)/2.
According to the method for detecting gene fusion in non-unique sequences of genomes based on the nanopore technology, preferably, the hybridization solution comprises SSPE, Denhardt's solution, and SDS, wherein the SSPE contains 180-250 g/L sodium chloride and 8-12 g/L EDTA sodium salt and the SDS content is 0.3-0.8%. Adjusting the ionic strength and detergent proportion of the hybridization solution and optimizing the hybridization conditions increases the stringency of hybridization, which improves the overall enrichment efficiency and target specificity for long fragments containing non-unique sequences.
Existing methods for detecting tumor somatic mutations have several problems: the result is not accurate enough (testing at the DNA level only), the workflow is not unified (targeted DNA sequencing plus qPCR or FISH), or the volume of RNA-Seq data is large. The present method improves the accuracy of the result through probe capture, nanopore sequencing reads, and rapid real-time sequencing analysis. The invention establishes a rapid fusion gene detection method based on probe capture and nanopore sequencing at both the experimental level and the clinical level.
The method of the invention effectively improves the detection rate of fusion genes, particularly when the fusion breakpoint lies in or near a repetitive sequence region. Because standard nanopore sample preparation requires a relatively large amount of starting material, the steps from extraction and fragmentation through capture are optimized to reduce the starting amount without compromising fusion detection or probe capture efficiency, so that the enriched DNA fragments are 1-3 kb long and the read length and quality of nanopore sequencing are ensured.
Drawings
FIG. 1 is a flow diagram of an exemplary EML4-ALK fusion detection method of the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention, but rather as a more detailed description of certain aspects, features, and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that the upper and lower limits of the range, and each intervening value therebetween, is specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.
The invention provides a gene fusion detection method based on probe capture and nanopore technology, which comprises at least (1) a step of enriching the target fusion region and (2) a step of third-generation library construction and sequencing, as described in detail below.
Enrichment step of target fusion region
Step (1) of the present invention is a step of enriching a target fusion region with capture probes. Unlike probe capture in second-generation sequencing, the fragments to be captured here are longer, reaching 1 kb or more, for example 1-3 kb, preferably 3 kb or more, and more preferably 5 kb or more. Shorter nucleic acid fragments, such as 100-500 bp, are known to be more favorable for probe capture. As a nucleic acid molecule becomes longer, its molecular structure becomes more complex and the efficiency of capture by the probe drops sharply. This is why enrichment by probe capture followed by second-generation sequencing is a relatively mature technology, whereas no mature technology combining probe capture with nanopore sequencing has existed so far.
The capture process comprises: obtaining capture fragments 1-3 kb in length from DNA derived from a biological sample; mixing them with human Cot-1 DNA; evaporating to dryness; redissolving in an optimized hybridization solution; incubating at room temperature; holding at 93-97 ℃ for 5-15 minutes; adding a probe set consisting of a plurality of probes; hybridizing at 63-67 ℃ for 4-16 hours; mixing the product with streptavidin magnetic beads and incubating for 45 minutes; washing the magnetic beads with a washing solution; enriching with random primers, dNTPs, and a double-strand synthesis enzyme with exonuclease activity; and purifying with magnetic beads followed by quality control.
In the invention, the optimized hybridization solution has an ionic strength and detergent proportion suited to long-fragment capture. An exemplary hybridization solution includes SSPE, Denhardt's solution, and SDS, wherein the SSPE contains 180-250 g/L sodium chloride and 8-12 g/L EDTA sodium salt, and the SDS content is 0.3-0.8%.
In the present invention, the capture fragments may be obtained by DNA fragmentation. Such means are known in the art and are described in standard textbooks, for example Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, fourth edition). Preferably, the DNA derived from the biological sample is sheared by adjusting the ultrasonic frequency and energy and/or optimizing the shearing time and reaction system, and the sheared fragments are used directly as capture fragments.
In the present invention, the biological sample is not limited, and examples generally include, but are not limited to, a tissue sample or a fluid sample. The tissue sample includes a somatic cell sample, for example diseased tissue such as cancer tissue, or normal tissue. The fluid sample includes blood or components thereof such as plasma and serum. The biological sample may be any sample of mammalian origin, including a human sample.
In the present invention, the DNA may be deoxyribonucleic acid isolated directly from a biological sample, or deoxyribonucleic acid, i.e., cDNA, obtained by reverse transcription of ribonucleic acid (for example, mRNA) isolated from a biological sample. The reverse transcription may be a natural reverse transcription or an artificially controlled reverse transcription.
In the present invention, a probe set comprising a plurality of probes is contacted and reacted with the capture fragments under conditions suitable for hybridization to obtain a complex; that is, the capture process is performed. Contacting and reacting here means that, when a probe sequence is aligned with a capture fragment sequence, the 5' end of one sequence pairs with the 3' end of the other. Complementary pairing need not be perfect; a stable duplex may contain mismatched or unpaired bases. "Complementary" as used herein means that two nucleic acid sequences bind to each other in a sequence-specific manner through hydrogen bonding, with the purine and/or pyrimidine bases forming a double-stranded nucleic acid complex according to Watson-Crick rules, or that a nucleic acid sequence or modified nucleic acid sequence forms a nucleic acid duplex with another sequence according to Watson-Crick rules.
In the present invention, the number of probes in a probe set is not limited, and may be changed as necessary. Generally, the number of probes is 50 or more, preferably 100 or more, more preferably 500 or more, for example, 1000 or 2000. In the present invention, each probe comprises a first sequence and a second sequence directly linked to each other. Wherein the first sequence is complementary to a first region located within the first gene and the second sequence is complementary to a second region located within the second gene. Preferably, the first region of the invention is a fragment within a first gene and the second region is a fragment within a second gene, wherein the first region and/or the second region are located within a genome non-unique sequence. Examples of genomic non-unique sequences include, but are not limited to, repetitive sequence regions and highly homologous sequences. The term "highly homologous sequence" refers to a sequence having a sequence identity of 90% or more, preferably 95% or more, more preferably 97% or more, and still more preferably 99% or more. Examples of such highly homologous sequences include transposable regions and conserved regions of pseudogenes, gene families, and the like.
In the present invention, the lengths of the first sequence and the second sequence are adjusted according to the number of repetitions of the first region and the number of repetitions of the second region. Specifically, let the length of the first sequence be x, the length of the second sequence be y, the number of repetitions of the first region be R1, and the number of repetitions of the second region be R2, where x and y are each natural numbers between 40 and 150 and x + y is 100 to 300. Then x is made greater than y when 1/R2 < R1/R2 < 1, x = y when R1/R2 = 1, and x is made smaller than y when 1 < R1/R2 < R1. In other words, the sequence that targets the less repeated region is the longer one.
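As an illustration only, the length rule above can be sketched in Python. The patent fixes only the inequalities between x and y; the inverse-proportional split, the function name `arm_lengths`, and the default total length of 120 nt (taken from the 120 nt probes of the Examples) are assumptions made for this sketch and are not part of the disclosed method.

```python
def arm_lengths(r1, r2, total=120):
    """Pick probe arm lengths (x, y) for repeat counts r1 and r2.

    Constraints taken from the text: 40 <= x, y <= 150, 100 <= x + y <= 300,
    x > y when r1 < r2, x == y when r1 == r2, x < y when r1 > r2.
    The proportional split itself is an assumption of this sketch.
    """
    if r1 < 1 or r2 < 1:
        raise ValueError("repeat counts must be at least 1")
    if not 100 <= total <= 300:
        raise ValueError("x + y must be between 100 and 300")
    lo = max(40, total - 150)   # smallest x that keeps y <= 150
    hi = min(150, total - 40)   # largest x that keeps y >= 40
    if r1 == r2:
        if total % 2:
            raise ValueError("x = y needs an even total length")
        return total // 2, total // 2
    # Assumption: give each arm a share inversely proportional to its repeat count.
    x = round(total * r2 / (r1 + r2))
    x = max(lo, min(hi, x))
    # Enforce the strict inequality required by the text.
    if r1 < r2 and x <= total - x:
        x = total // 2 + 1
    elif r1 > r2 and x >= total - x:
        x = (total - 1) // 2
    if not lo <= x <= hi:
        raise ValueError("no valid split for this total length")
    return x, total - x


if __name__ == "__main__":
    for r1, r2 in [(1, 1000), (5, 5), (1000, 1), (100, 101)]:
        print(r1, r2, arm_lengths(r1, r2))
```

With a unique first region and a highly repeated second region, the sketch pushes x to the largest value that still leaves y at its 40 nt minimum.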
In certain embodiments, when the first region is within a repetitive sequence and the second region is not, x is adjusted to between 40 and (x + y)/2; when the second region is within a repetitive sequence and the first region is not, y is adjusted to between 40 and (x + y)/2. The ratio of x to y affects how well the probe binds the fusion fragment, which is present in small amounts among a large number of repeats, and therefore affects enrichment. For fusion breakpoints within intron regions, especially within repetitive sequences that make up a large proportion of an intron, this greatly improves capture efficiency and detection accuracy. Fusion sites in these regions are often not readily detected by conventional fusion assays such as fluorescence in situ hybridization (FISH), RT-PCR, and immunohistochemistry, and even in next-generation sequencing (NGS) they are missed as false negatives.
In the present invention, the first gene and the second gene each denote a region of the genome consisting of a plurality of contiguous nucleotides; preferably, the region has a single function. The first gene and the second gene are preferably open reading frames (ORFs). In certain embodiments, the first gene of the invention includes the KIAA1549, API2, PML, Bcr, NUP98, PML, FIP1L1, E2A, ETV6, BCOR, ETV6, COL1A1, Mfn2, BCR, TMPRSS2, and EML4 genes. The second gene is a gene fused to the first gene, and the second gene of the invention includes the EGFR, MALT1, RARA, Abl, NRG, RARα, PDGFRA, PBX1, NTRK1, CCNB3, NTRK3, PDGFB, HSG, ABL, ETS, and ALK genes.
In certain embodiments, each probe has a first region corresponding to within the first gene, different probes correspond to different first regions, respectively, and a portion of the plurality of probes correspond to the plurality of first regions at different locations, thereby enabling the set of probes to cover the entirety of the first gene. Similarly, each probe also has a second region corresponding to within a second gene, different probes corresponding to different second regions, respectively, and another portion of the plurality of probes may also correspond to multiple second regions at different locations, thereby enabling the set of probes to also cover the entirety of the second gene. In certain embodiments, probes directed to a known fusion type, e.g., a high frequency fusion site of a first gene and a second gene, are included in a probe set of the invention.
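A minimal sketch of how such a probe set might be assembled is given below, under stated assumptions. The toy gene sequences, the helper names `windows` and `chimeric_probes`, the step size, and the choice to pair every first-gene window with every second-gene window are all hypothetical; the patent does not specify how windows are paired or tiled.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def windows(seq, length, step):
    """Return (start, subsequence) tiles of the given length along seq."""
    return [(i, seq[i:i + length]) for i in range(0, len(seq) - length + 1, step)]


def chimeric_probes(gene1, gene2, x, y, step):
    """Build probes whose first arm (x nt) targets gene1 and second arm (y nt) targets gene2.

    Each probe is the reverse complement of a hypothetical fused target
    (gene1 window directly followed by gene2 window), so it is complementary
    to both regions. All-against-all pairing is an assumption of this sketch.
    """
    probes = []
    for (i, first), (j, second) in product(windows(gene1, x, step),
                                           windows(gene2, y, step)):
        target = first + second  # hypothetical fused strand, 5'->3'
        probes.append((i, j, reverse_complement(target)))
    return probes


if __name__ == "__main__":
    # Toy sequences only; real probes would be designed against genomic sequence.
    gene1 = "ACGT" * 30
    gene2 = "TTGCA" * 24
    probe_set = chimeric_probes(gene1, gene2, x=60, y=60, step=30)
    print(len(probe_set), "probes of", len(probe_set[0][2]), "nt")
```

Tiling both genes in this way is one simple route to a probe set that spans each gene; probes for known high-frequency fusion sites could be added to the same list.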
In certain embodiments, the first region or the second region of the invention is located within an intron of a gene, respectively. In further embodiments, the first region or the second region of the invention is located within a repeat region of a gene, respectively.
In certain embodiments, the probe may further comprise a functional group for separation, such as biotin, etc. The probe can be bound to or separated from a carrier or the like by such a functional group. For example, the probe is immobilized in advance on a carrier such as a magnetic bead or a substrate. In a specific embodiment, the target sequence-containing fragment is directly hybridized with the biotin-labeled probe, and then the fragment is anchored to the avidin-containing magnetic bead by a biotin avidin reaction, and the non-target sequence-containing fragment is washed away.
Library construction sequencing procedure
Step (2) of the present invention is a step of constructing a third-generation sequencing library. Specifically, the step comprises constructing a library from the enriched DNA, optionally after end repair, ligating barcode adapters to the ends, enriching with barcode primers, purifying, and performing quality control, then passing the target sequence or a part of it through a nanopore located on a chip, wherein the chip is arranged near an electrode that can detect the current passing through the nanopore.
Preferably, the probe is separated from the complex under conditions suitable for probe separation, and the resulting capture fragments are ligated to nanopore adapters to construct a library, with the fragments in the library serving as the target sequences. The library may be constructed by conventional procedures in the art, for example using the PCR Barcoding Kit from Oxford Nanopore Technologies. In particular embodiments, as the target sequence or a portion of it traverses the nanoscale channel, the chemical differences among the bases A, T, G, and C produce corresponding changes in the electrical signal of the nanopore, and detecting these changes allows the nucleic acid sequence of the target to be determined, for example on a nanopore sequencing platform such as the MinION from Oxford Nanopore Technologies (ONT).
Examples
The flow chart of the nanopore-based EML4-ALK fusion detection method is shown in FIG. 1. The method comprises the following steps:
1. enrichment of fusion regions of interest
1) Taking 1 μg of cell line and tissue gDNA, shearing it to 1-3 kb with a Covaris M220, mixing with 5 μg of human Cot-1 DNA, evaporating to dryness at 60 ℃ with a vacuum suction filter pump, redissolving in the optimized hybridization solution (containing 10× SSPE, 10× Denhardt's solution, and 0.3% SDS), incubating for 10 min at room temperature, holding at 95 ℃ for 10 min, adding the mixed 120 nt ssDNA probes (probe sequences shown in SEQ ID No. 1-63), cooling slowly to the hybridization temperature at 0.1 ℃ per minute, and hybridizing at 65 ℃ for 4-16 h.
2) Mixing the product obtained in the step 1) with streptavidin magnetic beads, incubating for 45min on a PCR instrument, and subsequently cleaning the magnetic beads with a cleaning solution.
3) Enriching the product of step 2) using random primers, dNTPs, Klenow exo-, and the like; after purification with Agencourt AMPure XP magnetic beads, performing quality control with Qubit 4.0 and Agilent 2100 capillary electrophoresis.
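The timing implied by the slow cooling ramp in sub-step 1) can be worked out with a short calculation, sketched here in Python. Only the durations stated in the text are counted (room-temperature incubation, the 95 ℃ hold, the ramp, hybridization, and bead incubation); evaporation, washing, enrichment, and clean-up times are not given in the source and are left out. The function names are illustrative.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Minutes needed to cool from start_c to end_c at a constant rate."""
    return round((start_c - end_c) / rate_c_per_min, 1)


def capture_minutes(hyb_hours):
    """Sum of the stated durations of the capture step, in minutes."""
    room_temp = 10                     # room-temperature incubation
    denature = 10                      # hold at 95 C
    ramp = ramp_minutes(95, 65, 0.1)   # slow cooling to the hybridization temperature
    beads = 45                         # streptavidin bead incubation
    return room_temp + denature + ramp + hyb_hours * 60 + beads


if __name__ == "__main__":
    print("ramp:", ramp_minutes(95, 65, 0.1), "min")
    for hours in (4, 16):
        total = capture_minutes(hours)
        print(f"{hours} h hybridization: {total} min ({total / 60:.1f} h)")
```

The ramp alone takes 300 minutes, so with the shortest 4 h hybridization the stated steps add up to roughly 10 hours, and with 16 h to roughly 22 hours.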
2. Library construction
1) Taking 100 ng each of the sample DNA and the enriched DNA library obtained in step 1, constructing libraries with the PCR Barcoding Kit from Oxford Nanopore Technologies, and performing end repair with NEB Ultra II End-prep reaction buffer and Ultra II End-prep enzyme mix;
2) ligating Barcode Adapters to the product of step 1) using Blunt/TA Ligase Master Mix;
3) enriching the product of step 2) with LongAmp Taq 2x Master Mix and Barcode Primers, purifying with Agencourt AMPure XP magnetic beads, and performing quality control with Qubit 4.0 and Agilent 2100 capillary electrophoresis;
4) pooling the libraries with different barcodes for sequencing.
3. Results and analysis:
Sequencing was performed on a MinION, yielding 2.3G of data, followed by alignment and fusion breakpoint analysis with a bioinformatics pipeline.
The breakpoint in the test sample lies in an interspersed repetitive sequence region within intron 13 of the EML4 gene, which affects capture efficiency. In addition, in bioinformatic analysis of NGS data, the read-length limitation causes the EML4-end sequence to align to multiple chromosomal locations; reads whose chromosomal location cannot be pinpointed are often filtered out, producing false negative results.
As shown in Table 1 below, NGS sequencing detected no valid fusion reads, whereas direct nanopore sequencing did detect fusion reads, and after probe capture and enrichment the number of fusion reads increased by one to two orders of magnitude. For structural variation detection, breakpoint detection and genomic localization are accurate.
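The filtering effect described above can be illustrated with a toy model. The patent does not describe its bioinformatics pipeline, so the `Segment` record, the mapping-quality threshold of 20, and the `supports_fusion` function are hypothetical and serve only to show why a read anchored solely in a repeat is discarded while a longer read that also covers unique flanking sequence is kept.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    gene: str   # gene overlapped by the aligned segment (hypothetical annotation)
    mapq: int   # mapping quality; 0 means several equally good locations


def supports_fusion(segments, gene_a="EML4", gene_b="ALK", min_mapq=20):
    """A read supports the fusion only if both genes have a confidently placed segment."""
    confident = {s.gene for s in segments if s.mapq >= min_mapq}
    return gene_a in confident and gene_b in confident


if __name__ == "__main__":
    # Short read: the EML4 part falls entirely inside a repeat and maps ambiguously.
    short_read = [Segment("EML4", 0), Segment("ALK", 60)]
    # Long read: extends past the repeat into unique EML4 sequence.
    long_read = [Segment("EML4", 0), Segment("EML4", 60), Segment("ALK", 60)]
    print("short read kept:", supports_fusion(short_read))
    print("long read kept:", supports_fusion(long_read))
```

In this model the short read is dropped because its only EML4 segment has ambiguous placement, which corresponds to the false negative scenario described for NGS data.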
TABLE 1
Sample | Raw reads | Raw bases | Fusion reads
Direct sequencing library | 175,717 | 80,119,036 | 2
Post-capture library | 156,884 | 41,937,986 | 50
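The enrichment implied by Table 1 can be checked with a short calculation. The numbers below are copied from the table; normalizing fusion reads by total reads or by total bases is an assumption of this sketch, since the source does not state how the enrichment factor was computed.

```python
# Values copied from Table 1.
ROWS = {
    "direct": {"reads": 175_717, "bases": 80_119_036, "fusion_reads": 2},
    "post_capture": {"reads": 156_884, "bases": 41_937_986, "fusion_reads": 50},
}


def fusion_fraction(row, per="reads"):
    """Fusion reads divided by total reads or total bases."""
    return row["fusion_reads"] / row[per]


def fold_enrichment(per="reads"):
    """Ratio of the post-capture fusion fraction to the direct-sequencing fraction."""
    return fusion_fraction(ROWS["post_capture"], per) / fusion_fraction(ROWS["direct"], per)


if __name__ == "__main__":
    for name, row in ROWS.items():
        print(name, f"{fusion_fraction(row) * 1e6:.1f} fusion reads per million reads")
    print(f"fold enrichment per read: {fold_enrichment('reads'):.1f}")
    print(f"fold enrichment per base: {fold_enrichment('bases'):.1f}")
```

Under these assumptions the enrichment is roughly 28-fold per read and roughly 48-fold per sequenced base, in line with the one to two orders of magnitude stated in the text.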
The cycle time for sample processing and capture is 24 hours, whereas the detection and analysis cycle of conventional NGS is typically about one week. The method of the present invention therefore has a significant advantage in turnaround time for fusion gene detection.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Many modifications and variations may be made to the exemplary embodiments of the present description without departing from the scope or spirit of the present invention. The scope of the claims is to be accorded the broadest interpretation so as to encompass all modifications and equivalent structures and functions.
Sequence listing
<110> Geneis (Beijing) Co., Ltd.
<120> method for gene fusion detection based on nanopore technology
<130>BH1900302-1
<141>2020-03-04
<160>63
<170>SIPOSequenceListing 1.0
<210>1
<211>100
<212>DNA
<213>Artificial Sequence
<400>1
caccccaaaa agaaagcctg tacccattag tagtcacttt ctatttctcc ctcccctcag 60
cccctaggta accaccaatt tcctttaggt ctctatagat 100
<210>2
<211>100
<212>DNA
<213>Artificial Sequence
<400>2
tgttttttga gatggagttt cactcttgtt gcccaggctg gagtgcagtg gtgcgatttc 60
ggctcactga acctccgcct cccaggttca agcgattctc 100
<210>3
<211>100
<212>DNA
<213>Artificial Sequence
<400>3
aagaaggtgt gtctttaatt gaagcatgat ttaaagtaaa tgcaaagcta aaaatcagat 60
atatggaaaa taattatttg tattatatag ggcagagtca 100
<210>4
<211>100
<212>DNA
<213>Artificial Sequence
<400>4
tgttagtctg gttcctccaa gaagcagact ggagatggga ttagacccaa tatggtctgc 60
agattttatt agaagaaatg cccatgagag gaaatgggga 100
<210>5
<211>100
<212>DNA
<213>Artificial Sequence
<400>5
aaggcaacag gtccccagct ctgaaactgc ccaagggaac agagaacctt aggagcagta 60
agatccctgt cactgggcat gtttaagtgg aggcaggatg 100
<210>6
<211>100
<212>DNA
<213>Artificial Sequence
<400>6
gccccttggt gggggtggta gagggcttat tctatagtag aggattttta agactccttc 60
aggagccatg acccaccttt cacacagtgg tcagagcact 100
<210>7
<211>100
<212>DNA
<213>Artificial Sequence
<400>7
cgagctgtgg caggtagggg agggacagaa agtttacaaa accgaatcca gggtgttctg 60
gaacccagaa accatttgtg gtcatgggcc aaatctcagg 100
<210>8
<211>100
<212>DNA
<213>Artificial Sequence
<400>8
catgctaaat taaataaagg agatagtttc cctttaccac tgaataagaa aattcaatta 60
ttttcttcag aaagtatacg tttgttgtgt taaataactg 100
<210>9
<211>100
<212>DNA
<213>Artificial Sequence
<400>9
tactacctaa ctcagtgaaa atcacatagt tttaaaaaat atgtacatta taaattttaa 60
agcagtaatt taaactttgc tcaacggtat tatggcttgt 100
<210>10
<211>100
<212>DNA
<213>Artificial Sequence
<400>10
tttaagtaaa ctaacggaac cacatactga aatataagac atgtactgaa atgacagata 60
tacgatttac aaggttccca taattatttc tactgtccta 100
<210>11
<211>100
<212>DNA
<213>Artificial Sequence
<400>11
accttgcatc agttttctct agtagctctc catgtattta tttttttccc taaaatgttg 60
atctttcaga gtacatctaa cttagattta acaacaaatt 100
<210>12
<211>100
<212>DNA
<213>Artificial Sequence
<400>12
ggtaaaggtt gctatggttc atttttctat atttagaaat aactctggaa tcacaaatcc 60
aattgaaggg ctaaaactta agctgaatac atgcagggac 100
<210>13
<211>100
<212>DNA
<213>Artificial Sequence
<400>13
aatttttgtg atctacatgc ctaagctctt tatgaactgt ctccatctca gtcaccttta 60
ccaacatggg cctaaagaaa ttaggatgct aggtttgcat 100
<210>14
<211>100
<212>DNA
<213>Artificial Sequence
<400>14
tttctttgaa taagatttca atatctcagt ttctctagac ttcagcttac tcacttcatg 60
caaaacagaa ataacatgat aggatcaaat gttatcgctg 100
<210>15
<211>100
<212>DNA
<213>Artificial Sequence
<400>15
ggctattgtg agtggcttaa aattataaaa gttttacaga cagaccaata ggatagacta 60
gagagttgag aagtagaccc caaagatgac aaacacgtga 100
<210>16
<211>100
<212>DNA
<213>Artificial Sequence
<400>16
cttgtatggc cctgttccca tgctgttaat tggtatcatt catcaatcac agcattcttc 60
ttcaaaaaca agatacagcc tcagaaatcc ttctcagtac 100
<210>17
<211>100
<212>DNA
<213>Artificial Sequence
<400>17
agcactccaa gaagccacta ccaacagaac aaaggtggca tcataaacaa ctcctatcta 60
gccttaactg tcctgataca tataaaaact gtaaaggagt 100
<210>18
<211>100
<212>DNA
<213>Artificial Sequence
<400>18
ggccgggccc ggtggctcat gcctgtaatc ccagcacttt gggaggccga ggtgggtgga 60
tcaagaggtc aggagatcga gaccatcctg gctaacacgg 100
<210>19
<211>100
<212>DNA
<213>Artificial Sequence
<400>19
tggataggag aaatcaaagg attatttaat gatgctagaa taatttgttt aactatttgg 60
gatattttcc taaaatcaag aaatacttct atgcttacag 100
<210>20
<211>100
<212>DNA
<213>Artificial Sequence
<400>20
cacctcccaa tctaccatct agcaaatcta ttagacaaac aaacaaaaaa agcaagatca 60
gtgtggcaga ctgggaacag aaaacccgtg ctgaatgtgt 100
<210>21
<211>100
<212>DNA
<213>Artificial Sequence
<400>21
attcttttat aaatacaaac catattctaa taatctattc taattttccc taggtagcca 60
gactaaggta gaaaggtgta tctgggccag gtacggtggc 100
<210>22
<211>100
<212>DNA
<213>Artificial Sequence
<400>22
actaaaaata caaaaaaaaa aaaaaaaaat tagcgaggca tggtggtggg cacctgtagt 60
cccagctact cgggaggctg aggcaggaga atggcgtgaa 100
<210>23
<211>100
<212>DNA
<213>Artificial Sequence
<400>23
cccaggaggc ggagcttgca gtgagctgac atcacgccac tgcgccccag cctgggtgac 60
agtgcgagac tccatctcaa aaaaaaagaa aaaaaacagg 100
<210>24
<211>100
<212>DNA
<213>Artificial Sequence
<400>24
tgtatctgaa gagttatgcc aacaggacac cccaattctc agagttccac atataaatat 60
attgcctaat ctaccttctt ttaccctaga ggcttaatcc 100
<210>25
<211>100
<212>DNA
<213>Artificial Sequence
<400>25
ctccatcttc tagactctga tgagttgcca gctgtaattc agagggacca ccaccaacat 60
cttacagagt cctcgttctc actgcctgac aaaaagagca 100
<210>26
<211>100
<212>DNA
<213>Artificial Sequence
<400>26
ggcaaaatcc ataagaagca tgtggaacct caaattccaa gtatgaatat attttcaaca 60
ttaaatatta ggtattaagt aataaaagca actcttgggg 100
<210>27
<211>100
<212>DNA
<213>Artificial Sequence
<400>27
ggaaaaactt gtttctagtc tcaccacctc agtataacct tacaaagaat aaagataaat 60
gacatttatt actgtcattg cagattacta tttactaagt 100
<210>28
<211>100
<212>DNA
<213>Artificial Sequence
<400>28
atctcacttt aaataaataa accaaaagtc ctctgctttt aatgtttggg tagagaagaa 60
aaatgaattg tgtttaccac tggagaacag gatcagaatc 100
<210>29
<211>100
<212>DNA
<213>Artificial Sequence
<400>29
ggaagaggta gagtttgtaa taataagcat gtattacttt tcttcctgac atagaaggtt 60
ccactccact tctaccttcg atgagcttat aaatcaattt 100
<210>30
<211>100
<212>DNA
<213>Artificial Sequence
<400>30
aggaaattaa gagaaaacca aagaagttat gggttgcaca ggatgcctaa gtagattaga 60
aaaatgaata aactgtcact tcaaaaacca ttaggagggc 100
<210>31
<211>100
<212>DNA
<213>Artificial Sequence
<400>31
catgacacaa ggggacgtgg gattctttat actgtggcta aaaaaaatag ttaatagctc 60
ttagttccat tctttttcct cctaaagttc ctgaatgtgg 100
<210>32
<211>100
<212>DNA
<213>Artificial Sequence
<400>32
agactaaatg ctgctgcaga agttaggctt cacacagttt gaaactgatc actttttaca 60
tgcatgtata atctcagagt tgttattaca cttaactggt 100
<210>33
<211>100
<212>DNA
<213>Artificial Sequence
<400>33
tcacctgagg tcaggagttc gagaccagcc tgaccaacat ggagaaaccc tgtctctact 60
aaaaatacag catcagccgg gtgtggtggt gcatgcttgt 100
<210>34
<211>100
<212>DNA
<213>Artificial Sequence
<400>34
cctactaata agtgaattat atctatattt cttaatatct gatttacaag taatatttta 60
tattgtttga tggtatgaag aagtaggaaa aagaaaaaga 100
<210>35
<211>100
<212>DNA
<213>Artificial Sequence
<400>35
agaaataatt caaaaatata ataaatggtt cagctgttaa aacatgttct agttacagaa 60
actcctgtgg acaagaaaga ccaaaacagt tttccacctg 100
<210>36
<211>100
<212>DNA
<213>Artificial Sequence
<400>36
aaaccgaaga attatgtttt tgggatgagg aattcctcca tgctttaaga agaaatctct 60
tacctccata tttcctaatc tttctcagga tttcaagagc 100
<210>37
<211>100
<212>DNA
<213>Artificial Sequence
<400>37
atgatgtact ttgtactgca catctgaaag ttcccattaa ctctgggcag acccagcaag 60
acaaggttat gctggtgcat gttaccccat gaactctgat 100
<210>38
<211>100
<212>DNA
<213>Artificial Sequence
<400>38
ccttccagta agtgtagaac ataagcagaa aatggggaac taaaagaaag gggaagaggg 60
aaaaggaagg aaggtggtgg aaaatgatct aaatgacact 100
<210>39
<211>100
<212>DNA
<213>Artificial Sequence
<400>39
aaatggtgcc aaaagacaag caggtcagaa aaaataacta atacacaaat aattagaaag 60
taacttacag ttgtaaatgc tgctagaaaa tactagacat 100
<210>40
<211>100
<212>DNA
<213>Artificial Sequence
<400>40
taacttcttt ttaacactta aacttccatt aaaattagct taccagaatc atagcttgga 60
ggcttcatat ctccctgatt caattctgtc ccgtatttca 100
<210>41
<211>100
<212>DNA
<213>Artificial Sequence
<400>41
tttctgctgc aagtacttct gatgttcaag atcgcctgtc agctcttgag tcacgagttc 60
agcaacaaga agatgaaatc actgtgctaa aggcggcttt 100
<210>42
<211>100
<212>DNA
<213>Artificial Sequence
<400>42
ggctgatgtt ttgaggcgtc ttgcaatctc tgaagatcat gtggcctcag tgaaaaaatc 60
agtctcaagt aaaggtaatt gtgttgtaaa gttaaaaaga 100
<210>43
<211>100
<212>DNA
<213>Artificial Sequence
<400>43
gtcttgcttt ttgcaatatt ttctttgaaa gttgaagctg gaaatataaa actagtttct 60
tatgtggatt acttgtgatt atagtttgtt ttccatttcg 100
<210>44
<211>100
<212>DNA
<213>Artificial Sequence
<400>44
tttttttaat tcccaaaaag ttctgaaagt ttattcttta ttatttaaaa taaagaattt 60
ttgtgtaatc cactgattat actcacaggt tttttatgtt 100
<210>45
<211>100
<212>DNA
<213>Artificial Sequence
<400>45
acagtatttg tgtgaagtta gtatcttcca actagatgat aagtttactc agggcaggaa 60
ctgaatcatc gttttgtgtc tctcatttca tattgaacat 100
<210>46
<211>100
<212>DNA
<213>Artificial Sequence
<400>46
aggtggtagg cactcatgtg ttcttaaaat tctgttgtca atacagaaac ataattaaaa 60
tcatcatagt ttaaatagct ttattctgaa ccctctgtgg 100
<210>47
<211>100
<212>DNA
<213>Artificial Sequence
<400>47
ttgttagcag aatcctgaaa aaattaattt aagctctgaa gcttacgagc ccagtataat 60
gggacttcta actttccagt attgggagtt ttcaaggttg 100
<210>48
<211>100
<212>DNA
<213>Artificial Sequence
<400>48
agacaaacag tcgtcaagac cctgttgctc tgagtttgga gcagatgtag tgtacctttt 60
cccttacttc aaccaccaag aaagaattta ttattctttc 100
<210>49
<211>100
<212>DNA
<213>Artificial Sequence
<400>49
acatgaatca agatttgaac atatcatttg atgtctctta ctgatttatt ggtagagtat 60
atggggatag gcgtgattat gcccattttg gcagtgggaa 100
<210>50
<211>100
<212>DNA
<213>Artificial Sequence
<400>50
aagtgaggac atttattggc ttgccttcct acttcgatgt caatgtaaat tattaaccta 60
gggaggagtt ttatataact cccaaattct agctgagtag 100
<210>51
<211>100
<212>DNA
<213>Artificial Sequence
<400>51
gagatagtgc agtcagccaa ccagcaaatg agaagcagat ttagaagaaa gattgtgaat 60
tcagtttctg aatgttgttt agggtgtata tggtatgtag 100
<210>52
<211>100
<212>DNA
<213>Artificial Sequence
<400>52
agatacatat tggtagctga attacatatg gatctagtgc tcatgttagt tttggctagc 60
agtaagaatt tgggaatgtt cctcttatgg atgatagttg 100
<210>53
<211>100
<212>DNA
<213>Artificial Sequence
<400>53
aagccataga ggatgtaaaa tgagaaggca gtccagcata gttccactgg gagctccaac 60
atttcagcat agcacaaaaa gtgatgcaca ttgccagctg 100
<210>54
<211>100
<212>DNA
<213>Artificial Sequence
<400>54
cttacagaga tgtcaaataa aacaaagact gagagatgac cattggattc ggcatttaca 60
gcaatattag aaacctctcc aaaagcaatt ttagtggagt 100
<210>55
<211>100
<212>DNA
<213>Artificial Sequence
<400>55
aatagaactg caaatcaaat accagtattt tgagaagcag caattaagaa taattgtaga 60
attgagagca acaaatatag ccactctcaa aaaattcacc 100
<210>56
<211>100
<212>DNA
<213>Artificial Sequence
<400>56
tgagtggaag acggtggtaa tagctaggag aggatgctat gttgaaggcc agttttaatt 60
actttttaaa atttgttaaa aatatgtgat agacttgaac 100
<210>57
<211>100
<212>DNA
<213>Artificial Sequence
<400>57
acaaagatag gttgaggaca aaaaggcagc agagaggaag accaatacgg ggaaatgggg 60
tggtaattga tgaggcaagt gtcctgagaa tttgagaagg 100
<210>58
<211>100
<212>DNA
<213>Artificial Sequence
<400>58
gatctttgag aaccacaact tctagccctg ataactgtga atatatatcg agataatgat 60
aacctttagg tgcaaagata ggtcgtcagt ggtagaaaaa 100
<210>59
<211>100
<212>DNA
<213>Artificial Sequence
<400>59
aagagaatgc atgtttgaaa tggttaaagc ggagaatgga agaccgtaga catttgaaag 60
agatgctgag agaccctgag gctatggagg cgggatactt 100
<210>60
<211>100
<212>DNA
<213>Artificial Sequence
<400>60
attgtggagg aaaggtgagt atcattctgc ccggttttgt gattcccact cttttccctc 60
ttggcctcat gtactctggg tctgtgtatt tacagtggag 100
<210>61
<211>100
<212>DNA
<213>Artificial Sequence
<400>61
agttagatgg aagaaaggat tgatcttttt cattagtgat agtgagataa atatgtcaat 60
gacaaaggac aaggggcctg agggtttggc aaaagacttg 100
<210>62
<211>100
<212>DNA
<213>Artificial Sequence
<400>62
ataaaaacca ctttgactaa taaagtgtta tattatgctt gcttgcttga tttatttatt 60
tatttattta tttatttaga gagtcaactt tgaggaccgc 100
<210>63
<211>100
<212>DNA
<213>Artificial Sequence
<400>63
caagatcaag ccagaaaaca attttgtttg gttatgaatt gatttttata ggaggatttt 60
ggatttttag agtaggataa ggagctagat ctgttaatgt 100
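Each record in the sequence listing above declares its length under <211> and gives the residues under <400>, so the two can be cross-checked mechanically. The sketch below is one way to do that, assuming the listing has been saved as plain text in the layout shown here; the function names and the SAMPLE constant are hypothetical, and SAMPLE reproduces only SEQ ID No. 1.

```python
import re

TAG = re.compile(r"^<(\d{3})>\s*(.*)$")  # numeric identifier lines such as <210>1
SEQ_LINE = re.compile(r"[acgtn ]+\d+")   # residues followed by a running count

# SEQ ID No. 1 as printed in the listing above.
SAMPLE = """<210>1
<211>100
<212>DNA
<213>Artificial Sequence
<400>1
caccccaaaa agaaagcctg tacccattag tagtcacttt ctatttctcc ctcccctcag 60
cccctaggta accaccaatt tcctttaggt ctctatagat 100
"""


def parse_sequence_listing(text):
    """Return {seq_id: (declared_length, residues)} for each <210> record."""
    records = {}
    current = None
    reading = False
    for raw in text.splitlines():
        line = raw.strip().lower()
        tag = TAG.match(line)
        if tag:
            code, value = tag.group(1), tag.group(2).strip()
            reading = False
            if code == "210":    # start of a new sequence record
                current = {"length": None, "residues": []}
                records[int(value)] = current
            elif code == "211" and current is not None:
                current["length"] = int(value)
            elif code == "400" and current is not None:
                reading = True   # residue lines follow
        elif reading and SEQ_LINE.fullmatch(line):
            # Drop the spaces and the trailing running count, keep the bases.
            current["residues"].append(re.sub(r"[^acgtn]", "", line))
    return {sid: (rec["length"], "".join(rec["residues"])) for sid, rec in records.items()}


def length_mismatches(records):
    """IDs whose declared <211> length differs from the residues actually read."""
    return [sid for sid, (declared, seq) in records.items() if declared != len(seq)]


if __name__ == "__main__":
    parsed = parse_sequence_listing(SAMPLE)
    print(len(parsed[1][1]), length_mismatches(parsed))
```

Lines that do not look like residue lines are ignored, so text that follows the last record is not appended to it.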

Claims (2)

1. A method for gene fusion detection based on nanopore technology, characterized by comprising the following steps:
(1) an enrichment step of the fusion region of interest,
obtaining capture fragments 1-3 kb in length from gDNA derived from a biological sample; mixing the capture fragments with human Cot-1 DNA, evaporating to dryness, redissolving in an optimized hybridization solution and incubating at room temperature; holding at 93-97 ℃ for 5-15 minutes; adding a probe set consisting of a plurality of probes; cooling slowly to the hybridization temperature at a rate of 0.1 ℃ per minute, so as to improve the specificity of target sequence capture and the binding stability of long fragments; hybridizing at 63-67 ℃ for 4-16 hours; mixing the product with streptavidin magnetic beads and incubating for 45 minutes; washing the magnetic beads with a washing solution; performing synthesis and enrichment using random primers, dNTPs and an enzyme having activity for synthesizing double-stranded DNA from a single-stranded DNA template, so as to ensure the integrity and fidelity of the long double-stranded fragments and the proportion of long fragments in the overall fragment distribution; and purifying by adsorption and performing quality control to obtain enriched DNA;
(2) a library construction and sequencing step,
constructing a library from the enriched DNA, ligating barcode adapters to the fragment ends, enriching with barcode primers, purifying and performing quality control, and then passing a target sequence or a part of the target sequence through a nanopore located in a chip, wherein the chip is arranged near an electrode capable of detecting the current passing through the nanopore;
wherein the probe set consists of a plurality of biotin-labeled probes, and the sequences of the probes are shown in SEQ ID Nos. 1-63;
the optimized hybridization solution comprises SSPE, Denhardt's solution and SDS, wherein the SSPE contains 180-250 g/L sodium chloride and 8-12 g/L EDTA sodium salt, and the SDS content of the optimized hybridization solution is 0.3-0.8%; the stringency of the hybridization process is increased by adjusting the ionic strength of the hybridization solution and the proportion of detergent and by optimizing the hybridization conditions, so that the enrichment efficiency for long fragments and the target specificity are improved overall;
wherein the gene fusion refers to the fusion of a first gene and a second gene, the first gene being the EML4 gene and the second gene being the ALK gene.
2. The method for gene fusion detection based on nanopore technology according to claim 1, wherein gDNA from the biological sample is fragmented to a length of 1-3 kb by adjusting the ultrasonic frequency and energy and/or optimizing the fragmentation time and system, and the resulting fragments are used directly as the capture fragments; or
gDNA from the biological sample is fragmented by adjusting the ultrasonic frequency and energy and/or optimizing the fragmentation time and system, adapters are ligated to the resulting fragments to construct a library, and the fragments in the library are used as the capture fragments.
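The temperature program recited in claim 1 fixes a cooling rate of 0.1 ℃ per minute between the 93-97 ℃ hold and the 63-67 ℃ hybridization, which determines how long the ramp takes. The sketch below works that out under the assumption of a linear ramp. The constant and function names are illustrative, and only the steps with a stated duration are summed (the room-temperature incubation, washing, synthesis and purification steps have no stated duration and are left out).

```python
# Ranges taken from claim 1; constant and function names are illustrative.
DENATURE_C = (93.0, 97.0)            # hold temperature range, deg C
HYBRIDIZE_C = (63.0, 67.0)           # hybridization temperature range, deg C
DENATURE_HOLD_MIN = (5.0, 15.0)      # hold time range, minutes
HYBRIDIZE_MIN = (4 * 60.0, 16 * 60.0)  # hybridization time range, minutes
BEAD_INCUBATION_MIN = 45.0           # streptavidin bead incubation, minutes
RAMP_RATE_C_PER_MIN = 0.1            # cooling rate, deg C per minute


def ramp_minutes(start_c, end_c, rate=RAMP_RATE_C_PER_MIN):
    """Minutes needed to cool linearly from start_c down to end_c."""
    if rate <= 0:
        raise ValueError("cooling rate must be positive")
    if end_c > start_c:
        raise ValueError("end temperature must not exceed start temperature")
    return (start_c - end_c) / rate


def timed_steps_minutes():
    """(shortest, longest) total minutes over the steps with stated durations."""
    shortest = (DENATURE_HOLD_MIN[0]
                + ramp_minutes(DENATURE_C[0], HYBRIDIZE_C[1])
                + HYBRIDIZE_MIN[0]
                + BEAD_INCUBATION_MIN)
    longest = (DENATURE_HOLD_MIN[1]
               + ramp_minutes(DENATURE_C[1], HYBRIDIZE_C[0])
               + HYBRIDIZE_MIN[1]
               + BEAD_INCUBATION_MIN)
    return shortest, longest


if __name__ == "__main__":
    low, high = timed_steps_minutes()
    print(f"ramp 95 -> 65: {ramp_minutes(95.0, 65.0):.0f} min")
    print(f"timed steps: {low / 60:.1f} h to {high / 60:.1f} h")
```

Under these assumptions the ramp alone takes between 260 and 340 minutes, and the timed steps add up to roughly 9 to 23 hours.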
CN202010149484.1A 2020-03-06 2020-03-06 Method for gene fusion detection based on nanopore technology Active CN111020019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010149484.1A CN111020019B (en) 2020-03-06 2020-03-06 Method for gene fusion detection based on nanopore technology


Publications (2)

Publication Number Publication Date
CN111020019A CN111020019A (en) 2020-04-17
CN111020019B true CN111020019B (en) 2020-06-19

Family

ID=70199416


Country Status (1)

Country Link
CN (1) CN111020019B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113862344A (en) * 2021-09-09 2021-12-31 Chengdu Qitan Technology Co., Ltd. Method and apparatus for detecting gene fusion
CN114561467B (en) * 2022-02-18 2023-06-09 Jiangsu Province Hospital of Chinese Medicine MET fusion gene detection method, kit and probe library

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110527714B (en) * 2019-09-06 2023-03-28 Geneis (Beijing) Co., Ltd. Method for detecting integration site of HPV in host genome



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant