CN111321202A - Gene fusion variation library construction method, detection method, device, equipment and storage medium - Google Patents
- Publication number
- CN111321202A (application number CN201911419273.9A)
- Authority
- CN
- China
- Prior art keywords
- gene
- fusion
- reads
- library
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Analytical Chemistry (AREA)
- Physics & Mathematics (AREA)
- Microbiology (AREA)
- Zoology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- Medicinal Chemistry (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Immunology (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Chemical & Material Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to a gene fusion variation library construction method, a gene fusion variation detection method and device, computer equipment, and a computer storage medium. The library construction method, the detection method, and the detection device are based on multigene RNA targeted sequencing with DNA probe hybridization capture: target fusion genes are captured by hybridization with fusion gene capture probes, and a gene fusion variation library is constructed from them. The invention further provides a quantitative analysis method for fusion genes, which calculates the variation ratio of a fusion gene and thereby yields an accurate expression value for it.
Description
Technical Field
The invention relates to the technical fields of molecular biology and bioinformatics, and in particular to a gene fusion variation library construction method, a gene fusion variation detection method and device, computer equipment, and a storage medium.
Background
Cytogenetic studies have found that a range of hematological tumors, including AML, ALL, CML, and NHL, carry multiple chromosomal translocations. These translocations cause abnormal expression of oncogenes and/or transcription of fusion genes, all of which promote the transformation and survival of cancer cells. The core driver genes (e.g., MLL, ALK) often have multiple fusion partners, and the same fusion gene may also have different breakpoints, giving rise to different subtypes. For example, the MLL gene has 54 known fusion partners, and the COSMIC database contains up to 15 fusion subtypes for the KMT2A-AFF1 fusion gene (https://cancer.sanger.ac.uk/COSMIC/fusion/overview?fid=359723&gid=271430). These fusion gene variations affect clinical prognosis and can guide the molecular typing and targeted therapy of hematological tumors. It is therefore desirable to develop a genomic detection reagent that identifies gene fusion variations in hematological tumors.
RT-PCR and fluorescence in situ hybridization (FISH) are two commonly used gene fusion detection techniques. Both detect only a single, specific, known type of gene fusion; their application range is narrow, their efficiency is low, and they cannot detect novel gene fusion variations. The shortcomings of fusion gene detection technology therefore still limit auxiliary diagnosis and precision medicine for hematological tumors.
Disclosure of Invention
In view of the above, it is necessary to provide a gene fusion variation library construction method, a gene fusion variation detection method and device, computer equipment, and a computer storage medium that have a wide application range and high detection efficiency and that can detect novel gene fusion variations.
A method for constructing a gene fusion variation library comprises the following steps:
extracting total RNA of the sample, and removing rRNA in the total RNA;
reverse transcribing the rRNA-depleted total RNA and synthesizing double-stranded cDNA, with dUTP used in place of dTTP when the second strand of the double-stranded cDNA is synthesized;
performing end repair on the synthesized double-stranded cDNA and ligating adapters to it;
enzymatically digesting the dUTP in the end-repaired, adapter-ligated double-stranded cDNA so that gaps are generated in the double-stranded cDNA;
amplifying the digested double-stranded cDNA to construct a cDNA pre-library;
hybridizing and capturing a target fusion cDNA in the cDNA pre-library using a fusion gene capture probe, the target fusion cDNA being formed by fusing at least two different genes, the fusion gene capture probe comprising a sequence capable of complementary pairing with a sequence of one of the genes of the target fusion cDNA;
and amplifying the captured target fusion cDNA to obtain the gene fusion variation library.
In one embodiment, the fusion gene capture probe is designed as follows:
(1) the fusion gene capture probe is designed against a core gene in the target fusion cDNA, wherein a core gene is a gene that has multiple partner genes and is prone to fusion variation, a key gene in a cell growth or proliferation signaling pathway, or a driver gene;
(2) the fusion gene capture probe is designed against the transcript sequence of the core gene;
(3) the fusion gene capture probe is designed against the core gene in the hg19 reference genome, with a coverage density of 2× tiling;
(4) the length of the fusion gene capture probe is 120 bp;
(5) during design, the fusion gene capture probe is aligned to the human transcriptome sequence and the total number of BLAST matches is counted; if the number of BLAST matches is not more than 50, the probe is qualified; if the number is more than 50, the probe is redesigned by replacing mismatched bases until it matches the target gene sequence as closely as possible while the number of BLAST matches is not more than 50.
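Design rules (3) to (5) can be illustrated with a minimal sketch. It assumes that 2× tiling means consecutive 120 bp probes are offset by half a probe length (60 bp), and that the BLAST match count for each probe is obtained separately; the function names `tile_probes` and `probe_is_qualified` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

PROBE_LENGTH = 120    # probe length in bp, as stated in rule (4)
TILING_DEPTH = 2      # 2x tiling, as stated in rule (3)
MAX_BLAST_HITS = 50   # qualification cutoff, as stated in rule (5)


def tile_probes(transcript: str,
                probe_length: int = PROBE_LENGTH,
                depth: int = TILING_DEPTH) -> list[tuple[int, str]]:
    """Return (start, probe_sequence) pairs tiling the transcript.

    Assumption: 2x tiling is realised by stepping half a probe length,
    so every interior base is covered by two probes.
    """
    if len(transcript) < probe_length:
        return []
    step = probe_length // depth
    last_start = len(transcript) - probe_length
    starts = list(range(0, last_start + 1, step))
    # Add a final probe so the 3' end of the transcript is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, transcript[s:s + probe_length]) for s in starts]


def probe_is_qualified(blast_hit_count: int,
                       max_hits: int = MAX_BLAST_HITS) -> bool:
    """A probe qualifies when its BLAST match count is not more than 50."""
    return blast_hit_count <= max_hits
```

A probe that fails `probe_is_qualified` would go back to redesign as described in rule (5); the redesign step itself is not sketched here because the text does not specify how the replacement bases are chosen.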
In one embodiment, the 5' end of the fusion gene capture probe is labeled with a linker for capture;
optionally, the linker is biotin or streptavidin.
In one embodiment, the sample total RNA is total RNA of a peripheral blood or bone marrow sample.
In one embodiment, the end repair is the addition of one dATP at the 3' end of the synthesized double-stranded cDNA;
the adapter format introduced by adapter ligation is P5-Real1primer-DNAINSERT-IndexReadprimer-index-P7, specifically: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-(DNA fragment sequence to be detected)-GTTCGTCTTCTGCCGTATGCTCTA-index-CACTGACCTCAAGTCTGCACACGAGAAGGCTAG-P, wherein P5 and P7 are adapters, Real1primer and IndexReadprimer are primer sequences, DNAINSERT is the DNA fragment sequence to be detected, index is a unique 12 nt sample tag, and P is a phosphate group.
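The layout of the adapter-ligated molecule can be sketched as simple string assembly. This is an illustration only: it assumes that the spaces inside the printed sequences are extraction artifacts, that the 5' block is the combined P5 and read 1 primer region, and that the 3' phosphate group is not represented in the string. The names `build_construct`, `P5_READ1`, `INDEX_READ_PRIMER`, and `P7` are hypothetical.

```python
# Sequences copied from the text with internal spaces removed (assumption:
# the spaces are artifacts of text extraction).
P5_READ1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_READ_PRIMER = "GTTCGTCTTCTGCCGTATGCTCTA"
P7 = "CACTGACCTCAAGTCTGCACACGAGAAGGCTAG"
INDEX_LENGTH = 12  # unique sample tag of 12 nt, as stated in the text


def build_construct(insert: str, index: str) -> str:
    """Assemble P5/read-1 primer, insert, index read primer, index, and P7."""
    for name, seq in (("insert", insert), ("index", index)):
        if not seq or set(seq.upper()) - set("ACGTN"):
            raise ValueError(f"{name} must be a non-empty DNA sequence")
    if len(index) != INDEX_LENGTH:
        raise ValueError(f"index must be {INDEX_LENGTH} nt long")
    return (P5_READ1 + insert.upper() + INDEX_READ_PRIMER
            + index.upper() + P7)
```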
In one embodiment, both the digested double-stranded cDNA and the captured target fusion cDNA are amplified using primers that pair with the P5 and P7 adapter sequences.
A gene fusion variation detection method comprises the following steps:
obtaining sequencing data of a gene fusion variation library, wherein the gene fusion variation library is an amplification library of a target fusion gene obtained by hybridizing and capturing a transcription sequence of a sample to be detected through a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence which can be complementarily paired with a sequence of one gene of the target fusion gene;
comparing the sequencing data with human transcriptome and genome data, and screening reads capable of being matched with at least two genes simultaneously;
and analyzing whether the reads that match at least two genes simultaneously meet a preset threshold requirement; if they do, the genes covered by those reads are determined to have undergone gene fusion.
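The screening step, selecting reads that match at least two genes at once, can be sketched as follows. The sketch assumes the alignment results have already been reduced to (read ID, gene name) pairs; that input format and the function name `reads_matching_two_genes` are assumptions made for illustration, not part of the described method.

```python
from collections import defaultdict


def reads_matching_two_genes(alignments):
    """Return {read_id: frozenset(genes)} for reads aligned to >= 2 genes.

    `alignments` is an iterable of (read_id, gene_name) tuples, one per
    aligned segment of a read (hypothetical, simplified input format).
    """
    genes_by_read = defaultdict(set)
    for read_id, gene in alignments:
        genes_by_read[read_id].add(gene)
    return {read_id: frozenset(genes)
            for read_id, genes in genes_by_read.items()
            if len(genes) >= 2}
```

Reads returned by this function are only fusion candidates; they still have to pass the preset threshold requirement before a fusion is reported.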
In one embodiment, before the step of aligning the sequencing data to human transcriptome and genome data and screening for reads that match at least two genes simultaneously, the method further comprises:
and performing quality evaluation on the sequencing data, and removing low-quality reads to obtain clean sequencing data.
In one embodiment, removing low-quality reads comprises:
removing reads that contain the adapter sequence;
removing reads in which bases with a quality value below 15 account for ≥ 50% of the read;
removing reads with an N content greater than 1%.
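The three removal rules can be sketched as a single predicate. The sketch assumes that base qualities are available as integer Phred scores and that the adapter is checked by exact substring match; real pipelines typically allow mismatches, and the function name `is_low_quality` is hypothetical.

```python
def is_low_quality(seq, quals, adapter,
                   min_quality=15, max_low_fraction=0.5,
                   max_n_fraction=0.01):
    """Return True if the read should be removed under the three rules.

    seq     -- read sequence (str)
    quals   -- per-base Phred quality scores (list of int, same length)
    adapter -- adapter sequence to look for (exact match; an assumption)
    """
    if not seq or len(seq) != len(quals):
        return True  # malformed reads are treated as low quality
    # Rule 1: the read contains the adapter sequence.
    if adapter and adapter.upper() in seq.upper():
        return True
    # Rule 2: bases with quality below 15 make up >= 50% of the read.
    low = sum(1 for q in quals if q < min_quality)
    if low / len(seq) >= max_low_fraction:
        return True
    # Rule 3: the N content of the read is greater than 1%.
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return True
    return False
```

Keeping only reads for which `is_low_quality` returns False yields the clean sequencing data used in the alignment step.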
In one embodiment, after the sequencing data are aligned to the human transcriptome and genome data, the method further comprises a step of rejecting false positive events in the clean sequencing data according to preset control criteria;
specifically, the screened gene fusion variation events are annotated to distinguish false events from true ones, and gene fusion variation events meeting the following criteria are removed:
different genes of the fusion gene are paralogous with each other;
different genes of the fusion gene are pseudogenes;
the gene fusion variation has been detected in healthy individuals.
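The three exclusion criteria can be sketched as a filter over candidate events. The lookup tables for paralog pairs, pseudogenes, and fusions seen in healthy controls are assumed to be supplied by the caller; the text does not say where they come from. The sketch also assumes that an event is removed when either partner is a pseudogene, which is one possible reading of the second criterion. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionEvent:
    gene_a: str
    gene_b: str

    @property
    def pair(self):
        # Unordered gene pair, so A-B and B-A are treated as the same fusion.
        return frozenset((self.gene_a, self.gene_b))


def remove_false_positives(events, paralog_pairs, pseudogenes,
                           seen_in_healthy):
    """Drop events matching any of the three exclusion criteria.

    paralog_pairs   -- set of frozenset({gene, gene}) paralogous pairs
    pseudogenes     -- set of pseudogene names
    seen_in_healthy -- set of frozenset pairs observed in healthy controls
    """
    kept = []
    for event in events:
        if event.pair in paralog_pairs:
            continue  # partners are paralogous to each other
        if event.gene_a in pseudogenes or event.gene_b in pseudogenes:
            continue  # assumption: either partner being a pseudogene suffices
        if event.pair in seen_in_healthy:
            continue  # fusion has been detected in healthy individuals
        kept.append(event)
    return kept
```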
In one embodiment, the preset threshold requirement is as follows: if the fusion gene variation is clinically significant, more than 3 unique spanning reads must match the two genes simultaneously; if the clinical significance of the fusion gene variation is unknown, more than 10 unique spanning reads must match the two genes simultaneously.
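The threshold rule translates directly into a small decision function. The sketch below reads "more than" as a strict inequality, and the function name `passes_threshold` is hypothetical.

```python
def passes_threshold(unique_spanning_reads: int,
                     clinically_significant: bool) -> bool:
    """Apply the preset threshold to a candidate fusion.

    Clinically significant fusions need more than 3 unique spanning reads;
    fusions of unknown clinical significance need more than 10.
    """
    minimum = 3 if clinically_significant else 10
    return unique_spanning_reads > minimum
```

For example, a known clinically significant fusion supported by 4 unique spanning reads would be reported, while a fusion of unknown significance with the same support would not.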
In one embodiment, the method further comprises the following steps:
calculating the variation ratio of the fusion gene according to the following formula:
wherein "fusion supporting read pairs" refers to the read pairs supporting the gene fusion;
"# mappable reads" refers to the number of reads aligned to the genome;
"weighted-average of insert-size-read length" refers to the weighted average length of the cDNA fragments inserted into the library;
"refgeneFPKM" is the normalized expression value of the reference gene;
FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) is used here as the number of reads aligned to every 1 kb of exon per 1 million aligned reads.
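The variation ratio formula itself is not reproduced in this text, so it is not reconstructed here. Only the FPKM normalization described in the last definition is sketched below, using the standard form count × 10^9 / (exon length in bp × total mapped count); the function and parameter names are hypothetical.

```python
def fpkm(read_count: int, exon_length_bp: int, total_mapped: int) -> float:
    """Reads per 1 kb of exon per 1 million aligned reads.

    read_count     -- reads (or fragments) aligned to the gene's exons
    exon_length_bp -- total exon length of the gene, in bp
    total_mapped   -- total number of aligned reads in the library
    """
    if exon_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return read_count * 1e9 / (exon_length_bp * total_mapped)
```

With 500 reads on a gene of 2,000 bp exon length in a library of 10 million aligned reads, this gives 500 × 10^9 / (2,000 × 10^7) = 25.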
A gene fusion variation detection device, comprising:
a sequencing data acquisition module for acquiring sequencing data of a gene fusion variation library, wherein the gene fusion variation library is an amplification library of a target fusion gene obtained by hybridizing and capturing a transcription sequence of a sample to be detected with a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence that can pair complementarily with a sequence of one gene of the target fusion gene;
an alignment and screening module for aligning the sequencing data to human transcriptome and genome data and screening for reads that match at least two genes simultaneously; and
a fusion analysis module for analyzing whether the reads that match at least two genes simultaneously meet the preset threshold requirement and, if they do, determining that the genes covered by those reads have undergone gene fusion.
In one embodiment, the device further comprises:
a variation ratio calculation module for calculating the variation ratio of the fusion gene according to the following formula:
wherein "fusion supporting read pairs" refers to the read pairs supporting the gene fusion;
"# mappable reads" refers to the number of reads aligned to the genome;
"refgeneFPKM" is the normalized expression value of the reference gene;
"weighted-average of insert-size-read length" refers to the weighted average length of the cDNA fragments inserted into the library;
FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) is used here as the number of reads aligned to every 1 kb of exon per 1 million aligned reads.
A computer device having a processor and a memory, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the gene fusion variation detection method of any of the above embodiments.
A computer storage medium having a computer program stored thereon, wherein the computer program, when executed, implements the steps of the gene fusion variation detection method of any of the above embodiments.
A single driver gene (core gene) may fuse with multiple other genes (partner genes); after the fused gene is transcribed, a junction (i.e., breakpoint) is formed between an exon of the core gene and an exon of the partner gene. The gene fusion mutation library construction method, the gene fusion mutation detection method and the gene fusion mutation detection device are based on multigene RNA targeted sequencing with DNA probe hybridization capture: target fusion genes are captured by hybridization with the fusion gene capture probes, and a gene fusion mutation library is constructed.
The gene fusion mutation library construction method, the gene fusion mutation detection method and the device can be used to detect known or newly discovered gene rearrangements, gene deletions, gene duplications and other gene mutation information related to hotspot fusion genes of various hematological tumors. Compared with the traditional fluorescence quantitative method, the technical concept of the invention is more comprehensive, more efficient and more economical.
Furthermore, the invention also designs a fusion gene quantitative analysis method, which can obtain the variation proportion of the fusion gene through calculation so as to obtain the accurate expression quantity value of the fusion gene.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting a mutation in a fusion gene according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a gene fusion variation detection apparatus according to an embodiment of the present invention.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The fusion gene refers to a gene formed when genes located at different gene coordinates are joined together by mechanisms such as chromosomal rearrangement, and which can be transcribed to form a new fusion protein. It is written as gene A/gene B or gene A-gene B, for example BCR-ABL1, wherein gene A and gene B are fusion gene partners.
The selected genes are key core fusion genes. A core gene refers to a gene that has a high frequency of fusion variation and has been found to have a plurality of fusion gene partners, or a key gene in a cell growth or proliferation signal pathway, or a driver gene.
The "reads" refers to a sequence fragment obtained by high-throughput sequencing.
The sequencing quality refers to the accuracy of the base in the read sequence.
The "human transcriptome" is the combination of the products of all gene expressions in human cells.
The human genome refers to the hg19 reference genome.
Paralogs are homologous genes within the same species that arose through gene duplication and may have evolved new functions related to the original one.
A pseudogene can be regarded as a non-functional copy of genomic DNA that closely resembles the sequence of a coding gene.
The Body Map 2.0 is transcriptome sequencing data for a panel of human normal tissues.
The "gene distance" refers to the distance between the gene coordinates of two genes.
The invention provides a method for constructing a gene fusion variation library, which comprises the following steps:
extracting total RNA of the sample, and removing rRNA in the total RNA;
performing reverse transcription on the total RNA from which the rRNA is removed to synthesize double-stranded cDNA, and synthesizing by using dUTP instead of dTTP when synthesizing a second strand of the double-stranded cDNA;
performing end repair and adding a connecting joint on the synthesized double-stranded cDNA;
digesting dUTP in the double-stranded DNA after the end repair and the addition of the connecting joint by enzyme digestion to generate a gap in the double-stranded cDNA;
amplifying double-stranded DNA after enzyme digestion to construct a cDNA pre-library;
capturing, by hybridization with a fusion gene capture probe, the target fusion cDNA in the cDNA pre-library, the target fusion cDNA being formed by the fusion of at least two different genes, and the fusion gene capture probe comprising a sequence capable of complementary pairing with the sequence of one of the genes of the target fusion cDNA;
amplifying the captured target fusion cDNA to obtain a gene fusion variation library.
In a specific example, the sample total RNA is total RNA of a peripheral blood or bone marrow sample. After the total RNA of the sample is extracted, the method preferably further comprises the step of determining the concentration of the nucleic acid and the A260/A280 value.
In a specific example, the rRNA is removed by hybridizing synthetic single-stranded DNA probes for rRNA with the rRNA in the total RNA.
In one specific example, end repair is the addition of one dATP at the 3' end of the synthesized double-stranded cDNA; the format of the adapter introduced by adapter ligation is: P5-Read1primer-DNAINSERT-IndexReadprimer-index-P7. Specifically, the adapter sequence is: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-(sequence of the DNA fragment to be detected)-GTTCGTCTTCTGCCGTATGCTCTA-index-CACTGACCTCAAGTCTGCACACGAGAAGGCTAG-P, wherein P5 (5'-AATGATACGGCGACCACCGA-3', SEQ ID NO:1) and P7 (5'-CAAGCAGAAGACGGCATACGAGAT-3', SEQ ID NO:2) are adapters, Read1primer (GATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, SEQ ID NO:3) and IndexReadprimer (GTTCGTCTTCTGCCGTATGCTCTA, SEQ ID NO:4) are primer sequences, DNAINSERT is the sequence of the DNA fragment to be detected, index is a unique 12 nt sample tag, and P is a phosphate group.
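As an illustration only, the adapter layout described above can be checked with a short Python sketch. The sequences are those quoted in the text (SEQ ID NO: 1 and 3); the function names `five_prime_arm` and `is_valid_index` are hypothetical and are not part of the described method.

```python
# Sequences quoted in the text.
P5 = "AATGATACGGCGACCACCGA"                              # SEQ ID NO: 1
READ1_PRIMER = "GATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # SEQ ID NO: 3

# 5' portion of the full adapter sequence as printed in the text.
FIVE_PRIME_ARM = (
    "AATGATACGGCGACCACCGA"
    "GATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
)


def five_prime_arm() -> str:
    """Concatenate P5 and the Read1 primer (hypothetical helper)."""
    return P5 + READ1_PRIMER


def is_valid_index(index: str) -> bool:
    """Check that a sample index is 12 nt long and contains only A/C/G/T."""
    return len(index) == 12 and set(index.upper()) <= set("ACGT")
```

The sketch only confirms that the 5' portion of the printed adapter equals P5 followed by the Read1 primer and that an index has the stated length of 12 nt.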
In one specific example, the fusion gene capture probe is designed as follows: (1) the fusion gene capture probe is designed aiming at a core gene in target fusion cDNA, wherein the core gene refers to a gene which has a plurality of gene partners and is easy to generate fusion variation, or a key gene in a cell growth or proliferation signal pathway, or a driving gene;
(2) the fusion gene capture probe is designed aiming at the transcript sequence of the core gene;
(3) the fusion gene capture probe is designed aiming at a core gene in the hg19 reference genome, and the coverage density is 2× tiling;
(4) the length of the fusion gene capture probe is 120 bp;
(5) the fusion gene capture probe needs to be compared to a human transcriptome sequence during design, the number of all Blast matches (BLAST hits) is counted, if the number of the Blast matches is not more than 50, the fusion gene capture probe is qualified, and if the number of the Blast matches is more than 50, the fusion gene capture probe is redesigned in a mode of replacing mismatched bases until the highest matching performance on a target gene sequence is obtained and the number of the Blast matches is not more than 50.
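Rules (3) to (5) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: 2× tiling is interpreted here as consecutive 120 bp probes offset by half a probe length, and the BLAST hit count is supplied by the caller rather than computed. The names `tile_probes` and `probe_passes` are hypothetical.

```python
PROBE_LEN = 120       # probe length from rule (4)
MAX_BLAST_HITS = 50   # acceptance limit from rule (5)


def tile_probes(transcript: str, probe_len: int = PROBE_LEN, coverage: int = 2):
    """Return (start, probe_sequence) tuples tiled across a transcript.

    Assumption: "2x tiling" means each probe starts probe_len / coverage
    bases after the previous one, so interior bases are covered twice.
    """
    if len(transcript) < probe_len:
        return []
    step = probe_len // coverage
    last_start = len(transcript) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered
    return [(s, transcript[s:s + probe_len]) for s in starts]


def probe_passes(blast_hit_count: int) -> bool:
    """A probe is qualified when its BLAST hit count is not more than 50."""
    return blast_hit_count <= MAX_BLAST_HITS
```

A probe that fails `probe_passes` would be redesigned as described in rule (5); that redesign step is not modeled here.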
For example, in some specific examples, for hematological tumors (leukemias and lymphomas), 54 core genes can be selected: ABL, CREBBP, CRLF, MECOM, TP, TSLP, LMO, PRDM, MYC, ETV, RARA, NUP214, BCL, MYB, IRF, CBFB, CEBPB, ZNF384, RUNX, FGFR, MALT, ERG, NPM, PAX, JAK, PICALM, FLT, GLIS, PDGFRB, PML, TLX, ITK, FGFR, IL2, TAL, NTRK, NUP, EPOR, RBM, CSF1, KMT2, BCL, BCR, LYN, TLX, ccbpa, TCF, cend, ABL, ALK, pdgfr, IGLL, IGHA. The transcript serial numbers are taken from the Ensembl database, and 2× tiling probes labeled at the 5' end are designed based on the transcript sequences to obtain the capture probes.
Further, the 5' -end of the fusion gene capture probe is labeled with a linker for capture, for example, a linker for immobilization on a substrate, such as biotin or streptavidin.
In one specific example, amplification of digested double-stranded DNA and amplification of captured target fusion cDNA is performed using primers paired with adaptor P5 and P7 sequences.
As shown in FIG. 1, the present invention also provides a method for detecting gene fusion variation, which comprises the following steps:
step S110: obtaining sequencing data of a gene fusion variation library, wherein the gene fusion variation library is an amplification library of a target fusion gene obtained by hybridizing and capturing a transcription sequence of a sample to be detected through a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence which can be complementarily paired with a sequence of one gene of the target fusion gene;
step S120: comparing the sequencing data with human transcriptome and genome data, and screening reads capable of being matched with at least two genes simultaneously;
step S130: analyzing whether reads which can be matched with at least two genes simultaneously meet the preset threshold requirement, and if so, indicating that a plurality of genes contained in the reads are subjected to gene fusion.
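The screening in step S120 can be illustrated with a small Python sketch. It assumes the aligner output has already been reduced to (read ID, gene) records; that record format and the function name `reads_matching_two_genes` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def reads_matching_two_genes(
    alignments: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Keep reads whose alignments hit at least two different genes.

    `alignments` is an iterable of (read_id, gene) records, a hypothetical
    simplification of real aligner output.
    """
    genes_by_read: Dict[str, Set[str]] = defaultdict(set)
    for read_id, gene in alignments:
        genes_by_read[read_id].add(gene)
    return {r: g for r, g in genes_by_read.items() if len(g) >= 2}
```

Reads retained by this step are only candidates; they still have to satisfy the preset threshold requirement of step S130.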
In one specific example, the gene fusion variant library can be subjected to high-throughput sequencing using, but not limited to, a Novaseq 6000 high-throughput sequencer, and the sequencing depth can be, but is not limited to, 5000X.
In one specific example, the step of aligning the sequencing data with the human transcriptome and genomic data and screening for reads that can simultaneously match to at least two genes further comprises:
and performing quality evaluation on the sequencing data, and removing low-quality reads to obtain clean sequencing data.
Specifically, the raw data can be converted using, but not limited to, bcl2fastq software to obtain raw fastq files; quality evaluation can be performed on the raw fastq data using FastQC software; and low-quality reads can be removed using, but not limited to, Trimmomatic software to obtain the clean sequencing data.
Further, in one particular example, removing low-quality reads includes:
removing reads containing the linker sequence;
removing reads in which low-quality bases (quality value lower than 15) account for ≥ 50% of the read;
removing reads with an N content greater than 1%.
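The three removal criteria can be expressed as a single predicate. The following Python sketch is illustrative only and does not reproduce the behavior of Trimmomatic; it assumes per-base quality values are available as integers, and the function name `keep_read` is hypothetical.

```python
from typing import Iterable, Sequence


def keep_read(
    seq: str,
    quals: Sequence[int],
    linkers: Iterable[str],
    min_quality: int = 15,
    max_low_quality_fraction: float = 0.5,
    max_n_fraction: float = 0.01,
) -> bool:
    """Return True if a read survives the three removal criteria."""
    if not seq or len(seq) != len(quals):
        return False
    # Criterion 1: the read contains a linker sequence.
    if any(linker in seq for linker in linkers):
        return False
    # Criterion 2: bases with quality below 15 make up >= 50% of the read.
    low_quality = sum(1 for q in quals if q < min_quality)
    if low_quality / len(quals) >= max_low_quality_fraction:
        return False
    # Criterion 3: N content is greater than 1%.
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False
    return True
```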
In a specific example, the gene fusion mutation detection method further comprises the step of removing false positive events in clean sequencing data according to a preset control standard after comparing the sequencing data with the human transcriptome and genome data;
specifically, the screened gene fusion variation events are annotated and false positives are removed; gene fusion variation events meeting the following criteria are removed:
different genes of the fusion gene are paralogous with each other;
different genes of the fusion gene are pseudogenes;
the gene fusion variation has been detected in normal healthy persons (e.g., Body Map 2.0 is a transcriptome of normal human tissue, and the gene fusion variation detected by analyzing the data is judged to be false positive).
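A minimal Python sketch of this false-positive screen is given below. The lookup sets (paralog pairs, pseudogenes, fusions seen in normal tissue) are assumed to be prepared beforehand; treating an event as a pseudogene case when either partner is a pseudogene is an interpretation, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class FusionEvent:
    gene_a: str
    gene_b: str


def is_false_positive(
    event: FusionEvent,
    paralog_pairs: Set[FrozenSet[str]],
    pseudogenes: Set[str],
    normal_fusions: Set[FrozenSet[str]],
) -> bool:
    """Apply the three exclusion criteria to one candidate fusion event."""
    pair = frozenset((event.gene_a, event.gene_b))
    if pair in paralog_pairs:
        return True  # partners are paralogs of each other
    if event.gene_a in pseudogenes or event.gene_b in pseudogenes:
        return True  # assumption: either partner being a pseudogene counts
    if pair in normal_fusions:
        return True  # already observed in normal tissue data
    return False
```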
Specifically, all reads can be aligned to the human transcriptome and genome using, but not limited to, BOWTIE, STAR, SPOTLIGHT, etc. software, and reads that match to transcripts of both genes simultaneously can be screened. False positive events are then eliminated by a series of criteria such as paralogs (paralogs), pseudogenes, Body Map 2.0, gene distance, etc. If the reads matched to two genes at the same time exceed the preset threshold requirement, the two genes are determined to be subjected to gene fusion.
More specifically, the preset threshold requirement refers to: if the fusion gene variation has clinical significance, the number of unique spanning reads matched with the two genes is more than 3 (the spanning reads are reads matched with a junction (junction) of gene fusion); if the fusion gene variation is of unknown clinical significance, more than 10 unique spanning reads can be matched to the two genes simultaneously.
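The threshold rule can be written as a short Python sketch. The function name `fusion_called` and its boolean flag are hypothetical; the cut-offs of 3 and 10 unique spanning reads are those stated above.

```python
def fusion_called(unique_spanning_reads: int, clinically_significant: bool) -> bool:
    """Apply the preset threshold to a count of unique spanning reads.

    Fusions of known clinical significance need more than 3 unique spanning
    reads; fusions of unknown clinical significance need more than 10.
    """
    threshold = 3 if clinically_significant else 10
    return unique_spanning_reads > threshold
```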
Further, the gene fusion mutation detection method provided by the invention also comprises the step of calculating the mutation proportion of the fusion gene according to the following formula:
the fusion supporting read pairs refers to the number of read pairs supporting the gene fusion;
the # mappable reads refers to the number of reads aligned to the genome;
the weighted-average of insert-size-read length refers to the weighted average length of the cDNA fragments inserted into the library;
the refgeneFPKM is the normalized expression value of the reference gene;
the FPKM is defined as Fragments Per Kilobase of exon model per Million mapped fragments, i.e., the number of fragments aligned to every 1 kb of exon per 1 million (10^6) aligned fragments.
The gene transcript quantification model is computed with StringTie software and mainly targets paired-end sequencing expression levels. FPKM and RPKM differ in that the former counts fragments and the latter counts reads. For single-end sequencing data, FPKM is equivalent to RPKM (RPKM = total exon reads / (mapped reads (millions) × exon length (kb))), since Cufflinks counts each read as one fragment. For paired-end sequencing, if both reads of a pair are aligned, the pair is counted as one fragment; if only one read of the pair is aligned and the other is not, the aligned read is also counted as one fragment.
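The RPKM expression and the fragment-counting rule for paired-end data can be sketched in Python as follows. This is not the StringTie or Cufflinks implementation; the function names and the representation of a read pair as two booleans are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def rpkm(exon_reads: int, mapped_reads: int, exon_length_bp: int) -> float:
    """RPKM = total exon reads / (mapped reads in millions x exon length in kb)."""
    if mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("mapped_reads and exon_length_bp must be positive")
    return exon_reads / ((mapped_reads / 1e6) * (exon_length_bp / 1e3))


def count_fragments(pairs: Iterable[Tuple[bool, bool]]) -> int:
    """Count fragments from paired-end alignment flags.

    Each item is (mate1_aligned, mate2_aligned). A pair with both mates
    aligned counts as one fragment, and a pair with only one mate aligned
    also counts as one fragment.
    """
    return sum(1 for mate1, mate2 in pairs if mate1 or mate2)
```

Replacing the read count in `rpkm` with the fragment count from `count_fragments` gives the corresponding FPKM-style value under the same assumptions.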
Based on the same idea as the above detection method, as shown in fig. 2, the present invention also provides a gene fusion mutation detection apparatus 200, comprising:
a sequencing data acquisition module 210, configured to acquire sequencing data of a gene fusion variant library, where the gene fusion variant library is an amplification library of a target fusion gene obtained by capturing a transcription sequence of a sample to be detected through hybridization with a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence that can be complementarily paired with a sequence of one of the genes of the target fusion gene;
a comparison screening module 220 for comparing the sequencing data with the human transcriptome and genome data, screening reads that can be matched to at least two genes simultaneously; and
and a fusion analysis module 230, configured to analyze whether reads that can be simultaneously matched to at least two genes meet a preset threshold requirement, and if so, indicate that gene fusion occurs in multiple genes included in the reads.
Optionally, the gene fusion mutation detection apparatus 200 further includes:
a variation ratio calculating module 240, configured to calculate a variation ratio of the fusion gene according to the following formula:
the fusion supporting read pairs refers to the number of read pairs supporting the gene fusion;
the # mappable reads refers to the number of reads aligned to the genome;
the weighted-average of insert-size-read length refers to the weighted average length of the cDNA fragments inserted into the library;
the refgeneFPKM is the normalized expression value of the reference gene;
the FPKM is defined as Fragments Per Kilobase of exon model per Million mapped fragments, i.e., the number of fragments aligned to every 1 kb of exon per 1 million (10^6) aligned fragments.
Based on the above embodiments, the present invention further provides a computer device for genetic fusion mutation detection, which has a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the genetic fusion mutation detection method according to any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the above methods may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and in the embodiments of the present invention, the program may be stored in the storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments including the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Accordingly, the present invention also provides a computer storage medium for genetic fusion mutation detection, wherein a computer program is stored thereon, and when being executed, the computer program implements the steps of the genetic fusion mutation detection method according to any of the above embodiments.
A single driver gene (core gene) may fuse with multiple other genes (partner genes); after the fused gene is transcribed, a junction (i.e., breakpoint) is formed between an exon of the core gene and an exon of the partner gene. The gene fusion mutation library construction method, the gene fusion mutation detection method and the gene fusion mutation detection device are based on multigene RNA targeted sequencing with DNA probe hybridization capture: target fusion genes are captured by hybridization with the fusion gene capture probes, and a gene fusion mutation library is constructed.
The gene fusion mutation library construction method, the gene fusion mutation detection method and the device can be used to detect known or newly discovered gene rearrangements, gene deletions, gene duplications and other gene mutation information related to hotspot fusion genes of various hematological tumors. Compared with the traditional fluorescence quantitative method, the technical concept of the invention is more comprehensive, more efficient and more economical.
Furthermore, the invention also designs a fusion gene quantitative analysis method, which can obtain the variation proportion of the fusion gene through calculation so as to obtain the accurate expression quantity value of the fusion gene.
The construction method and detection method of the gene fusion variant library of the present invention will be described in further detail below with reference to the specific case of library construction and detection methods.
1) DNA probe design based on mRNA sequence
The transcription sequence of the fusion gene is captured by DNA probe hybridization, high-throughput sequencing is carried out, and the hot spot or new fusion form in which the fusion gene participates can be obtained by biological information analysis.
For hematological tumors (leukemias and lymphomas), 54 core genes, i.e., ABL1, CREBBP, CRLF 1, MECOM, TP 1, TSLP, LMO 1, PRDM1, MYC, ETV 1, RARA, NUP214, BCL 1, MYB, IRF 1, CBFB, CEBPB, ZNF384, RUNX1, FGFR1, MALT1, ERG, NPM1, PAX 1, JAK 1, PICALM, FLT 1, GLIS 1, frpdgb, PML, TLX1, itbpa, FGFR1, IL 21, TAL1, NTRK 1, NUP 1, EPOR, RBM1, CSF 11, KMT2 72, BCL 1, BCR, bnn, TLX1, ccmbl 1, tcmbl 1, pdslnd 1, seq id 1, mrna 1, pdgf 1, mrna 1.
2) Total RNA extraction from samples
Total RNA from peripheral blood or bone marrow samples of leukemia lymphoma patients was extracted using QIAGEN QIAsymphony RNA Kit (Cat # 931636). The specific operation steps are detailed in the specification of the manufacturer.
The nucleic acid concentration and the A260/A280 value (expected value between 1.9 and 2.1) were measured using (1) a NanoDrop spectrophotometer; the nucleic acid concentration was also determined using (2) a Qubit™ RNA HS Assay Kit (Cat. # Q32855).
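As a simple illustration of the purity check, the expected A260/A280 range can be tested as follows. The function name `rna_purity_ok` is hypothetical, and the absorbance readings are assumed to be supplied by the operator.

```python
def rna_purity_ok(a260: float, a280: float, low: float = 1.9, high: float = 2.1) -> bool:
    """Return True if the A260/A280 ratio lies within the expected range."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return low <= a260 / a280 <= high
```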
3) Removal of ribosomal RNA (rRNA)
500 ng of the total RNA extracted in step 2) was hybridized with synthetic single-stranded DNA probes for rRNA, and the rRNA was digested with RNase H; the specific operation steps are detailed in the instructions of the NEBNext rRNA Depletion Kit. The RNA sample was purified with XP beads after removal of the rRNA.
4) Reverse transcription to synthesize cDNA
The RNA was fragmented by incubation at 94 ℃ for 6 minutes in a PCR instrument. The fragmented RNA was reverse transcribed into single-stranded cDNA using reverse transcriptase.
5) Synthesis of second Strand of cDNA
The single-stranded cDNA was converted into double-stranded cDNA using DNA Polymerase I, Large (Klenow) Fragment. dUTP was used here instead of dTTP, so that dUTP was incorporated into the second cDNA strand. The double-stranded cDNA was purified using AMPure XP Beads.
6) End repair
Double-stranded cDNA was treated with NEBNext Ultra II End Prep Enzyme Mix, and one dATP was added at the 3' end.
7) Adapter ligation
Ligase, index adapters containing a 12 nt unique sequence, and the end-repaired cDNA were mixed and incubated at 16 ℃ for 60 minutes in a PCR instrument to obtain an adapter-ligated cDNA library.
Adapter format: P5-Read1primer-DNAINSERT-IndexReadprimer-index-P7.
8) Preparation of second Strand nicks of cDNA by enzymatic cleavage
Uracil DNA Glycosylase (UDG) and Endonuclease VIII mix were added to the above system, which synergistically digested dUTP in the cDNA library fragments to produce gaps.
9) Library amplification
The above cDNA library was amplified in a PCR instrument using KAPA HiFi HotStart ReadyMix, primers (P5: 5'-AATGATACGGCGACCACCGA-3', SEQ ID NO: 1; P7: 5'-CAAGCAGAAGACGGCATACGAGAT-3', SEQ ID NO:2) paired with linker P5, P7 sequences. The cDNA pre-library was purified using AMPure XP Beads.
10) Probe capture hybridization
100 ng of the prepared cDNA library was mixed with Universal Blockers-TS Mix and Human Cot-1 DNA and dried to a powder using a vacuum filtration system (60 ℃). Hybridization buffer and the fusion gene probe library were then added and mixed, and the mixture was incubated at 95 ℃ for 30 seconds in a PCR instrument and then hybridized at 65 ℃ for 16-18 hours.
The above system was mixed with streptavidin magnetic beads (M-270 Streptavidin beads) and incubated at 65 ℃ for 45 min in a PCR instrument, with remixing every 15 min. All transcript sequence fragments containing the fusion gene were thereby captured.
The cDNA library captured by the above hybridization was amplified in a PCR instrument using KAPA HiFi HotStart ReadyMix and primers (P5: 5'-AATGATACGGCGACCACCGA-3', SEQ ID NO: 1; P7: 5'-CAAGCAGAAGACGGCATACGAGAT-3', SEQ ID NO:2) that pair with the adapter P5 and P7 sequences. The target cDNA library was purified using AMPure XP Beads to obtain the library to be sequenced.
11) Illumina platform sequencing
The library to be sequenced was sequenced using a Novaseq 6000 high throughput sequencer to an average sequencing depth of 5000 x. The sequencing procedure is detailed in the manufacturer's instructions.
12) Sequencing data analysis
A. Sequencing data preprocessing
The raw data were converted using bcl2fastq software to obtain raw fastq files, quality evaluation was performed on the raw fastq data using FastQC software, and low-quality reads were removed using Trimmomatic software to obtain clean fastq files.
B. Identification of fusion genes
All reads were aligned to the human transcriptome and genome using BOWTIE, STAR and SPOTLIGHT software, and reads matching the transcripts of two genes simultaneously were screened. False positive events were then eliminated by a series of criteria such as paralogs, pseudogenes, Body Map 2.0 and gene distance. If the number of reads matched to two genes simultaneously exceeds the set threshold, the two genes are determined to have undergone gene fusion.
C. Example of results of analysis of fused Gene assay data
With the present invention we tested 3 cases of leukemia samples and obtained the following results:
the 3 samples all have fusion genes in which MLL (KMT2A) participates, and the break point sequences of the MLL gene and the partner gene thereof can be simultaneously captured only by a probe targeting the MLL gene transcript sequence, so that the specific fusion form thereof is identified by alignment analysis, and fusionFPKM is calculated as an index of the expression amount thereof.
The results are shown in Table 1 below.
TABLE 1
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Sequence listing
<110> Guangzhou KingMed Diagnostics Group Co., Ltd.
<120> gene fusion variant library construction method, detection method, apparatus, device and storage medium
<140>2019114192739
<141>2019-12-31
<160>7
<170>SIPOSequenceListing 1.0
<210>1
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
aatgatacgg cgaccaccga 20
<210>2
<211>24
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
caagcagaag acggcatacg agat 24
<210>3
<211>38
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
gatctacact ctttccctac acgacgctct tccgatct 38
<210>4
<211>24
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
gttcgtcttc tgccgtatgc tcta 24
<210>5
<211>86
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
tccccgccca agtatccctg taaaacaaaa accaaaagaa aagtctgaac aacccagtcc 60
tgccagctcc agctccagct ccagct 86
<210>6
<211>86
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
tccccgccca agtatccctg taaaacaaaa accaaaagaa aaggaaatga cccattcatg 60
gccgcctcct ttgacagcaa tacata 86
<210>7
<211>86
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
aattccagca gatggagtcc acaggatcag agtggacttt aaggattctg tttcactgag 60
gccatctatc cgatttcaag gaagcc 86
Claims (16)
1. A method for constructing a gene fusion variation library is characterized by comprising the following steps:
extracting total RNA of the sample, and removing rRNA in the total RNA;
reverse transcribing the total RNA after rRNA is removed, synthesizing double-stranded cDNA, and synthesizing by using dUTP instead of dTTP when synthesizing a second strand of the double-stranded cDNA;
performing end repair and adding a connecting joint to the synthesized double-stranded cDNA;
digesting dUTP in the double-stranded DNA after the end repair and the addition of the connecting joint by enzyme digestion to generate a gap in the double-stranded cDNA;
amplifying the double-stranded DNA after enzyme digestion to construct a cDNA pre-library;
hybridizing and capturing a target fusion cDNA in the cDNA pre-library using a fusion gene capture probe, the target fusion cDNA being formed by fusing at least two different genes, the fusion gene capture probe comprising a sequence capable of complementary pairing with a sequence of one of the genes of the target fusion cDNA;
and amplifying the captured target fusion cDNA to obtain the gene fusion variation library.
2. The method of claim 1, wherein the fusion gene capture probe is designed according to the following rules:
(1) the fusion gene capture probe is designed aiming at a core gene in target fusion cDNA, wherein the core gene refers to a gene which has a plurality of gene partners and is easy to generate fusion variation, or a key gene in a cell growth or proliferation signal pathway, or a driving gene;
(2) the fusion gene capture probe is designed aiming at the transcript sequence of the core gene;
(3) the fusion gene capture probe is designed aiming at a core gene in the hg19 reference genome, and the coverage density is 2× tiling;
(4) the length of the fusion gene capture probe is 120 bp;
(5) the fusion gene capture probe needs to be compared to a human transcriptome sequence during design, the number of all Blast matches is counted, if the number of the Blast matches is not more than 50, the fusion gene capture probe is qualified, and if the number of the Blast matches is more than 50, the fusion gene capture probe is redesigned in a mode of replacing mismatched bases until the highest matching performance of the target gene sequence is obtained and the number of the Blast matches is not more than 50.
3. The method of constructing a library of fusion variants according to claim 1 or 2 wherein the 5' end of the fusion gene capture probe is labeled with a linker for capture;
optionally, the linker is biotin or streptavidin.
4. The method of constructing a library of fused variants according to claim 1 or 2, wherein the total RNA in the sample is total RNA in a peripheral blood or bone marrow sample.
5. The method of constructing a library of fusion variants according to claim 1 or 2 wherein the end-repair is the addition of a dATP to the 3' end of the double-stranded cDNA synthesized;
the format of the adapter introduced by adapter ligation is P5-Read1primer-DNAINSERT-IndexReadprimer-index-P7, and specifically: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-(sequence of the DNA fragment to be detected)-GTTCGTCTTCTGCCGTATGCTCTA-index-CACTGACCTCAAGTCTGCACACGAGAAGGCTAG-P, wherein P5 and P7 are adapters, Read1primer and IndexReadprimer are primer sequences, DNAINSERT is the sequence of the DNA fragment to be detected, index is a unique 12 nt sample tag, and P is a phosphate group.
6. The method of constructing a library of fusion variants according to claim 5 wherein the amplification of the digested double-stranded DNA and the captured target fusion cDNA is performed using primers that pair to the sequences of adaptors P5 and P7.
7. A gene fusion mutation detection method is characterized by comprising the following steps:
obtaining sequencing data of a gene fusion variation library, wherein the gene fusion variation library is an amplification library of a target fusion gene obtained by hybridizing and capturing a transcription sequence of a sample to be detected through a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence which can be complementarily paired with a sequence of one gene of the target fusion gene;
comparing the sequencing data with human transcriptome and genome data, and screening reads capable of being matched with at least two genes simultaneously;
and analyzing whether the reads which can be matched with at least two genes simultaneously meet the preset threshold requirement, and if so, indicating that a plurality of genes contained in the reads are subjected to gene fusion.
8. The method of detecting genetic fusion mutations according to claim 7, wherein, before the step of comparing the sequencing data to human transcriptome and genome data and screening for reads that match to at least two genes simultaneously, the method further comprises:
performing quality evaluation on the sequencing data and removing low-quality reads to obtain clean sequencing data.
9. The method of detecting genetic fusion mutations according to claim 8, wherein said removing low-quality reads comprises:
removing reads containing the adapter sequence;
removing reads in which bases with a quality value lower than 15 account for ≥ 50% of the read;
removing reads with an N content greater than 1%.
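The three removal criteria of claim 9 can be expressed as a single predicate. The sketch below is illustrative only: the function name `is_clean_read` and its parameters are hypothetical, quality values are assumed to be already decoded to integers, and adapter contamination is detected by exact substring match, which is a simplification of what a real trimming tool does.

```python
from typing import Sequence


def is_clean_read(
    seq: str,
    quals: Sequence[int],
    adapters: Sequence[str],
    q_cutoff: int = 15,           # bases below this value count as low quality
    low_q_frac: float = 0.5,      # reject when low-quality fraction >= 50%
    n_frac: float = 0.01,         # reject when N fraction > 1%
) -> bool:
    """Return True if the read survives all three filters of claim 9."""
    seq = seq.upper()
    if not seq or len(seq) != len(quals):
        raise ValueError("sequence and quality values must be non-empty and equal in length")
    # 1) Reads containing an adapter sequence are removed.
    if any(adapter.upper() in seq for adapter in adapters):
        return False
    # 2) Reads with too many low-quality bases are removed.
    low_quality = sum(1 for q in quals if q < q_cutoff)
    if low_quality / len(seq) >= low_q_frac:
        return False
    # 3) Reads with too many undetermined bases are removed.
    if seq.count("N") / len(seq) > n_frac:
        return False
    return True
```

Note the asymmetry taken from the claim: the low-quality criterion is inclusive (≥ 50%), while the N-content criterion is strict (greater than 1%).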
10. The method of detecting genetic fusion mutations according to claim 9 further comprising the step of rejecting false positive events in the clean sequencing data according to a predetermined control criterion after comparing the sequencing data to human transcriptome and genomic data;
specifically, the screened gene fusion variant events are annotated to separate false events from true ones, and gene fusion variant events meeting the following criteria are removed:
different genes of the fusion gene are paralogous with each other;
different genes of the fusion gene are pseudogenes;
the gene fusion variation has been detected in normal healthy individuals.
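The exclusion criteria of claim 10 can be sketched as a filter over candidate events. Everything named here is hypothetical: `FusionEvent`, `is_false_positive`, and the three lookup sets, which are assumed to be supplied by the caller from whatever annotation resources are in use. The sketch also assumes that an event is rejected when either partner is a pseudogene, which is one reading of the claim.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class FusionEvent:
    gene_a: str
    gene_b: str

    @property
    def pair(self) -> FrozenSet[str]:
        # Order-independent key for the two partner genes.
        return frozenset((self.gene_a, self.gene_b))


def is_false_positive(
    event: FusionEvent,
    paralog_pairs: Set[FrozenSet[str]],
    pseudogenes: Set[str],
    healthy_fusions: Set[FrozenSet[str]],
) -> bool:
    """Apply the three exclusion criteria to one candidate event."""
    if event.pair in paralog_pairs:          # partners are paralogs
        return True
    if event.gene_a in pseudogenes or event.gene_b in pseudogenes:
        return True                          # assumption: either partner suffices
    return event.pair in healthy_fusions     # already seen in healthy individuals


def filter_events(
    events: Iterable[FusionEvent],
    paralog_pairs: Set[FrozenSet[str]],
    pseudogenes: Set[str],
    healthy_fusions: Set[FrozenSet[str]],
) -> List[FusionEvent]:
    """Keep only events that pass all three criteria."""
    return [
        e for e in events
        if not is_false_positive(e, paralog_pairs, pseudogenes, healthy_fusions)
    ]
```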
11. The method of detecting genetic fusion mutation according to claim 10, wherein the preset threshold requirement is: if the fusion gene variation has clinical significance, more than 3 unique spanning reads match both genes simultaneously; if the fusion gene variation is of unknown clinical significance, more than 10 unique spanning reads match both genes simultaneously.
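The threshold rule of claim 11 reduces to a two-branch comparison. The sketch below is illustrative; the function name `passes_threshold` and its parameters are hypothetical, and the count of unique spanning reads is assumed to have been computed upstream.

```python
def passes_threshold(unique_spanning_reads: int, clinically_significant: bool) -> bool:
    """Return True if the read support exceeds the threshold in claim 11.

    Both thresholds are strict: "more than 3" for fusions with clinical
    significance, "more than 10" for fusions of unknown significance.
    """
    exclusive_minimum = 3 if clinically_significant else 10
    return unique_spanning_reads > exclusive_minimum
```

For example, a fusion of unknown clinical significance supported by exactly 10 unique spanning reads would not be called under this rule, while the same support would suffice for a clinically significant fusion.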
12. The method of detecting a genetic fusion mutation according to any one of claims 7 to 11, further comprising:
calculating the variation ratio of the fusion gene according to the following formula:
the fusion supporting read pairs refers to the number of read pairs supporting the gene fusion;
the # mappable reads refers to the number of reads aligned to the genome;
the weighted-average of insert-size-read length refers to the weighted average length of the cDNA fragments inserted into the library;
the refgeneFPKM is the normalized expression value of the reference gene;
the FPKM is defined as Fragments Per Kilobase of exon model per Million mapped reads, i.e., the number of fragments aligned to every 1 kb of exon per 1 million aligned reads.
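The FPKM normalization used for the reference gene can be sketched as below. This covers only the standard FPKM definition, not the variation ratio itself; the function name `fpkm` and its parameter names are hypothetical.

```python
def fpkm(gene_fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    Equivalent to: gene_fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and total mapped fragments must be positive")
    return gene_fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Usage: 500 fragments on a 2,000 bp exon model in a library of 10 million
# mapped fragments give 500 / 2 kb / 10 M = 25 FPKM.
reference_expression = fpkm(500, 2_000, 10_000_000)
```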
13. A gene fusion mutation detection device, comprising:
a sequencing data acquisition module, configured to acquire sequencing data of a gene fusion variation library, wherein the gene fusion variation library is an amplification library of a target fusion gene obtained by hybridizing and capturing a transcription sequence of a sample to be detected through a fusion gene capture probe, the target fusion gene is formed by fusing at least two different genes, and the fusion gene capture probe contains a sequence which can be complementarily paired with a sequence of one gene of the target fusion gene;
a comparison screening module, configured to compare the sequencing data with human transcriptome and genome data and to screen reads which can be matched with at least two genes simultaneously; and
a fusion analysis module, configured to analyze whether the reads which can be matched with at least two genes simultaneously meet the preset threshold requirement, and if so, to determine that the genes contained in the reads have undergone gene fusion.
14. The apparatus for detecting gene fusion mutation according to claim 13, further comprising:
a variation ratio calculation module, configured to calculate the variation ratio of the fusion gene according to the following formula:
the fusion supporting read pairs refers to the number of read pairs supporting the gene fusion;
the # mappable reads refers to the number of reads aligned to the genome;
the weighted-average of insert-size-read length refers to the weighted average length of the cDNA fragments inserted into the library;
the refgeneFPKM is the normalized expression value of the reference gene;
the FPKM is defined as Fragments Per Kilobase of exon model per Million mapped reads, i.e., the number of fragments aligned to every 1 kb of exon per 1 million aligned reads.
15. A computer device having a processor and a memory, the memory storing a computer program, the processor implementing the steps of the gene fusion mutation detection method according to any one of claims 7 to 12 when executing the computer program.
16. A computer storage medium having a computer program stored thereon, wherein the computer program when executed implements the steps of the method of detecting a genetic fusion mutation according to any one of claims 7 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911419273.9A CN111321202A (en) | 2019-12-31 | 2019-12-31 | Gene fusion variation library construction method, detection method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111321202A (en) | 2020-06-23 |
Family
ID=71165123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911419273.9A Pending CN111321202A (en) | 2019-12-31 | 2019-12-31 | Gene fusion variation library construction method, detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111321202A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104894111A (en) * | 2015-05-18 | 2015-09-09 | 李卫东 | DNA targeted capture array for leukemia chromosome aberration high-throughput sequencing |
CN106929504A (en) * | 2015-12-30 | 2017-07-07 | 安诺优达基因科技(北京)有限公司 | Detect the kit of acute promyelocytic leukemia correlation fusion gene |
CN108486235A (en) * | 2018-03-07 | 2018-09-04 | 北京圣谷智汇医学检验所有限公司 | A kind of method and system of high-efficiency and economic detection fusion gene |
WO2019144582A1 (en) * | 2018-01-26 | 2019-08-01 | 厦门艾德生物医药科技股份有限公司 | Probe and method for high-throughput sequencing targeted capture target region used for detecting gene mutations as well as known and unknown gene fusion types |
Non-Patent Citations (1)
Title |
---|
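Fang Xiangdong (方向东) et al. (eds.), Tianjin: Tianjin Science and Technology Translation and Publishing Co., Ltd. (天津科技翻译出版有限公司) *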
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111979307A (en) * | 2020-08-31 | 2020-11-24 | 伯科生物科技有限公司 | Targeted sequencing method for detecting gene fusion |
CN111979307B (en) * | 2020-08-31 | 2022-07-08 | 伯科生物科技有限公司 | Targeted sequencing method for detecting gene fusion |
WO2022089033A1 (en) * | 2020-10-29 | 2022-05-05 | 无锡臻和生物科技有限公司 | Method and device for detecting genetic mutation and expression |
CN112397144B (en) * | 2020-10-29 | 2021-06-15 | 无锡臻和生物科技股份有限公司 | Method and device for detecting gene mutation and expression quantity |
CN112397144A (en) * | 2020-10-29 | 2021-02-23 | 无锡臻和生物科技有限公司 | Method and device for detecting gene mutation and expression quantity |
CN112662771A (en) * | 2020-12-30 | 2021-04-16 | 苏州大学附属第一医院 | Targeting capture probe of tumor fusion gene and application thereof |
CN112662771B (en) * | 2020-12-30 | 2024-04-02 | 苏州大学附属第一医院 | Targeting capture probe of tumor fusion gene and application thereof |
EP4400599A3 (en) * | 2021-05-20 | 2024-08-28 | Sophia Genetics S.A. | Capture probes and uses thereof |
CN114300051A (en) * | 2021-12-22 | 2022-04-08 | 北京吉因加医学检验实验室有限公司 | Method and device for calculating fusion gene frequency |
CN114395619A (en) * | 2021-12-29 | 2022-04-26 | 福建和瑞基因科技有限公司 | High-throughput sequencing method and internal reference quality control product |
CN114395619B (en) * | 2021-12-29 | 2024-04-30 | 福建和瑞基因科技有限公司 | High-throughput sequencing method and internal reference quality control product |
CN115083516A (en) * | 2022-07-13 | 2022-09-20 | 北京先声医学检验实验室有限公司 | Panel design and evaluation method for detecting gene fusion based on targeted RNA sequencing technology |
CN115662520A (en) * | 2022-10-27 | 2023-01-31 | 黑龙江金域医学检验实验室有限公司 | Detection method of BCR/ABL1 fusion gene and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230193381A1 (en) | Compositions and methods for accurately identifying mutations | |
CN111321202A (en) | Gene fusion variation library construction method, detection method, device, equipment and storage medium | |
US11898198B2 (en) | Universal short adapters with variable length non-random unique molecular identifiers | |
AU2018266377B2 (en) | Universal short adapters for indexing of polynucleotide samples | |
CA3220983A1 (en) | Optimal index sequences for multiplex massively parallel sequencing | |
CN108359723B (en) | Method for reducing deep sequencing errors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||