WO2022089033A1 - Method and device for detecting gene mutation and expression level - Google Patents
Method and device for detecting gene mutation and expression level
- Publication number
- WO2022089033A1 (PCT/CN2021/117533)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- expression
- analysis
- fusion
- rna
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
Definitions
- The present invention relates to the technical field of biology and, in particular, to a method and device for detecting gene mutations and expression levels.
- Gene mutation refers to a sudden, heritable variation in genomic DNA molecules.
- At the molecular level, a gene mutation is a change in the base-pair composition or arrangement order of a gene. Genes are stable enough to replicate themselves precisely as cells divide, but this stability is relative. Under certain conditions, a gene can suddenly change from its original form to a new form; that is, a new gene appears at a site and replaces the original one. Such a gene is called a mutant gene. As a result, offspring can show new traits that were never present in their ancestors.
- Gene mutation is one of the important factors of biological evolution, so the study of gene mutation has broad biological significance beyond its theoretical interest. Some gene mutations arise from structural changes in chromosomes. Under the influence of natural conditions or human factors, structural variation of chromosomes mainly includes deletion, duplication, inversion and translocation. Gene fusion is also a kind of chromosomal structural variation.
- The present invention aims to provide a method and device for detecting gene mutations and expression levels.
- The gene mutation (including gene fusion) and expression detection method of the present invention is based on targeted RNA sequencing. It efficiently enriches the RNA transcripts expressed by tumor-related genes and comprehensively detects the mutation types carried by those transcripts, including fusions, single-base and multi-base substitutions (SNV/MNV), insertion-deletion mutations (indels) and other mutation types, and it analyzes the expression of these tumor driver genes in tumor tissue.
- Using RNA for mutation detection gives a stronger functional correlation.
- For example, two SNVs may both have a mutation frequency of 1%, yet their clinical impact can differ because the genes carrying them are expressed at different levels.
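A small worked example makes this point concrete. The sequencing depths below are invented purely for illustration and are not data from the patent; the function name is likewise hypothetical.

```python
# Two hypothetical SNVs share the same mutation frequency (1%) but sit in
# genes expressed at very different levels. All numbers are illustrative.

def mutant_read_count(total_depth: int, mutation_frequency: float) -> float:
    """Expected number of reads that carry the mutant allele."""
    return total_depth * mutation_frequency


# Weakly expressed gene: few transcripts, so few mutant reads.
low_expressed = mutant_read_count(total_depth=500, mutation_frequency=0.01)

# Strongly expressed gene: same frequency, far more mutant transcripts.
high_expressed = mutant_read_count(total_depth=50_000, mutation_frequency=0.01)

print(low_expressed, high_expressed)
```

The mutation frequency alone hides this hundredfold difference in mutant transcript abundance, which is why the method reports expression alongside the mutation call.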
- The invention detects not only the gene expression levels and gene fusions conventionally obtained from RNAseq, but also the SNVs and CNVs conventionally obtained from a DNA panel, and it can quantify the expression of the detected mutations. A single assay thus covers all mutation types and their relative expression levels.
- The system of the present invention targets the genes of interest with an RNA panel. Compared with whole-transcriptome RNAseq, the sequencing cost is lower and the target regions are significantly enriched, so detection sensitivity is higher, especially for weakly expressed genes or mutations. In addition, an RNA-targeted sequencing panel only needs to cover exon regions, whereas a DNA panel must cover both exons and introns; this saves probe and sequencing costs and is better suited to clinical kit development.
- A method for detecting gene mutations and expression levels includes the following steps: S1, extracting RNA from the sample to be tested, fragmenting the RNA, and performing reverse transcription to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching the target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain RNA-targeted sequencing data; S5, analyzing gene mutations and expression changes in the RNA-targeted sequencing data. S5 specifically includes: S51, gene expression analysis: quantitatively evaluating the expression of the target gene in the sample to be tested using the RPKM method; S52, gene overexpression analysis: using the baseline sample population, analyzing the distribution of RPKM values of the target gene, determining the threshold for the expression level of the target gene, and determining whether the target gene of the sample to be tested is overex
- S5 also includes: filtering out low-quality sequencing data and reads containing adapter sequences, performing quality control to obtain data that meets the standards, and then analyzing the changes in gene mutations and expression levels in the RNA-targeted sequencing data, wherein,
- the threshold is as follows:
- SeedReads+RescueReads represents the reads spanning the fusion breakpoint
- HKA represents housekeeping gene A
- HKB represents housekeeping gene B
- HKC represents housekeeping gene C
- count represents the number of sequenced reads aligned to the reference genome
- length represents the length of the sequenced reads aligned to the reference genome
- Gene Average Depth represents the average sequencing depth of the gene
- ALT count represents the depth of the mutation
- HK_expression_Coeffient is the coefficient of variation of expression, calculated from the expression of the housekeeping genes in the sample and the expression of the housekeeping genes in the standard
- A device for detecting gene mutations and expression levels includes: an RNA extraction module, configured to extract RNA from the sample to be tested, fragment the RNA, and perform reverse transcription to obtain cDNA; a gene library construction module, configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment;
- a target gene enrichment module, configured to capture and enrich the target genes from the gene library by specific hybridization of capture probes to the target regions;
- a sequencing module, configured to sequence on a high-throughput sequencer to obtain RNA-targeted sequencing data; and an analysis module, configured to analyze gene mutations and expression changes in the RNA-targeted sequencing data. The analysis module specifically includes: a gene expression analysis sub-module, configured to quantitatively evaluate the expression of the target gene in the sample to be tested using the RPKM method;
- a gene overexpression analysis sub-module, configured to use the baseline sample population, analyze the distribution of RPKM values of the target gene, determine the threshold for the expression level of the target gene, and judge from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed;
- a gene fusion analysis sub-module, configured to filter out candidate fusions whose partner genes belong to the same gene family, belong to the same paralog group, or derive from the same gene model, and to filter the remaining candidates according to the threshold;
- a fusion mutation relative expression analysis sub-module, configured to perform expression-level correction and normalization based on the housekeeping gene expression quantification results and the gene fusion analysis results obtained by the gene fusion analysis sub-module, to obtain the relative expression of the fusion gene;
- a single nucleotide variation analysis sub-module, configured to identify variant single nucleotides through sequence alignment;
- a single nucleotide variation expression analysis sub-module, configured to perform quantitative expression analysis of single nucleotide variations based on the single nucleotide variation analysis results, the housekeeping gene expression quantification results and the sequence alignment statistics, to obtain the expression level of each single nucleotide variation.
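The gene expression and overexpression sub-modules above can be sketched as follows. RPKM is the standard "reads per kilobase per million mapped reads" measure. The text does not say how the threshold is derived from the baseline population, so the "mean plus two standard deviations" rule below is an assumption, and all function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def overexpression_threshold(baseline_rpkm: list[float], n_sd: float = 2.0) -> float:
    """ASSUMED rule: baseline mean plus n_sd sample standard deviations."""
    return mean(baseline_rpkm) + n_sd * stdev(baseline_rpkm)


def is_overexpressed(sample_rpkm: float, baseline_rpkm: list[float],
                     n_sd: float = 2.0) -> bool:
    """Compare the sample's RPKM with the baseline-derived threshold."""
    return sample_rpkm > overexpression_threshold(baseline_rpkm, n_sd)


# Illustrative numbers only: 1,000 reads on a 2 kb gene, 10 million mapped reads.
sample_value = rpkm(read_count=1_000, gene_length_bp=2_000,
                    total_mapped_reads=10_000_000)
baseline = [10.0, 12.0, 11.0, 9.0, 13.0]  # hypothetical baseline population
print(sample_value, is_overexpressed(sample_value, baseline))
```

A percentile of the baseline distribution would be an equally plausible threshold; the choice depends on how the baseline population is distributed.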
- The analysis module also includes a filtering sub-module, configured to filter out low-quality sequencing data and reads containing adapter sequences and to perform quality control, so that only data meeting the standards are used to analyze gene mutations and expression changes in the RNA-targeted sequencing data.
- The quality control includes: aligning the sequencing data that remain after filtering out low-quality data and adapter-containing reads to the reference genome, obtaining the sequence alignment results, and performing a quality control assessment on the alignment results.
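As a rough illustration of the filtering and quality control just described, the sketch below drops reads with a low mean base quality or an embedded adapter sequence, then reports a mapping rate as one possible alignment QC metric. The quality cutoff, the adapter string, the choice of mapping rate as the QC metric, and all names are assumptions; the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Read:
    sequence: str
    qualities: list[int]  # Phred quality scores, one per base


def passes_filter(read: Read, adapter: str, min_mean_quality: float = 20.0) -> bool:
    """Keep a read only if it is adapter-free and of sufficient mean quality."""
    if not read.qualities:
        return False
    if adapter in read.sequence:
        return False
    return sum(read.qualities) / len(read.qualities) >= min_mean_quality


def filter_reads(reads: list[Read], adapter: str,
                 min_mean_quality: float = 20.0) -> list[Read]:
    return [r for r in reads if passes_filter(r, adapter, min_mean_quality)]


def mapping_rate(n_mapped: int, n_total: int) -> float:
    """Fraction of filtered reads that aligned to the reference genome."""
    return n_mapped / n_total if n_total else 0.0


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, used here for illustration
reads = [
    Read("ACGTACGT", [30] * 8),            # clean read, kept
    Read("ACGT" + ADAPTER, [30] * 17),     # contains adapter, dropped
    Read("ACGTACGT", [10] * 8),            # low quality, dropped
]
kept = filter_reads(reads, ADAPTER)
print(len(kept), mapping_rate(n_mapped=950, n_total=1000))
```

In practice adapter-containing reads are often trimmed rather than discarded; this sketch follows the wording of the text, which describes filtering them out.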
- the threshold is as follows:
- the standardization formula used for the standardization of expression correction is as follows:
- SeedReads+RescueReads represents the reads spanning the fusion breakpoint
- HKA represents housekeeping gene A
- HKB represents housekeeping gene B
- HKC represents housekeeping gene C
- count represents the number of sequenced reads aligned to the reference genome
- length represents the length of the sequenced reads aligned to the reference genome
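The normalization formula itself is not reproduced in this text, so the following should not be read as the patent's formula. It is only a minimal sketch of one common way to express breakpoint-spanning reads relative to housekeeping genes, using the symbols defined above: the spanning reads (SeedReads + RescueReads) are scaled per million mapped reads and divided by the mean RPKM of HKA, HKB and HKC, each computed from its `count` and `length`. The function names and the choice of an arithmetic mean are assumptions.

```python
def rpkm(count: int, length: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads for one gene."""
    return count * 1e9 / (length * total_mapped_reads)


def fusion_relative_expression(
    seed_reads: int,
    rescue_reads: int,
    housekeeping: dict[str, tuple[int, int]],  # gene -> (count, length)
    total_mapped_reads: int,
) -> float:
    """ASSUMED form: spanning reads per million / mean housekeeping RPKM."""
    hk_values = [rpkm(count, length, total_mapped_reads)
                 for count, length in housekeeping.values()]
    hk_mean = sum(hk_values) / len(hk_values)
    spanning_per_million = (seed_reads + rescue_reads) * 1e6 / total_mapped_reads
    return spanning_per_million / hk_mean


# Illustrative numbers only.
housekeeping_genes = {
    "HKA": (2_000, 1_000),
    "HKB": (1_000, 1_000),
    "HKC": (3_000, 1_000),
}
relative = fusion_relative_expression(
    seed_reads=30, rescue_reads=10,
    housekeeping=housekeeping_genes, total_mapped_reads=10_000_000,
)
print(relative)
```

Dividing by housekeeping expression makes the value comparable across samples with different input amounts or sequencing yields, which is the stated purpose of the correction step.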
- The sequencing module performs sequencing in paired-end or single-end mode.
- the expression calculation formula of single nucleotide variation is:
- Gene Average Depth represents the average sequencing depth of the gene
- ALT count represents the depth of the mutation
- HK_expression_Coeffient is the coefficient of variation of expression, calculated from the expression of the housekeeping genes in the sample and the expression of the housekeeping genes in the standard
- The expressed transcripts are examined for fusions, single-base and multi-base substitutions (SNV/MNV), insertion-deletion mutations (indels) and other mutation types, and the expression levels of these tumor genes in tumor tissue are analyzed at the same time.
- FIG. 1 is a schematic flowchart of a method for detecting gene mutations and expression levels according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram of the correlation between gene expression levels measured by the RNA panel and by RNAseq in the embodiment.
- FIG. 3 is a schematic diagram of the correlation between expression levels of important cancer driver genes measured by the RNA panel and by RNAseq in the embodiment.
- Liquid-phase, probe-capture-based RNA-targeted sequencing can cover the transcripts expressed by major tumor driver genes, together with their fusions, activating mutations and drug resistance mutations, at ultra-high sequencing depth, while retaining the relative expression of all transcripts with respect to housekeeping genes. Because it covers only a small number of tumor target genes, the amount of sequencing data is small and the cost is low, which makes it better suited to the development of clinical detection kits.
- RNA is closer to downstream functional proteins and is better suited to explaining the activity state of cellular functional pathways.
- In the past, RNA was rarely used to detect somatic SNV/indel mutations, and RNA expression was not used in place of DNA copy number analysis, mainly because several factors affect detection accuracy: 1) RNA is single-stranded; 2) reverse transcription errors; 3) noise caused by RNA quality; 4) dependence on expression level, so that mutations in non-expressed genes cannot be detected; 5) inconsistencies caused by changes at the transcription level.
- The present invention makes technical improvements to address these problems, mainly including: 1) anchoring the gene list for SNV/indel calling and optimizing the filtering criteria for RNA SNV mutations, which improves the accuracy of SNV/indel calls for activating mutations and drug resistance mutations; 2) analyzing the relative expression of mutant allele transcripts versus wild-type allele transcripts; 3) cis analysis of fusion mutations and drug resistance point mutations, together with correlation analysis of their relative expression; 4) establishing the correspondence between copy number increase and expression of tumor driver genes, so that RNA expression can be used in place of DNA copy number analysis.
- DNA panels in the prior art miss some fusions (for example, when an RNA-level fusion arises from complex structural variation at the DNA level, or when the DNA panel probes do not cover the breakpoints), so fusion detection needs to be supplemented by RNA methods.
- The actionable mutations for targeted drugs in solid tumors are mainly SNVs, indels and CNVs.
- Primary NGS screening of clinical samples is mainly DNA-based, supplemented by RNA, FISH/IHC or other verification methods, which makes the process complex, demanding in sample quantity, and costly.
- The present invention uses a single high-throughput sequencing (NGS) assay to capture all mutation types in a panel covering the main tumor TKI-targeted drugs. This greatly simplifies the workflow and saves sample; while reducing cost, it increases sequencing depth and improves the accuracy of fusion and activating point mutation detection. It also yields information that DNA panels cannot provide, such as the expression level of driver genes and the specific expression level of mutant alleles, providing an auxiliary reference for the selection of tumor-targeted drugs.
- The method for acquiring RNA-targeted sequencing data may include the following steps: extracting total RNA from the FFPE sample without removing ribosomal RNA, fragmenting the total RNA, and reverse transcribing it into cDNA; constructing a gene library through steps including end repair, adapter ligation and library enrichment; capturing and enriching the target genes from the constructed cDNA library using nucleic acid capture probes that hybridize specifically to the target regions; and sequencing in paired-end mode on a high-throughput sequencer, thereby obtaining RNA-targeted sequencing data.
- A method for detecting gene mutations and expression levels includes the following steps: S1, extracting RNA from the sample to be tested, fragmenting the RNA, and performing reverse transcription to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching the target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain RNA-targeted sequencing data; S5, analyzing gene mutations and expression changes in the RNA-targeted sequencing data. S5 specifically includes: S51, gene expression analysis: quantitatively evaluating the expression of the target gene in the sample to be tested using the RPKM method; S52, gene overexpression analysis: using the baseline sample population, analyzing the distribution of RPKM values of the target gene, determining the threshold for the expression level of the target gene, and determining whether the target gene of the sample to be tested is overexpressed according to the RPKM
- the fusion threshold is as follows in Table 1:
- the standardization formula adopted for the standardization of expression level correction is as follows:
- SeedReads+RescueReads represent reads across fusion breakpoints
- HKA represents housekeeping gene A
- HKB represents housekeeping gene B
- HKC represents housekeeping gene C
- count represents the number of sequences in the alignment between the sequenced sequence and the reference genome
- length represents the length of the sequence aligned to the reference genome.
- the expression calculation formula of the single nucleotide variation is:
- Gene Average Depth represents the average depth of genes
- ALT count indicates the depth of mutation
- HK_expression_Coeffient means to calculate the coefficient of variation of the expression according to the expression of the housekeeping gene in the sample and the expression of the housekeeping gene in the standard.
- a device for detecting gene mutation and expression level includes an RNA extraction module, a gene library building module, a target gene enrichment module, a sequencing module and an analysis module, wherein the RNA extraction module is configured to extract the total RNA or mRNA of the sample to be detected, interrupt the RNA of the sample to be detected, and perform reverse reactions. Transcribe to obtain cDNA; the gene library construction module is set to use cDNA to construct the gene library through the steps of end repair, adapter ligation and library enrichment; the target gene enrichment module is set to use the capture probe to specifically hybridize with the target region from the gene library.
- the sequencing module is set to use a high-throughput sequencer to sequence to obtain RNA-targeted sequencing data
- the analysis module is set to analyze the changes in gene mutations and expression levels in the RNA-targeted sequencing data; among them, the analysis module specifically Including gene expression analysis submodule, gene overexpression analysis submodule, gene fusion analysis submodule, fusion mutation expression analysis submodule, single nucleotide variation analysis submodule and single nucleotide variation mutation expression analysis submodule,
- the gene expression analysis sub-module is set to use the RPKM method to quantitatively evaluate the expression of the target gene in the test sample; the gene overexpression analysis sub-module is set to retrieve the baseline sample population, analyze the RPKM value distribution of the target gene, and determine the level of target gene expression.
- the gene fusion analysis sub-module is set to filter out fusion genes belonging to the same gene family and belonging to the same paralogous group.
- the fusion genes from the same gene model, and the fusion genes derived from the same gene model filter the fusion genes that do not meet the conditions according to the threshold, and obtain the fusion genes in the test sample;
- the results of gene fusion analysis obtained by the fusion analysis sub-module are subjected to expression correction and standardization to obtain the relative expression of the fusion gene;
- the single-nucleotide variation analysis sub-module is set to determine the variation of single nucleotides through gene alignment; single nucleotides
- the variation expression analysis sub-module is set to perform quantitative expression analysis of single nucleotide variation according to the results of single nucleotide variation analysis, the expression quantitative results of housekeeping genes and the statistical results of sequence alignment, and obtain single nucleotide variation expression.
- a filtering sub-module configured to filter out low-quality sequencing data and reads containing adapter sequences and perform quality control, and then analyze the RNA target after obtaining data that meets the standards Changes in gene mutations and expression levels in the sequencing data
- quality control includes: aligning the sequencing data obtained after filtering out low-quality sequencing data
- the threshold is as shown in Table 1.
- the standardization formula used for the standardization of expression correction is as follows:
- SeedReads+RescueReads represent reads across fusion breakpoints
- HKA represents housekeeping gene A
- HKB represents housekeeping gene B
- HKC represents housekeeping gene C
- count represents the number of sequences in the alignment between the sequenced sequence and the reference genome
- length represents the length of the sequence aligned to the reference genome.
- the expression calculation formula of single nucleotide variation is:
- Gene Average Depth represents the average depth of genes
- ALT count indicates the depth of mutation
- HK_expression_Coeffient means to calculate the coefficient of variation of the expression according to the expression of the housekeeping gene in the sample and the expression of the housekeeping gene in the standard.
- RNA for mutation detection has a stronger functional correlation.
- the mutation frequency of both SNVs is 1%, but because of different expression levels, the clinical impact of the mutation will be different.
- the invention can not only detect the conventional gene expression amount and gene fusion of RNAseq, but also can detect the SNV and CNV of the DNA panel, and can detect the expression amount of various mutations. Achieve a single assay covering all mutation types and relative expression levels.
- the system of the present invention performs RNA panel targeting of target genes, compared with RNAseq to detect the whole transcriptome, the sequencing cost is lower, and the target region can be significantly enriched, especially for low-expressed genes or mutations, the detection sensitivity is higher. And the RNA-targeted sequencing panel design only needs to cover the exon region, compared with the DNA panel design, which needs to cover the exons and introns, which saves the cost of probes and sequencing, and is more suitable for clinical kit development.
- Nucleotide library construction was performed using ABclonal's mRNA-seq Lib Prep Module for illumina: including cDNA reverse transcription, fragmentation, end repair, adapter ligation, library enrichment and other steps.
- the constructed library was purified with Agencourt AMpure XP magnetic beads, and then used Qubit 3.0 and Agilent 2100 capillary electrophoresis for concentration detection and quality control.
- for the target genes (ALK, ESR1, FGFR1, NRG1, RET, ERG, BRAF, ETV1, FGFR2, NTRK1, ROS1, EWSR1, CD74, ETV4, FGFR3, NTRK2, SLC34A2, MET, EGFR, ETV5, FGFR4, NTRK3, SLC45A3, PPARG, EML4, ETV6, KIF5B, PDGFRA, TPM3, PDGFRB, SFT2D3, CNTF, EPM2A, NOL10, HEATR4 and RPGRIP1), non-overlapping tiled probe sequences are designed from their transcript sequences, and the 5' end of each probe is labeled with biotin.
- the eluted product was subjected to the next PCR amplification experiment, followed by purification with Agencourt AMPure XP magnetic beads, and Qubit 3.0 and Agilent 2100 capillary electrophoresis were used for concentration determination and quality control.
- RNA panel capture reads perform on-machine sequencing to obtain the original sequencing off-machine sequence, and use Trimmomatic-0.36 to process the sequence as follows to obtain a high-quality sequencing sequence
- the high-quality sequencing sequences (standards adopt general standards in the field) are compared to the reference genome using STAR to obtain the sequence alignment results, and the comparison results are subjected to quality control evaluation, and the next step is analyzed according to the following table 2 indicators (including: gene Expression analysis, gene fusion analysis, fusion mutation relative expression analysis, SNV analysis, SNV mutation expression analysis).
- the RPKM method is used to quantitatively evaluate the gene expression.
- the RPKM formula is as follows:
- Total exon reads The number of sequences aligned to all exons of a gene, evaluated using FeatureCounts software based on gene annotation files and alignment results.
- Mapped reads(millions) The number of all sequences aligned to the genome, obtained according to the statistical results of the alignment results.
- Exon length (KB): The length of the exon of the gene, calculated from the annotation file of the genome.
- SeedReads+RescueReads represents the reads across fusion breakpoints
- HKA represents housekeeping gene A
- HKB represents housekeeping gene B
- HKC represents housekeeping gene C
- count represents the number of sequences in the alignment between the sequenced sequence and the reference genome
- length represents the length of the sequence aligned to the reference genome.
- HKA count is the number of sequences in the housekeeping gene A sequencing sequence aligned with the reference genome.
- Transcript selection determine whether it is a drug site transcript/pathogenic locus in Clinvar/whether there is this transcript in the Transvar result/whether it is located in an intron non-splice/classical transcript/whether it is in an exon region;
- the quantitative analysis of the expression of the SNV was carried out with the quantitative results of the expression of housekeeping genes and the statistical results of the sequence alignment, and the expression level of the SNV was obtained.
- HK_expression_Coeffient Calculate the coefficient of variation of the expression according to the expression of the housekeeping gene in the sample and the expression of the housekeeping gene in the standard;
- for all 5 samples in which a fusion was detected by RNA, the breakpoints were confirmed as genuine using IGV and the supporting read counts exceeded the filtering criteria. In 3 of them the fusion was confirmed by first-generation (Sanger) sequencing, indicating that DNA testing may miss fusions.
- all 16 DNA fusion-positive samples were also positive by RNA, with the same fusion form as detected by DNA, and RNA additionally detected alternatively spliced fusion isoforms.
- the oncogene activating mutations and the primary and secondary resistance mutation sites arising after fusion that are covered by the RNA panel (11 genes, 226 SNV sites in total) were examined for concordance between DNA targeted sequencing and SNV detection in paired RNA samples.
- of 40 non-small cell lung cancer clinical samples in total, 29 had no variant detected by either DNA or RNA and 11 had variants detected by both, with mutations concentrated mainly in the EGFR gene.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biochemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
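A method and device for detecting gene mutations and expression levels. The method comprises the following steps: S1, extracting RNA, fragmenting it and reverse transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain targeted RNA sequencing data; S5, analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data. S5 specifically comprises: S51, gene expression analysis; S52, gene overexpression analysis; S53, gene fusion analysis; S54, fusion mutation expression analysis; S55, single nucleotide variation analysis; S56, single nucleotide variation expression analysis. The method efficiently enriches the RNA transcripts expressed by tumor-related genes and analyzes the expression levels and mutation status of these tumor genes in tumor tissue.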
Description
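The present invention relates to the technical field of biology, and in particular to a method and device for detecting gene mutations and expression levels.
Gene mutation refers to a sudden, heritable change in a genomic DNA molecule. At the molecular level, a gene mutation is a change in the composition or order of base pairs in the structure of a gene. Although genes are very stable and replicate themselves accurately during cell division, this stability is relative. Under certain conditions a gene can suddenly change from its original form into a new one; that is, a new gene suddenly appears at a locus and replaces the original gene, and this gene is called a mutant gene. As a result, new traits never seen in the ancestors suddenly appear in the offspring.
Gene mutation is one of the important factors in biological evolution, so the study of gene mutation has broad biological significance beyond its theoretical significance. Some gene mutations result from structural variation of chromosomes. Under natural conditions or human influence, the main chromosomal structural variations are deletion, duplication, inversion and translocation; gene fusion is also a form of chromosomal structural variation.
With the development of sequencing technology and falling costs, human whole-genome sequencing is bound to become the mainstream trend in human health, and precision medicine will be the ultimate goal of sequencing. Accurate annotation of variation in the human genome is a necessary means of achieving precision medicine.
Current conventional methods generally use whole-genome sequencing (WGS) or DNA panels to detect SNVs, CNVs and fusions. However, detecting mutations at the DNA level cannot reflect how the mutations actually manifest at the transcriptional level.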
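Summary of the Invention
The present invention aims to provide a method and device for detecting gene mutations and expression levels.
The method of the present invention for detecting gene mutations (including gene fusions) and expression levels is based on targeted RNA sequencing. It efficiently enriches the RNA transcripts expressed by tumor-related genes and comprehensively detects multiple mutation types on those transcripts, including fusions, single- and multi-base substitutions (SNV/MNV) and insertion-deletion mutations (indels), while also analyzing the expression levels of these tumor driver genes in tumor tissue.
In the prior art, whole-genome sequencing (WGS) or DNA panels are generally used to detect SNVs, CNVs and fusions. Traditional methods detect mutations at the DNA level and cannot reflect how the mutations actually manifest at the transcriptional level; mutation detection using RNA has stronger functional relevance. For example, two SNVs may both have a mutation frequency of 1%, but because their expression levels differ, their clinical impact will differ. The present invention can detect not only the gene expression levels and gene fusions conventionally obtained by RNAseq, but also the SNVs and CNVs of a DNA panel, as well as the expression levels of the various mutations. A single assay thus covers all mutation types and relative expression levels.
The system of the present invention uses an RNA panel to target the genes of interest. Compared with whole-transcriptome RNAseq, the sequencing cost is lower and the target regions are significantly enriched, giving higher detection sensitivity, especially for genes or mutations with low expression. Moreover, a targeted RNA sequencing panel only needs to cover exonic regions, whereas a DNA panel needs to cover both exons and introns; this saves probe and sequencing costs and is better suited to the development of clinical kits.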
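To achieve the above object, according to one aspect of the present invention, a method for detecting gene mutations and expression levels is provided. The method comprises the following steps: S1, extracting RNA from the sample to be tested, fragmenting the RNA and reverse transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain targeted RNA sequencing data; S5, analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data. S5 specifically comprises: S51, gene expression analysis: quantitatively evaluating the expression level of the target genes in the test sample using the RPKM method; S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of RPKM values of the target gene, determining the threshold separating high from low expression of the target gene, and judging from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; S53, gene fusion analysis: filtering out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, then filtering out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; S54, fusion mutation relative expression analysis: performing expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained in S53, to obtain the relative fusion expression level of the fusion gene; S55, single nucleotide variation analysis: determining variant single nucleotides by gene alignment; S56, single nucleotide variation expression analysis: performing quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
Further, S5 also comprises: filtering out low-quality sequencing data and reads containing adapter sequences and performing quality control, and analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data only once data meeting the standards have been obtained. The quality control step comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.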
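Further, in S53, the thresholds are as shown in the following table:
Unique sequences | Exon boundary | Not exon boundary |
Canonical splice site | ≥3 | ≥5 |
Non-canonical splice site | ≥5 | ≥10 |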
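Further, in S54, the normalization formula used for fusion expression correction and normalization is as follows:
where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint, HKA denotes housekeeping gene A, HKB denotes housekeeping gene B, HKC denotes housekeeping gene C, count denotes the number of sequencing reads aligned to the reference genome, and length denotes the length of the sequence aligned to the reference genome.
Further, in S4, sequencing is performed in paired-end or single-end mode.
Further, in S56, the expression level of a single nucleotide variation is calculated by the following formula:
where Gene Average Depth denotes the average depth of the gene;
ALT count denotes the depth of the mutation;
HK_expression_Coeffient denotes the coefficient of expression change calculated from the expression level of the housekeeping genes in the sample and the expression level of the housekeeping genes in the reference standard.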
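According to another aspect of the present invention, a device for detecting gene mutations and expression levels is provided. The device comprises: an RNA extraction module, configured to extract RNA from the sample to be tested, fragment the RNA and reverse transcribe it to obtain cDNA; a gene library construction module, configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment; a target gene enrichment module, configured to capture and enrich target genes from the gene library by specific hybridization of capture probes to the target regions; a sequencing module, configured to sequence on a high-throughput sequencer to obtain targeted RNA sequencing data; and an analysis module, configured to analyze changes in gene mutations and expression levels in the targeted RNA sequencing data. The analysis module specifically comprises: a gene expression analysis submodule, configured to quantitatively evaluate the expression level of the target genes in the test sample using the RPKM method; a gene overexpression analysis submodule, configured to retrieve a baseline sample population, analyze the distribution of RPKM values of the target gene, determine the threshold separating high from low expression of the target gene, and judge from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; a gene fusion analysis submodule, configured to filter out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, and then filter out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; a fusion mutation relative expression analysis submodule, configured to perform expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained by the gene fusion analysis submodule, to obtain the relative expression level of the fusion gene; a single nucleotide variation analysis submodule, configured to determine variant single nucleotides by gene alignment; and a single nucleotide variation expression analysis submodule, configured to perform quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
Further, the analysis module also comprises a filtering submodule, configured to filter out low-quality sequencing data and reads containing adapter sequences and perform quality control, so that changes in gene mutations and expression levels in the targeted RNA sequencing data are analyzed only once data meeting the standards have been obtained. The quality control comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.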
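Further, in the gene fusion analysis submodule, the thresholds are as shown in the following table:
Unique sequences | Exon boundary | Not exon boundary |
Canonical splice site | ≥3 | ≥5 |
Non-canonical splice site | ≥5 | ≥10 |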
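Further, in the fusion mutation expression analysis submodule, the normalization formula used for expression correction and normalization is as follows:
where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint, HKA denotes housekeeping gene A, HKB denotes housekeeping gene B, HKC denotes housekeeping gene C, count denotes the number of sequencing reads aligned to the reference genome, and length denotes the length of the sequence aligned to the reference genome.
Further, the sequencing module performs sequencing in paired-end or single-end mode.
Further, in the single nucleotide variation expression analysis submodule, the expression level of a single nucleotide variation is calculated by the following formula:
where Gene Average Depth denotes the average depth of the gene;
ALT count denotes the depth of the mutation;
HK_expression_Coeffient denotes the coefficient of expression change calculated from the expression level of the housekeeping genes in the sample and the expression level of the housekeeping genes in the reference standard.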
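Applying the technical solution of the present invention, with total RNA or mRNA of the sample to be tested as the detection object and using targeted RNA sequencing, the RNA transcripts expressed by tumor-related genes can be efficiently enriched, and multiple mutation types on those transcripts, including fusions, single- and multi-base substitutions (SNV/MNV) and insertion-deletion mutations (indels), can be comprehensively detected, while the expression levels of these tumor genes in tumor tissue are analyzed.
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Figure 1 shows a schematic flowchart of the method for detecting gene mutations and expression levels according to an embodiment of the present invention;
Figure 2 shows the correlation between gene expression levels measured by the RNA panel and by RNAseq in the examples;
Figure 3 shows the correlation between the expression levels of important cancer driver genes measured by the RNA panel and by RNAseq in the examples.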
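It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
Compared with conventional RNA-seq, targeted RNA sequencing based on liquid-phase probe capture can cover the transcripts expressed by the main tumor driver genes, together with fusions, activating mutations and resistance mutations, at ultra-high sequencing depth, while retaining the expression of all transcripts relative to housekeeping genes. And because only a small number of tumor target genes are covered, the sequencing data volume is small and the cost is low, making it better suited to the development of clinical test kits.
Compared with DNA, RNA is closer to the downstream functional proteins and is better suited to explaining the activity state of cellular functional pathways. In the past, however, RNA was rarely used to detect somatic SNV/indel mutations, and RNA expression was not used in place of DNA copy number analysis, mainly because of factors that affect detection accuracy: 1) single-strandedness; 2) reverse transcription errors; 3) noise caused by RNA quality; 4) dependence on expression, so that non-expressed mutations cannot be detected; 5) inconsistencies caused by mutations arising at the transcriptional level. To address these technical problems, the present invention makes the following technical improvements: 1) by anchoring the gene list for SNV/indel calling and optimizing the filtering criteria for RNA SNVs, the accuracy of activating and resistance SNV/indel mutations is improved; 2) the relative expression of mutant allele transcripts and wild-type alleles is obtained; 3) cis analysis of fusion mutations and resistance point mutations is performed, together with correlation analysis of their relative expression; 4) a correspondence is established between copy number gain of tumor driver genes and expression level, so that RNA expression can be used in place of DNA copy number analysis.
In addition, prior-art DNA panels suffer from missed detections in fusion testing (causes: RNA-level fusions arising from complex structural variation at the DNA level, or DNA panel probes that do not cover the breakpoint), so fusion detection needs to be supplemented by RNA methods. Because the actionable mutations for targeted drugs in solid tumors are mainly SNVs, indels and CNVs, primary NGS screening of clinical samples relies mainly on DNA methods, supplemented by confirmatory methods such as RNA or FISH/IHC, which makes the workflow complex, the sample requirement high and the cost high. In a typical embodiment, the present invention uses a single high-throughput sequencing (NGS) capture panel to cover all mutation types relevant to the main TKI targeted drugs for tumors. This greatly simplifies the workflow, saves sample, increases sequencing depth while reducing cost, and improves the accuracy of fusion and activating point mutation detection. It also yields information that DNA panels cannot provide, such as the expression level of driver genes and the allele-specific expression of mutant alleles, providing an auxiliary reference for the selection of targeted tumor drugs.
The method of the present invention for detecting gene mutations (including gene fusions) and expression levels is based on targeted RNA sequencing. It efficiently enriches the RNA transcripts expressed by tumor-related genes and comprehensively detects multiple mutation types on those transcripts, including fusions, single- and multi-base substitutions (SNV/MNV) and insertion-deletion mutations (indels), while also analyzing the expression levels of these tumor driver genes in tumor tissue.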
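In embodiments of the present invention, the targeted RNA sequencing data may be obtained by the following steps: extracting total RNA from an FFPE sample, without removing ribosomal RNA, fragmenting the total RNA and reverse transcribing it into cDNA; constructing a gene library through steps including end repair, adapter ligation and library enrichment; capturing and enriching target genes from the constructed cDNA library using nucleic acid probes that hybridize specifically to the target regions; and sequencing in paired-end mode on a high-throughput sequencer, thereby obtaining the targeted RNA sequencing data.
According to a typical embodiment of the present invention, a method for detecting gene mutations and expression levels is provided. Referring to Figure 1, the method comprises the following steps: S1, extracting RNA from the sample to be tested, fragmenting the RNA and reverse transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain targeted RNA sequencing data; S5, analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data. S5 specifically comprises: S51, gene expression analysis: quantitatively evaluating the expression level of the target genes in the test sample using the RPKM method; S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of RPKM values of the target gene, determining the threshold separating high from low expression of the target gene, and judging from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; S53, gene fusion analysis: filtering out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, then filtering out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; S54, fusion mutation relative expression analysis: performing expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained in S53, to obtain the relative expression level of the fusion gene; S55, single nucleotide variation analysis: determining variant single nucleotides by gene alignment; S56, single nucleotide variation expression analysis: performing quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
Specifically, in one embodiment of the present invention, S5 also comprises: filtering out low-quality sequencing data and reads containing adapter sequences and performing quality control, and analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data only once data meeting the standards have been obtained. The quality control step comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.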
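Preferably, in S53, the fusion thresholds are as shown in Table 1 below:
Table 1 Fusion mutation threshold criteria
Preferably, in S54, the normalization formula used for expression correction and normalization is as follows:
where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint, HKA denotes housekeeping gene A, HKB denotes housekeeping gene B, HKC denotes housekeeping gene C, count denotes the number of sequencing reads aligned to the reference genome, and length denotes the length of the sequence aligned to the reference genome.
Preferably, in S56, the expression level of a single nucleotide variation is calculated by the following formula:
where Gene Average Depth denotes the average depth of the gene;
ALT count denotes the depth of the mutation;
HK_expression_Coeffient denotes the coefficient of expression change calculated from the expression level of the housekeeping genes in the sample and the expression level of the housekeeping genes in the reference standard.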
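To implement the above method more conveniently, according to a typical embodiment of the present invention, a device for detecting gene mutations and expression levels is provided. The device comprises an RNA extraction module, a gene library construction module, a target gene enrichment module, a sequencing module and an analysis module. The RNA extraction module is configured to extract total RNA or mRNA from the sample to be tested, fragment the RNA and reverse transcribe it to obtain cDNA; the gene library construction module is configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment; the target gene enrichment module is configured to capture and enrich target genes from the gene library by specific hybridization of capture probes to the target regions; the sequencing module is configured to sequence on a high-throughput sequencer to obtain targeted RNA sequencing data; and the analysis module is configured to analyze changes in gene mutations and expression levels in the targeted RNA sequencing data. The analysis module specifically comprises a gene expression analysis submodule, a gene overexpression analysis submodule, a gene fusion analysis submodule, a fusion mutation expression analysis submodule, a single nucleotide variation analysis submodule and a single nucleotide variation expression analysis submodule. The gene expression analysis submodule is configured to quantitatively evaluate the expression level of the target genes in the test sample using the RPKM method; the gene overexpression analysis submodule is configured to retrieve a baseline sample population, analyze the distribution of RPKM values of the target gene, determine the threshold separating high from low expression of the target gene, and judge from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; the gene fusion analysis submodule is configured to filter out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, and then filter out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; the fusion mutation relative expression analysis submodule is configured to perform expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained by the gene fusion analysis submodule, to obtain the relative expression level of the fusion gene; the single nucleotide variation analysis submodule is configured to determine variant single nucleotides by gene alignment; and the single nucleotide variation expression analysis submodule is configured to perform quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
Specifically, in one embodiment of the present invention, the analysis module also comprises a filtering submodule, configured to filter out low-quality sequencing data and reads containing adapter sequences and perform quality control, so that changes in gene mutations and expression levels in the targeted RNA sequencing data are analyzed only once data meeting the standards have been obtained. The quality control comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.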
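Preferably, in the gene fusion analysis submodule, the thresholds are as shown in Table 1.
Preferably, in the fusion mutation expression analysis submodule, the normalization formula used for expression correction and normalization is as follows:
where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint, HKA denotes housekeeping gene A, HKB denotes housekeeping gene B, HKC denotes housekeeping gene C, count denotes the number of sequencing reads aligned to the reference genome, and length denotes the length of the sequence aligned to the reference genome.
Preferably, in the single nucleotide variation expression analysis submodule, the expression level of a single nucleotide variation is calculated by the following formula:
where Gene Average Depth denotes the average depth of the gene;
ALT count denotes the depth of the mutation;
HK_expression_Coeffient denotes the coefficient of expression change calculated from the expression level of the housekeeping genes in the sample and the expression level of the housekeeping genes in the reference standard.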
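In the prior art, whole-genome sequencing (WGS) or DNA panels are generally used to detect SNVs, CNVs and fusions. Traditional methods detect mutations at the DNA level and cannot reflect how the mutations actually manifest at the transcriptional level; mutation detection using RNA has stronger functional relevance. For example, two SNVs may both have a mutation frequency of 1%, but because their expression levels differ, their clinical impact will differ. The present invention can detect not only the gene expression levels and gene fusions conventionally obtained by RNAseq, but also the SNVs and CNVs of a DNA panel, as well as the expression levels of the various mutations. A single assay thus covers all mutation types and relative expression levels.
The system of the present invention uses an RNA panel to target the genes of interest. Compared with whole-transcriptome RNAseq, the sequencing cost is lower and the target regions are significantly enriched, giving higher detection sensitivity, especially for genes or mutations with low expression. Moreover, a targeted RNA sequencing panel only needs to cover exonic regions, whereas a DNA panel needs to cover both exons and introns; this saves probe and sequencing costs and is better suited to the development of clinical kits.
The beneficial effects of the present invention are further illustrated below with reference to examples.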
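Examples
I. Experiments:
1. RNA extraction:
Total RNA was extracted from paraffin-embedded pathological sections from lung cancer patients using Qiagen's RNeasy FFPE Kit (Cat No./ID: 73504). RNA concentration was measured with Qubit RNA HS, and RNA quality was checked with LabChip.
2. Pre-hybridization nucleotide library preparation:
The nucleotide library was constructed using ABclonal's mRNA-seq Lib Prep Module for illumina, including cDNA reverse transcription, fragmentation, end repair, adapter ligation and library enrichment. The constructed library was purified with Agencourt AMPure XP magnetic beads, after which Qubit 3.0 and Agilent 2100 capillary electrophoresis were used for concentration measurement and quality control.
3. Probe capture hybridization:
For the 36 selected target genes (ALK, ESR1, FGFR1, NRG1, RET, ERG, BRAF, ETV1, FGFR2, NTRK1, ROS1, EWSR1, CD74, ETV4, FGFR3, NTRK2, SLC34A2, MET, EGFR, ETV5, FGFR4, NTRK3, SLC45A3, PPARG, EML4, ETV6, KIF5B, PDGFRA, TPM3, PDGFRB, SFT2D3, CNTF, EPM2A, NOL10, HEATR4 and RPGRIP1), non-overlapping tiled probe sequences were designed from their transcript sequences, with the 5' end of each probe labeled with biotin. 2 ug of the prepared pre-hybridization library was mixed with 5 uL Human Cot DNA (IDT) and 2 uL xGen Universal Blockers-TS Mix, evaporated to dryness in a vacuum centrifugal concentrator (60°C, about 20 min to 1 hr), redissolved in hybridization buffer, incubated at room temperature for 10 min, and then transferred to a PCR instrument for hybridization at 65°C for 16 h. The overnight hybridization product was mixed with streptavidin magnetic beads and incubated in the PCR instrument for 45 min, after which the beads were washed with wash buffer. The eluted product was PCR-amplified in the next step and then purified with Agencourt AMPure XP magnetic beads, and Qubit 3.0 and Agilent 2100 capillary electrophoresis were used for concentration measurement and quality control.
4. High-throughput sequencing: sequencing was performed in paired-end mode on Illumina Nextseq, Novaseq or similar instruments.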
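II. Sequencing data analysis:
The reads captured by the RNA panel were sequenced to obtain the raw sequencing output, which was processed with Trimmomatic-0.36 as follows to obtain high-quality sequencing reads:
a) remove low-quality sequencing reads
b) remove reads containing adapter sequences
The high-quality sequencing reads (judged by standards generally accepted in the field) were aligned to the reference genome using STAR to obtain the sequence alignment results, and the quality of the alignment results was evaluated. Samples meeting the criteria in Table 2 below proceeded to the next analyses (gene expression analysis, gene fusion analysis, fusion mutation relative expression analysis, SNV analysis and SNV expression analysis).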
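Table 2 Quality control criteria for RNA panel sequencing output
Sequence mapping rate | Threshold | >=80% |
Data volume in the target region | Threshold | >=2M |
Number of expressed housekeeping genes | Threshold | >=4 |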
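The three quality control criteria in Table 2 can be sketched as a simple gate applied before any downstream analysis. This is a minimal illustration, not the patent's implementation: the names `QcMetrics` and `passes_qc` are hypothetical, and the code assumes that "2M" means 2 million reads in the target region, which the text does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds taken from Table 2. Interpreting "2M" as 2 million reads
# is an assumption made for this sketch.
MIN_MAPPING_RATE = 0.80
MIN_TARGET_READS = 2_000_000
MIN_EXPRESSED_HOUSEKEEPING = 4


@dataclass
class QcMetrics:
    """Hypothetical container for per-sample alignment statistics."""
    mapping_rate: float            # fraction of reads aligned, 0.0 to 1.0
    target_reads: int              # data volume in the target region
    expressed_housekeeping: int    # number of housekeeping genes expressed


def passes_qc(metrics: QcMetrics) -> bool:
    """Return True only if all three Table 2 criteria are met."""
    return (
        metrics.mapping_rate >= MIN_MAPPING_RATE
        and metrics.target_reads >= MIN_TARGET_READS
        and metrics.expressed_housekeeping >= MIN_EXPRESSED_HOUSEKEEPING
    )
```

A sample failing any single criterion is excluded, which matches the requirement that all three indicators be satisfied before the expression, fusion and SNV analyses run.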
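1. Gene expression analysis
Based on the sequence alignment results and the annotation file of the reference genome, gene expression was quantified using the RPKM method. The RPKM formula is as follows:
RPKM = Total exon reads / (Mapped reads (millions) × Exon length (KB))
Total exon reads: the number of reads aligned to all exons of a gene, evaluated with FeatureCounts from the gene annotation file and the alignment results.
Mapped reads (millions): the number of all reads aligned to the genome, obtained from the alignment statistics.
Exon length (KB): the exon length of the gene, calculated from the genome annotation file.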
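The RPKM calculation defined by the three terms above can be sketched as follows. The function name `rpkm` and its argument names are illustrative; inputs are assumed to be raw counts and base pairs, which the function converts to millions of reads and kilobases.

```python
def rpkm(total_exon_reads: int, mapped_reads: int, exon_length_bp: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads.

    total_exon_reads: reads aligned to all exons of the gene
    mapped_reads:     all reads aligned to the genome (raw count)
    exon_length_bp:   total exon length of the gene in base pairs
    """
    if mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("mapped_reads and exon_length_bp must be positive")
    mapped_millions = mapped_reads / 1_000_000
    exon_length_kb = exon_length_bp / 1_000
    return total_exon_reads / (mapped_millions * exon_length_kb)
```

For example, 500 exon reads for a gene with 2 kb of exons in a library of 10 million mapped reads gives 500 / (10 × 2) = 25 RPKM.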
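2. Gene fusion analysis
FusionMap was applied to the high-quality sequencing reads to identify gene fusions, giving preliminary fusion results, which were then filtered according to the following rules:
1) The Filter flag in the gene fusion results is empty, which means:
a) fusion genes belonging to the same gene family are filtered out;
b) fusion genes belonging to the same paralog group (as defined by Ensembl v74) are filtered out;
c) fusion genes derived from the same gene model are filtered out.
2) Fusion genes that fail the established thresholds are filtered out; the threshold criteria are given in Table 3 below:
Table 3 Fusion mutation threshold criteria
uniqcount | Exon boundary | Not exon boundary |
Canonical splice site | ≥3 | ≥5 |
Non-canonical splice site | ≥5 | ≥10 |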
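The thresholds in Table 3 amount to a two-way lookup on splice-site type and breakpoint position. The sketch below shows one way to apply them; the function name `passes_fusion_threshold` and its boolean arguments are hypothetical, and the code assumes that the unique read count, splice-site class and exon-boundary status have already been extracted from the fusion caller output.

```python
# Minimum unique read counts from Table 3, keyed by
# (is_canonical_splice_site, is_at_exon_boundary).
FUSION_THRESHOLDS = {
    (True, True): 3,     # canonical splice site, exon boundary
    (True, False): 5,    # canonical splice site, not exon boundary
    (False, True): 5,    # non-canonical splice site, exon boundary
    (False, False): 10,  # non-canonical splice site, not exon boundary
}


def passes_fusion_threshold(
    uniq_count: int, canonical_splice: bool, at_exon_boundary: bool
) -> bool:
    """Return True if a fusion candidate meets the Table 3 minimum."""
    return uniq_count >= FUSION_THRESHOLDS[(canonical_splice, at_exon_boundary)]
```

The stricter minimum for non-canonical, off-boundary breakpoints reflects that such candidates need more supporting reads before they are retained.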
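3. Fusion mutation expression analysis
The identified gene fusion results were corrected and normalized against the expression quantification of the housekeeping genes to obtain the fusion expression level of each fusion gene. The normalization formula is as follows:
where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint, HKA denotes housekeeping gene A, HKB denotes housekeeping gene B, HKC denotes housekeeping gene C, count denotes the number of sequencing reads aligned to the reference genome, and length denotes the length of the sequence aligned to the reference genome. For example, HKA count is the number of sequencing reads of housekeeping gene A aligned to the reference genome.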
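4. SNV analysis
Analysis workflow:
1) The sequencing data are aligned to obtain a BAM data file;
2) The VarDict caller is used to extract mutation sites and insertion/deletion regions relative to the reference genome (hg19); the result file is in VCF format;
3) The VCF file is annotated with ANNOVAR, and sites with inaccurate annotations are re-annotated with transvar, giving the complete result files; using transvar to correct the annotations makes the results more accurate and comprehensive;
4) The two sets of results are merged; the merged file is corrected for forward and reverse strands, and the read counts and freq are tallied;
this step corrects for strand bias and re-corrects the result annotations;
5) Annotations are filtered and supported transcripts are selected using the evidence site database;
Gene mutation and gene database module:
a) genes with high incidence in different tumors and diseases are compiled to build a hotspot gene list with well-defined targeted sites and chemotherapy drug associations;
b) public databases, including EXAC/1000 Genomes/gnomAD/HGMD/OMIM/cosmic;
Transcript selection: determine whether it is a transcript for a drug-associated site / a pathogenic site in Clinvar / whether the transcript appears in the Transvar results / whether the site lies in a non-splice intronic region / canonical transcript / whether it lies in an exonic region;
7) The merged results are filtered using threshold criteria derived from validation to obtain the final results;
independent validation and parallel validation on large numbers of samples were performed for different genes and hotspots, the results were visually inspected and corrected, and after the best performance was calculated a set of quality control threshold criteria was derived by working backwards;
Filtering criteria:
a) filter out mutation sites with sequencing depth below 10;
b) filter out mutations on the blacklist and retain mutations on the whitelist;
c) filter out mutations without supporting reads on the forward and reverse strands;
d) filter out mutations whose freq and support reads do not meet the requirements.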
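The four filtering criteria can be sketched as a single predicate. Several points here are assumptions made only for illustration: the `SnvCall` record and `keep_snv` function are hypothetical; the minimum freq and support-read values are left as parameters because the text does not state them; a whitelist entry is assumed to protect a variant from blacklist removal only; and criterion c) is interpreted as requiring at least one supporting read on each strand.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SnvCall:
    """Hypothetical record for one merged, annotated variant call."""
    key: str                 # variant identifier, format is illustrative
    depth: int               # sequencing depth at the site
    forward_alt_reads: int   # supporting reads on the forward strand
    reverse_alt_reads: int   # supporting reads on the reverse strand
    freq: float              # variant allele frequency


def keep_snv(
    call: SnvCall,
    blacklist: Set[str],
    whitelist: Set[str],
    min_freq: float,          # not given in the text; caller must supply
    min_support_reads: int,   # not given in the text; caller must supply
    min_depth: int = 10,      # criterion a) from the text
) -> bool:
    # a) drop sites with sequencing depth below 10
    if call.depth < min_depth:
        return False
    # b) drop blacklisted variants unless they are whitelisted (assumption)
    if call.key in blacklist and call.key not in whitelist:
        return False
    # c) require supporting reads on both strands (interpretation)
    if call.forward_alt_reads == 0 or call.reverse_alt_reads == 0:
        return False
    # d) drop variants failing the freq or support-read requirements
    support = call.forward_alt_reads + call.reverse_alt_reads
    return call.freq >= min_freq and support >= min_support_reads
```

Applying the filters in the listed order lets the cheap depth check reject most noise before the list lookups and strand checks run.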
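5. SNV expression analysis
Based on the SNV results, quantitative expression analysis of the SNVs was performed using the expression quantification of the housekeeping genes and the sequence alignment statistics, to obtain the expression level of each SNV.
Gene Average Depth: the average depth of the gene
HK_expression_Coeffient: the coefficient of expression change calculated from the expression level of the housekeeping genes in the sample and the expression level of the housekeeping genes in the reference standard;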
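III. Results:
1. Accuracy of gene fusion detection by the RNA panel
Fusion gene detection in RNA samples was validated for concordance against targeted sequencing of paired DNA samples; performance is shown in Table 4 below. Among 57 DNA fusion-negative samples, 52 were RNA fusion-negative and 5 were RNA fusion-positive. The negative agreement between DNA and RNA fusion detection was therefore 52/57 = 91.23%. For all 5 samples in which a fusion was detected by RNA, the breakpoints were confirmed as genuine using IGV and the supporting read counts exceeded the filtering criteria; in 3 of them the fusion was confirmed by first-generation (Sanger) sequencing, indicating that DNA testing may miss fusions. Among 16 clinically tested DNA fusion-positive samples, all 16 were positive by RNA, with the same fusion form as detected by DNA, and RNA additionally detected alternatively spliced fusion isoforms. The positive agreement rate between RNA and DNA testing was 16/16 = 100%, and the negative agreement rate was 52/57 = 91.23%.
Table 4 Fusion detection performance of the RNA panel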
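The agreement rates quoted above follow directly from the sample counts. The short sketch below reproduces the arithmetic; the helper name `agreement_rate` is illustrative, and DNA targeted sequencing is treated as the reference result, as in the text.

```python
def agreement_rate(concordant: int, total: int) -> float:
    """Percentage of reference-method samples that the RNA panel agrees with."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * concordant / total, 2)


# Counts reported in the fusion concordance study.
positive_agreement = agreement_rate(16, 16)  # DNA-positive samples also RNA-positive
negative_agreement = agreement_rate(52, 57)  # DNA-negative samples also RNA-negative
```

Here `positive_agreement` evaluates to 100.0 and `negative_agreement` to 91.23, matching the reported figures.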
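2. Accuracy of SNV detection by the RNA panel
The oncogene activating mutations and the primary and secondary resistance mutation sites arising after fusion that are covered by the RNA panel (11 genes, 226 SNV sites in total) were examined for concordance between DNA targeted sequencing and SNV detection in paired RNA samples. Of 40 non-small cell lung cancer clinical samples in total, 29 had no variant detected by either DNA or RNA and 11 had variants detected by both, with mutations concentrated mainly in the EGFR gene. Within the examined range, the positive and negative agreement rates between RNA and DNA SNV results were both 100%. The results are shown in Table 5.
Table 5 SNV detection performance of the RNA panel
3. Accuracy of gene expression measurement by the RNA panel
RNA libraries were constructed from 30 FFPE samples and then sequenced both by RNAseq and after RNA panel capture, and the concordance of gene expression measured by the two methods was analyzed. For all genes included in the panel, the correlation R between the two methods was > 0.8. See Figure 2 for the correlation of gene expression between RNAseq and the RNA panel.
For important cancer driver genes in the panel, such as ALK, MET, NTRK and EGFR, the R value between RNAseq and RNA panel gene expression was > 0.9. The results are shown in Figure 3.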
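4. Using RNA expression in place of DNA copy number analysis
165 FFPE samples were sequenced after RNA panel capture. The distribution of RPKM expression values for the EGFR gene was tallied to determine the EGFR expression threshold, and samples in the top 10% of EGFR expression that still had sections available underwent immunohistochemistry (IHC) and DNA targeted sequencing. The results show that EGFR expression agrees better with the immunohistochemistry (protein-level) results than the DNA CNV results do. The results are shown in Table 6.
Table 6 CNV detection performance of the RNA panel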
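One way to derive an expression threshold from a baseline RPKM distribution is a percentile cutoff. The sketch below is an assumption-laden illustration: the text says only that the threshold is determined from the baseline distribution and that the top 10% of EGFR expressers were selected for follow-up, so the nearest-rank percentile rule and the names `overexpression_threshold` and `is_overexpressed` are hypothetical.

```python
from typing import Sequence


def overexpression_threshold(baseline_rpkm: Sequence[float], top_percent: int = 10) -> float:
    """Nearest-rank cutoff: values above it fall in the top `top_percent` percent."""
    if not baseline_rpkm:
        raise ValueError("baseline_rpkm must not be empty")
    if not 0 < top_percent < 100:
        raise ValueError("top_percent must be between 1 and 99")
    ordered = sorted(baseline_rpkm)
    # Ceiling division in integer arithmetic to avoid floating-point rounding.
    rank = -(-len(ordered) * (100 - top_percent) // 100)
    return ordered[rank - 1]


def is_overexpressed(sample_rpkm: float, baseline_rpkm: Sequence[float], top_percent: int = 10) -> bool:
    """Flag a sample whose RPKM exceeds the baseline cutoff."""
    return sample_rpkm > overexpression_threshold(baseline_rpkm, top_percent)
```

With a baseline of ten samples whose RPKM values are 1 through 10, the cutoff is 9, so only a sample above 9 is flagged.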
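The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may be subject to various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.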
Claims (12)
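- A method for detecting gene mutations and expression levels, characterized by comprising the following steps: S1, extracting RNA from the sample to be tested, fragmenting the RNA and reverse transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to the target regions; S4, sequencing on a high-throughput sequencer to obtain targeted RNA sequencing data; S5, analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data; wherein S5 specifically comprises: S51, gene expression analysis: quantitatively evaluating the expression level of the target genes in the test sample using the RPKM method; S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of RPKM values of the target gene, determining the threshold separating high from low expression of the target gene, and judging from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; S53, gene fusion analysis: filtering out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, then filtering out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; S54, fusion mutation relative expression analysis: performing expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained in S53, to obtain the relative expression level of the fusion gene; S55, single nucleotide variation analysis: determining variant single nucleotides by gene alignment; S56, single nucleotide variation expression analysis: performing quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
- The method according to claim 1, characterized in that S5 further comprises: filtering out low-quality sequencing data and reads containing adapter sequences and performing quality control, and analyzing changes in gene mutations and expression levels in the targeted RNA sequencing data only once data meeting the standards have been obtained, wherein the quality control step comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.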
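- The method according to claim 1, characterized in that, in S53, the thresholds are as shown in the following table:
Unique sequences | Exon boundary | Not exon boundary |
Canonical splice site | ≥3 | ≥5 |
Non-canonical splice site | ≥5 | ≥10 |
- The method according to claim 1, characterized in that, in S4, sequencing is performed in paired-end or single-end mode.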
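- A device for detecting gene mutations and expression levels, characterized by comprising: an RNA extraction module, configured to extract RNA from the sample to be tested, fragment the RNA and reverse transcribe it to obtain cDNA; a gene library construction module, configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment; a target gene enrichment module, configured to capture and enrich target genes from the gene library by specific hybridization of capture probes to the target regions; a sequencing module, configured to sequence on a high-throughput sequencer to obtain targeted RNA sequencing data; and an analysis module, configured to analyze changes in gene mutations and expression levels in the targeted RNA sequencing data; wherein the analysis module specifically comprises: a gene expression analysis submodule, configured to quantitatively evaluate the expression level of the target genes in the test sample using the RPKM method; a gene overexpression analysis submodule, configured to retrieve a baseline sample population, analyze the distribution of RPKM values of the target gene, determine the threshold separating high from low expression of the target gene, and judge from the RPKM value of the target gene in the sample to be tested whether that gene is overexpressed; a gene fusion analysis submodule, configured to filter out fusion genes that belong to the same gene family, that belong to the same paralog group, or that derive from the same gene model, and then filter out fusion genes that fail the thresholds, to obtain the fusion genes in the test sample; a fusion mutation relative expression analysis submodule, configured to perform expression correction and normalization based on the expression quantification of housekeeping genes and the gene fusion analysis results obtained by the gene fusion analysis submodule, to obtain the relative expression level of the fusion gene; a single nucleotide variation analysis submodule, configured to determine variant single nucleotides by gene alignment; and a single nucleotide variation expression analysis submodule, configured to perform quantitative expression analysis of the single nucleotide variations based on the results of the single nucleotide variation analysis, the expression quantification of housekeeping genes and the sequence alignment statistics, to obtain the expression level of the single nucleotide variations.
- The device according to claim 7, characterized in that the analysis module further comprises a filtering submodule, configured to filter out low-quality sequencing data and reads containing adapter sequences and perform quality control, so that changes in gene mutations and expression levels in the targeted RNA sequencing data are analyzed only once data meeting the standards have been obtained, wherein the quality control comprises: aligning the sequencing data that remain after filtering out low-quality sequencing data and adapter-containing reads to the reference genome to obtain sequence alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds only if the following three criteria are met: 1) sequence mapping rate >= 80%; 2) data volume in the target region >= 2M; 3) number of expressed housekeeping genes >= 4.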
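- The device according to claim 7, characterized in that, in the gene fusion analysis submodule, the thresholds are as shown in the following table:
Unique sequences | Exon boundary | Not exon boundary |
Canonical splice site | ≥3 | ≥5 |
Non-canonical splice site | ≥5 | ≥10 |
- The device according to claim 7, characterized in that the sequencing module performs sequencing in paired-end or single-end mode.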
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022566482A JP2023524722A (ja) | 2020-10-29 | 2021-09-09 | 遺伝子の突然変異及び発現量を検出する方法及び装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011182844.4 | 2020-10-29 | ||
CN202011182844.4A CN112397144B (zh) | 2020-10-29 | 2020-10-29 | 检测基因突变及表达量的方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022089033A1 true WO2022089033A1 (zh) | 2022-05-05 |
Family
ID=74597910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/117533 WO2022089033A1 (zh) | 2020-10-29 | 2021-09-09 | 检测基因突变及表达量的方法及装置 |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP2023524722A (zh) |
CN (1) | CN112397144B (zh) |
WO (1) | WO2022089033A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798584A (zh) * | 2022-12-14 | 2023-03-14 | 上海华测艾普医学检验所有限公司 | 一种同时检测egfr基因t790m和c797s顺反式突变的方法 |
CN116994656A (zh) * | 2023-09-25 | 2023-11-03 | 北京求臻医学检验实验室有限公司 | 一种用于提高二代测序检测准确度的方法 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112397144B (zh) * | 2020-10-29 | 2021-06-15 | 无锡臻和生物科技股份有限公司 | 检测基因突变及表达量的方法及装置 |
CN113470745B (zh) * | 2021-08-25 | 2023-09-08 | 南京立顶医疗科技有限公司 | SARS-CoV-2潜在突变位点的筛选方法及其应用 |
CN113981078B (zh) * | 2021-09-16 | 2023-11-24 | 北京肿瘤医院(北京大学肿瘤医院) | 用于预测晚期食管癌患者抗egfr靶向治疗疗效的生物标志物及疗效预测试剂盒 |
CN114317753A (zh) * | 2021-12-30 | 2022-04-12 | 北京迈基诺基因科技股份有限公司 | 眼肿瘤融合基因的检测模型及构建方法和检测方法 |
CN114369665A (zh) * | 2022-01-22 | 2022-04-19 | 河南省肿瘤医院 | 基于NanoString平台检测基因融合用于辅助诊断软组织肉瘤的方法 |
KR102518091B1 (ko) * | 2022-07-12 | 2023-04-06 | 주식회사 아이엠비디엑스 | 상동 재조합 결핍 정보를 제공하는 방법 |
CN115083516B (zh) * | 2022-07-13 | 2023-03-21 | 北京先声医学检验实验室有限公司 | 一种基于靶向RNA测序技术检测基因融合的Panel设计和评估方法 |
CN115896256A (zh) * | 2022-11-25 | 2023-04-04 | 臻悦生物科技江苏有限公司 | 基于二代测序技术的rna插入缺失突变的检测方法、装置、设备和存储介质 |
CN116926198A (zh) * | 2023-09-15 | 2023-10-24 | 臻和(北京)生物科技有限公司 | 检测胃癌组织Claudin18.2蛋白阳性的方法、装置、设备和存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160340722A1 (en) * | 2014-01-22 | 2016-11-24 | Adam Platt | Methods And Systems For Detecting Genetic Mutations |
US20170321260A1 (en) * | 2016-05-09 | 2017-11-09 | Health In Code, S.L. | Mutation identification method |
CN110079594A (zh) * | 2019-04-22 | 2019-08-02 | 元码基因科技(苏州)有限公司 | 基于dna和rna基因突变检测的高通量方法 |
CN110628880A (zh) * | 2019-09-30 | 2019-12-31 | 深圳恒特基因有限公司 | 一种同步使用信使rna与基因组dna模板检测基因变异的方法 |
CN111321202A (zh) * | 2019-12-31 | 2020-06-23 | 广州金域医学检验集团股份有限公司 | 基因融合变异文库构建方法、检测方法、装置、设备及存储介质 |
CN112397144A (zh) * | 2020-10-29 | 2021-02-23 | 无锡臻和生物科技有限公司 | 检测基因突变及表达量的方法及装置 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2875173B1 (en) * | 2012-07-17 | 2017-06-28 | Counsyl, Inc. | System and methods for detecting genetic variation |
AU2015249846B2 (en) * | 2014-04-21 | 2021-07-22 | Natera, Inc. | Detecting mutations and ploidy in chromosomal segments |
-
2020
- 2020-10-29 CN CN202011182844.4A patent/CN112397144B/zh active Active
-
2021
- 2021-09-09 WO PCT/CN2021/117533 patent/WO2022089033A1/zh active Application Filing
- 2021-09-09 JP JP2022566482A patent/JP2023524722A/ja not_active Withdrawn
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798584A (zh) * | 2022-12-14 | 2023-03-14 | 上海华测艾普医学检验所有限公司 | 一种同时检测egfr基因t790m和c797s顺反式突变的方法 |
CN115798584B (zh) * | 2022-12-14 | 2024-03-29 | 上海华测艾普医学检验所有限公司 | 一种同时检测egfr基因t790m和c797s顺反式突变的方法 |
CN116994656A (zh) * | 2023-09-25 | 2023-11-03 | 北京求臻医学检验实验室有限公司 | 一种用于提高二代测序检测准确度的方法 |
CN116994656B (zh) * | 2023-09-25 | 2024-01-02 | 北京求臻医学检验实验室有限公司 | 一种用于提高二代测序检测准确度的方法 |
Also Published As
Publication number | Publication date |
---|---|
JP2023524722A (ja) | 2023-06-13 |
CN112397144A (zh) | 2021-02-23 |
CN112397144B (zh) | 2021-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022089033A1 (zh) | | Method and device for detecting gene mutation and expression level |
KR102028375B1 (ko) | | Systems and methods for detecting rare mutations and copy number variation |
Li et al. | | A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes |
US12129514B2 (en) | | Methods and compositions for evaluating genetic markers |
CN106414768B (zh) | | Gene fusions and gene variants associated with cancer |
JP6073461B2 (ja) | | Noninvasive prenatal diagnosis of fetal trisomy by allelic ratio analysis using targeted massively parallel sequencing |
JP2018531583A (ja) | | Single-molecule sequencing of plasma DNA |
CN106715711A (zh) | | Method for determining probe sequences and method for detecting genomic structural variation |
US20170329893A1 (en) | | Methods of determining genomic health risk |
EP3564391B1 (en) | | Method, device and kit for detecting fetal genetic mutation |
WO2012068919A1 (zh) | | DNA library and preparation method thereof, and method and device for detecting SNPs |
CN110564838A (zh) | | Multiplex PCR primer system for genotyping neonatal glycogen storage disease and use thereof |
Yadav et al. | | Next-Generation sequencing transforming clinical practice and precision medicine |
CN109461473B (zh) | | Method and device for obtaining fetal cell-free DNA concentration |
CN105648044A (zh) | | Method and device for determining the haplotype of a fetal target region |
Shen et al. | | Improved detection of global copy number variation using high density, non-polymorphic oligonucleotide probes |
EP3553183A1 (en) | | Hybridization solution for capturing a nucleic acid including a target oligonucleotide sequence |
CN108753934B (zh) | | Method for detecting gene mutation, kit, and preparation method thereof |
CN112251512B (zh) | | Target gene panel for genetic testing of non-small cell lung cancer patients, and related evaluation methods, uses, and kits |
CN111172248B (zh) | | Universal kit for validating copy number variation based on fragment analysis |
EP3524695B1 (en) | | Method for the enrichment of genomic regions |
CN108642173B (zh) | | Method and kit for noninvasive detection of SLC26A4 gene mutations |
EP3696278A1 (en) | | Method of determining the origin of nucleic acids in a mixed sample |
US20190316195A1 (en) | | Methods of capturing a nucleic acid including a target oligonucleotide sequence and uses thereof |
US20220356513A1 (en) | | Synthetic polynucleotides and method of use thereof in genetic analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21884763; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2022566482; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 21884763; Country of ref document: EP; Kind code of ref document: A1 |