WO2022089033A1 - Method and device for detecting gene mutation and expression level - Google Patents


Info

Publication number
WO2022089033A1
WO2022089033A1 PCT/CN2021/117533 CN2021117533W WO2022089033A1 WO 2022089033 A1 WO2022089033 A1 WO 2022089033A1 CN 2021117533 W CN2021117533 W CN 2021117533W WO 2022089033 A1 WO2022089033 A1 WO 2022089033A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
expression
analysis
fusion
rna
Application number
PCT/CN2021/117533
Inventor
洪媛媛
苏琳
曾雪霞
张卓
张琦
林小静
尤松霞
杨滢
陈维之
Original Assignee
无锡臻和生物科技有限公司
臻悦生物科技江苏有限公司
Application filed by 无锡臻和生物科技有限公司 and 臻悦生物科技江苏有限公司
Priority to JP2022566482A (patent JP2023524722A)
Publication of WO2022089033A1 (zh)

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 - Mutagenesis
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 - Hybridisation assays
    • C12Q1/6827 - Hybridisation assays for detection of mutation or polymorphism
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 - Methods for sequencing
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 - Detection of binding sites or motifs
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 - ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 - Gene or protein expression profiling; Expression-ratio estimation or normalisation

Definitions

  • The present invention relates to the technical field of biology, in particular to a method and device for detecting gene mutations and expression levels.
  • A gene mutation is a sudden, heritable variation in genomic DNA molecules.
  • At the molecular level, a gene mutation is a structural change in the composition or order of a gene's base pairs. Genes are stable enough to replicate themselves precisely when cells divide, but this stability is relative. Under certain conditions a gene can suddenly change from its original form into a new one: at a given locus, a new gene appears and replaces the original gene. This new gene is called a mutant gene, and traits never present in the ancestors may then suddenly appear in the offspring.
  • Gene mutation is one of the important factors in biological evolution, so studying it has broad biological significance beyond its theoretical interest. Some gene mutations arise from structural changes in chromosomes. Under the influence of natural conditions or human factors, the main chromosomal structural variations are deletion, duplication, inversion and translocation; gene fusion is also a type of chromosomal structural variation.
  • The present invention aims to provide a method and device for detecting gene mutations and expression levels.
  • The gene mutation (including gene fusion) and expression detection method of the present invention, based on targeted RNA sequencing, can efficiently enrich the RNA transcripts expressed by tumor-related genes and comprehensively detect the mutation types these transcripts carry, including fusions, single- and multi-base substitutions (SNV/MNV) and insertion-deletion mutations (indels), while also analyzing the expression of these tumor driver genes in tumor tissue.
  • Using RNA for mutation detection gives results with stronger functional relevance.
  • For example, two SNVs may both have a mutation frequency of 1%, but because their expression levels differ, their clinical impact will differ.
  • The invention can detect not only the gene expression levels and gene fusions routinely measured by RNAseq, but also the SNVs and CNVs targeted by DNA panels, together with the expression level of each mutation, so that a single assay covers all mutation types and their relative expression levels.
  • The system of the present invention targets genes of interest with an RNA panel. Compared with RNAseq, which profiles the whole transcriptome, the sequencing cost is lower and the target regions are significantly enriched, giving higher detection sensitivity, especially for lowly expressed genes or mutations. Moreover, an RNA-targeted sequencing panel only needs to cover exon regions, whereas a DNA panel must cover both exons and introns; this saves probe and sequencing costs and makes the approach better suited to clinical kit development.
  • A method for detecting gene mutations and expression levels includes the following steps: S1, extracting RNA from the sample to be tested, fragmenting it and reverse-transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes with target regions; S4, obtaining RNA-targeted sequencing data with a high-throughput sequencer; S5, analyzing gene mutations in the RNA-targeted sequencing data. S5 specifically includes: S51, gene expression analysis: quantitatively evaluating the expression of the target gene in the test sample using the RPKM method; S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of the target gene's RPKM values, determining a threshold for the target gene's expression level, and determining whether the target gene of the sample to be tested is overexpressed…
  • S5 also includes: filtering out low-quality sequencing data and reads containing adapter sequences and performing quality control, then analyzing changes in gene mutations and expression levels in the RNA-targeted sequencing data once data meeting the standards have been obtained, wherein…
  • The threshold is as follows:
  • SeedReads+RescueReads represents the reads spanning the fusion breakpoint
  • HKA represents housekeeping gene A
  • HKB represents housekeeping gene B
  • HKC represents housekeeping gene C
  • count represents the number of sequencing reads aligned to the reference genome
  • length represents the length of the sequence aligned between the sequencing reads and the reference genome
  • Gene Average Depth represents the average depth of the gene
  • ALT count represents the depth of the mutation
  • HK_expression_Coeffient represents an expression change coefficient calculated from the expression of housekeeping genes in the sample and in the reference standard.
  • A device for detecting gene mutations and expression levels includes: an RNA extraction module, configured to extract RNA from the sample to be tested, fragment it and reverse-transcribe it to obtain cDNA; a gene library construction module, configured to construct a library from the cDNA through end repair, adapter ligation and library enrichment…
  • The target gene enrichment module is configured to capture and enrich target genes from the gene library by specific hybridization between capture probes and target regions;
  • The sequencing module is configured to sequence with a high-throughput sequencer to obtain RNA-targeted sequencing data; the analysis module is configured to analyze changes in gene mutations and expression levels in the RNA-targeted sequencing data. The analysis module specifically includes a gene expression analysis sub-module, configured to quantitatively evaluate the expression of target genes in the test sample using the RPKM method.
  • The gene overexpression analysis sub-module is configured to retrieve a baseline sample population, analyze the distribution of the target gene's RPKM values, determine a threshold for the target gene's expression level, and judge from the target gene's RPKM value in the sample to be tested whether the target gene is overexpressed;
  • The gene fusion analysis sub-module is configured to filter out fusion genes belonging to the same gene family, fusion genes belonging to the same paralog group and fusion genes derived from the same gene model, and to filter out fusion genes that do not meet the threshold conditions.
  • The fusion mutation relative expression analysis sub-module is configured to correct and normalize expression based on the quantitative expression results of housekeeping genes and the gene fusion analysis results obtained by the gene fusion analysis sub-module, to obtain the relative expression of the fusion gene;
  • The single nucleotide variation analysis sub-module is configured to identify variant single nucleotides by sequence alignment;
  • The single nucleotide variation expression analysis sub-module is configured to perform quantitative expression analysis of single nucleotide variations based on the single nucleotide variation analysis results, the housekeeping gene expression quantification results and the sequence alignment statistics, to obtain the expression level of each single nucleotide variation.
  • The analysis module also includes a filtering sub-module, configured to filter out low-quality sequencing data and reads containing adapter sequences, perform quality control, and then analyze gene mutations and expression in the RNA-targeted sequencing data once data meeting the standards have been obtained.
  • The quality control includes: aligning the sequencing data obtained after filtering out low-quality data and adapter-containing reads to the reference genome, obtaining sequence alignment results, and performing a quality control assessment of the alignment results.
  • The normalization formula used for expression correction is as follows:
  • The sequencing module sequences in paired-end or single-end mode.
  • The expression level of a single nucleotide variation is calculated as:
  • RNA targeted sequencing: targeted RNA sequencing
  • The expressed transcripts carry fusions, single- and multi-base substitutions (SNV/MNV), insertion-deletion mutations (indels) and other mutation types, and the expression levels of these tumor genes in tumor tissue are analyzed at the same time.
  • FIG. 1 shows a schematic flowchart of a method for detecting gene mutations and expression levels according to an embodiment of the present invention.
  • Figure 2 shows a schematic diagram of the correlation between gene expression levels measured by the RNA panel and by RNAseq in the embodiment.
  • Figure 3 shows a schematic diagram of the correlation between the expression levels of important cancer driver genes measured by the RNA panel and by RNAseq in the embodiment.
  • RNA-targeted sequencing based on liquid-phase probe capture can cover the transcripts expressed by the major tumor driver genes, as well as fusions, activating mutations and drug-resistance mutations, at ultra-high sequencing depth, while retaining the expression of every transcript relative to housekeeping genes. Because it covers only a small number of tumor target genes, the sequencing data volume is small and the cost is low, which makes it better suited to developing clinical detection kits.
  • RNA is closer to the downstream functional proteins and is therefore better suited to explaining the activity state of cellular functional pathways.
  • In the past, RNA was rarely used to detect somatic SNVs/indels, and RNA expression was not used in place of DNA copy number analysis, mainly because several factors affect detection accuracy: 1) RNA is single-stranded; 2) reverse transcription errors; 3) noise caused by RNA quality; 4) dependence on expression level, so that non-expressed mutations cannot be detected; 5) inconsistencies caused by changes at the transcriptional level, etc. To address these technical problems, the present invention makes the following improvements: 1) anchoring the gene list for SNVs/indels and optimizing the filtering criteria for RNA SNVs, which improves the accuracy of detecting activating and drug-resistance SNVs/indels; 2) analyzing the relative expression of mutant allele transcripts versus wild-type and other alleles; 3) cis analysis of fusion mutations and drug-resistance point mutations, and correlation analysis of their relative expression; 4) establishing the correspondence between copy number gain and expression of tumor driver genes, so that RNA expression can be used in place of DNA copy number analysis.
  • DNA panels in the prior art can miss fusions (because RNA-level fusions may arise from complex structural variation at the DNA level, or because the DNA panel probes do not cover the breakpoints), so fusion detection needs to be supplemented by RNA methods.
  • The actionable mutations for solid-tumor targeted drugs are mainly SNVs, indels and CNVs.
  • Primary NGS screening of clinical samples is mainly DNA-based, supplemented by RNA or by FISH/IHC and other verification methods, which makes the process complex, the sample requirement high and the cost high.
  • The present invention uses a single high-throughput sequencing (NGS) assay to capture all mutation types in a panel covering the main tumor TKI-targeted drugs. This greatly simplifies the workflow, saves sample and reduces cost, while improving sequencing depth and the accuracy of fusion and activating point mutation detection. It also provides information that DNA panels cannot, such as the expression level of driver genes and the specific expression level of mutant alleles, offering an auxiliary reference for selecting tumor-targeted drugs.
  • NGS: high-throughput sequencing
  • The RNA-targeted sequencing data may be acquired as follows: total RNA is extracted from the FFPE sample without removing ribosomal RNA, fragmented and reverse-transcribed into cDNA; a gene library is constructed through steps including end repair, adapter ligation and library enrichment; nucleic acid capture probes that specifically hybridize to the target regions are used to capture and enrich the target genes from the constructed cDNA library; and the enriched library is sequenced on a high-throughput sequencer in paired-end mode to obtain the RNA-targeted sequencing data.
  • A method for detecting gene mutations and expression levels includes the following steps: S1, extracting RNA from the sample to be tested, fragmenting it and reverse-transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes with target regions; S4, obtaining RNA-targeted sequencing data with a high-throughput sequencer; S5, analyzing changes in gene mutations and expression levels in the RNA-targeted sequencing data. S5 specifically includes: S51, gene expression analysis: quantitatively evaluating the expression of the target gene in the test sample using the RPKM method; S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of the target gene's RPKM values, determining a threshold for the target gene's expression level, and determining from the RPKM…
  • The fusion thresholds are shown in Table 1:
  • The normalization formula used to correct expression levels is as follows:
  • The expression level of the single nucleotide variation is calculated as:
  • A device for detecting gene mutations and expression levels includes an RNA extraction module, a gene library construction module, a target gene enrichment module, a sequencing module and an analysis module. The RNA extraction module is configured to extract total RNA or mRNA from the sample to be tested, fragment it and reverse-transcribe it to obtain cDNA; the gene library construction module is configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment; the target gene enrichment module is configured to capture and enrich target genes from the gene library by specific hybridization of capture probes with target regions.
  • The sequencing module is configured to sequence with a high-throughput sequencer to obtain RNA-targeted sequencing data.
  • The analysis module is configured to analyze changes in gene mutations and expression levels in the RNA-targeted sequencing data. It specifically includes a gene expression analysis sub-module, a gene overexpression analysis sub-module, a gene fusion analysis sub-module, a fusion mutation expression analysis sub-module, a single nucleotide variation analysis sub-module and a single nucleotide variation expression analysis sub-module.
  • The gene expression analysis sub-module is configured to quantitatively evaluate the expression of target genes in the test sample using the RPKM method; the gene overexpression analysis sub-module is configured to retrieve a baseline sample population, analyze the distribution of the target gene's RPKM values and determine the threshold for the target gene's expression level.
  • The gene fusion analysis sub-module is configured to filter out fusion genes belonging to the same gene family, fusion genes belonging to the same paralog group and fusion genes derived from the same gene model, then filter out fusion genes that do not meet the threshold conditions, obtaining the fusion genes in the test sample;
  • The gene fusion analysis results obtained by the gene fusion analysis sub-module undergo expression correction and normalization to obtain the relative expression of the fusion gene;
  • The single nucleotide variation analysis sub-module is configured to identify variant single nucleotides by sequence alignment; the single nucleotide variation expression analysis sub-module is configured to perform quantitative expression analysis of single nucleotide variations based on the single nucleotide variation analysis results, the quantitative expression results of housekeeping genes and the sequence alignment statistics, obtaining the expression level of each single nucleotide variation.
  • A filtering sub-module is configured to filter out low-quality sequencing data and reads containing adapter sequences, perform quality control, and then analyze changes in gene mutations and expression levels in the RNA-targeted sequencing data once data meeting the standards have been obtained.
  • The quality control includes: aligning the sequencing data obtained after filtering out low-quality sequencing data…
  • The thresholds are as shown in Table 1.
  • Library construction was performed with ABclonal's mRNA-seq Lib Prep Module for Illumina, including cDNA reverse transcription, fragmentation, end repair, adapter ligation and library enrichment.
  • The constructed library was purified with Agencourt AMPure XP magnetic beads, and its concentration and quality were then checked with a Qubit 3.0 and Agilent 2100 capillary electrophoresis.
  • For the target genes (AK, ESR1, FGFR1, NRG1, RET, ERG, BRAF, ETV1, FGFR2, NTRK1, ROS1, EWSR1, CD74, ETV4, FGFR3, NTRK2, SLC34A2, MET, EGFR, ETV5, FGFR4, NTRK3, SLC45A3, PPARG, EML4, ETV6, KIF5B, PDGFRA, TPM3, PDGFRB, SFT2D3, CNTF, EPM2A, NOL10, HEATR4 and RPGRIP1), non-overlapping tiled probe sequences were designed from their transcript sequences, and the 5' end of each probe was labeled with biotin.
  • The eluted product was amplified by PCR, purified with Agencourt AMPure XP magnetic beads, and checked for concentration and quality with a Qubit 3.0 and Agilent 2100 capillary electrophoresis.
  • The RNA panel capture library was sequenced on the instrument to obtain raw reads, which were processed with Trimmomatic-0.36 as follows to obtain high-quality reads
  • The high-quality reads (filtered using criteria commonly accepted in the field) were aligned to the reference genome with STAR to obtain sequence alignment results. The alignment results were assessed for quality against the indicators in Table 2 below before the next analyses (gene expression analysis, gene fusion analysis, fusion mutation relative expression analysis, SNV analysis and SNV mutation expression analysis).
  • Gene expression was quantified using the RPKM method.
  • The RPKM formula is as follows: RPKM = Total exon reads / (Mapped reads (millions) × Exon length (KB))
  • Total exon reads: the number of reads aligned to all exons of a gene, counted with featureCounts from the gene annotation file and the alignment results.
  • Mapped reads (millions): the number of all reads aligned to the genome, taken from the alignment statistics.
  • Exon length (KB): the exon length of the gene, calculated from the genome annotation file.
  • HKA count is the number of reads from housekeeping gene A aligned to the reference genome.
  • Transcript selection: determine whether the transcript is a drug-site transcript, whether the site is a pathogenic locus in ClinVar, whether the transcript is present in the TransVar result, whether the site lies in a non-splice intronic region, whether it is the canonical transcript, and whether it lies in an exon region;
  • Quantitative expression analysis of each SNV was performed using the housekeeping gene expression quantification results and the sequence alignment statistics, giving the expression level of the SNV.
  • HK_expression_Coeffient: an expression change coefficient calculated from the expression of housekeeping genes in the sample and in the reference standard;
  • IGV was used to confirm the breakpoints in the 5 samples with RNA fusions, in which the detection counts exceeded the filtering standard. Of these, 3 cases were confirmed by next-generation sequencing to be true fusions, indicating that DNA testing may miss fusions.
  • In samples positive by both the RNA and DNA tests, the RNA fusion form matched the DNA result, and RNA fusion transcripts produced by alternative splicing were also detected.
  • The oncogene-activating mutation sites and the primary and secondary drug-resistance mutation sites (including those secondary to fusions) covered by the RNA panel were examined (11 genes, 226 SNV sites in total), and the concordance of SNV detection between DNA-targeted sequencing and the paired RNA samples was assessed.
  • A total of 40 non-small cell lung cancer clinical samples were included: in 29 samples no mutation was detected in either DNA or RNA, mutations were detected in the remaining 11 samples, and the mutations were mainly concentrated in the EGFR gene.
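The RPKM definition given in the embodiment (total exon reads divided by mapped reads in millions times exon length in kilobases) can be checked with a small calculation. The sketch below is a minimal illustration, assuming raw integer inputs with exon length in base pairs; the function name `rpkm` and the example numbers are hypothetical, and in the described pipeline the exon read counts would come from featureCounts rather than being typed in.

```python
def rpkm(total_exon_reads: int, mapped_reads: int, exon_length_bp: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    if mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("mapped_reads and exon_length_bp must be positive")
    mapped_millions = mapped_reads / 1_000_000  # "Mapped reads (millions)"
    exon_kb = exon_length_bp / 1_000            # "Exon length (KB)"
    return total_exon_reads / (mapped_millions * exon_kb)


# Hypothetical gene: 500 exon reads, 2,000,000 mapped reads, 2,500 bp of exons.
print(rpkm(500, 2_000_000, 2_500))  # 100.0
```

Dividing by both library size and exon length is what makes RPKM values comparable across genes and across samples sequenced to different depths, which the overexpression step relies on when it compares a sample with a baseline population.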

Abstract

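A method and device for detecting gene mutations and expression levels. The method comprises the following steps: S1, extracting RNA, fragmenting it and reverse-transcribing it to obtain cDNA; S2, constructing a gene library from the cDNA; S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes with target regions; S4, sequencing with a high-throughput sequencer to obtain RNA-targeted sequencing data; S5, analyzing changes in gene mutations and expression levels in the RNA-targeted sequencing data, where S5 specifically comprises: S51, gene expression analysis; S52, gene overexpression analysis; S53, gene fusion analysis; S54, fusion mutation expression analysis; S55, single nucleotide variation analysis; S56, single nucleotide variation mutation expression analysis. The method can efficiently enrich RNA transcripts expressed by tumor-related genes and analyze the expression levels and mutations of these tumor genes in tumor tissue.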

Description

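Method and device for detecting gene mutation and expression level
Technical Field
The present invention relates to the technical field of biology, and specifically to a method and device for detecting gene mutations and expression levels.
Background
A gene mutation is a sudden, heritable variation occurring in genomic DNA molecules. At the molecular level, a gene mutation is a change in the composition or order of base pairs in the structure of a gene. Although genes are very stable and replicate themselves accurately during cell division, this stability is relative. Under certain conditions a gene can suddenly change from its original form into a new form: at a given locus, a new gene suddenly appears and replaces the original one. This gene is called a mutant gene. As a result, new traits that the ancestors never had suddenly appear in the offspring.
Gene mutation is one of the important factors in biological evolution, so the study of gene mutations has broad biological significance in addition to its theoretical significance. Some gene mutations result from structural variation of chromosomes. Under the influence of natural conditions or human factors, the main types of chromosomal structural variation are deletion, duplication, inversion and translocation; gene fusion is also a type of chromosomal structural variation.
With the development of sequencing technology and the reduction of its cost, human whole-genome sequencing is bound to become the mainstream trend in human health, and precision medicine will be the ultimate goal of sequencing. Accurate annotation of variants in the human genome is a necessary means of achieving precision medicine.
At present, conventional methods generally use whole-genome sequencing (WGS) or a DNA panel to detect SNVs, CNVs and fusions. However, detecting mutations at the DNA level does not reflect how the mutations actually manifest at the transcriptional level.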
发明内容
本发明旨在提供一种检测基因突变及表达量的方法及装置,检测基因突变及表达量。
本发明基于RNA靶向测序(targeted RNA sequencing)的基因突变(包括基因融合)及表达量检测方法,能够高效富集肿瘤相关基因所表达的RNA转录本,并完整检测这些基因表达的转录本上的包含融合、单碱基与多碱基替换(SNV/MNV)、插入缺失突变(indel)等多种突变类型,同时分析这些肿瘤驱动基因在肿瘤组织中的表达量。
现有技术中一般利用全基因组测序WGS或DNA panel进行SNV、CNV和融合的检测。传统方法在DNA水平检测突变,不能反映突变在转录水平的真实表现,利用RNA进行突变检测,功能相关性更强。例如两个SNV突变频率都是1%,但因为表达量不同,突变的临床影响会有差异。本发明不仅能够检测RNAseq常规的基因表达量、基因融合,还能够检测DNA panel的SNV和CNV,并且能够检测各种突变的表达量。实现一次检测,覆盖所有突变类型 和相对表达量。
本发明系统进行RNA panel靶向目标基因,相对RNAseq检测全转录组,测序费用更低,并且能显著富集目标区域,特别是对于低表达的基因或突变,检测灵敏度更高。并且RNA靶向测序panel设计只需要覆盖外显子区域,相比DNA panel设计需要覆盖外显子和内含子,更节省探针和测序成本,更适用于临床试剂盒开发。
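To achieve the above object, one aspect of the present invention provides a method for detecting gene mutations and expression levels. The method comprises the following steps:
S1, extracting RNA from a sample to be tested, fragmenting the RNA and reverse-transcribing it to obtain cDNA;
S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment;
S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to target regions;
S4, sequencing on a high-throughput sequencer to obtain RNA-targeted sequencing data;
S5, analyzing gene mutations and expression-level changes in the RNA-targeted sequencing data. S5 specifically comprises:
S51, gene expression analysis: quantifying the expression of the target genes in the test sample using the RPKM method;
S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of the target genes' RPKM values, and determining thresholds for high and low expression of each target gene. Whether a target gene is overexpressed in the sample to be tested is then judged from its RPKM value;
S53, gene fusion analysis: filtering out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model. Fusions that do not meet the thresholds are then filtered out, which yields the fusion genes in the test sample;
S54, fusion-mutation relative expression analysis: performing expression correction and normalization based on the housekeeping-gene quantification results and the fusion results obtained in S53, to obtain the relative fusion expression of each fusion gene;
S55, single-nucleotide variant analysis: identifying variant nucleotides by sequence alignment;
S56, single-nucleotide-variant expression analysis: quantifying the expression of each single-nucleotide variant based on three inputs: the SNV analysis results, the housekeeping-gene quantification results and the alignment statistics.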
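Further, S5 also comprises filtering out low-quality sequencing data and reads containing adapter sequences, and then performing quality control. Gene mutations and expression-level changes are analyzed only on the data that meet the standards. The quality-control step comprises aligning the filtered sequencing data to a reference genome to obtain alignment results, and evaluating the quality of those results. Subsequent analysis proceeds only if all three of the following criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.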
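Further, in S53, the thresholds are as follows:
Unique reads | At exon boundary | Not at exon boundary
Canonical splice site | ≥3 | ≥5
Non-canonical splice site | ≥5 | ≥10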
Further, in S54, fusion expression correction uses the following normalization formula:
Figure PCTCN2021117533-appb-000001
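where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
Further, sequencing in S4 is performed in paired-end or single-end mode.
Further, in S56, the expression of a single-nucleotide variant is calculated as: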
Figure PCTCN2021117533-appb-000002
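where Gene Average Depth denotes the average sequencing depth of the gene;
ALT count denotes the depth of the variant allele;
HK_expression_Coeffient denotes the expression change coefficient, calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.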
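Another aspect of the present invention provides a device for detecting gene mutations and expression levels. The device comprises:
an RNA extraction module, configured to extract RNA from a sample to be tested, fragment the RNA and reverse-transcribe it to obtain cDNA;
a gene library construction module, configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment;
a target gene enrichment module, configured to capture and enrich target genes from the gene library by specific hybridization of capture probes to target regions;
a sequencing module, configured to sequence on a high-throughput sequencer to obtain RNA-targeted sequencing data; and
an analysis module, configured to analyze gene mutations and expression-level changes in the RNA-targeted sequencing data. The analysis module specifically comprises:
a gene expression analysis submodule, configured to quantify the expression of the target genes in the test sample using the RPKM method;
a gene overexpression analysis submodule, configured to retrieve a baseline sample population, analyze the distribution of the target genes' RPKM values and determine thresholds for high and low expression of each target gene. It then judges from a target gene's RPKM value in the sample to be tested whether that gene is overexpressed;
a gene fusion analysis submodule, configured to filter out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model. It then filters out fusions that do not meet the thresholds, which yields the fusion genes in the test sample;
a fusion-mutation relative expression analysis submodule, configured to perform expression correction and normalization based on the housekeeping-gene quantification results and the fusion results from the gene fusion analysis submodule, to obtain the relative expression of each fusion gene;
a single-nucleotide variant analysis submodule, configured to identify variant nucleotides by sequence alignment; and
a single-nucleotide-variant expression analysis submodule, configured to quantify the expression of each single-nucleotide variant based on three inputs: the SNV analysis results, the housekeeping-gene quantification results and the alignment statistics.
Further, the analysis module also comprises a filtering submodule. This submodule filters out low-quality sequencing data and reads containing adapter sequences and then performs quality control, so that gene mutations and expression-level changes are analyzed only on the data that meet the standards. The quality control comprises aligning the filtered sequencing data to a reference genome to obtain alignment results, and evaluating the quality of those results. Subsequent analysis proceeds only if all three of the following criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.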
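Further, in the gene fusion analysis submodule, the thresholds are as follows:
Unique reads | At exon boundary | Not at exon boundary
Canonical splice site | ≥3 | ≥5
Non-canonical splice site | ≥5 | ≥10
Further, in the fusion-mutation expression analysis submodule, expression correction uses the following normalization formula: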
Figure PCTCN2021117533-appb-000003
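where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
Further, the sequencing module performs sequencing in paired-end or single-end mode.
Further, in the single-nucleotide-variant expression analysis submodule, the expression of a single-nucleotide variant is calculated as: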
Figure PCTCN2021117533-appb-000004
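where Gene Average Depth denotes the average sequencing depth of the gene;
ALT count denotes the depth of the variant allele;
HK_expression_Coeffient denotes the expression change coefficient, calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.
The technical solution of the invention uses total RNA or mRNA of the test sample as the detection object and applies targeted RNA sequencing. It efficiently enriches RNA transcripts expressed from tumor-associated genes. It comprehensively detects multiple mutation types on these transcripts, including fusions, SNV/MNVs and indels, and it analyzes the expression levels of these tumor genes in tumor tissue.
Brief Description of the Drawings
The accompanying drawings form part of this application and provide a further understanding of the invention. The illustrative embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the method for detecting gene mutations and expression levels according to an embodiment of the invention;
Fig. 2 shows the correlation between gene expression levels measured by the RNA panel and by RNA-seq in the Examples;
Fig. 3 shows the correlation between expression levels of key cancer driver genes measured by the RNA panel and by RNA-seq in the Examples.
Detailed Description of the Embodiments
Where no conflict arises, the embodiments of this application and their features may be combined with one another. The invention is described in detail below with reference to the drawings and the embodiments.
RNA-targeted sequencing based on liquid-phase probe capture has several advantages over conventional RNA-seq. At ultra-high sequencing depth, it covers the transcripts expressed from the main tumor driver genes together with fusions, activating mutations and drug-resistance mutations. It also retains the expression of every transcript relative to housekeeping genes. Because it covers only a small number of target tumor genes, it requires little sequencing data and costs little, which makes it better suited to developing clinical test kits.
Compared with DNA, RNA is closer to the downstream functional proteins and better reflects the activity of cellular pathways. In the past, however, RNA was rarely used to detect somatic SNVs/indels, and RNA expression was not used in place of DNA copy-number analysis. The main reason is that several factors affect detection accuracy:
1) RNA is single-stranded;
2) reverse-transcription errors;
3) noise caused by RNA quality;
4) dependence on expression level, so mutations that are not expressed cannot be detected; and
5) discordance caused by mutations arising at the transcriptional level.
To address these problems, the invention makes the following improvements:
1) anchoring SNV/indel detection to a defined gene list and optimizing the filtering criteria for RNA SNVs, which improves the accuracy of activating and drug-resistance SNV/indel calls;
2) quantifying the relative expression of mutant-allele transcripts versus wild-type-allele transcripts;
3) cis analysis of fusion mutations and resistance point mutations, together with association analysis of their relative expression; and
4) establishing the correspondence between copy-number gain of tumor driver genes and their expression level, so that RNA expression can replace DNA copy-number analysis.
In addition, prior-art DNA panels can miss fusions, for example when an RNA-level fusion results from a complex DNA-level structural variant, or when the DNA panel probes do not cover the breakpoint. Fusion detection therefore needs to be supplemented by RNA methods. The actionable mutations for targeted therapies in solid tumors are mainly SNVs/indels/CNVs, so clinical NGS screening relies mainly on DNA methods. These are supplemented by RNA or FISH/IHC confirmation, which makes the workflow complex, demands large amounts of sample and raises cost. In a typical embodiment, the invention covers all mutation types relevant to the main tumor TKI targeted drugs within a single high-throughput sequencing (NGS) capture panel. This has several benefits:
- the workflow is greatly simplified, and sample and cost are saved while sequencing depth increases;
- fusions and activating point mutations are detected more accurately; and
- the panel provides information that DNA panels cannot, such as driver-gene expression and allele-specific expression of mutant alleles, which supports the selection of targeted tumor drugs.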
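In an embodiment of the invention, the RNA-targeted sequencing data may be obtained as follows:
1) total RNA is extracted from an FFPE sample without ribosomal RNA depletion, fragmented and reverse-transcribed into cDNA;
2) a gene library is constructed through end repair, adapter ligation and library enrichment;
3) target genes are captured and enriched from the constructed cDNA library using nucleic-acid probes that hybridize specifically to target regions; and
4) the library is sequenced in paired-end mode on a high-throughput sequencer to obtain the RNA-targeted sequencing data.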
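A typical embodiment of the invention provides a method for detecting gene mutations and expression levels. Referring to Fig. 1, the method comprises the following steps:
S1, extracting RNA from a sample to be tested, fragmenting the RNA and reverse-transcribing it to obtain cDNA;
S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment;
S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to target regions;
S4, sequencing on a high-throughput sequencer to obtain RNA-targeted sequencing data;
S5, analyzing gene mutations and expression-level changes in the RNA-targeted sequencing data. S5 specifically comprises:
S51, gene expression analysis: quantifying the expression of the target genes in the test sample using the RPKM method;
S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of the target genes' RPKM values, and determining thresholds for high and low expression of each target gene. Whether a target gene is overexpressed in the sample to be tested is then judged from its RPKM value;
S53, gene fusion analysis: filtering out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model. Fusions that do not meet the thresholds are then filtered out, which yields the fusion genes in the test sample;
S54, fusion-mutation relative expression analysis: performing expression correction and normalization based on the housekeeping-gene quantification results and the fusion results obtained in S53, to obtain the relative expression of each fusion gene;
S55, single-nucleotide variant analysis: identifying variant nucleotides by sequence alignment;
S56, single-nucleotide-variant expression analysis: quantifying the expression of each single-nucleotide variant based on three inputs: the SNV analysis results, the housekeeping-gene quantification results and the alignment statistics.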
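Specifically, in one embodiment, S5 further comprises filtering out low-quality sequencing data and reads containing adapter sequences, and then performing quality control. Gene mutations and expression-level changes are analyzed only on the data that meet the standards. The quality-control step comprises aligning the filtered sequencing data to a reference genome to obtain alignment results, and evaluating the quality of those results. Subsequent analysis proceeds only if all three of the following criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.
Preferably, in S53, the fusion thresholds are as shown in Table 1:
Table 1 Fusion mutation threshold criteria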
Figure PCTCN2021117533-appb-000005
Preferably, in S54, expression correction uses the following normalization formula:
Figure PCTCN2021117533-appb-000006
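where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
Preferably, in S56, the expression of a single-nucleotide variant is calculated as: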
Figure PCTCN2021117533-appb-000007
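where Gene Average Depth denotes the average sequencing depth of the gene;
ALT count denotes the depth of the variant allele;
HK_expression_Coeffient denotes the expression change coefficient, calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.
To make the above method easier to implement, a typical embodiment of the invention provides a device for detecting gene mutations and expression levels. The device comprises an RNA extraction module, a gene library construction module, a target gene enrichment module, a sequencing module and an analysis module, configured as follows:
- the RNA extraction module extracts total RNA or mRNA from a sample to be tested, fragments the RNA and reverse-transcribes it to obtain cDNA;
- the gene library construction module constructs a gene library from the cDNA through end repair, adapter ligation and library enrichment;
- the target gene enrichment module captures and enriches target genes from the gene library by specific hybridization of capture probes to target regions;
- the sequencing module sequences on a high-throughput sequencer to obtain RNA-targeted sequencing data; and
- the analysis module analyzes gene mutations and expression-level changes in the RNA-targeted sequencing data.
The analysis module specifically comprises six submodules:
- the gene expression analysis submodule quantifies the expression of the target genes in the test sample using the RPKM method;
- the gene overexpression analysis submodule retrieves a baseline sample population, analyzes the distribution of the target genes' RPKM values and determines thresholds for high and low expression of each target gene. It then judges from a target gene's RPKM value in the sample to be tested whether that gene is overexpressed;
- the gene fusion analysis submodule filters out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model. It then filters out fusions that do not meet the thresholds, which yields the fusion genes in the test sample;
- the fusion-mutation relative expression analysis submodule performs expression correction and normalization based on the housekeeping-gene quantification results and the fusion results from the gene fusion analysis submodule, to obtain the relative expression of each fusion gene;
- the single-nucleotide variant analysis submodule identifies variant nucleotides by sequence alignment; and
- the single-nucleotide-variant expression analysis submodule quantifies the expression of each single-nucleotide variant based on three inputs: the SNV analysis results, the housekeeping-gene quantification results and the alignment statistics.
Specifically, in one embodiment, the analysis module further comprises a filtering submodule. This submodule filters out low-quality sequencing data and reads containing adapter sequences and then performs quality control, so that gene mutations and expression-level changes are analyzed only on the data that meet the standards. The quality control comprises aligning the filtered sequencing data to a reference genome to obtain alignment results, and evaluating the quality of those results. Subsequent analysis proceeds only if all three of the following criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.
Preferably, in the gene fusion analysis submodule, the thresholds are as shown in Table 1.
Preferably, in the fusion-mutation expression analysis submodule, expression correction uses the following normalization formula: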
Figure PCTCN2021117533-appb-000008
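where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
Preferably, in the single-nucleotide-variant expression analysis submodule, the expression of a single-nucleotide variant is calculated as: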
Figure PCTCN2021117533-appb-000009
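where Gene Average Depth denotes the average sequencing depth of the gene;
ALT count denotes the depth of the variant allele;
HK_expression_Coeffient denotes the expression change coefficient, calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.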
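The beneficial effects of the invention are further illustrated below with reference to the Examples.
Examples
I. Experiments
1. RNA extraction:
The samples were formalin-fixed, paraffin-embedded (FFPE) pathology sections from lung cancer patients. Total RNA was extracted with the Qiagen RNeasy FFPE Kit (Cat No./ID: 73504). RNA concentration was measured with Qubit RNA HS, and RNA quality was checked on a Labchip.
2. Pre-hybridization library preparation:
The nucleotide library was constructed with ABclonal's mRNA-seq Lib Prep Module for Illumina. Construction comprised reverse transcription to cDNA, fragmentation, end repair, adapter ligation and library enrichment. The library was purified with Agencourt AMPure XP beads. Its concentration and quality were then assessed with Qubit 3.0 and Agilent 2100 capillary electrophoresis.
3. Probe capture hybridization:
Non-overlapping tiled probes were designed from the transcript sequences of 36 selected target genes: ALK, ESR1, FGFR1, NRG1, RET, ERG, BRAF, ETV1, FGFR2, NTRK1, ROS1, EWSR1, CD74, ETV4, FGFR3, NTRK2, SLC34A2, MET, EGFR, ETV5, FGFR4, NTRK3, SLC45A3, PPARG, EML4, ETV6, KIF5B, PDGFRA, TPM3, PDGFRB, SFT2D3, CNTF, EPM2A, NOL10, HEATR4 and RPGRIP1. Each probe was biotin-labeled at its 5' end. Capture proceeded as follows:
1) 2 µg of the pre-hybridization library was mixed with 5 µL Human Cot DNA (IDT) and 2 µL xGen Universal Blockers-TS Mix, then dried in a vacuum concentrator (60 °C, about 20 min to 1 h);
2) the dried mixture was redissolved in hybridization buffer, incubated at room temperature for 10 min and then hybridized in a PCR instrument at 65 °C for 16 h;
3) the overnight hybridization product was mixed with streptavidin magnetic beads and incubated in the PCR instrument for 45 min, after which the beads were washed with wash buffer;
4) the eluted product was PCR-amplified and purified with Agencourt AMPure XP beads, and its concentration and quality were assessed with Qubit 3.0 and Agilent 2100 capillary electrophoresis.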
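Non-overlapping tiling of each target transcript, as used in the probe design step above, can be sketched as follows. This is a minimal illustration, not the authors' design tool. The 120-nt probe length and the rule of dropping a 3' remainder shorter than 60 nt are assumptions. Real designs also account for GC content, repeats and synthesis constraints.

```python
from typing import List, Tuple


def tile_probes(transcript: str, probe_len: int = 120,
                min_len: int = 60) -> List[Tuple[int, str]]:
    """Tile a transcript sequence with adjacent, non-overlapping probes.

    Returns (start_offset, probe_sequence) pairs. probe_len and min_len are
    illustrative assumptions; a 3' remainder shorter than min_len is dropped.
    """
    if probe_len <= 0:
        raise ValueError("probe_len must be positive")
    probes = []
    for start in range(0, len(transcript), probe_len):
        window = transcript[start:start + probe_len]
        if len(window) >= min_len:
            probes.append((start, window))
    return probes


if __name__ == "__main__":
    demo = "ACGT" * 75  # 300-nt toy transcript
    for start, seq in tile_probes(demo):
        print(start, len(seq))  # 0 120, 120 120, 240 60
```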
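4. High-throughput sequencing: libraries were sequenced in paired-end mode on Illumina NextSeq, NovaSeq or similar instruments.
II. Sequencing data analysis:
The reads captured by the RNA panel were sequenced to obtain raw reads. These were processed with Trimmomatic-0.36 as follows to obtain high-quality reads:
a) removing low-quality reads;
b) removing reads containing adapter sequences.
The high-quality reads (filtered by criteria standard in the field) were aligned to the reference genome with STAR, and the alignment results were evaluated for quality. Samples meeting the criteria in Table 2 proceeded to the next analyses: gene expression analysis, gene fusion analysis, fusion-mutation relative expression analysis, SNV analysis and SNV expression analysis.
Table 2 RNA panel post-sequencing QC criteria
Metric | Threshold
Mapping rate | ≥80%
Data volume in target regions | ≥2M
Number of expressed housekeeping genes | ≥4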
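The Table 2 gate can be expressed as a simple per-sample check. This is a sketch under assumptions: the AlignmentStats fields and how they are derived from the STAR output are hypothetical, reading "2M" as 2,000,000 on-target reads is a guess (the text does not say reads or bases), and the text does not define what counts as an "expressed" housekeeping gene.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds from Table 2. Reading "2M" as 2,000,000 on-target reads is an
# assumption; the text does not state whether the unit is reads or bases.
MIN_MAPPING_RATE = 0.80
MIN_ON_TARGET_READS = 2_000_000
MIN_EXPRESSED_HK_GENES = 4


@dataclass
class AlignmentStats:
    mapping_rate: float       # fraction of reads aligned to the reference (0-1)
    on_target_reads: int      # reads falling in panel target regions
    expressed_hk_genes: int   # housekeeping genes detected as expressed


def check_qc(stats: AlignmentStats) -> Tuple[bool, List[str]]:
    """Return (passed, names of failed criteria) for one sample."""
    failed = []
    if stats.mapping_rate < MIN_MAPPING_RATE:
        failed.append("mapping_rate")
    if stats.on_target_reads < MIN_ON_TARGET_READS:
        failed.append("on_target_reads")
    if stats.expressed_hk_genes < MIN_EXPRESSED_HK_GENES:
        failed.append("expressed_hk_genes")
    return (not failed, failed)
```

Returning the list of failed criteria, rather than a bare boolean, makes it easier to report why a sample was held back.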
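1. Gene expression analysis
Gene expression was quantified by the RPKM method from the alignment results and the reference genome annotation file, using the following formula: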
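Figure PCTCN2021117533-appb-000010
$$\mathrm{RPKM} = \frac{\text{Total exon reads}}{\text{Mapped reads (millions)} \times \text{Exon length (KB)}}$$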
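Total exon reads: the number of reads aligned to all exons of the gene, counted with featureCounts from the gene annotation file and the alignment results.
Mapped reads (millions): the total number of reads aligned to the genome, in millions, taken from the alignment statistics.
Exon length (KB): the exon length of the gene in kilobases, calculated from the genome annotation file.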
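The three quantities above combine into the standard RPKM formula. A minimal sketch, assuming the per-gene exon read count (for example from featureCounts) and the exon length in base pairs are already available:

```python
def rpkm(exon_reads: int, total_mapped_reads: int, exon_length_bp: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads."""
    if total_mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("mapped reads and exon length must be positive")
    mapped_millions = total_mapped_reads / 1e6   # "Mapped reads (millions)"
    exon_kb = exon_length_bp / 1e3               # "Exon length (KB)"
    return exon_reads / (mapped_millions * exon_kb)


if __name__ == "__main__":
    # 500 exon reads, 10 million mapped reads, 2.5 kb of exon -> 20.0
    print(rpkm(500, 10_000_000, 2_500))
```

Because both the library-size and gene-length scalings are applied, RPKM values are comparable across genes within a sample and, roughly, across samples of different depth.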
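2. Gene fusion analysis
Gene fusions were identified from the high-quality reads with FusionMap. The preliminary fusion results were then filtered according to the following rules:
1) The Filter field of the fusion result must be empty. An empty field means that none of the following exclusions applies:
a) fusions between genes of the same gene family are filtered out;
b) fusions between genes of the same paralog group (as defined by Ensembl v74) are filtered out;
c) fusions arising from the same gene model are filtered out.
2) Fusions that do not meet the specified thresholds are filtered out. The thresholds are shown in Table 3:
Table 3 Fusion mutation threshold criteria
uniqcount | At exon boundary | Not at exon boundary
Canonical splice site | ≥3 | ≥5
Non-canonical splice site | ≥5 | ≥10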
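The two filtering rules can be combined into one predicate per candidate fusion. This is a sketch: the argument names are hypothetical, and mapping them onto FusionMap's actual output columns is left as an assumption.

```python
from typing import Dict, Tuple

# Minimum unique supporting reads from Table 3, keyed by
# (canonical splice site?, breakpoint at an exon boundary?).
MIN_UNIQUE_READS: Dict[Tuple[bool, bool], int] = {
    (True, True): 3,
    (True, False): 5,
    (False, True): 5,
    (False, False): 10,
}


def keep_fusion(filter_field: str, unique_reads: int,
                canonical_splice: bool, exon_boundary: bool) -> bool:
    """Apply rule 1 (empty Filter field) and rule 2 (Table 3 thresholds)."""
    if filter_field.strip():  # non-empty Filter: same family, paralog or gene model
        return False
    return unique_reads >= MIN_UNIQUE_READS[(canonical_splice, exon_boundary)]
```

Breakpoints that fall on exon boundaries at canonical splice sites are the most biologically plausible, which is why they need the least read support.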
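3. Fusion-mutation expression analysis
The identified fusions were corrected and normalized against the housekeeping-gene quantification results to obtain fusion expression values, using the following normalization formula: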
Figure PCTCN2021117533-appb-000011
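where SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length. For example, HKA count is the number of reads from housekeeping gene A aligned to the reference genome.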
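The exact normalization formula is available only as an image in the source. The sketch below shows one plausible form, stated purely as an assumption: breakpoint-spanning reads (SeedReads + RescueReads) divided by the mean length-normalized read count (count / length) of the three housekeeping genes. It illustrates housekeeping-gene normalization in general and should not be read as the patented formula.

```python
from typing import Dict, Tuple


def fusion_relative_expression(seed_reads: int, rescue_reads: int,
                               housekeeping: Dict[str, Tuple[int, int]]) -> float:
    """Hypothetical housekeeping-normalized fusion expression.

    housekeeping maps gene name -> (aligned read count, length); combining
    them as the mean of count/length is an assumption, not the source formula.
    """
    if not housekeeping:
        raise ValueError("at least one housekeeping gene is required")
    densities = [count / length for count, length in housekeeping.values()]
    mean_density = sum(densities) / len(densities)
    return (seed_reads + rescue_reads) / mean_density


if __name__ == "__main__":
    hk = {"HKA": (1000, 1000), "HKB": (2000, 1000), "HKC": (3000, 1000)}
    print(fusion_relative_expression(30, 10, hk))  # 40 / 2.0 = 20.0
```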
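4. SNV analysis
Analysis workflow:
1) Align the sequencing data to obtain BAM files.
2) Call variant sites and insertion-deletion regions relative to the reference genome (hg19) with the VarDict caller, producing VCF output.
3) Annotate the VCF files with ANNOVAR, then re-annotate sites whose annotation is inaccurate with TransVar to obtain the complete results. Correcting annotations with TransVar makes the results more accurate and comprehensive.
4) Merge the two sets of results. Apply forward/reverse strand correction to the merged file and compute read counts and frequencies (freq). Strand-bias correction is applied at this point, and the results are re-annotated after correction.
5) Filter annotations and select supported transcripts using the evidence-site database.
Gene mutation and gene database module:
a) compile the genes frequently mutated in different tumors and diseases, and build a hotspot gene list with well-defined targetable sites and chemotherapy-drug associations;
b) public databases, including ExAC, 1000 Genomes, gnomAD, HGMD, OMIM and COSMIC.
Transcript selection criteria:
- whether the transcript is the one used for the drug-target site;
- whether the site is pathogenic in ClinVar;
- whether the transcript appears in the TransVar results;
- whether the site lies in a non-splice intronic position;
- whether it is the canonical transcript;
- whether the site lies in an exon.
7) Filter the merged results using the validated threshold criteria to obtain the final results.
The thresholds were derived as follows. Individual genes and hotspots were validated independently, and large sample sets were validated in parallel. The results were reviewed visually and corrected. A set of QC thresholds was then back-derived from the best-performing settings.
Filtering criteria:
a) remove variant sites with sequencing depth below 10;
b) remove variants on the blacklist, and retain variants on the whitelist;
c) remove variants that lack supporting reads on the forward and reverse strands;
d) remove variants whose frequency (freq) or number of supporting reads does not meet the requirements.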
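Filters a) to d) can be sketched as a single predicate, under several assumptions:
- The text gives only the depth cutoff of 10, so the frequency and supporting-read cutoffs below are placeholders.
- Filter c) is read as requiring at least one variant read on each strand.
- Whitelisted variants are assumed to bypass filters c) and d).

```python
from typing import Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def keep_snv(key: VariantKey, depth: int, fwd_alt: int, rev_alt: int,
             blacklist: Set[VariantKey], whitelist: Set[VariantKey],
             min_freq: float = 0.05, min_alt_reads: int = 5) -> bool:
    """Apply filters a-d to one variant call.

    min_freq and min_alt_reads are placeholder values (the text does not give
    them); letting whitelisted variants skip filters c and d is an assumption.
    """
    if depth < 10:                      # a) depth below 10
        return False
    if key in blacklist:                # b) blacklisted: removed
        return False
    if key in whitelist:                # b) whitelisted: retained
        return True
    if fwd_alt == 0 or rev_alt == 0:    # c) no support on one strand (assumed reading)
        return False
    alt_reads = fwd_alt + rev_alt
    # d) supporting-read count and variant frequency thresholds
    return alt_reads >= min_alt_reads and alt_reads / depth >= min_freq
```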
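5. SNV expression analysis
The expression of each SNV was quantified from three inputs: the SNV results, the housekeeping-gene quantification results and the alignment statistics.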
Figure PCTCN2021117533-appb-000012
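Gene Average Depth: the average sequencing depth of the gene;
ALT count: the depth of the variant allele;
HK_expression_Coeffient: the expression change coefficient, calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.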
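The SNV expression formula itself is available only as an image in the source, so the sketch below computes just its inputs. Both choices are assumptions. The housekeeping coefficient is taken as the mean of the per-gene sample/standard expression ratios. The variant's allele fraction is taken as ALT count divided by Gene Average Depth.

```python
from statistics import mean
from typing import Dict


def hk_expression_coefficient(sample_hk: Dict[str, float],
                              standard_hk: Dict[str, float]) -> float:
    """Mean ratio of sample to reference-standard housekeeping expression.

    The aggregation (arithmetic mean of ratios) is an assumption; genes missing
    from the sample or with zero standard expression are skipped.
    """
    ratios = [sample_hk[g] / standard_hk[g]
              for g in standard_hk if g in sample_hk and standard_hk[g] > 0]
    if not ratios:
        raise ValueError("no shared housekeeping genes with positive expression")
    return mean(ratios)


def alt_fraction(alt_count: int, gene_average_depth: float) -> float:
    """Fraction of the gene's average depth carried by the variant allele."""
    return alt_count / gene_average_depth
```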
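III. Results
1. Accuracy of gene fusion detection by the RNA panel
Fusion detection in the RNA samples was checked for concordance against targeted sequencing of paired DNA samples; performance is shown in Table 4.
- DNA fusion-negative samples: of 57 samples, 52 were fusion-negative in RNA and 5 were fusion-positive in RNA, giving a negative concordance of 52/57 = 91.23%. For all 5 RNA-positive samples, the breakpoints were confirmed in IGV and the supporting read counts exceeded the filtering thresholds. In 3 of them the fusion was also confirmed by Sanger sequencing, which indicates that the DNA assay can miss fusions.
- DNA fusion-positive samples: of 16 clinically tested samples, all 16 were positive in RNA with the same fusion forms as in DNA, and RNA additionally detected alternatively spliced fusion transcripts.
Overall, the positive concordance between RNA and DNA was 16/16 = 100% and the negative concordance was 52/57 = 91.23%.
Table 4 Fusion detection performance of the RNA panel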
Figure PCTCN2021117533-appb-000013
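The agreement figures above come from a 2x2 comparison with DNA as the reference method. The short worked check below uses the counts reported in the text. The five RNA-only fusions count as discordant here, even though three were later confirmed by Sanger sequencing.

```python
def agreement(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Positive and negative percent agreement with DNA as the reference."""
    ppa = tp / (tp + fn)   # DNA-positive samples also positive in RNA
    npa = tn / (tn + fp)   # DNA-negative samples also negative in RNA
    return ppa, npa


if __name__ == "__main__":
    # Counts from the fusion validation: 16/16 DNA-positive, 52/57 DNA-negative.
    ppa, npa = agreement(tp=16, fn=0, tn=52, fp=5)
    print(f"PPA = {ppa:.2%}, NPA = {npa:.2%}")  # PPA = 100.00%, NPA = 91.23%
```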
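2. Accuracy of SNV detection by the RNA panel
Concordance of SNV calls between DNA-targeted sequencing and paired RNA samples was examined. The examined sites were the oncogene activating-mutation sites and the level-1 and level-2 secondary drug-resistance mutation sites associated with fusions that the RNA panel covers (11 genes, 226 SNV sites in total). A total of 40 non-small cell lung cancer clinical samples were included. In 29 samples no mutation was detected in either DNA or RNA, and in 11 samples mutations were detected by both, mainly in the EGFR gene. Within the examined range, the positive and negative concordance of SNV results between RNA and DNA were both 100%. Results are shown in Table 5.
Table 5 SNV detection performance of the RNA panel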
Figure PCTCN2021117533-appb-000014
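3. Accuracy of gene expression measurement by the RNA panel
RNA libraries were constructed from 30 FFPE samples. Each library was sequenced both by RNA-seq and after RNA panel capture, and the agreement of gene expression between the two methods was analyzed. For all genes in the panel, the correlation coefficient (R) between the two methods' expression measurements was > 0.8. Fig. 2 shows the correlation of gene expression between RNA-seq and the RNA panel.
For key cancer driver genes in the panel, such as ALK, MET, NTRK and EGFR, the R value between RNA-seq and RNA panel expression was > 0.9. See Fig. 3.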
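The R values above are correlation coefficients between the two methods' expression estimates. The sketch below computes a Pearson correlation on log-transformed RPKM. The text does not state which correlation measure or transform was used, so Pearson on log2(RPKM + 1) is an assumption.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_rpkm(values: Sequence[float]) -> List[float]:
    """log2(RPKM + 1), a common variance-stabilizing transform."""
    return [math.log2(v + 1) for v in values]
```

Usage: `pearson_r(log2_rpkm(panel_rpkm), log2_rpkm(rnaseq_rpkm))`, where the two hypothetical lists hold one gene's values across the same samples (or all genes within one sample).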
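4. Replacing DNA copy-number analysis with RNA expression
A total of 165 FFPE samples were sequenced after RNA panel capture. The distribution of EGFR expression (RPKM) was computed to set an EGFR expression threshold. Samples in the top 10% of EGFR expression that still had sections available then underwent immunohistochemistry (IHC) and DNA-targeted sequencing. EGFR expression agreed better with the IHC (protein-level) results than the DNA CNV results did. See Table 6.
Table 6 CNV detection performance of the RNA panel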
Figure PCTCN2021117533-appb-000015
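The overexpression call in S52, and the top-10% EGFR selection above, can be sketched as a percentile cut on a baseline RPKM distribution. The text does not specify how the threshold is derived, so two choices here are illustrative: the nearest-rank 90th percentile, and the strict "greater than" comparison.

```python
import math
from typing import Sequence


def overexpression_threshold(baseline_rpkm: Sequence[float],
                             quantile: float = 0.90) -> float:
    """Nearest-rank quantile of the baseline RPKM distribution."""
    if not baseline_rpkm or not 0 < quantile <= 1:
        raise ValueError("need a non-empty baseline and 0 < quantile <= 1")
    ranked = sorted(baseline_rpkm)
    rank = math.ceil(quantile * len(ranked))  # 1-based nearest rank
    return ranked[rank - 1]


def is_overexpressed(sample_rpkm: float, threshold: float) -> bool:
    """Flag a sample whose RPKM lies above the baseline cutoff."""
    return sample_rpkm > threshold
```

With a baseline of RPKM values 1 to 10, the threshold is 9, so only a value of 10 (the top 10%) is flagged.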
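The above are only preferred embodiments of the invention and are not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.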

Claims (12)

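  1. A method for detecting gene mutations and expression levels, characterized by comprising the following steps:
    S1, extracting RNA from a sample to be tested, fragmenting the RNA of the sample and reverse-transcribing it to obtain cDNA;
    S2, constructing a gene library from the cDNA through end repair, adapter ligation and library enrichment;
    S3, capturing and enriching target genes from the gene library by specific hybridization of capture probes to target regions;
    S4, sequencing on a high-throughput sequencer to obtain RNA-targeted sequencing data;
    S5, analyzing gene mutations and expression-level changes in the RNA-targeted sequencing data;
    S5 specifically comprises:
    S51, gene expression analysis: quantifying the expression of target genes in the test sample using the RPKM method;
    S52, gene overexpression analysis: retrieving a baseline sample population, analyzing the distribution of RPKM values of the target gene, determining thresholds for high and low expression of the target gene, and judging from the RPKM value of the target gene in the sample to be tested whether the target gene is overexpressed;
    S53, gene fusion analysis: filtering out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model, and filtering out fusions that do not meet the thresholds, to obtain the fusion genes in the test sample;
    S54, fusion-mutation relative expression analysis: performing expression correction and normalization based on the expression quantification results of housekeeping genes and the gene fusion results obtained in S53, to obtain the relative expression of the fusion genes;
    S55, single-nucleotide variant analysis: identifying variant nucleotides by sequence alignment;
    S56, single-nucleotide-variant expression analysis: performing expression quantification of single-nucleotide variants based on the results of the single-nucleotide variant analysis, the expression quantification results of housekeeping genes and alignment statistics, to obtain the expression of the single-nucleotide variants.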
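  2. The method according to claim 1, characterized in that S5 further comprises: filtering out low-quality sequencing data and reads containing adapter sequences and performing quality control, and then analyzing gene mutations and expression-level changes in the RNA-targeted sequencing data on the data that meet the standards, wherein the quality-control step comprises:
    aligning the sequencing data obtained after filtering out low-quality data and adapter-containing reads to a reference genome to obtain alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds when the following three criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.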
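  3. The method according to claim 1, characterized in that, in S53, the thresholds are as shown in the following table:
    Unique reads | At exon boundary | Not at exon boundary
    Canonical splice site | ≥3 | ≥5
    Non-canonical splice site | ≥5 | ≥10
  4. The method according to claim 1, characterized in that, in S54, the expression correction and normalization uses the following normalization formula: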
    Figure PCTCN2021117533-appb-100001
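    wherein SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
  5. The method according to claim 1, characterized in that sequencing in S4 is performed in paired-end or single-end mode.
  6. The method according to claim 1, characterized in that, in S56, the expression of the single-nucleotide variant is calculated as: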
    Figure PCTCN2021117533-appb-100002
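    wherein Gene Average Depth denotes the average sequencing depth of the gene;
    ALT count denotes the depth of the variant allele;
    HK_expression_Coeffient denotes the expression change coefficient calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.
  7. A device for detecting gene mutations and expression levels, characterized by comprising:
    an RNA extraction module, configured to extract RNA from a sample to be tested, fragment the RNA of the sample and reverse-transcribe it to obtain cDNA;
    a gene library construction module, configured to construct a gene library from the cDNA through end repair, adapter ligation and library enrichment;
    a target gene enrichment module, configured to capture and enrich target genes from the gene library by specific hybridization of capture probes to target regions;
    a sequencing module, configured to sequence on a high-throughput sequencer to obtain RNA-targeted sequencing data;
    an analysis module, configured to analyze gene mutations and expression-level changes in the RNA-targeted sequencing data;
    the analysis module specifically comprises:
    a gene expression analysis submodule, configured to quantify the expression of target genes in the test sample using the RPKM method;
    a gene overexpression analysis submodule, configured to retrieve a baseline sample population, analyze the distribution of RPKM values of the target gene, determine thresholds for high and low expression of the target gene, and judge from the RPKM value of the target gene in the sample to be tested whether the target gene is overexpressed;
    a gene fusion analysis submodule, configured to filter out fusions between genes of the same gene family, fusions between genes of the same paralog group and fusions arising from the same gene model, and to filter out fusions that do not meet the thresholds, to obtain the fusion genes in the test sample;
    a fusion-mutation relative expression analysis submodule, configured to perform expression correction and normalization based on the expression quantification results of housekeeping genes and the gene fusion results obtained by the gene fusion analysis submodule, to obtain the relative expression of the fusion genes;
    a single-nucleotide variant analysis submodule, configured to identify variant nucleotides by sequence alignment;
    a single-nucleotide-variant expression analysis submodule, configured to perform expression quantification of single-nucleotide variants based on the results of the single-nucleotide variant analysis, the expression quantification results of housekeeping genes and alignment statistics, to obtain the expression of the single-nucleotide variants.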
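  8. The device according to claim 7, characterized in that the analysis module further comprises a filtering submodule configured to filter out low-quality sequencing data and reads containing adapter sequences and perform quality control, and then to analyze gene mutations and expression-level changes in the RNA-targeted sequencing data on the data that meet the standards, wherein the quality control comprises:
    aligning the sequencing data obtained after filtering out low-quality data and adapter-containing reads to a reference genome to obtain alignment results, and evaluating the quality of the alignment results; subsequent analysis proceeds when the following three criteria are met: 1) mapping rate ≥ 80%; 2) data volume in target regions ≥ 2M; 3) number of expressed housekeeping genes ≥ 4.
  9. The device according to claim 7, characterized in that, in the gene fusion analysis submodule, the thresholds are as shown in the following table:
    Unique reads | At exon boundary | Not at exon boundary
    Canonical splice site | ≥3 | ≥5
    Non-canonical splice site | ≥5 | ≥10
  10. The device according to claim 7, characterized in that, in the fusion-mutation expression analysis submodule, the expression correction and normalization uses the following normalization formula: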
    Figure PCTCN2021117533-appb-100003
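    wherein SeedReads+RescueReads denotes the reads spanning the fusion breakpoint; HKA, HKB and HKC denote housekeeping genes A, B and C; count denotes the number of sequencing reads aligned to the reference genome; and length denotes the aligned sequence length.
  11. The device according to claim 7, characterized in that the sequencing module performs sequencing in paired-end or single-end mode.
  12. The device according to claim 7, characterized in that, in the single-nucleotide-variant expression analysis submodule, the expression of the single-nucleotide variant is calculated as: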
    Figure PCTCN2021117533-appb-100004
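    wherein Gene Average Depth denotes the average sequencing depth of the gene;
    ALT count denotes the depth of the variant allele;
    HK_expression_Coeffient denotes the expression change coefficient calculated from housekeeping-gene expression in the sample relative to housekeeping-gene expression in the reference standard.
PCT/CN2021/117533 2020-10-29 2021-09-09 Method and device for detecting gene mutations and expression levels WO2022089033A1 (zh)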

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022566482A JP2023524722A (ja) 2020-10-29 2021-09-09 Method and device for detecting gene mutations and expression levels

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011182844.4 2020-10-29
CN202011182844.4A CN112397144B (zh) 2020-10-29 2020-10-29 Method and device for detecting gene mutations and expression levels

Publications (1)

Publication Number Publication Date
WO2022089033A1 true WO2022089033A1 (zh) 2022-05-05

Family

ID=74597910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117533 WO2022089033A1 (zh) 2020-10-29 2021-09-09 Method and device for detecting gene mutations and expression levels

Country Status (3)

Country Link
JP (1) JP2023524722A (zh)
CN (1) CN112397144B (zh)
WO (1) WO2022089033A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
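CN115798584A (zh) * 2022-12-14 2023-03-14 上海华测艾普医学检验所有限公司 Method for simultaneously detecting the cis/trans configuration of EGFR T790M and C797S mutations
CN116994656A (zh) * 2023-09-25 2023-11-03 北京求臻医学检验实验室有限公司 Method for improving the detection accuracy of next-generation sequencing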

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
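CN112397144B (zh) * 2020-10-29 2021-06-15 无锡臻和生物科技股份有限公司 Method and device for detecting gene mutations and expression levels
CN113470745B (zh) * 2021-08-25 2023-09-08 南京立顶医疗科技有限公司 Method for screening potential mutation sites of SARS-CoV-2 and its application
CN113981078B (zh) * 2021-09-16 2023-11-24 北京肿瘤医院(北京大学肿瘤医院) Biomarker for predicting the efficacy of anti-EGFR targeted therapy in patients with advanced esophageal cancer, and efficacy prediction kit
CN114317753A (zh) * 2021-12-30 2022-04-12 北京迈基诺基因科技股份有限公司 Detection model for ocular tumor fusion genes, and construction and detection methods therefor
CN114369665A (zh) * 2022-01-22 2022-04-19 河南省肿瘤医院 Method for detecting gene fusions on the NanoString platform to aid diagnosis of soft tissue sarcoma
KR102518091B1 (ko) * 2022-07-12 2023-04-06 주식회사 아이엠비디엑스 Method for providing homologous recombination deficiency information
CN115083516B (zh) * 2022-07-13 2023-03-21 北京先声医学检验实验室有限公司 Panel design and evaluation method for detecting gene fusions based on targeted RNA sequencing
CN115896256A (zh) * 2022-11-25 2023-04-04 臻悦生物科技江苏有限公司 Method, apparatus, device and storage medium for detecting RNA insertion-deletion mutations based on next-generation sequencing
CN116926198A (zh) * 2023-09-15 2023-10-24 臻和(北京)生物科技有限公司 Method, apparatus, device and storage medium for detecting Claudin18.2 protein positivity in gastric cancer tissue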

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160340722A1 (en) * 2014-01-22 2016-11-24 Adam Platt Methods And Systems For Detecting Genetic Mutations
US20170321260A1 (en) * 2016-05-09 2017-11-09 Health In Code, S.L. Mutation identification method
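CN110079594A (zh) * 2019-04-22 2019-08-02 元码基因科技(苏州)有限公司 High-throughput method based on DNA and RNA gene mutation detection
CN110628880A (zh) * 2019-09-30 2019-12-31 深圳恒特基因有限公司 Method for detecting genetic variation using messenger RNA and genomic DNA templates simultaneously
CN111321202A (zh) * 2019-12-31 2020-06-23 广州金域医学检验集团股份有限公司 Gene fusion variant library construction method, detection method, apparatus, device and storage medium
CN112397144A (zh) * 2020-10-29 2021-02-23 无锡臻和生物科技有限公司 Method and device for detecting gene mutations and expression levels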

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2875173B1 (en) * 2012-07-17 2017-06-28 Counsyl, Inc. System and methods for detecting genetic variation
AU2015249846B2 (en) * 2014-04-21 2021-07-22 Natera, Inc. Detecting mutations and ploidy in chromosomal segments

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
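CN115798584A (zh) * 2022-12-14 2023-03-14 上海华测艾普医学检验所有限公司 Method for simultaneously detecting the cis/trans configuration of EGFR T790M and C797S mutations
CN115798584B (zh) * 2022-12-14 2024-03-29 上海华测艾普医学检验所有限公司 Method for simultaneously detecting the cis/trans configuration of EGFR T790M and C797S mutations
CN116994656A (zh) * 2023-09-25 2023-11-03 北京求臻医学检验实验室有限公司 Method for improving the detection accuracy of next-generation sequencing
CN116994656B (zh) * 2023-09-25 2024-01-02 北京求臻医学检验实验室有限公司 Method for improving the detection accuracy of next-generation sequencing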

Also Published As

Publication number Publication date
JP2023524722A (ja) 2023-06-13
CN112397144A (zh) 2021-02-23
CN112397144B (zh) 2021-06-15

Similar Documents

Publication Publication Date Title
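WO2022089033A1 (zh) Method and device for detecting gene mutations and expression levels
KR102028375B1 (ko) System and method for detecting rare mutations and copy number variations
Li et al. A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes
US12129514B2 (en) Methods and compositions for evaluating genetic markers
CN106414768B (zh) Gene fusions and gene variants associated with cancer
JP6073461B2 (ja) Non-invasive prenatal diagnosis of fetal trisomy by allele ratio analysis using targeted massively parallel sequencing
JP2018531583A (ja) Single-molecule sequencing of plasma DNA
CN106715711A (zh) Method for determining probe sequences and method for detecting genomic structural variation
US20170329893A1 (en) Methods of determining genomic health risk
EP3564391B1 (en) Method, device and kit for detecting fetal genetic mutation
WO2012068919A1 (zh) DNA library and preparation method therefor, and method and device for detecting SNPs
CN110564838A (zh) Multiplex PCR primer system for genotyping neonatal glycogen storage disease and use thereof
Yadav et al. Next-Generation sequencing transforming clinical practice and precision medicine
CN109461473B (zh) Method and device for obtaining fetal cell-free DNA concentration
CN105648044A (zh) Method and device for determining the haplotype of a fetal target region
Shen et al. Improved detection of global copy number variation using high density, non-polymorphic oligonucleotide probes
EP3553183A1 (en) Hybridization solution for capturing a nucleic acid including a target oligonucleotide sequence
CN108753934B (zh) Method and kit for detecting gene mutations, and preparation method therefor
CN112251512B (zh) Target gene set for genetic testing of non-small cell lung cancer patients, and related evaluation method, use and kit
CN111172248B (zh) Universal kit for verifying copy number variation based on fragment analysis technology
EP3524695B1 (en) Method for the enrichment of genomic regions
CN108642173B (zh) Method and kit for non-invasive detection of SLC26A4 gene mutations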
EP3696278A1 (en) Method of determining the origin of nucleic acids in a mixed sample
US20190316195A1 (en) Methods of capturing a nucleic acid including a target oligonucleotide sequence and uses thereof
US20220356513A1 (en) Synthetic polynucleotides and method of use thereof in genetic analysis

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21884763

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022566482

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 21884763

Country of ref document: EP

Kind code of ref document: A1