CN110684830A - RNA analysis method for paraffin section tissue - Google Patents

Info

Publication number
CN110684830A
Authority
CN
China
Prior art keywords
paraffin section
sample
analysis
rna
quality control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910962113.2A
Other languages
Chinese (zh)
Inventor
黄毅
易鑫
吴玲清
刘久成
王长希
李俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Guiinga Medical Laboratory
Original Assignee
Shenzhen Guiinga Medical Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Guiinga Medical Laboratory
Priority to CN201910962113.2A
Publication of CN110684830A

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Physics & Mathematics (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a paraffin section tissue RNA analysis method, which comprises the following steps: carrying out DNA degradation on the paraffin section tissue, and extracting sample RNA; preparing a paraffin section sample nucleic acid library, and sequencing the sample RNA; performing quality control on the sample data obtained by sequencing; aligning the quality-controlled sample data to a reference genome, and performing quality control on the alignment result; and performing transcriptome assembly and transcript quantification on the sample data that passed alignment quality control, followed by quantitative analysis of gene expression, gene differential expression analysis and fusion gene analysis. The invention provides a complete set of indexes and a detection method for evaluating the RNA quality of paraffin section tissue. It evaluates the RNA of paraffin section tissue comprehensively, gives accurate and effective evaluation results, and provides an effective reference basis for the accuracy of subsequent analysis.

Description

RNA analysis method for paraffin section tissue
Technical Field
The invention belongs to the field of second-generation high-throughput sequencing analysis, and particularly relates to a paraffin section tissue RNA analysis method.
Background
RNA sequencing (RNA-seq) is a sensitive and accurate method of quantifying gene expression. Second-generation high-throughput sequencing (next-generation sequencing, NGS) has opened a new era for RNA-seq transcriptome analysis. Designing an RNA-seq workflow for broad application involves the sequencing technology, the sample type, the analysis requirements, the genome and the available computational resources. An analysis workflow is evaluated on its accuracy, computation speed and cost.
The gene expression profile of a tumor sample is a powerful biomarker for prognosis and prediction. To date, transcriptomic profiling has been performed on a large number of frozen cancer tissue samples. However, fresh frozen tumor tissue from clinical patients is not easy to collect or to store for long-term follow-up, so formalin-fixed paraffin-embedded (FFPE) tissue is the more widely used biomaterial in the medical field. Genome-wide gene expression profiling of tumor samples is essential for cancer research and also facilitates extensive retrospective clinical genomic studies. FFPE tissue is fixed, paraffin-embedded, sectioned and stained to prevent degradation of cellular tissue, but these preparation steps and subsequent storage have significant negative effects on DNA and RNA quality. FFPE samples generally show severe degradation, chemical modification, cross-linking of nucleic acids and proteins, and variability in tissue handling and processing. These molecular changes directly affect data quality and cause several problems: sample degradation leads to lower alignment quality of the sequencing data, more soft-clipped sequences and more duplicate sequences, and formaldehyde fixation leads to random C-to-T conversions in nucleic acids, which makes nucleic acids isolated from FFPE tissue poorly compatible with downstream high-throughput molecular techniques. Besides increasing the sequencing depth to compensate for nucleic acid degradation, a complete set of indexes and a detection method for evaluating the RNA quality of paraffin tissue are urgently needed to ensure the reliability of subsequent analysis. A complete paraffin section RNA analysis workflow is also needed to study differences in the gene expression levels of organisms in different environments or physiological states, so that the response mechanisms of the body can be understood and intracellular regulatory networks constructed.
Meanwhile, novel fusion genes, formed when all or part of two genes are joined in tandem through chromosomal translocation or reverse splicing, play an important role in studying the cause and development of various cancer types.
Disclosure of Invention
In order to solve the technical problems, the invention provides a paraffin section tissue RNA analysis method.
A method for analyzing RNA of paraffin section tissue, the method comprising the steps of:
carrying out DNA degradation on the paraffin section tissue, and extracting sample RNA;
preparing a paraffin section sample nucleic acid library and sequencing the sample RNA based on the library;
performing quality control on the sample data obtained by sequencing to remove rRNA data;
aligning the quality-controlled sample data to a reference genome, and performing quality control on the alignment result;
carrying out transcriptome assembly and transcript quantification on the sample data that passed alignment quality control, and carrying out quantitative analysis of gene expression;
performing gene differential expression analysis based on the transcript quantification results.
Further, the analysis method can also perform fusion gene analysis;
the fusion gene analysis is performed using one or more software tools selected from JAFFA, STAR-Fusion, TopHat-Fusion, FusionCatcher, or SOAPfuse.
Further, the preparation of the paraffin section sample nucleic acid library comprises the following steps:
extracting nucleic acid in the paraffin section sample;
synthesizing a single-stranded cDNA based on the nucleic acid;
synthesizing a double-stranded cDNA based on the single-stranded cDNA;
repairing the double-stranded cDNA ends;
ligating adapters to the double-stranded cDNA, and performing PCR amplification on the adapter-ligated DNA to obtain a nucleic acid library of the paraffin section sample.
Further, the quality control of the sample data obtained by sequencing comprises:
removing sequencing adapter sequences, low-quality sequences and sequences consisting of N bases;
screening on the number of bases in the filtered data after adapter removal, the percentage of bases with quality greater than 20, the percentage of bases with quality greater than 30, GC content, N content, average read length, rRNA alignment rate and the number of reads after filtering;
and selecting the data and samples meeting the set threshold value for subsequent analysis.
Further, aligning the quality-controlled sample data to the reference genome comprises:
aligning the obtained sample sequences containing nucleic acid sequence information to a reference genome;
obtaining the sample sequences aligned to the reference genome.
Further, the sample sequences containing nucleic acid sequence information are aligned to the reference genome using one or more of TopHat, STAR, or HISAT2.
Further, the quality control of the alignment result comprises:
carrying out quality evaluation on the alignment result file of the paraffin section tissue;
removing duplicate sequences.
Further, the quality evaluation of the alignment result file of the paraffin section tissue comprises:
evaluating one or more of the duplicate sequence rate, alignment rate, unique alignment rate, exon alignment rate, intron alignment rate, intergenic region alignment rate, expression efficiency, detected transcripts, detected genes or sequence coverage uniformity.
Further, the quantitative analysis of gene expression is performed using one or more of RSEM, eXpress, HTseq, Cufflinks, StringTie, Sailfish, Salmon, quasi-mapping, or Kallisto software.
Further, the gene differential expression analysis is performed using one or more of DESeq, limma, edgeR, Cuffdiff, Ballgown, DESeq2, or sleuth software.
The invention provides a complete set of indexes and a detection method for evaluating the RNA quality of paraffin section tissue, and can be used to study differences in the gene expression levels of organisms under different environments or physiological states, so that the response mechanisms of the organism can be understood and intracellular regulatory networks constructed. The invention can also perform fusion gene analysis, and the detection and analysis of novel fusion genes plays an important role in studying the cause and development of various cancer types.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 shows a flow chart of one embodiment of the RNA analysis method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, a method for analyzing RNA of paraffin section tissue comprises the following steps: carrying out DNA degradation on the paraffin section tissue, and extracting sample RNA; preparing a paraffin section sample nucleic acid library and sequencing the sample RNA based on the library; performing quality control on the sample data obtained by sequencing to remove rRNA data; aligning the quality-controlled sample data to a reference genome, and performing quality control on the alignment result; carrying out transcriptome assembly and transcript quantification on the sample data that passed alignment quality control, and carrying out quantitative analysis of gene expression; and performing gene differential expression analysis based on the transcript quantification result. Fusion gene analysis is also performed with the method described in this example. The software used in each step and its result files are detailed in Table 1.
TABLE 1 Software and result files of the paraffin section RNA analysis method
Figure BDA0002229286830000041
Figure BDA0002229286830000051
Sample preparation: three gastric cancer paraffin section tissues (Beijing Jiyin medical examination laboratory; sample numbers 199003859T, 199003855T and 199003848T) and three paracancerous paraffin section tissues from gastric cancer patients (Beijing Gionee plus medical laboratory; sample numbers 199003859N, 199003848N and 199003855N) were selected. DNA was degraded in RNase-free, nuclease-free water, and purified total RNA was obtained with an RNA extraction kit for FFPE samples (MagMAX™ FFPE DNA/RNA Ultra kit); rRNA was then removed with the Ribo-Zero™ ribosomal RNA (rRNA) removal kit.
Library construction and sequencing: the preparation of the nucleic acid library of the paraffin section sample comprises the following steps: extracting nucleic acid from the paraffin section sample; synthesizing single-stranded cDNA based on the nucleic acid; synthesizing double-stranded cDNA based on the single-stranded cDNA; repairing the ends of the double-stranded cDNA; ligating adapters to the double-stranded cDNA, and performing PCR amplification on the adapter-ligated DNA to obtain the nucleic acid library of the paraffin section sample. Specifically, a kit that gives a high-quality library yield from only 10 ng to 1 μg of RNA (Ultra™ RNA library preparation kit) is used, and the DNA library is accurately quantified with a Qubit fluorometer in order to obtain high-quality sequencing results. The fragment length distribution of the DNA library is measured with an Agilent 2100 Bioanalyzer; the library shows a narrow peak at 300 bp. RNA sequencing (RNA-Seq) was performed on a second-generation high-throughput sequencing platform (Illumina HiSeq X Ten sequencing platform).
The quality control of the sample data obtained by sequencing comprises the following steps: removing sequencing adapter sequences, low-quality sequences and sequences consisting of N bases; screening on the number of bases in the filtered data after adapter removal, the percentage of bases with quality greater than 20, the percentage of bases with quality greater than 30, GC content, N content, average read length, rRNA alignment rate and the number of reads after filtering; and selecting the data and samples meeting the set thresholds for subsequent analysis. Specifically, the fast software (a data quality control tool) is used for quality control; the bowtie2 software (a tool for aligning sequencing reads to reference sequences) is then used to align the quality-controlled data to the ribosomal RNA (rRNA) database of the National Center for Biotechnology Information (NCBI for short), and the rRNA data are removed. Filtering criteria for the quality control indexes: number of bases after adapter removal, Clean_Base (the clean data reads in the table are 150 bp in length), > 2500 Mb; percentage of bases with quality greater than 20, Q20, > 90%; percentage of bases with quality greater than 30, Q30, > 85%; GC content > 40% and < 60%; N content < 0.100%; average read length > 120 bp and < 150 bp; rRNA alignment rate < 40%; number of reads after filtering (the reads remaining after removing reads that fail quality control and removing rRNA) > 4×10^7. The bowtie2 alignment used the parameters: "--sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15".
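The rRNA filtering step can be sketched as a command builder. This is a minimal sketch, not the patent's own script: the parameter string is read here as bowtie2's `--sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15`, and paired-end input, the `--un-conc-gz` output option, the thread count and all file names are assumptions added for illustration.

```python
from typing import List


def bowtie2_rrna_filter_cmd(rrna_index: str, fastq_1: str, fastq_2: str,
                            kept_prefix: str, threads: int = 4) -> List[str]:
    """Build a bowtie2 command that aligns read pairs to an rRNA index.

    Pairs that do NOT align concordantly to the rRNA index are written to
    kept_prefix (assumed here to be the non-rRNA reads to keep).
    """
    return [
        "bowtie2",
        # alignment parameters as read from the description
        "--sensitive", "-D", "15", "-R", "2", "-N", "0", "-L", "22",
        "-i", "S,1,1.15",
        "-p", str(threads),            # assumption: thread count
        "-x", rrna_index,              # hypothetical index built from the NCBI rRNA database
        "-1", fastq_1, "-2", fastq_2,  # assumption: paired-end reads
        "--un-conc-gz", kept_prefix,   # assumption: keep pairs that are not rRNA
        "-S", "/dev/null",             # the rRNA alignments themselves are discarded
    ]


cmd = bowtie2_rrna_filter_cmd("rrna_index", "s_1.fq.gz", "s_2.fq.gz", "s_norrna.fq.gz")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed.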
Specifically, see Table 2: the rRNA percentage of the paracancerous tissue with sample number 199003855N is 85.37%, which is above the threshold, and only 31,165,304 reads (sequences generated by a high-throughput sequencing platform) remain after rRNA filtering, which is below the required number of filtered reads. This sample does not meet the requirements of subsequent analysis and requires resampling or further rRNA removal.
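The read-level filtering criteria can be expressed as a small threshold check. This is a minimal sketch: the metric names are hypothetical labels chosen for illustration, the limits are the ones stated in this example, and only the metrics actually supplied are checked.

```python
from typing import Callable, Dict, List

# Thresholds stated in the example; key names are hypothetical labels.
READ_QC_RULES: Dict[str, Callable[[float], bool]] = {
    "clean_base_mb": lambda v: v > 2500,
    "q20_pct": lambda v: v > 90,
    "q30_pct": lambda v: v > 85,
    "gc_pct": lambda v: 40 < v < 60,
    "n_pct": lambda v: v < 0.1,
    "mean_read_len_bp": lambda v: 120 < v < 150,
    "rrna_rate_pct": lambda v: v < 40,
    "filtered_reads": lambda v: v > 4e7,
}


def failed_read_qc(metrics: Dict[str, float]) -> List[str]:
    """Return the names of supplied metrics that miss their threshold."""
    return [name for name, passes in READ_QC_RULES.items()
            if name in metrics and not passes(metrics[name])]


# The two values reported in the text for sample 199003855N.
sample_199003855N = {"rrna_rate_pct": 85.37, "filtered_reads": 31_165_304}
print(failed_read_qc(sample_199003855N))
# → ['rrna_rate_pct', 'filtered_reads']
```

Returning the list of failed metrics, rather than a single pass/fail flag, makes it clear whether a sample needs resampling or only further rRNA removal.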
Aligning the quality-controlled sample data to the reference genome comprises: aligning the obtained sample sequences containing nucleic acid sequence information to a reference genome; and obtaining the sample sequences aligned to the reference genome. Specifically, the method uses HISAT2 software (an RNA-Seq genome alignment tool) for alignment, with version 37 of the human genome sequence (Genome Reference Consortium Human Build 37, GRCh37 for short) as the reference genome. A HISAT2 index must be built for the reference genome; alignment uses default parameters, and the parameters of individual samples are adjusted based on the subsequent alignment quality control results of each sample. Alternatively, this embodiment may use TopHat (a Bowtie-based RNA-Seq data analysis tool) or STAR (Spliced Transcripts Alignment to a Reference, an RNA-Seq genome alignment tool) instead of HISAT2, or a combination of one or more of TopHat, STAR or HISAT2, to align the sample sequences to the reference genome.
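The index-building and alignment steps can be sketched as two command lines. This is an illustrative sketch only: paired-end input, the thread count and every file name are assumptions, and no alignment parameters beyond the HISAT2 defaults are set, in line with the description.

```python
from typing import List, Tuple


def hisat2_cmds(reference_fasta: str, index_base: str, fastq_1: str,
                fastq_2: str, out_sam: str,
                threads: int = 4) -> Tuple[List[str], List[str]]:
    """Build the hisat2-build and hisat2 command lines (default parameters)."""
    build = ["hisat2-build", reference_fasta, index_base]
    align = [
        "hisat2",
        "-p", str(threads),            # assumption: thread count
        "-x", index_base,              # index produced by hisat2-build
        "-1", fastq_1, "-2", fastq_2,  # assumption: paired-end reads
        "-S", out_sam,
    ]
    return build, align


build_cmd, align_cmd = hisat2_cmds("GRCh37.fa", "GRCh37_hisat2",
                                   "s_norrna_1.fq.gz", "s_norrna_2.fq.gz",
                                   "s.sam")
print(" ".join(build_cmd))
print(" ".join(align_cmd))
```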
The quality control of the alignment result comprises the following steps: carrying out quality evaluation on the alignment result file of the paraffin section tissue; and removing duplicate sequences. Specifically, RNA-SeQC software (a tool for quality control and expression evaluation of RNA-Seq data) is used for analysis. An index must be built for the alignment result file with the command samtools index; an index must also be built for the reference genome GRCh37 with the command samtools faidx, and a dict index created with CreateSequenceDictionary. The contig names of the bam file, the reference genome and the genome gtf file must be consistent. Quality control of the alignment result evaluates the samples and removes unqualified ones, which improves the reliability of the analysis results.
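The contig-name consistency requirement can be checked before running the quality evaluation. The following is a minimal sketch that works on text already read from a SAM/BAM header (for example the output of `samtools view -H`), a `.fai` index and a GTF file; the function names and the example inputs are hypothetical.

```python
from typing import Dict, List, Set


def sam_header_contigs(header_text: str) -> Set[str]:
    """Collect SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def first_column(text: str) -> Set[str]:
    """Collect column 1 of a tab-separated file, skipping blank and '#' lines."""
    return {line.split("\t")[0] for line in text.splitlines()
            if line.strip() and not line.startswith("#")}


def contig_mismatches(header_text: str, fai_text: str,
                      gtf_text: str) -> Dict[str, List[str]]:
    """List contig names used by the BAM or GTF but absent from the reference."""
    reference = first_column(fai_text)
    return {
        "bam_not_in_reference": sorted(sam_header_contigs(header_text) - reference),
        "gtf_not_in_reference": sorted(first_column(gtf_text) - reference),
    }


# Hypothetical inputs: the GTF uses "chr1" while the reference uses "1".
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n"
fai = "1\t249250621\t52\t60\t61\n"
gtf = "#!genome-build GRCh37\nchr1\thavana\tgene\t11869\t14412\t.\t+\t.\tgene_id \"X\";\n"
print(contig_mismatches(header, fai, gtf))
# → {'bam_not_in_reference': [], 'gtf_not_in_reference': ['chr1']}
```

A non-empty list in either entry signals that names such as `chr1` and `1` are mixed and should be harmonized first.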
The quality of the alignment result file of the paraffin section tissue is evaluated against the following thresholds: duplication rate (Duplication Rate) < 60%, alignment rate (Mapping Rate) > 85%, unique alignment rate (Mapped Unique Rate) > 50%, exon alignment rate > 50%, intron alignment rate < 40%, intergenic region alignment rate (Intergenic Rate) < 10%, expression efficiency > 45%, detected transcripts > 130000, detected genes > 2000, sequence coverage uniformity bias < 0.500%.
Table 2 shows the alignment and quality control results of the 6 samples in this example. The duplicate rate of the paracancerous tissue with sample number 199003859N is 62.52%, which is above the threshold, so stricter parameters are needed in the subsequent duplicate removal step to bring the rate within the threshold range. The Mapping Rate of the paracancerous tissue with sample number 199003855N is less than 85%, so looser alignment conditions are required to improve the alignment rate. Any sample that does not reach the thresholds cannot directly enter the next step of expression analysis; it must either be resubmitted or, provided the required data volume is maintained, have its quality control adjusted or be re-aligned with stricter parameters so that the data meet the standard.
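The alignment-level thresholds and the follow-up actions described for failing samples can be sketched as a lookup. This is an illustrative sketch: the metric names and the action strings are hypothetical labels, while the limits and the two specific remedies come from the description.

```python
import operator
from typing import Callable, Dict, Tuple

# (comparison, limit) per metric; key names are hypothetical labels.
ALIGN_QC: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "duplication_rate_pct": (operator.lt, 60),
    "mapping_rate_pct": (operator.gt, 85),
    "mapped_unique_rate_pct": (operator.gt, 50),
    "exon_rate_pct": (operator.gt, 50),
    "intron_rate_pct": (operator.lt, 40),
    "intergenic_rate_pct": (operator.lt, 10),
    "expression_efficiency_pct": (operator.gt, 45),
    "detected_transcripts": (operator.gt, 130000),
    "detected_genes": (operator.gt, 2000),
    "coverage_bias_pct": (operator.lt, 0.5),
}

# Remedies named in the description for two of the metrics.
FOLLOW_UP = {
    "duplication_rate_pct": "use stricter duplicate-removal parameters",
    "mapping_rate_pct": "use looser alignment conditions",
}
DEFAULT_ACTION = "resubmit the sample or adjust quality control / alignment parameters"


def review_alignment_qc(metrics: Dict[str, float]) -> Dict[str, str]:
    """Map each supplied metric that misses its threshold to a follow-up action."""
    actions = {}
    for name, value in metrics.items():
        if name not in ALIGN_QC:
            continue  # ignore metrics without a stated threshold
        passes, limit = ALIGN_QC[name]
        if not passes(value, limit):
            actions[name] = FOLLOW_UP.get(name, DEFAULT_ACTION)
    return actions


# Duplicate rate reported in the text for sample 199003859N.
print(review_alignment_qc({"duplication_rate_pct": 62.52}))
# → {'duplication_rate_pct': 'use stricter duplicate-removal parameters'}
```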
TABLE 2 Sample alignment and quality control results
Figure BDA0002229286830000081
Removing duplicate sequences: PCR duplicates are removed with Picard software (a toolkit for manipulating high-throughput sequencing data and formats), because PCR amplification generates duplicate sequences that interfere with the true enrichment signal. Picard removes PCR duplicates only when the parameter REMOVE_DUPLICATES=true is added; otherwise duplicate sequences are only marked, not removed.
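The difference between marking and removing duplicates can be sketched as a command builder. The description names only Picard and the REMOVE_DUPLICATES parameter; the use of the MarkDuplicates tool, the jar path and the file names below are assumptions for illustration.

```python
from typing import List


def picard_dedup_cmd(picard_jar: str, in_bam: str, out_bam: str,
                     metrics_txt: str, remove: bool = True) -> List[str]:
    """Build a Picard MarkDuplicates command.

    With remove=True duplicates are dropped from the output BAM;
    with remove=False they are only flagged.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={in_bam}",        # coordinate-sorted alignment file
        f"O={out_bam}",
        f"M={metrics_txt}",   # duplication metrics report
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
    ]


print(" ".join(picard_dedup_cmd("picard.jar", "s.sorted.bam",
                                "s.dedup.bam", "s.dup_metrics.txt")))
```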
Transcriptome assembly and transcript quantification: StringTie software (a transcriptome assembly and expression quantification tool) is used. The alignment result file must be sorted before it can serve as input; the output file with duplicate sequences removed is used as the input file, and a reference genome annotation file is also required. The parameter "-m 200" is used, where -m sets the minimum length allowed for predicted transcripts. When StringTie is run with the merge option, known transcripts and newly assembled transcripts are merged into a non-redundant set of transcripts. Alternatively, this embodiment may use one or more of RSEM (RNA-Seq by Expectation-Maximization, an RNA-Seq data quantification tool), eXpress (an RNA-Seq data quantification tool), HTseq (an RNA-Seq data analysis tool), Cufflinks (an RNA-Seq transcriptome assembly tool), Sailfish (a rapid RNA-Seq data quantification tool), Salmon (an RNA-Seq data quantification tool), quasi-mapping (an alignment-free RNA-Seq data quantification method), or Kallisto (a rapid RNA-Seq data quantification tool) for quantitative analysis of gene expression.
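The sorting, assembly and merge steps can be sketched as three command lines. This is a minimal sketch under stated assumptions: sorting with `samtools sort` is one possible way to satisfy the sorted-input requirement, and all file names are hypothetical.

```python
from typing import List


def sort_cmd(in_bam: str, sorted_bam: str) -> List[str]:
    """Coordinate-sort the alignment file (assumption: done with samtools)."""
    return ["samtools", "sort", "-o", sorted_bam, in_bam]


def stringtie_assemble_cmd(sorted_bam: str, annotation_gtf: str,
                           out_gtf: str, min_len: int = 200) -> List[str]:
    """Assemble transcripts for one sample, guided by the reference annotation."""
    return ["stringtie", sorted_bam,
            "-G", annotation_gtf,   # reference genome annotation file
            "-m", str(min_len),     # minimum length of predicted transcripts
            "-o", out_gtf]


def stringtie_merge_cmd(sample_gtfs: List[str], annotation_gtf: str,
                        merged_gtf: str) -> List[str]:
    """Merge per-sample assemblies and known transcripts into one non-redundant set."""
    return ["stringtie", "--merge", "-G", annotation_gtf,
            "-o", merged_gtf, *sample_gtfs]


print(" ".join(sort_cmd("s.dedup.bam", "s.sorted.bam")))
print(" ".join(stringtie_assemble_cmd("s.sorted.bam", "GRCh37.gtf", "s.gtf")))
print(" ".join(stringtie_merge_cmd(["s1.gtf", "s2.gtf"], "GRCh37.gtf", "merged.gtf")))
```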
Differential expression analysis: the transcript quantification results of the previous step are used as the input of this step, and differential expression analysis is performed with DESeq2 software (a read-count-based RNA-Seq differential expression analysis tool). In the input, the cancer tissue samples are separated by commas, the paracancerous tissue samples are also separated by commas, and the cancer group and the paracancerous group are separated by a space. Alternatively, this embodiment may use one or more of DESeq (a read-count-based RNA-Seq differential expression analysis tool), limma (a read-count-based RNA-Seq differential expression analysis tool), edgeR (a read-count-based RNA-Seq differential expression analysis tool), Cuffdiff (an assembly-based RNA-Seq differential expression analysis tool), Ballgown (an assembly-based RNA-Seq differential expression analysis tool), or sleuth (an alignment-free RNA-Seq differential expression analysis tool) for gene differential expression analysis.
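The sample grouping format described for this step (commas within a group, a space between the two groups) can be parsed as follows. This is only a sketch of the input handling for a hypothetical wrapper script; DESeq2 itself is an R package and is not invoked here.

```python
from typing import List, Tuple


def parse_groups(spec: str) -> Tuple[List[str], List[str]]:
    """Split 'c1,c2,c3 n1,n2,n3' into (cancer samples, paracancerous samples)."""
    groups = spec.split()
    if len(groups) != 2:
        raise ValueError("expected two comma-separated groups separated by a space")
    cancer, adjacent = (group.split(",") for group in groups)
    return cancer, adjacent


# Sample numbers taken from the sample preparation step of this example.
spec = "199003859T,199003855T,199003848T 199003859N,199003848N,199003855N"
cancer, adjacent = parse_groups(spec)
print(cancer)    # → ['199003859T', '199003855T', '199003848T']
print(adjacent)  # → ['199003859N', '199003848N', '199003855N']
```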
Fusion gene analysis: fusion genes are predicted with FusionCatcher software (a fusion gene analysis tool). The -d parameter specifies the directory containing the reference genome data of the species, the -i parameter specifies the directory containing the raw sequencing fastq files of the sample, and the -o parameter specifies the output directory for the results. For human, the official distribution provides a database built on Ensembl release 90. Alternatively, this embodiment may use one or more of JAFFA (a gene fusion analysis tool based on comparing the transcriptome to a reference transcriptome), STAR-Fusion (a tool that identifies fusion genes from STAR alignments), TopHat-Fusion (a tool that identifies fusion genes from RNA-Seq data), or SOAPfuse (an open tool for detecting fusion transcripts genome-wide in human RNA-Seq data) for fusion gene analysis.
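The three parameters described above can be assembled into a command line. This is an illustrative sketch: the executable name `fusioncatcher` and the directory names are assumptions (some installations expose the tool as `fusioncatcher.py`), and no options beyond -d, -i and -o are set.

```python
from typing import List


def fusioncatcher_cmd(data_dir: str, fastq_dir: str, out_dir: str,
                      executable: str = "fusioncatcher") -> List[str]:
    """Build a FusionCatcher command from the three described parameters."""
    return [
        executable,        # assumption: name of the installed executable
        "-d", data_dir,    # directory with the species' reference data
        "-i", fastq_dir,   # directory with the sample's raw fastq files
        "-o", out_dir,     # directory where results are written
    ]


print(" ".join(fusioncatcher_cmd("human_data", "199003859T_fastq", "199003859T_fusion")))
```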
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A paraffin section tissue RNA analysis method is characterized by comprising the following steps:
carrying out DNA degradation on the paraffin section tissue, and extracting sample RNA;
preparing a paraffin section sample nucleic acid library and sequencing the sample RNA based on the library;
performing quality control on the sample data obtained by sequencing to remove rRNA data;
aligning the quality-controlled sample data to a reference genome, and performing quality control on the alignment result;
carrying out transcriptome assembly and transcript quantification on the sample data that passed alignment quality control, and carrying out quantitative analysis of gene expression;
performing gene differential expression analysis based on the transcript quantification results.
2. The method for RNA analysis of paraffin section tissue according to claim 1, wherein the method further comprises performing fusion gene analysis;
the fusion gene analysis is performed using one or more software tools selected from JAFFA, STAR-Fusion, TopHat-Fusion, FusionCatcher, or SOAPfuse.
3. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the preparation of the paraffin section sample nucleic acid library comprises the steps of:
extracting nucleic acid in the paraffin section sample;
synthesizing a single-stranded cDNA based on the nucleic acid;
synthesizing a double-stranded cDNA based on the single-stranded cDNA;
repairing the double-stranded cDNA ends;
ligating adapters to the double-stranded cDNA, and performing PCR amplification on the adapter-ligated DNA to obtain a nucleic acid library of the paraffin section sample.
4. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quality control of the sample data obtained by sequencing comprises:
removing sequencing adapter sequences, low-quality sequences and sequences consisting of N bases;
screening on the number of bases in the filtered data after adapter removal, the percentage of bases with quality greater than 20, the percentage of bases with quality greater than 30, GC content, N content, average read length, rRNA alignment rate and the number of reads after filtering;
and selecting the data and samples meeting the set threshold value for subsequent analysis.
5. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein aligning the quality-controlled sample data to the reference genome comprises:
aligning the obtained sample sequences containing nucleic acid sequence information to a reference genome;
obtaining the sample sequences aligned to the reference genome.
6. The method for RNA analysis of paraffin section tissue according to claim 5, wherein the sample sequences containing nucleic acid sequence information are aligned to the reference genome using one or more of TopHat, STAR or HISAT2.
7. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quality control of the alignment result comprises:
carrying out quality evaluation on the alignment result file of the paraffin section tissue;
removing duplicate sequences.
8. The method for RNA analysis of paraffin section tissue according to claim 7, wherein the quality evaluation of the alignment result file of the paraffin section tissue comprises:
evaluating one or more of the duplicate sequence rate, alignment rate, unique alignment rate, exon alignment rate, intron alignment rate, intergenic region alignment rate, expression efficiency, detected transcripts, detected genes or sequence coverage uniformity.
9. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quantitative analysis of gene expression is performed using one or more software tools selected from RSEM, eXpress, HTseq, Cufflinks, StringTie, Sailfish, Salmon, quasi-mapping and Kallisto.
10. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the gene differential expression analysis is performed using one or more software tools selected from DESeq, limma, edgeR, Cuffdiff, Ballgown, DESeq2, or sleuth.
CN201910962113.2A 2019-10-11 2019-10-11 RNA analysis method for paraffin section tissue Pending CN110684830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910962113.2A CN110684830A (en) 2019-10-11 2019-10-11 RNA analysis method for paraffin section tissue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910962113.2A CN110684830A (en) 2019-10-11 2019-10-11 RNA analysis method for paraffin section tissue

Publications (1)

Publication Number Publication Date
CN110684830A true CN110684830A (en) 2020-01-14

Family

ID=69111995

Family Applications (1)

Application Number Priority Date Filing Date Title
CN201910962113.2A Pending CN110684830A (en) 2019-10-11 2019-10-11 RNA analysis method for paraffin section tissue

Country Status (1)

Country Link
CN (1) CN110684830A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696629A (en) * 2020-06-29 2020-09-22 电子科技大学 Method for calculating gene expression quantity of RNA sequencing data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070254305A1 (en) * 2006-04-28 2007-11-01 Nsabp Foundation, Inc. Methods of whole genome or microarray expression profiling using nucleic acids prepared from formalin fixed paraffin embedded tissue
CN102409099A (en) * 2011-11-29 2012-04-11 浙江大学 Method for analyzing difference of gene expression of porcine mammary gland tissue by sequencing technology
CN102485979A (en) * 2010-12-02 2012-06-06 深圳华大基因科技有限公司 Formalin-fixed paraffin-embedded (FFPE) sample nucleic acid library
CN104630206A (en) * 2015-02-05 2015-05-20 北京诺禾致源生物信息科技有限公司 Method for constructing transcriptome library
CN104657628A (en) * 2015-01-08 2015-05-27 深圳华大基因科技服务有限公司 Proton-based transcriptome sequencing data comparison and analysis method and system
CN106055925A (en) * 2016-05-24 2016-10-26 中国水产科学研究院 Method and apparatus for assembling genome sequence based on transcriptome paired-end sequencing data
CN107828857A (en) * 2017-11-23 2018-03-23 南宁科城汇信息科技有限公司 A kind of transcript profile sequencing and RNAseq data analysing methods
CN108823297A (en) * 2018-06-13 2018-11-16 领星生物科技(上海)有限公司 Transcript profile sequencing approach based on RT-WES
CN109182329A (en) * 2018-09-14 2019-01-11 求臻医学科技(北京)有限公司 A kind of application for the RNA extraction method of paraffin-embedded tissue sample and its in high-flux sequence

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANIRUDDHA CHATTERJEE ET AL.: "A Guide for Designing and Analyzing RNA-Seq Data", 《METHODS IN MOLECULAR BIOLOGY》 *
ISAAC D. RAPLEE ET AL.: "Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification Tools for Clinical Breast Cancer Research", 《J. PERS. MED.》 *
MILICA VUKMIROVIC ET AL.: "Identification and validation of differentially expressed transcripts by RNA-sequencing of formalin-fixed, paraffin-embedded (FFPE) lung tissue from patients with Idiopathic Pulmonary Fibrosis", 《BMC PULMONARY MEDICINE》 *
XIAN ADICONIS ET AL.: "Comprehensive comparative analysis of RNA sequencing methods for degraded or low input samples", 《NAT METHODS》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696629A (en) * 2020-06-29 2020-09-22 电子科技大学 Method for calculating gene expression quantity of RNA sequencing data
CN111696629B (en) * 2020-06-29 2023-04-18 电子科技大学 Method for calculating gene expression quantity of RNA sequencing data

Similar Documents

Publication Publication Date Title
Lowe et al. Transcriptomics technologies
TWI793586B (en) Single-molecule sequencing of plasma dna
CN110800063B (en) Detection of tumor-associated variants using cell-free DNA fragment size
JP2022088566A (en) Method and system for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths
JP2018524993A (en) Nucleic acids and methods for detecting chromosomal abnormalities
CN105506111B (en) Method for detecting CNV (CNV) marker of MAPK10 gene of Nanyang cattle and application of CNV marker
CN112289376B (en) Method and device for detecting somatic cell mutation
CA2906725C (en) Characterization of biological material using unassembled sequence information, probabilistic methods and trait-specific database catalogs
CN107506614B (en) Bacterial ncRNA prediction method
CN113470743A (en) Differential gene analysis method based on BD single cell transcriptome and proteome sequencing data
CN111192637B (en) Analytical method for lncRNA identification and expression quantification
CN112735517A (en) Method, device and storage medium for detecting joint deletion of chromosomes
CN110556163A (en) Analysis method of long-chain non-coding RNA translation small peptide based on translation group
CN111292806B (en) Transcriptome analysis method by using nanopore sequencing
CN110684830A (en) RNA analysis method for paraffin section tissue
CN112795654A (en) Method and kit for organism fusion gene detection and fusion abundance quantification
CN111192636B (en) mRNA second-generation sequencing result analysis method suitable for oligo dT enrichment
Forsberg et al. CLC Bio Integrated Platform for Handling and Analysis of Tag Sequencing Data
Ye et al. Discovery of alternative polyadenylation dynamics from single cell types
CN111370065A (en) Method and device for detecting cross-sample contamination rate of RNA
CA3119980A1 (en) Methods, compositions, and systems for improving recovery of nucleic acid molecules
WO2023184330A1 (en) Method and apparatus for processing genome methylation sequencing data, device, and medium
CN108410995A (en) The screening of the more unrestrained sheep physiological period ovary genes in Xinjiang and identification method
KR101977976B1 (en) Method for increasing read data analysis accuracy in amplicon based NGS by using primer remover
Freedman et al. Building better genome annotations across the tree of life

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200114