CN114171121A - Rapid detection method for mRNA 5'/3' terminal differences - Google Patents
- Publication number
- CN114171121A (application number CN202010943960.7A)
- Authority
- CN
- China
- Prior art keywords
- experimental group
- control group
- fpkm
- length
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Abstract
The invention discloses a rapid detection method for mRNA 5'/3' terminal differences. HISAT2 is first used to align the reads sequenced from a control group and an experimental group to the GRCh38 reference sequence, and fragments in intergenic regions are selected for subsequent analysis using the sequence annotation information. The reads are then assembled one by one through their overlapping sequences to form a 5' or 3' extension sequence of a gene, and an isoform transcript is formed from that mRNA 5' or 3' extension sequence. The experimental group is then compared with the control group: if the extension in the experimental group is more than 200 nt longer than that in the control group and the ratio $\mathrm{FPKM}_{\text{experimental group}} / \mathrm{FPKM}_{\text{control group}}$ is 2 or more, the transcript unique to the experimental group relative to the control group is extracted as downstream analysis data. The invention effectively addresses the misassembly of mRNA fragments by existing tools, improves the reliability of differential analysis, enables rapid identification of transcripts specific to the experimental group, and makes the transcriptome data reflect the research object more faithfully.
Description
Technical Field
The invention belongs to the technical field of RNA sequencing and relates to a rapid detection method for mRNA 5'/3' terminal differences.
Background
RNA sequencing and transcriptome analysis are powerful tools for explaining the molecular mechanisms of life processes. By systematically analyzing expression regulation networks, they supply basic and rich research and development data to downstream life science industries such as medicine and agriculture. mRNA is a central object of transcriptome research: it directly reflects whether, and how strongly, active regions of the genome are expressed. Combined with comparative analysis among samples, it allows the relationship between gene expression and the phenotype of an organism to be established directly, which helps in studying the basic processes of life at the level of nucleic acid molecules.
mRNA sequencing work focuses mainly on quantitative differential analysis. Before quantification, there are two routes to obtaining the number of mRNA fragments (reads or fragments) to be quantified. The first aligns the reads against known sequences using an existing genome annotation file, which yields expression levels for sequences already in the database, and performs differential analysis on those. The second starts from known mRNA fragment sequences and combines them with the actual sequencing reads for de novo assembly. The assembly result can be used not only for differential expression analysis but also for finding potentially different transcripts (isoforms) in the corresponding experiment; these specific transcripts are research targets for gene function or life processes.
For assembling mRNA fragments (reads), Cufflinks and StringTie are two commonly used programs. StringTie is the successor to Cufflinks: it retains the basic features of Cufflinks while improving the sensitivity for detecting new transcripts and shortening run time. However, the higher sensitivity also produces many unreliable false-positive assemblies, which contaminate the genuine positive results. Moreover, when the coverage of mRNA fragments (reads) is low, the software still treats the assembly as reliable and outputs it, so that when the experimental and control groups are subsequently compared pairwise, genuinely positive assembled fragments are cancelled out, producing false negatives.
Disclosure of Invention
To address the errors introduced by existing software during de novo assembly of mRNA fragments (reads), the invention provides a simple, efficient, and high-precision rapid detection method for mRNA 5'/3' terminal differences.
The technical scheme of the invention is as follows:
A method for rapidly detecting mRNA 5'/3' terminal differences comprises the following steps:
Step 1, using HISAT2 to align the mRNA fragments (reads) sequenced from a control group and an experimental group to the GRCh38 reference sequence, and selecting fragments in intergenic regions for subsequent analysis using the sequence annotation information;
Step 2, assembling the reads one by one through the overlapping sequences among them to form a 5' or 3' extension sequence of the gene;
Step 3, forming an isoform transcript from the mRNA 5' or 3' extension sequence of the gene and comparing the experimental group with the control group. If the extension in the experimental group is more than 200 nt longer than that in the control group and $\mathrm{FPKM}_{\text{experimental group}} / \mathrm{FPKM}_{\text{control group}} \ge 2$, the terminal difference between the experimental group and the control group is significant, and the transcript unique to the experimental group relative to the control group is extracted as downstream analysis data. If these two conditions are not both satisfied, the terminal difference between the experimental group and the control group is not significant, and no subsequent analysis is performed.
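The decision rule in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Extension` container and the function name `is_significant` are hypothetical, the length comparison is taken as strictly "more than 200 nt" (claim 1 words it as "200 nt or more"), and the handling of a control FPKM of zero is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Extension:
    """Assembled 5' or 3' extension of one gene in one group (hypothetical container)."""
    length_nt: int  # length of the assembled extension in nucleotides
    fpkm: float     # expression level of the extension in FPKM


def is_significant(exp: Extension, ctrl: Extension,
                   min_extra_nt: int = 200, min_ratio: float = 2.0) -> bool:
    """Return True when both step-3 conditions hold for a gene end."""
    # Condition 1: experimental extension is more than min_extra_nt longer.
    longer = (exp.length_nt - ctrl.length_nt) > min_extra_nt
    # Condition 2: FPKM ratio experimental/control is at least min_ratio.
    if ctrl.fpkm == 0:
        # Assumption: with no control expression, any experimental
        # expression counts as satisfying the ratio condition.
        enriched = exp.fpkm > 0
    else:
        enriched = exp.fpkm / ctrl.fpkm >= min_ratio
    return longer and enriched


if __name__ == "__main__":
    print(is_significant(Extension(500, 4.0), Extension(200, 1.5)))
```

Only gene ends for which the function returns `True` would be carried forward as transcripts specific to the experimental group.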
In an embodiment of the invention, two parameters, $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$, are set to improve the reliability of the assembled sequence. $\mathrm{FPKM}_{\text{bin}}$ is the read expression level within each bin (the selected sequence window), and $\mathrm{FPKM}_{\text{ext}}$ is the read expression level over the total length of the extension; an assembly is generally considered reliable when both are greater than 1. The window length parameter bin can be set manually according to practical needs: the smaller the value, the more accurate the assembly result, but the longer the run time. In a specific embodiment of the invention, bin is set to 10 nt.
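To make the two thresholds concrete, the following Python sketch walks outward from a gene end one bin at a time. It assumes the standard FPKM definition (fragments per kilobase per million mapped fragments), which the text does not spell out, and it assumes that extension stops at the first bin whose FPKM is not greater than 1; the function names and the stopping rule are illustrative, not taken from the patent.

```python
from typing import Sequence, Tuple


def fpkm(fragments: int, length_nt: int, total_mapped: int) -> float:
    """Standard FPKM: fragments per kilobase of sequence per million mapped fragments."""
    return fragments * 1e9 / (length_nt * total_mapped)


def extend_by_bins(bin_counts: Sequence[int], total_mapped: int,
                   bin_nt: int = 10, min_fpkm: float = 1.0) -> Tuple[int, float]:
    """Extend bin by bin away from the gene end.

    bin_counts[i] is the number of fragments assigned to the i-th bin
    beyond the annotated gene end. Returns (extension length in nt,
    FPKM over the whole accepted extension).
    """
    accepted = []
    for count in bin_counts:
        # Assumed stopping rule: stop at the first bin with FPKM_bin <= threshold.
        if fpkm(count, bin_nt, total_mapped) <= min_fpkm:
            break
        accepted.append(count)
    length = len(accepted) * bin_nt
    # FPKM_ext over the accepted extension; 0.0 when nothing was accepted.
    ext = fpkm(sum(accepted), length, total_mapped) if length else 0.0
    return length, ext


if __name__ == "__main__":
    # 20 million mapped fragments, 10-nt bins; the fourth bin is empty.
    print(extend_by_bins([3, 2, 1, 0, 4], total_mapped=20_000_000))
```

With non-overlapping per-bin counts, the extension FPKM is the mean of the accepted per-bin values, so it exceeds the threshold whenever every accepted bin does. If a tool counts a fragment in every bin it spans, the two values can diverge, and checking both thresholds separately becomes meaningful.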
Compared with the prior art, the invention has the following advantages:
The invention effectively addresses the misassembly of mRNA fragments by existing tools, improves the reliability of differential analysis, enables rapid identification of transcripts specific to the experimental group, and makes the transcriptome data reflect the research object more faithfully.
Drawings
FIG. 1 is a schematic flow chart of the rapid detection method for mRNA 5'/3' terminal differences according to the invention.
Fig. 2 is a graph showing the results of comparison between the experimental group and the control group generated in the examples.
Detailed Description
The present invention will be described in further detail with reference to the following examples and the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, those skilled in the art will also recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances.
Example 1
The flow of the rapid detection method for mRNA 5'/3' terminal differences of the invention is shown in FIG. 1 and comprises the following steps:
(1) Collecting single-stranded RNA fragments (reads) aligned to intergenic regions
Download the FASTA file of the GRCh38 genome sequence and the corresponding GTF annotation file. Use the HISAT2 software to generate exon and splice-site files from the annotation, build an index with the two generated files, and then run the HISAT2 alignment to produce a SAM file. During alignment the -k parameter is set to 1, so that for each RNA fragment only the single best alignment position is retained.
(2) Extending and assembling fragments (reads) to determine the actual unique transcripts
Fragments (reads) in intergenic regions are extracted from the alignment. The bin window length is set to 10 nt, $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$ are both required to be greater than 1, and the fragments are then assembled step by step through their overlapping sequences. Assembly yields the 5' or 3' extension sequences. The experimental group is compared with the control group under the conditions that the extension in the experimental group is more than 200 nt longer than that in the control group and that $\mathrm{FPKM}_{\text{experimental group}} / \mathrm{FPKM}_{\text{control group}} \ge 2$. Finally, the transcripts unique to the experimental group relative to the control group are extracted; the resulting file is shown in FIG. 2.
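The alignment step above could be scripted roughly as follows. The Python helper only assembles the command lines and does not run them. The HISAT2 programs and options used (`hisat2_extract_splice_sites.py`, `hisat2_extract_exons.py`, `hisat2-build --ss/--exon`, `hisat2 -x/-U/-k/-S`) are part of the HISAT2 distribution, whereas the file names, the `index_prefix` and `sample` parameters, and the choice of single-end input (`-U`) are assumptions made for illustration.

```python
from typing import List


def hisat2_commands(genome_fa: str, annotation_gtf: str, reads_fq: str,
                    sample: str, index_prefix: str = "grch38") -> List[str]:
    """Build the shell commands for index construction and alignment."""
    ss_file = f"{index_prefix}.ss"      # splice sites extracted from the GTF
    exon_file = f"{index_prefix}.exon"  # exons extracted from the GTF
    return [
        f"hisat2_extract_splice_sites.py {annotation_gtf} > {ss_file}",
        f"hisat2_extract_exons.py {annotation_gtf} > {exon_file}",
        # Build an annotation-aware index from the two generated files.
        f"hisat2-build --ss {ss_file} --exon {exon_file} {genome_fa} {index_prefix}",
        # -k 1 reports at most one alignment per read; single-end reads assumed.
        f"hisat2 -x {index_prefix} -U {reads_fq} -k 1 -S {sample}.sam",
    ]


if __name__ == "__main__":
    for cmd in hisat2_commands("GRCh38.fa", "GRCh38.gtf", "control.fq", "control"):
        print(cmd)
```

The index is built once and reused; only the last command needs to be repeated for each control and experimental sample.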
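Selecting reads that fall in intergenic regions amounts to an interval test against the annotated genes. The Python sketch below shows one way to do this for a single chromosome, assuming half-open `[start, end)` coordinates and gene intervals taken from the GTF annotation; the function names and the requirement that a read overlap no gene at all are assumptions, since the text does not define the selection rule in detail.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching gene intervals into a sorted, disjoint list."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def select_intergenic(reads: Iterable[Interval],
                      genes: Iterable[Interval]) -> List[Interval]:
    """Keep the reads that overlap no annotated gene."""
    merged = merge_intervals(genes)
    starts = [s for s, _ in merged]
    kept = []
    for r_start, r_end in reads:
        i = bisect_right(starts, r_start) - 1
        # Gene that starts at or before the read start: overlap if it ends after it.
        if i >= 0 and merged[i][1] > r_start:
            continue
        # Next gene downstream: overlap if it starts before the read ends.
        if i + 1 < len(merged) and merged[i + 1][0] < r_end:
            continue
        kept.append((r_start, r_end))
    return kept


if __name__ == "__main__":
    genes = [(300, 400), (100, 150), (140, 200)]
    reads = [(50, 90), (190, 210), (200, 250), (280, 310)]
    print(select_intergenic(reads, genes))
```

Because the gene intervals are merged and sorted once, each read is tested with a single binary search, which keeps the selection fast for large SAM files.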
Claims (3)
1. A rapid detection method for mRNA 5'/3' terminal differences, characterized by comprising the following steps:
step 1, aligning the reads sequenced from a control group and an experimental group to the GRCh38 reference sequence using HISAT2, and selecting fragments in intergenic regions for subsequent analysis using the sequence annotation information;
step 2, assembling the reads one by one through the overlapping sequences among them to form a 5' or 3' extension sequence of the gene;
step 3, forming an isoform transcript from the mRNA 5' or 3' extension sequence of the gene and comparing the experimental group with the control group; if the extension in the experimental group is longer than that in the control group by 200 nt or more and $\mathrm{FPKM}_{\text{experimental group}} / \mathrm{FPKM}_{\text{control group}} \ge 2$, the terminal difference between the experimental group and the control group is significant, and the transcript unique to the experimental group relative to the control group is extracted as downstream analysis data; if these two conditions are not both satisfied, the terminal difference between the experimental group and the control group is not significant, and no subsequent analysis is performed.
2. The rapid detection method according to claim 1, wherein in step 1 two parameters, $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$, are set; $\mathrm{FPKM}_{\text{bin}}$ represents the read expression level per bin unit, $\mathrm{FPKM}_{\text{ext}}$ represents the read expression level over the total extended length, and $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$ are both greater than 1.
3. The rapid detection method according to claim 1, wherein in step 1, the window sequence length parameter bin is set to 10 nt.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010943960.7A CN114171121B (en) | 2020-09-10 | 2020-09-10 | Quick detection method for mRNA 5'3' terminal difference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010943960.7A CN114171121B (en) | 2020-09-10 | 2020-09-10 | Quick detection method for mRNA 5'3' terminal difference |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114171121A (en) | 2022-03-11 |
CN114171121B (en) | 2024-05-17 |
Family
ID=80475666
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010943960.7A Active CN114171121B (en) | 2020-09-10 | 2020-09-10 | Quick detection method for mRNA 5'3' terminal difference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114171121B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002079502A1 (en) * | 2001-03-28 | 2002-10-10 | The University Of Queensland | A method for nucleic acid sequence analysis |
CN103146686A (en) * | 2013-03-07 | 2013-06-12 | 新疆农垦科学院 | Mass acquisition method of different loci and flanking sequence thereof in genome DNAs (deoxyribonucleic acids) |
CN103343392A (en) * | 2013-07-03 | 2013-10-09 | 中山大学 | MRNA (Messenger Ribonucleic Acid) 3' terminal library as well as construction and sequence measuring methods and application thereof |
CN104965999A (en) * | 2015-06-05 | 2015-10-07 | 西安交通大学 | Analysis and integration method and device for sequencing of medium-short gene segment |
US20160244827A1 (en) * | 2013-10-31 | 2016-08-25 | Lexogen Gmbh | Nucleic acid copy number determination based on fragment estimates |
CN105986007A (en) * | 2015-02-11 | 2016-10-05 | 深圳华大基因股份有限公司 | Detection method of cancer tumor suppressor gene cluster (TSG) |
CN107203703A (en) * | 2017-05-22 | 2017-09-26 | 人和未来生物科技(长沙)有限公司 | A kind of transcript profile sequencing data calculates deciphering method |
CN107766696A (en) * | 2016-08-23 | 2018-03-06 | 武汉生命之美科技有限公司 | Eucaryote alternative splicing analysis method and system based on RNA seq data |
US20180157787A1 (en) * | 2016-10-19 | 2018-06-07 | Pacific Biosciences Of California, Inc. | Coding genome reconstruction from transcript sequences |
CN109234267A (en) * | 2018-09-12 | 2019-01-18 | 中国科学院遗传与发育生物学研究所 | A kind of genome assemble method |
CN110029185A (en) * | 2019-04-11 | 2019-07-19 | 广西壮族自治区亚热带作物研究所(广西亚热带农产品加工研究所) | A kind of research method carrying out sisal leaves Fibre Development in transcript profile sequencing level |
US20200048709A1 (en) * | 2018-08-10 | 2020-02-13 | Exxonmobil Research And Engineering Company | Automated differential expression analysis of rna sequencing data |
CN111575272A (en) * | 2019-12-11 | 2020-08-25 | 清华大学 | High-copy DNA repetitive sequence in vitro rapid synthesis based on blocking type chain polymerization amplification reaction |
Non-Patent Citations (2)
Title |
---|
XIAOHONG LI: "A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data", COMPARATIVE STUDY, vol. 12, no. 5, 31 December 2017 (2017-12-31) * |
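洪奇阳; 毕行建; 王大宁; 李子真; 俞海; 夏宁邵; 李少伟: "Research progress in transcriptome sequencing technology" (转录组测序技术研究进展), Chinese Journal of Biochemical Pharmaceutics (中国生化药物杂志), no. 06, 31 December 2017 (2017-12-31) * |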
Also Published As
Publication number | Publication date |
---|---|
CN114171121B (en) | 2024-05-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||