CN114171121A - Rapid detection method for mRNA 5'/3' terminal difference - Google Patents


Info

Publication number
CN114171121A
Authority
CN
China
Prior art keywords
experimental group
control group
fpkm
length
group
Prior art date
Legal status
Granted
Application number
CN202010943960.7A
Other languages
Chinese (zh)
Other versions
CN114171121B (en)
Inventor
孙海汐
高峰
顾颖
沈玥
Current Assignee
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date
2020-09-10
Filing date
2020-09-10
Publication date
2022-03-11
Application filed by BGI Shenzhen Co Ltd
Priority to CN202010943960.7A
Publication of CN114171121A
Application granted
Publication of CN114171121B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a rapid detection method for 5'/3'-terminal differences in mRNA. The method first uses HISAT2 to align the reads sequenced from a control group and an experimental group to the GRCh38 reference sequence and, using the sequence annotation, selects the fragments in intergenic regions for subsequent analysis. The reads are then assembled one by one through the overlapping sequences between them to form a 5' or 3' extension sequence of a gene, and isoform transcripts are formed from these mRNA 5' or 3' extension sequences. The experimental group is then compared with the control group: if the experimental group is longer than the control group by more than 200 nt and $\mathrm{FPKM}_{\text{experimental}} / \mathrm{FPKM}_{\text{control}} \geq 2$, the transcripts unique to the experimental group relative to the control group are extracted as data for downstream analysis. The invention addresses the misassembly of mRNA fragments by existing tools, improves the reliability of differential analysis, enables rapid identification of experimental-group-specific transcripts, and allows transcriptome data to reflect the research subject more faithfully.

Description

Rapid detection method for mRNA 5'/3' terminal difference
Technical Field
The invention belongs to the technical field of RNA sequencing and relates to a rapid detection method for 5'/3'-terminal differences in mRNA.
Background
RNA sequencing and transcriptome analysis are powerful tools for elucidating the molecular mechanisms of life processes; by systematically analyzing expression-regulation networks, they provide rich foundational data for research and development in downstream life-science industries such as medicine and agriculture. mRNA is a central object of transcriptome research: it directly reflects whether, and how strongly, the active regions of the genome are expressed. Combined with comparative analysis between samples, it can directly link gene expression to the phenotype of an organism, which helps in studying fundamental life processes at the level of nucleic acid molecules.
mRNA sequencing work focuses mainly on quantitative differential analysis. Before quantification, the counts of mRNA fragments (reads or fragments) can be obtained in two ways: the first aligns the reads against an existing genome annotation file and known sequences to obtain expression levels for sequences already in the database, which are then used for differential analysis; the second performs de novo assembly of the actual sequenced reads, based on known mRNA fragment sequences. The assembly result can be used not only for differential expression analysis but also to discover potential alternative transcripts (isoforms) in the corresponding experiment, and such specific transcripts are targets for studying gene function or life processes.
Cufflinks and StringTie are two commonly used tools for assembling mRNA fragments (reads); the latter is a successor to the former. While retaining the basic properties of Cufflinks, StringTie improves the sensitivity for detecting novel transcripts and runs faster. However, the higher sensitivity introduces many unreliable false-positive assemblies that contaminate the genuine positive results. Meanwhile, when read coverage is low, the software still treats the assembly as reliable and outputs it, so that when the experimental group and the control group are subsequently compared pairwise, genuinely positive assembled fragments are masked, producing false-negative results.
Disclosure of Invention
To address the errors introduced by existing software during de novo assembly of mRNA fragments (reads), the invention provides a simple, efficient and accurate rapid detection method for 5'/3'-terminal differences in mRNA.
The technical scheme of the invention is as follows:
A method for rapidly detecting 5'/3'-terminal differences in mRNA comprises the following steps:
step 1, using HISAT2 to align the mRNA fragments (reads) sequenced from the control group and the experimental group to the GRCh38 reference sequence, and selecting the fragments in intergenic regions for subsequent analysis based on the sequence annotation;
step 2, assembling the reads one by one using the overlapping sequences between them to form a 5' or 3' extension sequence of the gene;
step 3, forming isoform transcripts from the mRNA 5' or 3' extension sequences of the gene and comparing the experimental group with the control group. If the experimental group is longer than the control group by more than 200 nt and $\mathrm{FPKM}_{\text{experimental}} / \mathrm{FPKM}_{\text{control}} \geq 2$, the terminal difference between the experimental group and the control group is significant, and the transcripts specific to the experimental group relative to the control group are extracted as data for downstream analysis; if either condition (a length excess of more than 200 nt, or an FPKM ratio of at least 2) is not met, the terminal difference is not significant and no further analysis is performed.
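The decision rule in step 3 reduces to two thresholds that must both hold. The Python sketch below encodes it under stated assumptions: the record type `TerminalExtension` and the function names are hypothetical, the length compared is whatever length the pipeline reports for each group's assembled transcript, and the handling of a zero control FPKM is an added assumption that the text does not address. The length test uses the strict "more than 200 nt" wording; claim 1 reads "200 nt or more", so `>=` would follow the claim instead.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TerminalExtension:
    """One gene's assembled 5' or 3' extended transcript in one group (hypothetical type)."""
    gene_id: str
    length_nt: int  # length of the assembled transcript in this group
    fpkm: float     # expression of that transcript in this group


def is_significant_terminal_difference(
    experimental: TerminalExtension,
    control: TerminalExtension,
    min_length_excess_nt: int = 200,
    min_fpkm_ratio: float = 2.0,
) -> bool:
    """Both conditions must hold; if either fails, the difference is not significant."""
    # Strict ">" follows "more than 200 nt"; claim 1 ("200 nt or more") would use ">=".
    longer_enough = experimental.length_nt - control.length_nt > min_length_excess_nt
    if control.fpkm == 0:
        # Assumption: a transcript absent from the control passes the ratio test
        # whenever it is expressed at all in the experimental group.
        ratio_ok = experimental.fpkm > 0
    else:
        ratio_ok = experimental.fpkm / control.fpkm >= min_fpkm_ratio
    return longer_enough and ratio_ok


def experimental_specific(
    pairs: Iterable[Tuple[TerminalExtension, TerminalExtension]],
) -> List[str]:
    """Return the gene IDs whose experimental-group transcript passes both thresholds."""
    return [exp.gene_id for exp, ctrl in pairs
            if is_significant_terminal_difference(exp, ctrl)]
```

With the default thresholds, a gene whose experimental-group transcript is 250 nt longer and about 2.3-fold more highly expressed passes, while one that is only 150 nt longer does not, regardless of its expression ratio.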
In an embodiment of the present invention, two parameters, $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$, are set to improve the reliability of the assembled sequence. $\mathrm{FPKM}_{\text{bin}}$ is the read expression level within each bin (the selected sequence window), and $\mathrm{FPKM}_{\text{ext}}$ is the read expression level over the total extension length; the assembly is generally considered reliable when both are greater than 1. The window length parameter bin can be set manually according to practical needs: smaller values give more accurate assembly results but take longer to compute. In a specific embodiment of the present invention, bin is set to 10 nt.
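The reliability filter can be illustrated with the standard FPKM formula, $\mathrm{FPKM} = 10^9 \cdot N / (L \cdot M)$, where $N$ is the number of reads in a region, $L$ its length in nt and $M$ the total number of mapped reads. The Python sketch below is a hypothetical illustration, not the patented implementation: the function names are invented, it assumes each read is counted once in the bin containing its start coordinate (the text does not say how reads spanning bin boundaries are assigned), and it assumes that every bin, not just the average, must exceed the threshold.

```python
from typing import List, Sequence


def fpkm(read_count: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def bin_fpkms(read_starts: Sequence[int], ext_start: int, ext_end: int,
              total_mapped_reads: int, bin_nt: int = 10) -> List[float]:
    """FPKM_bin for each bin of the half-open extension [ext_start, ext_end).

    Assumption: a read is counted once, in the bin holding its start coordinate.
    """
    span = ext_end - ext_start
    if span <= 0:
        raise ValueError("extension must have positive length")
    n_bins = -(-span // bin_nt)  # ceiling division
    counts = [0] * n_bins
    for s in read_starts:
        if ext_start <= s < ext_end:
            counts[(s - ext_start) // bin_nt] += 1
    # The last bin may be shorter than bin_nt, so use its true length.
    return [fpkm(c, min(bin_nt, span - i * bin_nt), total_mapped_reads)
            for i, c in enumerate(counts)]


def extension_is_reliable(read_starts: Sequence[int], ext_start: int, ext_end: int,
                          total_mapped_reads: int, bin_nt: int = 10,
                          min_fpkm: float = 1.0) -> bool:
    """True when every FPKM_bin and the whole-extension FPKM_ext exceed min_fpkm."""
    per_bin = bin_fpkms(read_starts, ext_start, ext_end, total_mapped_reads, bin_nt)
    n_ext = sum(1 for s in read_starts if ext_start <= s < ext_end)
    fpkm_ext = fpkm(n_ext, ext_end - ext_start, total_mapped_reads)
    return all(v > min_fpkm for v in per_bin) and fpkm_ext > min_fpkm
```

For scale: one read in a 10 nt bin gives $\mathrm{FPKM}_{\text{bin}} = 10^8 / M$, so with 50 million mapped reads a single read per bin yields 2 and passes, while an empty bin fails.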
Compared with the prior art, the invention has the following advantages:
The invention addresses the misassembly of mRNA fragments by existing software, improves the reliability of differential analysis, enables rapid identification of experimental-group-specific transcripts, and allows transcriptome data to reflect the research subject more faithfully.
Drawings
FIG. 1 is a schematic flow chart of the rapid detection method for 5'/3'-terminal differences in mRNA according to the present invention.
FIG. 2 is a graph showing the results of the comparison between the experimental group and the control group generated in the Example.
Detailed Description
The present invention will be described in further detail with reference to the following examples and the accompanying drawings. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, those skilled in the art will recognize that, in different instances, some of these features may be omitted or replaced with other elements, materials, or methods.
Example 1
The flow of the rapid detection method for 5'/3'-terminal differences in mRNA according to the invention is shown in FIG. 1 and comprises the following steps:
(1) Collecting single-stranded RNA fragments (reads) aligned to intergenic regions
The FASTA file of the GRCh38 genome sequence and the corresponding GTF annotation file were downloaded. Exon and splice-site files were generated from the annotation with the HISAT2 software, an index was built from the genome and these two files, and the reads were then aligned with HISAT2 to produce a SAM file. The -k parameter was set to 1 during alignment, i.e., only a single alignment position was retained for each RNA fragment.
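The alignment step can be scripted around the HISAT2 tools it names. The sketch below only assembles the argument lists, so it can be inspected without HISAT2 installed; `run_steps` then executes them. Assumptions: the helper names, output file names and thread count are hypothetical, reads may be single- or paired-end (the text does not say which), and the annotation files come from `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py`, the extraction scripts distributed with HISAT2, which write to standard output.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

# (argument list, file that captures stdout or None)
Step = Tuple[List[str], Optional[str]]


def hisat2_steps(genome_fa: str, gtf: str, index_prefix: str,
                 reads: Sequence[str], out_sam: str, threads: int = 4,
                 ss_file: str = "splicesites.txt",
                 exon_file: str = "exons.txt") -> List[Step]:
    """Build (but do not run) the index-and-align commands described in step (1)."""
    if len(reads) == 2:
        read_args = ["-1", reads[0], "-2", reads[1]]  # paired-end
    elif len(reads) == 1:
        read_args = ["-U", reads[0]]                  # single-end
    else:
        raise ValueError("expected one or two FASTQ files")
    return [
        # Splice-site and exon files derived from the GTF annotation
        (["hisat2_extract_splice_sites.py", gtf], ss_file),
        (["hisat2_extract_exons.py", gtf], exon_file),
        # Index built from the genome plus the two annotation-derived files
        (["hisat2-build", "--ss", ss_file, "--exon", exon_file,
          genome_fa, index_prefix], None),
        # -k 1: report at most one alignment per read
        (["hisat2", "-p", str(threads), "-k", "1", "-x", index_prefix,
          *read_args, "-S", out_sam], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Execute each step in order, stopping on the first non-zero exit status."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)
```

Keeping command construction separate from execution makes it easy to log or review the exact commands for the control and experimental groups before committing to a long index build.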
(2) Extension assembly of fragments (reads) and determination of truly unique transcripts
Fragments (reads) in intergenic regions were extracted from the alignment. The bin window length was set to 10 nt, with both $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$ required to be greater than 1, and the reads were then assembled stepwise using the overlapping sequences between fragments, yielding 5' or 3' extension sequences. In comparing the experimental group with the control group, the experimental group was required to be longer than the control group by more than 200 nt, with $\mathrm{FPKM}_{\text{experimental}} / \mathrm{FPKM}_{\text{control}} \geq 2$. Finally, the transcripts unique to the experimental group relative to the control group were extracted, and the resulting file is shown below and in FIG. 2.
[Image: result file listing the experimental-group-specific transcripts (not reproduced)]
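Because the reads are already aligned to GRCh38, one way to read "assembled stepwise using the overlapping sequences between fragments" is as a greedy walk over aligned coordinates: starting at the annotated gene end, each intergenic read that overlaps or abuts the current end pushes it further out, and the walk stops at the first coverage gap. The Python sketch below implements that interpretation only; it is an assumption rather than the patent's algorithm, the function names are hypothetical, and it omits the per-bin FPKM filter. For a minus-strand gene the 3' end lies at the lower coordinate, so `extend_upstream` would be applied there instead.

```python
from typing import Iterable, Tuple


def extend_downstream(gene_end: int, reads: Iterable[Tuple[int, int]],
                      max_gap_nt: int = 0) -> int:
    """Greedy extension past an annotated gene end, in genomic coordinates.

    reads: half-open (start, end) alignment intervals of intergenic reads.
    A read joins the extension if it starts no later than the current end
    plus max_gap_nt; the first read starting beyond that ends the walk.
    Returns the extension length in nt.
    """
    current_end = gene_end
    for start, end in sorted(reads):
        if start > current_end + max_gap_nt:
            break  # coverage gap: the contiguous extension ends here
        current_end = max(current_end, end)
    return current_end - gene_end


def extend_upstream(gene_start: int, reads: Iterable[Tuple[int, int]],
                    max_gap_nt: int = 0) -> int:
    """Mirror the coordinates so the upstream walk becomes a downstream walk."""
    mirrored = [(-end, -start) for start, end in reads]
    return extend_downstream(-gene_start, mirrored, max_gap_nt)
```

For example, reads at 1000-1100, 1050-1150 and 1140-1240 extend a gene ending at 1000 by 240 nt; a further read starting at 1300 is not joined because of the gap after 1240.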

Claims (3)

1. A method for rapidly detecting 5'/3'-terminal differences in mRNA, characterized by comprising the following steps:
step 1, aligning the reads sequenced from a control group and an experimental group to the GRCh38 reference sequence using HISAT2, and selecting the fragments in intergenic regions for subsequent analysis based on the sequence annotation;
step 2, assembling the reads one by one using the overlapping sequences between them to form a 5' or 3' extension sequence of the gene;
step 3, forming isoform transcripts from the mRNA 5' or 3' extension sequences of the gene and comparing the experimental group with the control group: if the experimental group is longer than the control group by 200 nt or more and $\mathrm{FPKM}_{\text{experimental}} / \mathrm{FPKM}_{\text{control}} \geq 2$, the terminal difference between the experimental group and the control group is significant, and the transcripts specific to the experimental group relative to the control group are extracted as data for downstream analysis; if either condition is not met, the terminal difference between the experimental group and the control group is not significant and no further analysis is performed.
2. The rapid detection method according to claim 1, wherein in step 1, two parameters, $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$, are set, $\mathrm{FPKM}_{\text{bin}}$ representing the read expression level per bin and $\mathrm{FPKM}_{\text{ext}}$ representing the read expression level over the total extension length, and both $\mathrm{FPKM}_{\text{bin}}$ and $\mathrm{FPKM}_{\text{ext}}$ are greater than 1.
3. The rapid detection method according to claim 1, wherein in step 1, the window sequence length parameter bin is set to 10 nt.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010943960.7A CN114171121B (en) 2020-09-10 2020-09-10 Quick detection method for mRNA 5'3' terminal difference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010943960.7A CN114171121B (en) 2020-09-10 2020-09-10 Quick detection method for mRNA 5'3' terminal difference

Publications (2)

Publication Number Publication Date
CN114171121A true CN114171121A (en) 2022-03-11
CN114171121B CN114171121B (en) 2024-05-17

Family

ID=80475666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010943960.7A Active CN114171121B (en) 2020-09-10 2020-09-10 Quick detection method for mRNA 5'3' terminal difference

Country Status (1)

Country Link
CN (1) CN114171121B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002079502A1 (en) * 2001-03-28 2002-10-10 The University Of Queensland A method for nucleic acid sequence analysis
CN103146686A (en) * 2013-03-07 2013-06-12 新疆农垦科学院 Mass acquisition method of different loci and flanking sequence thereof in genome DNAs (deoxyribonucleic acids)
CN103343392A (en) * 2013-07-03 2013-10-09 中山大学 MRNA (Messenger Ribonucleic Acid) 3' terminal library as well as construction and sequence measuring methods and application thereof
CN104965999A (en) * 2015-06-05 2015-10-07 西安交通大学 Analysis and integration method and device for sequencing of medium-short gene segment
US20160244827A1 (en) * 2013-10-31 2016-08-25 Lexogen Gmbh Nucleic acid copy number determination based on fragment estimates
CN105986007A (en) * 2015-02-11 2016-10-05 深圳华大基因股份有限公司 Detection method of cancer tumor suppressor gene cluster (TSG)
CN107203703A (en) * 2017-05-22 2017-09-26 人和未来生物科技(长沙)有限公司 A kind of transcript profile sequencing data calculates deciphering method
CN107766696A (en) * 2016-08-23 2018-03-06 武汉生命之美科技有限公司 Eucaryote alternative splicing analysis method and system based on RNA seq data
US20180157787A1 (en) * 2016-10-19 2018-06-07 Pacific Biosciences Of California, Inc. Coding genome reconstruction from transcript sequences
CN109234267A (en) * 2018-09-12 2019-01-18 中国科学院遗传与发育生物学研究所 A kind of genome assemble method
CN110029185A (en) * 2019-04-11 2019-07-19 广西壮族自治区亚热带作物研究所(广西亚热带农产品加工研究所) A kind of research method carrying out sisal leaves Fibre Development in transcript profile sequencing level
US20200048709A1 (en) * 2018-08-10 2020-02-13 Exxonmobil Research And Engineering Company Automated differential expression analysis of rna sequencing data
CN111575272A (en) * 2019-12-11 2020-08-25 清华大学 High-copy DNA repetitive sequence in vitro rapid synthesis based on blocking type chain polymerization amplification reaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOHONG LI: "A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data", COMPARATIVE STUDY, vol. 12, no. 5, 31 December 2017 (2017-12-31) *
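洪奇阳; 毕行建; 王大宁; 李子真; 俞海; 夏宁邵; 李少伟: "转录组测序技术研究进展" (Research progress in transcriptome sequencing technology), 中国生化药物杂志 (Chinese Journal of Biochemical Pharmaceutics), no. 06, 31 December 2017 (2017-12-31) *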

Also Published As

Publication number Publication date
CN114171121B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
Bentolila et al. Comprehensive high-resolution analysis of the role of an Arabidopsis gene family in RNA editing
Wang et al. Computational resources for ribosome profiling: from database to Web server and software
CN111354418B (en) High-throughput sequencing technology animal tRFs data analysis method based on reference genome annotation file
CN115101128B (en) Method for evaluating off-target risk of hybridization capture probe
CN107506614B (en) Bacterial ncRNA prediction method
CN112309503A (en) Base interpretation method, interpretation equipment and storage medium based on nanopore electric signal
CN110556162A (en) Detection and analysis method of cyclic RNA translation polypeptide based on translation group
CN101914619A (en) RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression
CN110993023A (en) Detection method and detection device for complex mutation
CN112795654A (en) Method and kit for organism fusion gene detection and fusion abundance quantification
CN111192637A (en) Analytical method for lncRNA identification and expression quantification
CN110556163A (en) Analysis method of long-chain non-coding RNA translation small peptide based on translation group
CN111292806B (en) Transcriptome analysis method by using nanopore sequencing
KR20080111537A (en) Individual discrimination method and apparatus
CN111710362B (en) Design method and application of capture probe based on next generation sequencing
CN112750501B (en) Optimized analysis method for macro virus group flow
CN114171121B (en) Quick detection method for mRNA 5'3' terminal difference
CN117133354B (en) Method for efficiently identifying key breeding gene modules of forest tree
CN113793644A (en) Quality evaluation method of DNA detection data
KR20070086080A (en) Method, program and system for the standardization of gene expression amount
CN115394356A (en) Method and device for filtering rRNA sequences in transcriptome sequencing data
CN112489724A (en) Transcriptome data automatic analysis method based on next generation sequencing
CN110684830A (en) RNA analysis method for paraffin section tissue
CN110592093B (en) Aptamer capable of recognizing EpCAM protein, and preparation method and application thereof
CN117095748B (en) Method for constructing plant miRNA genetic regulation pathway

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant