CN113774121A - Low-sample-size m6A high-throughput sequencing method based on RNA ligation tags - Google Patents
- Publication number: CN113774121A
- Application number: CN202111066944.5A
- Authority: CN (China)
- Prior art keywords: RNA, sample, samples, linker, barcode
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Abstract
The invention discloses a low-sample-size m6A high-throughput sequencing method based on RNA ligation tags, belonging to the field of high-throughput RNA sequencing. The method comprises the following steps: phosphorylating and adenylating 3' linkers containing different barcode tag sequences; ligating the adenylated 3' linkers to fragmented RNA samples from different sources, pooling the samples, reserving an input control sample for RNA-seq, and performing m6A antibody immunoprecipitation on the remaining pooled sample to obtain an IP sample for m6A-seq; finally constructing second-generation sequencing libraries and sequencing them; and splitting and analyzing the sequencing data according to the barcode sequences to obtain the RNA-seq and m6A-seq information of each original sample. The invention enables m6A antibody immunoprecipitation, library construction and sequencing to be carried out simultaneously on multiple clinical low-sample-size samples.
Description
Technical Field
The invention relates to the field of high-throughput RNA sequencing, and in particular to a low-sample-size m6A high-throughput sequencing method based on RNA ligation tags.
Background
Post-synthetic modifications of biological macromolecules play an important role in many life processes. Among them, N6-methyladenosine (m6A) is the most abundant post-transcriptional RNA modification in eukaryotic messenger ribonucleic acid (mRNA). In mammalian cells the m6A modification level is dynamically reversible and is regulated by a variety of m6A-related proteins. Previous studies have shown that the dynamic regulation of m6A is closely related to important physiological processes, and that dysregulation of m6A can lead to disease-related pathological changes. The overall m6A content can be measured by enzymatically digesting an RNA sample and analyzing it by LC-MS. However, because m6A plays an important role in almost all steps of mRNA metabolism (e.g., formation, processing, transport, translation and degradation), locating m6A modifications and studying changes in modification level at specific sites are of great significance. In the study of nucleic acid biomacromolecules, qualitative and quantitative analysis of gene sequences at specific sites is usually achieved by sequencing. Compared with first-generation sequencing technology, second-generation sequencing technology can rapidly sequence hundreds of thousands to millions of DNA molecules in a single run at low cost and with more than 99% accuracy, which reduces sequencing cost, increases sequencing throughput, and makes it better suited to analyzing many samples. At the same time, the discovery of antibodies that specifically recognize m6A has greatly advanced research on m6A modification sites at the transcriptome level.
MeRIP-seq (methylated RNA immunoprecipitation followed by sequencing), a sequencing method developed by combining co-immunoprecipitation with an m6A-specific antibody and high-throughput sequencing, locates m6A across the transcriptome and has advanced research on the molecular mechanisms and functions of m6A. However, high-throughput sequencing methods that rely on m6A-specific antibodies, such as MeRIP-seq, are limited by the efficiency of antibody immunoprecipitation and by the inherently low abundance of the m6A modification, so m6A-seq is difficult to achieve for a single clinical peripheral blood sample (2-4 mL of whole blood).
The development of single-cell sequencing has benefited greatly from barcode labeling technology, whose advantage is that it can label multiple samples. After the RNA or DNA of single cells is barcoded and pooled, transcriptome sequencing, genome sequencing, or sequencing of post-synthetic genome modifications can be carried out; during bioinformatic analysis, the sequencing data can be split by the different barcode tag sequences and traced back to each cell of the original sample.
Disclosure of Invention
The invention aims to provide an m6A high-throughput sequencing method for multiple low-sample-size samples based on RNA-ligated barcode tags, so that m6A antibody immunoprecipitation, library construction and sequencing can be carried out simultaneously on a pool of multiple clinical low-sample-size samples.
The purpose of the invention is realized by the following technical scheme:
An m6A high-throughput sequencing method for multiple low-sample-size samples based on RNA-ligated barcode tags, comprising the following steps:
(1) Phosphorylating and adenylating library-construction 3' linker oligonucleotide chains (3' linkers) containing different barcode tag sequences. The composition of the 3' linker is 5'-barcode sequence-random sequence-PCR primer adaptor sequence-3'.
(2) Ligating the adenylated 3' linkers containing different barcode tag sequences from step (1) to fragmented RNA samples from different sources, pooling and purifying the samples, reserving an input control sample for RNA-seq, performing m6A antibody immunoprecipitation on the remaining pooled sample to obtain an IP sample for m6A-seq, and finally constructing and sequencing second-generation sequencing libraries of the input and IP samples.
(3) Splitting the sequencing data of the pooled sample according to the barcode sequences and analyzing the data by bioinformatic methods to obtain the RNA-seq and m6A-seq information of each original sample.
Preferably, in step (1), the 3' linker is purified after phosphorylation and after adenylation to reduce interference between the different reactions. The 3' linker is phosphorylated with T4 PNK enzyme in a reaction system containing ATP, the phosphorylated 3' linker is adenylated with a 5' adenylation reagent, and purification is carried out with an oligonucleotide purification and concentration kit.
Preferably, in step (1), the random sequence is a 6-base random sequence, which ensures that the barcode sequence is read accurately during sequencing. The barcode sequence preferably consists of 6 bases; its design avoids duplicating the Index sequences of the PCR primers used to construct commercial next-generation sequencing libraries, keeps the four bases A, T, G and C as evenly distributed as possible, and avoids runs of a single repeated base (such as GGGG).
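The barcode design rules above can be expressed as a simple check. The following is a minimal sketch, not the inventors' procedure: the function names, the example barcodes, the example index set, and the numeric thresholds (`max_run`, `max_base_count`) are all hypothetical, since the text only says that base usage should be as even as possible and that runs such as GGGG should be avoided.

```python
from itertools import groupby


def longest_run(seq):
    """Return the length of the longest run of one repeated base in seq."""
    return max(len(list(group)) for _, group in groupby(seq))


def check_barcode(barcode, index_seqs, length=6, max_run=3, max_base_count=2):
    """Return a list of design problems for a candidate barcode (empty if none).

    max_run and max_base_count are illustrative thresholds, not values
    taken from the patent.
    """
    if len(barcode) != length or set(barcode) - set("ACGT"):
        return ["must be %d bases of A/C/G/T" % length]
    problems = []
    if barcode in index_seqs:
        problems.append("duplicates a library index sequence")
    if longest_run(barcode) > max_run:
        problems.append("contains a long single-base run")
    if max(barcode.count(base) for base in "ACGT") > max_base_count:
        problems.append("base composition is uneven")
    return problems


# Hypothetical candidates and a hypothetical set of commercial index sequences.
commercial_indexes = {"ATCACG"}
print(check_barcode("ACGTCA", commercial_indexes))  # passes all checks
print(check_barcode("GGGGAT", commercial_indexes))  # long run and uneven bases
```

A stricter or looser threshold only changes which candidates are rejected; the structure of the check stays the same.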
Further, the step (2) comprises the following steps:
1) Extracting RNA from a sample (cells, blood sample, tissue sample, etc.).
2) Fragmenting the RNA sample into 200-400 bp fragments by a chemical ion fragmentation method, removing the fragmentation reagent by purification, enzymatically removing the phosphate group at the 3' end of the fragmented RNA, and adding a phosphate group to the 5' end.
3) Ligating the adenylated 3' linkers containing different barcode tags to the different fragmented RNA samples, and removing excess 3' linker after the reaction is complete.
4) Pooling and purifying the different samples, reserving an input sample, performing m6A antibody immunoprecipitation on the remaining sample to obtain an IP sample, reverse transcribing the input and IP samples, removing the reverse transcription primer and the template, and purifying to obtain cDNA.
5) Ligating a 5' adaptor to the cDNA, purifying the reaction system, and determining the number of cycles required for library-construction PCR by RT-qPCR.
6) Performing library-construction PCR with the 5' adaptor-ligated cDNA as substrate, purifying the product by gel extraction to obtain a second-generation sequencing library, and sending the purified library to a sequencing company for sequencing.
Preferably, in step 1), the RNA in the sample is extracted using Trizol reagent.
Preferably, in step 2), the RNA is fragmented with a magnesium ion chemical fragmentation reagent, the fragmented RNA is purified with an RNA purification and concentration kit, and the ends of the purified RNA fragments are repaired with T4 PNK enzyme.
Preferably, in step 3), the adenylated 3' linker is ligated to the fragmented RNA sample in an overnight reaction using T4 RNA ligase 2 (truncated KQ). After the reaction is complete, excess 3' linker is removed using 5' deadenylase and RecJf enzyme.
Preferably, in step 4), the reaction mixtures from step 3) are pooled directly, and the pooled reaction system is then purified using an RNA purification and concentration kit. 1/50 of the sample is reserved as the input control, and the remainder is immunoprecipitated according to the m6A antibody instructions to obtain the IP sample. The samples are reverse transcribed using Superscript III enzyme to obtain cDNA.
Preferably, in step 5), the 5' adaptor is ligated to the cDNA in an overnight reaction using T4 RNA ligase 1 (high concentration). The cycle number at which the fluorescence value reaches the plateau in RT-qPCR is selected as the number of cycles for library PCR amplification.
Further, the step (3) comprises the following steps:
1) Analyzing the alignment rate of the sequencing data to check data quality.
2) Splitting the data according to the barcode tag sequences and matching each subset to its original sample.
3) Analyzing the split data to obtain sequencing information.
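The splitting in step 2) can be sketched as follows. This is an illustrative outline only. It assumes that each read, as delivered by the sequencer, begins with the 6-base random sequence followed by the 6-base barcode and then the insert; the actual read orientation depends on the library layout and is not specified here. The function name, the barcodes and the sample names are hypothetical.

```python
from collections import defaultdict


def split_by_barcode(reads, barcode_to_sample, random_len=6, barcode_len=6):
    """Group reads by the barcode found right after the random prefix.

    Assumed read layout: [random prefix][barcode][insert].
    Reads whose barcode is not in barcode_to_sample go to "unassigned".
    The random prefix and barcode are trimmed from the returned inserts.
    """
    per_sample = defaultdict(list)
    for read in reads:
        tag = read[random_len:random_len + barcode_len]
        sample = barcode_to_sample.get(tag, "unassigned")
        per_sample[sample].append(read[random_len + barcode_len:])
    return dict(per_sample)


# Hypothetical barcodes, sample names and reads.
barcodes = {"ACGTCA": "sample_1", "TGCATG": "sample_2"}
reads = [
    "TTGACAACGTCAGGGTTT",  # random TTGACA, barcode ACGTCA, insert GGGTTT
    "CCATGGTGCATGAAACCC",  # random CCATGG, barcode TGCATG, insert AAACCC
    "AAAAAANNNNNNCCCC",    # barcode not recognised
]
print(split_by_barcode(reads, barcodes))
```

Exact matching is the simplest choice; tolerating one mismatch would recover more reads but requires barcodes that differ from each other at two or more positions.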
The strategy of the invention is shown in FIG. 1. Because existing methods have difficulty achieving m6A-seq of a single sample for clinical low-sample-size samples, the invention builds on the m6A-seq library construction method and combines it with barcode labeling technology: RNA samples from different sources are labeled and then pooled for m6A immunoprecipitation, and in the data analysis stage reads are traced back to the original sample according to the barcode sequence, thereby obtaining m6A-seq information for each single low-sample-size sample. Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) Different barcode tag sequences are designed at the 5' end of the library-construction 3' linker and ligated to fragmented RNA samples from different sources to distinguish them, after which the samples are pooled for m6A immunoprecipitation. This lowers the amount of RNA that a single sample needs for m6A-seq and enables m6A-seq of low-sample-size clinical samples. Less than 20 ng of fragmented RNA per sample is sufficient for successful library construction and m6A-seq.
(2) The library construction method is general: the number of barcode tags can be varied according to the number of samples, and for samples of smaller size, library construction and sequencing can be achieved by increasing the number of pooled samples.
(3) The library construction method simplifies the experimental procedure and reduces cost: only a single m6A-IP is needed for multiple samples, which reduces m6A antibody consumption and the corresponding experimental operations.
(4) The library construction method gives good sequencing results: when the amount of each sample before pooling is similar, the sequencing data obtained from the constructed library and split by barcode sequence are relatively evenly distributed, and the data can be analyzed with common methods.
Drawings
FIG. 1 is a schematic of the strategy of the present invention.
FIG. 2 is a polyacrylamide gel electrophoresis image of the 3' linker after adenylation treatment in the invention, in which the uppermost part is the adenylated 3' linker and the lowermost part is the non-adenylated 3' linker used as a control.
FIG. 3 shows the first-generation sequencing results of library sequences constructed according to the invention, obtained by TA cloning.
FIG. 4 shows how the sequencing data split before and after optimization of the 3' linker sequence design. The upper panel shows the split of sequencing data for the 12 3' linkers before optimization (without random sequence), and the lower panel shows the split for the 6 newly designed 3' linkers after optimization (with a 6-base random sequence).
FIG. 5 shows the design and results of the data independence experiment, in which specific oligonucleotide sequences were labeled with different barcode tags and pooled for library construction with HeLa cell mRNA as background. The upper panel is the experimental design and the lower panel shows the results.
FIG. 6 shows the analysis of m6A-seq data obtained by labeling 100 ng of HeLa cell mRNA with each of 6 barcode tags, followed by library construction and sequencing. The m6A-seq data were split according to the 6 barcode sequences, and the figure shows the percentage distribution of the resulting m6A peaks across 5 non-overlapping transcript segments.
FIG. 7 shows the results of splitting the sequencing data obtained by labeling 20 ng of HeLa cell mRNA with each of 16 barcode tags, followed by pooling and sequencing. A and B show the split data for the input and IP samples, and C shows the motif sequence at the m6A peaks obtained from analysis of the m6A-seq data.
Detailed Description
The invention will be further explained with reference to the following examples and the accompanying drawings for better understanding. The present invention is not limited to the following embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which are made without departing from the spirit and principle of the present invention are also intended to be equivalent substitutions within the scope of the present invention.
The sequences of the 3' linker referred to in the examples below are shown in tables 1 and 2 below.
TABLE 1 3' linker sequences (5'-3')
General sequence of the 3' linker before optimization | NNNNNNAGATCGGAAGAGCGTCGTG-SpC3 |
General sequence of the optimized 3' linker | NNNNNNNNNNNNAGATCGGAAGAGCGTCGTG-SpC3 |
Note: the bold N is the barcode sequence (specific sequences are listed in Table 2 below), and the italic N is a 6-base random sequence. The SpC3 modification at the 3' end is a spacer arm of 3 carbon atoms introduced at the 3' end to prevent the 3' end of the 3' linker from ligating to other nucleic acid chains.
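As an illustration of how the general sequences in Table 1 are composed, the sketch below assembles a 3' linker string from a barcode. It follows the composition stated in the disclosure (5'-barcode sequence-random sequence-PCR primer adaptor sequence-3'). The function name and the example barcode are hypothetical; only the adaptor sequence and the SpC3 label come from Table 1.

```python
ADAPTOR = "AGATCGGAAGAGCGTCGTG"  # PCR primer adaptor sequence from Table 1


def build_linker(barcode, random_len=6):
    """Return the 3' linker (5'-3') as barcode + random N bases + adaptor + SpC3.

    random_len=6 gives the optimized design; random_len=0 gives the
    pre-optimization design without a random sequence.
    """
    if len(barcode) != 6 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 6 bases of A/C/G/T")
    return barcode + "N" * random_len + ADAPTOR + "-SpC3"


# "ACGTCA" is a made-up barcode used only for illustration.
print(build_linker("ACGTCA"))                # optimized layout
print(build_linker("ACGTCA", random_len=0))  # pre-optimization layout
```

When ordering oligonucleotides, each N in the random segment would be specified as an equimolar mix of the four bases.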
TABLE 2 sequences of different 3' linkers
Note: italicized N is a random sequence of 6 bases.
Example 1
Adenylation treatment of 3' -linker:
(1) The 3' linker (sequences shown in Tables 1 and 2) was mixed with ATP, T4 PNK Buffer and T4 PNK enzyme and reacted. Specifically, the ordered 3' linker was dissolved in enzyme-free water to a final concentration of 100 μM, and 5 μL was added to the following reaction system (Table 3), reacted at 37 °C for 1 h, and then inactivated at 65 °C for 20 min.
TABLE 3 Phosphorylation system (50 μL)
3' linker (100 μM) | 5 μL |
ATP | 3 μL |
10× T4 PNK Buffer | 5 μL |
10 U/μL T4 PNK enzyme | 2 μL |
Enzyme-free water | To 50 μL |
The reaction system was purified using the Oligo Clean & Concentrator (OCC, Zymo) purification kit and eluted with 10 μL of enzyme-free water to give the phosphorylated 3' linker.
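The volumes in the phosphorylation system can be checked with simple arithmetic. The sketch below is only an illustration; the helper names are hypothetical, and the 3 μL component is labeled ATP because ATP is the remaining reagent named in the text of step (1).

```python
def water_to_volume(component_volumes_ul, total_ul):
    """Volume of water (uL) needed to bring the listed components to total_ul."""
    used = sum(component_volumes_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same unit as stock."""
    return stock * volume_ul / total_ul


# Volumes from the 50 uL phosphorylation system.
phosphorylation = {
    "3' linker (100 uM)": 5,
    "ATP": 3,  # ATP is named in the text of step (1)
    "10x T4 PNK Buffer": 5,
    "T4 PNK enzyme (10 U/uL)": 2,
}

print(water_to_volume(phosphorylation, 50))  # water volume in uL
print(final_concentration(100, 5, 50))       # linker, uM
print(final_concentration(10, 5, 50))        # buffer, x
print(final_concentration(10, 2, 50))        # PNK enzyme, U/uL
```

With these inputs, 35 μL of water completes the reaction, the linker is at 10 μM (500 pmol in total), the buffer is at 1×, and the enzyme is at 0.4 U/μL.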
(2) The phosphorylated 3' linker obtained in step (1) was reacted with Mth RNA Ligase, 5' DNA adenylation reaction buffer and ATP at 65 °C for 1 h in the following reaction system (Table 4), and then inactivated at 85 °C for 5 min.
TABLE 4 Adenylation system (50 μL)
Phosphorylated 3' linker | 3 μL |
ATP | 5 μL |
10× 5' DNA adenylation reaction buffer | 5 μL |
50 μM Mth RNA ligase | 3 μL |
Enzyme-free water | To 50 μL |
The reaction mixture was purified again using the OCC kit to obtain adenylated 3' linkers containing different barcode tag sequences.
Example 2
Verification of the adenylation reaction:
Equal amounts of the adenylated sample and of 3' linker that had not been adenylated were mixed with loading buffer and run on a 20% neutral polyacrylamide gel.
Analysis of experimental results:
FIG. 2 is the polyacrylamide gel electrophoresis image of the 3' linker after adenylation treatment. It shows that the 3' linker was successfully adenylated, although a small amount of substrate remained non-adenylated. Non-adenylated 3' linker has no ligation activity, and the 3' linker is used in large excess in the reaction, so subsequent experiments are not affected.
Example 3
1. Library construction:
RNA was extracted from samples (cells, tissues, blood samples, etc.) using Trizol reagent, and the different RNA samples were then fragmented according to the instructions of the magnesium ion chemical fragmentation reagent (NEB, E6150S). The fragmented RNA was purified using the RNA Clean & Concentrator (RCC) purification kit (Zymo) and eluted with 7 μL of enzyme-free water. The eluted RNA was added to the following system (Table 5) and treated with PNK at 37 °C for 1 h so that it could be ligated to the adenylated 3' linker.
TABLE 5 PNK treatment system (10 μL)
Fragmented and purified RNA | 7 μL |
RiboLock RNase Inhibitor (40 U/μL) | 1 μL |
10× T4 PNK Buffer | 1 μL |
10 U/μL T4 PNK enzyme | 1 μL |
As shown in Table 6 below, the 3' linker, T4 RNA Ligase 2 (truncated KQ) and the other reagents required for ligation were added directly to the PNK reaction system. After mixing by pipetting up and down, the mixture was reacted at 25 °C for 2 h and then at 16 °C overnight (12 h).
TABLE 6 3' linker ligation reaction system (20 μL)
PNK-treated system | 10 μL |
3' linker after adenylation treatment and | 2 μL |
10×T4 | 1 μL |
50% PEG8000 | 6 μL |
0.1 M DTT | 1 μL |
T4 RNA Ligase 2 (truncated KQ) | 1 μL |
The next day, 1 μL of 5' deadenylase was added directly to the reaction and incubated at 30 °C for 1 h, followed by 1 μL of RecJf and incubation at 37 °C for 1 h. After the reaction was complete, the different samples were pooled, purified with the RCC purification kit, and eluted with 50.7 μL of enzyme-free water. To prevent RNA degradation, 1.3 μL of RiboLock RNase Inhibitor (40 U/μL) was added to the eluted pooled sample and mixed well; 1 μL of the sample was then removed and added to 9 μL of enzyme-free water as the Input control. The remaining sample was subjected to m6A antibody immunoprecipitation according to the instructions of the N6-Methyladenosine Enrichment Kit (NEB, E1610S) and finally eluted with 12 μL of enzyme-free water to obtain the IP sample. The Input and IP samples were added to the following system (Table 7), mixed by pipetting, and reverse transcribed at 25 °C for 3 min, 42 °C for 10 min and 52 °C for 40 min.
TABLE 7 Reverse transcription reaction system (20 μL)
After the reverse transcription reaction was complete, 1 μL of Exo I enzyme was added to the reaction system and incubated at 37 °C for 30 min to remove excess reverse transcription primer; 15 μL of 0.5 M EDTA (pH 8.0) and 15 μL of 1 M NaOH solution were then added and the mixture was treated at 65 °C for 15 min to remove the RNA template. The reaction was purified using the Oligo Clean & Concentrator (OCC) purification kit (Zymo) and eluted with 7 μL of enzyme-free water to obtain the cDNA sample. The cDNA sample was then ligated to the 5' adaptor (5'-Phos-NNNNNNNNNNAGATCGGAAGAGCACACGTCTG-SpC-3', where N stands for a random base) overnight (12 h) at 25 °C in the following reaction system (Table 8).
TABLE 8 5' adaptor ligation system (20 μL)
The reaction was purified with the OCC purification kit and eluted with 12 μL of enzyme-free water.
1 μL each of the IP and Input samples was used for RT-qPCR in a 20 μL system, with the following primer sequences:
RT-qPCR primer sequences (5'-3')
qPCR forward primer | TACCTTGGCACCCCAGAC |
qPCR reverse primer | TTCAGAGTTCTACAGTCCGA |
The fluorescence curve was observed, and the smallest cycle number at which the fluorescence value reached the plateau was selected as the number of cycles for library-construction PCR.
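The cycle selection described above is made by inspecting the fluorescence curve. As a rough illustration, it could be automated as below. The criterion used here (the first cycle reaching 95% of the maximum fluorescence) is an assumption introduced for this sketch, as are the function name and the example curve; the text does not define the plateau numerically.

```python
def plateau_cycle(fluorescence, fraction=0.95):
    """Return the first cycle (1-based) whose fluorescence reaches
    `fraction` of the maximum value in the curve.

    The 95% default is an illustrative stand-in for "reaches the plateau".
    """
    if not fluorescence:
        raise ValueError("fluorescence curve is empty")
    threshold = fraction * max(fluorescence)
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle


# Made-up fluorescence readings, one per qPCR cycle.
curve = [0.1, 0.1, 0.2, 0.5, 1.2, 2.6, 4.8, 7.5, 9.3, 9.9, 10.0, 10.0]
print(plateau_cycle(curve))
```

For this made-up curve the function returns cycle 10, the first reading at or above 9.5.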
Library-construction PCR was performed in the following reaction system (Table 9). The PCR primers were NEB second-generation sequencing primers, and the PCR program was set according to the Ultra™ II Master Mix instructions.
TABLE 9 Library-construction PCR reaction system (50 μL)
The PCR product was purified using a gel extraction kit (following the instructions of the kit used) to obtain a library sample that could be sent to a sequencing company for second-generation sequencing.
2. Library composition verification:
The library constructed by the method of the invention was inserted into a plasmid by TA cloning, and 5 single clones were picked for Sanger (first-generation) sequencing to verify that the composition of the constructed library was as expected.
The results are shown in FIG. 3. The Sanger sequencing results show that the gray-shaded parts at both ends of the DNA sequence correspond to the forward and reverse primers of the library construction PCR primer kit, respectively. The part marked with a wavy line is the random sequence of ten N on the 5' adaptor, and this part differs among the five clones. The part marked with a dotted underline corresponds to the barcode tag sequence (6 random bases); the six-base sequences obtained from the five clones differ from one another and all match the designed barcode sequences. The part marked with a solid line is the DNA sequence corresponding to the inserted RNA fragment, which differs between clones because cellular mRNA was used in the experiment.
Example 4
3'-linker sequence design optimization:
3'-linkers containing different barcode tags, before and after optimization (Tables 1-2, FIG. 3), were ligated to equal amounts of fragmented HeLa cell mRNA; the samples were mixed and m6A-seq was performed. The resulting sequencing data were split to check whether the data were evenly distributed, i.e., whether the barcode tag affects sequencing.
The results are shown in FIG. 4. Before the sequence design was optimized, the split sequencing data were clearly uneven; after optimization, the split data were more uniform. Taken together with the sequencing results, this is presumably because errors occur easily in the first few bases of a sequencing read, so that some reads cannot be assigned to a barcode and the data are wasted.
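One way to put a number on the uniformity of split data is the coefficient of variation of the per-barcode read counts. The sketch below assumes this metric and uses made-up read counts; neither the metric nor the counts come from the invention.

```python
from statistics import mean, pstdev
from typing import Dict


def read_fractions(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all assigned reads that falls to each barcode."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any barcode")
    return {barcode: n / total for barcode, n in read_counts.items()}


def coefficient_of_variation(read_counts: Dict[str, int]) -> float:
    """Population standard deviation divided by the mean read count.

    0 means perfectly even splitting; larger values mean less even data.
    """
    counts = list(read_counts.values())
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean read count is zero")
    return pstdev(counts) / mu


# Hypothetical read counts per barcode, for illustration only.
before_optimization = {"bc1": 100, "bc2": 300}
after_optimization = {"bc1": 200, "bc2": 200}

cv_before = coefficient_of_variation(before_optimization)
cv_after = coefficient_of_variation(after_optimization)
```

A lower value after optimization would correspond to the more uniform distribution described for FIG. 4.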
Example 5
Independence verification of the multiple data sets split from the same library by barcode:
(1) four oligo RNA strands with known sequences were designed (Table 10), and each was mixed with fragmented HeLa cell mRNA at a mass ratio of 1:100;
Table 10. Oligo RNA sequences (5'-3')
Oligo RNA1 | AUACUGCCACAUGCUGCACAGUGC |
Oligo RNA2 | GGACUGAGAACUGGACUGUCUGGGGUGCCAAGGUA |
Oligo RNA3 | GGACUGAACUGGACUGUCUGGGGUGCCAAGGUA |
Oligo RNA4 | GUACGUCAUCGAGAUCAGCUU |
(2) 100 ng of each mRNA sample mixed with oligo RNA was taken, 3'-linkers carrying different tags were ligated to the 3' ends, and the 4 samples were mixed into one for library construction, high-throughput sequencing, and data analysis;
(3) during data analysis, the numbers of reads of the four oligo RNAs in the data split by each barcode were counted to determine whether any data contamination was present.
The results are shown in FIG. 5, which lists the read numbers of the four oligo RNAs in the data split by the four barcodes. Each oligo RNA has a large number of reads in the data split by its corresponding barcode and essentially none in the data split by the other three barcodes. Based on this result, the experimental scheme of the invention, which uses barcode tags for m6A high-throughput sequencing of multiple low-sample-size samples, is feasible, and the split data sets are independent of each other without cross-contamination.
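The counting step of this verification can be sketched as follows. The oligo sequences are taken from Table 10; everything else is an assumption of the sketch, including the function names, the requirement that a read contain the full oligo sequence in sense orientation as DNA, and the toy reads.

```python
from typing import Dict, List

# Oligo RNA sequences from Table 10 (5'-3').
OLIGOS_RNA = {
    "Oligo RNA1": "AUACUGCCACAUGCUGCACAGUGC",
    "Oligo RNA2": "GGACUGAGAACUGGACUGUCUGGGGUGCCAAGGUA",
    "Oligo RNA3": "GGACUGAACUGGACUGUCUGGGGUGCCAAGGUA",
    "Oligo RNA4": "GUACGUCAUCGAGAUCAGCUU",
}


def rna_to_dna(seq: str) -> str:
    """Write an RNA sequence in the DNA alphabet used by sequencing reads."""
    return seq.upper().replace("U", "T")


def count_oligo_reads(reads_by_barcode: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Count, for each barcode, the reads that contain each full oligo sequence.

    Assumes reads were already split by barcode and are given in sense
    orientation; a real pipeline would also handle reverse complements.
    """
    targets = {name: rna_to_dna(seq) for name, seq in OLIGOS_RNA.items()}
    table: Dict[str, Dict[str, int]] = {}
    for barcode, reads in reads_by_barcode.items():
        counts = {name: 0 for name in targets}
        for read in reads:
            upper = read.upper()
            for name, target in targets.items():
                if target in upper:
                    counts[name] += 1
        table[barcode] = counts
    return table


# Toy reads, for illustration only.
demo_reads = {
    "bc1": ["NN" + rna_to_dna(OLIGOS_RNA["Oligo RNA1"]) + "AC"] * 3,
    "bc2": [rna_to_dna(OLIGOS_RNA["Oligo RNA2"])],
}
contamination_table = count_oligo_reads(demo_reads)
```

Off-diagonal entries near zero in such a table would correspond to the absence of cross-contamination reported for FIG. 5.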
Example 6
Using the library construction method of the invention, 6 portions of 100 ng fragmented HeLa cell mRNA were ligated to 6 different barcode tags (3' linkers 1-6 in Table 2), and m6A-seq library construction and sequencing were performed. The resulting sequencing data were analyzed to check whether the percentage distribution of m6A peaks in the m6A sequencing information obtained by the invention is consistent with the general distribution.
The results are shown in FIG. 6. After the m6A sequencing data were split according to barcode sequence and analyzed, the percentage distribution of m6A peaks across 5 non-overlapping transcript segments was found to agree with the general distribution of m6A in the transcriptome, i.e., enrichment in the coding region, the 3' UTR, and near the stop codon.
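The percentage distribution of peaks across transcript segments can be computed as in the sketch below. The source names only the coding region, the 3' UTR, and the stop codon; the full list of five segment labels used here, the function name, and the peak annotations are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed labels for the 5 non-overlapping transcript segments.
REGIONS = ("5'UTR", "start codon", "CDS", "stop codon", "3'UTR")


def region_percentages(peak_regions: Iterable[str]) -> Dict[str, float]:
    """Percentage of peaks falling in each transcript segment."""
    counts = Counter(peak_regions)
    unknown = set(counts) - set(REGIONS)
    if unknown:
        raise ValueError(f"unknown region labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no peaks supplied")
    return {region: 100.0 * counts[region] / total for region in REGIONS}


# Hypothetical annotation of 10 peaks from one barcode's data.
demo_peaks = ["CDS"] * 4 + ["3'UTR"] * 3 + ["stop codon"] * 2 + ["5'UTR"]
distribution = region_percentages(demo_peaks)
```

Running this once per barcode gives one distribution per sample, which can then be compared across the split data sets.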
Example 7
Experiments with a smaller sample input:
(1) 16 portions of 20 ng fragmented HeLa cell mRNA were ligated to 16 different barcode tags (3' linkers 1-16 in Table 2), and m6A-seq library construction and sequencing were performed;
(2) after the sequencing data were obtained, they were split according to barcode sequence, the split data were analyzed, and the data split by different barcodes were compared.
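Splitting by barcode can be sketched as a lookup on a fixed window of each read. The two barcodes below are taken from the sequence listing; the function name, the assumption that reads are oriented so the barcode appears as listed, the `barcode_offset` parameter, and the toy reads are hypothetical. Only exact matches are accepted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(
    reads: Iterable[str],
    barcodes: Dict[str, str],
    barcode_offset: int = 0,
) -> Dict[str, List[str]]:
    """Assign each read to a sample by exact match of its barcode window.

    `barcodes` maps barcode sequence to sample name. `barcode_offset` is
    the assumed 0-based position of the barcode in the read; where the
    barcode actually sits depends on the library layout and read direction.
    """
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    (bc_len,) = lengths
    lookup = {bc.upper(): sample for bc, sample in barcodes.items()}
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag = read[barcode_offset:barcode_offset + bc_len].upper()
        bins[lookup.get(tag, "unassigned")].append(read)
    return dict(bins)


# Two of the listed barcodes with hypothetical sample names and reads.
demo_barcodes = {"ACTCGA": "sample_1", "AGCTGA": "sample_2"}
demo_reads = ["ACTCGATTTTGG", "AGCTGAGGGGCC", "GGGGGGAAAATT"]
split_reads = demultiplex(demo_reads, demo_barcodes)
```

Reads that match no barcode are collected under `unassigned`, which is the wasted data referred to in Example 4.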
As a result, after 16 portions of 20 ng fragmented HeLa cell mRNA were labeled with barcodes and mixed for library construction, both RNA-seq and m6A-seq libraries were successfully constructed. Splitting and analyzing the library sequencing data showed no significant difference among samples in the number of reads obtained for each barcode or in the number of m6A peaks found (FIG. 7A, B); the distribution was fairly uniform. Analysis of the m6A-seq data showed that a library of samples enriched by the m6A antibody was successfully constructed and that the sequences at the m6A modification sites conform to the general m6A motif (FIG. 7C), i.e., RRACH (R = G or A; H = A, C or U).
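The RRACH motif definition translates directly into a regular expression. The sketch below is illustrative only: the function name is hypothetical, and it simply scans a sequence for motif occurrences rather than reproducing the motif analysis behind FIG. 7C.

```python
import re
from typing import List, Tuple

# RRACH: R = G or A; H = A, C or U. The lookahead allows overlapping hits.
RRACH = re.compile(r"(?=([GA][GA]AC[ACU]))")


def find_rrach(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for every RRACH occurrence.

    DNA input is accepted and converted to the RNA alphabet first.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in RRACH.finditer(rna)]


hits = find_rrach("UUGGACUAA")
```

The methylated adenosine is the central A of each reported 5-mer, i.e., at `start + 2`.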
Sequence listing
<110> Wuhan university
<120> Low sample size m6A high throughput sequencing method based on RNA-linked tags
<160> 16
<170> SIPOSequenceListing 1.0
<210> 1
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
actcgannnn nnagatcgga agagcgtcgt g 31
<210> 2
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
agctgannnn nnagatcgga agagcgtcgt g 31
<210> 3
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
agcagannnn nnagatcgga agagcgtcgt g 31
<210> 4
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
agctcgnnnn nnagatcgga agagcgtcgt g 31
<210> 5
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
atcgcannnn nnagatcgga agagcgtcgt g 31
<210> 6
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
agctcannnn nnagatcgga agagcgtcgt g 31
<210> 7
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
ttcggannnn nnagatcgga agagcgtcgt g 31
<210> 8
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
catcgannnn nnagatcgga agagcgtcgt g 31
<210> 9
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
ctagcannnn nnagatcgga agagcgtcgt g 31
<210> 10
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
acggtannnn nnagatcgga agagcgtcgt g 31
<210> 11
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
ccattgnnnn nnagatcgga agagcgtcgt g 31
<210> 12
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
gattcgnnnn nnagatcgga agagcgtcgt g 31
<210> 13
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
cgttagnnnn nnagatcgga agagcgtcgt g 31
<210> 14
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
gactgtnnnn nnagatcgga agagcgtcgt g 31
<210> 15
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
acagcannnn nnagatcgga agagcgtcgt g 31
<210> 16
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
agtcgtnnnn nnagatcgga agagcgtcgt g 31
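The 16 listed sequences share one layout: a 6-base barcode, 6 random bases (n), and a common 19-base tail. The sketch below rebuilds the linkers from that layout and checks how distinct the barcodes are; the helper names are hypothetical, and using Hamming distance as the distinctness measure is an assumption of the sketch.

```python
from itertools import combinations
from typing import Sequence

# Barcodes of SEQ ID NO: 1-16, copied from the sequence listing.
BARCODES = [
    "actcga", "agctga", "agcaga", "agctcg",
    "atcgca", "agctca", "ttcgga", "catcga",
    "ctagca", "acggta", "ccattg", "gattcg",
    "cgttag", "gactgt", "acagca", "agtcgt",
]
# 6 random bases followed by the tail common to all listed linkers.
COMMON_TAIL = "nnnnnn" + "agatcggaagagcgtcgtg"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


linkers = [barcode + COMMON_TAIL for barcode in BARCODES]
all_unique = len(set(BARCODES)) == len(BARCODES)
closest = min_pairwise_distance(BARCODES)
```

For the listed barcodes the smallest distance is 1 (for example agctga versus agctca), so a splitter built on these tags would need exact matching rather than mismatch tolerance.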
Claims (10)
1. A high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags, characterized in that the method comprises the following steps:
(1) phosphorylating and adenylating 3' linkers for library construction that contain different barcode tag sequences; each 3' linker comprises 5'-barcode sequence-random sequence-PCR primer adaptor sequence-3';
(2) ligating the adenylated 3' linkers containing different barcode tag sequences from step (1) to fragmented RNA samples from different sources, respectively; mixing and purifying the samples; reserving an input control sample for RNA-seq; performing m6A antibody immunoprecipitation on the remaining mixed sample to obtain an IP sample for m6A-seq; and finally constructing next-generation sequencing libraries from the input and IP samples and sequencing them;
(3) splitting the sequencing data according to barcode sequence and analyzing them to obtain the RNA-seq and m6A-seq information of each original individual sample.
2. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 1, characterized in that: in step (1), the 3' linker is purified after phosphorylation and adenylation.
3. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 1, characterized in that: in step (1), the 3' linker is phosphorylated with T4 PNK enzyme, and the phosphorylated 3' linker is adenylated with a 5' adenylation reagent.
4. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 1, characterized in that: in step (1), the random sequence is a random sequence of 6 bases.
5. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 1, characterized in that step (2) comprises the following steps:
1) extracting RNA from the sample;
2) fragmenting the RNA sample into 200-400 bp fragments by chemical ion fragmentation, removing the fragmentation reagent by purification, and using an enzyme to remove the phosphate group at the 3' end of the fragmented RNA and add a phosphate group at the 5' end;
3) ligating the adenylated 3' linkers containing different barcode tags to different fragmented RNA samples, and removing excess 3' linker after the reaction;
4) mixing and purifying the different samples, reserving an input sample, performing m6A antibody immunoprecipitation on the remaining sample to obtain an IP sample, and reverse transcribing the input sample and the IP sample to obtain cDNA;
5) ligating a 5' adaptor to the cDNA, purifying the reaction system, and determining the cycle number required for the library construction PCR by RT-qPCR;
6) performing the library construction PCR using the 5'-adaptor-ligated cDNA as substrate, purifying the product by gel extraction to obtain the next-generation sequencing library, and sequencing the library.
6. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 5, characterized in that: in step 2), the RNA is fragmented with a magnesium ion chemical fragmentation reagent; the enzyme is T4 PNK enzyme.
7. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 5, characterized in that: in step 3), the adenylated 3' linker is ligated to the fragmented RNA sample with T4 RNA ligase 2; excess 3' linker is removed with 5' deadenylase and RecJf enzyme.
8. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 5, characterized in that: in step 4), reverse transcription is performed with Superscript III enzyme.
9. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 5, characterized in that: in step 5), the 5' adaptor is ligated to the cDNA with T4 RNA ligase 1.
10. The high-throughput m6A sequencing method for multiple low-sample-size samples based on RNA-linked barcode tags according to claim 1, characterized in that step (3) comprises the following steps:
1) analyzing the alignment rate of the sequencing data to check data quality;
2) splitting the data according to barcode tag sequence and matching them to the original samples;
3) analyzing the split data to obtain sequencing information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111066944.5A CN113774121B (en) | 2021-09-13 | 2021-09-13 | Low-sample-size m6A high-throughput sequencing method based on RNA-linked tags |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113774121A true CN113774121A (en) | 2021-12-10 |
CN113774121B CN113774121B (en) | 2024-02-20 |
Family
ID=78842844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111066944.5A Active CN113774121B (en) | 2021-09-13 | 2021-09-13 | Low-sample-size m6A high-throughput sequencing method based on RNA-linked tags |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113774121B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105200530A (en) * | 2015-10-13 | 2015-12-30 | 北京百迈客生物科技有限公司 | Method for establishing multi-sample hybrid library suitable for high-flux whole-genome sequencing |
CN108504651A (en) * | 2017-02-27 | 2018-09-07 | 深圳市乐土精准医疗科技有限公司 | The library constructing method and reagent in library are built in PCR product large sample size mixing based on high-flux sequence |
CN110904192A (en) * | 2018-12-28 | 2020-03-24 | 广州表观生物科技有限公司 | Ultra-micro RNA methylation m6A detection method and application thereof |
CN113308514A (en) * | 2021-05-19 | 2021-08-27 | 武汉大学 | Construction method and kit for detection library of trace m6A and high-throughput detection method |
Also Published As
Publication number | Publication date |
---|---|
CN113774121B (en) | 2024-02-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||