CN110643692A - Analysis method and kit for sequencing single cell transcript isomer

Info

Publication number: CN110643692A
Application number: CN201910611776.XA
Authority: CN
Inventors: 胡友金; 钟嘉纬; 丘远辉; 饶品鸿
Original assignee: Zhongshan Ophthalmic Center
Current assignee: Zhongshan Ophthalmic Center
Priority date: 2019-07-08
Filing date: 2019-07-08
Publication date: 2020-01-03
Anticipated expiration: 2039-07-08
Also published as: CN110643692B

Abstract

The invention discloses an analysis method and a kit for sequencing single-cell transcript isoforms. The invention adds tag sequences to both ends of each mRNA, performs whole-transcriptome sequencing after capture and enrichment, and infers full-length transcripts by accurately determining the head and tail structures of different transcript isoforms. This strategy addresses the problems of the prior art and has the following beneficial effects: 1. Improved sensitivity of full-length transcript detection: because the head and tail terminal regions of the target transcript are detected, signals are concentrated mostly at the terminal edges of the transcript, so full-length transcripts are detected more sensitively. 2. Simplicity and low cost: the method is easy to implement, widely applicable and inexpensive. 3. Improved accuracy of transcript detection: by combining the tag sequences at the transcript ends with an optimized data analysis algorithm, the invention achieves higher accuracy.

Description

Analysis method and kit for sequencing single cell transcript isomer

Technical field:

The invention belongs to the field of biomedicine, and in particular relates to an analysis method and a kit for sequencing single-cell transcript isoforms.

Background art:

The variability of gene transcript isoforms gives rise to diversity of gene function. Alternative choice of the transcription start site and the transcription termination site of an mRNA can lead to differences in mRNA stability, localization and translation efficiency, and even to truncated proteins with different functions. Studies have shown that the single cells making up human organs and tissues are markedly heterogeneous in their functional and molecular characteristics. Existing single-cell sequencing methods can distinguish differences in gene expression between cells, but most of them rely on capture sequencing of the 5' end or the 3' end of mRNA rather than on detection of full-length transcripts. The two single-cell full-length transcript sequencing methods reported to date suffer from low sensitivity, high cost and similar problems. For example, Smart-seq2 can perform full-length detection on a single-cell transcriptome, but most of its signal falls in the middle of genes and little falls at the transcript ends, so its sensitivity and accuracy for analysing full-length transcripts are low. PacBio long-read third-generation sequencing can sequence mRNA transcripts at full length, but is costly and has a high error rate.

Summary of the invention:

The invention aims to provide an analysis method and a kit for sequencing single-cell transcript isoforms that offer high sensitivity, low cost and good accuracy.

The invention discloses an analysis method for sequencing single-cell transcript isoforms, which comprises the following steps:

a. cell lysis: isolating single cells and lysing them with a lysis solution to obtain a lysis product, wherein the lysis solution contains oligo-dT carrying an A tag sequence;

b. reverse transcription: reverse transcribing the lysis product with reverse transcriptase; during reverse transcription the reverse transcriptase adds several C bases to the 3' end of the newly formed cDNA strand, and these C bases pair with a TSO primer that carries a B tag sequence and contains locked nucleic acid, so that template switching takes place; the end of the transcribed cDNA is thereby marked by the tag sequence of the TSO primer together with the several C bases; the oligo-dT carrying the tag sequence serves as the primer of the reverse transcription reaction, so the cDNA is tagged as it is reverse transcribed, giving the first strand of cDNA;

c. cDNA amplification: after the first strand of cDNA has been generated by reverse transcription, performing PCR amplification with primers matching the sequences introduced at both ends of the cDNA to obtain the cDNA of the full-length transcriptome;

d. library construction and analysis: constructing a library from the amplified full-length cDNA with Tn5 transposase, which fragments the cDNA and adds universal sequencing adapters; sequencing the library; extracting the reads that carry the 3' and 5' tag sequences added during reverse transcription; aligning the extracted reads to the genome and keeping the uniquely aligned reads; performing downstream analysis of the 5' end reads with CAGEr to identify transcription start sites, and using the 3' end reads to identify transcription termination sites.

The sequence of the A tag and the sequence of the B tag may be the same or different.
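The tag-based read extraction of step d can be illustrated with a minimal Python sketch. It assumes that the A and B tags differ and that each tag sits at the start of the read; the tag sequences, the mismatch tolerance and the example reads are hypothetical placeholders, not the sequences of Table 1.

```python
def hamming_prefix(read, tag):
    """Mismatches between the tag and the read prefix; None if the read is too short."""
    if len(read) < len(tag):
        return None
    return sum(1 for a, b in zip(read, tag) if a != b)


def classify_read(read, tag_5p, tag_3p, max_mismatch=1):
    """Return (end_label, trimmed_sequence), or (None, read) if no tag matches."""
    for label, tag in (("5prime", tag_5p), ("3prime", tag_3p)):
        mismatches = hamming_prefix(read, tag)
        if mismatches is not None and mismatches <= max_mismatch:
            return label, read[len(tag):]
    return None, read


# Hypothetical tag sequences, for illustration only
TAG_5P = "ACGTTGCA"
TAG_3P = "TGCAACGT"

reads = [
    "ACGTTGCAGGGATCCGTAGCTA",   # exact 5' tag
    "TGCATCGTTTTTTTTTTTTCAG",   # 3' tag with one mismatch
    "CCCCCCCCATATATATATATAT",   # no tag
]
labels = [classify_read(r, TAG_5P, TAG_3P)[0] for r in reads]
trimmed_first = classify_read(reads[0], TAG_5P, TAG_3P)[1]
```

If the A and B tags are identical, a prefix match alone cannot tell the two ends apart, and some further feature of the read would be needed.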

Preferably, the lysis solution of step a contains Triton X-100, an RNase inhibitor, oligo-dT carrying the A tag sequence, and dNTPs. Triton X-100 lyses the cells, and the RNase inhibitor preserves RNA integrity; the tagged oligo-dT serves as the primer of the reverse transcription reaction, so the cDNA is tagged as it is reverse transcribed; adding dNTPs at this step improves the yield of RT-PCR.

Preferably, in step b, additional magnesium ions and betaine are added to the reverse transcription reaction system. Some RNAs form secondary structures (such as hairpins or loops) whose steric hindrance can terminate extension of the reverse transcriptase along the RNA strand; betaine and magnesium ions help overcome this. Betaine also provides thermal-stability protection: after the reaction has run at 42 ℃ for 90min, the temperature can be raised to 50 ℃ and held for 2min, which helps open the secondary structure of the RNA, and then lowered to 42 ℃ and held for 2min to continue reverse transcription. Repeating this rise and fall for 10 cycles maximizes the yield of cDNA.

Preferably, the PCR amplification in step c is performed for a limited number of cycles, such as 18 cycles. To simplify the procedure, improve detection sensitivity and avoid loss of cDNA, the first strand is amplified directly after synthesis, without purification, using KAPA HiFi HotStart ReadyMix.

Preferably, in step d, the amplified full-length cDNA is used to construct a library with Illumina Tn5 transposase, which completes cDNA fragmentation and addition of the universal sequencing adapters in a single step; the fragmented, adapter-tagged library is then amplified by PCR with I5 and I7 index primers that match the tag sequences introduced at the 3' and 5' ends of the cDNA, thereby capturing and enriching both ends of the cDNA while building the library. The library fragments are distributed between 200 and 400 bp, so a smaller amount of sequencing data is enough to identify TSS and TES.

The sequencing data are then filtered and analysed, and the detected TSS and TES are identified, specifically as follows: low-quality reads and sequencing adapter sequences are filtered from the fastq files obtained by sequencing; the reads carrying the 3' and 5' tag sequences added during reverse transcription are extracted; the extracted reads are aligned to the genome and the uniquely aligned reads are kept; the 5' end reads are analysed downstream with CAGEr to identify transcription start sites (TSS), and the 3' end reads are used to identify transcription termination sites (TES).
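Keeping only uniquely aligned reads can be sketched in Python as follows. The sketch assumes SAM-format text output and an aligner that writes the optional NH:i tag (number of reported alignments); the MAPQ threshold and the example records are illustrative assumptions, not values given by the invention.

```python
def is_unique_alignment(sam_line, min_mapq=20):
    """Return True if a SAM record is mapped, primary, and reported only once."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary alignment
    if flag & 0x4 or flag & 0x100 or flag & 0x800:
        return False
    if mapq < min_mapq:
        return False
    # Optional fields start at column 12; NH:i:<n> is the number of hits (assumed present)
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return True  # no NH tag: rely on the MAPQ filter alone


def filter_unique(sam_lines, min_mapq=20):
    """Keep header lines and uniquely aligned records."""
    return [line for line in sam_lines
            if line.startswith("@") or is_unique_alignment(line, min_mapq)]


# Hypothetical records, for illustration only
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:4",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = filter_unique(records)
```

Here only the header and read r1 survive: r2 is multi-mapped with low MAPQ and r3 is unmapped.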

The second object of the present invention is to provide a single-cell transcript isoform sequencing kit comprising oligo-dT carrying an A tag sequence, a TSO primer that carries a B tag sequence and contains locked nucleic acid, and primers matching the sequences introduced at both ends of the first strand of cDNA.

Further preferably, the kit further comprises a lysis solution, reverse transcription reaction reagents, cDNA PCR reaction reagents, Tn5 transposase and universal sequencing adapters.

The lysis solution contains Triton X-100, an RNase inhibitor, oligo-dT carrying the A tag sequence, and dNTPs.

The cDNA PCR reaction reagent is KAPA HiFi HotStart ReadyMix.

The invention adds tag sequences to both ends of each mRNA, performs whole-transcriptome sequencing after capture and enrichment, and infers full-length transcripts by accurately determining the head and tail structures of different transcript isoforms. This strategy addresses the problems of the prior art and has the following beneficial effects:

1. Improved sensitivity of full-length transcript detection: because the head and tail terminal regions of the target transcript are detected, signals are concentrated mostly at the terminal edges of the transcript, so full-length transcripts are detected more sensitively.

2. Simplicity and low cost: the method is easy to implement, widely applicable and inexpensive. In addition, because sequencing signals are concentrated mostly at the head and tail of the transcript, the sequencing depth needed to detect the same number of transcripts is lower, which reduces sequencing cost.

3. Improved accuracy of transcript detection: by combining the tag sequences at the transcript ends with an optimized data analysis algorithm, the invention achieves higher accuracy.

Description of the drawings:

FIG. 1 is a technical schematic of the analysis method of the present invention;

FIG. 2 is a graph comparing the effect of the analysis method of the present invention and the existing single cell sequencing method;

FIG. 3 is an accuracy analysis chart of the analysis method of the present invention;

FIG. 4 shows TSS and TES identified by the assay of the invention.

Detailed description:

The following examples further illustrate the present invention and do not limit it.

Example 1:

The technical schematic of the analysis method for single-cell transcript isoform sequencing in this example is shown in FIG. 1, and the sequences of the primers used are listed in Table 1:

TABLE 1

Wherein N is A, G, C or T, V is A, G or C.
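The degenerate positions can be made concrete with a short Python sketch that enumerates the sequences encoded by the VN anchor of the oligo-dT30VN primer named in Table 2. Only the N and V definitions come from the text; the short T stretch in the second call is a stand-in for the 30 T bases and is used purely for illustration.

```python
from itertools import product

# Degenerate codes as defined in the text: N = A/G/C/T, V = A/G/C
DEGENERATE = {"N": "AGCT", "V": "AGC"}


def expand(seq):
    """List every concrete sequence encoded by a degenerate sequence."""
    choices = [DEGENERATE.get(base, base) for base in seq]
    return ["".join(combo) for combo in product(*choices)]


anchors = expand("VN")            # the two anchoring bases after the T stretch
primer_tail = expand("TTTVN")     # shortened T stretch plus anchor
```

Because V excludes T, none of the 12 anchor variants can pair further into the poly(A) tail, which fixes the primer at the junction between the transcript body and the tail.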

The specific operation steps are as follows:

1. Single cell isolation and lysis

Single cells containing a complete transcriptome can be isolated by physical, mechanical, chemical or biological methods, for example microfluidics, flow cytometry, pipette aspiration or gradient dilution. In this example, an oocyte suspension from mouse strain C57/B6 was obtained by primary isolation, added to PBS droplets on a culture dish and diluted appropriately; single cells were picked by mouth pipette under a microscope and placed in PCR tubes containing 4ul of lysis solution (keeping the volume carried over with the cell as small as possible). The composition of the lysis solution per cell is shown in Table 2:

TABLE 2

Component (solvent: water)	Volume
0.2% Triton X-100	1.9ul
RNase inhibitor (40U/ul)	0.1ul
oligo-dT30VN primer (10uM)	1ul
dNTP (10mM)	1ul

After the single cell has been added to the PCR tube containing the lysis solution, immediately vortex the tube, centrifuge briefly to collect the liquid from the tube wall, and quickly place the tube on ice.

2. Reverse transcription of mRNA, addition of 5' and 3' sequence tags to the cDNA, and amplification of full-length cDNA

2.1 Place the PCR tube containing the lysed cell in a PCR instrument and incubate at 70 ℃ for 3min; place the tube on ice immediately afterwards, centrifuge briefly to collect the liquid from the tube wall, and return the tube to ice.

2.2 Prepare the reverse transcription reaction system; the formulation per cell is shown in Table 3. When preparing several reactions, make up one extra reaction to allow for sample loss.

TABLE 3

Component	Volume
SuperScript III first-strand buffer	2.00ul
SuperScript III reverse transcriptase	0.5ul
RNase inhibitor (40U/ul)	0.25ul
DTT (100mM)	0.50ul
Betaine (5M)	2.00ul
MgCl₂ (1M)	0.06ul
TSO (100μM)	0.20ul
ERCC (1:1000)	0.5ul
Total	6.01ul

Add 6.01ul of the above reverse transcription mix to the PCR tube from the previous step, mix gently by pipetting several times, taking care not to introduce air bubbles, and centrifuge briefly to collect the liquid from the tube wall.

2.3 Set the program of the PCR instrument as shown in Table 4; when the temperature of the instrument approaches 42 ℃, place the PCR tube in the instrument to start the reaction.
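Scaling the Table 3 recipe to several cells, with the one extra reaction recommended in step 2.2, can be sketched in Python. The per-cell volumes are copied from Table 3; the function name and the choice of eight cells are illustrative assumptions.

```python
# Per-cell volumes in ul, copied from Table 3
RT_MIX_UL = {
    "SuperScript III first-strand buffer": 2.00,
    "SuperScript III reverse transcriptase": 0.5,
    "RNase inhibitor (40 U/ul)": 0.25,
    "DTT (100 mM)": 0.50,
    "Betaine (5 M)": 2.00,
    "MgCl2 (1 M)": 0.06,
    "TSO (100 uM)": 0.20,
    "ERCC (1:1000)": 0.5,
}


def master_mix(n_cells, extra_reactions=1):
    """Volumes (ul) for n_cells reactions plus spare reactions to cover pipetting loss."""
    n = n_cells + extra_reactions
    return {name: round(volume * n, 2) for name, volume in RT_MIX_UL.items()}


per_cell_total = round(sum(RT_MIX_UL.values()), 2)   # matches the 6.01 ul total of Table 3
mix_for_8 = master_mix(8)                             # prepared as 9 reactions
```

Each tube still receives 6.01ul of the pooled mix; the spare reaction only absorbs pipetting loss.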

TABLE 4

Step	Temperature (℃)	Time	Note
1	42	90min
2	50	2min
3	42	2min	go to step 2, 9 cycles
4	70	15min
5	4	hold
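The Table 4 program can be unrolled in Python to check the number of temperature ramps and the total incubation time. The sketch reads "go to step 2, 9 cycles" as nine repeats after the first pass through steps 2 and 3, which agrees with the 10 cycles stated in the summary of the invention; the final 4 ℃ hold is left out.

```python
def expand_rt_program(repeats=9):
    """Unroll Table 4 into a flat list of (temperature in deg C, minutes)."""
    program = [(42, 90)]                      # step 1
    # steps 2-3 run once, then "go to step 2" repeats them `repeats` more times
    for _ in range(1 + repeats):
        program.append((50, 2))               # step 2: open RNA secondary structure
        program.append((42, 2))               # step 3: resume reverse transcription
    program.append((70, 15))                  # step 4: enzyme inactivation
    return program


program = expand_rt_program()
total_minutes = sum(minutes for _, minutes in program)
ramp_cycles = sum(1 for temperature, _ in program if temperature == 50)
```

Under this reading the program takes 145 min in total (90 + 10 × 4 + 15), not counting ramping time between temperatures.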

2.4 Add the following components (Table 5) to the PCR tube from the previous reaction, vortex to mix, and centrifuge to collect the liquid from the tube wall

TABLE 5

Component	Volume
KAPA HiFi HotStart ReadyMix (2×)	12.5ul
IS PCR primers (10μM)	0.25ul
Nuclease-free water	2.25ul
Total	15ul

2.5 setting the program of the PCR instrument as follows (Table 6), placing the PCR tube into the PCR reaction instrument for reaction

TABLE 6

2.6 Purification after PCR

2.6.1 Equilibrate Ampure XP beads at room temperature for 15-30min and vortex to mix before use;

2.6.2 Add 25ul of Ampure XP beads (1:1) to the PCR product from step 2.5, mix by pipetting, and incubate at room temperature for 8min;

2.6.3 Place the PCR tube on a magnetic stand and leave for 5min or until the solution is clear;

2.6.4 Carefully aspirate the supernatant, retaining the beads;

2.6.5 Add 200ul of freshly prepared 80% ethanol, incubate for 30s, and remove the supernatant;

2.6.6 Repeat the previous step;

2.6.7 Remove all residual ethanol from the PCR tube and dry the beads at room temperature for 5min, avoiding over-drying, which causes large cracks on the bead surface;

2.6.8 Remove the PCR tube from the magnetic stand, add 20ul of nuclease-free water, resuspend the beads by mixing, and leave at room temperature for 5min;

2.6.9 Return the PCR tube to the magnetic stand, leave for 2min or until the solution is clear, carefully transfer the supernatant to a new PCR tube, and label it;

2.7 Inspect the cDNA sample on an Agilent Bioanalyzer 2100; good-quality cDNA shows a distinct peak between 1.5 and 2 kb;

2.8 Quantify the cDNA with the Qubit™ dsDNA HS Assay Kit according to the kit instructions; after quantification, dilute to 0.1ng/ul in preparation for library construction;
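The dilution to 0.1ng/ul follows from C1·V1 = C2·V2. A minimal Python sketch, in which the measured concentration and the 1ul aliquot are hypothetical values chosen for illustration:

```python
def water_to_add(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul=0.1):
    """Volume of water (ul) that dilutes the stock to the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


# Hypothetical Qubit reading of 2.5 ng/ul, diluting a 1 ul aliquot of cDNA
water_ul = water_to_add(2.5, 1.0)   # about 24 ul of water
```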

3. Constructing a next-generation sequencing library with Illumina Tn5 to capture the head and tail ends of the cDNA

3.1 Fragment the cDNA and add adapters with Tn5 in the following reaction system (Table 7), using 0.1ng of cDNA as input

TABLE 7

Component	Volume
Tagmentation DNA buffer	4ul
Amplicon Tagmentation mix	1.67ul
Resuspension buffer	1.33ul
cDNA (0.1ng)	1ul

After all the components have been added, mix well and centrifuge briefly

3.2 Set the PCR instrument to 55 ℃ and react for 10min; immediately after the reaction add 2ul of NT buffer and leave at room temperature for 5min

3.3 to the reaction system of the previous step, the following ingredients (Table 8) were added:

TABLE 8

Mixing, centrifuging for a short time, and collecting liquid on the tube wall

The sequence of the I5 index primer is as follows:

5'-AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGGTATCAACGCAGAGT-3';

The sequence of the I7 index primer is as follows:

5'-CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGG-3';
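The two index primers can be split into segments and reassembled as a consistency check. In the Python sketch below the primer sequences are copied from the text, but the segment boundaries and labels are an interpretation based on widely published Illumina adapter sequences; the patent itself does not annotate the primers, and the label on the last I5 segment in particular is an assumption.

```python
I5_PRIMER = ("AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGGCAGCGTCAGATGTG"
             "TATAAGAGACAGGTGGTATCAACGCAGAGT")
I7_PRIMER = "CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGG"

# Segment labels are interpretive, not taken from the patent
I5_SEGMENTS = [
    ("P5 flow-cell sequence", "AATGATACGGCGACCACCGAGATCTACAC"),
    ("i5 index", "TAGATCGC"),
    ("Tn5 read-1 adapter", "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"),
    ("3' segment presumed to anneal to the cDNA tag", "GTGGTATCAACGCAGAGT"),
]
I7_SEGMENTS = [
    ("P7 flow-cell sequence", "CAAGCAGAAGACGGCATACGAGAT"),
    ("i7 index", "TCGCCTTA"),
    ("Tn5 read-2 adapter (5' part)", "GTCTCGTGGGCTCGG"),
]


def reassemble(segments):
    """Concatenate the segment sequences in order."""
    return "".join(sequence for _, sequence in segments)


i5_ok = reassemble(I5_SEGMENTS) == I5_PRIMER
i7_ok = reassemble(I7_SEGMENTS) == I7_PRIMER
```

On this reading only the I5 primer carries an extra 3' segment beyond the Tn5 adapter, so PCR would favour fragments that have the cDNA tag at one end and a Tn5-inserted adapter at the other.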

3.4 set up the PCR program as follows (Table 9), place the system into the PCR apparatus for reaction

TABLE 9

Step	Temperature (℃)	Time	Note
1	72	3min
2	98	30s
3	98	10s
4	55	30s
5	72	60s	go to step 3, 10-15 cycles
6	72	5min
7	4	hold

3.5 Post-PCR purification of the library

3.5.1 Equilibrate Ampure XP beads at room temperature for 15-30min and vortex to mix before use;

3.5.2 Add 21ul of Ampure XP beads (1:0.8) to the PCR product from step 3.4, mix by pipetting, and incubate at room temperature for 8min;

3.5.3 Place the PCR tube on a magnetic stand and leave for 5min or until the solution is clear;

3.5.4 Carefully aspirate the supernatant, retaining the beads;

3.5.5 Add 200ul of freshly prepared 80% ethanol, incubate for 30s, and remove the supernatant;

3.5.6 Repeat the previous step;

3.5.7 Remove all residual ethanol from the PCR tube and dry the beads at room temperature for 5min, avoiding over-drying, which causes large cracks on the bead surface;

3.5.8 Remove the PCR tube from the magnetic stand, add 20ul of nuclease-free water, resuspend the beads by mixing, and leave at room temperature for 5min;

3.5.9 Return the PCR tube to the magnetic stand, leave for 2min or until the solution is clear, carefully transfer the supernatant to a new PCR tube, and label it.

Thus, a library was obtained.

4. Quality control and quantification of libraries

4.1 The library constructed in step 3.5 was quality-checked on an Agilent Bioanalyzer 2100; the library fragments were concentrated between 300 and 500 bp.

4.2 The library constructed in step 3.5 was quantified with the Qubit™ dsDNA HS Assay Kit according to the kit instructions and, after quantification, diluted to 10nM for sequencing.
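Diluting to 10nM requires converting the Qubit mass concentration into molarity. The Python sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the example concentration and the 400 bp mean fragment length are hypothetical values.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/ul) into molarity (nM)."""
    return conc_ng_per_ul / (g_per_mol_per_bp * mean_fragment_bp) * 1e6


def dilution_factor(conc_nM, target_nM=10.0):
    """Fold dilution needed to bring the library to the target molarity."""
    return conc_nM / target_nM


# Hypothetical library: 5.28 ng/ul with a mean fragment length of 400 bp
molarity = library_molarity_nM(5.28, 400)   # about 20 nM
fold = dilution_factor(molarity)            # about 2-fold
```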

5. Bioinformatics analysis

Low-quality reads and sequencing adapter sequences are filtered from the fastq files obtained by sequencing; the reads carrying the 3' and 5' tag sequences added during reverse transcription are extracted; the extracted reads are aligned to the genome, and the uniquely aligned reads are kept.

From the extracted data, the 5' end reads are analysed with CAGEr to identify transcription start sites (TSS), and the 3' end reads are used to identify transcription termination sites (TES).

The TSS and TES identified by this method are shown in FIG. 4.
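How read ends can be grouped into candidate sites may be illustrated with a simplified Python sketch. It is not CAGEr, which is an R/Bioconductor package; it is a stand-in that clusters read-end coordinates on one chromosome strand by distance. The gap threshold, the minimum read count and the coordinates are hypothetical.

```python
from collections import Counter


def cluster_sites(positions, max_gap=20, min_reads=2):
    """Group read-end positions on one chromosome strand into clusters.

    positions: iterable of integer coordinates, one per read.
    Returns a list of dicts with start, end, dominant position and read count.
    """
    counts = Counter(positions)
    clusters, current = [], []
    for pos in sorted(counts):
        # start a new cluster when the gap to the previous position is too large
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        if total < min_reads:
            continue  # drop weakly supported clusters
        # dominant site: most reads, ties broken by the smaller coordinate
        dominant = max(members, key=lambda p: (counts[p], -p))
        result.append({"start": members[0], "end": members[-1],
                       "dominant": dominant, "reads": total})
    return result


# Hypothetical 5' end coordinates of uniquely aligned reads on one strand
five_prime_ends = [1000, 1000, 1002, 1005, 1000, 1500, 2300, 2301, 2301]
tss = cluster_sites(five_prime_ends)
```

The same function could be applied to 3' end coordinates to obtain candidate TES clusters.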

The commercial reagents used in the experimental procedure are shown in the attached Table 10, and all primers were synthesized by Biotech.

Second, effect comparison

Experimental method: mouse oocytes of the same age in days were subjected in parallel to the method of the invention and to Smart-seq2, including cell isolation, cell lysis, full-length transcriptome amplification and library construction. The method of the invention followed the specific experimental procedure above; Smart-seq2 followed the protocol in the Smart-seq2 literature (Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc, Vol. 9 No. 1, 2014, 171-181). The two libraries were sequenced to the same data volume; the sequencing data were filtered, aligned and quantified, and the sensitivity and accuracy of the transcript TSS and TES detected by each method were calculated.

Specific results are shown in FIG. 2 and FIG. 3. Compared with the existing single-cell sequencing method (Smart-seq2), the analysis method for single-cell transcript isoform sequencing of the present invention (cap-seq) is significantly more sensitive in capturing both TSS and TES (FIG. 2). The method is also highly accurate: after algorithm optimization and data filtering the accuracy can approach 100%, so transcription start sites and transcription termination sites can be analysed and identified more accurately (FIG. 3).
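The text does not define how sensitivity and accuracy were computed. One plausible reading, sketched in Python, compares detected sites with annotated sites within a tolerance window: sensitivity is the fraction of annotated sites recovered, and accuracy is the fraction of detected sites that fall near an annotated site. The definitions, the window size and the coordinates are all assumptions made for illustration.

```python
def has_match(site, reference_sites, window):
    """True if any reference site lies within `window` bases of `site`."""
    return any(abs(site - ref) <= window for ref in reference_sites)


def sensitivity(detected, annotated, window=50):
    """Fraction of annotated sites with a detected site within the window."""
    if not annotated:
        return 0.0
    hits = sum(1 for ref in annotated if has_match(ref, detected, window))
    return hits / len(annotated)


def accuracy(detected, annotated, window=50):
    """Fraction of detected sites lying within the window of an annotated site."""
    if not detected:
        return 0.0
    hits = sum(1 for site in detected if has_match(site, annotated, window))
    return hits / len(detected)


# Hypothetical coordinates on one chromosome strand
annotated_tss = [1000, 2300, 5000, 8000]
detected_tss = [1004, 2290, 6100]

sens = sensitivity(detected_tss, annotated_tss)   # 2 of 4 annotated sites recovered
acc = accuracy(detected_tss, annotated_tss)       # 2 of 3 detected sites supported
```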

Attached table 10:

Claims

1. An analytical method for sequencing single cell transcript isoforms comprising the steps of:

2. The assay of claim 1, wherein the A tag sequence and the B tag sequence are the same or different.

3. The assay of claim 1, wherein the lysis solution of step a comprises Triton X-100, an RNase inhibitor, oligo-dT carrying the A tag sequence, and dNTPs.

4. The method of claim 1, wherein in step b, additional magnesium ions and betaine are added to the reverse transcription reaction system.

5. The assay method of claim 1, wherein the PCR amplification in step c is performed for a limited 18 cycles, and the first strand is amplified directly after synthesis, without purification, using KAPA HiFi HotStart ReadyMix.

6. The assay method according to claim 1, wherein in step d the amplified full-length cDNA is used to construct a library with Illumina Tn5 transposase, which completes cDNA fragmentation and addition of the universal sequencing adapters in a single step; the fragmented, adapter-tagged library is amplified by PCR with I5 and I7 index primers matching the tag sequences introduced at the 3' and 5' ends of the cDNA, thereby capturing and enriching the head and tail ends of the cDNA and constructing the library.

7. A kit for sequencing single cell transcript isoforms, characterized by comprising oligo-dT carrying an A tag sequence, a TSO primer that carries a B tag sequence and contains locked nucleic acid, and primers matching the sequences introduced at both ends of the first strand of cDNA.

8. The kit of claim 7, further comprising a lysis solution, reverse transcription reaction reagents, cDNA PCR reaction reagents, Tn5 transposase, and universal sequencing adapters.

9. The kit of claim 7, wherein the lysis solution comprises Triton X-100, an RNase inhibitor, oligo-dT carrying the A tag sequence, and dNTPs.

10. The kit of claim 7, wherein the cDNA PCR reaction reagent is KAPA HiFi HotStart ReadyMix.