WO2021120959A1 - Method for constructing a DNA library for single-cell genome sequencing - Google Patents

Method for constructing a DNA library for single-cell genome sequencing

Info

Publication number
WO2021120959A1
WO2021120959A1 PCT/CN2020/129463 CN2020129463W WO2021120959A1 WO 2021120959 A1 WO2021120959 A1 WO 2021120959A1 CN 2020129463 W CN2020129463 W CN 2020129463W WO 2021120959 A1 WO2021120959 A1 WO 2021120959A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nuclei
dna
tag
cell
Prior art date
Application number
PCT/CN2020/129463
Other languages
English (en)
French (fr)
Inventor
翟继先
龙艳萍
肖丽丹
张飞
鹿东东
刘智剑
Original Assignee
南方科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南方科技大学
Publication of WO2021120959A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The invention relates to biotechnology, in particular to a method for constructing a DNA library for single-cell genome sequencing.
  • Single-cell genome sequencing amplifies and sequences the whole genome at the single-cell level. The whole-genome DNA of an isolated single cell is amplified to obtain a complete genome with high coverage, which is then subjected to high-throughput sequencing; the results can reveal individual differences within a cell population and evolutionary relationships between cells.
  • At present, single-cell genome sequencing first requires constructing a single-cell genome library, followed by sequencing analysis.
  • However, constructing a single-cell genome library relies on expensive microfluidic platforms and reagents to separate single cells, which is cumbersome and costly.
  • A method for constructing a DNA library for single-cell genome sequencing includes:
  • fragmenting the DNA in cell nuclei to obtain nuclei with fragmented DNA;
  • labeling the fragmented DNA in a plurality of nuclei over multiple rounds with different sequence tags, so that the fragmented DNA in each nucleus is attached to a tag code composed of a plurality of sequence tags, and the tag codes attached to the fragmented DNA differ from nucleus to nucleus; and
  • amplifying the fragmented DNA attached to the tag code to obtain a DNA library for single-cell genome sequencing.
  • In the above method for constructing a DNA library for single-cell genome sequencing, the cell nucleus serves as a reaction chamber for labeling DNA: different sequence tags are used to label the DNA in the nuclei over multiple rounds, so that the DNA in each nucleus ends up attached to a unique tag code formed by the multiple rounds of labeling.
  • The tag code distinguishes different cell nuclei and thereby individual cells.
  • The method is simple to operate and low in cost.
  • FIG. 1 is a distribution diagram of single-cell DNA fragments in Example 1;
  • FIG. 2 is a diagram of cell discrimination efficiency in Example 1.
  • In one embodiment, the method for constructing a DNA library for single-cell genome sequencing uses the cell nucleus as the reaction site for labeling DNA. Different sequence tags are used to label the DNA in the nuclei over multiple rounds, so that the DNA in each nucleus ends up attached to a unique tag code formed by the multiple rounds of labeling; the tag code distinguishes different nuclei and thereby individual cells.
  • Taking 96 types of sequence tags as an example, 96 distinct tag codes can be formed after one round of labeling,
  • 9,216 after two rounds of labeling,
  • and 884,736 after three rounds of labeling.
  • Therefore, if the number of cells to be distinguished is less than 884,736, three rounds of labeling can make the tag codes attached to the DNA in each cell nucleus different.
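The counts above follow from simple exponentiation. The sketch below assumes 96 tag types per round, which is consistent with the 9,216 and 884,736 figures; the function names are illustrative and do not come from the patent.

```python
def tag_code_count(tags_per_round: int, rounds: int) -> int:
    """Number of distinct tag codes after the given number of labeling rounds."""
    return tags_per_round ** rounds


def rounds_needed(tags_per_round: int, n_nuclei: int) -> int:
    """Smallest number of rounds whose code space is larger than n_nuclei."""
    rounds = 1
    while tag_code_count(tags_per_round, rounds) <= n_nuclei:
        rounds += 1
    return rounds


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, tag_code_count(96, r))  # 96, 9216, 884736
```

A code space larger than the number of nuclei is a necessary condition for unique codes; because nuclei are distributed at random, a comfortable margin between the two numbers is what keeps shared codes rare.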
  • This method does not need microfluidic technology to distinguish single cells; multiple rounds of DNA labeling (DNA ligation reactions) are sufficient. It is easy to operate and low in cost, making it a fast, low-cost method for constructing a DNA library for single-cell genome sequencing.
  • Specifically, the method for constructing a DNA library for single-cell genome sequencing includes steps S110 to S130.
  • Step S110: Fragment the DNA in the cell nucleus to obtain a cell nucleus with fragmented DNA.
  • Specifically, the cells are collected and counted, and the cell membranes are lysed to obtain the nuclei; the DNA in the nuclei is then fragmented to obtain nuclei with fragmented DNA. Fragmenting the DNA in the nucleus makes it easier to add sequence tags.
  • In this embodiment, the cells are fixed with formaldehyde and then counted. Specifically, the collected cells are first mixed with formaldehyde to fix them, and the fixed cells are then counted.
  • In other embodiments, other cell counting methods commonly used in the art can also be used.
  • In this embodiment, the DNA in the cell nucleus is fragmented by restriction digestion.
  • Specifically, the enzyme used for digestion is Dpn II.
  • Fragmenting the DNA in the nucleus before labeling gives the DNA sticky ends, which makes it easier to ligate the sequence tags to the DNA.
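To illustrate how restriction digestion produces fragments with sticky ends, here is a minimal in-silico sketch. It relies on the known Dpn II recognition site 5'-GATC-3', which is cut immediately before the G and leaves a 5' GATC overhang. Only the top strand is modeled, and the function name and test sequence are made up for illustration.

```python
def dpnii_fragments(seq: str) -> list[str]:
    """Split a top-strand sequence at every Dpn II site (cut before GATC).

    Each fragment after the first starts with GATC, which corresponds to
    the 5' overhang available for ligating a sequence tag.
    """
    site = "GATC"
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        if pos > start:
            fragments.append(seq[start:pos])
            start = pos
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    print(dpnii_fragments("AAGATCTTGATCGG"))  # ['AA', 'GATCTT', 'GATCGG']
```

Concatenating the fragments always reproduces the input, which is a quick sanity check that no bases are lost at the cut sites.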
  • Step S120: Use different sequence tags to label the fragmented DNA in multiple nuclei over multiple rounds, so that the fragmented DNA in each nucleus is attached to a tag code composed of multiple sequence tags, and the tag codes attached to the fragmented DNA differ from nucleus to nucleus.
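  • Specifically, a sequence tag is a base sequence used to form the tag code. During sequencing, fragmented DNA from the same nucleus carries the same tag code and fragmented DNA from different nuclei carries different tag codes, so identifying the tag code distinguishes the fragmented DNA of different nuclei.
  • The sequence tag includes an identification part (barcode), which serves as the identifier. In this embodiment there are 200 types of sequence tags, whose identification parts have the base sequences shown in SEQ ID No. 1 to SEQ ID No. 200.
  • The sequence tag also includes a linking part (linker), which is used to connect sequence tags to one another.
  • The identification part and the linking part are connected by complementary base pairing.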
  • In other embodiments, the sequence tags are not limited to 200 types; the number can be chosen according to the total number of cells to be distinguished and the number of labeling rounds, for example 48, 96 or 384 types.
  • The base sequences of the identification parts are likewise not limited to those shown in SEQ ID No. 1 to SEQ ID No. 200 and can be selected according to actual needs, as long as they can serve for identification.
  • Further, the step of using different sequence tags to label the fragmented DNA in multiple nuclei over multiple rounds, so that the fragmented DNA in each nucleus is attached to a tag code composed of multiple sequence tags and the tag codes differ from nucleus to nucleus, includes steps S121 to S123.
  • Step S121: After grouping multiple nuclei with fragmented DNA, use a different first sequence tag to label the fragmented DNA in each group of nuclei, so that the fragmented DNA in every group is ligated to a first sequence tag and the first sequence tags differ between groups, yielding multiple groups of primary-labeled nuclei.
  • The first sequence tag includes a first sequence for identification and a first linking sequence for connection with the second sequence tag.
  • The first sequence and the first linking sequence, and the first linking sequence and the second sequence tag, are connected by complementary base pairing.
  • A phosphate group is attached to the 5' end of the first sequence; it enables the first sequence to be ligated to the fragmented DNA.
  • In this embodiment, the base sequence of the first sequence is selected from the base sequences shown in SEQ ID No. 1 to SEQ ID No. 96; the base sequence of the first linking sequence is shown in SEQ ID No. 201.
  • The base sequence of the first sequence is not limited to the base sequences shown in SEQ ID No. 1 to SEQ ID No. 96. In other embodiments, it can be another base sequence commonly used in the art for identification, or one designed for identification by conventional methods in the art; similarly, the base sequence of the first linking sequence is not limited to that shown in SEQ ID No. 201.
  • In this embodiment, the nuclei are grouped randomly and equally.
  • Specifically, the nuclei with fragmented DNA are grouped, mixed with different first sequence tags and incubated, giving multiple groups of pre-ligation solutions containing different first sequence tags; DNA ligase is then added to each pre-ligation solution, followed by incubation, to obtain multiple groups of primary-labeled nuclei.
  • Incubating the first sequence tags with the nuclei before adding the ligase lets the tags first enter the nuclei and mix with the fragmented DNA, so that the DNA fragments are spaced apart by first sequence tags, which reduces ligation of the fragments to one another.
  • Further, the nuclei with fragmented DNA are divided equally among different reaction containers (such as EP tubes or the wells of a multi-well plate) that contain first sequence tags and mixed, the first sequence tag differing between containers; incubation then gives multiple groups of pre-ligation solutions containing different first sequence tags.
  • DNA ligase is added to these pre-ligation solutions and incubated, so that the fragmented DNA in the nuclei in each container is ligated to the first sequence tag, thereby obtaining multiple groups of primary-labeled nuclei.
  • The fragmented DNA in nuclei from the same reaction container is all ligated to the same first sequence tag, and the first sequence tags differ between reaction containers.
  • In one example, after the step of grouping the nuclei with fragmented DNA and labeling each group with a different first sequence tag to obtain multiple groups of primary-labeled nuclei, the method further includes a step of mixing each group of primary-labeled nuclei with a blocking sequence.
  • Generally, an excess of the first sequence tag is mixed with the nuclei so that all the fragmented DNA in the nuclei is ligated to the first sequence tag; free first sequence tags therefore remain after the ligation reaction.
  • If the nuclei from the individual reaction containers were mixed directly at this point, the free tags could interfere with the next round of labeling. Adding the blocking sequence to each container after the ligation reaction lets it bind the free first sequence tags, reducing the influence of one round of labeling on the next.
  • In this embodiment, the base sequence of the blocking sequence is shown in SEQ ID No. 203.
  • The base sequence of the blocking sequence is not limited to that shown in SEQ ID No. 203. In other embodiments, it may be another base sequence commonly used for blocking in the art, or one designed for blocking by conventional methods in the art.
  • Step S122: Mix the groups of primary-labeled nuclei and regroup them, then use a different second sequence tag to label the fragmented DNA in each new group, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tags differ between groups, yielding multiple groups of secondary-labeled nuclei.
  • The second sequence tag includes a second sequence for identification and a second linking sequence for connection with the first sequence tag.
  • In this embodiment, the base sequence of the second sequence is selected from the base sequences shown in SEQ ID No. 97 to SEQ ID No. 192; the base sequence of the second linking sequence is shown in SEQ ID No. 202.
  • In other embodiments, the base sequence of the second sequence is not limited to the base sequences shown in SEQ ID No. 97 to SEQ ID No. 192; it may be another base sequence commonly used in the art, or one designed for identification by conventional methods in the art. Similarly, the base sequence of the second linking sequence is not limited to that shown in SEQ ID No. 202.
  • Further, biotin is attached to the 5' end of the second sequence to facilitate subsequent purification of the fragmented DNA carrying the tag code.
  • In this embodiment, the mixed primary-labeled nuclei are regrouped randomly and equally.
  • More specifically, the primary-labeled nuclei are divided equally among different reaction containers containing second sequence tags and mixed, the second sequence tag differing between containers; incubation then gives multiple groups of pre-ligation solutions containing different second sequence tags. DNA ligase is added to these solutions and incubated, so that the first sequence tag on the fragmented DNA in the nuclei is ligated to the second sequence tag, giving multiple groups of secondary-labeled nuclei.
  • The fragmented DNA in nuclei from the same reaction container is all ligated to the same second sequence tag, and the second sequence tags differ between reaction containers.
  • Step S123: Mix the groups of secondary-labeled nuclei and regroup them, then use a different third sequence tag to label the fragmented DNA in each new group, so that the fragmented DNA in every group is ligated to a third sequence tag and the third sequence tags differ between groups, yielding multiple groups of tertiary-labeled nuclei.
  • The third sequence tag includes a third sequence for identification.
  • In this embodiment, the base sequence of the third sequence is selected from the base sequences shown in SEQ ID No. 193 to SEQ ID No. 200.
  • In other embodiments, the base sequence of the third sequence is not limited to the base sequences shown in SEQ ID No. 193 to SEQ ID No. 200; it may be another base sequence commonly used in the art, or one designed for identification by conventional methods in the art.
  • More specifically, the secondary-labeled nuclei are divided equally among different reaction containers containing third sequence tags and mixed, the third sequence tag differing between containers; incubation then gives multiple groups of pre-ligation solutions containing different third sequence tags. DNA ligase is added to these solutions and incubated, so that the second sequence tag on the fragmented DNA in the nuclei is ligated to the third sequence tag, giving multiple groups of tertiary-labeled nuclei.
  • The fragmented DNA in nuclei from the same reaction container is all ligated to the same third sequence tag, and the third sequence tags differ between reaction containers.
  • In this embodiment, the identification sequences of the first sequence tag, the second sequence tag, and the third sequence tag are all different.
  • In other embodiments, the first sequence tag, the second sequence tag, and the third sequence tag may share identification sequences.
  • For example, the first sequence of the first sequence tag and the second sequence of the second sequence tag may both be selected from the base sequences shown in SEQ ID No. 1 to SEQ ID No. 96.
  • In this embodiment, the tag code of the DNA in each cell nucleus is formed by three rounds of labeling.
  • The product of the numbers of types of first, second and third sequence tags is greater than the number of nuclei with fragmented DNA; after three rounds of labeling, the tag code of the DNA in each nucleus consists of the first, second and third sequence tags for that nucleus, ligated in sequence.
  • The number of rounds needed to form a tag code that distinguishes the DNA of different nuclei is not limited to three; it can be designed according to the number of nuclei to be distinguished and the number of sequence tags.
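The split-and-pool procedure of steps S121 to S123 can be simulated in a few lines. The sketch below is a toy model, not the patent's protocol: it assumes nuclei are shuffled and dealt into equal groups in every round, uses 96, 96 and 8 tag types (matching SEQ ID No. 1 to 96, 97 to 192 and 193 to 200), and represents each tag by its well index. All function names are hypothetical.

```python
import random
from collections import Counter


def split_pool(n_nuclei: int, tags_per_round: tuple[int, ...], seed: int = 0) -> list[tuple[int, ...]]:
    """Return the tag code (one tag index per round) acquired by each nucleus."""
    rng = random.Random(seed)
    codes: list[list[int]] = [[] for _ in range(n_nuclei)]
    order = list(range(n_nuclei))
    for n_tags in tags_per_round:
        rng.shuffle(order)  # pool all nuclei and mix them
        for slot, nucleus in enumerate(order):
            # Deal nuclei into n_tags equal groups; the group index is the tag.
            codes[nucleus].append(slot % n_tags)
    return [tuple(code) for code in codes]


def fraction_unique(codes: list[tuple[int, ...]]) -> float:
    """Fraction of nuclei whose tag code is carried by no other nucleus."""
    counts = Counter(codes)
    return sum(1 for code in codes if counts[code] == 1) / len(codes)


if __name__ == "__main__":
    codes = split_pool(1000, (96, 96, 8))
    print(f"{fraction_unique(codes):.3f} of nuclei carry a unique tag code")
```

Running the simulation with more nuclei or fewer tag types shows how quickly shared codes appear once the number of nuclei approaches the size of the code space.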
  • The method also includes steps of lysing the cell nuclei and purifying the DNA attached to the tag codes.
  • For example, when forming the tag code takes only three rounds, the step of obtaining tertiary-labeled nuclei is followed by lysing the tertiary-labeled nuclei and purifying the released DNA that carries the first, second and third sequence tags, to obtain DNA with the tag code attached.
  • Step S130: Amplify the fragmented DNA attached to the tag code to obtain a DNA library for single-cell genome sequencing.
  • Specifically, the tag-coded fragmented DNA is further fragmented by tagmentation and attached to library-construction adapters, giving shorter DNA fragments that carry the tag code; these shorter tag-coded fragments are then amplified to obtain a DNA library for single-cell genome sequencing.
  • Other methods commonly used in the art can also be used to fragment the tag-coded DNA and attach the library-construction adapters.
  • In another embodiment, the steps of the method for constructing a DNA library for single-cell genome sequencing are roughly the same as those described above; the difference lies in the steps that make the tag codes attached to the fragmented DNA of each cell nucleus different.
  • In this embodiment, those steps include:
  • After grouping multiple nuclei with fragmented DNA, use a different first sequence tag to label the fragmented DNA in each group of nuclei, so that the fragmented DNA in every group is ligated to a first sequence tag and the first sequence tags differ between groups, yielding multiple groups of primary-labeled nuclei, where the first sequence tag is a sequence tag;
  • mix the groups of primary-labeled nuclei and regroup them, then use a different second sequence tag to label the fragmented DNA in each new group, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tags differ between groups, yielding multiple groups of secondary-labeled nuclei, where the second sequence tag is a sequence tag;
  • mix the groups of secondary-labeled nuclei and divide them into equal groups, the number of nuclei in each group being smaller than the product of the number of types of first sequence tag and the number of types of second sequence tag, to obtain multiple groups of solutions to be lysed; the fragmented DNA in each nucleus in a solution to be lysed is attached to a tag code composed of a first sequence tag and a second sequence tag.
  • Within the same solution to be lysed, the fragmented DNA of each nucleus is attached to a different tag code, or the probability that the fragmented DNA of different nuclei is attached to the same tag code is less than 5%;
  • lyse one group of solution to be lysed to release the fragmented DNA carrying the first and second sequence tags from each nucleus, then fragment that DNA by tagmentation and attach library-construction adapters containing a third sequence tag, obtaining fragmented DNA attached to a tag code consisting of a first sequence tag, a second sequence tag and a third sequence tag; the tag codes containing the third sequence tag that are attached to the fragmented DNA differ from one another.
  • In this embodiment, mixing the groups of secondary-labeled nuclei and dividing them into equal groups, each containing fewer nuclei than the product of the number of types of first sequence tag and the number of types of second sequence tag, replaces the third round of labeling.
  • The fragmented DNA in each nucleus is attached to a tag code consisting of a first sequence tag and a second sequence tag, and the tag codes attached to the fragmented DNA differ from nucleus to nucleus.
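The requirement that nuclei in the same group rarely share a tag code can be turned into a rough limit on group size. The sketch below rests on an assumption that is not stated in the source: each nucleus is taken to receive one of the first-tag and second-tag combinations uniformly and independently at random, so the chance that a given nucleus shares its code with at least one other nucleus is 1 - (1 - 1/K)^(N-1) for N nuclei and K codes. The function names are illustrative.

```python
def shared_code_probability(n_nuclei: int, n_codes: int) -> float:
    """Probability that a given nucleus shares its tag code with another one.

    Assumes codes are assigned uniformly and independently at random.
    """
    return 1.0 - (1.0 - 1.0 / n_codes) ** (n_nuclei - 1)


def max_group_size(n_codes: int, limit: float = 0.05) -> int:
    """Largest group size for which the shared-code probability stays below limit."""
    n = 1
    while shared_code_probability(n + 1, n_codes) < limit:
        n += 1
    return n


if __name__ == "__main__":
    n_codes = 96 * 96  # first-tag types x second-tag types
    n = max_group_size(n_codes)
    print(n, shared_code_probability(n, n_codes))
```

Other definitions of the collision probability (for example, counting only collisions that are detectable in a species-mixing experiment) give different limits, so the number printed here should be read as an illustration of the trade-off rather than as the patent's criterion.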
  • Each kind of first sequence tag consists of the first linking sequence and a first sequence connected to it.
  • The 5' ends of the 96 kinds of first sequences all carry phosphate groups, and the first linking sequences of the 96 kinds of first sequence tags all have the base sequence shown in SEQ ID No. 201.
  • Each kind of second sequence tag consists of a second linking sequence and a second sequence connected to it.
  • The base sequences of the 96 kinds of second sequences are shown in SEQ ID No. 97 to SEQ ID No. 192; their 5' ends all carry biotin, and the second linking sequences of the 96 kinds of second sequence tags all have the base sequence shown in SEQ ID No. 202.
  • The blocking sequence is complementary to the bases at the 5' end of the 96 kinds of first linking sequences, and its base sequence is shown in SEQ ID No. 203.
  • The library-construction adapters include an i5-end adapter and 8 types of i7-end adapters.
  • The base sequence of the i5-end adapter is shown in SEQ ID No. 204, and the base sequences of the 8 types of i7-end adapters are shown in SEQ ID No. 205 to SEQ ID No.
  • The 8 types of i7-end adapters contain the 8 types of third sequence tags, whose sequences are shown in SEQ ID No. 193 to SEQ ID No. 200.
  • Collect cells and cross-link: collect human cells (293T) and mouse cells (CT26) and cross-link them separately.
  • The operations are the same for human and mouse cells: A. Collect 1 × 10^6 freshly cultured cells by centrifugation (1500 rpm, 3 min) and resuspend them in 1 mL of DMEM medium. B. Add 312.5 µL of 16% formaldehyde (final concentration 1%) to the cell suspension from step A to fix the cells, and incubate with rotation at room temperature for 10 minutes. C. Add 312.5 µL of 2 M glycine (final concentration 0.125 M) to the cell suspension from step B and incubate with rotation at room temperature for 5 minutes to stop the cross-linking reaction.
  • The cells can then be lysed directly to extract the nuclei, or stored temporarily at -80°C.
  • Second sequence tags: among the 96 second sequence tags used in the second round, the final concentration of each second sequence is 16 µM and that of the second linking sequence is 15 µM.
  • First round of ligation: 1) Prepare the nucleus solution according to Table 1, then dispense it into the reaction wells of the 96-well plate for the first round of labeling, 10 µL per well, and mix thoroughly by pipetting. Seal the plate and incubate in a 37°C incubator with slow rotation for 30 min.
  • Terminating the ligation reaction: Add the termination solution (400 mL of 0.5 M EDTA pH 8.0 and 800 mL of H2O) to a new reservoir. Transfer the nuclei incubated in step G to the reservoir, pipetting thoroughly to mix the nuclei with the termination solution after each transfer before adding the next. 1. Transfer all the nuclei to a 15 mL centrifuge tube to obtain about 5 mL of secondary-labeled nucleus solution. The fragmented DNA in the secondary-labeled nuclei carries a first sequence tag and a second sequence tag ligated in sequence.
  • Reagent (stock concentration; final concentration; volume): Tris pH 8.0 (1 M; 20 mM; 0.5 µL); NaCl (5 M; 400 mM; 2 µL); EDTA pH 8.0 (0.5 M; 100 mM; 5 µL); SDS (10%; 4.4%; 11 µL); ddH2O (NA; NA; 6.5 µL).
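The recipe above can be checked with the dilution relation final = stock × volume / total volume. The sketch below reads the three numbers given for each reagent as stock concentration, final concentration and volume, which is an interpretation of the flattened table; under that reading the volumes sum to 25 µL and the computed values match the listed final concentrations.

```python
# (reagent, stock concentration, unit, volume in microliters)
RECIPE = [
    ("Tris pH 8.0", 1000.0, "mM", 0.5),
    ("NaCl", 5000.0, "mM", 2.0),
    ("EDTA pH 8.0", 500.0, "mM", 5.0),
    ("SDS", 10.0, "%", 11.0),
    ("ddH2O", None, None, 6.5),  # diluent, no concentration
]


def final_concentrations(recipe):
    """Return (total volume, {reagent: final concentration}) for a recipe."""
    total = sum(volume for _, _, _, volume in recipe)
    finals = {
        name: stock * volume / total
        for name, stock, _, volume in recipe
        if stock is not None
    }
    return total, finals


if __name__ == "__main__":
    total, finals = final_concentrations(RECIPE)
    print(f"total volume: {total:g} uL")
    for name, _, unit, _ in RECIPE:
        if name in finals:
            print(f"{name}: {finals[name]:g} {unit}")
```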
  • The number of cells in each sub-library is less than 1800.
  • I. Put the number of cells required for each sub-library into a new 1.7 mL tube. Add 1× PBS to each tube to a final volume of 50 µL.
  • J. Add 50 µL of 2× Lysis Buffer to each tube.
  • K. Add 10 µL of proteinase K (20 mg/mL) to each lysate.
  • L. Incubate at 55°C for 2 hours or overnight.
  • TWB (Tween Wash Buffer)
  • Stock solution (final concentration; volume for 10 mL; volume for 50 mL): 1 M Tris-HCl pH 8.0 (10 mM; 100 µL; 500 µL); 0.5 M EDTA pH 8.0 (1 mM; 20 µL; 100 µL); 5 M NaCl (2 M; 4 mL; 20 mL).
  • A. Use bwa mem with default parameters to align read1, which contains the genomic information, to the human and mouse reference genomes.
  • B. Keep the read1 fragments that align to a genome, and record the alignment information to determine which species each read comes from.
  • C. Use fastp to quality-control read2, which contains the identifying first sequence, the second sequence and the UMI (unique molecular identifier) sequence, using the parameter -A to retain the linker sequence.
  • D. Extract barcode1, barcode2 and the UMI from the remaining read2 records.
  • E. Use starcode to cluster the extracted sequence tags and UMIs, using the parameter -d to set the maximum allowed edit distance to 1.
  • F. Remove reads whose sequence tags do not exist in the tag library.
  • G. Group reads containing the same sequence tag combination together, deduplicate the reads according to the UMI information, and then count the human and mouse reads in each group based on the species information extracted from read1.
  • H. Draw a histogram of the number of human and mouse reads in each group, which is generally bimodal, and select a point that separates the two peaks as the threshold. Then classify each group according to the following rules: a) If the numbers of human and mouse reads in a group are both lower than their corresponding thresholds, the group is classified as "non-cellular";
  • If the number of mouse-derived reads in a group is higher than its corresponding threshold, and more than 90% of the reads in the group are mouse-derived, the group is classified as "mouse cells".
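The grouping, deduplication and classification steps can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. The input format (tuples of barcode1, barcode2, UMI and species) is hypothetical; the rule for human cells is assumed to mirror the mouse rule stated above; and the "mixed" label for everything else is likewise an assumption.

```python
from collections import defaultdict


def count_species(reads):
    """Count UMI-deduplicated human and mouse reads per tag combination.

    reads: iterable of (barcode1, barcode2, umi, species) tuples, where
    species is "human" or "mouse" (hypothetical input format).
    """
    seen = set()
    counts = defaultdict(lambda: {"human": 0, "mouse": 0})
    for bc1, bc2, umi, species in reads:
        key = (bc1, bc2, umi)
        if key in seen:  # same tag combination and UMI: treat as a duplicate
            continue
        seen.add(key)
        counts[(bc1, bc2)][species] += 1
    return dict(counts)


def classify(human, mouse, human_thr, mouse_thr, purity=0.9):
    """Classify one group from its deduplicated human and mouse read counts."""
    total = human + mouse
    if human < human_thr and mouse < mouse_thr:
        return "non-cellular"
    if mouse > mouse_thr and mouse > purity * total:
        return "mouse cell"
    if human > human_thr and human > purity * total:  # assumed mirror rule
        return "human cell"
    return "mixed"  # assumed label for groups containing both species


if __name__ == "__main__":
    reads = [
        ("A", "B", "U1", "human"),
        ("A", "B", "U1", "human"),  # duplicate UMI, ignored
        ("A", "B", "U2", "mouse"),
    ]
    for tags, c in count_species(reads).items():
        print(tags, c, classify(c["human"], c["mouse"], human_thr=1, mouse_thr=1))
```

The thresholds passed to `classify` stand in for the values read off the bimodal histograms in step H; in practice they would be chosen separately for human and mouse reads.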
  • From this analysis, a distribution map of the number of DNA fragments per single cell (Figure 1) and a human-mouse cell discrimination efficiency map (Figure 2) can be obtained.
  • In Figure 1, the abscissa represents the number of non-redundant genomic DNA fragments obtained from a single cell, and the ordinate represents the number of cells.
  • In Figure 2, light gray (upper left part of the figure) indicates mouse-derived single cells that were successfully labeled, each containing only one tag code.
  • Light black (lower right part of the figure) indicates successfully labeled human-derived cells, each containing only one tag code.
  • Black (upper right part of the figure) indicates that a cell labeled with one tag code contains both mouse-derived and human-derived DNA, that is, the tag code is contaminated and cannot distinguish single cells.
  • These cells account for 4.62% of the total, which is within the acceptable range for single-cell contamination (<5%).
  • Dark gray (lower left part of the figure) is background noise or DNA fragments that failed to be labeled. Both the abscissa and the ordinate of Figure 2 indicate the number of reads contained in each single cell.
  • The method of Example 1 can therefore distinguish single cells and can be used to construct a DNA library for single-cell genome sequencing.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biochemistry (AREA)
  • Organic Chemistry (AREA)
  • Microbiology (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

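Provided is a method for constructing a DNA library for single-cell genome sequencing. The method includes: fragmenting the DNA in cell nuclei to obtain nuclei with fragmented DNA; labeling the fragmented DNA in multiple nuclei over multiple rounds with different sequence tags, so that the fragmented DNA in each nucleus is attached to a tag code composed of multiple sequence tags and the tag codes differ from nucleus to nucleus; and amplifying the fragmented DNA attached to the tag codes to obtain a DNA library for single-cell genome sequencing.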

Description

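Method for constructing a DNA library for single-cell genome sequencing
Technical Field
The present invention relates to biotechnology, and in particular to a method for constructing a DNA library for single-cell genome sequencing.
Background
Single-cell genome sequencing amplifies and sequences the whole genome at the single-cell level. The whole-genome DNA of an isolated single cell is amplified to obtain a complete genome with high coverage, which is then subjected to high-throughput sequencing; the results can reveal individual differences within a cell population and evolutionary relationships between cells.
At present, single-cell genome sequencing first requires constructing a single-cell genome library, followed by sequencing analysis. However, constructing a single-cell genome library relies on expensive microfluidic platforms and reagents to separate single cells, which is cumbersome and costly.
Summary of the Invention
In view of this, it is necessary to provide a fast, lower-cost method for constructing a DNA library for single-cell genome sequencing.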
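A method for constructing a DNA library for single-cell genome sequencing includes:
fragmenting the DNA in cell nuclei to obtain nuclei with fragmented DNA;
labeling the fragmented DNA in a plurality of the nuclei over multiple rounds with different sequence tags, so that the fragmented DNA in each nucleus is attached to a tag code composed of a plurality of the sequence tags, and the tag codes attached to the fragmented DNA differ from nucleus to nucleus; and
amplifying the fragmented DNA attached to the tag code to obtain a DNA library for single-cell genome sequencing.
In the above method for constructing a DNA library for single-cell genome sequencing, the cell nucleus serves as a reaction chamber for labeling DNA: different sequence tags are used to label the DNA in the nuclei over multiple rounds, so that the DNA in each nucleus ends up attached to a unique tag code formed by the multiple rounds of labeling. The tag code distinguishes different nuclei and thereby individual cells. The method is simple to operate and low in cost.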
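Brief Description of the Drawings
FIG. 1 is a distribution diagram of single-cell DNA fragments in Example 1; FIG. 2 is a diagram of cell discrimination efficiency in Example 1.
Detailed Description
To facilitate understanding of the present invention, it is described more fully below with reference to the accompanying drawings, which show some embodiments of the invention. The invention can, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of the invention is thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the invention. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the invention.
In one embodiment, the method for constructing a DNA library for single-cell genome sequencing uses the cell nucleus as the reaction site for labeling DNA. Different sequence tags are used to label the DNA in the nuclei over multiple rounds, so that the DNA in each nucleus ends up attached to a unique tag code formed by the multiple rounds of labeling; the tag code distinguishes different nuclei and thereby individual cells. Taking 96 types of sequence tags as an example, 96 distinct tag codes are formed after one round of labeling, 9,216 after two rounds and 884,736 after three rounds. Therefore, if the number of cells to be distinguished is less than 884,736, three rounds of labeling can make the tag codes attached to the DNA in each nucleus different. This method does not need microfluidic technology to distinguish single cells; multiple rounds of DNA labeling (DNA ligation reactions) are sufficient. It is easy to operate and low in cost, making it a fast, low-cost method for constructing a DNA library for single-cell genome sequencing.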
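Specifically, the method for constructing a DNA library for single-cell genome sequencing includes steps S110 to S130.
Step S110: Fragment the DNA in the cell nucleus to obtain a cell nucleus with fragmented DNA.
Specifically, the cells are collected and counted, and the cell membranes are lysed to obtain the nuclei; the DNA in the nuclei is then fragmented to obtain nuclei with fragmented DNA. Fragmenting the DNA in the nucleus makes it easier to add sequence tags.
In this embodiment, the cells are fixed with formaldehyde and then counted. Specifically, the collected cells are first mixed with formaldehyde to fix them, and the fixed cells are then counted. In other embodiments, other cell counting methods commonly used in the art can also be used.
In this embodiment, the DNA in the cell nucleus is fragmented by restriction digestion. Specifically, the enzyme used for digestion is Dpn II. Fragmenting the DNA in the nucleus before labeling gives the DNA sticky ends, which makes it easier to ligate the sequence tags to the DNA.
Step S120: Use different sequence tags to label the fragmented DNA in multiple nuclei over multiple rounds, so that the fragmented DNA in each nucleus is attached to a tag code composed of multiple sequence tags, and the tag codes attached to the fragmented DNA differ from nucleus to nucleus.
Specifically, a sequence tag is a base sequence used to form the tag code. During sequencing, fragmented DNA from the same nucleus carries the same tag code and fragmented DNA from different nuclei carries different tag codes. Identifying the tag code distinguishes the fragmented DNA of different nuclei, which enables single-cell genome sequencing. The sequence tag includes an identification part (barcode), which serves as the identifier. In this embodiment there are 200 types of sequence tags, whose identification parts have the base sequences shown in SEQ ID No. 1 to SEQ ID No. 200. Further, the sequence tag also includes a linking part (linker), which is used to connect sequence tags to one another. Furthermore, the identification part and the linking part are connected by complementary base pairing.
In other embodiments, the sequence tags are not limited to 200 types; the number can be chosen according to the total number of cells to be distinguished and the number of labeling rounds, for example 48, 96 or 384 types. The base sequences of the identification parts are likewise not limited to those shown in SEQ ID No. 1 to SEQ ID No. 200 and can be selected according to actual needs, as long as they can serve for identification.
Further, the step of using different sequence tags to label the fragmented DNA in multiple nuclei over multiple rounds, so that the fragmented DNA in each nucleus is attached to a tag code composed of multiple sequence tags and the tag codes differ from nucleus to nucleus, includes steps S121 to S123.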
步骤S121:将多个DNA被片段化的细胞核分组后,采用不同的第一序列标签对各组细胞核内的片段化DNA进行标记,使得各组细胞核内的片段化DNA均连接上第一序列标签,各组细胞核的片段化DNA连接的第一序列标签各不相同,得到多组一级标记细胞核。其中,第一序列标签包括用于标识的第一序列和用于与第二序列标签连接的第一连接序列。第一序列与第一连接序列、第一连接序列与第二序列标签均通过碱基互补配对的方式连接。进一步地,第一序列的5’端连接有磷酸基团。通过磷酸基团能够使得第一序列与片段化DNA连接。
在本实施方式中,第一序列的碱基序列选自如SEQ ID No.1~SEQ ID No.96所示的碱基序列中的一种;第一连接序列的碱基序列如SEQ ID No.201所示。当然,第一序列的碱基序列不限于上述SEQ ID No.1~SEQ ID No.96所示的碱基序列中的一种。在其他实施方式中,还可以是本领域其他常用于起标识作用的碱基序列或根据本领域的常规方法设计的用于起标识作用的碱基序列;同样地,第一连接序列的碱基序列也不限于上述SEQ ID No.201所示的碱基序列。
在本实施方式中,多个细胞的细胞核的分组方式为随机均等分组。
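Specifically, the nuclei with fragmented DNA are divided into groups, mixed with different first sequence tags, and incubated to obtain multiple pre-ligation mixtures containing different first sequence tags. DNA ligase is then added to each pre-ligation mixture, followed by incubation, to obtain multiple groups of first-level labeled nuclei. Incubating the first sequence tags with the nuclei before adding ligase lets the tags enter the nuclei and mix with the fragmented DNA first, so that the DNA fragments are spaced apart by first sequence tags and ligation of DNA fragments to one another is reduced.
Further, the nuclei with fragmented DNA are distributed evenly into different reaction vessels (for example EP tubes or a multi-well plate) containing first sequence tags and mixed, with a different first sequence tag in each vessel. They are then incubated to obtain multiple pre-ligation mixtures containing different first sequence tags. DNA ligase is added to these pre-ligation mixtures and incubated, so that the fragmented DNA inside the nuclei in each vessel is ligated to the first sequence tag, yielding multiple groups of first-level labeled nuclei. The fragmented DNA in all nuclei within the same vessel is ligated to the same first sequence tag, and the first sequence tag differs from vessel to vessel.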
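In one example, after the step of obtaining multiple groups of first-level labeled nuclei (in which the nuclei with fragmented DNA are divided into groups and the fragmented DNA in each group is ligated to a different first sequence tag), the method further includes mixing each group of first-level labeled nuclei with a blocking sequence. Generally, an excess of first sequence tag is mixed with the nuclei so that all fragmented DNA in the nuclei is ligated to a first sequence tag, so free first sequence tag remains after the ligation reaction. If the nuclei from the different reaction vessels were pooled directly at this point, the free tags could interfere with the next round of labeling. Adding a blocking sequence to each vessel after the ligation reaction lets the blocking sequence bind the free first sequence tags in that vessel, which reduces the effect of one round of labeling on the next.
In this embodiment, the base sequence of the blocking sequence is shown in SEQ ID No.203. The blocking sequence is of course not limited to SEQ ID No.203. In other embodiments, it may be another base sequence commonly used in the art for blocking, or a blocking sequence designed by conventional methods in the art.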
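Step S122: Pool the groups of first-level labeled nuclei and divide them into groups again, then label the fragmented DNA in the regrouped first-level labeled nuclei with different second sequence tags, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tag differs from group to group, yielding multiple groups of second-level labeled nuclei. The second sequence tag includes a second sequence used for identification and a second linker sequence used to connect to the first sequence tag.
In this embodiment, the base sequence of the second sequence is selected from the base sequences shown in SEQ ID No.97 to SEQ ID No.192, and the base sequence of the second linker sequence is shown in SEQ ID No.202. In other embodiments, the second sequence is of course not limited to one of SEQ ID No.97 to SEQ ID No.192; it may be another base sequence commonly used in the art, or an identifier sequence designed by conventional methods in the art. Likewise, the second linker sequence is not limited to SEQ ID No.202.
Further, biotin is attached to the 5' end of the second sequence to facilitate later purification of the fragmented DNA carrying the tag code.
In this embodiment, the pooled first-level labeled nuclei are divided into groups randomly and evenly.
More specifically, the first-level labeled nuclei are distributed evenly into different reaction vessels containing second sequence tags and mixed, with a different second sequence tag in each vessel. They are then incubated to obtain multiple pre-ligation mixtures containing different second sequence tags. DNA ligase is added to these pre-ligation mixtures and incubated, so that the first sequence tag on the fragmented DNA inside the nuclei is ligated to the second sequence tag, yielding multiple groups of second-level labeled nuclei. The fragmented DNA in all nuclei within the same vessel is ligated to the same second sequence tag, and the second sequence tag differs from vessel to vessel.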
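Step S123: Pool the groups of second-level labeled nuclei and divide them into groups again, then label the fragmented DNA in the regrouped second-level labeled nuclei with different third sequence tags, so that the fragmented DNA in every group is ligated to a third sequence tag and the third sequence tag differs from group to group, yielding multiple groups of third-level labeled nuclei. The third sequence tag includes a third sequence used for identification.
In this embodiment, the base sequence of the third sequence is selected from the base sequences shown in SEQ ID No.193 to SEQ ID No.200. In other embodiments, the third sequence is of course not limited to one of SEQ ID No.193 to SEQ ID No.200; it may be another base sequence commonly used in the art, or an identifier sequence designed by conventional methods in the art.
More specifically, the second-level labeled nuclei are distributed evenly into different reaction vessels containing third sequence tags and mixed, with a different third sequence tag in each vessel. They are then incubated to obtain multiple pre-ligation mixtures containing different third sequence tags. DNA ligase is added to these pre-ligation mixtures and incubated, so that the second sequence tag on the fragmented DNA inside the nuclei is ligated to the third sequence tag, yielding multiple groups of third-level labeled nuclei. The fragmented DNA in all nuclei within the same vessel is ligated to the same third sequence tag, and the third sequence tag differs from vessel to vessel.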
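Note that in this embodiment the identifier sequences of the first, second, and third sequence tags all differ from one another. In some other embodiments, the identifier sequences of the first, second, and third sequence tags may be the same. For example, the first sequence of the first tag and the second sequence of the second tag may both be the base sequences shown in SEQ ID No.1 to SEQ ID No.96.
In this embodiment, the tag code of the DNA in each nucleus is formed by three rounds of labeling. The product of the number of kinds of first sequence tags, the number of kinds of second sequence tags, and the number of kinds of third sequence tags is greater than the number of nuclei with fragmented DNA. After the three rounds, the tag code of the DNA in each nucleus consists of the first, second, and third sequence tags corresponding to that nucleus, ligated in that order. The number of rounds needed to form a tag code that distinguishes the DNA of different nuclei is of course not limited to three and can be designed according to the number of nuclei to be distinguished and the number of kinds of sequence tags.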
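The pool-and-split procedure of steps S121 to S123 can be simulated in a few lines. The sketch below is an illustration under simplifying assumptions and not the patented protocol itself: every nucleus survives every round, grouping is modeled as a shuffle followed by round-robin dealing into wells, and the well index stands in for the sequence tag. The function name `split_pool` and the example numbers (1000 nuclei; 96, 96, and 8 tags) are hypothetical.

```python
import random
from collections import Counter


def split_pool(n_nuclei, kinds_per_round, seed=0):
    """Return one tag code (tuple of well indices) per nucleus."""
    rng = random.Random(seed)
    codes = [()] * n_nuclei
    for n_kinds in kinds_per_round:
        # Pool all nuclei, then split them randomly and evenly into wells.
        order = list(range(n_nuclei))
        rng.shuffle(order)
        for position, nucleus in enumerate(order):
            well = position % n_kinds  # each well holds a different tag
            codes[nucleus] = codes[nucleus] + (well,)
    return codes


codes = split_pool(1000, (96, 96, 8))
usage = Counter(codes)
# Nuclei whose tag code is shared with at least one other nucleus.
shared = sum(count for count in usage.values() if count > 1)
print("distinct tag codes:", len(usage), "nuclei sharing a code:", shared)
```

Running the simulation with different seeds and nucleus counts gives a feel for how many tag-code combinations are needed before shared codes become rare.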
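After the step in which the fragmented DNA of each nucleus acquires a different tag code, the method further includes lysing the nuclei and purifying the DNA carrying the tag code. For example, if three rounds are enough to form the tag code, then after the third-level labeled nuclei are obtained, they are lysed and the released DNA carrying the first, second, and third sequence tags is purified to obtain DNA carrying the tag code.
Step S130: Amplify the fragmented DNA carrying the tag code to obtain the DNA library for single-cell genome sequencing.
Specifically, the fragmented DNA carrying the tag code is fragmented by Tagmentation and ligated to library adapters, yielding multiple shorter fragments that carry the tag code. These shorter fragments are then amplified to obtain the DNA library for single-cell genome sequencing. In other embodiments, other methods commonly used in the art may be used to fragment the tagged DNA and attach the library adapters.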
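In another embodiment of the method for constructing a DNA library for single-cell genome sequencing, the steps are broadly the same as in the method described above. The difference lies in the step that gives the fragmented DNA of each nucleus a different tag code, which in this embodiment includes:
Dividing the nuclei with fragmented DNA into groups and labeling the fragmented DNA in each group with a different first sequence tag, so that the fragmented DNA in every group is ligated to a first sequence tag and the first sequence tag differs from group to group, yielding multiple groups of first-level labeled nuclei, where the first sequence tag is a sequence tag;
Pooling the groups of first-level labeled nuclei and dividing them into groups again, then labeling the fragmented DNA in the regrouped first-level labeled nuclei with different second sequence tags, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tag differs from group to group, yielding multiple groups of second-level labeled nuclei, where the second sequence tag is a sequence tag;
Pooling the groups of second-level labeled nuclei and dividing them evenly into groups, with the number of nuclei in each group smaller than the product of the number of kinds of first sequence tags and the number of kinds of second sequence tags, to obtain multiple to-be-lysed groups. In each to-be-lysed group, the fragmented DNA in each nucleus carries a tag code composed of the first and second sequence tags, and the tag code differs from nucleus to nucleus; or the probability that nuclei in the same to-be-lysed group carry the same tag code is less than 5%;
Lysing one of the to-be-lysed groups to release the fragmented DNA carrying the first and second sequence tags from each nucleus, then fragmenting that DNA by Tagmentation and attaching library adapters containing a third sequence tag, yielding multiple DNA fragments carrying a tag code composed of the first, second, and third sequence tags, where the tag codes containing the third sequence tag differ from one another.
In this embodiment, the third round of labeling is replaced by pooling the second-level labeled nuclei, dividing them evenly into groups, and keeping the number of nuclei in each group smaller than the product of the number of kinds of first sequence tags and the number of kinds of second sequence tags. After the nuclei of each group are lysed, the fragmented DNA from each nucleus carries a tag code composed of the first and second sequence tags, and the tag code differs from nucleus to nucleus.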
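The text does not say how the "less than 5%" probability is computed. One common way to reason about it, shown here purely as an assumption, is to treat each nucleus as drawing its two-round tag code uniformly at random and to ask how likely a given nucleus is to share its code with any other nucleus in the same group. The names `collision_probability` and `max_group_size` are hypothetical, and real collision rates also depend on uneven well loading and nuclei that stick together.

```python
def collision_probability(n_nuclei: int, n_codes: int) -> float:
    """Chance that one given nucleus shares its tag code with another
    nucleus in the same group, assuming uniform random tag codes."""
    if n_nuclei < 1 or n_codes < 1:
        raise ValueError("n_nuclei and n_codes must be positive")
    return 1.0 - (1.0 - 1.0 / n_codes) ** (n_nuclei - 1)


def max_group_size(n_codes: int, max_probability: float = 0.05) -> int:
    """Largest group size whose collision probability stays below the limit."""
    size = 1
    while collision_probability(size + 1, n_codes) < max_probability:
        size += 1
    return size


n_codes = 96 * 96  # first-round kinds times second-round kinds
for group_size in (100, 500, 1000):
    print(group_size, round(collision_probability(group_size, n_codes), 4))
```

Under other definitions of the collision rate (for example, counting only collisions that are detectable in a two-species mixing experiment) the acceptable group size would come out differently.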
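Specific Examples
The invention is described in detail below with reference to specific examples. Unless otherwise stated, the reagents and instruments used in the examples are conventional choices in the art. Experimental methods for which no specific conditions are given are carried out under conventional conditions, such as those described in the literature and in textbooks, or as recommended by the manufacturer.
Example 1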
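(1) Sangon Biotech (Shanghai) Co., Ltd. was commissioned to synthesize the first sequence tags, second sequence tags, blocking sequence, and library adapters. There are 96 first sequence tags, each consisting of the first linker sequence and a first sequence attached to it. The base sequences of the 96 first sequences are shown in SEQ ID No.1 to SEQ ID No.96, each first sequence carries a phosphate group at its 5' end, and the first linker sequence of all 96 first sequence tags is shown in SEQ ID No.201. There are 96 second sequence tags, each consisting of the second linker sequence and a second sequence attached to it. The base sequences of the 96 second sequences are shown in SEQ ID No.97 to SEQ ID No.192, each second sequence carries biotin at its 5' end, and the second linker sequence of all 96 second sequence tags is shown in SEQ ID No.202. The blocking sequence can base-pair with the 5' end of the first linker sequence of the 96 first sequence tags; its base sequence is shown in SEQ ID No.203. The library adapters comprise one i5 adapter and eight i7 adapters. The base sequence of the i5 adapter is shown in SEQ ID No.204, and the base sequences of the eight i7 adapters are shown in SEQ ID No.205 to SEQ ID No.212. The eight i7 adapters contain the eight third sequence tags, whose sequences are shown in SEQ ID No.193 to SEQ ID No.200.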
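(2) Cell collection and cross-linking: Human cells (293T) and mouse cells (CT26) were collected and cross-linked separately, using the following procedure for both. A. Collect 1×10^6 freshly cultured cells by centrifugation at 1500 rpm for 3 min and resuspend them in 1 mL DMEM medium. B. Add 312.5 μL of 16% formaldehyde (concentration 1%) to the cell suspension from step A to fix the cells, and incubate with rotation at room temperature for 10 min. C. Add 312.5 μL of 2 M glycine (final concentration 0.125 M) to the cell suspension from step B and incubate with rotation at room temperature for 5 min to stop the cross-linking reaction, then incubate on ice for 15 min. D. Centrifuge at 1500 rpm for 3 min to collect the cells. E. Wash once with 1× PBS buffer. F. After discarding the supernatant, the cells can be lysed directly to extract nuclei or stored temporarily at -80°C.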
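(3) Cell lysis and fragmentation of the DNA inside the nuclei:
A. Count the human cells and mouse cells obtained in step (2) separately, then mix them 1:1 to a total of 1×10^5 cells. B. Add 500 μL of pre-chilled lysis buffer (a mixture of 10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2% Igepal CA-630, and EDTA-free protease inhibitor) to the mixture of human and mouse cells from step A, resuspend thoroughly, and incubate on ice for 30 min so that the cells lyse completely. C. Centrifuge at 650 g for 5 min at 4°C, remove the supernatant, and collect the nuclei. D. Wash the nuclei twice with 500 μL of 1× Dpn II buffer. E. Resuspend the nuclei in 362 μL of 1× Dpn II buffer. F. Increase nuclear membrane permeability: add 38 μL of 1% SDS to the nuclei from step E and mix by gentle pipetting, avoiding bubbles. Incubate at 65°C for 10 min, place immediately on ice, add 44 μL of 10% Triton X-100, and mix by gentle pipetting, avoiding bubbles. G. Enzymatic digestion of chromatin: after permeabilization, add 50 μL of 1% BSA, 10 μL of 10× Dpn II buffer, and 20 μL of Dpn II (NEB), and incubate overnight at 37°C with rotation (50 rpm).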
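(4) Labeling the DNA inside the nuclei
A. Treat the nuclei at 65°C for 20 min to inactivate Dpn II. B. Pass the nuclei through 40 μm and then 20 μm filters to remove clumps of cells. C. Prepare two 96-well plates containing the first and second sequence tags. 1) First sequence tags: for the 96 first sequence tags used in round 1, the final concentration of each first sequence is 14 μM and that of the first linker sequence is 13 μM. Add 14 μL of first sequence to each well of the 96-well plate, with a different first sequence in each well; then add 13 μL of first linker sequence to each well; finally add 73 μL of water to each well. 2) Second sequence tags: for the 96 second sequence tags used in round 2, the final concentration of each second sequence is 16 μM and that of the second linker sequence is 15 μM. Add 16 μL of second sequence to each well of the 96-well plate, with a different second sequence in each well; then add 15 μL of second linker sequence to each well; finally add 69 μL of water to each well. Before use, anneal each 96-well plate with the following thermal program: heat to 95°C for 2 min; cool to 20°C at -0.1°C/s; then hold at 4°C. This gives the 96-well plate for round-1 labeling and the 96-well plate for round-2 labeling. D. Round-1 ligation: 1) Prepare the nuclei solution according to Table 1 and dispense 10 μL into each well of the round-1 96-well plate, mixing thoroughly by pipetting. Seal the plate with adhesive sealing film and incubate with slow rotation in a 37°C incubator for 30 min.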
Table 1 (provided as an image in the original publication: PCTCN2020129463-appb-000001)
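2) Prepare the ligase solution according to Table 2 and dispense 10 μL into each well of the 96-well plate that already contains nuclei and ligase buffer, mixing thoroughly by pipetting. Seal the plate with adhesive sealing film and incubate with slow rotation at room temperature for 2 hours.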
Table 2 (provided as an image in the original publication: PCTCN2020129463-appb-000002)
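3) Blocking after round-1 ligation: after the incubation in step 2), add 10 μL of blocking sequence to each well, seal with self-adhesive sealing film, and incubate with slow rotation (50 rpm) in a 37°C incubator for 30 min.
E. After blocking, take out the 96-well plate, remove the sealing film, and pool all nuclei in a reagent reservoir. F. Pass the nuclei through a 20 μm filter into a new reservoir to remove clumps of nuclei. G. Round-2 ligation: add 100 μL of T4 DNA ligase to the nuclei solution and mix by pipetting 20 times, avoiding bubbles. Transfer 28 μL of nuclei into each well of the annealed round-2 96-well plate. Incubate with slow rotation (50 rpm) in a 37°C incubator for 30 min. H. Stop the ligation reaction: add stop solution (composed of 400 mL of 0.5 M EDTA pH 8.0 and 800 mL of H2O) to a new reservoir. Then transfer the nuclei incubated in step G into the reservoir, mixing each transferred portion thoroughly with the stop solution by pipetting before adding the next. I. Transfer all nuclei to one 15 mL centrifuge tube to obtain about 5 mL of second-level labeled nuclei solution. The fragmented DNA in these second-level labeled nuclei carries the first sequence tag followed by the second sequence tag.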
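The per-well volumes in step C can be checked with the usual dilution relation (final concentration = stock concentration × added volume / total volume). The text gives volumes and final concentrations but not the stock concentration of the oligonucleotides; the 100 μM stock used below is an assumption inferred from those numbers, and the names `PLATE_VOLUMES_UL`, `ASSUMED_STOCK_UM`, and `well_summary` are hypothetical.

```python
# Per-well volumes (in microliters) quoted in step C.
PLATE_VOLUMES_UL = {
    "round 1": {"identifier sequence": 14.0, "linker sequence": 13.0, "water": 73.0},
    "round 2": {"identifier sequence": 16.0, "linker sequence": 15.0, "water": 69.0},
}

# Assumption: oligo stocks are at 100 micromolar (not stated in the text).
ASSUMED_STOCK_UM = 100.0


def final_concentration_um(stock_um, added_ul, total_ul):
    """Final concentration after diluting `added_ul` of stock to `total_ul`."""
    return stock_um * added_ul / total_ul


def well_summary(volumes_ul, stock_um=ASSUMED_STOCK_UM):
    """Return (final concentrations of the oligos, total well volume)."""
    total_ul = sum(volumes_ul.values())
    concentrations = {
        name: final_concentration_um(stock_um, volume, total_ul)
        for name, volume in volumes_ul.items()
        if name != "water"
    }
    return concentrations, total_ul


for plate, volumes in PLATE_VOLUMES_UL.items():
    print(plate, well_summary(volumes))
```

With the assumed stock concentration, the computed values reproduce the 14/13 μM and 16/15 μM final concentrations given for the two plates.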
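(5) Reversal of DNA-protein cross-links and lysis of the nuclei:
A. Prepare 2× lysis buffer according to Table 3:
Table 3
Reagent | Stock concentration | Final concentration | Volume (25 μL total)
Tris, pH 8.0 | 1 M | 20 mM | 0.5 μL
NaCl | 5 M | 400 mM | 2 μL
EDTA, pH 8.0 | 0.5 M | 100 mM | 5 μL
SDS | 10% | 4.4% | 11 μL
ddH2O | NA | NA | 6.5 μL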
B. Prepare the following wash buffer:
Table 4
Reagent | Volume
1× PBS | 4000 μL
10% Triton X-100 | 40 μL
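C. Add 10% Triton X-100 to the second-level labeled nuclei solution from step (4) at a ratio of 100:1 (final Triton X-100 concentration 0.1%). D. Centrifuge at 1000 g for 5 min at 4°C, carefully discard the supernatant, resuspend the pellet in 4 mL of wash buffer, and mix thoroughly to wash the nuclei. E. Centrifuge at 1000 g for 5 min at 4°C, then aspirate the supernatant and resuspend the nuclei in 50 μL of PBS. F. Dilute 5 μL into 5 μL of 1× PBS and count with a hemocytometer. H. Determine the number of cells per sub-library from the number of kinds of first and second sequence tags; in this example, each sub-library contains fewer than 1800 cells. I. Transfer the number of cells needed for each sub-library into a new 1.7 mL tube and add 1× PBS to a final volume of 50 μL per tube. J. Add 50 μL of 2× lysis buffer to each tube. K. Add 10 μL of proteinase K (20 mg/mL) to each lysate. L. Incubate at 55°C for 2 hours or overnight.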
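The volumes in Table 3 follow from the dilution relation volume = (final concentration / stock concentration) × total volume, with water making up the remainder. The sketch below recomputes them for an arbitrary batch size; it is only a convenience calculation, and the names `LYSIS_BUFFER_2X` and `buffer_recipe` are hypothetical.

```python
# (reagent, stock concentration, final concentration); the two
# concentrations of each row share a unit (mM for salts, % for SDS).
LYSIS_BUFFER_2X = [
    ("Tris, pH 8.0", 1000.0, 20.0),
    ("NaCl", 5000.0, 400.0),
    ("EDTA, pH 8.0", 500.0, 100.0),
    ("SDS", 10.0, 4.4),
]


def buffer_recipe(components, total_ul):
    """Volume of each stock (in microliters) for a batch of `total_ul`."""
    volumes = {
        name: final / stock * total_ul for name, stock, final in components
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested batch")
    volumes["ddH2O"] = water
    return volumes


for reagent, volume in buffer_recipe(LYSIS_BUFFER_2X, 25.0).items():
    print(f"{reagent}: {volume:.1f} uL")
```

For a 25 μL batch this reproduces the volumes listed in Table 3; scaling `total_ul` gives the volumes for larger batches, for example when many sub-libraries are lysed in parallel.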
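(6) Purification of the fragmented DNA carrying the first and second sequence tags:
A. Add 2 μL of streptavidin magnetic beads to a 1.5 mL tube containing 400 μL of Tween Wash Buffer (TWB) and mix by rotation at room temperature for 2 min. The TWB recipe is shown in Table 5:
Table 5
Stock solution | Final concentration | For 10 mL | For 50 mL
1 M Tris-HCl pH 8.0 | 5 mM | 50 μL | 250 μL
0.5 M EDTA pH 8.0 | 0.5 mM | 10 μL | 50 μL
5 M NaCl | 1 M | 2 mL | 10 mL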
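B. Place the tube on a magnetic rack, wait until the solution clears, and remove the supernatant. C. Repeat steps A and B once more. D. Bring the volume of the nuclear lysate obtained in step C to 400 μL with ddH2O. E. Resuspend the beads in 400 μL of 2× Binding Buffer (BB) together with the 400 μL of nuclear lysate. The recipe for 2× Binding Buffer (BB) is shown in Table 6:
Table 6
Stock solution | Final concentration | For 10 mL | For 50 mL
1 M Tris-HCl pH 8.0 | 10 mM | 100 μL | 500 μL
0.5 M EDTA pH 8.0 | 1 mM | 20 μL | 100 μL
5 M NaCl | 2 M | 4 mL | 20 mL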
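F. Incubate with rotation at room temperature for 15 min so that the biotin-labeled fragments bind to the streptavidin beads. G. Place the tube on a magnetic rack, wait until the solution clears, and discard the supernatant. H. Resuspend the beads in 400 μL of 1× Binding Buffer and transfer to a new LoBind tube. I. Place the tube on a magnetic rack, wait until the solution clears, and discard the supernatant. J. Resuspend the beads in 100 μL of 1× Binding Buffer and transfer to a new LoBind tube. K. Place the tube on a magnetic rack, wait until the solution clears, and discard the supernatant. L. Resuspend the beads in 20 μL of ddH2O.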
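(7) Fragmenting the DNA into small fragments by Tagmentation and inserting the library adapters:
A. Thaw the 5× TTBL buffer on ice and set up the tagmentation reaction according to Table 7. First mix the TTBL, purified bead-bound DNA, TTE Mix V5, and H2O thoroughly, avoiding bubbles, then incubate at 55°C for 10 min and cool quickly to 4°C. Finally add 7.5 μL of 1% SDS to the tube, mix thoroughly by pipetting, and incubate at 55°C for 15 min.
Table 7
Component | Volume
5× TTBL | 6 μL
Purified bead-bound DNA | 20 μL
TTE Mix V5 (Tn5) | 2 μL
H2O | to 30 μL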
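B. Place the tube on a magnetic rack, wait until the solution clears, and discard the supernatant. C. Resuspend the beads in 800 μL of 1× BB and transfer to a new LoBind tube. D. Place the tube on a magnetic rack, wait until the solution clears, and discard the supernatant. E. Resuspend the beads in 100 μL of 1× BB and transfer to a new LoBind tube. Place the tube on a magnetic rack, wait until the solution clears, and remove the supernatant. F. Resuspend the beads in 20 μL of ddH2O.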
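(8) Library amplification: the Vazyme TruePrep DNA Library Prep Kit V2 (TD502) from Vazyme was used. The amplification mix is shown in Table 8 and the amplification conditions in Table 9.
Table 8
Component | Volume (50 μL PCR mix)
5× TAB | 10 μL
TAE | 1 μL
i5 adapter (2.5 μM) | 1 μL
i7 adapter (2.5 μM) | 1 μL
H2O | 17 μL
Beads obtained in step (6) | 20 μL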
Table 9 (provided as an image in the original publication: PCTCN2020129463-appb-000003)
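(9) Library size selection and purification: AMPure XP beads were used to size-select and purify the fragments, removing primer dimers and yielding DNA fragments of 300 bp to 500 bp:
A. Before use, leave the Vazyme VAHTS DNA beads at room temperature for 30 min to equilibrate. B. Centrifuge briefly and take the supernatant of the PCR product. Add DNA beads to the supernatant at a ratio of 0.55×. C. Pipette up and down at least 10 times to mix thoroughly. D. Let stand at room temperature for 5 min. E. Capture the beads on a magnetic rack for about 5 min, then transfer the supernatant to a new tube. F. Add 0.15× volume of beads to the supernatant and pipette up and down at least 10 times to mix thoroughly. G. Let stand at room temperature for 5 min. H. Capture the beads on a magnetic rack for about 5 min and discard the supernatant. I. Wash the beads twice with 1 mL of freshly prepared 70% ethanol, taking care not to aspirate the beads. J. After removing the supernatant, leave the tube on the magnetic rack and air-dry the beads. K. Resuspend the beads in 30 μL of ddH2O, pipetting more than 10 times to mix thoroughly. L. Let stand at room temperature for 10 min, tapping the tube every 2 min. M. Place the tube on a magnetic rack for 5 min. N. Transfer the supernatant, which contains the final library, to a new tube. O. Check the library size by electrophoresis on a 2% agarose gel (5 μL of library) and quantify with Qubit (1 μL of library). P. The library was sent to Shenzhen HaploX Biotechnology Co., Ltd. for sequencing in PE150 mode on the HiSeq X Ten platform.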
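(10) Quality analysis of the sequencing data by bioinformatic methods.
A. Align read1, which contains the genomic sequence, to the human and mouse reference genomes with bwa mem using default parameters. B. Keep the read1 fragments that align to a genome and record the alignment information to determine which species each read comes from. C. Use fastp for quality control of read2, which contains the identifying first sequence, the second sequence, and the UMI (unique molecular identifier) sequence, with the -A option to keep the adapter sequence. D. Extract barcode1, barcode2, and the UMI from the remaining read2 file. E. Cluster the extracted sequence tags and UMIs with starcode, using the -d option to set the maximum allowed edit distance to 1. F. Remove reads whose sequence tags are not present in the tag library. G. Group reads with the same combination of sequence tags into one cluster and deduplicate reads by UMI, then use the species information obtained from read1 to record the number of human and mouse reads in each cluster. H. Plot histograms of the human and mouse read counts per cluster. These are generally bimodal; choose the point that just separates the two peaks as the threshold. Then classify each cluster by the following rules:
a) If both the human and the mouse read counts of a cluster are below their respective thresholds, classify it as "non-cell".
b) If the human read count of a cluster is above its threshold and at least 90% of the reads in the cluster are human, classify it as "human cell".
c) If the mouse read count of a cluster is above its threshold and at least 90% of the reads in the cluster are mouse, classify it as "mouse cell".
d) If none of the conditions in a), b), and c) is met, classify it as "mixed cell".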
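Rules a) to d) translate directly into a small decision function. The sketch below is one reading of those rules rather than the authors' pipeline: "at least 90%" is treated as inclusive, a count exactly equal to its threshold counts as neither above nor below, and the function name `classify_cluster` and the example thresholds are hypothetical.

```python
def classify_cluster(human_reads, mouse_reads,
                     human_threshold, mouse_threshold, purity=0.9):
    """Classify one tag-code cluster from its deduplicated read counts."""
    total = human_reads + mouse_reads
    # Rule a): both species fall below their thresholds.
    if human_reads < human_threshold and mouse_reads < mouse_threshold:
        return "non-cell"
    # Rule b): enough human reads and at least 90% of reads are human.
    if human_reads > human_threshold and human_reads >= purity * total:
        return "human cell"
    # Rule c): enough mouse reads and at least 90% of reads are mouse.
    if mouse_reads > mouse_threshold and mouse_reads >= purity * total:
        return "mouse cell"
    # Rule d): everything else.
    return "mixed cell"


# Hypothetical clusters with thresholds of 100 reads for each species.
for human, mouse in [(5, 3), (950, 50), (20, 980), (500, 500)]:
    print(human, mouse, classify_cluster(human, mouse, 100, 100))
```

The thresholds themselves are not computed here; as described in step H, they are read off the bimodal read-count histograms.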
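After the raw sequencing data are processed as in step (10), a distribution of the number of DNA fragments per single cell (Fig. 1) and a plot of human-mouse cell separation efficiency (Fig. 2) are obtained. In Fig. 1, the horizontal axis shows the number of non-redundant genomic DNA fragments obtained per single cell and the vertical axis shows the number of cells. In Fig. 2, light gray (upper left of the plot) marks successfully labeled single cells of mouse origin, each containing only one tag code. Light black (lower right) marks successfully labeled cells of human origin, each containing only one tag code. Black (upper right) marks cells in which one tag code labels DNA of both mouse and human origin; that is, the tag code is contaminated and the single cells cannot be distinguished. These cells account for 4.62% of the total, which is within the acceptable range for single-cell contamination (<5%). Dark gray (lower left) is background noise or DNA fragments for which labeling failed. Both the horizontal and the vertical axes of Fig. 2 show the number of reads contained in each single cell.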
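The per-cell counts plotted in Fig. 1 and Fig. 2 come from grouping reads by tag combination and deduplicating by UMI (step G). A minimal sketch of that bookkeeping is shown below, assuming reads have already been reduced to (barcode1, barcode2, UMI, species) tuples and that a UMI seen twice within one cell is a duplicate. The function name `count_unique_fragments` and the example barcodes are hypothetical; a production pipeline would typically also consider the alignment position.

```python
from collections import defaultdict


def count_unique_fragments(reads):
    """Count non-redundant human and mouse fragments per tag combination.

    `reads` is an iterable of (barcode1, barcode2, umi, species) tuples.
    """
    umis_by_cell = defaultdict(dict)
    for barcode1, barcode2, umi, species in reads:
        # Keep only the first read seen for each UMI within a cell.
        umis_by_cell[(barcode1, barcode2)].setdefault(umi, species)
    counts = {}
    for cell, umis in umis_by_cell.items():
        species_seen = list(umis.values())
        counts[cell] = {
            "human": species_seen.count("human"),
            "mouse": species_seen.count("mouse"),
        }
    return counts


example_reads = [
    ("BC1_01", "BC2_07", "AAAC", "human"),
    ("BC1_01", "BC2_07", "AAAC", "human"),  # duplicate of the first read
    ("BC1_01", "BC2_07", "GGTA", "human"),
    ("BC1_02", "BC2_07", "AAAC", "mouse"),  # same UMI, different cell
]
print(count_unique_fragments(example_reads))
```

The resulting per-cell human and mouse counts are the quantities that the classification rules of step H operate on.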
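Fig. 1 and Fig. 2 therefore show that the method of Example 1 can distinguish single cells and can be used to construct a DNA library for single-cell genome sequencing.
The technical features of the embodiments described above may be combined in any way. For brevity, not every possible combination of these features is described; however, any combination that involves no contradiction should be regarded as within the scope of this specification. The embodiments above represent only several implementations of the invention. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the patent. Those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and these all fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.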

Claims (10)

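  1. A method for constructing a DNA library for single-cell genome sequencing, characterized by comprising:
    Step S110: fragmenting the DNA inside nuclei to obtain nuclei whose DNA has been fragmented;
    Step S120: labeling the fragmented DNA in a plurality of the nuclei over multiple rounds with different sequence tags, so that the fragmented DNA in each nucleus carries a tag code composed of a plurality of the sequence tags, the tag code of the fragmented DNA differing from nucleus to nucleus; and
    Step S130: amplifying the fragmented DNA carrying the tag code to obtain a DNA library for single-cell genome sequencing.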
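  2. The method for constructing a DNA library for single-cell genome sequencing according to claim 1, characterized in that step S120 comprises:
    dividing the nuclei with fragmented DNA into groups and labeling the fragmented DNA in each group of nuclei with a different first sequence tag, so that the fragmented DNA in every group is ligated to a first sequence tag and the first sequence tag differs from group to group, to obtain a plurality of groups of first-level labeled nuclei;
    pooling the groups of first-level labeled nuclei and dividing them into groups, then labeling the fragmented DNA in the regrouped first-level labeled nuclei with different second sequence tags, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tag differs from group to group, to obtain a plurality of groups of second-level labeled nuclei; and
    pooling the groups of second-level labeled nuclei and dividing them into groups, then labeling the fragmented DNA in the regrouped second-level labeled nuclei with different third sequence tags, so that the fragmented DNA in every group is ligated to a third sequence tag and the third sequence tag differs from group to group, to obtain a plurality of groups of third-level labeled nuclei, wherein the product of the number of kinds of first sequence tags, the number of kinds of second sequence tags, and the number of kinds of third sequence tags is greater than the number of nuclei with fragmented DNA.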
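  3. The method for constructing a DNA library for single-cell genome sequencing according to claim 2, characterized in that the tag code of the fragmented DNA of each nucleus consists of the first sequence tag, the second sequence tag, and the third sequence tag corresponding to the fragmented DNA of that nucleus, ligated in that order.
  4. The method for constructing a DNA library for single-cell genome sequencing according to claim 2, characterized in that the step of dividing the nuclei with fragmented DNA into groups and labeling the fragmented DNA in each group with a different first sequence tag to obtain a plurality of groups of first-level labeled nuclei comprises:
    dividing the nuclei with fragmented DNA into groups, mixing them with different first sequence tags, and incubating to obtain a plurality of pre-ligation mixtures containing different first sequence tags; and
    adding DNA ligase to each pre-ligation mixture and incubating to obtain a plurality of groups of first-level labeled nuclei.
  5. The method for constructing a DNA library for single-cell genome sequencing according to claim 2, characterized in that, after the step of dividing the nuclei with fragmented DNA into groups and labeling the fragmented DNA in each group with a different first sequence tag to obtain a plurality of groups of first-level labeled nuclei, the method further comprises mixing each group of first-level labeled nuclei with a blocking sequence.
  6. The method for constructing a DNA library for single-cell genome sequencing according to claim 5, characterized in that the base sequence of the blocking sequence is shown in SEQ ID No.203.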
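  7. The method for constructing a DNA library for single-cell genome sequencing according to any one of claims 2 to 6, characterized in that the first sequence tag includes a first sequence used for identification, the base sequence of the first sequence being selected from the base sequences shown in SEQ ID No.1 to SEQ ID No.96; and/or
    the second sequence tag includes a second sequence used for identification, the base sequence of the second sequence being selected from the base sequences shown in SEQ ID No.97 to SEQ ID No.192; and/or
    the third sequence tag includes a third sequence used for identification, the base sequence of the third sequence being selected from the base sequences shown in SEQ ID No.193 to SEQ ID No.200.
  8. The method for constructing a DNA library for single-cell genome sequencing according to claim 7, characterized in that the first sequence tag further includes a first linker sequence for connecting to the second sequence tag, the base sequence of the first linker sequence being shown in SEQ ID No.201; and/or
    the second sequence tag further includes a second linker sequence for connecting to the first sequence tag, the base sequence of the second linker sequence being shown in SEQ ID No.202.
  9. The method for constructing a DNA library for single-cell genome sequencing according to any one of claims 2 to 6, characterized in that a phosphate group is attached to the 5' end of the first sequence and biotin is attached to the 5' end of the second sequence.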
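  10. The method for constructing a DNA library for single-cell genome sequencing according to claim 1, characterized in that step S120 comprises:
    dividing the nuclei with fragmented DNA into groups and labeling the fragmented DNA in each group of nuclei with a different first sequence tag, so that the fragmented DNA in every group is ligated to a first sequence tag and the first sequence tag differs from group to group, to obtain a plurality of groups of first-level labeled nuclei, wherein the first sequence tag is a sequence tag;
    pooling the groups of first-level labeled nuclei and dividing them into groups, then labeling the fragmented DNA in the regrouped first-level labeled nuclei with different second sequence tags, so that the fragmented DNA in every group is ligated to a second sequence tag and the second sequence tag differs from group to group, to obtain a plurality of groups of second-level labeled nuclei, wherein the second sequence tag is a sequence tag;
    pooling the groups of second-level labeled nuclei and dividing them evenly into groups to obtain a plurality of to-be-lysed groups, wherein the fragmented DNA in each nucleus of a to-be-lysed group carries a tag code composed of the first sequence tag and the second sequence tag, and the tag code differs from nucleus to nucleus; or the probability that nuclei in the same to-be-lysed group carry the same tag code is less than 5%;
    lysing one of the to-be-lysed groups to release the fragmented DNA carrying the first and second sequence tags from each nucleus, then fragmenting that DNA by Tagmentation and attaching library adapters containing a third sequence tag, to obtain a plurality of DNA fragments carrying a tag code composed of the first sequence tag, the second sequence tag, and the third sequence tag, the tag codes containing the third sequence tag differing from one another.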
PCT/CN2020/129463 2019-12-18 2020-11-17 单细胞基因组测序用的dna文库的构建方法 WO2021120959A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911311804.2 2019-12-18
CN201911311804.2A CN110952147B (zh) 2019-12-18 2019-12-18 单细胞基因组测序用的dna文库的构建方法

Publications (1)

Publication Number Publication Date
WO2021120959A1 true WO2021120959A1 (zh) 2021-06-24

Family

ID=69982711

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129463 WO2021120959A1 (zh) 2019-12-18 2020-11-17 单细胞基因组测序用的dna文库的构建方法

Country Status (2)

Country Link
CN (1) CN110952147B (zh)
WO (1) WO2021120959A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110952147B (zh) * 2019-12-18 2023-05-05 南方科技大学 单细胞基因组测序用的dna文库的构建方法
CN115058503A (zh) * 2022-06-24 2022-09-16 广州市碳码科技有限责任公司 一种barcode微滴的单细胞测序方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110268059A (zh) * 2016-07-22 2019-09-20 俄勒冈健康与科学大学 单细胞全基因组文库及制备其的组合索引方法
CN110438572A (zh) * 2019-09-16 2019-11-12 上海其明信息技术有限公司 单细胞测序的建库方法
CN110952147A (zh) * 2019-12-18 2020-04-03 南方科技大学 单细胞基因组测序用的dna文库的构建方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3553180B1 (en) * 2016-12-07 2022-05-04 MGI Tech Co., Ltd. Method for constructing single cell sequencing library and use thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VIJAY RAMANI, XINXIAN DENG, RUOLAN QIU, KEVIN L GUNDERSON, FRANK J STEEMERS, CHRISTINE M DISTECHE, WILLIAM S NOBLE, ZHIJUN DUAN, J: "Massively multiplex single-cell Hi-C", NAT METHODS, vol. 14, no. 3, 30 January 2017 (2017-01-30), pages 263 - 266, XP055673378, ISSN: 1548-7091, DOI: 10.1038/nmeth.4155 *

Also Published As

Publication number Publication date
CN110952147B (zh) 2023-05-05
CN110952147A (zh) 2020-04-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902563

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20902563

Country of ref document: EP

Kind code of ref document: A1