WO2023134719A1 - High-throughput transcript profiling sequencing library construction method based on probe hybridization - Google Patents

High-throughput transcript profiling sequencing library construction method based on probe hybridization

Info

Publication number
WO2023134719A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
primer
probe
cells
target region
Prior art date
Application number
PCT/CN2023/071872
Other languages
English (en)
French (fr)
Inventor
刘洋
李军
赵扬
Original Assignee
南京昕瑞再生医药科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京昕瑞再生医药科技有限公司
Publication of WO2023134719A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6851 Quantitative amplification
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 Ligating adaptors
    • C12Q1/6862 Ligase chain reaction [LCR]
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C40B70/00 Tags or labels specially adapted for combinatorial chemistry or libraries, e.g. fluorescent tags or bar codes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly

Definitions

  • the invention relates to the field of biomedicine. Specifically, the present invention relates to a high-throughput transcript profiling sequencing library construction method based on probe hybridization, which can be used for high-throughput and low-cost drug screening.
  • RNA-seq is a powerful tool for studying drug effects using transcriptome changes as markers, but standard library construction is costly.
  • Hindrek Teder et al. (npj Genomic Medicine (2018) 3:34) developed TAC-seq technology for precise quantification of specific nucleic acid biomarkers.
  • TAC-seq requires RNA to be extracted from each sample separately with a kit or Trizol, its probe hybridization and ligation steps are performed separately, and its single barcode limits the number of samples that can be screened. Therefore, there is still a need in the art for improved high-throughput transcript profiling sequencing library construction methods.
  • FIG. 1 The development process and schematic diagram of the high-throughput transcript profiling sequencing technology based on probe hybridization.
  • A Schematic diagram of PHDs-seq;
  • B PHDs-seq process and schematic diagram.
  • FIG. 2 Establishment of a system for inducing adipocytes from keloid fibroblasts and its use for testing PHDs-seq.
  • A Schematic diagram of the isolation of human keloid fibroblasts and their induction into adipocytes;
  • B Flow cytometric sorting results of human keloid fibroblasts (sorting by CD90);
  • (C) Morphology of isolated keloid cells, scale bar 50 μm; (D) Reprogramming of isolated human keloid fibroblasts into adipocytes; the induction medium is AD medium; Nile red staining results, scale bar 200 μm; (E) List of characteristic genes used to test PHDs-seq; (F) Gel electrophoresis results of each sub-library (1-8) and the mixed library (9) of PHDs-seq, DNA marker: 2k plus; (G) Quality inspection results of the large library obtained by pooling the PHDs-seq sub-libraries; (H) Quality control results of the PHDs-seq library: distribution of base quality scores at different positions in the reads.
  • FIG. 3 Analysis and evaluation of PHDs-seq sequencing results.
  • A Heat map of characteristic gene expression for each sub-library after splitting the sequencing results of the PHDs-seq mixed library, normalized as Log10(CPM+1); Posi: positive; KF: keloid fibroblast; D5: five days of treatment; D8: eight days of treatment;
  • B a hierarchical clustering diagram between samples based on the expression of characteristic genes;
  • C a correlation analysis diagram between samples obtained based on the expression of characteristic genes.
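  • The Log10(CPM+1) normalization named in the heat map legend can be sketched as follows. This is a minimal illustration rather than the applicant's analysis pipeline; the function name `log10_cpm` and the example counts are hypothetical.

```python
import math

def log10_cpm(counts):
    """Convert raw per-gene counts for one sample to log10(CPM + 1).

    CPM (counts per million) scales each gene count by the sample total,
    so samples sequenced to different depths become comparable.
    """
    total = sum(counts.values())
    if total == 0:
        # No reads at all: every gene is log10(0 + 1) = 0
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log10(count / total * 1_000_000 + 1)
        for gene, count in counts.items()
    }

# Hypothetical counts for one sample (illustrative values only)
sample = {"ACTB": 500, "FABP4": 300, "ADIPOQ": 200}
normalized = log10_cpm(sample)
```

  • The +1 pseudocount keeps genes with zero counts at a defined value of 0 instead of an undefined logarithm.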
  • FIG. 4 Analysis and evaluation of PHDs-seq sequencing results 2.
  • (A), (B), (D), and (E) show comparisons of some genes (ACTB, SDHA, PPIA, THY1) detected by PHDs-seq and by qPCR in the four samples KF_D5, Posi_D5, KF_D8, and Posi_D8, respectively.
  • FIG. 5 BMP can improve the efficiency of inducing KF cells into adipocytes.
  • A The upper panel is the flow chart of the induction of keloid fibroblasts into adipocytes; the lower panel shows oil red O staining of fat, scale bar 100 μm;
  • B Quantitative analysis of the fat marker genes ADIPOQ and FABP4 detected on day 12 in panel A;
  • C Statistics of the fat induction efficiency in panel A;
  • D Effect of different concentrations of BMP4 on fat induction; the number of fat cells in each well was counted;
  • E Cells were treated with the small-molecule BMP signaling pathway inhibitors Dorsomorphin and DMH1, and the number of fat cells per well was counted (24-well plate);
  • F Phenotype images for panel E.
  • FIG. 6 Screening of small molecules that enhance adipocyte induction efficiency using PHDs-seq.
  • A Flow chart of PHDs-seq screening for small molecules that improve fat induction efficiency;
  • B PHDs-seq sequencing heat map showing the expression of each characteristic gene after 8 days of small molecule treatment, Log10(CPM+1);
  • C PCA of the treated samples; the red markers are candidate small molecules, and the blue ones are KF.
  • FIG. 7 Optimization relative to the TAC-seq system: one-step hybridization-ligation reaction.
  • the present invention provides a method for constructing a high-throughput transcript profiling sequencing library based on probe hybridization, the method comprising:
  • step (c) transferring the cell lysate supernatant obtained in step (b) to the corresponding well of another multi-well plate, and performing a reverse transcription reaction to obtain cDNA;
  • adding a barcoding PCR mixture to each well, said mixture comprising a DNA polymerase and a pair of barcode primers, said pair of barcode primers comprising a first primer for said left probe and a second primer for said right probe, one of the first and second primers comprising a well barcode sequence unique to each well, and the other comprising a plate barcode sequence unique to each multi-well plate;
  • a library that can be used for high-throughput transcriptional profiling sequencing is thus obtained.
  • the multi-well plate is a 96-well plate or a 384-well plate, preferably a 96-well plate.
  • the at least one biological sample comprising cells can be 1-200 or more, such as at least 2, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more biological samples comprising cells.
  • the at least one biological sample comprising cells comprises biological samples each comprising a different cell type. In some embodiments, the at least one biological sample comprising cells comprises biological samples of the same cell type, but each biological sample has been treated differently, e.g., with a different compound. In some embodiments, the treatment is capable of causing a particular phenotype of the cell.
  • the cells described herein can be any type of cell of interest.
  • the cells may be somatic cells, germ cells, stem cells (such as embryonic stem cells or induced pluripotent stem cells).
  • Such cells include, but are not limited to, neuronal cells, skeletal muscle cells, hepatocytes, fibroblasts, osteoblasts, chondrocytes, adipocytes, endothelial cells, mesenchymal cells, smooth muscle cells, cardiomyocytes, nerve cells, hematopoietic cells, islet cells, or almost any cell in the body including tumor cells.
  • the cells are fibroblasts.
  • the fibroblasts include, but are not limited to, keloid fibroblasts, skin fibroblasts, and cardiac fibroblasts.
  • the cells may be of mammalian or non-mammalian origin. In some embodiments, the cells are of human origin. In some embodiments, the cells are derived from a non-human mammal. In some embodiments, the cells are derived from murine animals, such as mice or rats, or from non-human primates.
  • the cells are lysed in step (b) using a cell lysis buffer based on a non-ionic surfactant.
  • the nonionic surfactant is Triton X-100.
  • the cell lysis buffer consists of Tris-HCl, KCl, Ficoll such as Ficoll PM-400, Triton X-100, a ribonuclease inhibitor, and water.
  • the final concentration of each component of the cell lysis buffer is: about 5 mM to about 10 mM, about 5 mM to about 50 mM, about 5 mM to about 100 mM, about 5 mM to about 150 mM, about 5 mM to about 200 mM, about 5 mM to about 250 mM, about 5 mM to about 500 mM Tris-HCl; about 7.5 mM to about 15 mM, about 7.5 mM to about 30 mM, about 7.5 mM to about 60 mM, about 7.5 mM to about 120 mM, about 7.5 mM to about 300 mM, about 7.5 mM to about 500 mM, about 7.5 mM to about 750 mM KCl; about 0.6% to about 5%, about 0.6% to about 10%, about 0.6% to about 20%, about 0.6% to about 30%, about 0.6% to about 40%, about 0.6% to about 50%, about 0.6% to about 60% polysucrose such as Ficoll PM-
  • the inventors have surprisingly found that, when cells are lysed with a mild lysis buffer based on a non-ionic surfactant, especially Triton X-100, the lysate supernatant can be used directly in the subsequent reverse transcription reaction without a further purification step. Thus, in some embodiments, the cell lysate supernatant transferred in step (c) is not further purified.
  • the reverse transcription reaction in the step (c) can be performed using a reverse transcription reaction system that is conventional in the art (e.g., a commercial one).
  • the reverse transcription reaction can be performed using an Oligo-dT system.
  • the reverse transcription reaction is performed at about 40-45°C, e.g., 42°C.
  • the reverse transcription reaction is performed for about 15-60 minutes, such as about 30 minutes.
  • the DNA ligase in the step (d) can be a conventional DNA ligase in the art, such as T4 DNA ligase or Taq DNA ligase, preferably Taq DNA ligase.
  • the hybridization-ligation mixture may comprise a buffer compatible with the DNA ligase.
  • the hybridization-ligation mixture may comprise at least one set of probe pairs that specifically hybridize to at least one target region of at least one gene to be detected.
  • the number of probe pairs depends on the number of genes/target regions to be detected.
  • the hybridization-ligation mixture may comprise 1-200 or more, such as at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more probe pairs.
  • the probe pairs can be used to detect 1-200 or more, such as at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30 , at least 40, at least 50, at least 100, at least 150, at least 200 or more genes/target regions to be detected.
  • the gene to be detected is associated with at least one phenotype of the cell.
  • part or all of said at least one gene to be detected may be used as a marker for said phenotype.
  • the expression profile of part or all of said at least one gene to be detected can be used as a marker for said phenotype.
  • the phenotype can be, for example, an inhibition or increase in cell proliferation, a change in cell type, and the like. Those skilled in the art can determine the specific gene and number to be detected according to the specific cell phenotype.
  • the cells are keloid fibroblasts, the phenotype is reprogramming from keloid fibroblasts to adipocytes, and the gene to be detected is selected from one or more or all of the following: PRRX1, THY1, ACTA2, FBN1, COL1A1, COL3A1, MMP1, TIMP1, FABP4, ADIPOQ, EBF2, CEBPA, ZNF423, ZNF516, ATF2, LEP, PPARG, PPARGC1A, FNDC5, PRDM16, UCP1, INSR, SLC2A4, and SLC2C4.
  • the gene to be detected also includes an internal reference gene, for example, one or more or all selected from ACTB, CYC, GAPDH, HMBS, PPIA, SDHA, TBP, YWHAZ.
  • the target region is a region characteristic of the gene to be detected, ie, specific for the gene to be detected.
  • the target region may be from about 20 nucleotides (nt) to about 300 nt or longer in length, such as about 20 nt, about 30 nt, about 40 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 120 nt, about 140 nt, about 160 nt, about 180 nt, about 200 nt, about 250 nt, about 300 nt or longer.
  • the left probe includes, in the 5' to 3' direction, a 5' primer binding sequence, a unique molecular identifier (UMI), and a first target region binding sequence.
  • the right probe includes, in the 5' to 3' direction, a second target region binding sequence, a unique molecular identifier (UMI), and a 3' primer binding sequence.
  • the 5' end of the right probe contains a phosphate group whereby it can be ligated to the 3' end of the left probe.
  • the first target region binding sequence and the second target region binding sequence perfectly match the target region of the gene to be detected after ligation.
  • the first or second target region binding sequence is about 10 nt to about 150 nt or longer in length, for example about 10 nt, about 15 nt, about 20 nt, about 25 nt, about 30 nt, about 35 nt, about 40 nt, about 45 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 125 nt, about 150 nt or longer, provided that it allows specific hybridization of the probe to the target region.
  • said first and/or second target region binding sequences span different exons of the gene to be detected.
  • the length of the unique molecular identifier (UMI) may be about 3 nt to 8 nt, such as 4 nt. It is known in the art how to design and generate UMIs. UMIs allow the amplification products derived from a single transcript to be identified in sequencing.
  • the 5' primer binding sequence is a universal primer binding sequence, e.g., the 5' primer binding sequence is the same in different pairs of probes.
  • the 3' primer binding sequence is a universal primer binding sequence, e.g., the 3' primer binding sequence is the same in different pairs of probes.
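  • The probe layout described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function `build_probe_pair`, the class `ProbePair`, and all sequences are hypothetical placeholders, the UMI is written as degenerate `N` bases, and strand orientation relative to the cDNA is ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass
class ProbePair:
    left: str                         # 5' -> 3': primer site, UMI, first target arm
    right: str                        # 5' -> 3': second target arm, UMI, primer site
    right_5p_phosphate: bool = True   # the right probe needs a 5' phosphate for ligation

def build_probe_pair(target_region, split, primer5, primer3, umi_len=4):
    """Split a target region into two adjacent arms and add UMI and primer sites."""
    if not 0 < split < len(target_region):
        raise ValueError("split must fall inside the target region")
    umi = "N" * umi_len  # degenerate bases, synthesized as a random UMI
    left = primer5 + umi + target_region[:split]
    right = target_region[split:] + umi + primer3
    return ProbePair(left=left, right=right)

# Placeholder sequences for illustration only (not real probe or primer sequences)
pair = build_probe_pair(
    target_region="ACGTACGTACGTACGTACGTTGCATGCATGCATGCATGCA",
    split=20,
    primer5="GGAAGGAAGGAA",
    primer3="CCTTCCTTCCTT",
)
ligated = pair.left + pair.right  # product after ligation at the arm junction
```

  • Because the two arms are adjacent halves of one target region, the ligated product contains the intact target region flanked by the two UMIs and the universal primer binding sequences.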
  • the probe pairs are obtained and/or evaluated by the methods described below.
  • the probes are each present at a concentration of about 0.0001 μM to about 1 μM, e.g., about 0.0001 μM to about 0.001 μM, about 0.0001 μM to about 0.01 μM, about 0.0001 μM to about 0.1 μM. In some embodiments, the concentration of each of the probes is no more than about 0.1 μM, preferably no more than about 0.01 μM, more preferably no more than about 0.001 μM.
  • probe concentrations as low as about 0.0001 μM can be utilized in subsequent amplification steps to achieve the desired amplification efficiency and significantly increase the specificity of the amplification.
  • in step (e), the steps of "hybridizing said at least one set of probe pairs to the target region of said at least one gene to be detected" and "ligating the left probe and the right probe of the probe pair hybridized to said target region to each other" are carried out in the same solution system.
  • in step (e), the steps of "hybridizing said at least one set of probe pairs to the target region of said at least one gene to be detected" and "ligating the left probe and the right probe of the probe pair hybridized to said target region to each other" are carried out simultaneously.
  • carrying out, in step (e), the steps of "hybridizing said at least one set of probe pairs to the target region of said at least one gene to be detected" and "ligating the left probe and the right probe of the probe pair hybridized to said target region to each other" simultaneously in the same solution system can yield higher amplification efficiency (more amplification products) in subsequent steps.
  • step (e) comprises incubating said at least one multiwell plate at about 50 to about 70°C, such as about 60°C. In some embodiments, step (e) comprises incubating said at least one multiwell plate for about 30-120 minutes or longer, for example incubating for at least 30 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes or longer.
  • the ligation product is enriched by magnetic beads in the step (f).
  • the magnetic beads are, for example, Dynabeads MyOne Carboxylic Acid beads.
  • the first primer in the pair of barcode primers includes a primer region sequence corresponding to the 5' primer binding sequence of the left probe, and the second primer in the pair of barcode primers includes a primer region sequence corresponding to the 3' primer binding sequence of the right probe.
  • the first primer comprises a well barcode sequence unique to each well and the second primer comprises a plate barcode sequence unique to each multiwell plate.
  • the second primer comprises a well barcode sequence unique to each well and the first primer comprises a plate barcode sequence unique to each multiwell plate.
  • Well barcode sequences unique to each well means that primers added to one well have a different well barcode sequence than primers added to other wells.
  • "the plate barcode sequence is unique to each plate" means that the plate barcode sequence of the primers added to one plate is different from the plate barcode sequence of the primers added to other plates; conversely, the plate barcode sequences of the primers added to different wells of the same multi-well plate should be identical.
  • the length of the well barcode sequence or the plate barcode sequence may be about 4nt-10nt, such as 4nt, 5nt, 6nt, 7nt, 8nt, 9nt or 10nt.
  • the well barcode sequence is 7 nt in length.
  • the plate barcode sequence is 6 nt in length.
  • the obtained library can be subjected to paired-end sequencing, the number of samples that can be included in the library is significantly increased, and the cost of synthesizing amplification primers is reduced.
  • the first primer and/or the second primer may further comprise an adapter sequence for high-throughput sequencing, such as a P5 adapter sequence or a P7 adapter sequence.
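  • The combination of well and plate barcodes can be sketched as follows: each read is assigned to a sample by its (plate, well) barcode pair, and the number of barcoded primers grows with plates plus wells rather than plates times wells. This is a minimal sketch; the function names, the barcode sequences (7 nt well, 6 nt plate), and the assumption that the barcodes have already been extracted from each read are all hypothetical.

```python
from collections import Counter

def primer_budget(n_plates, wells_per_plate=96):
    """Return (samples covered, barcoded primers needed) for a dual-barcode design."""
    samples = n_plates * wells_per_plate
    # One well-barcoded primer per well position plus one plate-barcoded primer per plate
    primers = wells_per_plate + n_plates
    return samples, primers

def demultiplex(read_barcodes, well_map, plate_map):
    """Count reads per (plate, well); reads with an unknown barcode are dropped."""
    counts = Counter()
    for well_bc, plate_bc in read_barcodes:
        if well_bc in well_map and plate_bc in plate_map:
            counts[(plate_map[plate_bc], well_map[well_bc])] += 1
    return dict(counts)

# Hypothetical barcodes, for illustration only
well_map = {"ACGTACG": "A1", "TGCATGC": "A2"}
plate_map = {"AACCGG": "plate1", "TTGGCC": "plate2"}
reads = [
    ("ACGTACG", "AACCGG"),
    ("ACGTACG", "TTGGCC"),
    ("TGCATGC", "AACCGG"),
    ("ACGTACG", "AACCGG"),
    ("GGGGGGG", "AACCGG"),  # unknown well barcode, discarded
]
assigned = demultiplex(reads, well_map, plate_map)
```

  • For example, `primer_budget(4)` covers 384 samples with 100 barcoded primers, which illustrates why pairing a well barcode with a plate barcode lowers primer synthesis cost as the number of plates grows.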
  • said amplification in said step (h) is performed by conventional PCR methods.
  • PCR amplification is carried out using the following program: 94°C for 5 min; 2 cycles of 94°C for 30 s, 57°C for 30 s, 72°C for 20 s; 20 cycles of 94°C for 30 s, 65°C for 30 s, 72°C for 20 s; 72°C for 60 s.
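  • As a worked example, the program above can be written as data and its total hold time summed, taking the final hold as 72°C for 60 s. The structure `program` and the function `hold_time_seconds` are illustrative names; ramp times between temperatures are instrument-dependent and not included.

```python
# Each stage: number of cycles and a list of (temperature in °C, hold time in seconds)
program = [
    {"cycles": 1, "steps": [(94, 300)]},                       # initial denaturation
    {"cycles": 2, "steps": [(94, 30), (57, 30), (72, 20)]},    # low-temperature annealing cycles
    {"cycles": 20, "steps": [(94, 30), (65, 30), (72, 20)]},   # main amplification cycles
    {"cycles": 1, "steps": [(72, 60)]},                        # final extension
]

def hold_time_seconds(stages):
    """Sum the programmed hold times over all stages and cycles."""
    return sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in stages
    )

total = hold_time_seconds(program)   # 300 + 2*80 + 20*80 + 60 = 2120 s
minutes, seconds = divmod(total, 60)  # 35 min 20 s
```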
  • the amplification products in all wells of all multi-well plates are harvested and mixed in step (i).
  • the amplified products harvested and pooled in step (i) can be purified using methods known in the art, such as using commercially available kits for purification.
  • the harvested and pooled amplification products are purified using a DNA Clean & Concentrator-100 kit (ZYMO, D4029).
  • NGS: next-generation sequencing
  • Next-generation sequencing generates thousands to millions of sequences simultaneously in a parallel sequencing process.
  • Sanger sequencing: first-generation sequencing
  • NGS sequencing platforms that can be used in the present invention are commercially available, including but not limited to Roche/454 FLX, Illumina/Solexa Genome Analyzer, Applied Biosystems SOLiD system, and the like.
  • the high-throughput sequencing can obtain the expression profile of each gene to be detected in the cells of each biological sample.
  • the high-throughput transcription profile sequencing library construction method of the present invention is particularly suitable for high-throughput, low-cost drug screening, such as small molecule drug screening.
  • the present invention provides a high-throughput drug screening method, said method comprising:
  • the drug candidates include, but are not limited to, small molecule compounds, antibodies, polypeptides, and nucleic acid molecules.
  • the drug is a small molecule compound.
  • candidate drugs are identified according to the expression profile of the gene to be detected in the high-throughput sequencing results.
  • the present invention provides a method for obtaining and/or evaluating a pair of probes for a gene to be detected; preferably, the pair of probes can be used to analyze the transcription of the gene to be detected by hybridization-ligation; more preferably, the probe pair can be used in the method for constructing a high-throughput transcript profiling sequencing library based on probe hybridization of the present invention, and the method for obtaining and/or evaluating a probe pair includes the following steps:
  • CDS coding sequence
  • step c) positioning the candidate primer sequence output in step b) to the CDS of the gene to be detected;
  • if i) the sequence spans different exons, then output the candidate primer sequence as the candidate right probe target region binding sequence (R), and the sequence of corresponding length upstream of the candidate primer sequence as the candidate left probe target region binding sequence (L); if ii) the sequence spans different exons, then output the candidate primer sequence as the candidate left probe target region binding sequence (L), and the sequence of corresponding length downstream of the candidate primer sequence as the candidate right probe target region binding sequence (R); if the gene to be detected has only one exon, both i) and ii) are considered to span different exons;
  • Y is the length of the candidate primer sequence
  • L GC%: GC content percentage of the output L sequence
  • R GC%: GC content percentage of the output R sequence
  • A: the length of the entire CDS
  • T: the length from the last nucleotide of the R sequence to the last base of the entire CDS
  • adding a barcode sequence such as a unique molecular identifier (UMI) and/or primer binding sequences to the candidate left probe target region binding sequence (L) and the candidate right probe target region binding sequence (R) output from f).
  • UMI barcode
  • the coding sequence (CDS) of the gene to be detected in step a) is obtained from a database (e.g., http://www.ensembl.org/index.html). In some embodiments, the obtained coding sequence (CDS) retains the information of the different exons.
  • step b) obtains at least one primer pair comprising a forward primer and a reverse primer using the following parameters:
  • PCR product size blank
  • Organism the same organism as the CDS sequence source
  • if the full-length sequence of the CDS is longer than 2000 bp, only the 2000-nucleotide sequence at its 3' end is input in step b).
  • step b) obtains 1-10 primer pairs, thereby outputting 2-20 candidate primer sequences.
  • step d) is carried out by the following steps:
  • d-1) Analyze the candidate primer sequence together with an extension of 15-20 bp, such as 20 bp, on each side. If upstream 15-20 bp + candidate primer sequence + downstream 15-20 bp falls within a single exon, discard the candidate primer sequence; if upstream 15-20 bp + candidate primer sequence spans different exons, output the 25-29 bp sequence, such as the 27 bp sequence, upstream of the candidate primer sequence as L, and the candidate primer sequence as R; if candidate primer sequence + downstream 15-20 bp spans different exons, output the candidate primer sequence as L, and the 25-29 bp sequence, such as the 27 bp sequence, downstream of the candidate primer sequence as R; and
  • d-2) Perform a second round of judgment on the candidate primer sequences discarded in d-1). If upstream 25-29 bp, such as 27 bp, + candidate primer sequence + downstream 25-29 bp, such as 27 bp, falls within a single exon, discard the candidate primer sequence; if upstream 25-29 bp, such as 27 bp, + candidate primer sequence spans different exons, output the 25-29 bp sequence, such as the 27 bp sequence, upstream of the candidate primer sequence as L, and the candidate primer sequence as R; if candidate primer sequence + downstream 25-29 bp, such as 27 bp, spans different exons, output the candidate primer sequence as L, and the 25-29 bp sequence, such as the 27 bp sequence, downstream of the candidate primer sequence as R.
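  • The two rounds d-1) and d-2) above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the applicant's implementation: positions are 0-based half-open CDS coordinates, `junctions` lists the CDS positions at which a new exon begins, the upstream test is tried before the downstream test when both would apply, candidates too close to the CDS ends are skipped, and single-exon genes are not handled. All names are hypothetical.

```python
def spans_junction(junctions, a, b):
    """True if the half-open interval [a, b) has bases on both sides of an exon junction."""
    return any(a < j < b for j in junctions)

def pick_probe_arms(cds, junctions, start, end, probe_len=27, flanks=(20, 27)):
    """Return (L, R) for the candidate primer cds[start:end], or None if discarded.

    flanks[0] is the extension used in round d-1), flanks[1] the one used in d-2).
    """
    for flank in flanks:
        upstream_ok = start - probe_len >= 0
        downstream_ok = end + probe_len <= len(cds)
        # upstream flank + candidate spans exons: upstream sequence is L, candidate is R
        if upstream_ok and spans_junction(junctions, max(0, start - flank), end):
            return cds[start - probe_len:start], cds[start:end]
        # candidate + downstream flank spans exons: candidate is L, downstream sequence is R
        if downstream_ok and spans_junction(junctions, start, min(len(cds), end + flank)):
            return cds[start:end], cds[end:end + probe_len]
    return None  # falls within a single exon in both rounds

# Toy CDS of 200 nt with one exon junction at position 100 (illustrative only)
cds = "".join("ACGT"[i % 4] for i in range(200))
junctions = [100]

near_downstream = pick_probe_arms(cds, junctions, 105, 125)  # junction in upstream flank
near_upstream = pick_probe_arms(cds, junctions, 70, 90)      # junction in downstream flank
second_round = pick_probe_arms(cds, junctions, 55, 75)       # only found with the 27 bp flank
discarded = pick_probe_arms(cds, junctions, 40, 60)          # no junction within reach
```

  • In every accepted case the ligated L + R sequence covers the exon junction, which is the purpose of requiring the probe pair to span different exons.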
  • the isolated primary KF cells were centrifuged, resuspended in 0.5% BSA, incubated with the corresponding antibody on ice for 20 minutes, centrifuged, washed with PBS with centrifugation after each wash, resuspended in 0.5% BSA, and sorted on a FACSAria II instrument. Positive cells were collected in KF medium and plated on 10 cm dishes for culture.
  • KF cells were cultured in a 37°C, 5% CO2 incubator with KF medium, and passaged once every 2-3 days.
  • first wash with PBS preheated to 37°C add 1ml 0.25% trypsin to digest for 1 minute, neutralize with 2ml KF medium, centrifuge at 1000rpm for 3min, and re-spread on 10cm according to the ratio of one dish to two dishes Petri dish, cultured in the incubator.
  • the excitation wavelength is 543nm
  • the emission wavelength is 598nm—the cells showing strong orange-red fluorescence are lipid-rich positive cells
  • Oil red O oil red O
  • RNA was extracted with ER101-01 kit, and finally dissolved in 50ml RNase-free Water.
  • RNA was added to the reverse transcription system in the following ratio, mixed gently, incubated at 42°C for 30 minutes, and heated at 85°C for 5 seconds to inactivate reverse transcriptase and gDNA remover.
  • the cDNA produced by reverse transcription was diluted according to the ratio of inputting 1ug RNA and finally diluted to 200ul, and the diluted template cDNA was loaded according to the following system.
  • the qPCR reaction program is pre-denaturation at 95°C for 30s, one cycle, and then 95°C for 10s Cycle 40 reactions at 95°C for 30s, and finally run the dissolution curve.
  • PHDs-seq Probe Hybridization based Drug Screening by sequencing
  • Figure 1 cell lysis, plate lysate transfer and inversion, probe hybridization and ligation, template enrichment, library amplification and introduction of barcodes (barcode), library mixing and purification
  • the hybridization process is to use the synthesized gene-specific left and right double probes to hybridize with the template cDNA.
  • Both probes carry a four-base unique molecular identifier (UMI), located at the 5' end of the left probe and the 3' end of the right probe respectively.
  • UMI unique molecular identifiers
  • PHDs-seq inherits the advantages of TAC-seq, namely high sensitivity.
  • PHDs-seq has further advantages: first, the operation is simpler. It is no longer necessary to extract RNA from each sample with a kit or Trizol; instead, cells are lysed with a mild lysis buffer and the lysate is reverse transcribed directly into cDNA.
  • TAC-seq hybridization and ligation are carried out independently and step by step, while this PHDs-seq method optimizes the hybridization and ligation reactions, so that these two steps can be completed in one solution system at the same time. Second, the cost is lower.
  • the total cost of a sample is about 8 yuan.
  • library structure optimization The inventor added a new sequence to the P5 end of the TAC-seq library in order to introduce well barcode, thereby upgrading the single-end sequencing of TAC-seq to a paired-end sequencing method.
  • This removes the limit that the number of plate barcode types imposes on sample number in TAC-seq screening, which greatly improves screening throughput, and it also reduces the cost of synthesizing amplification primers.
  • the library construction of two 96-well plates (that is, 192 samples)
  • the PHDs-seq library can be mixed with ordinary bulk RNA-seq for sequencing, or multiple 96-well plates can be mixed into one sample for sequencing, which greatly increases the flexibility of screening.
  • Embodiment 2 the design and evaluation method of the probe for PHDs-seq
  • the design principles of the probes used for PHDs-seq mainly include: spanning different exons; GC content: 40-60%; melting temperature: 60-76°C; close to the 3' end; no SNP. It can be designed and evaluated by the following methods.
  • CDS sequence of the gene to be detected from the database (eg http://www.ensembl.org/index.html), and retain the information of its different exons.
  • the following CDS sequence can be obtained, where each sequence in bold and italic represents an independent exon:
  • PCR product size blank
  • Organism the same organism as the CDS sequence source
  • Primer pair 1 there will generally be Primer pair 1 to 10, a total of 10 pairs.
  • the "Forward primer” of Primer pair 1 is named for the corresponding gene name-F1 (such as ATP4A-F1)
  • the "Reverse primer" of Primer pair 1 is named for the corresponding gene name-R1 (such as ATP4A-R1).
  • Primer pair 2 "Forward primer” is named for the corresponding gene name-F2
  • “Reverse primer” of Primer pair 2 is named for the corresponding gene name-R2. and so on. And convert the output sequences of all Reverse primers into reverse complementary sequences.
  • primer sequence: for a given primer sequence, first analyze the sequence that includes 20 bp extensions on both sides. If 20bp+primer+20bp falls only within bold or only within italic sequence, the primer is discarded. If either 20+primer or primer+20 contains both bold and italic sequence, it counts as a success. Count the successful primers: if the number is ≤ 3, go to step 2; if it is > 3, go directly to step 3. Output each successful primer sequence together with the successful upstream 27 bp or downstream 27 bp sequence (whichever produced the match; if both match, output both), with the primer sequence and its 27 bp partner output separately. Exception: if the full-length CDS has only one exon (all italic or all bold), 20+primer+20 is treated as containing both italic and bold.
  • GC% was calculated for the 27bp sequence upstream or downstream of the primer.
  • T the length from the last base of the right probe to the last base of the entire CDS
  • Additional sequences such as appropriate Barcode sequences and primer binding sequences were added to the ends of the obtained left and right probes.
  • the added sequence can be any one of the following example.
  • genes with low background expression can be evaluated, or artificially synthesized sequences can be added to the template
  • Example 3 Using human keloid fibroblast reprogramming into adipocyte system to assist in the development of PHDs-seq
  • keloid disease model was selected, hoping to reprogram keloid fibroblast (KF) into adipocytes to achieve the therapeutic effect of the disease (Fig. 2A).
  • First, patient-derived keloid fibroblasts were successfully isolated using the surface antibody CD90 (Fig. 2B) and could be cultured in vitro, showing typical fibroblast-like morphology (Fig. 2C).
  • AD medium: DMEM + 1% ITS + 0.5 mM isobutylmethylxanthine + 0.1 uM cortisol + 1 uM dexamethasone + 0.2 nM triiodothyronine + 1 uM rosiglitazone
  • maintenance medium DMEM+1%ITS+0.1uM cortisol +0.2nM triiodothyronine
  • MMP1 is an important enzyme for decomposing collagen.
  • TIMP1 is a protein that inhibits the activity of MMP1.
  • the KF-to-adipocyte induction system was tested: induced and non-induced samples were collected on day 5 and day 8 of adipogenic induction for library construction. The day-5 induced group (Posi_D5) had not yet formed lipid droplets, so successful induction could not be confirmed morphologically.
  • the day-8 induced group (Posi_D8) began to show a small number of lipid droplets. Two time points were chosen to test whether PHDs-seq has an advantage over traditional morphological screening, that is, whether samples can be distinguished by the expression of multiple genes before the morphology becomes apparent.
  • PHDs-seq meets the requirements of high-throughput screening in terms of accuracy, sensitivity and parallelism, and can be used for practical screening.
  • Example 4 Using human keloid fibroblast reprogramming to adipocyte system to screen small molecules
  • BMP4 can increase the induction efficiency of KF to adipocytes.
  • BMP4 had a clear effect. It greatly increased the number of Oil Red O-positive cells and the expression levels of FABP4 and ADIPOQ (Fig. 5A and B), and the positive ratio rose from about 5% to about 20% (Fig. 5C). BMP4 showed a dose effect (Fig. 5D), and its effect was markedly weakened when the small molecules Dorsomorphin and DMH1 were used to block the BMP4 signaling pathway (Fig. 5E and F).
  • triiodothyronine, rosiglitazone and cortisol, the components of AD medium that can promote the conversion of white to brown adipocytes, were removed, and only 1% ITS, 0.5 mM isobutylmethylxanthine and 1 uM dexamethasone, which promote white adipocytes, were retained. These make up the MDI medium. Using MDI as the base medium, the collected small molecules were screened; after four days of induction the medium was replaced with DMEM + 1% ITS, and on day 8 samples were collected for library construction and sequencing.
  • small molecules such as Rosiglitazone, FSK and some TGFBR1 inhibitors can promote the transition of KF to adipocyte fate, indicating the reliability of PHDs-seq.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Plant Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

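The present invention relates to the field of biomedicine. Specifically, it relates to a probe hybridization-based method for constructing high-throughput transcriptome profiling sequencing libraries, which can be used for high-throughput, low-cost drug screening.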

Description

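Probe hybridization-based method for constructing high-throughput transcriptome profiling sequencing libraries
Technical Field
The present invention relates to the field of biomedicine. Specifically, it relates to a probe hybridization-based method for constructing high-throughput transcriptome profiling sequencing libraries, which can be used for high-throughput, low-cost drug screening.
Background of the Invention
The discovery of new drugs currently depends heavily on high-throughput screening, but the capacity of current screening platforms is limited. RNA-seq is a powerful tool for studying drug effects using transcriptome changes as markers, but standard library construction is expensive. Hindrek Teder et al. (npj Genomic Medicine (2018) 3:34) developed the TAC-seq technique for precise quantification of specific nucleic acid biomarkers. However, TAC-seq requires RNA to be extracted from each sample individually with a kit or Trizol, its probe hybridization and ligation steps must be performed separately, and its single barcode limits the number of samples that can be screened. There therefore remains a need in the art for improved methods for constructing high-throughput transcriptome profiling sequencing libraries.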
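Brief Description of the Drawings
Figure 1. Development workflow and schematic of the probe hybridization-based high-throughput transcriptome profiling sequencing technique. (A) Schematic of PHDs-seq; (B) PHDs-seq workflow and principle.
Figure 2. Establishment of the system for inducing keloid fibroblasts into adipocytes and its use in testing PHDs-seq. (A) Schematic of the isolation of human keloid fibroblasts and their induction into adipocytes; (B) flow cytometry sorting of human keloid fibroblasts: left, sorting with the isotype control; right, sorting with the surface antibody CD90; (C) morphology of the isolated keloid cells, scale bar 50 μm; (D) isolated human keloid fibroblasts reprogrammed into adipocytes with AD medium as the induction medium and identified by Nile red staining, scale bar 200 μm; (E) list of signature genes used to test PHDs-seq; (F) gel electrophoresis of PHDs-seq sub-libraries (1-8) and pooled library 9, DNA marker: 2k plus; (G) quality check of the large library pooled from the PHDs-seq sub-libraries; (H) PHDs-seq library quality control: distribution of base quality scores at different positions in the reads.
Figure 3. Analysis and evaluation of PHDs-seq sequencing results. (A) Heat map of signature gene expression calculated for each small library after demultiplexing the sequencing results of the pooled PHDs-seq library, normalized as Log10(CPM+1); Posi: positive, KF: keloid fibroblast, D5: treated for five days, D8: treated for eight days; (B) hierarchical clustering of the samples based on signature gene expression; (C) correlation analysis of the samples based on signature gene expression.
Figure 4. Analysis and evaluation of PHDs-seq sequencing results, part 2. (A), (B), (D) and (E): within-sample comparison, for KF_D5, Posi_D5, KF_D8 and Posi_D8 respectively, of the expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG, UCP1) relative to the reference gene GAPDH as measured by PHDs-seq and by qPCR, shown as Log10FC; (C) relative expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG, UCP1) between the Posi_D5 and KF_D5 samples; (F) relative expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG, UCP1) between the Posi_D8 and KF_D8 samples, shown as Log10FC; (G) expression of the same gene detected by two separate probes in PHDs-seq; (H) concordance of all signature genes between two replicate PHDs-seq samples, shown as Log(CPM+1).
Figure 5. BMP increases the efficiency of adipogenic induction of KF cells. (A) Top: induction scheme for converting keloid fibroblasts into adipocytes; bottom: Oil Red O staining of adipocytes, scale bar 100 μm; (B) quantification of the adipocyte marker genes ADIPOQ and FABP4 in the day-12 samples from (A); (C) adipocyte induction efficiency in (A); (D) effect of different BMP4 concentrations on adipogenic induction, with the number of adipocytes counted per well; (E) treatment with the BMP pathway small-molecule inhibitors Dorsomorphin and DMH1, with the number of adipocytes counted per well (24-well plate); (F) phenotype images for (E).
Figure 6. Screening with PHDs-seq for small molecules that increase adipocyte induction efficiency. (A) Workflow of the PHDs-seq screen for small molecules that increase adipogenic induction efficiency; (B) PHDs-seq heat map showing the expression of each signature gene in samples collected after 8 days of small-molecule treatment, Log10(CPM+1); (C) PCA of the treated samples; candidate small molecules are marked in red and KF in blue.
Figure 7. Optimization relative to the TAC-seq system: one-step hybridization-ligation reaction.
Figure 8. Optimization relative to the TAC-seq system: change in probe concentration.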
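Detailed Description of the Invention
In one aspect, the present invention provides a probe hybridization-based method for constructing a high-throughput transcriptome profiling sequencing library, the method comprising:
(a) providing at least one cell-containing biological sample in at least one multi-well plate, each of the at least one cell-containing biological sample being located in a separate well;
(b) lysing the cells of the biological sample in the wells of the multi-well plate;
(c) transferring the cell lysis supernatant obtained in step (b) to the corresponding wells of another multi-well plate and performing a reverse transcription reaction to obtain cDNA;
(d) adding to each well a hybridization-ligation mix comprising a DNA ligase and at least one probe pair that specifically hybridizes to a target region of at least one gene to be detected, wherein the probe pair comprises a left probe that hybridizes to the upstream portion of the target region and a right probe that hybridizes to the downstream portion of the target region;
(e) allowing the at least one probe pair to hybridize to the target region of the at least one gene to be detected, and ligating the left probe and the right probe of the probe pair hybridized to the target region to each other;
(f) enriching the ligation product;
(g) adding to each well a barcoding PCR mix comprising a DNA polymerase and a barcode primer pair, the barcode primer pair comprising a first primer directed to the left probe and a second primer directed to the right probe, one of the first and second primers comprising a well barcode sequence that is unique to each well and the other comprising a plate barcode sequence that is unique to each multi-well plate;
(h) amplifying the target region of the at least one gene to be detected by PCR with the barcode primer pair; and
(i) harvesting and pooling the amplification products from at least one well of the at least one multi-well plate, and optionally purifying them,
thereby obtaining a library that can be used for high-throughput transcriptome profiling sequencing.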
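In some embodiments, the multi-well plate is a 96-well plate or a 384-well plate, preferably a 96-well plate.
In some embodiments, the at least one cell-containing biological sample may be 1-200 or more, for example at least 2, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more cell-containing biological samples.
In some embodiments, the at least one cell-containing biological sample comprises biological samples that each contain a different cell type. In some embodiments, the at least one cell-containing biological sample comprises biological samples of the same cell type, but each biological sample has undergone a different treatment, for example treatment with a different compound. In some embodiments, the treatment is capable of producing a specific cellular phenotype.
The cells described herein may be any type of cell of interest. The cells may be somatic cells, germ cells, or stem cells (such as embryonic stem cells or induced pluripotent stem cells). The cells include, but are not limited to, neuronal cells, skeletal muscle cells, hepatocytes, fibroblasts, osteoblasts, chondrocytes, adipocytes, endothelial cells, stromal cells, smooth muscle cells, cardiomyocytes, nerve cells, hematopoietic cells, islet cells, or almost any cell in the body, including tumor cells. In some embodiments, the cells are fibroblasts. The fibroblasts include, but are not limited to, keloid fibroblasts, skin fibroblasts and cardiac fibroblasts.
The cells may be derived from a mammal or a non-mammal. In some embodiments, the cells are derived from a human. In some embodiments, the starting cells are derived from a non-human mammal. In some embodiments, the cells are derived from a murine animal such as a mouse or rat, or from a non-human primate.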
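In some embodiments, the cells are lysed in step (b) with a cell lysis buffer based on a non-ionic surfactant. In some embodiments, the non-ionic surfactant is Triton X-100. In some embodiments, the cell lysis buffer consists of Tris-HCl, KCl, a polysucrose such as Ficoll PM-400, Triton X-100, a ribonuclease inhibitor, and water. In some embodiments, the final working concentrations of the components of the cell lysis buffer are: about 5 mM to about 10 mM, about 5 mM to about 50 mM, about 5 mM to about 100 mM, about 5 mM to about 150 mM, about 5 mM to about 200 mM, about 5 mM to about 250 mM, or about 5 mM to about 500 mM Tris-HCl; about 7.5 mM to about 15 mM, about 7.5 mM to about 30 mM, about 7.5 mM to about 60 mM, about 7.5 mM to about 120 mM, about 7.5 mM to about 300 mM, about 7.5 mM to about 500 mM, or about 7.5 mM to about 750 mM KCl; about 0.6% to about 5%, about 0.6% to about 10%, about 0.6% to about 20%, about 0.6% to about 30%, about 0.6% to about 40%, about 0.6% to about 50%, or about 0.6% to about 60% polysucrose such as Ficoll PM-400; about 0.015% to about 0.15%, about 0.015% to about 0.25%, about 0.015% to about 0.5%, about 0.015% to about 0.75%, about 0.015% to about 1%, about 0.015% to about 1.25%, or about 0.015% to about 1.5% Triton X-100; and about 0.05 U/μL to about 0.1 U/μL, about 0.05 U/μL to about 0.25 U/μL, about 0.05 U/μL to about 0.5 U/μL, about 0.05 U/μL to about 1 U/μL, about 0.05 U/μL to about 2.5 U/μL, or about 0.05 U/μL to about 5 U/μL ribonuclease inhibitor.
The inventors surprisingly found that with a mild cell lysis buffer based on a non-ionic surfactant, in particular Triton X-100, the lysis supernatant can be used directly for the subsequent reverse transcription reaction after cell lysis, with no further purification step. Therefore, in some embodiments, the cell lysis supernatant transferred in step (c) has not been further purified.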
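In the method of the invention, the reverse transcription reaction in step (c) can be performed with a reverse transcription system that is conventional in the art (for example, a commercial one). For example, an Oligo-dT system can be used for the reverse transcription reaction. In some embodiments, the reverse transcription reaction is performed at about 40-45℃, for example 42℃. In some embodiments, the reverse transcription reaction is performed for about 15-60 minutes, for example about 30 minutes.
In the method of the invention, the DNA ligase in step (d) may be a DNA ligase that is conventional in the art, for example T4 DNA ligase or Taq DNA ligase, preferably Taq DNA ligase. The hybridization-ligation mix may contain a buffer compatible with the DNA ligase.
In the method of the invention, the hybridization-ligation mix may contain at least one probe pair that specifically hybridizes to at least one target region of at least one gene to be detected. The number of probe pairs depends on the number of genes or target regions to be detected. For example, the hybridization-ligation mix may contain 1-200 or more, for example at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more probe pairs. The probe pairs may be used to detect 1-200 or more, for example at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more genes or target regions to be detected.
In some embodiments of the method of the invention, the gene to be detected is associated with at least one phenotype of the cell. For example, when the method of the invention is used to screen for treatments (for example, treatment with a specific compound) that produce a specific cellular phenotype, some or all of the at least one gene to be detected can serve as markers of the phenotype. For example, the expression profile of some or all of the at least one gene to be detected can serve as a marker of the phenotype. The phenotype may be, for example, inhibition or increase of cell proliferation, or a change of cell type. A person skilled in the art can determine the specific genes to be detected and their number according to the specific cellular phenotype.
In some specific embodiments, the cells are keloid fibroblasts, the phenotype is the reprogramming of keloid fibroblasts into adipocytes, and the genes to be detected are one or more or all selected from: PRRX1, THY1, ACTA2, FBN1, COL1A1, COL3A1, MMP1, TIMP1, FABP4, ADIPOQ, EBF2, CEBPA, ZNF423, ZNF516, ATF2, LEP, PPARG, PPARGC1A, FNDC5, PRDM16, UCP1, INSR and SLC2A4. In some embodiments, the genes to be detected further include reference genes, for example one or more or all selected from ACTB, CYC, GAPDH, HMBS, PPIA, SDHA, TBP and YWHAZ.
In some embodiments, the target region is a characteristic region of the gene to be detected, that is, it is specific to the gene to be detected. In some embodiments, the length of the target region may be about 20 nucleotides (nt) to about 300 nt or longer, for example about 20 nt, about 30 nt, about 40 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 120 nt, about 140 nt, about 160 nt, about 180 nt, about 200 nt, about 250 nt, about 300 nt or longer.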
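In some embodiments, the left probe comprises, in the 5' to 3' direction, a 5' primer binding sequence, a unique molecular identifier (UMI) and a first target region binding sequence. In some embodiments, the right probe comprises, in the 5' to 3' direction, a second target region binding sequence, a unique molecular identifier (UMI) and a 3' primer binding sequence. In some embodiments, the 5' end of the right probe carries a phosphate group, so that it can be ligated to the 3' end of the left probe. In some embodiments, the first target region binding sequence and the second target region binding sequence, once ligated, match the target region of the gene to be detected perfectly.
In some embodiments, the length of the first or second target region binding sequence is about 10 nt to about 150 nt or longer, for example about 10 nt, about 15 nt, about 20 nt, about 25 nt, about 30 nt, about 35 nt, about 40 nt, about 45 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 125 nt, about 150 nt or longer, provided that it allows the probe to hybridize specifically to the target region. In some embodiments, the first and/or second target region binding sequence spans different exons of the gene to be detected.
The length of the unique molecular identifier (UMI) may be about 3 nt to 8 nt, for example 4 nt. How to design and generate UMIs is known in the art. UMIs make it possible to identify, during sequencing, the amplification products that derive from a single transcript.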
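To illustrate why UMIs allow amplification products from a single transcript to be recognized, the following minimal Python sketch counts molecules by collapsing reads that share the same gene and UMI pair. The function names and the read tuples are hypothetical and are not part of the patent; the 4 nt length per probe follows the example value given above.

```python
def umi_diversity(left_len: int = 4, right_len: int = 4) -> int:
    """Number of distinct UMI combinations on one ligated probe pair,
    assuming each UMI position can be any of A, C, G or T."""
    return 4 ** (left_len + right_len)


def count_molecules(reads):
    """Collapse PCR duplicates: reads with the same (gene, left UMI, right UMI)
    are counted once, so each count approximates one ligated probe pair."""
    unique_reads = set(reads)
    counts = {}
    for gene, _left_umi, _right_umi in unique_reads:
        counts[gene] = counts.get(gene, 0) + 1
    return counts


# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("FABP4", "ACGT", "TTGA"),
    ("FABP4", "ACGT", "TTGA"),
    ("FABP4", "ACGA", "TTGA"),
    ("GAPDH", "AAAA", "CCCC"),
]
print(umi_diversity())         # 4**8 combinations for two 4 nt UMIs
print(count_molecules(reads))
```

With two 4 nt UMIs there are 4^8 = 65,536 possible combinations per probe pair, which bounds how many molecules of one target can be distinguished in one well.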
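In some embodiments, the 5' primer binding sequence is a universal primer binding sequence; for example, the 5' primer binding sequences of different probe pairs are identical. In some embodiments, the 3' primer binding sequence is a universal primer binding sequence; for example, the 3' primer binding sequences of different probe pairs are identical.
In some embodiments, the probe pair is obtained and/or evaluated by the method described below.
In some embodiments, the concentration of each probe is about 0.0001 μM to about 1 μM, for example about 0.0001 μM to about 0.001 μM, about 0.0001 μM to about 0.01 μM, or about 0.0001 μM to about 0.1 μM. In some embodiments, the concentration of each probe is no more than about 0.1 μM, preferably no more than about 0.01 μM, more preferably no more than about 0.001 μM.
The inventors surprisingly found that, with the method of the invention, probe concentrations as low as about 0.0001 μM give the desired amplification efficiency in the subsequent amplification step and significantly increase the specificity of amplification.
In some embodiments, the steps in step (e) of "allowing the at least one probe pair to hybridize to the target region of the at least one gene to be detected" and "ligating the left probe and the right probe of the probe pair hybridized to the target region to each other" are performed in the same solution system. In some embodiments, these two steps of step (e) are performed simultaneously.
The inventors surprisingly found that performing the hybridization and ligation steps of step (e) simultaneously in the same solution system gives a higher amplification efficiency (more amplification product) in the subsequent steps.
In some embodiments, step (e) comprises incubating the at least one multi-well plate at about 50 to about 70℃, for example about 60℃. In some embodiments, step (e) comprises incubating the at least one multi-well plate for about 30-120 minutes or longer, for example at least 30 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes or longer.
In some embodiments, the ligation product is enriched in step (f) with magnetic beads. The magnetic beads are, for example, Dynabeads MyOne Carboxylic Acid beads.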
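In some embodiments, the first primer of the barcode primer pair comprises a primer region sequence corresponding to the 5' primer binding sequence of the left probe, and the second primer of the barcode primer pair comprises a primer region sequence corresponding to the 3' primer binding sequence of the right probe. With the barcode primer pair, the target region of the gene to be detected, including the UMIs, can be amplified using the ligation product of the probe pair as template.
In some embodiments, the first primer comprises a well barcode sequence that is unique to each well, and the second primer comprises a plate barcode sequence that is unique to each multi-well plate. In some embodiments, the second primer comprises a well barcode sequence that is unique to each well, and the first primer comprises a plate barcode sequence that is unique to each multi-well plate.
A well barcode sequence being unique to each well means that the well barcode sequence of the primer added to a given well differs from the well barcode sequences of the primers added to other wells. A plate barcode sequence being unique to each plate means that the plate barcode sequence of the primers added to a given plate differs from the plate barcode sequences of the primers added to other plates; conversely, the primers added to different wells of the same multi-well plate should share the same plate barcode sequence. Through the combination of plate barcode and well barcode, the sequence information or gene expression information of a given well of a given multi-well plate can be recovered from the final sequencing data.
The length of the well barcode sequence or plate barcode sequence may be about 4 nt to 10 nt, for example 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt or 10 nt. In some specific embodiments, the well barcode sequence is 7 nt long. In some specific embodiments, the plate barcode sequence is 6 nt long.
Introducing the well barcode sequence and the plate barcode sequence at the two ends at the same time allows the resulting library to be sequenced in paired-end mode, significantly increases the number of samples the library can contain, and reduces the cost of synthesizing amplification primers.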
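The combinatorial use of plate and well barcodes can be sketched as follows. This is an illustrative example only: the barcode sequences, plate names and well names are made up, and the functions are hypothetical helpers, not the inventors' software. The barcode lengths (6 nt plate, 7 nt well) follow the specific embodiments mentioned above.

```python
def build_sample_index(plate_barcodes, well_barcodes):
    """Map every (plate barcode, well barcode) pair to its (plate, well) sample."""
    index = {}
    for plate, plate_bc in plate_barcodes.items():
        for well, well_bc in well_barcodes.items():
            index[(plate_bc, well_bc)] = (plate, well)
    return index


def assign_read(index, plate_bc, well_bc):
    """Return the (plate, well) a read pair belongs to, or None if unknown."""
    return index.get((plate_bc, well_bc))


def primers_needed(n_plates, n_wells):
    """Barcoded primers to synthesize: combinatorial scheme versus one
    uniquely barcoded primer per sample."""
    return n_plates + n_wells, n_plates * n_wells


# Hypothetical barcodes: 6 nt plate barcodes and 7 nt well barcodes.
plates = {"plate1": "ACGTAC", "plate2": "TGCATG"}
wells = {"A1": "AAACCCG", "A2": "TTTGGGA"}
index = build_sample_index(plates, wells)

print(assign_read(index, "TGCATG", "AAACCCG"))
print(primers_needed(n_plates=2, n_wells=96))
```

For two 96-well plates, the combinatorial scheme needs 96 + 2 barcoded primers to label 192 samples, which is one way to read the statement that the design lowers primer synthesis cost.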
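In some embodiments, the first primer and/or the second primer may further comprise an adapter sequence for high-throughput sequencing, for example a P5 adapter sequence or a P7 adapter sequence.
In some embodiments, the amplification in step (h) is performed by a conventional PCR method. In some specific embodiments, PCR amplification is performed with the following program: 94℃ for 5 min; 2 cycles of 94℃ for 30 s, 57℃ for 30 s, 72℃ for 20 s; 20 cycles of 94℃ for 30 s, 65℃ for 30 s, 72℃ for 20 s; 72℃ for 60 s.
In some embodiments, the amplification products from all wells of all multi-well plates are harvested and pooled in step (i).
The amplification products harvested and pooled in step (i) can be purified by methods known in the art, for example with a commercially available kit. In some specific embodiments, the harvested and pooled amplification products are purified with the DNA clean&concentrator-100 kit (XYBO, D4029).
The library obtained by the method of the invention can be used for high-throughput sequencing, also called next-generation sequencing ("NGS"). NGS generates thousands to millions of sequences simultaneously in a parallel sequencing process. NGS differs from "Sanger sequencing" (first-generation sequencing), which is based on electrophoretic separation of chain-termination products in individual sequencing reactions. NGS platforms usable with the invention are commercially available and include, but are not limited to, Roche/454 FLX, Illumina/Solexa Genome Analyzer and Applied Biosystems SOLID system. The high-throughput sequencing can yield the expression profile of each gene to be detected in the cells of each biological sample.
The method of the invention for constructing high-throughput transcriptome profiling sequencing libraries is particularly suitable for high-throughput, low-cost drug screening, for example small-molecule drug screening.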
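Therefore, in another aspect, the present invention provides a high-throughput drug screening method, the method comprising:
(1) culturing cells in at least one well of at least one multi-well plate;
(2) subjecting the cells in different wells to different treatments, for example by adding different candidate drugs;
(3) constructing a transcriptome profiling sequencing library by the library construction method of the invention;
(4) performing high-throughput sequencing on the library; and
(5) identifying candidate drugs according to the high-throughput sequencing results.
The candidate drugs include, but are not limited to, small-molecule compounds, antibodies, polypeptides and nucleic acid molecules. In some specific embodiments, the drug is a small-molecule compound.
In some embodiments, candidate drugs are identified according to the expression profile of the genes to be detected in the high-throughput sequencing results.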
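In another aspect, the present invention provides a method for obtaining and/or evaluating a probe pair directed to a gene to be detected. Preferably, the probe pair can be used to analyze transcription of the gene to be detected by hybridization-ligation; more preferably, the probe pair can be used in the probe hybridization-based library construction method of the invention. The method for obtaining and/or evaluating a probe pair comprises the following steps:
a) obtaining the coding sequence (CDS) of the gene to be detected;
b) entering the CDS of the gene to be detected, or a part of it, into a primer design program such as https://www.ncbi.nlm.nih.gov/tools/primer-blast/ for analysis, obtaining at least one primer pair comprising a forward primer and a reverse primer, and outputting the forward primer sequence and the reverse complement of the reverse primer of the at least one primer pair as candidate primer sequences;
c) mapping the candidate primer sequences output in step b) to the CDS of the gene to be detected;
d) determining whether i) the candidate primer sequence together with a sequence of corresponding length upstream of it, or ii) the candidate primer sequence together with a sequence of corresponding length downstream of it, spans different exons of the gene to be detected,
if sequence i) spans different exons, outputting the candidate primer sequence as the candidate right probe target region binding sequence (R) and the sequence of corresponding length upstream of the candidate primer sequence as the candidate left probe target region binding sequence (L); if sequence ii) spans different exons, outputting the candidate primer sequence as the candidate left probe target region binding sequence (L) and the sequence of corresponding length downstream of the candidate primer sequence as the candidate right probe target region binding sequence (R); if the gene to be detected has only one exon, both i) and ii) are treated as spanning different exons;
e) scoring the output sequences with the following formula: 0.4*{(Y-N)/Y}+0.15*{1-|0.55-L GC%|}+0.15*{1-|0.55-R GC%|}+0.2*(1-T/A)+0.1*X
Y is the length of the candidate primer sequence;
N is the number of nucleotides between the exon junction and the middle of sequence i) or ii); if the gene to be detected has only one exon, N=27;
L GC%: the GC content of the output L sequence;
R GC%: the GC content of the output R sequence;
A: the length of the entire CDS;
T: the distance from the last nucleotide of the R sequence to the last base of the entire CDS;
X: X=1 if the candidate primer sequence was output in step b) as a forward primer; X=0 if it was output in step b) as a reverse primer;
f) outputting the high-scoring combinations of candidate left probe target region binding sequence (L) and candidate right probe target region binding sequence (R); and
g) optionally, adding a barcode sequence such as a unique molecular identifier (UMI) and/or a primer binding sequence to the candidate left probe target region binding sequence (L) and candidate right probe target region binding sequence (R) output in f).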
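In some embodiments, the coding sequence (CDS) of the gene to be detected in step a) is obtained from a database (for example http://www.ensembl.org/index.html). In some embodiments, the obtained coding sequence (CDS) retains the information on its different exons.
In some embodiments, step b) uses the following parameters to obtain at least one primer pair comprising a forward primer and a reverse primer:
1. PCR product size: blank;
2. Primer melting temperatures (Tm): Min: 60, opt: 68, max: 76;
3. Database: "nr";
4. Organism: the same organism as the source of the CDS sequence;
5. Primer Size: Min: 25, opt: 27, max: 29;
6. Primer GC content (%): Min: 30, max: 75.
In some preferred embodiments, if the full-length CDS is longer than 2000 bp, only the 2000 nucleotides at the 3' end are entered in step b).
In some embodiments, step b) yields 1-10 primer pairs and thus outputs 2-20 candidate primer sequences.
In some embodiments, step d) is performed by the following steps:
d-1) analyzing the sequence that includes the candidate primer sequence extended by 15-20 bp, for example 20 bp, on each side; if upstream 15-20 bp + candidate primer sequence + downstream 15-20 bp falls within a single exon, the candidate primer sequence is discarded; if upstream 15-20 bp + candidate primer sequence spans different exons, the 25-29 bp sequence, for example the 27 bp sequence, upstream of the candidate primer sequence is output as L and the candidate primer sequence as R; if candidate primer sequence + downstream 15-20 bp spans different exons, the candidate primer sequence is output as L and the 25-29 bp sequence, for example the 27 bp sequence, downstream of the candidate primer sequence as R; and
d-2) subjecting the candidate primer sequences discarded in d-1) to a second round of evaluation; if upstream 25-29 bp (for example 27 bp) + candidate primer sequence + downstream 25-29 bp (for example 27 bp) falls within a single exon, the candidate primer sequence is discarded; if upstream 25-29 bp (for example 27 bp) + candidate primer sequence spans different exons, the 25-29 bp (for example 27 bp) sequence upstream of the candidate primer sequence is output as L and the candidate primer sequence as R; if candidate primer sequence + downstream 25-29 bp (for example 27 bp) spans different exons, the candidate primer sequence is output as L and the 25-29 bp (for example 27 bp) sequence downstream of the candidate primer sequence as R.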
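The two-round exon-junction test of steps d-1) and d-2) can be sketched in Python as below. This is one possible reading of the text, under stated assumptions: exons are represented by their 0-based start coordinates on the CDS (a hypothetical representation, since the source marks exons by bold and italic type), intervals are half-open, and candidates whose 27 bp partner would run off the CDS are skipped. All function names are illustrative.

```python
from bisect import bisect_right


def exon_index(pos, exon_starts):
    """Index of the exon containing 0-based CDS position `pos`;
    `exon_starts` is the sorted list of exon start coordinates."""
    return bisect_right(exon_starts, pos) - 1


def spans_exons(start, end, exon_starts):
    """True if the half-open interval [start, end) covers more than one exon."""
    return exon_index(start, exon_starts) != exon_index(end - 1, exon_starts)


def junction_check(primer_start, primer_len, cds_len, exon_starts, flank, partner=27):
    """One round of the test. Returns a list of (L interval, R interval) pairs."""
    primer_end = primer_start + primer_len
    single_exon = len(exon_starts) == 1  # treated as spanning on both sides
    pairs = []
    upstream = max(0, primer_start - flank)
    if (single_exon or spans_exons(upstream, primer_end, exon_starts)) \
            and primer_start >= partner:
        # upstream partner becomes L, the primer itself becomes R
        pairs.append(((primer_start - partner, primer_start), (primer_start, primer_end)))
    downstream = min(cds_len, primer_end + flank)
    if (single_exon or spans_exons(primer_start, downstream, exon_starts)) \
            and primer_end + partner <= cds_len:
        # the primer itself becomes L, downstream partner becomes R
        pairs.append(((primer_start, primer_end), (primer_end, primer_end + partner)))
    return pairs


def two_round_check(primer_start, primer_len, cds_len, exon_starts):
    """Round d-1) with a 20 bp flank; if nothing passes, round d-2) with 27 bp."""
    first = junction_check(primer_start, primer_len, cds_len, exon_starts, flank=20)
    return first or junction_check(primer_start, primer_len, cds_len, exon_starts, flank=27)


# Toy CDS of 300 bp with two exons: [0, 100) and [100, 300).
print(two_round_check(105, 27, 300, [0, 100]))  # passes in round d-1)
print(two_round_check(125, 27, 300, [0, 100]))  # passes only in round d-2)
print(two_round_check(150, 27, 300, [0, 100]))  # discarded in both rounds
```

In the second call the junction lies 25 bp upstream of the primer, so it is missed with a 20 bp flank and recovered with the 27 bp flank, which is the case the second round is meant to rescue.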
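Examples
The following examples are provided only to better illustrate the invention and are not intended to limit its scope.
Experimental materials and methods
1. Isolation of human keloid fibroblasts (KF)
1) Prepare the surgical instruments (stir bar, curved scissors, curved forceps);
2) Prepare the digestion enzyme solution and the KF cell culture medium according to Tables 1 and 2;
3) Immerse the skin sample completely in 70% ethanol for 1 minute;
4) Wash once with pre-chilled wash buffer (PBS+PS);
5) In an empty 10 cm dish, cut off the subcutaneous fat tissue and discard it;
6) Mince the remaining tissue (as finely as possible);
7) Transfer the minced tissue into a 15 ml tube containing 10 ml pre-chilled wash buffer and mix by shaking;
8) Centrifuge at 1,500 rpm for 3 min and carefully remove the supernatant;
9) Resuspend the pellet in 10 ml digestion enzyme solution and transfer it completely to a new 50 ml tube;
10) Digest at 37℃ in an incubator or water bath for 3 hours or overnight, until the tissue pieces are visibly loosened;
11) Dilute the digestion enzyme solution with 10 ml pre-chilled KF medium and transfer it completely to a new disposable Erlenmeyer flask in which a sterilized magnetic stir bar has been placed in advance; add 1 μl Y-27632;
12) Stir the flask on a magnetic stirrer for 1-2 hours, until no obvious tissue pieces remain;
13) Pipette the cell suspension thoroughly with a 10 ml pipette and filter it through a 70 μm strainer;
14) Centrifuge at 1,600 rpm for 8 minutes and remove the supernatant;
15) Resuspend the cells in 1 ml KF medium and transfer to a new 15 ml tube;
16) Add 3 volumes (3 ml) of red blood cell lysis buffer and vortex gently;
17) Leave on ice for 15 minutes, vortexing gently twice during this time;
18) Centrifuge at 1,600 rpm for 8 minutes and remove the supernatant;
19) Resuspend in 10 ml KF medium, transfer to a 10 cm dish and place in the incubator;
20) Change the medium every 2-3 days; the cells can be collected after one week.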
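Table 1. Digestion enzyme solution formulation
Table 2. KF cell culture medium formulation
2. Fluorescence-activated cell sorting (FACS)
The isolated primary KF cells were centrifuged, resuspended in 0.5% BSA, stained with the corresponding antibody on ice for 20 minutes, washed with PBS by centrifugation, resuspended in 0.5% BSA, and sorted on a FACSAria II instrument. Positive cells were collected in KF medium and plated on 10 cm dishes for culture.
3. KF cell culture and passaging
KF cells were cultured in KF medium in a 37℃, 5% CO2 incubator and passaged every 2-3 days. For passaging, the cells were washed once with PBS pre-warmed to 37℃, digested with 1 ml 0.25% trypsin for 1 minute, neutralized with 2 ml KF medium, centrifuged at 1000 rpm for 3 min, replated on 10 cm dishes at a ratio of one dish to two, and returned to the incubator.
4. Adipocyte induction
The day before adipocyte induction, KF cells were digested as for passaging and plated at 50,000 cells per well in 12-well plates, 30,000 cells per well in 24-well plates, or 10,000 cells per well in 48-well plates. On the next day, differentiation was started with AD medium or medium supplemented with the corresponding small molecules, and the medium was uniformly switched to AM medium on day 4 and day 8.
Table 3. AD medium formulation
Table 4. AM medium formulation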
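5. Nile red staining
Before the experiment, take the 2 mg/ml Nile red in DMSO stock from 4℃ and thaw it. Dilute the Nile red with PBS to a working concentration of 1 μg/ml (in 1 to 2 ml PBS, diluted together with Hoechst), mix, place in an ice bath, and label as the staining working solution. Then proceed as follows.
1) Turn on the fluorescence microscope;
2) Carefully aspirate the medium from the cell culture flask, wash once with PBS, and add an appropriate amount of staining working solution;
3) Incubate in a 37℃ incubator for 10 minutes, protected from light;
4) Observe the fluorescent cells under the fluorescence microscope: excitation wavelength 543 nm, emission wavelength 598 nm; cells showing strong orange-red fluorescence are lipid-rich positive cells.
6. Oil Red O staining
1) Dilute saturated Oil Red O with distilled water at a ratio of 3:2, leave at room temperature for 5-10 minutes, mix, and filter through a 0.45 μm membrane; this is the Oil Red O working solution, set aside for use;
2) Take the cultured cells, aspirate the medium and wash once with PBS;
3) Fix with 4% paraformaldehyde for 10 minutes;
4) Wash twice with distilled water;
5) Rinse once with 60% isopropanol;
6) Stain with Oil Red O working solution for 10 minutes (the staining solution can be recovered and reused);
7) Differentiate with 60% isopropanol until the interstitium is clear (this can be checked under the microscope);
8) Wash 2-4 times with distilled water;
9) Counterstain with hematoxylin for 3-5 minutes;
10) Wash 1-2 times with distilled water and photograph under the microscope.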
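7. PHDs-seq library construction
1) Prepare the cell lysis buffer (formulation in Table 5);
2) Take out the cultured cells, aspirate the medium and wash once with PBS pre-warmed to 37℃;
3) Add 60 μl of the pre-prepared Table 5 lysis buffer to each well, apply a sealing film, and place in a -80℃ freezer overnight;
4) The next day, remove the plate from the -80℃ freezer and shake on a horizontal shaker at 900 rpm for 15-30 min;
5) Prepare the reverse transcription mix (formulation in Table 6);
6) Transfer 4.286 μl of the cell lysate obtained above to a new 96-well PCR plate;
7) Add 0.714 μl reverse transcription mix to each well and mix by pipetting;
8) Reverse transcribe in a thermal cycler at 42℃ for 30 minutes, then inactivate the enzyme at 85℃ for 5 minutes, with the heated lid at 105℃;
9) Prepare the hybridization-ligation mix (formulation in Table 7);
10) Add 6 μl hybridization-ligation mix to the reverse transcription product and mix by pipetting;
11) Incubate in a thermal cycler at 60℃ for 90 minutes;
12) Prepare the template enrichment mix (formulation in Table 8);
13) Add 15 μl template enrichment mix to each well and mix on a vortexer;
14) Incubate at room temperature for 10 minutes;
15) Place on a DynaMag-96 Side Magnet for 3 minutes;
16) Carefully aspirate and discard the supernatant with a pipette;
17) Prepare the barcoding PCR mix;
18) Add 19 μl barcoding PCR mix to each well;
19) Add 1 μl of well barcode to each well;
20) Mix on a vortexer until the magnetic beads are evenly resuspended;
21) Amplify according to the program below;
22) Pool the amplification products of the 96-well plate into one 50 ml tube (several plates can also be pooled together);
23) Purify and recover the pooled sample with the DNA clean&concentrator-100 kit (XYBO, D4029);
24) Add AMPure XP beads to the purified product at a 1:1 ratio (150 μl:150 μl);
25) Keep the tube on the magnetic rack, gently aspirate the supernatant with a pipette tip and discard it;
26) Resuspend the beads in 25 μl sterile water of pH>6.0 and leave at room temperature for 1 minute;
27) Place on the magnetic rack for 1 minute and gently transfer the supernatant (about 23 μl) to a clean 1.5 ml tube;
28) Measure the sample concentration with a Qubit fluorometer;
29) Submit the sample for sequencing according to the sequencing requirements.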
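Table 5. Lysis buffer components (96-well plate)

Note: RNaseOut is added just before use; the other components can be prepared in advance and stored at 4℃.
Table 6. Reverse transcription mix (96-well plate)


Note: RNaseOut and Maxima are added just before use.
Table 7. Hybridization-ligation mix (96-well plate)
Table 8. Template enrichment mix (96-well plate)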
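8. RNA extraction
After the sample to be extracted was washed once with PBS, RNA was extracted with the ER101-01 kit and finally dissolved in 50 ml RNase-free water.
9. Reverse transcription
The extracted RNA was added to the reverse transcription system in the proportions below, mixed gently, incubated at 42℃ for 30 minutes, and heated at 85℃ for 5 seconds to inactivate the reverse transcriptase and gDNA remover.
Table 9. Reverse transcription system
10. Quantitative fluorescence PCR
The cDNA produced by reverse transcription was diluted at a ratio of 1 μg input RNA to a final volume of 200 μl, and the diluted template cDNA was loaded according to the system below. The qPCR program was: pre-denaturation at 95℃ for 30 s, one cycle; then 40 cycles of 95℃ for 10 s and 95℃ for 30 s; finally a melting curve.
Table 10. qPCR system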
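Example 1. PHDs-seq, a probe hybridization-based high-throughput transcriptome profiling sequencing technique
To screen small-molecule drugs at high throughput and low cost, the inventors used the same probe hybridization principle as TAC-seq to develop a library construction and sequencing system named PHDs-seq (Probe Hybridization based Drug Screening by sequencing). It comprises six steps in total (Figure 1): cell lysis; transfer of the plate lysate and reverse transcription; probe hybridization and ligation; template enrichment; library amplification with introduction of barcodes; and library pooling and purification. In the hybridization step, synthesized gene-specific left and right probes hybridize to the template cDNA. Both probes carry a four-base unique molecular identifier (UMI), located at the 5' end of the left probe and the 3' end of the right probe respectively, and the 5' end of the right probe also carries a phosphate group so that it can be joined to the left probe in the ligation step.
PHDs-seq inherits the advantage of TAC-seq, namely high sensitivity. In addition, PHDs-seq has further advantages. First, the operation is simpler: it is no longer necessary to extract RNA from each sample individually with a kit or Trizol; instead, cells are lysed with a mild lysis buffer and the lysate is reverse transcribed directly into cDNA. Moreover, hybridization and ligation are performed as separate, sequential steps in TAC-seq, whereas PHDs-seq optimizes the hybridization and ligation reactions so that both steps are completed at the same time in one solution system. Second, the cost is lower: including all reagents and consumables for library construction and sequencing, the total cost per sample is about 8 RMB. Third, the library structure is optimized: the inventors added a new sequence to the P5 end of the TAC-seq library to introduce a well barcode, upgrading the single-end sequencing of TAC-seq to paired-end sequencing. This removes the limit that the number of plate barcode types imposes on sample number in TAC-seq screening, which greatly increases screening throughput, and it also reduces the cost of synthesizing amplification primers. In terms of library construction time, libraries for two 96-well plates (that is, 192 samples) can be built simultaneously in 8 hours. Notably, a PHDs-seq library can be sequenced on the same lane as an ordinary bulk RNA-seq library, and several 96-well plates can be pooled into one sample for sequencing, which greatly increases the flexibility of screening.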
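Example 2. Design and evaluation of probes for PHDs-seq
The main design principles for PHDs-seq probes are: spanning different exons; GC content of 40-60%; melting temperature of 60-76℃; proximity to the 3' end; and absence of SNPs. Probes can be designed and evaluated by the following method.
I. Obtain the CDS sequence of the gene to be detected
Obtain the full-length CDS of the gene to be detected from a database (for example http://www.ensembl.org/index.html), retaining the information on its different exons. Taking the ATP4A gene as an example, the following CDS sequence is obtained, in which each bold sequence and each italic sequence represents an independent exon:
II. Obtain the probe sequences of the gene
Enter the CDS of the gene to be detected into https://www.ncbi.nlm.nih.gov/tools/primer-blast/ for online analysis (if the full-length CDS is longer than 2000 bp, only the last 2000 bp, that is, the 2000 bp at the 3' end, may be used). The parameters are as follows:
1. PCR product size: blank;
2. Primer melting temperatures (Tm): Min: 60, opt: 68, max: 76;
3. Database: "nr";
4. Organism: the same organism as the source of the CDS sequence;
5. Primer Size: Min: 25, opt: 27, max: 29;
6. Primer GC content (%): Min: 30, max: 75.
After setting the parameters, click "get primers". If the CDS sequence is not unique in the database, choose the highest-ranked hit among those with 100% identity; if no hit has 100% identity, choose the one closest to 100%.
The output generally contains Primer pair 1 to 10, ten pairs in total. To avoid duplicate names, the "Forward primer" of Primer pair 1 is named gene name-F1 (for example ATP4A-F1) and the "Reverse primer" of Primer pair 1 is named gene name-R1 (for example ATP4A-R1). The "Forward primer" of Primer pair 2 is named gene name-F2 and the "Reverse primer" of Primer pair 2 is named gene name-R2, and so on. The output sequences of all Reverse primers are converted to their reverse complements.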
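The naming convention and the reverse-complement conversion described above can be sketched in a few lines of Python. The helper names are hypothetical, and the short primer sequences in the usage example are placeholders rather than real Primer-BLAST output.

```python
# Translation table for complementing DNA, including N for ambiguous bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def name_candidates(gene: str, primer_pairs):
    """Name primers as gene-F1, gene-R1, gene-F2, ... and convert every
    reverse primer to its reverse complement so that all candidate
    sequences read in the same direction as the CDS."""
    candidates = {}
    for i, (forward, reverse) in enumerate(primer_pairs, start=1):
        candidates[f"{gene}-F{i}"] = forward
        candidates[f"{gene}-R{i}"] = reverse_complement(reverse)
    return candidates


# Placeholder primer pair (not real ATP4A primers).
print(name_candidates("ATP4A", [("AACG", "CGTT")]))
```

After this conversion every candidate can be located in the CDS with a plain substring search, which is what the mapping step in the next section relies on.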
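III. Screen the probe sequences
1. Map each of the up to 20 output sequences individually to the full-length CDS.
2. For a given primer sequence, first analyze the sequence that includes 20 bp extensions on both sides. If 20bp+primer+20bp falls only within bold or only within italic sequence, the primer is discarded. If either 20+primer or primer+20 contains both bold and italic sequence, it counts as a success. Count the successful primers: if the number is ≤3, go to step 2; if it is >3, go directly to step 3. Output each successful primer sequence together with the successful upstream 27 or downstream 27 (which one is output depends on which one produced the match; if both match, both are output), with the successful primer sequence and the corresponding upstream or downstream 27 bp sequence output separately. Exception: if the full-length CDS has only one exon (for example all italic or all bold), 20+primer+20 can be treated as containing both italic and bold.
3. The primers discarded in the previous step enter a second round of evaluation. If 27+primer+27 falls entirely within a single exon, the primer is discarded. If either 27+primer or primer+27 contains both bold and italic sequence, it counts as a success. Output each successful primer sequence together with the successful upstream or downstream 27 bp sequence (which one is output depends on which one produced the match; if both match, both are output), with the successful primer sequence and the corresponding upstream or downstream 27 bp sequence output separately.
4. Analyze the 27 bp sequences obtained
Calculate GC% for the 27 bp sequence upstream or downstream of the primer.
ATP4A example: CTCCTACTTCCAGATTGGTGCCATTCA (SEQ ID No:2)
First count the occurrences of G+C, which is 13, as the numerator, and use the sequence length as the denominator to obtain the ratio.
GC% = 13/27 = 48.1% = 0.481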
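The GC calculation above can be reproduced with a short Python function. The function name is illustrative; the sequence is SEQ ID No:2 as quoted in the text.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence, as a decimal (not a percentage)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


atp4a_27mer = "CTCCTACTTCCAGATTGGTGCCATTCA"  # SEQ ID No:2 from the text
print(len(atp4a_27mer))                    # 27
print(round(gc_fraction(atp4a_27mer), 3))  # 0.481
```

Returning a decimal rather than a percentage matches the scoring formula, which expects GC% converted to a decimal.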
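5. Score the output sequences
Example (SEQ ID No:3):
Scoring formula:
Score: 0.4*{(27-N)/27}+0.15*{1-|0.55-L GC%|}+0.15*{1-|0.55-R GC%|}+0.2*(1-T/A)+0.1*X
N: the number of bp between the exon junction and the middle of the sequence (if the CDS has only one exon, N=27)
GC% must be converted from a percentage to a decimal
A: the length of the entire CDS
T: the distance from the last base of the right probe to the last base of the entire CDS
X: X=1 if the primer was output as a Forward primer in part II above; X=0 if it was output as a Reverse primer in part II above.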
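The scoring formula can be written as a small Python function, which makes the weight of each term easy to inspect. This is a direct transcription of the formula under the definitions given above; the function and parameter names are illustrative, and the input values in the usage example are invented rather than taken from the patent's ATP4A example.

```python
def probe_score(n, l_gc, r_gc, t, a, is_forward, y=27):
    """Score a candidate L/R pair.

    n: bp between the exon junction and the middle of the sequence
       (27 for a single-exon CDS)
    l_gc, r_gc: GC content of L and R as decimals (e.g. 0.481)
    t: distance from the last base of R to the last base of the CDS
    a: total CDS length
    is_forward: True if the candidate came from a Forward primer (X = 1)
    y: primer length used in the junction term (27 in the example formula)
    """
    x = 1 if is_forward else 0
    junction_term = 0.4 * (y - n) / y         # junction close to the middle
    l_gc_term = 0.15 * (1 - abs(0.55 - l_gc))  # GC content near 55%
    r_gc_term = 0.15 * (1 - abs(0.55 - r_gc))
    position_term = 0.2 * (1 - t / a)          # close to the 3' end of the CDS
    strand_term = 0.1 * x
    return junction_term + l_gc_term + r_gc_term + position_term + strand_term


# Invented inputs: an ideal candidate and a single-exon reverse-primer candidate.
print(probe_score(n=0, l_gc=0.55, r_gc=0.55, t=0, a=1000, is_forward=True))
print(probe_score(n=27, l_gc=0.481, r_gc=0.481, t=500, a=1000, is_forward=False))
```

The maximum possible score is 1.0, reached when the junction sits at the middle of the sequence, both halves have 55% GC, the right probe ends at the last base of the CDS, and the candidate came from a forward primer.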
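6. Overlap analysis
Compare the 54 bp sequences (which may be 52-56 bp) output for the same gene pairwise. If two sequences overlap each other when placed on the CDS, compare their scores and keep the one with the higher score, until the final result is obtained (only the top 3 pairs are kept).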
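One way to implement the overlap analysis is a greedy selection by descending score, sketched below. The text only specifies pairwise comparison with the higher score kept, so the greedy order is an assumption; the dictionary layout and candidate values are hypothetical.

```python
def select_non_overlapping(candidates, keep=3):
    """Keep the highest-scoring candidates whose CDS intervals do not overlap.

    Each candidate is a dict with 'name', 'start', 'end' (half-open interval
    of the combined L+R sequence on the CDS) and 'score'.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        # A candidate is accepted only if it is disjoint from everything kept so far.
        if all(cand["end"] <= k["start"] or cand["start"] >= k["end"] for k in kept):
            kept.append(cand)
        if len(kept) == keep:
            break
    return kept


# Hypothetical 54 bp candidates on one CDS.
candidates = [
    {"name": "a", "start": 0, "end": 54, "score": 0.9},
    {"name": "b", "start": 30, "end": 84, "score": 0.8},    # overlaps a
    {"name": "c", "start": 60, "end": 114, "score": 0.7},
    {"name": "d", "start": 200, "end": 254, "score": 0.6},
    {"name": "e", "start": 300, "end": 354, "score": 0.5},
]
print([c["name"] for c in select_non_overlapping(candidates)])
```

Here candidate b is dropped because it overlaps the higher-scoring a, and e is dropped because three pairs have already been kept.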
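7. Check the number of outputs
If fewer than 3 results are output, check the CDS length of the gene. If the full-length CDS is longer than 2000, take the sequence preceding the last 2000 (that is, the part left over from the earlier analysis) as the input sequence and analyze again.
If fewer than 3 results are output and the CDS of the gene is shorter than 2000, treat all previously discarded "primers" as containing both italic and bold, continue the analysis, and mark the output results in red or another special color.
8. Add fixed sequences
Add suitable extra sequences, such as barcode sequences and primer binding sequences, to the ends of the resulting left and right probes.
For the example above, the sequences after addition may be
L GGAAGCCTTGGCTTTTGNNNNCTCCTACTTCCAGATTGGTGCCATTCA (SEQ ID NO:4)
R GTCCTTTGCTGGCTTCACTGACTACTTNNNNAGATCGGAAGAGCACAC (SEQ ID NO:5)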
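The assembly of full-length probes from the target-binding sequences can be sketched as follows. The two flanking primer-site sequences are read off SEQ ID NO:4 and SEQ ID NO:5 by removing the NNNN UMI and the 27 nt target-binding part; treating them as reusable constants is an assumption for illustration, and the function name is hypothetical.

```python
# Flanking sequences as they appear in SEQ ID NO:4 and SEQ ID NO:5 of the text.
LEFT_PRIMER_SITE = "GGAAGCCTTGGCTTTTG"    # 5' primer binding sequence of the left probe
RIGHT_PRIMER_SITE = "AGATCGGAAGAGCACAC"   # 3' primer binding sequence of the right probe
UMI = "NNNN"                              # 4 random bases, synthesized as degenerate N


def assemble_probes(left_target: str, right_target: str):
    """Build left and right probes, both written 5' to 3'.

    Left:  primer site + UMI + left target-binding sequence
    Right: right target-binding sequence + UMI + primer site
    (the 5' phosphate on the right probe is a synthesis modification and is
    not represented in the sequence string).
    """
    left_probe = LEFT_PRIMER_SITE + UMI + left_target
    right_probe = right_target + UMI + RIGHT_PRIMER_SITE
    return left_probe, right_probe


left, right = assemble_probes(
    "CTCCTACTTCCAGATTGGTGCCATTCA",   # L target-binding sequence (SEQ ID No:2)
    "GTCCTTTGCTGGCTTCACTGACTACTT",   # R target-binding sequence from SEQ ID NO:5
)
print(left)
print(right)
```

With these inputs the function reproduces the two example probe sequences listed above, each 48 nt long.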
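The probe pair design and evaluation method has the following advantages:
·The evaluation cycle is short
·No qPCR is required
·Each probe is evaluated individually, and only probe quality (binding efficiency) itself is evaluated
·With sufficient template, genes with low background expression can be evaluated, or synthetic sequences can be added to the template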
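Example 3. Using the reprogramming of human keloid fibroblasts into adipocytes to support the development of PHDs-seq
To test the performance of PHDs-seq, a keloid disease model was chosen, with the aim of reprogramming keloid fibroblasts (KF) into adipocytes as a therapeutic approach (Figure 2A). First, patient-derived keloid fibroblasts were successfully isolated with the surface antibody CD90 (Figure 2B) and could be cultured in dishes in vitro, where they showed typical fibroblast-like morphology (Figure 2C). Using a reported induction medium (DMEM + 1% ITS + 0.5 mM isobutylmethylxanthine + 0.1 μM cortisol + 1 μM dexamethasone + 0.2 nM triiodothyronine + 1 μM rosiglitazone, abbreviated AD medium) and maintenance medium (DMEM + 1% ITS + 0.1 μM cortisol + 0.2 nM triiodothyronine) [1], KF were successfully reprogrammed into adipocytes, which was further confirmed by Nile red staining (Figure 2D).
A literature review further confirmed that the KF-to-adipocyte reprogramming system is well suited to evaluating PHDs-seq, because adipocyte-related marker genes are clearly upregulated after reprogramming and some relatively low-expressed transcription factors are also upregulated, which was also seen later when validating candidate small molecules. Signature genes representing different directions were therefore collected first (Figure 2E). For fibrosis: PRRX1 and THY1 are fibroblast marker genes; ACTA2 is a marker gene of activated fibroblasts; FBN1, COL1A1 and COL3A1 are collagens highly expressed in fibrotic cells; MMP1 is an important collagen-degrading enzyme; and TIMP1 is a protein that inhibits MMP1 activity. For adipocytes: the common adipocyte marker genes are FABP4, ADIPOQ, EBF2, CEBPA, ZNF423, ZNF516 and ATF2; LEP is a white fat marker gene; the brown fat marker genes include PPARG, PPARGC1A, FNDC5, PRDM16 and UCP1; and INSR and SLC2A4 are marker genes of adipocyte function, INSR being the receptor through which adipocytes respond to insulin and SLC2A4 the glucose transporter that acts after the insulin response. In addition, eight reference genes covering high, medium and low expression levels were selected, ACTB, CYC, GAPDH, HMBS, PPIA, SDHA, TBP and YWHAZ [2], together with ten ERCC probes of high, medium and low abundance. Among all genes, two probe pairs each were designed for PRDM16, UCP1, EBF2, ADIPOQ, COL1A1, ACTB, GAPDH and THY1, to test whether gene detection by PHDs-seq is accurate.
With these probes, the KF-to-adipocyte induction system was tested. Induced and non-induced samples were collected on day 5 and day 8 of adipogenic induction for library construction. The day-5 induced group (Posi_D5) had not yet formed lipid droplets, so successful induction could not be confirmed morphologically, whereas the day-8 induced group (Posi_D8) began to show a small number of lipid droplets. Two time points were chosen to explore whether PHDs-seq has an advantage over traditional morphological screening, that is, whether samples can be distinguished by the expression of multiple genes before morphology becomes apparent. PHDs-seq libraries were built from these eight samples and from their pooled sample with the probes selected above. The library size matched the expected size, about 208 bp; the Posi_D5_2 sample failed because of a library construction error and showed no band (Figure 2F). The pooled sample was sent for sequencing, and the quality check also confirmed the expected size (Figure 2G). The sequencing results showed high base quality at the different read positions, fully meeting the requirements for analysis (Figure 2H).
The pooled sample was then aligned, demultiplexed and quantified at the gene level. FABP4 and ADIPOQ were highly expressed in the positive group (Posi) relative to the control group (KF), and PPARG, PPARGC1A and CEBPA showed a similar trend, while the D5 and D8 samples did not differ clearly (Figure 3A). Cluster analysis of the samples with these signature genes showed that the induced and non-induced groups were clearly separated (Figure 3B), consistent with the heat map clustering (Figure 3A). Correlation analysis also showed high correlation among the treated samples and low correlation with the control group (Figure 3C), again separating them well. These results also indicate that PHDs-seq can detect the transition of KF cells toward an adipocyte fate at an early stage, which is an advantage over traditional morphological screening.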
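The figure legends state that expression values are shown as Log10(CPM+1). A minimal sketch of that normalization is given below, assuming the standard definition of CPM (counts per million of a sample's total counts); the patent does not describe the computation itself, and the function name and counts are hypothetical.

```python
import math


def log10_cpm(umi_counts):
    """Normalize one sample's per-gene counts to Log10(CPM + 1).

    umi_counts: dict mapping gene name to its (UMI-collapsed) count in one well.
    """
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {
        gene: math.log10(count / total * 1_000_000 + 1)
        for gene, count in umi_counts.items()
    }


# Hypothetical counts for one well.
sample = {"FABP4": 1, "GAPDH": 999_999}
normalized = log10_cpm(sample)
print(normalized["FABP4"])  # log10(1 CPM + 1), about 0.301
```

Adding 1 before taking the logarithm keeps genes with zero counts at a value of 0 instead of an undefined logarithm, which matters for a targeted panel where many low-expressed genes may be absent in some wells.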
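To further evaluate the accuracy of PHDs-seq, an aliquot of each sample was set aside at collection, RNA was extracted by the conventional method, and the expression of selected genes was quantified after reverse transcription. For each sample, the expression of the detected genes relative to the reference gene GAPDH was compared between qPCR and PHDs-seq. The two methods were in relatively good agreement, although some low-expressed genes fluctuated more: for example, FABP4 and PPARG differed greatly in the KF samples at both time points, and FBN1 and PRRX1 differed clearly in the Posi samples at both time points (Figures 4A, 4B, 4D and 4E). Furthermore, analysis of gene fold changes between treated and untreated groups at the two time points showed that qPCR and PHDs-seq were also in good agreement (Figures 4C and 4F). When designing the probes, two probes were designed for some genes to verify the quantitative accuracy of PHDs-seq; analysis of the PHDs-seq expression of these genes showed little difference between the two probes (Figure 4G). The results in Figure 3 already suggested that parallel samples were highly reproducible, and linear fitting of several parallel samples further confirmed the high reproducibility of parallel wells (Figure 4H).
Taken together, these results indicate that PHDs-seq meets the requirements of high-throughput screening in terms of accuracy, sensitivity and reproducibility, and can be used for actual screening.
Example 4. Screening small molecules with the system for reprogramming human keloid fibroblasts into adipocytes
Earlier reports on keloids indicated that BMP4 increases the efficiency of inducing KF into adipocytes, and the inventors attempted to reproduce these results. Cells were kept in induction medium for four days, switched to maintenance medium, and collected for identification on day 12. BMP4 had a clear effect: it greatly increased the number of Oil Red O-positive cells and the expression levels of FABP4 and ADIPOQ (Figures 5A and B), and the positive ratio rose from about 5% to about 20% (Figure 5C). BMP4 also showed a dose effect (Figure 5D), and its effect was markedly weakened when the small molecules Dorsomorphin and DMH1 were used to block the BMP4 signaling pathway (Figures 5E and F). These results again show that BMP4 has a very pronounced effect on adipogenic induction.
The inventors therefore asked whether there are small molecules independent of BMP4 that have a similar or even better effect on adipogenic induction. To address this, the inventors collected about 130 small molecules or protein factors commonly used in cell reprogramming systems, covering the great majority of signaling pathways, and screened them on KF cells with the newly developed PHDs-seq screening system (Figure 6A). Because adipocytes are divided into white and brown adipocytes, and white adipocytes can be converted into brown adipocytes under specific treatments, the inventors also wanted to distinguish which cell type the active small molecules act on. According to published reports, triiodothyronine, rosiglitazone and cortisol, the components of AD medium that promote white-to-brown adipocyte conversion, were removed, and only 1% ITS, 0.5 mM isobutylmethylxanthine and 1 μM dexamethasone, which promote white adipocytes, were retained. These make up the MDI medium. Using MDI as the base medium, the collected small molecules were screened; after four days of induction the medium was replaced with DMEM + 1% ITS, and on day 8 samples were collected for library construction and sequencing. According to the sequencing results, the great majority of small molecules did not promote the expression of adipocyte signature genes, whereas the PPARG activator Rosiglitazone and the AD component FSK both promoted the expression of FABP4 and ADIPOQ and clustered well together. In addition, the DNA methyltransferase inhibitor Decitabine and the TGFβ inhibitors SD_208, Repsox and SB431542 also slightly increased the expression of both genes (Figure 6B). Regarding cell fibrosis, only Lithocholic Acid clearly downregulated COL1A1 and COL3A1, and the other small molecules had no effect (Figure 6B). PCA showed that samples treated with Rosiglitazone, Forskolin, Decitabine, SD_208, Repsox and SB431542 separated well from the remaining samples and lay closer to the AD sample (Figure 6C).
In summary, screening with PHDs-seq identified small molecules such as Rosiglitazone, FSK and some TGFBR1 inhibitors that promote the transition of KF toward an adipocyte fate, demonstrating the reliability of PHDs-seq.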
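Example 5. Comparison of PHDs-seq with TAC-seq
First, the effect of different hybridization/ligation procedures on the result was tested with the three schemes shown in Figure 7A.
The results are shown in Figure 7B. The original TAC-seq protocol (scheme 2, lane 5) amplified only a weak target band, whereas the optimized condition (lane 8, scheme 3, in which hybridization and ligation are performed simultaneously in one step) amplified the target band more effectively. Lane 2 is based on scheme 2 (lane 5) with a salt solution (1500 mM KCl, 300 mM Tris–HCl pH 8.5, 1 mM EDTA) added during hybridization to promote more efficient hybridization, but the results show that adding the salt solution did not allow effective amplification of the target band.
In addition, the effect of different probe concentrations was tested on the basis of scheme 3 with one-step hybridization and ligation. The experiment is shown in Figure 8A and the results in Figure 8B. Using only 1/1000 of the probe concentration of the original TAC-seq protocol (lane 9, probe concentration 0.83/1000 μM) amplified the target band as effectively as the original concentration (lane 5, probe concentration 0.83 μM). At the same time, at 1/1000 probe concentration the non-specific bands were clearly weaker or essentially undetectable.
References
1. Plikus, M. V. et al. Regeneration of fat cells from myofibroblasts during wound healing. Science 355, 748-752, doi:10.1126/science.aai8792 (2017).
2. Teder, H. et al. TAC-seq: targeted DNA and RNA sequencing for precise biomarker molecule counting. npj Genomic Medicine 3, 34, doi:10.1038/s41525-018-0072-5 (2018).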

Claims (38)

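  1. A probe hybridization-based method for constructing a high-throughput transcriptome profiling sequencing library, the method comprising:
    (a) providing at least one cell-containing biological sample in at least one multi-well plate, each of the at least one cell-containing biological sample being located in a separate well;
    (b) lysing the cells of the biological sample in the wells of the multi-well plate;
    (c) transferring the cell lysis supernatant obtained in step (b) to the corresponding wells of another multi-well plate and performing a reverse transcription reaction to obtain cDNA;
    (d) adding to each well a hybridization-ligation mix comprising a DNA ligase and at least one probe pair that specifically hybridizes to a target region of at least one gene to be detected, wherein the probe pair comprises a left probe that hybridizes to the upstream portion of the target region and a right probe that hybridizes to the downstream portion of the target region;
    (e) allowing the at least one probe pair to hybridize to the target region of the at least one gene to be detected, and ligating the left probe and the right probe of the probe pair hybridized to the target region to each other;
    (f) enriching the ligation product;
    (g) adding to each well a barcoding PCR mix comprising a DNA polymerase and a barcode primer pair, the barcode primer pair comprising a first primer directed to the left probe and a second primer directed to the right probe, one of the first and second primers comprising a well barcode sequence that is unique to each well and the other comprising a plate barcode sequence that is unique to each multi-well plate;
    (h) amplifying the target region of the at least one gene to be detected by PCR with the barcode primer pair; and
    (i) harvesting and pooling the amplification products from at least one well of the at least one multi-well plate, and optionally purifying them,
    thereby obtaining a library that can be used for high-throughput transcriptome profiling sequencing.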
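  2. The method of claim 1, wherein the multi-well plate is a 96-well plate or a 384-well plate, preferably a 96-well plate.
  3. The method of claim 1 or 2, wherein the at least one cell-containing biological sample is 1-200 or more, for example at least 2, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more cell-containing biological samples.
  4. The method of any one of claims 1-3, wherein the at least one cell-containing biological sample comprises biological samples that each contain a different cell type.
  5. The method of any one of claims 1-3, wherein the at least one cell-containing biological sample comprises biological samples of the same cell type, but each biological sample has undergone a different treatment, for example treatment with a different compound.
  6. The method of claim 5, wherein the treatment is capable of producing a specific cellular phenotype.
  7. The method of any one of claims 1-6, wherein the cells are somatic cells, germ cells or stem cells (such as embryonic stem cells or induced pluripotent stem cells).
  8. The method of any one of claims 1-6, wherein the cells are selected from neuronal cells, skeletal muscle cells, hepatocytes, fibroblasts, osteoblasts, chondrocytes, adipocytes, endothelial cells, stromal cells, smooth muscle cells, cardiomyocytes, nerve cells, hematopoietic cells, islet cells or tumor cells.
  9. The method of any one of claims 1-8, wherein the cells are derived from a mammal or a non-mammal, for example from a human, mouse, rat or non-human primate.
  10. The method of any one of claims 1-9, wherein the cells are lysed in step (b) with a cell lysis buffer based on a non-ionic surfactant; preferably, the non-ionic surfactant is Triton X-100.
  11. The method of claim 10, wherein the cell lysis buffer consists of Tris-HCl, KCl, a polysucrose such as Ficoll PM-400, Triton X-100, a ribonuclease inhibitor, and water.
  12. The method of claim 10, wherein the final working concentrations of the components of the cell lysis buffer are: about 5 mM to about 500 mM Tris-HCl, about 7.5 mM to about 750 mM KCl, about 0.6% to about 60% polysucrose such as Ficoll PM-400, about 0.015% to about 1.5% Triton X-100, and about 0.05 U/μL to about 5 U/μL ribonuclease inhibitor.
  13. The method of any one of claims 1-12, wherein the reverse transcription reaction in step (c) is performed at about 40-45℃, for example 42℃; and/or the reverse transcription reaction is performed for about 15-60 minutes, for example about 30 minutes.
  14. The method of any one of claims 1-13, wherein the DNA ligase in step (d) is selected from T4 DNA ligase or Taq DNA ligase, preferably Taq DNA ligase.
  15. The method of any one of claims 1-14, wherein the hybridization-ligation mix contains 1-200 or more, for example at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more probe pairs.
  16. The method of claim 15, wherein the probe pairs are used to detect 1-200 or more, for example at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more genes to be detected.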
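  17. The method of any one of claims 1-16, wherein the gene to be detected is associated with at least one phenotype of the cells.
  18. The method of claim 17, wherein part or all of the expression profile of the at least one gene to be detected serves as a marker of the phenotype.
  19. The method of claim 17 or 18, wherein the phenotype is selected from inhibition or increase of cell proliferation and a change in cell type.
  20. The method of any one of claims 1-19, wherein the target region is a characteristic region of the gene to be detected.
  21. The method of any one of claims 1-20, wherein the length of the target region is about 20 nucleotides (nt) to about 300 nt or longer, for example about 20 nt, about 30 nt, about 40 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 120 nt, about 140 nt, about 160 nt, about 180 nt, about 200 nt, about 250 nt, about 300 nt or longer.
  22. The method of any one of claims 1-21, wherein the left probe comprises, in the 5' to 3' direction, a 5' primer binding sequence, a unique molecular identifier (UMI) and a first target region binding sequence.
  23. The method of any one of claims 1-22, wherein the right probe comprises, in the 5' to 3' direction, a second target region binding sequence, a unique molecular identifier (UMI) and a 3' primer binding sequence, and the 5' end of the right probe carries a phosphate group so that it can be ligated to the 3' end of the left probe.
  24. The method of claim 22 or 23, wherein the first target region binding sequence and the second target region binding sequence, once ligated, perfectly match the target region of the gene to be detected.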
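  25. The method of any one of claims 22-24, wherein the length of the first or second target region binding sequence is about 10 nt to about 150 nt or longer, for example about 10 nt, about 15 nt, about 20 nt, about 25 nt, about 30 nt, about 35 nt, about 40 nt, about 45 nt, about 50 nt, about 60 nt, about 70 nt, about 80 nt, about 90 nt, about 100 nt, about 125 nt, about 150 nt or longer, provided that it allows the probe to hybridize specifically to the target region.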
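  26. The method of any one of claims 22-25, wherein the length of the unique molecular identifier (UMI) is about 3 nt to 8 nt, for example 4 nt.
  27. The method of any one of claims 22-26, wherein the 5' primer binding sequence and/or the 3' primer binding sequence is a universal primer binding sequence.
  28. The method of any one of claims 1-27, wherein the concentration of each probe in step (d) is about 0.0001 μM to about 1 μM, for example about 0.0001 μM to about 0.001 μM, about 0.0001 μM to about 0.01 μM, or about 0.0001 μM to about 0.1 μM; preferably, the concentration of each probe is no more than about 0.1 μM, preferably no more than about 0.01 μM, more preferably no more than about 0.001 μM.
  29. The method of any one of claims 1-28, wherein in step (e) the steps of "allowing the at least one probe pair to hybridize to the target region of the at least one gene to be detected" and "ligating to each other the left probe and the right probe of the probe pair hybridized to the target region" are carried out simultaneously in the same solution system.
  30. The method of any one of claims 1-29, wherein step (e) comprises incubating the at least one multi-well plate at about 50 to about 70 °C, for example about 60 °C, and/or incubating the at least one multi-well plate for about 30-120 minutes or longer, for example for at least 30 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes or longer.
  31. The method of any one of claims 1-30, wherein in step (f) the ligation product is enriched using magnetic beads, the magnetic beads being, for example, Dynabeads MyOne Carboxylic Acid beads.
  32. The method of any one of claims 22-31, wherein the first primer of the barcode primer pair comprises a primer region sequence corresponding to the 5' primer binding sequence of the left probe, and the second primer of the barcode primer pair comprises a primer region sequence corresponding to the 3' primer binding sequence of the right probe.
  33. The method of any one of claims 1-32, wherein the length of the well barcode sequence or the plate barcode sequence is about 4 nt to 10 nt, for example 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt or 10 nt.
  34. The method of any one of claims 1-33, wherein the first primer and/or the second primer further comprises an adapter sequence for high-throughput sequencing, for example a P5 adapter sequence or a P7 adapter sequence.
  35. The method of any one of claims 1-34, wherein in step (i) the amplification products from all wells of all multi-well plates are harvested and pooled.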
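  36. A high-throughput drug screening method, the method comprising:
    (1) culturing cells in at least one well of at least one multi-well plate;
    (2) subjecting the cells in different wells to different treatments, for example by adding different candidate drugs;
    (3) constructing a high-throughput transcriptional profile sequencing library by the method of any one of claims 1-35;
    (4) performing high-throughput sequencing on the library; and
    (5) identifying candidate drugs on the basis of the high-throughput sequencing results.
  37. The method of claim 36, wherein the candidate drug is selected from small-molecule compounds, antibodies, polypeptides and nucleic acid molecules, preferably small-molecule compounds.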
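  38. A method for obtaining and/or evaluating a probe pair for a gene to be detected, the probe pair preferably being usable in the method of any one of claims 1-37, the method for obtaining and/or evaluating a probe pair comprising the following steps:
    a) obtaining the coding sequence (CDS) of the gene to be detected;
    b) submitting the CDS of the gene to be detected, or a portion thereof, to a primer design program such as https://www.ncbi.nlm.nih.gov/tools/primer-blast/ for analysis to obtain at least one primer pair comprising a forward primer and a reverse primer, and outputting the forward primer sequence and the reverse complement of the reverse primer sequence of the at least one primer pair as candidate primer sequences;
    c) mapping the candidate primer sequences output in step b) onto the CDS of the gene to be detected;
    d) determining whether i) the candidate primer sequence together with the sequence of corresponding length upstream of the candidate primer sequence, or ii) the candidate primer sequence together with the sequence of corresponding length downstream of the candidate primer sequence, spans different exons of the gene to be detected,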
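    if the sequence of i) spans different exons, outputting the candidate primer sequence as the candidate right probe target region binding sequence (R) and the sequence of corresponding length upstream of the candidate primer sequence as the candidate left probe target region binding sequence (L); if the sequence of ii) spans different exons, outputting the candidate primer sequence as the candidate left probe target region binding sequence (L) and the sequence of corresponding length downstream of the candidate primer sequence as the candidate right probe target region binding sequence (R); if the gene to be detected has only one exon, both i) and ii) are deemed to span different exons;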
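    e) scoring the output sequences using the following formula: 0.4 × ((Y − N)/Y) + 0.15 × (1 − |0.55 − L GC%|) + 0.15 × (1 − |0.55 − R GC%|) + 0.2 × (1 − T/A) + 0.1 × X, where
    Y is the length of the candidate primer sequence;
    N is the number of nucleotides between the junction of the different exons and the midpoint of the sequence of i) or ii); if the gene to be detected has only one exon, N = 27;
    L GC% is the GC content percentage of the output L sequence;
    R GC% is the GC content percentage of the output R sequence;
    A is the length of the entire CDS;
    T is the distance from the last nucleotide of the R sequence to the last base of the entire CDS;
    X = 1 if the candidate primer sequence was output in step b) as a forward primer, and X = 0 if it was output in step b) as a reverse primer;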
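    f) outputting the high-scoring combinations of candidate left probe target region binding sequence (L) and candidate right probe target region binding sequence (R); and
    g) optionally, adding to the candidate left probe target region binding sequence (L) and candidate right probe target region binding sequence (R) output in f) a barcode sequence such as a unique molecular identifier (UMI) and/or a primer binding sequence.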
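The scoring and ranking in steps e) and f) of claim 38 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Candidate` class, its field names and the helper functions are hypothetical, and the GC content is assumed to be expressed as a fraction between 0 and 1 so that it is comparable with the constant 0.55 in the formula.

```python
from dataclasses import dataclass

SINGLE_EXON_N = 27  # value of N that the claim prescribes for single-exon genes


def gc_fraction(seq: str) -> float:
    """GC content of `seq` as a fraction in [0, 1] (assumed unit for GC%)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


@dataclass(frozen=True)
class Candidate:  # hypothetical container, not defined in the source
    left: str                  # L: left probe target region binding sequence
    right: str                 # R: right probe target region binding sequence
    primer_len: int            # Y: length of the candidate primer sequence
    junction_offset: int       # N: nt between exon junction and midpoint
    dist_to_cds_end: int       # T: nt from last base of R to end of the CDS
    cds_len: int               # A: length of the whole CDS
    from_forward_primer: bool  # X: True if output as a forward primer


def score(c: Candidate) -> float:
    """Weighted score of one L/R combination, following the claimed formula."""
    x = 1 if c.from_forward_primer else 0
    return (
        0.4 * ((c.primer_len - c.junction_offset) / c.primer_len)
        + 0.15 * (1 - abs(0.55 - gc_fraction(c.left)))
        + 0.15 * (1 - abs(0.55 - gc_fraction(c.right)))
        + 0.2 * (1 - c.dist_to_cds_end / c.cds_len)
        + 0.1 * x
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Step f): order candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

With these weights, a candidate is favoured when the exon junction lies near the middle of the combined sequence, both halves have a GC content close to 55%, the R sequence ends close to the end of the CDS, and the candidate came from a forward primer.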
PCT/CN2023/071872 2022-01-12 2023-01-12 Method for constructing a high-throughput transcriptional profile sequencing library based on probe hybridization WO2023134719A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210033028.X 2022-01-12
CN202210033028.XA CN116555391A (zh) 2022-01-12 2022-01-12 Method for constructing a high-throughput transcriptional profile sequencing library based on probe hybridization

Publications (1)

Publication Number Publication Date
WO2023134719A1 true WO2023134719A1 (zh) 2023-07-20

Family

ID=87280109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071872 WO2023134719A1 (zh) 2022-01-12 2023-01-12 Method for constructing a high-throughput transcriptional profile sequencing library based on probe hybridization

Country Status (2)

Country Link
CN (1) CN116555391A (zh)
WO (1) WO2023134719A1 (zh)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190323074A1 (en) * 2017-01-05 2019-10-24 Tervisetehnoloogiate Arenduskeskus As Quantifying dna sequences

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ATTAF NOUDJOUD, CERVERA-MARZAL IÑAKI, DONG CHUANG, GIL LAURINE, RENAND AMÉDÉE, SPINELLI LIONEL, MILPIED PIERRE: "FB5P-seq: FACS-Based 5-Prime End Single-Cell RNA-seq for Integrative Analysis of Transcriptome and Antigen Receptor Repertoire in B and T Cells", FRONTIERS IN IMMUNOLOGY, vol. 11, 3 March 2020 (2020-03-03), pages 216, XP093079615, DOI: 10.3389/fimmu.2020.00216 *
CHAOYANG YE, DANIEL J. HO, MARILISA NERI, CHIAN YANG, TRIPTI KULKARNI, RANJIT RANDHAWA, MARTIN HENAULT, NADEZDA MOSTACCI, PIERRE F: "DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery", NATURE COMMUNICATIONS, vol. 9, no. 1, 1 December 2018 (2018-12-01), XP055620437, DOI: 10.1038/s41467-018-06500-x *
TEDER HINDREK, KOEL MARIANN, PALUOJA PRIIT, JATSENKO TATJANA, REKKER KADRI, LAISK-PODAR TRIIN, KUKUŠKINA VIKTORIJA, VELTHUT-MEIKAS: "TAC-seq: targeted DNA and RNA sequencing for precise biomarker molecule counting", NPJ GENOMIC MEDICINE, vol. 3, no. 1, 18 December 2018 (2018-12-18), pages 34, XP093079613, DOI: 10.1038/s41525-018-0072-5 *
ZUO LE, JIANG MIN, JIANG YIXIANG, SHI XIAOLU, LI YINGHUI, LIN YIMAN, QIU YAQUN, DENG YINHUA, LI MINXU, LIN ZEREN, LIAO YIQUN, XIE : "Multiplex ligation reaction based on probe melting curve analysis: a pragmatic approach for the identification of 30 common Salmonella serovars", ANNALS OF CLINICAL MICROBIOLOGY AND ANTIMICROBIALS, vol. 18, no. 1, 1 December 2019 (2019-12-01), pages 39, XP093079616, DOI: 10.1186/s12941-019-0338-5 *

Also Published As

Publication number Publication date
CN116555391A (zh) 2023-08-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23740061

Country of ref document: EP

Kind code of ref document: A1