CN116555391A

CN116555391A - High-throughput transcription profiling sequencing library construction method based on probe hybridization

Info

Publication number: CN116555391A
Application number: CN202210033028.XA
Authority: CN
Inventors: 刘洋; 李军; 赵扬
Original assignee: Nanjing Xinrui Regenerative Medicine Technology Co ltd
Current assignee: Nanjing Xinrui Regenerative Medicine Technology Co ltd
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2023-08-08
Also published as: WO2023134719A1

Abstract

The invention relates to the field of biological medicine. In particular, the invention relates to a high-throughput transcriptional profiling library construction method based on probe hybridization, which can be used for screening drugs at a high throughput and low cost.

Description

High-throughput transcription profiling sequencing library construction method based on probe hybridization

Technical Field

Background

The discovery of new drugs currently relies largely on high-throughput screening, but current screening platforms have limited screening capacity. RNA-seq is a powerful tool for studying drug effects using transcriptome changes as markers, but standard libraries are expensive to construct. Hindrek Teder et al. (npj Genomic Medicine (2018) 3:34) developed the TAC-seq technique for accurate quantification of specific nucleic acid biomarkers. However, TAC-seq requires RNA to be extracted from each sample separately using a kit or Trizol, its probe hybridization and ligation steps must be performed separately, and its use of a single barcode limits the number of samples that can be screened. Thus, there remains a need in the art for improved methods of constructing high-throughput transcriptional profiling libraries.

Brief Description of Drawings

FIG. 1. Development flow and schematic diagram of the high-throughput transcriptional profiling technique based on probe hybridization. (A) PHDs-seq schematic; (B) PHDs-seq flow and schematic.

FIG. 2. Establishment of a system for inducing adipocytes from keloid fibroblasts, and testing of PHDs-seq. (A) Schematic of the isolation of human keloid fibroblasts and their induction into adipocytes; (B) Flow sorting of human keloid fibroblasts: the left panel shows the sorting result for the isotype control, and the right panel shows the sorting result with the surface antibody CD90; (C) Morphology of the isolated keloid cells, scale bar 50 μm; (D) Reprogramming of the isolated human keloid fibroblasts into adipocytes, with AD medium as the induction medium and Nile red staining used for identification, scale bar 200 μm; (E) List of signature genes used for testing PHDs-seq; (F) Gel running results of the PHDs-seq sub-libraries (1-8) and the mixed library (9), DNA Marker: 2K Plus; (G) Quality testing results after the PHDs-seq sub-libraries were mixed into a large library; (H) PHDs-seq library quality control results: base quality score distribution at different positions in reads.

FIG. 3. Analysis and evaluation of PHDs-seq sequencing results. (A) Heat map of signature gene expression, calculated separately for each small library after splitting the sequencing results of the PHDs-seq mixed library and normalized as Log10(CPM+1); Posi: positive, KF: keloid fibroblast, D5: five days of treatment, D8: eight days of treatment; (B) Hierarchical clustering of the samples based on signature gene expression; (C) Correlation analysis of the samples based on signature gene expression.

FIG. 4. Analysis and evaluation of PHDs-seq sequencing results, part 2. (A), (B), (D) and (E) show, for the four samples KF_D5, Posi_D5, KF_D8 and Posi_D8 respectively, the expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG and UCP1) relative to the internal reference GAPDH, as detected by both PHDs-seq and qPCR and expressed as Log10FC; (C) Comparison of the relative expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG and UCP1) between the Posi_D5 and KF_D5 samples; (F) Comparison of the relative expression of selected genes (ACTB, SDHA, PPIA, THY1, PRRX1, FBN1, COL1A1, ADIPOQ, FABP4, PPARG and UCP1) between the Posi_D8 and KF_D8 samples, expressed as Log10FC; (G) Expression of the same gene detected in PHDs-seq using two different probes; (H) Consistency of all signature genes between two PHDs-seq replicate samples, expressed as Log(CPM+1).

FIG. 5. BMP can increase the efficiency of inducing KF cells into adipocytes. (A) Upper panel: flow chart of the induction of keloid fibroblasts into adipocytes; lower panel: oil red O staining of adipocytes, scale bar 100 μm; (B) Quantitative analysis of the adipocyte marker genes ADIPOQ and FABP4 in samples from panel A collected on day 12; (C) Statistics of the adipocyte induction efficiency in panel A; (D) Effect of different concentrations of BMP4 on adipocyte induction, with the number of adipocytes counted per well; (E) Treatment with the BMP signaling pathway small-molecule inhibitors dorsomorphin and DMH1, with the number of adipocytes counted per well (24-well plate); (F) Phenotype of the cells in panel E.

FIG. 6. Use of PHDs-seq to screen for small molecules that increase adipocyte induction efficiency. (A) Flow chart of PHDs-seq screening for small molecules that increase adipocyte induction efficiency; (B) PHDs-seq sequencing heat map showing the expression level of each signature gene after 8 days of small molecule treatment, Log10(CPM+1); (C) PCA of the treated samples, with candidate small molecules labeled in red and KF in blue.

FIG. 7. Optimization relative to the TAC-seq system: hybridization and ligation are performed as a one-step reaction.

FIG. 8. Optimization relative to the TAC-seq system: the probe concentration was changed.
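
The Log10(CPM+1) normalization named in the descriptions of FIG. 3, FIG. 4 and FIG. 6 can be illustrated with a minimal Python sketch. It assumes that CPM is computed over the counts of the targeted gene panel within one sample; the function name and the example counts are hypothetical and are not taken from the invention.

```python
import math

def log10_cpm(counts):
    """Return {gene: log10(CPM + 1)} for the counts of one sample.

    CPM (counts per million) scales each gene count by the sample total,
    so samples sequenced to different depths become comparable.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {
        gene: math.log10(n / total * 1_000_000 + 1)
        for gene, n in counts.items()
    }

# Hypothetical counts for a single well (illustrative numbers only).
example_counts = {"GAPDH": 500, "FABP4": 300, "ADIPOQ": 200, "UCP1": 0}
normalized = log10_cpm(example_counts)
```

Adding 1 before taking the logarithm keeps genes with zero counts defined, so an undetected gene maps to 0 rather than to negative infinity.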

Detailed Description

In one aspect, the invention provides a method of high throughput transcriptional profiling library construction based on probe hybridization, the method comprising:

(a) Providing at least one cell-containing biological sample in at least one multiwell plate, each of the at least one cell-containing biological sample being located in a separate well;

(b) Lysing cells in the biological sample in wells of the multiwell plate;

(c) Transferring the cell lysis supernatant obtained in step (b) to a corresponding well of another multi-well plate, and performing a reverse transcription reaction to obtain cDNA;

(d) Adding to each well a hybridization-ligation mixture comprising a DNA ligase and at least one set of probe pairs that specifically hybridize to a target region of at least one gene to be detected, wherein the probe pairs comprise a left probe that hybridizes to an upstream portion of the target region and a right probe that hybridizes to a downstream portion of the target region;

(e) Hybridizing the at least one set of probe pairs to a target region of the at least one gene to be detected, and interconnecting left and right probes of the probe pairs hybridized to the target region;

(f) Enriching the ligation product;

(g) Adding to each well a barcoding PCR mixture comprising a DNA polymerase and a barcode primer pair comprising a first primer for the left probe and a second primer for the right probe, one of the first and second primers comprising a well barcode sequence unique to each well and the other comprising a plate barcode sequence unique to each multiwell plate;

(h) Amplifying a target region of the at least one gene to be detected by PCR using the barcode primer pair; and

(i) Harvesting and mixing the amplified products in at least one well of said at least one multiwell plate, and optionally purifying,

thus, libraries useful for high throughput transcriptional profiling are obtained.

In some embodiments, the multi-well plate is a 96-well plate or 384-well plate, preferably a 96-well plate.

In some embodiments, the at least one biological sample comprising cells may be 1-200 or more, for example, at least 2, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200, or more biological samples comprising cells.

In some embodiments, the at least one biological sample comprising cells is a biological sample each comprising a different cell type. In some embodiments, the at least one biological sample comprising cells comprises biological samples of the same cell type, but each biological sample is subjected to a different treatment, e.g., to a different compound. In some embodiments, the treatment is capable of causing a particular phenotype of the cell.

The cells described herein may be any type of cell of interest. The cells may be somatic cells, germ cells, stem cells (e.g., embryonic stem cells or induced pluripotent stem cells). Such cells include, but are not limited to, neuronal cells, skeletal muscle cells, liver cells, fibroblasts, osteoblasts, chondrocytes, adipocytes, endothelial cells, interstitial cells, smooth muscle cells, cardiomyocytes, neural cells, hematopoietic cells, islet cells, or virtually any cell in the body including tumor cells. In some embodiments, the cell is a fibroblast. The fibroblasts include, but are not limited to keloid fibroblasts (Keloid fibroblast), skin fibroblasts, cardiac fibroblasts.

The cells may be derived from a mammal or a non-mammal. In some embodiments, the cell is derived from a human. In some embodiments, the cell is derived from a non-human mammal. In some embodiments, the cells are derived from a rodent, such as a mouse or a rat, or from a non-human primate.

In some embodiments, the cells are lysed in step (b) using a cell lysis solution based on a non-ionic surfactant. In some embodiments, the non-ionic surfactant is Triton X-100. In some embodiments, the cell lysis solution consists of Tris-HCl, KCl, a polysucrose such as Ficoll PM-400, Triton X-100, a ribonuclease inhibitor, and water. In some embodiments, the final concentrations of the components of the cell lysis solution are: about 5mM to about 10mM, about 5mM to about 50mM, about 5mM to about 100mM, about 5mM to about 150mM, about 5mM to about 200mM, about 5mM to about 250mM, or about 5mM to about 500mM Tris-HCl; about 7.5mM to about 15mM, about 7.5mM to about 30mM, about 7.5mM to about 60mM, about 7.5mM to about 120mM, about 7.5mM to about 300mM, about 7.5mM to about 500mM, or about 7.5mM to about 750mM KCl; about 0.6% to about 5%, about 0.6% to about 10%, about 0.6% to about 20%, about 0.6% to about 30%, about 0.6% to about 40%, about 0.6% to about 50%, or about 0.6% to about 60% polysucrose such as Ficoll PM-400; about 0.015% to about 0.15%, about 0.015% to about 0.25%, about 0.015% to about 0.5%, about 0.015% to about 0.75%, about 0.015% to about 1%, about 0.015% to about 1.25%, or about 0.015% to about 1.5% Triton X-100; and about 0.05U/μL to about 0.1U/μL, about 0.05U/μL to about 0.25U/μL, about 0.05U/μL to about 0.5U/μL, about 0.05U/μL to about 1U/μL, about 0.05U/μL to about 2.5U/μL, or about 0.05U/μL to about 5U/μL ribonuclease inhibitor.

The inventors have surprisingly found that when a mild cell lysis solution based on a non-ionic surfactant, in particular Triton X-100, is used, the lysate supernatant can be used directly for the subsequent reverse transcription after the cells are lysed, without further purification steps. Thus, in some embodiments, the cell lysis supernatant transferred in step (c) is not further purified.

In the method of the present invention, the reverse transcription reaction in the step (c) may be performed using a reverse transcription reaction system conventional in the art (e.g., commercially available). For example, the reverse transcription reaction may be performed using an Oligo-dT system. In some embodiments, the reverse transcription reaction is performed at about 40-45 ℃, e.g., 42 ℃. In some embodiments, the reverse transcription reaction is performed for about 15-60 minutes, for example about 30 minutes.

In the method of the invention, the DNA ligase in step (d) may be a DNA ligase conventional in the art, such as T4 DNA ligase or Taq DNA ligase, preferably Taq DNA ligase. The hybridization-ligation mixture may comprise a buffer compatible with the DNA ligase.

In the methods of the invention, the hybridization-ligation mixture may comprise at least one set of probe pairs that specifically hybridize to at least one target region of at least one gene to be detected. The number of probe pairs depends on the number of genes/target regions to be detected. For example, the hybridization-ligation mixture can comprise 1-200 or more, e.g., at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200, or more probe pairs. The probe pairs can be used to detect 1-200 or more, e.g., at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200, or more genes/target regions to be detected.

In some embodiments of the methods of the invention, the gene to be detected is associated with at least one phenotype of the cell. For example, where the methods of the invention are used to screen for a treatment that results in a particular phenotype of a cell (e.g., treatment with a particular compound), some or all of the at least one gene to be detected may be used as a marker for that phenotype. For example, the expression profile of part or all of the at least one gene to be detected may be used as a marker for the phenotype. The phenotype may be, for example, inhibition or increase in cell proliferation, a change in cell type, etc. One skilled in the art can determine the specific genes and numbers to be detected based on the specific cell phenotype.

In some embodiments, the cell is a keloid fibroblast, the phenotype is reprogramming from a keloid fibroblast to an adipocyte, and the gene to be detected is one or more or all selected from the group consisting of: PRRX1, THY1, ACTA2, FBN1, COL1A1, COL3A1, MMP1, TIMP1, FABP4, ADIPOQ, EBF2, CEBPA, ZNF423, ZNF516, ATF2, LEP, PPARG, PPARGC1A, FNDC5, PRDM16, UCP1, INSR, and SLC2A4. In some embodiments, the genes to be detected further include internal reference genes, such as one or more or all selected from ACTB, CYC, GAPDH, HMBS, PPIA, SDHA, TBP, and YWHAZ.

In some embodiments, the target region is a region characteristic of the gene to be detected, i.e., specific for the gene to be detected. In some embodiments, the target region can be about 20 nucleotides (nt) to about 300nt or more in length, such as about 20nt, about 30nt, about 40nt, about 50nt, about 60nt, about 70nt, about 80nt, about 90nt, about 100nt, about 120nt, about 140nt, about 160nt, about 180nt, about 200nt, about 250nt, about 300nt or more.

In some embodiments, the left probe comprises, in the 5' to 3' direction, a 5' primer binding sequence, a single molecule tag (unique molecular identifier, UMI), and a first target region binding sequence. In some embodiments, the right probe comprises, in the 5' to 3' direction, a second target region binding sequence, a single molecule tag (UMI), and a 3' primer binding sequence. In some embodiments, the right probe carries a phosphate group at its 5' end, whereby it can be ligated to the 3' end of the left probe. In some embodiments, the first target region binding sequence and the second target region binding sequence, after ligation, perfectly match the target region of the gene to be detected.
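
The element order of the two probes can be made concrete with a minimal Python sketch. All sequences below are made-up placeholders, the UMI is written as a run of degenerate `N` bases, and the names `ProbePair`, `build_probe_pair` and `ligated_product` are hypothetical; the sketch only shows how the parts are concatenated and does not address strand orientation relative to the cDNA.

```python
from dataclasses import dataclass

DNA_ALPHABET = set("ACGTN")

@dataclass(frozen=True)
class ProbePair:
    left: str   # 5' primer binding sequence + UMI + first target region binding sequence
    right: str  # second target region binding sequence + UMI + 3' primer binding sequence

def build_probe_pair(primer5, target_left, target_right, primer3, umi_len=4):
    """Concatenate probe elements in 5'->3' order; the UMI is written as N bases."""
    for seq in (primer5, target_left, target_right, primer3):
        if not seq or set(seq) - DNA_ALPHABET:
            raise ValueError(f"invalid sequence: {seq!r}")
    umi = "N" * umi_len
    return ProbePair(
        left=primer5 + umi + target_left,
        right=target_right + umi + primer3,  # carries a 5' phosphate in practice
    )

def ligated_product(pair):
    """Joining the 3' end of the left probe to the 5' end of the right probe."""
    return pair.left + pair.right

# Placeholder sequences, for illustration only.
pair = build_probe_pair("AAAACCCC", "GATTACAGAT", "TCCAGGTACA", "GGGGTTTT")
product = ligated_product(pair)
```

In the ligated product the two target region binding sequences sit next to each other, flanked on each side by a UMI and a primer binding sequence, which is what allows a single primer pair to amplify every ligated probe pair.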

In some embodiments, the length of the first or second target region binding sequence is about 10nt to about 150nt or longer, e.g., about 10nt, about 15nt, about 20nt, about 25nt, about 30nt, about 35nt, about 40nt, about 45nt, about 50nt, about 60nt, about 70nt, about 80nt, about 90nt, about 100nt, about 125nt, about 150nt or longer, provided that it allows the probe to specifically hybridize to the target region.

The length of the single molecule tag (UMI) may be about 3nt-8nt, for example 4nt. It is known in the art how to design and generate single molecule tags. Single molecule tags allow the identification of amplified products from a single transcript in sequencing.
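
How a single molecule tag separates original molecules from amplification duplicates can be sketched as follows. This is a simplified illustration under stated assumptions: each read has already been reduced to a hypothetical `(gene, left_umi, right_umi)` triple, UMIs are compared by exact match, and sequencing errors in the UMI are ignored.

```python
from collections import defaultdict

def umi_space(length, n_umis=1):
    """Number of distinct tags available with n_umis UMIs of the given length."""
    return 4 ** (length * n_umis)

def count_molecules(reads):
    """Collapse reads that share gene and UMI pair into one original molecule.

    reads: iterable of (gene, left_umi, right_umi) tuples.
    Returns {gene: number of distinct UMI pairs observed}.
    """
    seen = defaultdict(set)
    for gene, left_umi, right_umi in reads:
        seen[gene].add((left_umi, right_umi))
    return {gene: len(umis) for gene, umis in seen.items()}

# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("FABP4", "ACGT", "TTAG"),
    ("FABP4", "ACGT", "TTAG"),
    ("FABP4", "ACGT", "CCAG"),
    ("GAPDH", "GGGG", "AAAA"),
]
molecule_counts = count_molecules(reads)
```

With a 4nt tag there are `umi_space(4)` = 256 possible tags, and a pair of 4nt tags gives `umi_space(4, 2)` = 65536 combinations per target region.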

In some embodiments, the 5' primer binding sequences are universal primer binding sequences, e.g., the 5' primer binding sequences in different probe pairs are identical. In some embodiments, the 3' primer binding sequences are universal primer binding sequences, e.g., the 3' primer binding sequences in different probe pairs are identical.

In some embodiments, the probes are each at a concentration of about 0.0001 μM to about 1 μM, for example, about 0.0001 μM to about 0.001 μM, about 0.0001 μM to about 0.01 μM, or about 0.0001 μM to about 0.1 μM. In some embodiments, the probes each have a concentration of no more than about 0.1 μM, preferably no more than about 0.01 μM, and more preferably no more than about 0.001 μM.

The inventors have surprisingly found that, by the method of the invention, a probe concentration as low as about 0.0001 μM can be used to obtain the desired amplification efficiency in the subsequent amplification step while significantly increasing the specificity of the amplification.

In some embodiments, the steps of "hybridizing the at least one set of probe pairs to the target region of the at least one gene to be detected" and "interconnecting the left and right probes of the probe pairs hybridized to the target region" in step (e) are performed under the same solution system. In some embodiments, the steps of "hybridizing the at least one set of probe pairs to the target region of the at least one gene to be detected" and "ligating the left and right probes of the probe pairs hybridized to the target region" in step (e) are performed simultaneously.

The inventors have surprisingly found that by simultaneously performing the steps of "hybridizing the at least one set of probe pairs to the target region of the at least one gene to be detected" and "interconnecting the left and right probes of the probe pairs hybridized to the target region" in step (e) under the same solution system, a higher amplification efficiency (more amplification products) can be obtained in the subsequent steps.

In some embodiments, step (e) comprises incubating the at least one multiwell plate at about 50 to about 70 ℃, e.g., about 60 ℃. In some embodiments, step (e) comprises incubating the at least one multiwell plate for about 30-120 minutes or more, e.g., for at least 30 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes or more.

In some embodiments, the ligation product is enriched in step (f) by magnetic beads. The magnetic beads are, for example, Dynabeads MyOne Carboxylic Acid beads.

In some embodiments, the first primer of the barcode primer pair comprises a primer region sequence corresponding to the 5' primer binding sequence of the left probe and the second primer of the barcode primer pair comprises a primer region sequence corresponding to the 3' primer binding sequence of the right probe. By using the barcode primer pair, a target region of a gene to be detected, together with the single molecule tags, can be amplified using the ligation product of the probe pair as a template.

In some embodiments, the first primer comprises a well barcode sequence that is unique to each well, and the second primer comprises a barcode sequence that is unique to each multiwell plate. In some embodiments, the second primer comprises a well barcode sequence that is unique to each well, and the first primer comprises a barcode sequence that is unique to each multiwell plate.

A well barcode sequence being unique to each well means that the well barcode sequence of the primer added to one well differs from that of the primer added to any other well. A plate barcode sequence being unique to each multiwell plate means that the plate barcode sequence of the primers added to one plate differs from that of the primers added to any other plate, whereas the plate barcode sequences of the primers added to different wells of the same multiwell plate are identical. By combining plate and well barcodes, the sequence information or gene expression information of a given well in a given multiwell plate can be obtained by final sequencing.
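
Assigning a sequenced read to its sample from the barcode combination can be sketched as a pair of table lookups. The barcode sequences, plate names and well names below are hypothetical placeholders, the extraction of barcodes from the raw reads is assumed to have been done already, and matching is exact with no error correction.

```python
# Hypothetical barcode tables; real barcode sequences are not given in the text.
PLATE_BARCODES = {"ACGTAC": "plate_1", "TGCATG": "plate_2"}          # 6 nt each
WELL_BARCODES = {"AACCGGT": "A1", "TTGGCCA": "A2", "GATCGAT": "A3"}  # 7 nt each

def assign_sample(plate_bc, well_bc):
    """Return (plate, well) for an exact match, or None if either barcode is unknown."""
    plate = PLATE_BARCODES.get(plate_bc)
    well = WELL_BARCODES.get(well_bc)
    if plate is None or well is None:
        return None
    return plate, well

def demultiplex(reads):
    """Group read payloads by (plate, well).

    reads: iterable of (plate_barcode, well_barcode, payload) tuples.
    Returns (dict of sample -> payload list, number of unassigned reads).
    """
    by_sample = {}
    unassigned = 0
    for plate_bc, well_bc, payload in reads:
        sample = assign_sample(plate_bc, well_bc)
        if sample is None:
            unassigned += 1
            continue
        by_sample.setdefault(sample, []).append(payload)
    return by_sample, unassigned

example_reads = [
    ("ACGTAC", "AACCGGT", "FABP4"),
    ("ACGTAC", "AACCGGT", "GAPDH"),
    ("TGCATG", "AACCGGT", "FABP4"),   # same well barcode, different plate
    ("ACGTAC", "NNNNNNN", "UCP1"),    # unknown well barcode
]
by_sample, unassigned = demultiplex(example_reads)
```

The third example read shows why both barcodes are needed: well A1 exists on every plate, and only the plate barcode tells the two A1 samples apart.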

The well barcode sequence or the plate barcode sequence may be about 4nt-10nt in length, e.g., 4nt, 5nt, 6nt, 7nt, 8nt, 9nt, or 10nt. In some embodiments, the well barcode sequence is 7nt in length. In some embodiments, the plate barcode sequence is 6nt in length.

By introducing a well barcode sequence and a plate barcode sequence at the two ends, the resulting library can be subjected to paired-end sequencing, the number of samples that can be contained in the library is significantly increased, and the cost of synthesizing the amplification primers is reduced.
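
The effect of pairing a well barcode with a plate barcode can be illustrated with a short count. The sketch assumes one barcoded primer per well position and one per plate; the comparison figure for a single-barcode scheme (one uniquely barcoded primer per sample) is an illustrative assumption, and the function name is hypothetical.

```python
def barcode_budget(n_plates, wells_per_plate=96):
    """Compare samples addressed with the number of barcoded primers needed."""
    samples = n_plates * wells_per_plate
    return {
        "samples": samples,
        # one well-barcode primer per well position plus one plate-barcode primer per plate
        "dual_barcode_primers": wells_per_plate + n_plates,
        # assumption: a single-barcode scheme needs one unique primer per sample
        "single_barcode_primers": samples,
    }

budget = barcode_budget(n_plates=10)
```

For ten 96-well plates this gives 960 addressable samples from 106 barcoded primers, because the number of samples grows with the product of the two barcode sets while the number of primers grows only with their sum.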

In some embodiments, the first primer and/or the second primer may further comprise a linker sequence for high throughput sequencing, such as a P5 linker sequence or a P7 linker sequence.

In some embodiments, the amplification in step (h) is performed by conventional PCR methods. In some embodiments, PCR amplification is performed using the following program: 94 ℃ for 5 min; 2 cycles of 94 ℃ for 30 s, 57 ℃ for 30 s, and 72 ℃ for 20 s; 20 cycles of 94 ℃ for 30 s, 65 ℃ for 30 s, and 72 ℃ for 20 s; 72 ℃ for 60 s.
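
The two-stage PCR program can be written down as a small data structure, which makes the cycle count and total hold time easy to check. This is only a bookkeeping sketch: the final step is read as 72 ℃ for 60 s, ramping time between temperatures is ignored, and the names `PROGRAM`, `total_cycles` and `hold_time_seconds` are hypothetical.

```python
# Each stage: (number of cycles, [(temperature in deg C, hold time in seconds), ...])
PROGRAM = [
    (1, [(94, 300)]),                      # initial denaturation, 5 min
    (2, [(94, 30), (57, 30), (72, 20)]),   # 2 cycles at the lower annealing temperature
    (20, [(94, 30), (65, 30), (72, 20)]),  # 20 cycles at the higher annealing temperature
    (1, [(72, 60)]),                       # final extension
]

def total_cycles(program):
    """Count amplification cycles, i.e. stages with more than one step."""
    return sum(cycles for cycles, steps in program if len(steps) > 1)

def hold_time_seconds(program):
    """Sum of all programmed hold times, excluding ramping between steps."""
    return sum(cycles * sum(seconds for _, seconds in steps) for cycles, steps in program)

cycles = total_cycles(PROGRAM)
minutes = hold_time_seconds(PROGRAM) / 60
```

Under these assumptions the program amounts to 22 amplification cycles and 2120 s of programmed hold time, a little over 35 minutes before ramping is added.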

In some embodiments, the amplification products in all wells of all multiwell plates are harvested and mixed in step (i).

The amplified products harvested and mixed in step (i) may be purified using methods known in the art, for example using commercially available kits. In some embodiments, the harvested and mixed amplification product is purified using a DNA Clean & Concentrator-100 kit (ZYMO, D4029).

Libraries obtained by the methods of the invention can be used to perform high-throughput sequencing, also known as second generation sequencing ("NGS"). Second generation sequencing produces thousands to millions of sequences simultaneously in a parallel sequencing process. NGS is distinguished from "Sanger sequencing" (first generation sequencing), which is based on the electrophoretic separation of chain termination products in a single sequencing reaction. NGS platforms that can be used in the present invention are commercially available, including but not limited to Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system. The high-throughput sequencing makes it possible to obtain the expression profile of each gene to be detected in the cells of each biological sample.

The high-throughput transcription profiling sequencing library construction method is particularly suitable for high-throughput and low-cost drug screening, such as small molecule drug screening.

Accordingly, in another aspect, the present invention provides a high throughput drug screening method comprising:

(1) Culturing cells in at least one well of at least one multiwell plate;

(2) Performing different treatments on cells in different wells, for example adding different drug candidates;

(3) Constructing a transcription profiling sequencing library by the high-throughput transcription profiling sequencing library construction method;

(4) High throughput sequencing of the library; and

(5) Identifying candidate drugs according to the high-throughput sequencing results.

Such candidate agents include, but are not limited to, small molecule compounds, antibodies, polypeptides, nucleic acid molecules. In some embodiments, the drug is a small molecule compound.

In some embodiments, the candidate agent is identified based on the expression profile of the gene to be detected in the high throughput sequencing results.
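
The text does not specify how an expression profile is converted into a ranking of candidate agents. Purely as an illustration, the following Python sketch ranks treatments by the mean change in normalized expression of chosen marker genes relative to an untreated control. The scoring rule, the function names, the treatment names and all numbers are assumptions made for the example, not the method of the invention.

```python
def marker_score(treated, control, markers):
    """Mean difference in normalized expression across the marker genes."""
    return sum(treated[g] - control[g] for g in markers) / len(markers)

def rank_treatments(profiles, control_name, markers):
    """Rank treatments by marker score, highest first.

    profiles: {treatment name: {gene: normalized expression, e.g. log10(CPM+1)}}
    """
    control = profiles[control_name]
    scores = {
        name: marker_score(profile, control, markers)
        for name, profile in profiles.items()
        if name != control_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical normalized expression values for two adipocyte marker genes.
profiles = {
    "KF_control": {"FABP4": 1.0, "ADIPOQ": 0.5},
    "compound_A": {"FABP4": 3.0, "ADIPOQ": 2.5},
    "compound_B": {"FABP4": 1.5, "ADIPOQ": 0.5},
}
ranking = rank_treatments(profiles, "KF_control", ["FABP4", "ADIPOQ"])
```

In this toy example compound_A ranks first because both marker genes rise relative to the control; a real screen would typically also consider replicate agreement and the behavior of the internal reference genes.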

Examples

The following examples are only for better illustration of the present invention and are not intended to limit the scope of the present invention.

Experimental materials and methods

1. Isolation of human Keloid Fibroblasts (KF)

1) Preparing surgical instruments (rotor, curved scissors and curved forceps);

2) Preparing a digestive enzyme solution and a KF cell culture broth according to tables 1 and 2;

3) Soaking the skin sample thoroughly in 70% ethanol for 1 minute;

4) Washing once with pre-chilled wash buffer (PBS+PS);

5) In an empty 10 cm dish, cutting off and discarding the subcutaneous adipose tissue;

6) Mincing the remaining tissue as finely as possible;

7) Transferring the minced tissue into a 15 ml tube containing 10 ml of pre-chilled wash buffer, and shaking to mix;

8) Centrifuging at 1,500 rpm for 3 min, and carefully removing the supernatant;

9) Resuspending the pellet in 10 ml of digestive enzyme solution and transferring it completely to a new 50 ml tube;

10) Incubating in an incubator or water bath at 37 ℃ for 3 hours or overnight, until the tissue pieces are visibly loosened;

11) Diluting the digestive enzyme solution with 10 ml of pre-chilled KF culture medium, transferring it completely to a new disposable conical flask in which a sterilized magnetic stir bar has been placed in advance, and supplementing with 1 μl of Y-27632;

12) Stirring the conical flask on a magnetic stirrer for 1-2 hours until no obvious tissue pieces remain;

13) Pipetting the cell suspension thoroughly with a 10 ml pipette, and filtering it through a 70 μm strainer;

14) Centrifuging at 1,600 rpm for 8 minutes, and removing the supernatant;

15) Resuspending the cells in 1 ml of KF culture medium and transferring them to a new 15 ml tube;

16) Adding 3 volumes (3 ml) of red blood cell lysis buffer and mixing by gentle vortexing;

17) Placing on ice for 15 minutes, mixing twice by gentle vortexing during this time;

18) Centrifuging at 1,600 rpm for 8 minutes, and removing the supernatant;

19) Resuspending in 10 ml of KF culture medium, transferring to a 10 cm dish, and placing in the incubator;

20) Changing the medium once every 2-3 days; cells can be collected after one week.

Table 1 digestive enzyme solution formulation

Table 2 KF cell culture Medium formulation

2. Flow cell sorting (FACS)

The isolated primary KF cells were centrifuged and resuspended in 0.5% BSA, stained on ice for 20 min with the corresponding antibody, washed with PBS by centrifugation, resuspended in 0.5% BSA, and sorted on a FACSAria II instrument. Positive cells were collected in KF medium and plated on 10 cm dishes for culture.

3. KF cell culture and passage

KF cells were cultured in KF medium in a 5% CO2 incubator at 37 ℃ and passaged every 2-3 days. At the time of passage, the cells were washed once with PBS preheated to 37 ℃, digested with 1 ml of 0.25% trypsin for 1 minute, neutralized with 2 ml of KF medium, centrifuged at 1000 rpm for 3 minutes, replated on 10 cm dishes at a ratio of one dish to two dishes, and returned to the incubator for culture.

4. Adipocyte induction

The day before adipocyte induction, the KF cells are digested as for passaging and plated at 50,000 cells per well of a 12-well plate, 30,000 cells per well of a 24-well plate, or 10,000 cells per well of a 48-well plate. The next day, differentiation is started with AD medium or with medium supplemented with the corresponding small molecules, and on the fourth and eighth days the medium is uniformly replaced with AM medium.

Table 3 AD Medium formulation

Table 4 AM Medium formulation

5. Nile red staining

Before the experiment starts, a 2 mg/ml stock of Nile red in DMSO is taken out of 4 ℃ storage and thawed, and the Nile red is diluted with PBS buffer to a working concentration of 1 μg/ml (1 to 2 ml of PBS is taken, and Hoechst is added and diluted together). After thorough mixing, the mixture is placed in an ice bath and labeled as the staining working solution. The following operations are then performed.

1) Turning on a fluorescence microscope;

2) Carefully aspirating the culture medium from the cell culture flask, washing once with PBS, and adding an appropriate amount of staining working solution;

3) Incubating for 10 min at 37 ℃ in an incubator, protected from light;

4) Observing the fluorescent cells under a fluorescence microscope (excitation wavelength 543 nm, emission wavelength 598 nm): positive, lipid-rich cells show strong orange fluorescence.
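
The dilution of the 2 mg/ml Nile red stock to the 1 μg/ml working concentration follows the usual relation C1·V1 = C2·V2. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical, and the small volume contributed by the stock itself is neglected.

```python
def stock_volume_ul(stock_ug_per_ml, final_ug_per_ml, final_volume_ml):
    """Volume of stock (in microliters) needed for the given working solution.

    Solves C1 * V1 = C2 * V2 for V1 and converts milliliters to microliters.
    """
    if final_ug_per_ml > stock_ug_per_ml:
        raise ValueError("working concentration cannot exceed stock concentration")
    return final_ug_per_ml * final_volume_ml * 1000 / stock_ug_per_ml

# 2 mg/ml stock = 2000 ug/ml; working concentration 1 ug/ml.
dilution_factor = 2000 / 1
volume_for_2_ml = stock_volume_ul(2000, 1, 2)
volume_for_1_ml = stock_volume_ul(2000, 1, 1)
```

The stock is therefore diluted 2000-fold, which corresponds to 1 μl of stock for 2 ml of PBS or 0.5 μl for 1 ml.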

6. Oil red O staining

1) Diluting saturated oil red O with distilled water at a ratio of 3:2, mixing well, standing at room temperature for 5-10 minutes, and filtering through a 0.45 μm filter membrane to obtain the oil red O working solution, which is set aside for use;

2) Taking cultured cells, sucking out the culture medium, and washing with PBS once;

3) Fixation with 4% paraformaldehyde for 10 min;

4) Washing with distilled water for 2 times;

5) Soaking and washing with 60% isopropanol;

6) Staining with the oil red O working solution for 10 minutes (the staining solution can be recycled);

7) Differentiating with 60% isopropanol until the background is clear (can be monitored under a microscope);

8) Washing with distilled water 2-4 times;

9) Counterstaining with hematoxylin for 3-5 minutes;

10) Washing with distilled water 1-2 times, and photographing under a microscope.

7. PHDs-seq library construction

1) Preparing the cell lysis solution (formula in Table 5);

2) Taking out the cultured cells, aspirating the culture medium, and washing with PBS preheated to 37 ℃;

3) Adding 60 μl of the lysis solution prepared according to Table 5 to each well, sealing the plate with sealing film, and placing it in a -80 ℃ freezer overnight;

4) The next day, taking the plate out of the -80 ℃ freezer and shaking it on a horizontal shaker at 900 rpm for 15-30 min;

5) Preparing the reverse transcription mixture (formula in Table 6);

6) Transferring 4.286 μl of the cell lysate obtained in the preceding steps to a new 96-well PCR plate;

7) Adding 0.714 μl of reverse transcription mixture to each well, and mixing by pipetting;

8) Performing reverse transcription in a PCR instrument at 42 ℃ for 30 minutes, followed by enzyme inactivation at 85 ℃ for 5 minutes, with the heated lid at 105 ℃;

9) Preparing the hybridization-ligation mixture (formula in Table 7);

10) Adding 6 μl of hybridization-ligation mixture to the reverse transcription product, and mixing by pipetting;

11) Incubating in a PCR instrument at 60 ℃ for 90 minutes;

12) Preparing the template enrichment mixture (formula in Table 8);

13) Adding 15 μl of template enrichment mixture to each well and mixing on a vortex mixer;

14) Incubating at room temperature for 10 minutes;

15) Placing on a DynaMag-96 Side Magnet for 3 minutes;

16) Carefully removing the supernatant with a pipette;

17) Preparing the barcoding PCR mixture;

18) Adding 19 μl of barcoding PCR mixture to each well;

19) Adding 1 μl of the respective well barcode primer to each well;

20) Mixing on a vortex mixer until the magnetic beads are evenly resuspended;

21) Performing amplification according to the following program;

22) Pooling the amplification products from the 96-well plate into a 50 ml tube (multiple plates may be pooled together);

23) Purifying and recovering the pooled sample using the DNA Clean & Concentrator-100 kit (ZYMO, D4029);

24) Adding AMPure XP beads to the purified product at a 1:1 ratio (150 μl : 150 μl);

25) Holding the centrifuge tube on a magnetic rack, gently aspirating the supernatant with a pipette tip, and discarding it;

26) Resuspending the beads in 25 μl of sterile water with pH > 6.0, and standing at room temperature for 1 minute;

27) Placing on the magnetic rack for 1 minute, and gently pipetting the supernatant (about 23 μl) into a clean 1.5 ml centrifuge tube;

28) Measuring the sample concentration using a Qubit fluorometer;

29) Submitting the sample for sequencing according to the sequencing requirements.

TABLE 5 Lysis buffer components (96-well plate)

Note: RNaseOut is added immediately before use; the other components can be prepared in advance and stored at 4 ℃.

TABLE 6 Reverse transcription mix (96-well plate)

Note: RNaseOut and Maxima are added immediately before use.

TABLE 7 Hybridization-ligation mix (96-well plate)

TABLE 8 Template enrichment mix (96-well plate)

8. RNA extraction

After the collected sample was washed once with PBS, RNA was extracted with the ER101-01 kit and finally dissolved in 50 μl of RNase-free water.

9. Reverse transcription

The reverse transcription system was added to the extracted RNA in the following proportions and mixed gently; the mixture was incubated at 42 ℃ for 30 minutes and then heated at 85 ℃ for 5 seconds to inactivate the reverse transcriptase and the gDNA Remover.

TABLE 9 reverse transcription system

10. Fluorescent quantitative PCR

The cDNA produced by reverse transcription was diluted to 200 μl per 1 μg of input RNA, and the diluted cDNA template was loaded as follows. The qPCR program was: pre-denaturation at 95 ℃ for 30 s (one cycle), followed by 40 cycles of 95 ℃ for 10 s and 95 ℃ for 30 s, and finally a melting curve.

TABLE 10 qPCR System

Example 1 high throughput transcriptional sequencing technique based on probe hybridization PHDs-seq

To enable high-throughput, low-cost screening of small-molecule drugs, the inventors developed a library construction and sequencing system, PHDs-seq (Probe Hybridization based Drug Screening by sequencing), that uses the same hybridization principle as the TAC-seq probes. It comprises six steps in total (FIG. 1): cell lysis; transfer and reverse transcription of the well-plate lysate; probe hybridization and ligation; template enrichment; library amplification with introduction of barcodes; and library pooling and purification. In the hybridization step, synthesized gene-specific left and right probes hybridize to the template cDNA. Each of the two probes carries a four-base unique molecular identifier (UMI), located at the 5' end of the left probe and the 3' end of the right probe respectively, and the 5' end of the right probe carries a phosphate group so that it can be joined to the left probe in the ligation step.
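The probe layout can be illustrated with a short sketch. It follows the 5' to 3' order given in claims 22 and 23 (primer binding sequence, UMI, target binding sequence for the left probe; target binding sequence, UMI, primer binding sequence for the right probe). All sequences, names and the `ProbePair` class are hypothetical placeholders invented for illustration; they are not the probes used by the inventors.

```python
import random
from dataclasses import dataclass

UMI_LEN = 4  # four-base UMI on each probe, as stated in the text

# Placeholder primer binding sequences, invented for illustration only.
PRIMER_5 = "ACGTACGTAC"
PRIMER_3 = "TGCATGCATG"


@dataclass
class ProbePair:
    gene: str
    left_target: str   # target binding sequence of the left probe
    right_target: str  # target binding sequence of the right probe (5' phosphate)


def random_umi(rng):
    """Draw one random UMI of UMI_LEN bases."""
    return "".join(rng.choice("ACGT") for _ in range(UMI_LEN))


def ligated_product(pair, rng):
    """Assemble the sequence of one ligated left + right probe, 5' to 3'."""
    left = PRIMER_5 + random_umi(rng) + pair.left_target
    right = pair.right_target + random_umi(rng) + PRIMER_3
    # Ligation joins the 3' end of the left probe to the
    # phosphorylated 5' end of the right probe.
    return left + right


def umi_combinations():
    """Number of distinct UMI pairs one ligated molecule can carry."""
    return 4 ** (2 * UMI_LEN)


if __name__ == "__main__":
    pair = ProbePair("GENE_X", "AAAACCCCGG", "GGTTTTAAAC")
    print(ligated_product(pair, random.Random(0)))
    print(umi_combinations())
```

With two four-base UMIs, each ligated molecule carries one of 4^8 = 65,536 possible UMI combinations.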

PHDs-seq inherits the main advantage of TAC-seq, namely high sensitivity, and has further advantages. First, the procedure is simpler: it is no longer necessary to extract RNA from each sample separately with a kit or Trizol; instead, cells are lysed with a mild lysis buffer and the lysate is reverse transcribed directly into cDNA. In addition, whereas hybridization and ligation are performed as separate, sequential steps in TAC-seq, PHDs-seq optimizes the two reactions so that they proceed simultaneously in one solution system. Second, the cost is lower: including all reagents and consumables for library construction and sequencing, the total cost is about 8 yuan (RMB) per sample. Third, the inventors added a sequence at the P5 end of the TAC-seq library to introduce a well barcode, upgrading the single-end sequencing of TAC-seq to paired-end sequencing. This removes the limit that the number of plate barcode types places on the number of samples in TAC-seq screening, greatly increases screening throughput, and reduces the cost of synthesizing amplification primers. In terms of time, libraries for two 96-well plates (i.e., 192 samples) can be constructed simultaneously within 8 hours. Notably, PHDs-seq libraries can be sequenced in the same lane as ordinary bulk RNA-seq libraries, and multiple 96-well plates can also be pooled and sequenced as a single sample, which greatly increases the flexibility of screening.
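The throughput and cost figures above reduce to simple arithmetic, sketched below. Combining a well barcode with a plate barcode means the number of addressable samples is the product of the two barcode counts. The function names and the example of ten plate barcodes are assumptions for illustration; the per-sample cost of about 8 yuan and the 96-well format come from the text.

```python
def addressable_samples(n_well_barcodes, n_plate_barcodes):
    """Samples that can be told apart when each read carries both barcodes."""
    return n_well_barcodes * n_plate_barcodes


def samples_in_run(n_plates, wells_per_plate=96):
    """Number of samples processed when every well holds one sample."""
    return n_plates * wells_per_plate


def estimated_cost_yuan(n_samples, cost_per_sample=8.0):
    """Approximate total cost, using the per-sample figure quoted in the text."""
    return n_samples * cost_per_sample


if __name__ == "__main__":
    n = samples_in_run(2)  # two 96-well plates
    print(n, estimated_cost_yuan(n))
    # Hypothetical example: 96 well barcodes combined with 10 plate barcodes.
    print(addressable_samples(96, 10))
```

Two 96-well plates give 2 × 96 = 192 samples, for an estimated total of about 1,536 yuan at 8 yuan per sample.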

Example 2 Using the system of reprogramming human keloid fibroblasts into adipocytes to aid PHDs-seq development

To test the performance of PHDs-seq, a keloid disease model was selected, in which the aim is to reprogram keloid fibroblasts (KF) into adipocytes to achieve a therapeutic effect (FIG. 2A). First, patient-derived keloid fibroblasts were isolated using the surface antibody CD90 (FIG. 2B); they could be cultured in vitro in petri dishes and showed typical fibroblast morphology (FIG. 2C). Using the reported induction medium (DMEM + 1% ITS + 0.5 mM isobutylmethylxanthine + 0.1 μM cortisol + 1 μM dexamethasone + 0.2 nM triiodothyronine + 1 μM rosiglitazone, AD medium for short) and maintenance medium (DMEM + 1% ITS + 0.1 μM cortisol + 0.2 nM triiodothyronine) ¹, KF were successfully reprogrammed into adipocytes, which was further confirmed by Nile red staining (FIG. 2D).

A literature review further confirmed that the system of reprogramming KF cells into adipocytes is well suited to assessing PHDs-seq, because adipocyte marker genes are significantly up-regulated after reprogramming and some transcription factors with relatively low expression are also up-regulated, which can also be reflected later when candidate molecules are validated. Therefore, a set of characteristic genes representing different directions was first collected (FIG. 2E). For fibrosis: PRRX1 and THY1 are marker genes of fibroblasts; ACTA2 is a marker gene of activated fibroblasts; FBN1, COL1A1 and COL3A1 are collagens highly expressed by fibrotic cells; MMP1 is an important collagen-degrading enzyme; and TIMP1 is a protein that inhibits MMP1 activity. For adipocytes: the common adipocyte marker genes FABP4, ADIPOQ, EBF2, CEBPA, ZNF423, ZNF516 and ATF2; the white fat marker gene LEP; the brown fat marker genes PPARG, PPARGC1A, FNDC5, PRDM16 and UCP1; and the adipocyte functional marker genes INSR and SLC2A4, where INSR is the receptor through which adipocytes respond to insulin and SLC2A4 is an insulin-responsive glucose transporter. In addition, eight reference genes expressed at three levels (high, medium and low), ACTB, CYC, GAPDH, HMBS, PPIA, SDHA, TBP and YWHAZ ², and ten ERCC probes at high, medium and low abundance were selected. Among these genes, two probe pairs were designed for each of PRDM16, UCP1, EBF2, ADIPOQ, COL1A1, ACTB, GAPDH and THY1, and used to test whether gene detection by PHDs-seq is accurate.

Using these probes, the system of inducing KF cells into adipocytes was tested. Induced and non-induced samples were collected for library construction on the fifth and eighth days of adipogenic induction. On the fifth day the induced group (Posi_D5) had not yet formed lipid droplets, so successful induction could not be confirmed morphologically, whereas on the eighth day the induced group (Posi_D8) had formed a small number of lipid droplets. The two time points were chosen to examine whether PHDs-seq has an advantage over conventional morphological screening, i.e., whether samples can be distinguished by the expression of multiple genes before morphological changes become apparent. PHDs-seq libraries were constructed for these eight samples and for their mixture using the probes selected above. The library size was approximately 208 bp; sample Posi_D5_2 failed because of a library construction error and showed no band (FIG. 2F). The mixed sample was sent for sequencing, and the quality control results confirmed the expected size (FIG. 2G). The sequencing results show that the reads produced have high base quality at all positions and fully meet the requirements for analysis (FIG. 2H).

The sequencing data of the mixed sample were then aligned and demultiplexed, and gene expression was quantified. FABP4 and ADIPOQ were highly expressed in the positive group (Posi) compared with the control group (KF), and PPARG, PPARGC1A and CEBPA showed similar trends, whereas the D5 and D8 samples did not differ significantly (FIG. 3A). Cluster analysis of the samples using these characteristic genes showed that the adipogenic induction group and the non-induced group were clearly distinguishable (FIG. 3B), consistent with the heat map clustering (FIG. 3A). Correlation analysis also showed that the correlation within the treatment group was very high, whereas the correlation between the treatment group and the control group was low (FIG. 3C), so the two groups were again well separated. These results demonstrate that PHDs-seq can detect the switch of KF cells toward an adipocyte fate at an early stage, which is an advantage over conventional morphological screening.
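The text does not describe the analysis pipeline in detail. As a rough sketch of what demultiplexing and quantification could look like, the code below groups reads by plate and well barcode and counts distinct UMIs per gene, which is the usual way UMI data are collapsed. It assumes the reads have already been parsed into (plate barcode, well barcode, gene, UMI) tuples; that record format, the function name and the example reads are all hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct UMIs per gene for every (plate barcode, well barcode).

    `reads` is an iterable of (plate_bc, well_bc, gene, umi) tuples.
    Reads sharing sample, gene and UMI are treated as PCR duplicates.
    """
    seen = defaultdict(set)
    for plate_bc, well_bc, gene, umi in reads:
        seen[(plate_bc, well_bc, gene)].add(umi)

    counts = {}
    for (plate_bc, well_bc, gene), umis in seen.items():
        counts.setdefault((plate_bc, well_bc), {})[gene] = len(umis)
    return counts


# Invented example reads, for illustration only.
example_reads = [
    ("P1", "W01", "FABP4", "ACGTTGCA"),
    ("P1", "W01", "FABP4", "ACGTTGCA"),  # duplicate of the read above
    ("P1", "W01", "FABP4", "GGATCCAT"),
    ("P1", "W01", "GAPDH", "TTTTAAAA"),
    ("P1", "W02", "FABP4", "ACGTTGCA"),
]

if __name__ == "__main__":
    for sample, genes in count_unique_molecules(example_reads).items():
        print(sample, genes)
```

The resulting per-sample gene counts are the kind of table from which heat maps, clustering and correlation analyses such as those in FIG. 3 can be computed.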

To further evaluate the accuracy of PHDs-seq, a portion of each sample was retained at collection, RNA was extracted, and gene expression was quantified by conventional reverse transcription and qPCR. The expression of each detected gene relative to the reference gene GAPDH in each sample was then compared between qPCR and PHDs-seq. The two methods agreed well, except that some lowly expressed genes fluctuated markedly: for example, FABP4 and PPARG differed markedly in the KF samples at the two time points, and FBN1 and PRRX1 differed markedly in the Posi samples at the two time points (FIG. 4A, FIG. 4B, FIG. 4D, FIG. 4E). Further, fold-change analysis of gene expression between the treated and untreated groups at the two time points showed that qPCR and PHDs-seq also agreed well (FIG. 4C, FIG. 4F). When designing the probes, two probe pairs were designed for some of the genes to verify the accuracy of the PHDs-seq method; analysis of the PHDs-seq expression levels of these genes showed little difference between the two probe pairs (FIG. 4G). The results of FIG. 3 already suggested that the reproducibility of the two parallel samples was high; linear fitting of several pairs of parallel samples further confirmed the high reproducibility of parallel wells (FIG. 4H).
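The comparison relative to GAPDH can be sketched as follows. The text does not state the formulas used, so this assumes the common 2^(-ΔCt) convention for qPCR and a simple count ratio for PHDs-seq; the function names and example numbers are illustrative only.

```python
import math


def qpcr_relative_expression(ct_gene, ct_ref):
    """Relative expression by the 2^(-dCt) convention (assumed here)."""
    return 2.0 ** -(ct_gene - ct_ref)


def seq_relative_expression(count_gene, count_ref):
    """Relative expression from sequencing counts as a ratio to the reference."""
    if count_ref <= 0:
        raise ValueError("reference gene count must be positive")
    return count_gene / count_ref


def log2_fold_change(treated, untreated):
    """log2 ratio of relative expression, treated over untreated."""
    if treated <= 0 or untreated <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treated / untreated)


if __name__ == "__main__":
    # Invented numbers: a gene two cycles later than GAPDH, and a gene with
    # one quarter of the GAPDH count, give the same relative expression.
    print(qpcr_relative_expression(22.0, 20.0))
    print(seq_relative_expression(50, 200))
    print(log2_fold_change(4.0, 1.0))
```

Expressing both methods on the same relative scale is what allows a per-gene comparison of the kind shown in FIG. 4.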

Taken together, these results indicate that PHDs-seq meets the requirements of high-throughput screening in terms of accuracy, sensitivity and reproducibility between parallel samples, and can be used for actual screening.

Example 3 Screening of small molecules using the system of reprogramming human keloid fibroblasts into adipocytes

Previous reports on keloids have shown that BMP4 increases the efficiency with which KF are induced into adipocytes. The inventors tried to reproduce these results: cells were switched to maintenance medium after four days in induction medium, and samples were collected and characterized on the twelfth day. BMP4 did indeed work well, increasing the number of oil red O-positive cells and the expression levels of FABP4 and ADIPOQ (FIG. 5A and B); the positive ratio also increased from ~5% to ~20% (FIG. 5C). The effect of BMP4 was also found to be dose-dependent (FIG. 5D). When the BMP4 signaling pathway was blocked with the small molecules dorsomorphin and DMH1, the effect of BMP4 was significantly impaired, again demonstrating that BMP4 has a pronounced effect on adipogenic induction.

The inventors then asked whether there are small molecules, independent of BMP4, that have a similar or even better effect on adipogenic induction. To address this question, the inventors collected about 130 small molecules or protein factors commonly used in cell reprogramming systems, covering the vast majority of signaling pathways, and screened them on KF cells with the PHDs-seq screening system developed here (FIG. 6A). Considering that adipocytes are classified into white and brown adipocytes, and that white adipocytes can be converted into brown adipocytes under specific treatments, it was also desirable to distinguish which cell type an effective small molecule acts on. According to previous reports, the AD medium components triiodothyronine, rosiglitazone and cortisol, small molecules that promote white and brown adipocytes, were removed, and only the three components that can promote white and brown adipocytes, namely 1% ITS, 0.5 mM isobutylmethylxanthine and 1 μM dexamethasone, were retained. These three components form the MDI medium. With MDI as the basal medium, the collected small molecules were screened; after four days of induction, the medium was changed to DMEM + 1% ITS, and samples were collected on the eighth day for library construction and sequencing. The sequencing results showed that most small molecules did not promote the expression of adipocyte characteristic genes, but the PPARG activator rosiglitazone and the component FSK in AD promoted the expression of FABP4 and ADIPOQ and clustered together well; in addition, the DNA methyltransferase inhibitor decitabine and the TGFβ inhibitors SD_208, RepSox and SB431542 also slightly increased the expression of these two genes (FIG. 6B).

In terms of cell fibrosis, only lithocholic acid had an evident down-regulating effect on COL1A1 and COL3A1, and the remaining small molecules had no effect (FIG. 6B). PCA showed that samples treated with small molecules such as rosiglitazone, forskolin, decitabine, SD_208, RepSox and SB431542 were well separated from the rest and closer to the AD samples (FIG. 6C).

In summary, PHDs-seq screening identified small molecules such as rosiglitazone, FSK and some TGFBR1 inhibitors that can promote the conversion of KF toward an adipocyte fate, demonstrating the reliability of PHDs-seq.

Example 4 comparison of PHDs-seq with TAC-seq

First, the effect of the different hybridization/ligation steps on the results was tested by the three protocols shown in FIG. 7A.

The results are shown in FIG. 7B. The target band amplified using the TAC-seq protocol (protocol two, Lane 5) was weak, whereas under the optimized conditions (Lane 8, protocol three, in which hybridization and ligation are performed in one step) the target band was amplified more efficiently. Lane 2 is based on protocol two (Lane 5), with a salt solution (1500 mM KCl, 300 mM Tris-HCl pH 8.5, 1 mM EDTA) added during hybridization with the aim of making hybridization more efficient; however, the experimental results show that adding the salt solution did not lead to effective amplification of the target band.

Furthermore, based on protocol three with one-step hybridization and ligation, the effect of different probe concentrations on the results was tested. The experimental design is shown in FIG. 8A and the results in FIG. 8B. Using only 1/1000 of the probe concentration of the original TAC-seq protocol (Lane 9, probe concentration 0.83/1000 μM), the target band was amplified as efficiently as at the original concentration (Lane 5, probe concentration 0.83 μM). At the same time, at 1/1000 probe concentration the non-target bands were significantly weaker or essentially undetectable.
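For reference, the 1/1000 dilution corresponds to 0.83 nM, which falls under the "no more than about 0.001 μM" preference stated in claim 28. The short sketch below only reproduces this unit arithmetic; the function and constant names are illustrative.

```python
ORIGINAL_PROBE_UM = 0.83   # original probe concentration in uM, from the text
PREFERRED_MAX_UM = 0.001   # "no more than about 0.001 uM", from claim 28


def dilute(conc_um, fold):
    """Concentration (uM) after a `fold`-fold dilution."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return conc_um / fold


def um_to_nm(conc_um):
    """Convert micromolar to nanomolar."""
    return conc_um * 1000.0


diluted_um = dilute(ORIGINAL_PROBE_UM, 1000)

if __name__ == "__main__":
    print(f"{diluted_um:.5f} uM = {um_to_nm(diluted_um):.2f} nM")
    print("within preferred range:", diluted_um <= PREFERRED_MAX_UM)
```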

References

1. Plikus, M.V. et al. Regeneration of fat cells from myofibroblasts during wound healing. Science 355, 748-752, doi:10.1126/science.aai8792 (2017).

2. Teder, H. et al. TAC-seq: targeted DNA and RNA sequencing for precise biomarker molecule counting. npj Genomic Medicine 3, 34, doi:10.1038/s41525-018-0072-5 (2018).

Claims

1. A method of constructing a high throughput transcriptional profiling library based on probe hybridization, the method comprising:

(a) Providing at least one biological sample comprising cells in at least one multiwell plate, each of the at least one biological sample comprising cells being located in a separate well;

(b) Lysing cells in the biological sample in wells of the multiwell plate;

(f) Enriching the ligation product;

(g) Adding to each well a barcoding PCR mixture comprising a DNA polymerase and a barcode primer pair, the barcode primer pair comprising a first primer for the left probe and a second primer for the right probe, one of the first and second primers comprising a well barcode sequence unique to each well and the other comprising a plate barcode sequence unique to each multiwell plate;

2. The method of claim 1, wherein the multiwell plate is a 96-well plate or 384-well plate, preferably a 96-well plate.

3. The method of claim 1 or 2, wherein the at least one biological sample comprising cells is 1-200 or more, such as at least 2, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more biological samples comprising cells.

4. The method of any one of claims 1-3, wherein the at least one biological sample comprising cells is a biological sample each comprising a different cell type.

5. A method according to any one of claims 1-3, wherein the at least one biological sample comprising cells comprises biological samples of the same cell type, but each biological sample has been subjected to a different treatment, e.g. to a different compound.

6. The method of claim 5, wherein the treatment is capable of causing a specific phenotype of the cell.

7. The method of any one of claims 1-6, wherein the cell is a somatic cell, germ cell, or stem cell (e.g., an embryonic stem cell or an induced pluripotent stem cell).

8. The method of any one of claims 1-6, wherein the cell is selected from the group consisting of neuronal cells, skeletal muscle cells, liver cells, fibroblasts, osteoblasts, chondrocytes, adipocytes, endothelial cells, interstitial cells, smooth muscle cells, cardiomyocytes, nerve cells, hematopoietic cells, islet cells, or tumor cells.

9. The method of any one of claims 1-8, wherein the cells are derived from a mammal or a non-mammal, e.g., the cells are derived from a human, a mouse, a rat or a non-human primate.

10. The method of any one of claims 1-9, wherein the cells are lysed in step (b) using a cell lysate based on a non-ionic surfactant, preferably the non-ionic surfactant is Triton X-100.

11. The method of claim 10, wherein the cell lysate consists of Tris-HCl, KCl, a polysucrose such as Ficoll PM-400, Triton X-100, a ribonuclease inhibitor and water.

12. The method of claim 10, wherein the final concentrations of components of the cell lysate are: about 5 mM to about 500 mM Tris-HCl, about 7.5 mM to about 750 mM KCl, about 0.6% to about 60% polysucrose such as Ficoll PM-400, about 0.015% to about 1.5% Triton X-100, about 0.05 U/μL to about 5 U/μL ribonuclease inhibitor.

13. The method of any one of claims 1-12, wherein the reverse transcription reaction in step (c) is performed at about 40-45 ℃, e.g. 42 ℃; and/or, the reverse transcription reaction is performed for about 15-60 minutes, for example, about 30 minutes.

14. The method of any one of claims 1-13, wherein the DNA ligase in step (d) is selected from T4 DNA ligase or Taq DNA ligase, preferably Taq DNA ligase.

15. The method of any one of claims 1-14, wherein the hybridization-ligation mixture comprises 1-200 or more, e.g., at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more probe pairs.

16. The method of claim 15, wherein the probe pairs are used to detect 1-200 or more, such as at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 150, at least 200 or more genes to be detected.

17. The method of any one of claims 1-16, wherein the gene to be detected is associated with at least one phenotype of the cell.

18. The method of claim 17, wherein the expression profile of part or all of the at least one gene to be detected is used as a marker for the phenotype.

19. The method of claim 17 or 18, wherein the phenotype is selected from the group consisting of inhibition or increase of cell proliferation, change in cell type.

20. The method of any one of claims 1-19, wherein the target region is a region characteristic of the gene to be detected.

21. The method of any one of claims 1-20, wherein the target region is from about 20 nucleotides (nt) to about 300nt or more in length, such as about 20nt, about 30nt, about 40nt, about 50nt, about 60nt, about 70nt, about 80nt, about 90nt, about 100nt, about 120nt, about 140nt, about 160nt, about 180nt, about 200nt, about 250nt, about 300nt or more.

22. The method of any one of claims 1-21, wherein the left probe comprises a 5' primer binding sequence, a single molecule tag (UMI), and a first target region binding sequence in a 5' to 3' direction.

23. The method of any one of claims 1-22, wherein the right probe comprises in the 5' to 3' direction a second target region binding sequence, a single molecule tag (UMI) and a 3' primer binding sequence, and the right probe comprises a phosphate group at the 5' end, whereby it can be ligated to the 3' end of the left probe.

24. The method of claim 22 or 23, wherein the first target region binding sequence and the second target region binding sequence, after ligation, perfectly match the target region of the gene to be detected.

25. The method of any one of claims 22-24, wherein the length of the first or second target region binding sequence is about 10nt to about 150nt or more, such as about 10nt, about 15nt, about 20nt, about 25nt, about 30nt, about 35nt, about 40nt, about 45nt, about 50nt, about 60nt, about 70nt, about 80nt, about 90nt, about 100nt, about 125nt, about 150nt or more, provided that the probe specifically hybridizes to the target region.

26. The method of any one of claims 22-25, wherein the single molecule tag (UMI) is about 3nt-8nt, such as 4nt, in length.

27. The method of any one of claims 22-26, wherein the 5 'primer binding sequence and/or the 3' primer binding sequence is a universal primer binding sequence.

28. The method of any one of claims 1-27, wherein the probes in step (d) are each at a concentration of about 0.0001 μM to about 1 μM, e.g., about 0.0001 μM to about 0.001 μM, about 0.0001 μM to about 0.01 μM, about 0.0001 μM to about 0.1 μM; preferably, the probes each have a concentration of no more than about 0.1 μM, preferably no more than about 0.01 μM, and more preferably no more than about 0.001 μM.

29. The method of any one of claims 1-28, wherein the steps of "hybridizing the at least one set of probe pairs to the target region of the at least one gene to be detected" and "interconnecting the left and right probes of the probe pairs hybridized to the target region" in step (e) are performed simultaneously in the same solution system.

30. The method of any one of claims 1-29, wherein step (e) comprises incubating the at least one multiwell plate at about 50 to about 70 ℃, e.g., about 60 ℃, and/or incubating the at least one multiwell plate for about 30 to 120 minutes or more, e.g., at least 30 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes or more.

31. The method of any one of claims 1-30, wherein the ligation product is enriched in step (f) by magnetic beads, e.g. Dynabeads MyOne Carboxylic Acid beads.

32. The method of any one of claims 22-31, wherein a first primer of the pair of barcode primers comprises a primer region sequence corresponding to a 5 'primer binding sequence of the left probe and a second primer of the pair of barcode primers comprises a primer region sequence corresponding to a 3' primer binding sequence of the right probe.

33. The method of any one of claims 1-32, wherein the well barcode sequence or the plate barcode sequence is about 4nt-10nt in length, such as 4nt, 5nt, 6nt, 7nt, 8nt, 9nt, or 10nt.

34. The method of any one of claims 1-33, wherein the first primer and/or the second primer further comprises a linker sequence for high throughput sequencing, such as a P5 linker sequence or a P7 linker sequence.

35. The method of any one of claims 1-34, wherein the amplification products in all wells of all multiwell plates are harvested and mixed in step (i).

36. A high throughput drug screening method, the method comprising:

(1) Culturing cells in at least one well of at least one multiwell plate;

(2) Subjecting the cells in different wells to different treatments, for example, adding different drug candidates;

(3) Constructing a high throughput transcriptional profiling library by the method of any one of claims 1-35;

(4) High throughput sequencing of the library; and

37. The method of claim 36, wherein the candidate drug is selected from the group consisting of a small molecule compound, an antibody, a polypeptide, a nucleic acid molecule, preferably a small molecule compound.