WO2023284735A1 - Methods of identifying drug sensitive genes and drug resistant genes in cancer cells

Info

Publication number: WO2023284735A1
Application number: PCT/CN2022/105193
Authority: WO
Inventors: Pengfei YUAN; Ming Jin; Yongjian Zhang; Hongyan Shen; Ling Yang; Na LIU; Meihua SU; Yaru Zheng; Yulan Li
Original assignee: Edigene Therapeutics (Beijing) Inc.
Priority date: 2021-07-12
Filing date: 2022-07-12
Publication date: 2023-01-19
Also published as: TW202317523A; WO2023284736A1; TW202309299A

Abstract

Provided are methods of identifying target genes in cancer cells whose mutations make the cancer cells sensitive or resistant to anti-cancer drugs. Also provided are methods of treating cancer and selecting patients based on aberrations (e.g., mutations) in target genes identified herein. Modified cancer cells that are sensitive or resistant to anti-cancer drugs, and methods and kits for generating the same, are also provided.

Description

METHODS OF IDENTIFYING DRUG SENSITIVE GENES AND DRUG RESISTANT GENES IN CANCER CELLS

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of International Patent Application No. PCT/CN2021/105822 filed July 12, 2021, and International Patent Application No. PCT/CN2021/105816 filed July 12, 2021, the contents of each of which are incorporated herein by reference in their entirety.

FIELD OF THE PRESENT APPLICATION

The present application relates to methods of identifying target genes in cancer cells whose mutations make the cancer cells sensitive or resistant to anti-cancer drugs. Also provided are methods of treating cancer and selecting patients based on aberrations (e.g., mutations) in target genes identified herein. Modified cancer cells that are sensitive or resistant to anti-cancer drugs, and methods and kits for generating the same, are also provided.

BACKGROUND OF THE PRESENT APPLICATION

Cancer cells can acquire resistance to targeted therapeutic agents when mutations occur. Resistance to anti-cancer drugs has become the major hurdle to successful cancer treatments. Colorectal cancer, for example, is the third most common cancer in the world, the second leading cause of cancer-related deaths, and the leading cause of death from gastrointestinal cancer. Traditional pathological staging divides colorectal cancer into stage 0, stage I, stage II, stage III, and stage IV based on the depth of tumor infiltration into the bowel wall, lymph node metastasis, or distant metastasis. At present, early-stage colorectal cancer is usually treated with surgery or radiotherapy. In addition to surgery and radiotherapy, patients with intermediate and advanced stages are usually treated systemically with chemotherapy and targeted drug therapies (e.g., PARP inhibitor). With current treatments, the five-year survival rate for early-stage colorectal cancer exceeds 90%; however, the five-year survival rate for advanced metastatic colorectal cancer is only 14% (H. Sung et al., CA Cancer J Clin. May 2021; 71(3): 209-249; R. Dienstmann et al., Nat Rev Cancer. 2017; 17(2): 79-92; E.J. Kuipers et al., Nat Rev Dis Primers. 2015; 1: 15065; C. Joachim et al., Medicine (Baltimore). 2019; 98(35): e16941).

After the onset of cancer, specific gene mutations may occur. Mutated genes are usually related to specific pathogenesis and/or therapeutic pathways. Among these mutated genes, some may be drug sensitive genes (i.e., after mutation, cancer cells are more sensitive to therapeutic effects of anti-cancer drugs), and some may be drug resistant genes (i.e., after mutation, cancer cells are more resistant to therapeutic effects of anti-cancer drugs). Identification of drug sensitive genes and drug resistant genes involved in various therapeutic pathways will be of great significance for patient selection and for treatment design with drugs targeting the corresponding therapeutic pathways, in order to achieve better therapeutic efficacy.

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 (CRISPR-associated protein 9) (CRISPR/Cas9) system enables editing at targeted genomic sites with high efficiency and specificity. One of its extensive applications is to identify functions of coding genes, non-coding RNAs and regulatory elements through high-throughput pooled screening in combination with next generation sequencing (“NGS”) analysis. By introducing a pooled single-guide RNA (“sgRNA”) or paired-guide RNA (“pgRNA”) library into cells expressing Cas9 or catalytically inactive Cas9 (dCas9) fused with effector domains, investigators can perform multifarious genetic screens by generating diverse mutations, large genomic deletions, transcriptional activation or transcriptional repression.

To generate a high-quality cell library of gRNAs for any given pooled CRISPR screen, one must use a low multiplicity of infection (“MOI”) during cell library construction to ensure that each cell on average harbors less than one sgRNA or pgRNA, thereby minimizing the false discovery rate (FDR) of the screen. To further reduce the FDR and increase data reproducibility, in-depth coverage of gRNAs and multiple biological replicates are often necessary to obtain hit genes with high statistical significance, resulting in increased workload. Additional difficulties may arise when one performs a large number of genome-wide screens, when cell materials for library construction are limited, or when one conducts more challenging screens (e.g., in vivo screens) for which it is difficult to obtain experimental replicates or control the MOI. The “internal barcode” (“iBAR”) methods previously developed by the Applicant (see WO2020125762, the content of which is incorporated herein by reference in its entirety) provide a reliable and highly efficient screening strategy for large-scale target identification in eukaryotic cells, with much lower false-positive and false-negative rates, and allow cell library generation using a high MOI. For example, compared to a conventional CRISPR/Cas screen with a low MOI of 0.3, the iBAR methods can reduce the starting cell numbers by more than 20-fold (e.g., at an MOI of 3) to more than 70-fold (e.g., at an MOI of 10), while maintaining high efficiency and accuracy. The iBAR system is particularly useful for cell-based screens in which the cells are available in limited quantities, or for in vivo screens in which viral infection of specific cells or tissues is difficult to control at low MOI.
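The relationship between MOI and the number of constructs per cell can be illustrated with a short calculation. The following is a minimal sketch only: it assumes that the number of constructs delivered to a cell follows a Poisson distribution with mean equal to the MOI, a common modeling assumption that the present application does not prescribe, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k constructs (Poisson assumption)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_multiply_transduced(moi: float) -> float:
    """Among cells carrying at least one construct, the fraction carrying two or more."""
    p_none = poisson_pmf(0, moi)
    p_one = poisson_pmf(1, moi)
    transduced = 1.0 - p_none
    return (transduced - p_one) / transduced


if __name__ == "__main__":
    for moi in (0.3, 3.0, 10.0):
        share = fraction_multiply_transduced(moi)
        print(f"MOI {moi}: {share:.1%} of transduced cells carry more than one construct")
```

Under this assumption, most transduced cells at an MOI of 3 or 10 carry several constructs, which is the situation in which the iBAR sequences are intended to help distinguish independent editing events.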

The disclosures of all publications, patents, patent applications and published patent applications referred to herein are hereby incorporated herein by reference in their entirety.

BRIEF SUMMARY OF THE PRESENT APPLICATION

The present invention in one aspect provides a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a mutation at a hit gene (“hit gene mutation”), wherein the hit genes in at least two of the plurality of cancer cells are different from each other; wherein the cancer cell library is generated by contacting an initial population of cancer cells with i) a single-guide RNA (“sgRNA”) library comprising a plurality of sgRNA constructs, wherein each sgRNA construct comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene; and ii) a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., Cas9), under a condition that allows introduction of the sgRNA constructs and the Cas component into the initial population of cancer cells and generation of the mutations at the hit genes; b) contacting the cancer cell library with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug); and d) identifying the target gene based on the difference between the profiles of sgRNAs or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug.

In some embodiments according to any one of the methods described above, the identification of the target gene is based on the difference between the profiles of sgRNAs in the post-treatment cancer cell population and the control cancer cell population. In some embodiments, the profiles of sgRNAs in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing. In some embodiments, the method comprises comparing the sgRNA sequence counts obtained from the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) with sgRNA sequence counts obtained from the control cancer cell population, wherein: i) the hit genes whose corresponding sgRNA guide sequences are identified as enriched in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA guide sequences are identified as depleted in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.
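The enrichment/depletion rule with an FDR cutoff of 0.1 can be sketched as follows. This is an illustrative sketch only: it assumes that a per-gene log2 fold change (post-treatment versus control) and an FDR have already been computed by an upstream statistical step not shown here, and the type name, function name, and placeholder gene labels are hypothetical.

```python
from typing import Dict, NamedTuple


class GeneStat(NamedTuple):
    log2_fold_change: float  # post-treatment vs. control; > 0 means enriched
    fdr: float               # false discovery rate for that change


FDR_CUTOFF = 0.1  # threshold stated in the text


def classify_hit_genes(stats: Dict[str, GeneStat],
                       fdr_cutoff: float = FDR_CUTOFF) -> Dict[str, str]:
    """Label each significant hit gene as a drug resistant or drug sensitive gene."""
    calls: Dict[str, str] = {}
    for gene, stat in stats.items():
        if stat.fdr > fdr_cutoff or stat.log2_fold_change == 0:
            continue  # not significant, or no change: no call
        # Enriched guides -> mutation confers resistance;
        # depleted guides -> mutation confers sensitivity.
        calls[gene] = "resistant" if stat.log2_fold_change > 0 else "sensitive"
    return calls


# Placeholder values for illustration only (not data from the application).
example = {
    "GENE_A": GeneStat(log2_fold_change=2.1, fdr=0.01),
    "GENE_B": GeneStat(log2_fold_change=-1.7, fdr=0.05),
    "GENE_C": GeneStat(log2_fold_change=0.4, fdr=0.60),
}
calls = classify_hit_genes(example)
```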

In some embodiments according to any one of the methods described above, the sgRNA library and the Cas component are introduced into the initial population of cancer cells sequentially. In some embodiments, the Cas component is introduced into the initial population of cancer cells before the introduction of the sgRNA library.

In some embodiments according to any one of the methods described above, the Cas protein is Cas9. In some embodiments, each sgRNA comprises the guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with the Cas9. In some embodiments, the second sequence of each sgRNA further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3.

In some embodiments according to any one of the methods described above, each sgRNA further comprises an internal barcode (iBAR) sequence (“sgRNA ^iBAR”), wherein each sgRNA ^iBAR is operable with the Cas protein (e.g., Cas9) to modify the hit gene (e.g., cleave the hit gene, or modulate hit gene expression). In some embodiments, each sgRNA ^iBAR comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a double-stranded RNA (dsRNA) region that interacts with the Cas protein, and the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence. In some embodiments, the Cas protein is Cas9, and the iBAR sequence of each sgRNA ^iBAR is inserted in the loop region of the repeat-anti-repeat stem loop. In some embodiments, each iBAR sequence comprises about 1 to about 50 nucleotides (e.g., about 6 nucleotides). In some embodiments, the sgRNA library is an sgRNA ^iBAR library, wherein the sgRNA ^iBAR library comprises a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein the guide sequences for the four sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a different target site in the hit gene (e.g., different target sites in the same hit gene, or different hit genes). In some embodiments, the sgRNA ^iBAR library comprises at least about 100 (e.g., at least about any of 1,000, 10,000, 50,000, or more) sets of sgRNA ^iBAR constructs.
In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs). In some embodiments, the iBAR sequences for at least two sets of sgRNA ^iBAR constructs are the same. In some embodiments, the cancer cell library (e.g., Cas9 ⁺ sgRNA ^iBAR cancer cell library) has on average at least about 100-fold (e.g., at least about any of 200-, 400-, 500-, 1,000-, 5,000-, or more fold) coverage for each sgRNA ^iBAR, such as on average about 100-fold to about 1000-fold, or on average about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library (e.g., Cas9 ⁺ sgRNA ^iBAR cancer cell library) has on average at least about 400-fold (e.g., at least about any of 800-, 1000-, 2000-, 4000-, 16,000-, or more fold) coverage for each set of sgRNAs ^iBAR, such as on average about 400-fold to about 4000-fold, or on average about 4000-fold coverage for each set of sgRNAs ^iBAR. In some embodiments, the cancer cell library (e.g., Cas9 ⁺ sgRNA ^iBAR cancer cell library) has on average at least about 400-fold (e.g., at least about any of 800-, 1000-, 1200-, 2000-, 3000-, 4000-, 10,000-, 12,000-, 16,000-, or more fold) coverage for each hit gene, such as on average about 1200-fold to about 12,000-fold coverage for each hit gene, or on average about 12,000-fold coverage for each hit gene.
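The coverage figures recited above are related by simple multiplication, as the following sketch illustrates. Four iBAR sequences per guide sequence is taken from the text; three guide sequences per hit gene is an assumption inferred from the ratio between the per-set figures (400-fold to 4000-fold) and the per-gene figures (1200-fold to 12,000-fold). The cell-number estimate further assumes that the average number of constructs per cell equals the MOI. All function names are hypothetical.

```python
import math

IBARS_PER_GUIDE = 4   # four sgRNA ^iBAR constructs per set, as stated in the text
GUIDES_PER_GENE = 3   # ASSUMPTION: inferred from 400-fold per set vs. 1200-fold per gene


def coverage_per_set(coverage_per_ibar: int) -> int:
    """Coverage of one guide sequence, summed over its iBAR variants."""
    return coverage_per_ibar * IBARS_PER_GUIDE


def coverage_per_gene(coverage_per_ibar: int,
                      guides_per_gene: int = GUIDES_PER_GENE) -> int:
    """Coverage of one hit gene, summed over its guide sequences."""
    return coverage_per_set(coverage_per_ibar) * guides_per_gene


def approx_cells_needed(n_genes: int, coverage_per_ibar: int, moi: float,
                        guides_per_gene: int = GUIDES_PER_GENE) -> int:
    """Rough starting cell number, ASSUMING each cell carries `moi` constructs on average."""
    n_constructs = n_genes * guides_per_gene * IBARS_PER_GUIDE
    total_integrations = n_constructs * coverage_per_ibar
    return math.ceil(total_integrations / moi)


if __name__ == "__main__":
    print(coverage_per_set(100), coverage_per_gene(100))    # 100-fold per sgRNA ^iBAR
    print(coverage_per_set(1000), coverage_per_gene(1000))  # 1000-fold per sgRNA ^iBAR
    print(approx_cells_needed(n_genes=1000, coverage_per_ibar=100, moi=3))
```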

In some embodiments according to any one of the methods described above, at least about 95% (e.g., at least about any of 96%, 97%, 98%, 99%, or 100%) of the sgRNA constructs (or sgRNA ^iBAR constructs) in the sgRNA library (or sgRNA ^iBAR library) are introduced into the initial population of cancer cells.

In some embodiments according to any one of the methods described above, the cancer cell library (e.g., Cas9 ⁺ sgRNA cancer cell library, or Cas9 ⁺ sgRNA ^iBAR cancer cell library) has at least about 400-fold (e.g., at least about any of 600-, 800-, 1,000-, 2,000-, 8,000-, 12,000-, or more fold) coverage for each sgRNA (or sgRNAs ^iBAR) .

In some embodiments according to any one of the methods described above, the sgRNA (or sgRNA ^iBAR) library comprises at least about 400 (e.g., at least about any of 400, 600, 1000, 5000, 10,000, 50,000, 100,000, or more) sgRNA (or sgRNA ^iBAR) constructs, such as about 6000 to about 18,000 sgRNA (or sgRNA ^iBAR) constructs.

In some embodiments according to any one of the methods described above, each sgRNA (or sgRNAs ^iBAR) construct in the sgRNA (or sgRNAs ^iBAR) library is an RNA. In some embodiments, each sgRNA (or sgRNAs ^iBAR) construct in the sgRNA (or sgRNAs ^iBAR) library is a plasmid. In some embodiments, each sgRNA (or sgRNAs ^iBAR) construct in the sgRNA (or sgRNAs ^iBAR) library is a viral vector, such as a lentiviral vector. In some embodiments, each sgRNA (or sgRNAs ^iBAR) construct in the sgRNA (or sgRNAs ^iBAR) library is a virus, such as a lentivirus. In some embodiments, the sgRNA (or sgRNAs ^iBAR) library is contacted with the initial population of cancer cells at a multiplicity of infection (MOI) of at least about 2, such as 3.

In some embodiments according to any one of the methods described above, each guide sequence comprises about 17 to about 23 nucleotides.

In some embodiments according to any one of the methods described above, step b) comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling times. In some embodiments, step b) comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling times.

In some embodiments according to any one of the methods described above, the sgRNA (or sgRNAs ^iBAR) sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the sgRNA library is an sgRNA ^iBAR library, and the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged) .
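The normalization and iBAR consistency steps can be sketched as follows. This is a minimal illustration under stated assumptions: median ratio normalization is implemented in its commonly used form (each sample is scaled by the median ratio of its counts to the per-feature geometric mean across samples), the mean-variance modeling step itself is not shown, and the inflation factor `penalty` as well as all function names are hypothetical rather than taken from the present application.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def size_factors(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Median-ratio size factor per sample; features with a zero count are skipped."""
    samples = list(counts)
    n_features = len(counts[samples[0]])
    log_geo_means = {}
    for i in range(n_features):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[i] = sum(math.log(v) for v in values) / len(values)
    return {
        s: math.exp(median(math.log(counts[s][i]) - lg
                           for i, lg in log_geo_means.items()))
        for s in samples
    }


def normalize(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


def ibar_directions_consistent(log2_fold_changes: Sequence[float],
                               tol: float = 0.0) -> bool:
    """True if all iBARs of one guide change in the same direction (up, down, or unchanged)."""
    def direction(fc: float) -> int:
        return 1 if fc > tol else (-1 if fc < -tol else 0)
    return len({direction(fc) for fc in log2_fold_changes}) == 1


def adjusted_variance(model_variance: float,
                      ibar_log2_fold_changes: Sequence[float],
                      penalty: float = 2.0) -> float:
    """Increase the guide's variance when its iBARs disagree; `penalty` is hypothetical."""
    if ibar_directions_consistent(ibar_log2_fold_changes):
        return model_variance
    return model_variance * penalty
```

For example, a guide sequence whose four iBAR sequences show log2 fold changes of 1.2, 0.8, 1.5 and -0.3 would have its variance increased, because one iBAR sequence moves in the opposite direction from the others.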

In some embodiments according to any one of the methods described above, the method comprises: subjecting the cancer cell library from step a) to at least two separate different treatments with the anti-cancer drug in step b) ; growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., alive, resistant to anti-cancer drug) ; identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment; and combining the one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. In some embodiments, i) the hit genes whose corresponding sgRNA (or sgRNAs ^iBAR) guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA (or sgRNAs ^iBAR) guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.

In some embodiments according to any one of the methods described above, the method comprises: subjecting the cancer cell library from step a) to two separate treatments b1) and b2): b1) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling times; b2) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling times; c1) growing the cancer cell library from treatment b1) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug); c2) growing the cancer cell library from treatment b2) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug); d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b1); d2) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b2); and d3) combining the one or more hit genes identified from treatment b1) and treatment b2), thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug.
In some embodiments, i) the hit genes whose corresponding sgRNA (or sgRNAs ^iBAR) guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA (or sgRNAs ^iBAR) guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.
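Combining hit genes that pass the FDR cutoff "in at least one treatment" amounts to taking a union across treatments, which can be sketched as follows. The sketch assumes that per-treatment calls have already been made; the text does not say how to handle a gene called resistant in one treatment and sensitive in the other, so such genes are merely reported here. The function name and gene labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TreatmentCalls = Dict[str, Set[str]]  # keys: "resistant" and "sensitive"


def combine_treatments(
    per_treatment: List[TreatmentCalls],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Union the calls from all treatments and flag genes with conflicting calls."""
    resistant = set().union(*(t["resistant"] for t in per_treatment))
    sensitive = set().union(*(t["sensitive"] for t in per_treatment))
    conflicting = resistant & sensitive  # handling of these is not specified in the text
    return resistant, sensitive, conflicting


# Placeholder calls for treatments b1) and b2), for illustration only.
b1 = {"resistant": {"GENE_A"}, "sensitive": {"GENE_B", "GENE_C"}}
b2 = {"resistant": {"GENE_A", "GENE_D"}, "sensitive": {"GENE_C"}}
resistant, sensitive, conflicting = combine_treatments([b1, b2])
```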

In some embodiments according to any one of the methods described above, the method comprises: i) separately identifying a set of one or more target genes whose mutations make the cancer cells sensitive to an anti-cancer drug, for two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs when treated alone; ii) obtaining one or more target genes present in every set of target genes identified for each anti-cancer drug, thereby identifying target genes whose mutations make the cancer cells sensitive to a combination treatment of the two or more different anti-cancer drugs; and/or i) separately identifying a set of one or more target genes whose mutations make the cancer cells resistant to an anti-cancer drug, for two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs when treated alone; ii) obtaining one or more target genes present in a combination of sets of target genes identified for all anti-cancer drugs, thereby identifying target genes whose mutations make the cancer cells resistant to a combination treatment of the two or more different anti-cancer drugs. In some embodiments, the two or more different anti-cancer drugs target the same cancer target. In some embodiments, the two or more different anti-cancer drugs target different cancer targets.
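The asymmetry between the two cases above (sensitivity requires a gene to be present in every single-drug set, whereas resistance takes the combination of all single-drug sets) corresponds to a set intersection and a set union, respectively. The following sketch assumes that the single-drug gene sets have already been identified; the function names, drug labels, and gene labels are hypothetical.

```python
from typing import Dict, Set


def combination_sensitive_genes(sensitive_by_drug: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in the sensitive set of every drug (intersection)."""
    if len(sensitive_by_drug) < 2:
        raise ValueError("a combination treatment needs two or more drugs")
    return set.intersection(*sensitive_by_drug.values())


def combination_resistant_genes(resistant_by_drug: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in the resistant set of any drug (union)."""
    if len(resistant_by_drug) < 2:
        raise ValueError("a combination treatment needs two or more drugs")
    return set.union(*resistant_by_drug.values())


# Placeholder sets for illustration only.
sensitive_sets = {"drug_1": {"GENE_A", "GENE_B"}, "drug_2": {"GENE_B", "GENE_C"}}
resistant_sets = {"drug_1": {"GENE_X"}, "drug_2": {"GENE_Y"}}
combo_sensitive = combination_sensitive_genes(sensitive_sets)
combo_resistant = combination_resistant_genes(resistant_sets)
```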

In some embodiments according to any one of the methods described above, the method further comprises ranking the identified target genes, wherein the target gene ranking is based on the degree of enrichment or depletion (e.g., fold of enrichment, fold of depletion, enrichment FDR, or depletion FDR) of the sgRNA (or sgRNAs ^iBAR) guide sequences in the post-treatment cancer cell population compared to the control cancer cell population. In some embodiments, the sgRNA library is an sgRNA ^iBAR library, and the target gene ranking is further adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence of the target gene. In some embodiments, the method further comprises assigning a sensitivity score or a resistance score to the identified target gene, wherein target genes whose mutations make the cancer cells resistant to the anti-cancer drug are ranked from high to low based on the fold of enrichment (or based on enrichment FDR, where the smaller the FDR, the higher the ranking; or based on the degree of data consistency, where the higher the degree of data consistency, the higher the ranking) of the sgRNA (or sgRNAs ^iBAR) guide sequences in the post-treatment cancer cell population compared to the control cancer cell population, and each target gene is assigned a resistance score from high to low accordingly; and/or wherein target genes whose mutations make the cancer cells sensitive to the anti-cancer drug are ranked from high to low based on the fold of depletion (or based on depletion FDR, where the smaller the FDR, the higher the ranking; or based on the degree of data consistency, where the higher the degree of data consistency, the higher the ranking) of the sgRNA (or sgRNAs ^iBAR) guide sequences in the post-treatment cancer cell population compared to the control cancer cell population, and each target gene is assigned a sensitivity score from high to low accordingly.
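One way to turn such a ranking into scores is sketched below. The text states only that scores are assigned "from high to low" according to rank; the specific mapping used here (the top-ranked gene receives a score equal to the number of genes, the last-ranked gene receives 1) is an assumption made for illustration, and the function name and gene labels are hypothetical. The same function can be applied to fold of depletion to obtain sensitivity scores.

```python
from typing import Dict


def rank_and_score(fold_by_gene: Dict[str, float]) -> Dict[str, int]:
    """Rank genes by fold of enrichment (or depletion), largest first, and score by rank.

    ASSUMPTION: score = number of genes - rank index, so scores run n, n-1, ..., 1.
    """
    ranked = sorted(fold_by_gene, key=lambda gene: fold_by_gene[gene], reverse=True)
    n_genes = len(ranked)
    return {gene: n_genes - index for index, gene in enumerate(ranked)}


# Placeholder fold-enrichment values for illustration only.
resistance_scores = rank_and_score({"GENE_A": 8.0, "GENE_B": 2.5, "GENE_C": 4.0})
```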

In some embodiments according to any one of the methods described above, the anti-cancer drug is a PARP inhibitor.

In some embodiments according to any one of the methods described above, the cancer cell is a colorectal cancer cell.

In some embodiments according to any one of the methods described above, the method further comprises culturing the same cancer cell library under the same condition without contacting with the anti-cancer drug, and optionally subjecting it to the same obtaining method as in step c), to obtain the control cancer cell population.

In some embodiments according to any one of the methods described above, the method further comprises validating the target gene by: a) modifying a cancer cell by creating a mutation (e.g., inactivating mutation) in the target gene in the cancer cell; and b) determining the sensitivity or resistance of the modified cancer cell to the anti-cancer drug.

The present invention in another aspect provides a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive to a combination therapy comprising a first anti-cancer drug and a second anti-cancer drug, comprising: i) identifying a first set of one or more target genes in a cancer cell whose mutations make the cancer cell sensitive to the first anti-cancer drug according to any one of the methods described above; ii) identifying a second set of one or more target genes in a cancer cell whose mutations make the cancer cell sensitive to the second anti-cancer drug according to any one of the methods described above; and iii) obtaining one or more target genes present in both the first set of target genes and the second set of target genes, thereby identifying the target gene whose mutation makes the cancer cell sensitive to the combination therapy.

The present invention in another aspect provides a method of treating a cancer in an individual (e.g., human), comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected for treatment on the basis that the individual has an aberration (e.g., carries a mutation) in a target gene (“a drug sensitive gene”) which makes the cancer cells sensitive to the anti-cancer drug (“drug sensitive aberration”), and wherein the drug sensitive gene is identified according to any one of the target gene identification methods described above.

The present invention in another aspect provides a method of excluding an individual (e.g., human) suffering from a cancer from a treatment comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is excluded if the individual has an aberration (e.g., carries a mutation) in a target gene ( “a drug resistant gene” ) which makes the cancer cells resistant to the anti-cancer drug ( “drug resistant aberration” ) , and wherein the drug resistant gene is identified according to any one of the target gene identification methods described above.

The present invention in another aspect provides a method of treating a cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on i) aberrations (e.g., mutations) in one or more target genes ( “drug sensitive genes” ) which make the cancer cells sensitive to the anti-cancer drug ( “drug sensitive aberrations” such as “drug sensitive mutations” ) , and ii) aberrations (e.g., mutations) in one or more target genes ( “drug resistant genes” ) which make the cancer cells resistant to the anti-cancer drug ( “drug resistant aberrations” such as “drug resistant mutations” ) , wherein the drug sensitive genes and drug resistant genes are identified using the method according to any one of the target gene identification methods described above, and wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) is above a composite score threshold level. In some embodiments, the composite score is obtained by (i) subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes) , or (ii) Formula I described herein, wherein the individual is selected for treatment if the composite score is above zero.
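Option (i) of the composite score can be sketched as follows. Formula I is not reproduced in this section and is not implemented here. The sketch assumes that only genes in which the individual carries an aberration contribute to the sums, which is an interpretation rather than a statement in the text; the function names, gene labels, and score values are hypothetical.

```python
from typing import Dict, Set


def composite_score(sensitivity_scores: Dict[str, float],
                    resistance_scores: Dict[str, float],
                    aberrant_genes: Set[str]) -> float:
    """|sum of sensitivity scores| minus |sum of resistance scores|.

    ASSUMPTION: only genes aberrant in the individual are counted.
    """
    sensitive_sum = sum(score for gene, score in sensitivity_scores.items()
                        if gene in aberrant_genes)
    resistant_sum = sum(score for gene, score in resistance_scores.items()
                        if gene in aberrant_genes)
    return abs(sensitive_sum) - abs(resistant_sum)


def select_for_treatment(score: float, threshold: float = 0.0) -> bool:
    """Select the individual if the composite score is above the threshold (zero in the text)."""
    return score > threshold


# Placeholder scores for illustration only.
sens = {"GENE_A": 3.0, "GENE_B": 2.0}
res = {"GENE_X": 4.0, "GENE_Y": 1.0}
patient_score = composite_score(sens, res, aberrant_genes={"GENE_A", "GENE_Y"})
```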

In another aspect, there is also provided a method of generating a modified cancer cell, comprising inactivating a target gene identified by any of the target gene identification methods described above.

Also provided are modified colorectal cancer cells comprising a mutation (e.g., inactivating mutation) in a target gene, wherein the target gene is i) selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1; or ii) selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.

Further provided are sgRNA (or sgRNA ^iBAR) libraries comprising one or more sgRNA (or sgRNA ^iBAR) constructs, wherein each sgRNA (or sgRNA ^iBAR) construct comprises or encodes an sgRNA (or sgRNA ^iBAR), and wherein each sgRNA (or sgRNA ^iBAR) comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a target gene i) selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1; or ii) selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.

Kits and articles of manufacture that are useful for the methods described herein are also provided, such as kits for generating a modified cancer cell sensitive or resistant to an anti-cancer drug.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an exemplary procedure for screening drug sensitive and/or drug resistant genes for an anti-cancer drug.

FIG. 2 shows an exemplary screening workflow for a Cas9 ⁺ sgRNA ^iBAR cancer cell library.

FIG. 3 shows an exemplary target gene identification workflow for a Cas9 ⁺ sgRNA ^iBAR cancer cell library.

FIG. 4 shows PARPi response curves of cancer cells in which drug sensitive genes or drug resistant genes are knocked out (KO). Cancer cells without such KOs treated with PARPi served as control (WT).

FIG. 5A shows loss-of-function (LOF) mutation probability of drug sensitive genes and drug resistant genes of PARPi (y-axis, 51 genes total) in 16 cancer samples. FIG. 5B shows composite score calculated using Formula I based on 51 genes for each cancer sample. FIG. 5C shows composite score, response to PARPi treatment, and therapeutic efficacy prediction based on composite score for each cancer sample.

DETAILED DESCRIPTION OF THE PRESENT APPLICATION

Cancer cells can acquire resistance to targeted therapeutic agents when mutations happen. Resistance to anti-cancer drugs has become the major hurdle to successful cancer treatments. Certain gene mutations may make cancer cells more prone to be killed by anti-cancer agents. Treating cancers carrying such drug sensitive mutations with anti-cancer agents may lead to higher treatment success. For patients with drug resistant mutations, alternative treatment plans can be pursued. Hence identification of “drug sensitive genes” (i.e., after mutation, cancer cells are more sensitive to therapeutic effects of anti-cancer drugs) and “drug resistant genes” (i.e., after mutation, cancer cells are more resistant to therapeutic effects of anti-cancer drugs) involved in various therapeutic pathways will be of great significance for patient selection and treatment design (e.g., choosing drugs targeting selected therapeutic pathway (s) , or combination therapy) , in order to achieve better therapeutic efficacy. Engineered cancer cells carrying these drug sensitive mutations or drug resistant mutations can also be used for new drug design and screening, such as de novo design, or by modifying chemical groups of an existing compound resisted by certain drug resistant mutations.

For example, poly (ADP-ribose) polymerase (PARP) inhibitors (PARPi) are a type of cancer drug targeting specific therapeutic pathways. Once PARP detects a single-strand break (SSB), PARP binds DNA and catalyzes the synthesis of polymeric adenosine diphosphate ribose (poly (ADP-ribose) or PAR) chains on protein substrates. Through this catalysis, PARP recruits other DNA damage repair (DDR) proteins to the damage site to repair the DNA damage together. PARPi binds to the PARP catalytic site, which blocks poly-ADP-ribosylation (PARylation) and the recruitment of other DDR proteins; more importantly, PARP becomes trapped on the damaged DNA and cannot dissociate. PARP trapped at the DNA damage site stalls the DNA replication fork so that DNA replication cannot proceed, leading to double-strand breaks in the DNA. When this happens, cells usually trigger homologous recombination repair (HRR). BRCA plays an important role in HRR. In tumors with HRR deficiency, such as those with BRCA mutations, HRR of double-strand DNA (dsDNA) breaks is impaired, and tumor cells are directed to use other DNA repair methods, such as error-prone non-homologous end joining (NHEJ), which usually introduces large-scale genomic recombination, leading to genetic instability and cell death. Therefore, the combination of PARPi and loss of BRCA function may greatly inhibit tumor cell DDR and promote tumor cell apoptosis.

In order to provide more effective patient selection and treatment design, and to achieve better therapeutic effects for cancer, especially hard-to-treat cancer types and/or stages (e.g., advanced metastatic colorectal cancer) , the present invention uses high-throughput screening to identify mutations that cause drug sensitivity and/or drug resistance phenotypes to certain anti-cancer drugs, obtain the relationship between gene functions and drug responses, and explore the use of these drug sensitive and drug resistant genes as biomarkers for patient selection and therapeutic design. This will greatly facilitate the accurate selection of patient population, and improve the efficacy of anti-cancer drugs (e.g., PARPi) in the treatment of cancer (e.g., colorectal cancer) . Engineered cancer cells carrying mutations in these drug sensitive or drug resistant genes will also serve as promising tools in new drug design and screening.

The present application provides methods of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug. A target gene whose mutation makes a cancer cell sensitive to an anti-cancer drug is hereinafter referred to as “drug sensitive gene” , and the mutation therein is hereinafter referred to as “drug sensitive mutation” . A target gene whose mutation makes a cancer cell resistant to an anti-cancer drug is hereinafter referred to as “drug resistant gene” , and the mutation therein is hereinafter referred to as “drug resistant mutation” . The method comprises: a) providing a cancer cell library comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a mutation (e.g., inactivating mutation) at a hit gene ( “hit gene mutation” ) , wherein the hit gene in at least two of the plurality of cancer cells are different from each other; b) contacting the cancer cell library with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of hit gene mutations in the post-treatment cancer cell population and a control cancer cell population (e.g., obtained from the same cancer cell library cultured under the same condition without contacting with the anti-cancer drug) . In some embodiments, the one or more mutations (e.g., inactivating mutations) at one or more hit genes are generated by CRISPR/Cas guide RNAs (e.g., single-guide RNA, “sgRNA” ) or constructs encoding the CRISPR/Cas guide RNAs (e.g., vector such as viral vector, or virus such as lentivirus) , such as sgRNA comprising an iBAR sequence (sgRNA ^iBAR) described herein. 
Hence the target gene can either be identified based on the difference between the profiles of hit gene mutations directly (e.g., by DNA-sequencing) in the post-treatment cancer cell population and the control cancer cell population, or based on the difference between the profiles of sgRNAs or sgRNAs ^iBAR for generating the hit gene mutations (e.g., by identifying the sgRNA guide sequences hence identifying the corresponding hit genes) in the post-treatment cancer cell population and the control cancer cell population. Screening assays employing sgRNA ^iBAR molecules, constructs, sets, or libraries described herein provide a reliable and highly efficient screening strategy for large-scale target identification in eukaryotic cells (e.g., cancer cells) , with much lower false-positive and false-negative rates, and allow cell library generation using a high MOI. Target genes identified herein are particularly useful in patient selection/exclusion in cancer treatments. For example, patients carrying a mutation (e.g., inactivation) in a drug sensitive gene identified herein, and/or with reduced or absent expression (e.g., mRNA or protein) of the drug sensitive gene compared to a healthy individual, and/or with reduced or abolished activity of an expression product (e.g., mRNA or protein) of the drug sensitive gene compared to a healthy individual, are particularly suitable for treatment with the corresponding anti-cancer drug.

Thus, the present invention in one aspect provides a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA library or an sgRNA ^iBAR library and a Cas component (e.g., Cas9) targeting one or more hit genes; b) contacting the cancer cell library with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of sgRNAs, sgRNA ^iBAR, or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the Cas component comprises a Cas protein or a nucleic acid encoding the Cas protein. In some embodiments, the sgRNA library comprises one or a plurality of sgRNA constructs, wherein each sgRNA construct comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a target site in a corresponding hit gene. 
In some embodiments, the sgRNA ^iBAR library comprises a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise three or more (e.g., four) sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more (e.g., four) sgRNA ^iBAR constructs are the same and are complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a same target site in a corresponding hit gene, wherein the iBAR sequence for each of the three or more (e.g., four) sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site in a hit gene (e.g., different genes, or different sites within the same gene) , and wherein each sgRNA ^iBAR is operable with a Cas protein (e.g., Cas9) to modify (e.g., cleave or modulate expression) the target site. In some embodiments, more than one (e.g., 2, 3, 4 or more, such as 3) guide sequence is designed for each hit gene. In some embodiments, the method comprises comparing the sgRNA (or sgRNA ^iBAR) sequence counts obtained from the post-treatment cancer cell population with sgRNA (or sgRNA ^iBAR) sequence counts obtained from the control cancer cell population. In some embodiments, the hit genes whose corresponding sgRNA (or sgRNA ^iBAR) guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population (e.g., with an FDR ≤ 0.1) are identified as drug resistant genes. 
In some embodiments, the hit genes whose corresponding sgRNA (or sgRNA ^iBAR) guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population (e.g., with an FDR ≤ 0.1) are identified as drug sensitive genes.
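The enrichment/depletion rule described above can be sketched in Python. This is a minimal illustration only, assuming that a per-gene log2 fold change (post-treatment versus control) and an FDR value have already been produced by an upstream statistical analysis that is not shown here. The `GeneResult` record, the `classify_hit_genes` function, and the placeholder gene labels are hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary of a screen (not defined in the source)."""
    gene: str
    log2_fold_change: float  # post-treatment vs. control guide abundance
    fdr: float               # false discovery rate from the upstream analysis


def classify_hit_genes(results, fdr_cutoff=0.1):
    """Split hit genes into drug resistant (enriched) and drug sensitive (depleted).

    Genes whose FDR exceeds the cutoff are left unclassified.
    """
    resistant, sensitive = [], []
    for r in results:
        if r.fdr > fdr_cutoff:
            continue  # not significant at the chosen FDR
        if r.log2_fold_change > 0:
            resistant.append(r.gene)   # guides enriched after treatment
        elif r.log2_fold_change < 0:
            sensitive.append(r.gene)   # guides depleted after treatment
    return resistant, sensitive


# Illustrative placeholder values only, not experimental data.
example = [
    GeneResult("GENE_A", 1.5, 0.01),
    GeneResult("GENE_B", -2.0, 0.05),
    GeneResult("GENE_C", 3.0, 0.50),
]
resistant_genes, sensitive_genes = classify_hit_genes(example)
```

The FDR cutoff of 0.1 follows the example value given in the text; other cutoffs, or an additional fold-change requirement, could be passed in or added in the same way.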

Also provided are methods of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive to a combination therapy comprising two or more (e.g., 2, 3, 4, 5, or more) anti-cancer drugs, comprising separately identifying, for each anti-cancer drug, a set of target genes whose mutations make the cancer cell sensitive to that anti-cancer drug when used alone, using any of the methods described herein, and obtaining one or more target genes present in every set of target genes identified for each anti-cancer drug, thereby identifying the target gene whose mutation makes the cancer cell sensitive to the combination therapy.
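The step of obtaining target genes present in every per-drug set amounts to a set intersection. The following Python sketch assumes each single-drug screen has already yielded a collection of drug sensitive gene names; the function name and the placeholder gene labels are hypothetical.

```python
def combination_sensitive_genes(per_drug_gene_sets):
    """Return the genes found in every per-drug set of drug sensitive genes."""
    sets = [set(genes) for genes in per_drug_gene_sets]
    if not sets:
        return set()  # no drugs screened, nothing to intersect
    return set.intersection(*sets)


# Placeholder labels only, not results from any screen.
drug_1_sensitive = {"GENE_A", "GENE_B", "GENE_C"}
drug_2_sensitive = {"GENE_B", "GENE_C", "GENE_D"}
shared = combination_sensitive_genes([drug_1_sensitive, drug_2_sensitive])
```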

The present invention in another aspect provides a method of treating a cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected for treatment based on that the individual has a drug sensitive aberration (e.g., carries a drug sensitive mutation to the anti-cancer drug, and/or has aberrant (e.g., reduced or absent) expression (e.g., mRNA or protein) of a drug sensitive gene compared to a healthy individual, and/or has aberrant (e.g., reduced or abolished) activity of a drug sensitive gene (e.g., RNA or protein activity, such as due to epigenetic or post-translational modification) compared to a healthy individual) . The present invention also provides a method of excluding an individual suffering from a cancer from a treatment comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is excluded if the individual has a drug resistant aberration (e.g., carries a drug resistant mutation to the anti-cancer drug, and/or has aberrant (e.g., reduced or absent) expression (e.g., mRNA or protein) of a drug resistant gene compared to a healthy individual, and/or has aberrant (e.g., reduced or abolished) activity of a drug resistant gene (e.g., RNA or protein activity, such as due to epigenetic or post-translational modification) compared to a healthy individual) . 
The present invention also provides a method of treating a cancer in an individual, comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on drug sensitive aberrations (e.g., drug sensitive mutations) and drug resistant aberrations (e.g., drug resistant mutations) , wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations and the drug resistant aberrations is above a composite score threshold level (e.g., the overall mutations will make the cancer cells sensitive to the anti-cancer drug) .

Also provided are sgRNA or sgRNA ^iBAR molecules, constructs, sets, or libraries, which are useful for conducting the screening methods described herein. Modified cancer cells comprising the sgRNA or sgRNA ^iBAR molecules, constructs, sets, or libraries, and methods of generating the same, are also provided. Further provided are target genes whose mutation (e.g., inactivation such as knock-out) renders cancer cells more sensitive, or more resistant, to killing by one or more anti-cancer drugs. sgRNA or sgRNA ^iBAR molecules, constructs, sets, or libraries against drug sensitive genes or drug resistant genes identified herein, modified cancer cells comprising the same, pharmaceutical compositions thereof, and kits, are also provided.

I. Definitions

The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto. Any reference signs in the claims shall not be construed as limiting the scope. In the drawings, the size of some of the elements may be exaggerated and not drawn on scale for illustrative purposes. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In case of conflict, the present document, including definitions, will control. Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present invention. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.

As used herein, “internal barcode” or “iBAR” refers to an index inserted into or appended to a molecule, which is useful for tracing the identity and performance of the molecule. The iBAR can be, for example, a short nucleotide sequence inserted in or appended to a guide RNA for a CRISPR/Cas system, as exemplified by the present invention. Multiple iBARs can be used to trace the performance of a single guide RNA sequence within one experiment, thereby providing replicate data for statistical analysis without having to repeat the experiment.
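The way multiple iBARs provide replicate measurements for one guide sequence within a single experiment can be illustrated with a short Python sketch. It assumes read counts keyed by (guide, iBAR) pairs are available for a control and a post-treatment population. The function name, the pseudocount of 1, and the example counts are assumptions made for illustration, and library-size normalization is deliberately omitted.

```python
import math
from collections import defaultdict


def per_ibar_log2_fold_changes(control_counts, treated_counts, pseudocount=1.0):
    """Group log2 fold changes by guide, one value per iBAR.

    Both arguments map (guide, ibar) -> read count. Each iBAR of a guide is
    treated as an internal replicate of that guide. The pseudocount avoids
    division by zero and is an illustrative choice.
    """
    by_guide = defaultdict(dict)
    for (guide, ibar), ctrl in control_counts.items():
        trt = treated_counts.get((guide, ibar), 0)
        by_guide[guide][ibar] = math.log2((trt + pseudocount) / (ctrl + pseudocount))
    return dict(by_guide)


# Hypothetical counts for one guide carrying two different iBARs.
control = {("guide_1", "AAAAAA"): 99, ("guide_1", "CCCCCC"): 99}
treated = {("guide_1", "AAAAAA"): 199, ("guide_1", "CCCCCC"): 49}
replicates = per_ibar_log2_fold_changes(control, treated)
```

In this made-up example the two iBARs of `guide_1` disagree in direction, which is the kind of within-guide inconsistency that replicate data makes visible.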

“CRISPR system” or “CRISPR/Cas system” refers collectively to transcripts and other elements involved in the expression and/or directing the activity of CRISPR-associated ( “Cas” ) genes. For example, a CRISPR/Cas system may include sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA) , a tracr-mate sequence (e.g., encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in an endogenous CRISPR system) , a guide sequence (also referred to as a “spacer” in an endogenous CRISPR system) , and other sequences and transcripts derived from a CRISPR locus.

In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. A CRISPR complex may comprise a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins.

The term “guide sequence” refers to a contiguous sequence of nucleotides in a guide RNA which has partial or complete complementarity to a target sequence in a target polynucleotide and can hybridize to the target sequence by base pairing facilitated by a Cas protein. In a CRISPR/Cas9 system, a target sequence is adjacent to a PAM site. The PAM sequence, and its complementary sequence on the other strand, together constitute a PAM site.

The terms “single guide RNA, ” “synthetic guide RNA” and “sgRNA” are used interchangeably and refer to a polynucleotide sequence comprising a guide sequence and any other sequence necessary for the function of the sgRNA and/or interaction of the sgRNA with one or more Cas proteins to form a CRISPR complex. In some embodiments, an sgRNA comprises a guide sequence fused to a second sequence comprising a tracr sequence derived from a tracr RNA and a tracr mate sequence derived from a crRNA. A tracr sequence may contain all or part of the sequence from the tracrRNA of a naturally-occurring CRISPR/Cas system. The term “guide sequence” refers to the nucleotide sequence within the guide RNA that specifies the target site and may be used interchangeably with the terms “guide” or “spacer. ” The term “tracr mate sequence” may also be used interchangeably with the term “direct repeat (s) . ” “sgRNA ^iBAR” as used herein refers to a single-guide RNA having an iBAR sequence.

The term “operable with a Cas protein” means that a guide RNA can interact with the Cas protein to form a CRISPR complex.

As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.

As used herein the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature.

“Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
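The percent complementarity calculation can be illustrated with a small Python sketch. It is a simplification that counts only Watson-Crick pairs between two equal-length sequences aligned antiparallel with no gaps; non-traditional pairing, which the definition also allows, is not modeled. The function name and example sequences are hypothetical.

```python
# Watson-Crick partners; U is mapped to T before lookup so RNA input also works.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(seq, target):
    """Percentage of residues in `seq` that can pair with `target`.

    Both sequences are written 5'->3' and must have equal length; they are
    aligned antiparallel without gaps (an assumption of this sketch).
    """
    a = seq.upper().replace("U", "T")
    b = target.upper().replace("U", "T")
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for x, y in zip(a, reversed(b)) if WATSON_CRICK.get(x) == y)
    return 100.0 * paired / len(a)


full = percent_complementarity("AAAAAAAAAA", "TTTTTTTTTT")  # 10 of 10 residues pair
half = percent_complementarity("AAAAAGGGGG", "TTTTTTTTTT")  # 5 of 10 residues pair
```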

As used herein, “stringent conditions” for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent, and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993) , Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes Part 1, Second Chapter “Overview of principles of hybridization and the strategy of nucleic acid probe assay” , Elsevier, N.Y.

“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.

“Doubling time” or “population doubling time” (PDT) herein refers to the time it takes for a cell population to double in size. Cell doubling time = ln(2) / (growth rate). Growth rate (gr) refers to the amount of doubling in one unit of time:

gr = ln(N(t) / N(0)) / t

in which N(t) is the number of cells at time t, N(0) is the number of cells at time 0, and t is time (usually in hours). When a cell population is an exponentially growing population, i.e., every individual cell doubles with every cell cycle, the growth rate only depends on the length of the cell cycle: gr = ln(2) / (length of the cell cycle).
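The relation cell doubling time = ln(2) / (growth rate) can be checked numerically. The Python sketch below assumes the growth rate is computed from two cell counts as gr = ln(N(t) / N(0)) / t, which is consistent with the symbols N(t), N(0), and t defined above; the function names and the example cell counts are hypothetical.

```python
import math


def growth_rate(n0, nt, hours):
    """Exponential growth rate per hour from cell counts at time 0 and time t."""
    if n0 <= 0 or nt <= 0 or hours <= 0:
        raise ValueError("cell counts and elapsed time must be positive")
    return math.log(nt / n0) / hours


def doubling_time(n0, nt, hours):
    """Population doubling time in hours: ln(2) / growth rate."""
    return math.log(2) / growth_rate(n0, nt, hours)


# Hypothetical culture: 1e6 cells grow to 8e6 cells (three doublings) in 72 hours.
pdt_hours = doubling_time(1e6, 8e6, 72)
```

Three doublings in 72 hours correspond to a doubling time of about 24 hours, so a treatment lasting about 9 to 10 doubling times would, for such a culture, span roughly 9 to 10 days.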

“Construct” as used herein refers to a nucleic acid molecule (e.g., DNA or RNA) , or a vehicle capable of delivering such nucleic acid molecule. For example, when used in the context of an sgRNA, a construct refers to the sgRNA molecule, a nucleic acid molecule (e.g., isolated DNA, or viral vector) encoding the sgRNA, or a vehicle capable of delivering a nucleic acid molecule encoding the sgRNA, such as a lentivirus carrying a nucleic acid molecule encoding the sgRNA. When used in the context of a protein, a construct refers to a nucleic acid molecule comprising a nucleotide sequence that can be transcribed to an RNA or expressed as a protein. A construct may contain necessary regulatory elements operably linked to the nucleotide sequence that allow transcription or expression of the nucleotide sequence when the construct is present in a host cell.

“Operably linked” as used herein means that expression of a gene is under the control of a regulatory element (e.g., a promoter) with which it is spatially connected. A regulatory element may be positioned 5' (upstream) or 3' (downstream) to a gene under its control. The distance between the regulatory element (e.g., promoter) and a gene may be approximately the same as the distance between that regulatory element (e.g., promoter) and a gene it naturally controls and from which the regulatory element is derived. As it is known in the art, variation in this distance may be accommodated without loss of function in the regulatory element (e.g., promoter) .

The term “vector” is used to describe a nucleic acid molecule that may be engineered to contain a cloned polynucleotide or polynucleotides that may be propagated in a host cell. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular) ; nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a "plasmid, " which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors) . Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as “expression vectors. ” Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on basis of the host cells to be used for expression, that is operably linked to the nucleic acid sequence to be expressed.

A “host cell” refers to a cell that may be or has been a recipient of a vector or isolated polynucleotide. Host cells may be prokaryotic cells or eukaryotic cells. In some embodiments, the host cell is a eukaryotic cell that can be cultured in vitro and modified using the methods described herein. The term “cell” includes the primary subject cell and its progeny.

“Multiplicity of infection” or “MOI” are used interchangeably herein to refer to a ratio of agents (e.g., phage, virus, or bacteria) to their infection targets (e.g., cell or organism) . For example, when referring to a group of cells inoculated with viral particles, the multiplicity of infection or MOI is the ratio between the number of viral particles (e.g., viral particles comprising an sgRNA library) and the number of target cells present in a mixture during viral transduction.
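As a simple arithmetic illustration of this ratio, the Python sketch below computes an MOI from a number of viral particles and a number of target cells. The function name and the example quantities are hypothetical.

```python
def multiplicity_of_infection(viral_particles, target_cells):
    """Ratio of viral particles to target cells present during transduction."""
    if target_cells <= 0:
        raise ValueError("number of target cells must be positive")
    return viral_particles / target_cells


# Hypothetical transduction: 3 million viral particles added to 10 million cells.
moi = multiplicity_of_infection(3_000_000, 10_000_000)
```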

A “phenotype” of a cell as used herein refers to an observable characteristic or trait of a cell, such as its morphology, development (e.g., growth, proliferation, differentiation, or death) , biochemical or physiological property, phenology, or behavior. A phenotype may result from expression of genes in a cell, influence from environmental factors, or interactions between the two. In some embodiments, the phenotype is resistance or sensitivity to killing (e.g., by an anti-cancer drug) . In some embodiments, the phenotype is inhibition of growth or proliferation. In some embodiments, the phenotype is death.

An “isolated” nucleic acid molecule described herein is a nucleic acid molecule that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in the environment in which it was produced. Preferably, the isolated nucleic acid is free of association with all components associated with the production environment. The isolated nucleic acid molecules encoding the polypeptides and antibodies herein are in a form other than the form or setting in which they are found in nature. Isolated nucleic acid molecules therefore are distinguished from nucleic acid encoding the polypeptides and antibodies herein existing naturally in cells.

Unless otherwise specified, a “nucleotide sequence encoding an amino acid sequence” includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. The phrase “nucleotide sequence that encodes a protein or an RNA” may also include introns to the extent that the nucleotide sequence encoding the protein may in some versions contain one or more introns.

The term “transfected” or “transformed” or “transduced” as used herein refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell (e.g., cancer cell) . A “transfected” or “transformed” or “transduced” cell is one which has been transfected, transformed or transduced with exogenous nucleic acid. The cell includes the primary subject cell and its progeny.

As used herein, “treatment” or “treating” is an approach for obtaining beneficial or desired results including clinical results. For purposes of this invention, beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from the disease, diminishing the extent of the disease, stabilizing the disease (e.g., preventing or delaying the worsening of the disease), preventing or delaying the spread (e.g., metastasis) of the disease, preventing or delaying the recurrence of the disease, delaying or slowing the progression of the disease, ameliorating the disease state, providing a remission (partial or total) of the disease, decreasing the dose of one or more other medications required to treat the disease, increasing the quality of life, and/or prolonging survival. Also encompassed by “treatment” is a reduction of pathological consequence of cancer.

As used herein, an “individual” or a “subject” refers to a mammal, including, but not limited to, human, bovine, horse, feline, canine, rodent, or primate. In some embodiments, the individual is a human.

A “patient” as used herein includes any human who is afflicted with a disease (e.g., cancer) . The terms “subject, ” “individual, ” and “patient” are used interchangeably herein.

Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps.

It is understood that embodiments of the present application described herein include “consisting of” and/or “consisting essentially of” embodiments.

Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X” .

As used herein, reference to “not” a value or parameter generally means and describes “other than” a value or parameter. For example, the method is not used to treat cancer of type X means the method is used to treat cancer of types other than X.

The term “about X-Y” used herein has the same meaning as “about X to about Y. ”

For the recitation of numeric ranges of nucleotides herein, each intervening number therebetween, is explicitly contemplated. For example, for the range of 19-21nt, the number 20nt is contemplated in addition to 19nt and 21nt, and for the range of MOI, each intervening number therebetween, whether it is integral or decimal, is explicitly contemplated.

As used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.

II. Methods of identifying target genes in cancer cells whose mutations make the cancer cells sensitive or resistant to anti-cancer drugs

The present application provides methods of identifying a target gene in a cancer cell that modulates the activity of the cancer cell, such as in response to anti-cancer drug treatment.

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a mutation (e.g., inactivating mutation) at a hit gene ( “hit gene mutation” ) , wherein the hit gene in at least two of the plurality of cancer cells are different from each other; b) contacting the cancer cell library with the anti-cancer drug (e.g., at a concentration of about IC50 to about IC70) ; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug. In some embodiments, the profiles of hit gene mutations in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing. 
In some embodiments, the method comprises comparing the sequence counts of sequences comprising the hit gene mutations obtained from the post-treatment cancer cell population with sequence counts of sequences comprising the hit gene mutations obtained from the control cancer cell population, wherein: i) the hit genes whose corresponding hit gene mutation sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding hit gene mutation sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 200-, 600-, 1000-, 2000-, 4000-, 8000-, 10000-, or more) coverage for each hit gene, such as about 600-fold to about 1200-fold, or about 1200-fold to about 12,000-fold coverage for each hit gene. In some embodiments, each hit gene is targeted by at least 2 (e.g., 2, 3, 4, 5, 6, or more, such as 3, or 6 to 12) different hit gene mutations (e.g., targeting different target sites of the hit gene) in the cancer cell library. In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. 
In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, the coverage for each hit gene of the cancer cell library after passage for continuous anti-cancer drug treatment remains the same or similar (e.g., within about 10% difference). In some embodiments, the sequence counts of sequences comprising the hit gene mutations are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the variance of each sequence comprising the hit gene mutation (e.g., inactivating mutation) is adjusted based on data consistency among the different hit gene mutation sequences corresponding to the same hit gene. In some embodiments, the data consistency among the different hit gene mutation (e.g., inactivating mutation) sequences corresponding to the same hit gene is determined based on the direction of the fold change of each hit gene mutation sequence, wherein the variance of the hit gene mutation sequence is increased if the fold changes of the different hit gene mutation sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged are all considered as different directions) for the same hit gene.
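The enrichment and depletion criteria of step d) can be illustrated with a minimal Python sketch. It assumes that a per-gene log2 fold change (post-treatment versus control) and an FDR value have already been computed by some upstream statistical method; the class `GeneStat`, the function `classify_hit_gene`, and the example gene names are hypothetical. The sketch requires both the FDR cutoff and the fold-change cutoff, which is only one reading of the "and/or" language above.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str                # hit gene identifier (hypothetical names below)
    log2_fold_change: float  # post-treatment vs. control, assumed precomputed
    fdr: float               # false discovery rate, assumed precomputed


def classify_hit_gene(stat, fdr_cutoff=0.1, min_fold=2.0):
    """Label a hit gene as 'resistant', 'sensitive', or 'not_significant'.

    Assumption: both the FDR cutoff and the fold-change cutoff must be met.
    """
    if stat.fdr > fdr_cutoff:
        return "not_significant"
    threshold = math.log2(min_fold)
    if stat.log2_fold_change >= threshold:
        # Mutant cells are enriched after drug treatment.
        return "resistant"
    if stat.log2_fold_change <= -threshold:
        # Mutant cells are depleted after drug treatment.
        return "sensitive"
    return "not_significant"


stats = [
    GeneStat("GENE_A", 1.5, 0.05),
    GeneStat("GENE_B", -2.0, 0.01),
    GeneStat("GENE_C", 3.0, 0.40),
]
labels = {s.gene: classify_hit_gene(s) for s in stats}
```

Relaxing the rule to FDR alone would only require dropping the two fold-change comparisons and using the sign of the log2 fold change instead.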

In some embodiments, genes whose DNA mutation frequencies are at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients (e.g., based on literature or databases) are selected as hit genes. In some embodiments, genes whose RNA expression levels are up-regulated or down-regulated by at least about 1.2-fold (e.g., at least about any of 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or higher, such as at least about 2-fold) in cancer patients (e.g., based on literature or databases) are selected as hit genes. In some embodiments, genes whose DNA mutation frequencies are at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) and whose RNA expression levels are up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) are selected as hit genes. In some embodiments, a hit gene is further selected on the basis that the encoded mRNA or protein is expressed within a cell, or that the encoded protein is expressed on the cell surface, either in healthy cells or in cancer cells. In some embodiments, the hit gene is selected as a gene: i) whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher); ii) whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases); and iii) whose encoded RNA or protein is expressed within a cell, or whose encoded protein is expressed on the cell surface (in a cancer cell or in a healthy cell).
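As an illustration only, the three selection criteria above can be expressed as a simple filter. The record layout (`mutation_frequency`, `expr_fold_change`, `expressed`), the function name `select_hit_genes`, and the example values are hypothetical and are not taken from any particular database; the expression fold change is assumed to be a positive tumor-to-normal ratio.

```python
def select_hit_genes(candidates, min_mut_freq=0.05, min_expr_fold=2.0):
    """Return the names of candidate genes that meet all three criteria.

    Each candidate is a dict with hypothetical keys:
      gene               -- gene name
      mutation_frequency -- fraction of patients carrying a DNA mutation
      expr_fold_change   -- tumor/normal RNA expression ratio (> 0)
      expressed          -- True if the RNA or protein is expressed in the
                            cell or on the cell surface
    """
    selected = []
    for c in candidates:
        fold = c["expr_fold_change"]
        # Treat up- and down-regulation symmetrically: 0.4x counts as 2.5-fold.
        magnitude = max(fold, 1.0 / fold)
        if (c["mutation_frequency"] >= min_mut_freq
                and magnitude > min_expr_fold
                and c["expressed"]):
            selected.append(c["gene"])
    return selected


# Made-up example records for illustration.
candidates = [
    {"gene": "GENE_A", "mutation_frequency": 0.12, "expr_fold_change": 3.0, "expressed": True},
    {"gene": "GENE_B", "mutation_frequency": 0.08, "expr_fold_change": 0.4, "expressed": True},
    {"gene": "GENE_C", "mutation_frequency": 0.03, "expr_fold_change": 5.0, "expressed": True},
    {"gene": "GENE_D", "mutation_frequency": 0.20, "expr_fold_change": 1.5, "expressed": True},
    {"gene": "GENE_E", "mutation_frequency": 0.20, "expr_fold_change": 4.0, "expressed": False},
]
hit_genes = select_hit_genes(candidates)
```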

In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with a mutagenic agent.

In some embodiments, the cancer cell library is generated by subjecting an initial population of cancer cells to gene editing (e.g., genome-wide, or subset of genes). In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA library comprising a plurality of sgRNA constructs, wherein each sgRNA construct (e.g., lentiviral vector or lentivirus) comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene; and ii) a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein, under a condition that allows introduction of the sgRNA constructs and the Cas component into the initial population of cancer cells and generation of the mutations at the hit genes. In some embodiments, the sgRNA library and the Cas component are introduced into the initial population of cancer cells simultaneously. In some embodiments, the sgRNA library and the Cas component are introduced into the initial population of cancer cells sequentially. In some embodiments, the initial population of cancer cells comprises a Cas component (e.g., Cas9). In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells comprising Cas9 with an sgRNA library comprising a plurality of sgRNA constructs, wherein each sgRNA construct (e.g., lentiviral vector or lentivirus) comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene, under a condition that allows introduction of the sgRNA constructs into the initial population of cancer cells comprising Cas9 and generation of the mutations at the hit genes. 
In some embodiments, the Cas component is introduced into the initial population of cancer cells before the introduction of the sgRNA library. In some embodiments, the cancer cell library is generated by i) contacting an initial population of cancer cells with a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., lentiviral vector or lentivirus encoding Cas9), under a condition that allows introduction of the Cas component into the initial population of cancer cells; ii) optionally obtaining a population of cancer cells comprising the Cas component (“Cas ⁺ cancer cell population”; such as by FACS sorting, e.g., with a marker on the Cas-encoding vector); and iii) contacting the Cas ⁺ cancer cell population with an sgRNA library comprising a plurality of sgRNA constructs, wherein each sgRNA construct (e.g., lentiviral vector or lentivirus) comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene, under a condition that allows introduction of the sgRNA constructs into the cancer cells (e.g., Cas ⁺ cancer cells) and generation of the mutations at the hit genes. In some embodiments, the Cas protein is Cas9. In some embodiments, each sgRNA comprises the guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with the Cas9. In some embodiments, the second sequence of each sgRNA further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3. In some embodiments, each sgRNA further comprises an iBAR sequence (“sgRNA ^iBAR”), wherein each sgRNA ^iBAR is operable with the Cas protein to modify (e.g., cleave or modulate expression) the hit gene. 
In some embodiments, each sgRNA ^iBAR comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a dsRNA region that interacts with the Cas protein, and wherein the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence. In some embodiments, the Cas protein is Cas9, and the iBAR sequence of each sgRNA ^iBAR is inserted in the loop region of the repeat-anti-repeat stem loop. In some embodiments, each guide sequence comprises about 17 to about 23 nucleotides. In some embodiments, at least about 95% (e.g., at least about any of 96%, 97%, 98%, 99%, or more) , such as at least about 99%, of the sgRNA constructs in the sgRNA library are introduced into the initial population of cancer cells. In some embodiments, each hit gene within the cancer cell library or the sgRNA library is targeted by at least about 3 (e.g., about 6 to about 12) different sgRNA constructs in at least about 3 (e.g., about 6 to about 12) different target sites of the hit gene. In some embodiments, the cancer cell library has at least about 100-fold (e.g., about 600-fold to about 1200-fold) coverage for each sgRNA. In some embodiments, the cancer cell library has at least about 300-fold coverage for each hit gene, such as about 600-fold to about 1200-fold coverage for each hit gene.

In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., four) sgRNA ^iBAR constructs (e.g., lentiviral vector or lentivirus) each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more (e.g., four) sgRNA ^iBAR constructs are the same and are complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a same target site of a hit gene, wherein the iBAR sequence for each of the three or more (e.g., four) sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site of a hit gene (e.g., different hit genes, or different sites within the same hit gene), and wherein each sgRNA ^iBAR is operable with a Cas9 protein to modify the target site; and ii) a Cas (e.g., Cas9) component comprising a Cas protein or a nucleic acid encoding the Cas protein, under a condition that allows introduction of the sgRNA ^iBAR constructs and the Cas component into the initial population of cancer cells and generation of the mutations at the hit genes. In some embodiments, the initial population of cancer cells comprises a Cas component (e.g., Cas9). 
In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells comprising Cas9 with an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., four) sgRNA ^iBAR constructs (e.g., lentiviral vector or lentivirus) each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more (e.g., four) sgRNA ^iBAR constructs are the same and are complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a same target site of a hit gene, wherein the iBAR sequence for each of the three or more (e.g., four) sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site of a hit gene (e.g., different hit genes, or different sites within the same hit gene), and wherein each sgRNA ^iBAR is operable with a Cas9 protein to modify the target site, under a condition that allows introduction of the sgRNA ^iBAR constructs into the initial population of cancer cells comprising Cas9 and generation of the mutations at the hit genes. 
In some embodiments, the cancer cell library is generated by i) contacting an initial population of cancer cells with a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., lentiviral vector or lentivirus encoding Cas9), under a condition that allows introduction of the Cas component into the initial population of cancer cells; ii) optionally obtaining a population of cancer cells comprising the Cas component (“Cas ⁺ cancer cell population”; such as by FACS sorting, e.g., with a marker on the Cas-encoding vector); and iii) contacting the Cas ⁺ cancer cell population with an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., four) sgRNA ^iBAR constructs (e.g., lentiviral vector or lentivirus) each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more (e.g., four) sgRNA ^iBAR constructs are the same and are complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a same target site of a hit gene, wherein the iBAR sequence for each of the three or more (e.g., four) sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site of a hit gene (e.g., different hit genes, or different sites within the same hit gene), and wherein each sgRNA ^iBAR is operable with a Cas9 protein to modify the target site, under a condition that allows introduction of the sgRNA ^iBAR constructs into the cancer cells (e.g., Cas ⁺ cancer cells) and generation of the mutations at the hit genes. 
In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., four) sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence, a second sequence, and an iBAR sequence, wherein the guide sequences for the three or more (e.g., four) sgRNA ^iBAR constructs are the same and are complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a same target site of a hit gene, wherein the iBAR sequence for each of the three or more (e.g., four) sgRNA ^iBAR constructs is different from each other, wherein the guide sequence is fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with a Cas9 protein, wherein the iBAR sequence is inserted in the loop region of the repeat-anti-repeat stem loop, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site of a hit gene (e.g., different hit genes, or different target sites of the same hit gene), and wherein each sgRNA ^iBAR is operable with the Cas9 protein to modify the target site; and ii) a Cas9 component comprising a Cas9 protein or a nucleic acid encoding the Cas9 protein, under a condition that allows introduction of the sgRNA ^iBAR constructs and the Cas9 component into the initial population of cancer cells and generation of the mutations at the hit genes. In some embodiments, the Cas component (e.g., Cas9) is introduced into the cancer cells before the introduction of the sgRNA ^iBAR library. In some embodiments, the sgRNA ^iBAR library is introduced into the cancer cells before the introduction of the Cas component (e.g., Cas9). 
In some embodiments, the Cas component (e.g., Cas9) and the sgRNA ^iBAR library are introduced into the cancer cells at the same time. In some embodiments, each iBAR sequence comprises about 1 to about 50 (such as 6) nucleotides. In some embodiments, each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs, and the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other. In some embodiments, the sgRNA ^iBAR library comprises at least about 100 sets of sgRNA ^iBAR constructs. In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs) . In some embodiments, the iBAR sequences for at least two sets of sgRNA ^iBAR constructs are the same. In some embodiments, the sgRNA ^iBAR library is contacted with the initial population of cancer cells at an MOI of more than about 2 (e.g., at least about 3, 5, or 10) , such as 3. In some embodiments, the sgRNA ^iBAR library comprising a plurality of sgRNA ^iBAR constructs comprises or encodes sgRNA ^iBAR with guide sequences complementary to target sites of cancer-related genes. In some embodiments, at least about 95% (e.g., at least about any of 96%, 97%, 98%, 99%, or more) , such as at least about 99%, of the sgRNA ^iBAR constructs in the sgRNA ^iBAR library are introduced into the initial population of cancer cells. In some embodiments, each hit gene within the cancer cell library or the sgRNA ^iBAR library is targeted by 3 different sets of sgRNA ^iBAR constructs at 3 different target sites of the hit gene. In some embodiments, the cancer cell library has at least about 100-fold coverage for each sgRNA ^iBAR, such as about 100-fold to about 1000-fold coverage for each sgRNA ^iBAR. 
In some embodiments, the cancer cell library has at least about 400-fold coverage for each set of sgRNAs ^iBAR, such as about 400-fold to about 4000-fold coverage for each set of sgRNAs ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold coverage for each hit gene, such as about 1200-fold to about 12,000-fold coverage for each hit gene.
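The coverage figures recited above are related by simple multiplication. The following sketch, which assumes four iBAR sequences per set and three sets (target sites) per hit gene as in some of the embodiments above, shows how per-construct coverage scales to per-set and per-gene coverage. The function name `coverage_per_set_and_gene` is hypothetical.

```python
def coverage_per_set_and_gene(per_construct_coverage, ibars_per_set=4, sets_per_gene=3):
    """Scale per-sgRNA^iBAR coverage up to per-set and per-hit-gene coverage.

    Assumes every construct in a set, and every set targeting a gene,
    is represented at the same coverage.
    """
    per_set = per_construct_coverage * ibars_per_set
    per_gene = per_set * sets_per_gene
    return per_set, per_gene


low = coverage_per_set_and_gene(100)    # lower end of the recited range
high = coverage_per_set_and_gene(1000)  # upper end of the recited range
```

With these assumptions, 100-fold to 1000-fold coverage per sgRNA ^iBAR corresponds to 400-fold to 4000-fold coverage per set and 1200-fold to 12,000-fold coverage per hit gene, matching the ranges stated above.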

Screening methods using sgRNA ^iBAR libraries described herein in some embodiments can improve target identification and data reproducibility by statistical analysis and reduce FDR. In conventional CRISPR/Cas-based screening methods using a pooled sgRNA library, a high-quality cell library expressing gRNAs is generated using a low MOI during cell library construction to ensure that each cell harbors on average less than one sgRNA or paired guide RNA (“pgRNA”). Because the sgRNA molecules in a library are randomly integrated in the transfected cells, a sufficiently low MOI ensures that each cell expresses a single sgRNA, thereby minimizing the FDR of the screen. To further reduce FDR and increase data reproducibility, in-depth coverage of gRNAs and multiple biological replicates are often necessary to obtain hit genes with high statistical significance. The conventional screening methods face difficulties when a large number of genome-wide screens are needed, when cell materials for library construction are limited, or when one conducts more challenging screens (e.g., in vivo screens) for which it is difficult to arrange the experimental replications or control the MOI. The screening methods using sgRNA ^iBAR libraries described herein overcome the difficulties by including an iBAR sequence in each sgRNA, which enables collection of internal replicates within each sgRNA set having the same guide sequence but different iBAR sequences. Such an iBAR method can reduce experimental noise. For example, an iBAR with four nucleotides for each sgRNA, as demonstrated in WO2020125762, can provide sufficient internal replicates to evaluate data consistency among different sgRNA ^iBAR constructs targeting the same genomic locus. The high level of consistency between the two independent experiments in WO2020125762 indicates that one experimental replicate is sufficient for CRISPR/Cas screens using the iBAR method. 
Because library coverage is significantly increased with a high MOI during viral transduction of host cells, the cell number in the initial cell population could be reduced more than 20-fold to reach the same library coverage, as demonstrated in the constructed genome-wide human library in WO2020125762. By the same token, workload for each genome-wide screen using sgRNA ^iBAR can be reduced proportionally. Using sgRNAs with different iBAR sequences, one could then trace the performance of each guide sequence multiple times within the same experiment by counting both the guide sequence and the corresponding iBAR nucleotide sequences, thereby drastically reducing FDR, and increasing efficiency and reliability. Transduction efficiency and library coverage could be further increased when a high viral titer is used during the viral transduction step, for example, with MOI >1 (e.g., MOI >1.5, MOI >2, MOI >2.5, MOI >3, MOI >3.5, MOI >4, MOI >4.5, MOI >5, MOI >5.5, MOI >6, MOI >6.5, MOI >7, MOI >7.5, MOI >8, MOI >8.5, MOI >9, MOI >9.5 or MOI >10; such as, MOI is about any of 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10).
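The effect of MOI on the number of constructs per cell can be illustrated with a short calculation. The sketch below assumes, as a common modelling simplification that is not stated in this application, that the number of constructs delivered to each cell follows a Poisson distribution whose mean equals the MOI. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def transduction_summary(moi):
    """Fractions of cells with zero, exactly one, or more than one construct.

    Assumption: constructs per cell ~ Poisson(moi).
    """
    none = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    return {"untransduced": none, "single": single, "multiple": 1.0 - none - single}


low_moi = transduction_summary(0.3)   # conventional low-MOI setting
high_moi = transduction_summary(3.0)  # high-MOI setting, e.g. MOI of 3
```

Under this assumption, roughly three quarters of cells remain untransduced at an MOI of 0.3, whereas at an MOI of 3 about 95% of cells carry at least one construct and most carry several. This is why internal replicates such as iBAR sequences become useful for separating the effect of each guide sequence.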

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b) contacting the cancer cell library with the anti-cancer drug (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time, with or without cell passages) ; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of sgRNAs ^iBAR or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b/c) contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time, with or without cell passages) , harvesting cancer cells by removing the cell culture medium containing the anti-cancer drug (and dead floating cells) and collecting the remaining adherent cancer cells (e.g., by trypsinization) , thus obtaining a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of sgRNAs ^iBAR or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the sgRNA ^iBAR library targets cancer-related genes. 
In some embodiments, the cancer cell library has about 100-fold to about 1000-fold coverage for each sgRNA ^iBAR, such as about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold coverage for each hit gene, e.g., about 1200-fold to about 12,000-fold coverage for each hit gene. In some embodiments, the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug, and optionally subjected to the same obtaining method in step c) . In some embodiments, the sequence counts obtained from the post-treatment cancer cell population are compared to corresponding sequence counts obtained from the control cancer cell population to provide fold changes (e.g., actual fold changes, or derivatives of fold changes such as log2 or log10 fold changes) . In some embodiments, the identification of the target gene is based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population and the control cancer cell population. In some embodiments, the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing. 
In some embodiments, identifying the target gene in step d) comprises: comparing the sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the post-treatment cancer cell population with sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the control cancer cell population, wherein: i) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, identifying the target gene in step d) comprises: i) identifying the sgRNA ^iBAR sequence in the post-treatment cancer cell population; and ii) identifying the hit gene corresponding to the guide sequence of the sgRNAs ^iBAR. In some embodiments, identifying the target gene in step d) comprises: i) obtaining sgRNA ^iBAR sequences in the post-treatment cancer cell population; ii) ranking the corresponding guide sequences of the sgRNA ^iBAR sequences based on sequence counts, wherein the ranking comprises adjusting the rank of each guide sequence based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence; and iii) identifying the hit gene corresponding to a guide sequence ranked above a predetermined threshold level. In some embodiments, the method is a positive screening. 
In some embodiments, the method is a negative screening. In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, the coverage for each hit gene (or sgRNA ^iBAR) of the cancer cell library after passage for continuous anti-cancer drug treatment remains the same or similar (e.g., within about 10% difference). In some embodiments, the sgRNA ^iBAR sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged).
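Median ratio normalization, mentioned above, can be sketched as follows. This is a generic implementation of the technique under stated assumptions, not necessarily the exact procedure used in any embodiment: each sample's size factor is taken as the median, over constructs, of the ratio between that sample's count and the geometric mean count across samples, and constructs with a zero count in any sample are skipped when estimating size factors. The function names and the example counts are hypothetical, and the subsequent mean-variance modeling step is not shown.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Estimate one size factor per sample.

    counts maps a sample name to a list of read counts, with constructs
    in the same order in every sample. At least one construct must have
    non-zero counts in all samples.
    """
    samples = list(counts)
    n_constructs = len(counts[samples[0]])
    log_geo_means = []
    for i in range(n_constructs):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)  # skipped: geometric mean undefined
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][i]) - g
                      for i, g in enumerate(log_geo_means) if g is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize_counts(counts):
    """Divide every count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Made-up counts: the treated sample was sequenced twice as deeply.
raw = {"control": [100, 200, 300, 400], "treated": [200, 400, 600, 800]}
size_factors = median_ratio_size_factors(raw)
normalized = normalize_counts(raw)
```

In this toy example the two samples differ only in sequencing depth, so their normalized counts coincide and no construct would appear enriched or depleted.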

Hence in some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b) contacting the cancer cell library with the anti-cancer drug (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time, with or without cell passages) ; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; and d) identifying the target gene based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population and a control cancer cell population, wherein the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug, wherein the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing, wherein step d) comprises comparing the sgRNA ^iBAR sequence counts obtained from the post-treatment cancer cell population with sgRNA sequence counts obtained from the control cancer cell population, and wherein i) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell 
population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, steps b) and c) comprise contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, the coverage for each hit gene (or sgRNA ^iBAR) of the cancer cell library after passage for continuous anti-cancer drug treatment remains the same or similar (e.g., within about 10% difference). In some embodiments, the cancer cell library has about 100-fold to about 1000-fold coverage for each sgRNA ^iBAR, such as about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold (e.g., about 1200-fold to about 12,000-fold) coverage for each hit gene. In some embodiments, the sgRNA ^iBAR sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. 
In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged) .
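The normalization and consistency adjustment described above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the `tol` band used to call a fold change "unchanged", and the `penalty` multiplier applied to the variance are all hypothetical choices made for illustration, and the mean-variance modeling step itself is omitted.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per sample from raw sgRNA-iBAR counts.

    counts maps a sample name to a list of counts; every list uses the
    same sgRNA-iBAR order.
    """
    samples = list(counts)
    n_rows = len(counts[samples[0]])
    # Log geometric mean of each row across samples; rows with a zero are skipped.
    log_means = {}
    for i in range(n_rows):
        row = [counts[s][i] for s in samples]
        if all(c > 0 for c in row):
            log_means[i] = sum(math.log(c) for c in row) / len(row)
    # A sample's size factor is the median ratio of its counts to those means.
    return {
        s: math.exp(median(math.log(counts[s][i]) - m for i, m in log_means.items()))
        for s in samples
    }


def direction(log2_fc, tol):
    """Classify a log2 fold change as increased, reduced, or unchanged."""
    if log2_fc > tol:
        return "increased"
    if log2_fc < -tol:
        return "reduced"
    return "unchanged"


def adjusted_variance(variance, ibar_log2_fcs, tol=0.1, penalty=2.0):
    """Increase a guide's variance when its iBARs disagree in direction.

    tol and penalty are hypothetical parameters, not values from the text.
    """
    directions = {direction(fc, tol) for fc in ibar_log2_fcs}
    return variance * penalty if len(directions) > 1 else variance
```

Dividing each sample's counts by its size factor puts the post-treatment and control libraries on a common scale before fold changes are computed. A guide whose iBARs all move the same way keeps its variance, whereas any mixture (increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged) inflates it, which lowers that guide's significance downstream.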

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b) subjecting the cancer cell library to at least two separate different treatments (e.g., treatments described herein) with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., all alive, resistant to the anti-cancer drug) ; d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug, based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population from each treatment and a corresponding control cancer cell population, and d2) combining the one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug; wherein identifying the one or more hit genes in step d1) comprises comparing the sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the post-treatment cancer cell population with sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the control cancer cell population for each treatment, wherein i) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) for the corresponding treatment are identified as hit genes whose mutations make the cancer cells resistant to the anti-cancer drug for the corresponding 
treatment; and/or ii) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) for the corresponding treatment are identified as hit genes whose mutations make the cancer cells sensitive to the anti-cancer drug for the corresponding treatment; and wherein step d2) comprises combining the one or more hit genes whose mutations make the cancer cells resistant to the anti-cancer drug from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell resistant to the anti-cancer drug; and/or combining the one or more hit genes whose mutations make the cancer cells sensitive to the anti-cancer drug from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive to the anti-cancer drug. 
In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b) subjecting the cancer cell library to at least two separate different treatments (e.g., treatments described herein) with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., all alive, resistant to the anti-cancer drug) ; d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug, based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population from each treatment and a corresponding control cancer cell population, and d2) combining the one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug; wherein i) the hit genes whose corresponding sgRNAs ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) in at least one treatment are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) in at least one treatment are identified as 
target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; b) subjecting the cancer cell library to at least two separate different treatments (e.g., treatments described herein) with the anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., all alive, resistant to the anti-cancer drug) , and d) identifying the target gene based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population and a control cancer cell population based on the at least two separate different treatments; wherein identifying the target gene comprises: i) obtaining sgRNA ^iBAR (or guide sequence thereof) sequences in the post-treatment cancer cell population for each treatment; ii) ranking the corresponding guide sequences of the sgRNA ^iBAR sequences based on sequence counts for each treatment, wherein the ranking comprises adjusting the rank of each guide sequence based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence; and iii) identifying the target gene corresponding to a guide sequence ranked above a predetermined threshold level for each treatment; wherein (1) the hit genes that are identified as depleted from the post-treatment cancer cell population resistant to the anti-cancer drug (alive) in at least one treatment with FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as target genes whose mutation (e.g., inactivation) make the cancer cells sensitive to the anti-cancer drug; and/or (2) the hit genes that are identified as enriched from the post-treatment 
cancer cell population resistant to the anti-cancer drug (alive) in at least one treatment with FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as target genes whose mutation (e.g., inactivation) make the cancer cells resistant to the anti-cancer drug. In some embodiments, the sequence counts obtained from the post-treatment cancer cell population for each treatment are compared to corresponding sequence counts obtained from a control cancer cell population to provide fold changes (e.g., actual fold changes, or derivatives of fold changes such as log2 or log10 fold changes) . In some embodiments, the cancer cell library has about 100-fold to about 1000-fold coverage for each sgRNA ^iBAR, such as about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold coverage for each hit gene, e.g., about 1200-fold to about 12,000-fold coverage for each hit gene. In some embodiments, the method is a positive screening. In some embodiments, the method is a negative screening. In some embodiments, the control cancer cell population is obtained from the same cancer cell library cultured under the same condition without contacting with the anti-cancer drug, optionally subjected to the same obtaining method in step c) . In some embodiments, the method further comprises conducting next generation sequencing on the post-treatment cancer cell population and the control cancer cell population from each treatment. In some embodiments, one treatment comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. 
In some embodiments, another treatment comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time while allowing alive cancer cells to grow, optionally passaging the cancer cells every about 3 doubling time. In some embodiments, the coverage for each hit gene (or sgRNA ^iBAR) of the cancer cell library after passage for continuous anti-cancer drug treatment remains the same or similar (e.g., within about 10% difference). In some embodiments, the sgRNA ^iBAR sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged).
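As a rough illustration of calling hits per treatment and then combining them, the following Python sketch applies the FDR ≤ 0.1 cutoff together with a 2-fold change requirement, then keeps any gene that passes in at least one treatment. Requiring both criteria at once is only one reading of the "and/or" language above. The function names, the input layout (gene mapped to a log2 fold change and an FDR), and the gene labels in the usage note are hypothetical.

```python
import math


def call_hits(gene_stats, fdr_cutoff=0.1, min_fold=2.0):
    """Split genes into (resistant, sensitive) sets for one treatment.

    gene_stats maps a gene name to (log2 fold change, FDR), where the fold
    change is post-treatment over control.
    """
    threshold = math.log2(min_fold)
    resistant, sensitive = set(), set()
    for gene, (log2_fc, fdr) in gene_stats.items():
        if fdr > fdr_cutoff:
            continue  # not significant in this treatment
        if log2_fc >= threshold:
            resistant.add(gene)  # guides enriched among surviving cells
        elif log2_fc <= -threshold:
            sensitive.add(gene)  # guides depleted among surviving cells
    return resistant, sensitive


def combine_treatments(per_treatment_stats):
    """Union the hits from every treatment (a hit in at least one counts)."""
    resistant, sensitive = set(), set()
    for gene_stats in per_treatment_stats:
        r, s = call_hits(gene_stats)
        resistant |= r
        sensitive |= s
    return resistant, sensitive
```

For example, with a shorter treatment giving `{"GENE_A": (1.5, 0.02), "GENE_B": (-1.2, 0.05)}` and a longer one giving `{"GENE_C": (1.1, 0.08), "GENE_D": (-2.0, 0.01)}`, `combine_treatments` would report GENE_A and GENE_C as resistance genes and GENE_B and GENE_D as sensitivity genes.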

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing a cancer cell library comprising an sgRNA ^iBAR library described herein targeting one or more hit genes; subjecting the cancer cell library from step a) to two separate treatments b1) and b2) : b1) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time; b2) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time; c1) growing (e.g., passaging every about 3 doubling time, in the presence of the anti-cancer drug) the cancer cell library from treatment b1) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; c2) growing (e.g., passaging every about 3 doubling time, in the presence of the anti-cancer drug) the cancer cell library from treatment b2) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b1) based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population from c1) and a corresponding control cancer cell population, d2) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b2) based on the difference between the profiles of sgRNAs ^iBAR in the post-treatment cancer cell population from c2) and a corresponding control cancer cell population, and d3) combining the one or more hit genes identified from treatment b1) and treatment b2) (sensitive or resistant) , thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive 
or resistant to the anti-cancer drug. In some embodiments, identifying the hit gene in the post-treatment cancer cell population from treatment b1) or b2) in step d1) or d2) , respectively, comprises: i) identifying the sgRNA ^iBAR sequence in the post-treatment cancer cell population from each treatment (e.g., alive, resistant to the anti-cancer drug) ; and ii) identifying the hit gene corresponding to the guide sequence of the sgRNAs ^iBAR. In some embodiments, identifying the one or more hit genes in step d1) and/or d2) comprises comparing the sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) with sgRNA ^iBAR (or guide sequence thereof) sequence counts obtained from the control cancer cell population for each treatment, wherein i) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) for the corresponding treatment are identified as hit genes whose mutations make the cancer cells resistant to the anti-cancer drug for the corresponding treatment; and/or ii) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as depleted in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) for the corresponding treatment are identified as hit genes whose mutations make the cancer cells sensitive to the anti-cancer drug for the corresponding treatment. 
In some embodiments, step d3) comprises combining the one or more hit genes whose mutations make the cancer cells resistant to the anti-cancer drug from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell resistant to the anti-cancer drug; and/or combining the one or more hit genes whose mutations make the cancer cells sensitive to the anti-cancer drug from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive to the anti-cancer drug. In some embodiments, identifying the target gene comprises identifying one or more hit genes in the post-treatment cancer cell populations obtained from two separate treatments b1) and b2) , wherein: i) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as enriched in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) in either treatment b1) or b2) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA ^iBAR guide sequences are identified as depleted in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) in either treatment b1) or b2) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, the sequence counts obtained from the post-treatment cancer cell population for each treatment are compared to corresponding sequence counts obtained from a control cancer cell population to provide fold changes (e.g., actual fold changes, or derivatives of fold changes such as log2 or log10 fold changes) . 
In some embodiments, the cancer cell library has about 100-fold to about 1000-fold coverage for each sgRNA ^iBAR, such as about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold coverage for each hit gene, e.g., about 1200-fold to about 12,000-fold coverage for each hit gene. In some embodiments, the method is a positive screening. In some embodiments, the method is a negative screening. In some embodiments, the control cancer cell population is obtained from the same cancer cell library cultured under the same condition without contacting with the anti-cancer drug, optionally subjected to the same obtaining method in step c). In some embodiments, the method further comprises conducting next generation sequencing on the post-treatment cancer cell population and the control cancer cell population from each treatment. In some embodiments, the coverage for each hit gene (or sgRNA ^iBAR) of the cancer cell library after passage for continuous anti-cancer drug treatment remains the same or similar (e.g., within about 10% difference). In some embodiments, the sgRNA ^iBAR sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged).
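The coverage figures above lend themselves to simple arithmetic, sketched here in Python. The helper names are hypothetical, and the numbers of guides per gene and iBARs per guide used in the usage note are illustrative assumptions rather than values stated in this passage.

```python
def gene_coverage(guides_per_gene, ibars_per_guide, coverage_per_sgrna_ibar):
    """Per-gene coverage implied by a given per-sgRNA-iBAR coverage."""
    return guides_per_gene * ibars_per_guide * coverage_per_sgrna_ibar


def cells_required(n_constructs, coverage_fold):
    """Number of cells needed to hold every construct at the given coverage."""
    return n_constructs * coverage_fold


def coverage_is_maintained(before, after, max_relative_diff=0.10):
    """True if coverage after passaging is within about 10% of the starting value."""
    return abs(after - before) <= max_relative_diff * before
```

Assuming, purely for illustration, 3 guides per gene and 4 iBARs per guide, `gene_coverage(3, 4, 1000)` gives 12,000-fold per gene and `gene_coverage(3, 4, 100)` gives 1200-fold. A hypothetical library of 12,000 sgRNA ^iBAR constructs kept at 1000-fold would need `cells_required(12000, 1000)`, that is 12 million cells, at each passage.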

In some embodiments, the one or more target genes are identified using the methods described herein tested on two or more (e.g., 2, 3, 4, 5, or more) cancer cell lines (e.g., HCT116, SW480) of the same cancer type (e.g., colorectal cancer) . In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising: a) providing two or more (e.g., 2, 3, 4, 5, or more) cancer cell libraries (e.g., Cas9 ⁺ sgRNA ^iBAR cancer cell library) each comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a hit gene mutation, wherein the hit gene in at least two of the plurality of cancer cells within the same cancer cell library are different from each other, and wherein the two or more cancer cell libraries are generated from different initial populations of cancer cells (e.g., HCT116 or SW480) of the same cancer type (e.g., colorectal cancer) ; b) separately contacting each cancer cell library with the anti-cancer drug (e.g., at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) ; c) separately growing each cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; d1) separately identifying one or more hit genes in the post-treatment cancer cell population obtained from each cancer cell library, based on the difference between the profiles of hit gene mutations (or sgRNA or sgRNA ^iBAR) in the post-treatment cancer cell population and a corresponding control cancer cell population; and d2) combining the one or more hit genes identified from all cancer cell libraries, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. 
In some embodiments, the treatment step b) and the cancer cell obtaining step c) are the same for different cancer cell libraries. In some embodiments, the treatment step b) and/or the cancer cell obtaining step c) are different for different cancer cell libraries. In some embodiments, the two or more cancer cell libraries are Cas9 ⁺ sgRNA ^iBAR cancer cell libraries. In some embodiments, the control cancer cell population is obtained from the corresponding same cancer cell library cultured under the same condition without contacting with the anti-cancer drug. In some embodiments, the profiles of hit gene mutations or sgRNA or sgRNA ^iBAR in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing. In some embodiments, identifying the one or more hit genes for each cancer cell library in step d1) comprises comparing the hit gene mutation (or sgRNA or sgRNA ^iBAR) sequence counts obtained from the post-treatment cancer cell population with the hit gene mutation (or sgRNA or sgRNA ^iBAR) sequence counts obtained from the corresponding control cancer cell population, wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the corresponding control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as hit genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the corresponding control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as hit genes whose mutations 
make the cancer cells sensitive to the anti-cancer drug. In some embodiments, step d2) comprises combining the one or more hit genes whose mutations make the cancer cells resistant to the anti-cancer drug obtained from all cancer cell libraries, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell resistant to the anti-cancer drug; and/or combining the one or more hit genes whose mutations make the cancer cells sensitive to the anti-cancer drug obtained from all cancer cell libraries, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive to the anti-cancer drug.

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs, comprising: a) providing a cancer cell library (e.g., Cas9 ⁺ sgRNA ^iBAR cancer cell library) comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a hit gene mutation, wherein the hit gene in at least two of the plurality of cancer cells are different from each other; b) separately contacting the cancer cell library with the two or more different anti-cancer drugs (e.g., at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) ; c) separately growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) for each anti-cancer drug; d1) separately identifying a set of one or more target genes whose mutations make the cancer cells sensitive to an anti-cancer drug (e.g., using any of the identification methods described herein) , for two or more different anti-cancer drugs when treated alone; and d2) obtaining one or more target genes present in every set of target genes identified for each anti-cancer drug, thereby identifying target genes whose mutations make the cancer cells sensitive to a combination treatment of the two or more different anti-cancer drugs; and/or d1) separately identifying a set of one or more target genes whose mutations make the cancer cells resistant to an anti-cancer drug (e.g., using any of the identification methods described herein) , for two or more different anti-cancer drugs when treated alone; and d2) obtaining one or more target genes present in a combination of sets of target genes identified for all anti-cancer drugs, thereby identifying target genes whose mutations make the cancer cells resistant to a combination treatment of the two or 
more different anti-cancer drugs. In some embodiments, the two or more different anti-cancer drugs target the same cancer target. In some embodiments, the two or more different anti-cancer drugs target different cancer targets.

Thus in some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive to a combination therapy comprising a first anti-cancer drug and a second anti-cancer drug, comprising: i) identifying a first set of one or more target genes in the cancer cell whose mutation makes the cancer cell sensitive to the first anti-cancer drug according to any of the methods described herein; ii) identifying a second set of one or more target genes in the cancer cell whose mutation makes the cancer cell sensitive to the second anti-cancer drug according to any of the methods described herein; and iii) obtaining one or more target genes present in both the first set of target genes and the second set of target genes, thereby identifying the target gene whose mutation makes the cancer cell sensitive to the combination therapy. In some embodiments, the two anti-cancer drugs target the same cancer target. In some embodiments, the two anti-cancer drugs target different cancer targets.
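The set logic for combination therapy can be written in a few lines of Python. In this sketch, sensitivity to the combination is taken as the intersection of the per-drug sets (genes present in every set), and resistance to the combination as the union of the per-drug sets (genes present in the combined sets), following the multi-drug embodiment described earlier. The function names and gene labels are hypothetical.

```python
def combination_sensitive(per_drug_sets):
    """Genes found in every per-drug sensitivity set (set intersection)."""
    sets = [set(s) for s in per_drug_sets]
    return set.intersection(*sets)


def combination_resistant(per_drug_sets):
    """Genes found in any per-drug resistance set (set union)."""
    sets = [set(s) for s in per_drug_sets]
    return set.union(*sets)
```

For instance, if the first drug yields sensitivity genes {GENE_A, GENE_B} and the second yields {GENE_B, GENE_C}, only GENE_B is returned by `combination_sensitive`. Both functions expect at least one input set.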

In some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to a combination therapy comprising a first anti-cancer drug and a second anti-cancer drug, comprising: a) providing a cancer cell library (e.g., sgRNA or sgRNA ^iBAR cancer cell library) comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a hit gene mutation, wherein the hit gene in at least two of the plurality of cancer cells are different from each other; b) contacting the cancer cell library with the first anti-cancer drug and the second anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to anti-cancer drug (s) ) ; and d) identifying the target gene based on the difference between the profiles of hit gene mutations (or sgRNAs or sgRNA ^iBAR) in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the first anti-cancer drug and the second anti-cancer drug are contacted with the cancer cell library at the same time. In some embodiments, the first anti-cancer drug and the second anti-cancer drug are contacted with the cancer cell library with an overlapping period. In some embodiments, the first anti-cancer drug and the second anti-cancer drug are contacted with the cancer cell library sequentially. In some embodiments, a cancer cell population post-one drug treatment is obtained (e.g., alive, can be enriched/sorted or not enriched/sorted for alive cells, with or without a recovery growth period) , then contacted with the other anti-cancer drug, to obtain the final post-treatment cancer cell population. 
In some embodiments, the control cancer cell population is obtained from the same cancer cell library cultured under the same condition without contacting with any anti-cancer drug, optionally subjected to the same cancer cell obtaining method in step c) . In some embodiments, the control cancer cell population is obtained from the same cancer cell library cultured under the same condition and contacted with only one anti-cancer drug, optionally subjected to the same cancer cell obtaining method in step c) . Thus in some embodiments, there is provided a method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive to a combination therapy comprising a first anti-cancer drug and a second anti-cancer drug, comprising: a) providing a cancer cell library (e.g., sgRNA or sgRNA ^iBAR cancer cell library) comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a hit gene mutation, wherein the hit gene in at least two of the plurality of cancer cells are different from each other; b) contacting the cancer cell library with the first anti-cancer drug and the second anti-cancer drug; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to anti-cancer drug (s) ) ; d1) identifying a first set of one or more hit genes based on the difference between the profiles of hit gene mutations (or sgRNAs or sgRNA ^iBAR) in the post-treatment cancer cell population and a first control cancer cell population; d2) identifying a second set of one or more hit genes based on the difference between the profiles of hit gene mutations (or sgRNAs or sgRNA ^iBAR) in the post-treatment cancer cell population and a second control cancer cell population; and d3) combining the first set and the second set of one or more hit genes identified from d1) and d2) , thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to 
the combination therapy comprising the first anti-cancer drug and the second anti-cancer drug; wherein the first control cancer cell population is obtained from the cancer cell library cultured under the same condition, contacted with the first anti-cancer drug alone, and obtained with the same cancer cell obtaining method in step c) ; and wherein the second control cancer cell population is obtained from the cancer cell library cultured under the same condition, contacted with the second anti-cancer drug alone, and obtained with the same cancer cell obtaining method in step c) . In some embodiments, identifying the first set of one or more hit genes in d1) and/or identifying the second set of one or more hit genes in d2) can comprise any of the hit gene/target gene identification methods described herein. For example, in some embodiments, identifying the first (or second) set of one or more hit genes comprises comparing the sgRNA or sgRNA ^iBAR sequence counts (or sequence counts of sequences comprising the hit gene mutation) obtained from the post-treatment cancer cell population with sgRNA or sgRNA ^iBAR sequence counts (or sequence counts of sequences comprising the hit gene mutation) obtained from the first (or second) control cancer cell population, wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences (or hit gene mutation) are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug (s) ) compared to the first (and/or second) control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as hit genes whose mutations make the cancer cells more resistant to the combination therapy compared to the first (and/or second) anti-cancer drug treatment alone; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences (or hit gene mutation) are identified as depleted in the post-treatment cancer cell population 
(e.g., alive, resistant to the anti-cancer drug (s) ) compared to the first (and/or second) control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as hit genes whose mutations make the cancer cells more sensitive to the combination therapy compared to the first (and/or second) anti-cancer drug treatment alone. In some embodiments, identifying the first (or second) set of one or more hit genes in step d1) (or d2) ) further comprises comparing the sgRNA or sgRNA ^iBAR sequence counts (or sequence counts of sequences comprising the hit gene mutation) obtained from the post-treatment cancer cell population with sgRNA or sgRNA ^iBAR sequence counts (or sequence counts of sequences comprising the hit gene mutation) obtained from a control cancer cell population obtained from the same cancer cell library cultured under the same condition without contacting with any of the anti-cancer drugs, wherein i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences (or hit gene mutation) are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug (s) ) compared to both the control cancer cell population and the first (and/or second) control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold enrichment) are identified as hit genes whose mutations make the cancer cells more resistant to the combination therapy compared to the first (and/or second) anti-cancer drug treatment alone; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences (or hit gene mutation) are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug (s) ) compared to both the control cancer cell population and the first (and/or second) control cancer cell population with an FDR ≤ 0.1 (and/or with at least about 2-fold depletion) are identified as hit genes whose mutations make 
the cancer cells more sensitive to the combination therapy compared to the first (and/or second) anti-cancer drug treatment alone. In some embodiments, the method further comprises next generation sequencing to obtain the sgRNA or sgRNA ^iBAR sequences or sequences comprising the hit gene mutations. In some embodiments, the two anti-cancer drugs target the same cancer target. In some embodiments, the two anti-cancer drugs target different cancer targets.
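
The enrichment and depletion criteria above (FDR ≤ 0.1 and at least about 2-fold change) can be illustrated with a minimal sketch. It is not the analysis pipeline of the present application: it assumes that normalized per-gene counts and per-gene p-values are already available from some upstream test, it applies both thresholds together (the text permits "and/or"), and it performs only one of the comparisons described (post-treatment versus a single control population). The function names, the pseudocount of 1, and the gene labels are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def classify_hits(genes, treated, control, pvalues,
                  fdr_cutoff=0.1, fold_cutoff=2.0):
    """Label each gene by enrichment or depletion in the post-treatment population."""
    fdr = benjamini_hochberg(pvalues)
    threshold = math.log2(fold_cutoff)
    calls = {}
    for gene, t, c, q in zip(genes, treated, control, fdr):
        # Pseudocount of 1 (an assumption) avoids division by zero.
        log2fc = math.log2((t + 1) / (c + 1))
        if q <= fdr_cutoff and log2fc >= threshold:
            calls[gene] = "more resistant"   # guides enriched after treatment
        elif q <= fdr_cutoff and log2fc <= -threshold:
            calls[gene] = "more sensitive"   # guides depleted after treatment
        else:
            calls[gene] = "not a hit"
    return calls

# Hypothetical counts and p-values, for illustration only.
calls = classify_hits(["GENE_A", "GENE_B", "GENE_C"],
                      treated=[400, 50, 210],
                      control=[100, 200, 200],
                      pvalues=[0.001, 0.002, 0.6])
print(calls)
```

To require enrichment relative to both the untreated control and a single-drug control, as in some embodiments above, the same function could be run once per control and the resulting labels intersected.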

In some embodiments, any of the identification methods described herein further comprise validating the target gene by: a) modifying a cancer cell by creating a mutation (e.g., inactivating mutation) in the target gene in the cancer cell; b) determining the sensitivity or resistance of the modified cancer cell to the anti-cancer drug.

Further provided are modified cancer cells obtained by inactivating one or more target genes identified by any of the methods described herein.

Single-guide RNA (sgRNA) library and sgRNA ^iBAR library

In some embodiments, the present invention uses CRISPR/Cas guide RNAs (e.g., single-guide RNA) and constructs encoding the CRISPR/Cas guide RNAs to generate mutations (e.g., inactivating mutations) in one or more hit genes. In some embodiments, the mutations are generated by cleaving the hit gene (e.g., with CRISPR/Cas9) . In some embodiments, the mutations are generated by modulating (e.g., repressing or reducing) the expression of the hit gene (e.g., with CRISPR/dCas fused to a repressor domain) .

In some embodiments, there is provided an sgRNA library comprising one or a plurality of (e.g., 1, 2, 3, 4, 5, 10, 100, 1,000, 10,000, 20,000, or more) sgRNA constructs, wherein each sgRNA construct (e.g., lentivirus or lentiviral vector encoding the sgRNA) comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene. In some embodiments, the sgRNA library comprises a plurality of (e.g., 2, 3, 4, 5, 10, 100, 1,000, 10,000, 20,000, or more) sgRNA constructs, wherein at least two of the hit genes to which the guide sequences are complementary are different from each other. In some embodiments, the sgRNA construct comprises (or consists of) an sgRNA. In some embodiments, the sgRNA construct encodes an sgRNA. In some embodiments, the sgRNA construct is a plasmid that encodes the sgRNA. In some embodiments, the sgRNA construct is a viral vector (e.g., lentiviral vector) encoding the sgRNA. In some embodiments, the sgRNA construct is a virus (e.g., lentivirus) encoding the sgRNA. In some embodiments, each sgRNA comprises the guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with a Cas protein (e.g., Cas9). In some embodiments, the second sequence of each sgRNA further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3. In some embodiments, each guide sequence comprises about 17 to about 23 nucleotides. In some embodiments, the sgRNA library comprises at least about 100 sgRNA constructs, such as at least about any of 200, 300, 400, 1,000, 1,600, 4,000, 10,000, 15,000, 16,000, 19,000, 20,000, 38,000, 50,000, 100,000, 150,000, 155,000, 200,000, or more sgRNA constructs. In some embodiments, the sgRNA library comprises about 6,000 to about 16,000 sgRNA constructs. 
In some embodiments, the sgRNA library comprises about 10,000 to about 18,000 sgRNA constructs. In some embodiments, the sgRNA library comprising a plurality of sgRNA constructs comprises or encodes sgRNAs with guide sequences complementary to target sites of every annotated gene in the genome (hereinafter also referred to as “whole-genome sgRNA library”). In some embodiments, the sgRNA library comprising a plurality of sgRNA constructs comprises or encodes sgRNAs with guide sequences complementary to target sites of hit genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, or 100 fold, or more) in cancer patients (e.g., based on literature or databases). In some embodiments, the hit gene encodes a protein that is expressed within a cell or on the cell surface, either in healthy cells or in cancer cells. In some embodiments, the sgRNA library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sgRNA constructs comprising or encoding sgRNAs with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites of the same hit gene, i.e., the sgRNA library has at least two-fold coverage for that hit gene. In some embodiments, for each hit gene, the sgRNA library comprises at least 3 (e.g., about 6 to about 12) sgRNA constructs comprising or encoding sgRNAs with guide sequences complementary to at least 3 (e.g., about 6 to about 12) different target sites of the same hit gene. 
In some embodiments, the sgRNA library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sgRNA constructs comprising or encoding sgRNAs with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites within the same hit gene for every annotated gene in the genome, i.e., the sgRNA library has at least two-fold coverage for the whole genome. In some embodiments, the sgRNA library further comprises one or a plurality of (e.g., 1, 2, 3, 4, 5, 10, 100, 1,000, 2,000, 10,000, or more) “negative control sgRNA constructs”, wherein each negative control sgRNA construct (e.g., lentivirus or lentiviral vector encoding the negative control sgRNA) comprises or encodes a negative control sgRNA, and wherein each negative control sgRNA comprises a guide sequence that is complementary to an irrelevant sequence that is not in the genome, is complementary to a control gene (e.g., a gene known to respond in the same or a similar manner in test and control groups after gene inactivation), or is complementary to a sequence not associated with any annotated gene in the genome. In some embodiments, the sgRNA library further comprises negative control sgRNA constructs in the amount of about 3% to about 30% of the number of hit gene sgRNA constructs in the sgRNA library. In some embodiments, the sgRNA library further comprises about 500 to about 4,000 (e.g., about 500) negative control sgRNA constructs.
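
The gene-selection criterion mentioned above (DNA mutation frequency of at least about 5% together with RNA expression changed by more than about 2-fold in either direction) can be sketched as a simple filter. This is a minimal illustration, assuming per-gene statistics have already been compiled from literature or databases; the `GeneStats` record, the field names, and the gene labels and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GeneStats:
    name: str
    mutation_frequency: float      # fraction of patients carrying a DNA mutation (0..1)
    expression_fold_change: float  # patient / reference expression ratio (> 0)

def select_candidate_genes(stats, min_mut_freq=0.05, min_fold=2.0):
    """Keep genes that meet the mutation-frequency and expression-change thresholds."""
    selected = []
    for g in stats:
        up = g.expression_fold_change > min_fold          # up-regulated > 2-fold
        down = g.expression_fold_change < 1.0 / min_fold  # down-regulated > 2-fold
        if g.mutation_frequency >= min_mut_freq and (up or down):
            selected.append(g.name)
    return selected

# Hypothetical values, for illustration only.
table = [
    GeneStats("GENE_A", 0.12, 3.5),   # mutated often, up-regulated
    GeneStats("GENE_B", 0.02, 4.0),   # mutated too rarely
    GeneStats("GENE_C", 0.30, 0.25),  # mutated often, down-regulated 4-fold
    GeneStats("GENE_D", 0.40, 1.2),   # expression essentially unchanged
]
print(select_candidate_genes(table))
```

The thresholds are parameters because the text lists alternative cut-offs (e.g., 10% mutation frequency or 3-fold expression change).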

In some embodiments, the sgRNA further comprises an internal barcode (iBAR) sequence (such sgRNA is hereinafter referred to as “sgRNA ^iBAR” ) . In some embodiments, the iBAR is positioned in the sgRNA such that the resulting sgRNA ^iBAR is operable with a Cas protein (e.g., Cas9) to modify (e.g., cleave or modulate expression) the hit gene complementary to the guide sequence of the sgRNA ^iBAR. Thus in some embodiments, the sgRNA library described herein is an sgRNA ^iBAR library. In some embodiments, the sgRNA ^iBAR library comprises one or a plurality of (e.g., 1, 2, 3, 4, 5, 10, 100, 1,000, 10,000, 20,000, or more) sgRNA ^iBAR constructs, wherein each sgRNA ^iBAR construct comprises or encodes an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, and wherein each guide sequence is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a target site in a corresponding hit gene. In some embodiments, the sgRNA ^iBAR library comprises a plurality of (e.g., 2, 3, 4, 5, 10, 100, 1,000, 10,000, or more) sgRNA ^iBAR constructs, wherein at least two hit genes that the guide sequences are complementary to are different from each other. In some embodiments, each sgRNA ^iBAR comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a double-stranded RNA (dsRNA) region that interacts with the Cas protein, and wherein the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence. In some embodiments, each sgRNA ^iBAR comprises the guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with the Cas protein (e.g., Cas9) . 
In some embodiments, the second sequence of each sgRNA ^iBAR further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3. In some embodiments, the Cas protein is Cas9, and the iBAR sequence of each sgRNA ^iBAR is inserted in the loop region of the repeat-anti-repeat stem loop. In some embodiments, each sgRNA ^iBAR comprises from 5’-to-3’: a guide sequence, a repeat-anti-repeat stem loop with iBAR sequence inserted in the loop region, a stem loop 1, a stem loop 2, and a stem loop 3. In some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise three or more (e.g., 3, 4, 5, or more, such as 4) sgRNA ^iBAR constructs (e.g., lentivirus or lentiviral vector encoding the sgRNAs ^iBAR) each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site in a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) . In some embodiments, each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs, and wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other. 
Hence in some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise four sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the four sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site in a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) . In some embodiments, the sgRNA ^iBAR library comprises at least about 100 (e.g., at least about any of 200, 400, 1,000, 1,300, 1,600, 4,000, 10,000, 15,000, 19,000, 20,000, 38,000, 50,000, 100,000, 150,000, 155,000, 200,000, or more) sets of sgRNA ^iBAR constructs, such as about 1000 to about 4000 sets of sgRNA ^iBAR constructs. In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs) . In some embodiments, the iBAR sequences for at least two sets of sgRNA ^iBAR constructs are the same. In some embodiments, the sgRNA ^iBAR library comprising a plurality of sets sgRNA ^iBAR constructs comprises or encodes sgRNAs ^iBAR with guide sequences complementary to target sites of every annotated gene in the genome (hereinafter also referred to as “whole-genome sgRNA ^iBAR library” ) . 
In some embodiments, the sgRNA ^iBAR library comprising a plurality of sets sgRNA ^iBAR constructs comprises or encodes sgRNAs ^iBAR with guide sequences complementary to target sites of hit genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%. 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) . In some embodiments, the hit gene encodes a protein that is expressed within a cell or on cell surface, either in healthy cells or in cancer cells. In some embodiments, the sgRNA ^iBAR library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sets sgRNA ^iBAR constructs comprising or encoding sgRNAs ^iBAR with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites of the same hit gene, i.e., the sgRNA ^iBAR library has at least two-fold coverage for that hit gene. In some embodiments, for each hit gene, the sgRNA ^iBAR library comprises 3 sets sgRNA ^iBAR constructs comprising or encoding sgRNAs ^iBAR with guide sequences complementary to 3 different target sites of the same hit gene. In some embodiments, the sgRNA ^iBAR library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sets sgRNA ^iBAR constructs comprising or encoding sgRNAs ^iBAR with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites within the same hit gene for every annotated gene in the genome, i.e., the sgRNA ^iBAR library has at least two-fold coverage for the whole genome. In some embodiments, each guide sequence comprises about 17 to about 23 nucleotides. In some embodiments, each iBAR sequence comprises about 1 to about 50 (e.g., about 6) nucleotides. 
In some embodiments, the sgRNA ^iBAR construct comprises (or consists of) an sgRNA ^iBAR. In some embodiments, the sgRNA ^iBAR construct encodes an sgRNA ^iBAR. In some embodiments, the sgRNA ^iBAR construct is a plasmid that encodes the sgRNA ^iBAR. In some embodiments, the sgRNA ^iBAR construct is a viral vector (e.g., lentiviral vector) encoding the sgRNA ^iBAR. In some embodiments, the sgRNA ^iBAR construct is a virus (e.g., lentivirus) encoding the sgRNA ^iBAR. Different sgRNA ^iBAR constructs of a set having different iBAR sequences can be used in a single gene-editing and screening experiment to provide replicate data. In some embodiments, the sgRNA ^iBAR library further comprises one or a plurality of sets of “negative control sgRNA ^iBAR constructs” , wherein each set of negative control sgRNA ^iBAR constructs comprise three or more (e.g., 3, 4, 5, or more, such as 4) negative control sgRNA ^iBAR constructs (e.g., lentivirus or lentiviral vector encoding the negative control sgRNAs ^iBAR) each comprising or encoding a negative control sgRNA ^iBAR, wherein each negative control sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more negative control sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more negative control sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of negative control sgRNA ^iBAR constructs is complementary to a target site not associated with any annotated gene in the genome, is complementary to a control gene (e.g., known to respond the same or similar between test and control groups after gene inactivation) , or is complementary to an irrelevant sequence that is not in the genome. In some embodiments, the sgRNA ^iBAR library further comprises negative control sgRNA ^iBAR constructs in the amount of about 3%to about 30%of the number of hit gene sgRNA ^iBAR constructs in the sgRNA ^iBAR library. 
In some embodiments, the sgRNA ^iBAR library further comprises about 500 to about 4000 negative control sgRNA ^iBAR constructs (e.g., 2000) or sets of negative control sgRNA ^iBAR constructs (e.g., 500 sets) .
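
The library layout described above, in which each set shares one guide sequence and differs only in its iBAR, and negative controls amount to about 3% to about 30% of the hit gene constructs, can be sketched as follows. This is an illustrative sketch only; the guide and iBAR sequences are made up, and the function names are hypothetical.

```python
def build_ibar_sets(guides, ibars):
    """Return one set per guide; each set pairs that guide with every iBAR."""
    if len(ibars) < 3:
        raise ValueError("each set needs three or more constructs")
    if len(set(ibars)) != len(ibars):
        raise ValueError("iBAR sequences within a set must differ from each other")
    return {guide: [(guide, ibar) for ibar in ibars] for guide in guides}

def negative_control_range(n_hit_constructs, low_pct=3, high_pct=30):
    """Smallest and largest whole-number counts within low_pct..high_pct percent."""
    low = -(-n_hit_constructs * low_pct // 100)   # ceiling division
    high = n_hit_constructs * high_pct // 100     # floor division
    return low, high

# Made-up sequences, for illustration only.
guides = ["GATTACAGATTACAGATTAC", "CCATGGTTAACCGGATATCG"]
ibars = ["ACGTCA", "TGCAGT", "GATCCA", "CTAGTG"]   # four iBARs per set

library = build_ibar_sets(guides, ibars)
n_constructs = sum(len(s) for s in library.values())
print(n_constructs, negative_control_range(n_constructs))
```

For a library of 2,000 sets with four iBARs each (8,000 constructs), `negative_control_range(8000)` gives 240 to 2,400 negative control constructs.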

In some embodiments, there is provided an sgRNA library (e.g., sgRNA ^iBAR library) comprising one or more sgRNA constructs (e.g., sgRNA ^iBAR constructs) , wherein each sgRNA construct (e.g., lentivirus or lentiviral vector encoding the sgRNA) comprises or encodes an sgRNA (e.g., sgRNA ^iBAR) , and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a target site in a target gene selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1. In some embodiments, there is provided an sgRNA library (e.g., sgRNA ^iBAR library) comprising one or more sgRNA constructs (e.g., sgRNA ^iBAR constructs) , wherein each sgRNA construct (e.g., lentivirus or lentiviral vector encoding the sgRNA) comprises or encodes an sgRNA (e.g., sgRNA ^iBAR) , and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a target site in a target gene selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.

In some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise three or more (e.g., 3, 4, 5, or more, such as 4) sgRNA ^iBAR constructs (e.g., lentiviruses or lentiviral vectors encoding the sgRNAs ^iBAR) each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site in a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) , and wherein each sgRNA ^iBAR is operable with a Cas9 protein to modify the target site. In some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise four sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the four sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site in a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) , and wherein each sgRNA ^iBAR is operable with a Cas9 protein to modify the target site. 
In some embodiments, each sgRNA ^iBAR sequence comprises a guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with the Cas9. In some embodiments, the second sequence of each sgRNA ^iBAR sequence further comprises a stem loop 1, stem loop 2, and/or stem loop 3. In some embodiments, the iBAR sequence is inserted in the loop region of the repeat-anti-repeat stem loop, and/or the loop region of the stem loop 1, stem loop 2, or stem loop 3. In some embodiments, each iBAR sequence comprises about 1-50 (e.g., about 6) nucleotides. In some embodiments, each sgRNA ^iBAR construct is an RNA, a plasmid, a viral vector (e.g., lentiviral vector) , or a virus (e.g., lentivirus) . In some embodiments, the sgRNA ^iBAR library comprises at least about 100 (e.g., at least about any of 200, 400, 1,000, 1,300, 1,600, 4,000, 10,000, 15,000, 19,000, 20,000, 38,000, 50,000, 100,000, 150,000, 155,000, 200,000, or more) sets of sgRNA ^iBAR constructs, such as about 1000 to about 4000 sets of sgRNA ^iBAR constructs. In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs) . In some embodiments, the iBAR sequences for at least two sets of sgRNA ^iBAR constructs are the same. In some embodiments, the sgRNA ^iBAR library comprising a plurality of sets sgRNA ^iBAR constructs comprises or encodes sgRNAs ^iBAR with guide sequences complementary to target sites of hit genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%. 
70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) . In some embodiments, the sgRNA ^iBAR library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sets sgRNA ^iBAR constructs comprising or encoding sgRNAs ^iBAR with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites within the same hit gene for every hit gene whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%. 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) . In some embodiments, the hit gene encodes a protein that is expressed within a cell or on cell surface, either in healthy cells or in cancer cells. In some embodiments, each guide sequence comprises about 17 to about 23 nucleotides.

In some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise three or more (e.g., 3, 4, 5, or more, such as 4) sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence, a second sequence, and an iBAR sequence, wherein the guide sequences for the three or more sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more sgRNA ^iBAR constructs is different from each other, wherein the guide sequence is fused to the second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with a Cas9 protein, wherein the iBAR sequence is inserted in the loop region of the repeat-anti-repeat stem loop, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site of a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) , and wherein each sgRNA ^iBAR is operable with the Cas9 protein to modify the target site. 
In some embodiments, there is provided an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise four sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence, a second sequence, and an iBAR sequence, wherein the guide sequences for the four sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other, wherein the guide sequence is fused to the second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with a Cas9 protein, wherein the iBAR sequence is inserted in the loop region of the repeat-anti-repeat stem loop, wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%complementary) to a different target site of a corresponding hit gene (e.g., different hit genes, or different sites within the same hit gene) , and wherein each sgRNA ^iBAR is operable with the Cas9 protein to modify the target site. In some embodiments, the second sequence of each sgRNA ^iBAR sequence further comprises a stem loop 1, stem loop 2, and/or stem loop 3, e.g., fused to the 3’ end of the repeat-anti-repeat stem loop sequence. In some embodiments, each iBAR sequence comprises about 1-50 (e.g., 6) nucleotides. In some embodiments, each sgRNA ^iBAR construct is an RNA, a plasmid, a viral vector (e.g., lentiviral vector) , or a virus (e.g., lentivirus) . In some embodiments, the sgRNA ^iBAR library comprises at least about 100 (e.g., at least about any of 200, 400, 1,000, 1,300, 1,600, 4,000, 10,000, 15,000, 19,000, 20,000, 38,000, 50,000, 100,000, 150,000, 155,000, 200,000, or more) sets of sgRNA ^iBAR constructs, such as about 1000 to about 4000 sets of sgRNA ^iBAR constructs. 
In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs) . In some embodiments, the iBAR sequences for at least two sets of sgRNA ^iBAR constructs are the same. In some embodiments, the sgRNA ^iBAR library comprising a plurality of sets sgRNA ^iBAR constructs comprises or encodes sgRNAs ^iBAR with guide sequences complementary to target sites of hit genes whose DNA mutation frequency is at least about 5%(e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%. 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) . In some embodiments, the sgRNA ^iBAR library comprises at least two (e.g., 2, 3, 4, 5, or more, such as 3) sets sgRNA ^iBAR constructs comprising or encoding sgRNAs ^iBAR with guide sequences complementary to at least two (e.g., 2, 3, 4, 5, or more, such as 3) different target sites within the same hit gene for every hit gene whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%. 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) . In some embodiments, the hit gene encodes a protein that is expressed within a cell or on cell surface, either in healthy cells or in cancer cells. In some embodiments, each guide sequence comprises about 17 to about 23 nucleotides.
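
The text does not define how the percentage of complementarity between a guide sequence and its target site is calculated. One simple reading, offered here only as an assumption, is the fraction of guide positions that form Watson-Crick pairs with the target site when the two equal-length strands are aligned antiparallel. The function name and the sequences below are hypothetical, and both strands are written 5'-to-3' in the DNA alphabet.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def percent_complementarity(guide, target_site):
    """Percent of guide positions that base-pair with an equal-length target site.

    Both sequences are given 5'->3'; pairing is antiparallel, so guide
    position i is compared with target position len - 1 - i.
    """
    if len(guide) != len(target_site):
        raise ValueError("this sketch only handles equal-length sequences")
    paired = sum(1 for g, t in zip(guide, reversed(target_site))
                 if COMPLEMENT[g] == t)
    return 100.0 * paired / len(guide)

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

guide = "GATTACAGATTACAGATTAC"             # made-up 20-nt guide
perfect_site = reverse_complement(guide)   # fully complementary site
print(percent_complementarity(guide, perfect_site))
```

Under this reading, a single mismatch in a 20-nt guide corresponds to 95% complementarity.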

In some embodiments, there is provided an sgRNA ^iBAR construct comprising a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene, and a guide hairpin coding sequence for a repeat:anti-repeat duplex and a tetraloop, wherein an iBAR is embedded in the tetraloop to serve as an internal replicate. In some embodiments, the iBAR comprises a 1 nucleotide (“nt”) to 50nt (e.g., 1nt-40nt, 1nt-30nt, 1nt-25nt, 2nt-20nt, 3nt-18nt, 3nt-16nt, 3nt-14nt, 3nt-12nt, 3nt-10nt, 3nt-9nt, 4nt-8nt, 5nt-7nt; preferably, 3nt, 4nt, 5nt, 6nt, or 7nt) sequence consisting of A, T, C, and G nucleotides. In some embodiments, the guide sequence is about any of 17-23, 18-22, or 19-21 nucleotides in length, and the hairpin sequence, once transcribed, can be bound by a Cas nuclease (e.g., Cas9). In some embodiments, the sgRNA ^iBAR construct further comprises a sequence coding for stem loop 1, stem loop 2, and/or stem loop 3. In some embodiments, each sgRNA ^iBAR construct is an RNA, a plasmid, a viral vector (e.g., lentiviral vector), or a virus (e.g., lentivirus).
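
The 5'-to-3' arrangement described above (guide sequence, first stem, iBAR in the loop position, second stem, then optional downstream stem loops) can be sketched as a string assembly with basic length and alphabet checks. This is a sketch under stated assumptions: the stem sequences used in the example are short placeholders rather than a real scaffold, the iBAR is simply placed between the two stem halves, and the "about 17 to about 23" guide length is enforced as a hard range for simplicity. The function and parameter names are hypothetical.

```python
DNA_ALPHABET = set("ATCG")

def build_sgrna_ibar_coding_sequence(guide, first_stem, ibar, second_stem,
                                     downstream=""):
    """Assemble a DNA coding sequence: guide + first stem + iBAR + second stem + downstream."""
    parts = {"guide": guide, "first_stem": first_stem, "ibar": ibar,
             "second_stem": second_stem, "downstream": downstream}
    for name, seq in parts.items():
        if not set(seq) <= DNA_ALPHABET:
            raise ValueError(f"{name} must consist of A, T, C, and G only")
    if not 17 <= len(guide) <= 23:
        raise ValueError("guide sequence should be 17 to 23 nt in this sketch")
    if not 1 <= len(ibar) <= 50:
        raise ValueError("iBAR should be 1 to 50 nt")
    # The iBAR sits between the 3' end of the first stem and the 5' end of the second.
    return guide + first_stem + ibar + second_stem + downstream

# Placeholder sequences, for illustration only (not a real scaffold).
coding = build_sgrna_ibar_coding_sequence(
    guide="GATTACAGATTACAGATTAC",
    first_stem="GGGG",
    ibar="ACGTCA",
    second_stem="CCCC",
)
print(coding)
```

A real construct would substitute the repeat and anti-repeat sequences of the chosen Cas9 scaffold for the placeholder stems.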

Also provided are sgRNA molecules encoded by any one of the sgRNA constructs or libraries described herein. Also provided are sgRNA ^iBAR molecules encoded by any one of the sgRNA ^iBAR constructs, sets, or libraries described herein. Compositions and kits comprising any one of the sgRNA or sgRNA ^iBAR constructs, molecules, sets, or libraries are further provided.

In some embodiments, there is provided a modified cancer cell comprising any one of the sgRNA or sgRNA ^iBAR constructs, molecules, sets, or libraries described herein. In some embodiments, there is provided a cancer cell library wherein each cancer cell comprises one or more sgRNA constructs from an sgRNA library described herein, or one or more sgRNA ^iBAR constructs from an sgRNA ^iBAR library described herein. In some embodiments, the cancer cell library comprises an sgRNA library or an sgRNA ^iBAR library described herein targeting any target genes identified herein, or any hit genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, or 100 fold, or more) in cancer patients compared to healthy individuals (e.g., based on literature or databases). In some embodiments, the modified cancer cells or the initial population of cancer cells comprise or express one or more components of the CRISPR/Cas system, such as the Cas protein (e.g., Cas9) operable with the sgRNA or sgRNA ^iBAR constructs.

iBAR sequences

A set of sgRNA ^iBAR constructs comprises three or more sgRNA ^iBAR constructs each comprising a different iBAR sequence. In some embodiments, a set of sgRNA ^iBAR constructs comprises three sgRNA ^iBAR constructs each comprising a different iBAR sequence. In some embodiments, a set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs each comprising a different iBAR sequence. In some embodiments, a set of sgRNA ^iBAR constructs comprises five sgRNA ^iBAR constructs each comprising a different iBAR sequence. In some embodiments, a set of sgRNA ^iBAR constructs comprises six or more sgRNA ^iBAR constructs each comprising a different iBAR sequence.

The iBAR sequences may have any suitable length. In some embodiments, each iBAR sequence is about 1-50 nucleotides (“nt”) in length, such as about any one of 1nt-40nt, 1nt-30nt, 1nt-20nt, 2nt-20nt, 3nt-18nt, 3nt-16nt, 3nt-14nt, 3nt-12nt, 3nt-10nt, 3nt-9nt, 3nt-8nt, 4nt-8nt, or 5nt-7nt. In some embodiments, each iBAR sequence is about any of 2nt, 3nt, 4nt, 5nt, 6nt, 7nt, or 8nt long. In some embodiments, the iBAR sequence in each sgRNA ^iBAR construct has the same length. In some embodiments, the iBAR sequences of different sgRNA ^iBAR constructs have different lengths. In some embodiments, the iBAR sequences within a set of sgRNA ^iBAR constructs have the same length. In some embodiments, the iBAR sequences within a set of sgRNA ^iBAR constructs have different lengths. In some embodiments, the iBAR sequences within one set of sgRNA ^iBAR constructs have different lengths from the iBAR sequences within another set of sgRNA ^iBAR constructs. In some embodiments, the iBAR sequence is about 6nt, hereinafter referred to as “iBAR₆.” In some embodiments, each iBAR sequence within the sgRNA ^iBAR library is about 6nt.

The iBAR sequences may have any suitable sequences. In some embodiments, the iBAR sequence is a DNA sequence made of any of A, T, C and/or G nucleotides. In some embodiments, the iBAR sequence is an RNA sequence made of any of A, U, C, and/or G nucleotides. In some embodiments, the iBAR sequence has non-conventional or modified nucleotides other than A, T/U, C, and G. In some embodiments, each iBAR sequence is 6 nucleotides long consisting of A, T, C, and G nucleotides. In some embodiments, the iBAR sequence in the encoded sgRNA ^iBAR is 6 nucleotides long consisting of A, U, C, and G nucleotides.

In some embodiments, the set of iBAR sequences associated with each set of sgRNA ^iBAR constructs in the sgRNA ^iBAR library is different from each other. In some embodiments, the iBAR sequences for at least two sgRNA ^iBAR constructs among different sets of sgRNA ^iBAR constructs are the same (e.g., the first set and the second set of sgRNA ^iBAR constructs have at least 1, 2, 3, 4, or more shared iBAR sequences among the two sets of sgRNA ^iBAR constructs, but the iBAR sequences for each sgRNA ^iBAR construct within the same set of sgRNA ^iBAR constructs are different from each other) . In some embodiments, the iBAR sequences for at least two (e.g., at least about any of 2, 3, 4, 5, 10, 50, 100, 1000, or more) sets of sgRNA ^iBAR constructs in the sgRNA ^iBAR library are the same. In some embodiments, one or more same iBAR sequences are used for one or more sgRNA ^iBAR constructs of each set of sgRNA ^iBAR constructs in the sgRNA ^iBAR library (but the iBAR sequences for each sgRNA ^iBAR construct within the same set of sgRNA ^iBAR constructs are different from each other) . In some embodiments, the same set of iBAR sequences are used for each set of sgRNA ^iBAR constructs in the sgRNA ^iBAR library. In some embodiments, it is not necessary to design different iBAR sets for different sets of sgRNA ^iBAR constructs. In some embodiments, a fixed set of iBARs is used for all sets of sgRNA ^iBAR constructs in the sgRNA ^iBAR library. In some embodiments, a plurality of iBAR sequences are randomly assigned to different sets of sgRNA ^iBAR constructs in the sgRNA ^iBAR library. The iBAR strategy with a streamlined analytic tool (MAGeCK ^iBAR; Zhu et al., Genome Biol. 2019; 20: 20) described herein can facilitate large-scale CRISPR/Cas screens for biomedical discoveries in various settings.
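The two assignment strategies described above (a fixed set of iBARs reused for every set of constructs, or iBARs drawn at random per set) can be illustrated with a minimal sketch. The function name `assign_ibars`, the example guide sequences, and the example barcodes are hypothetical and are not taken from the present application.

```python
import random

def assign_ibars(guides, ibar_pool, per_set=4, fixed=True, seed=0):
    """Map each guide sequence to `per_set` distinct iBARs.

    fixed=True reuses the same iBARs for every set of constructs;
    fixed=False draws a random sample for each set. In both cases the
    iBARs within one set differ from each other.
    """
    pool = sorted(set(ibar_pool))
    if per_set > len(pool):
        raise ValueError("pool has too few distinct iBARs")
    rng = random.Random(seed)
    shared = pool[:per_set]
    return {g: list(shared) if fixed else rng.sample(pool, per_set)
            for g in guides}

# Hypothetical guides and barcodes, for illustration only.
guides = ["GACCTGAAGTTCATCTGCAC", "CAGGGTCAGCTTGCCGTAGG"]
pool = ["AAGCTT", "CTCGAT", "GGATCC", "TTGACA", "ACGTAC"]
fixed_sets = assign_ibars(guides, pool, per_set=4, fixed=True)
random_sets = assign_ibars(guides, pool, per_set=4, fixed=False)
```

In either mode, the pairing of a guide sequence with its iBAR identifies one internal replicate, which is why barcodes only need to be distinct within a set.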

The iBAR sequence may be inserted (including appended) to any suitable regions in a guide RNA (e.g., sgRNA) that does not affect the efficiency of the gRNA in guiding the Cas nuclease (e.g., Cas9) to its target site. In some embodiments, the iBAR sequence is placed at the 3’ end of an sgRNA. In some embodiments, the iBAR sequence is placed at the 5’ end of an sgRNA. In some embodiments, the iBAR sequence is placed at an internal position in an sgRNA. For example, an sgRNA may comprise various stem loops that interact with the Cas nuclease in a CRISPR complex, and the iBAR sequence may be embedded in the loop region of any one of the stem loops. In some embodiments, each sgRNA ^iBAR sequence comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a double-stranded RNA (dsRNA) region that interacts with the Cas protein, and wherein the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence. In some embodiments, the guide RNA (e.g., sgRNA) further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3, and wherein the iBAR sequence is inserted in the loop region of stem loop 1, stem loop 2, and/or stem loop 3.

For example, the guide RNA of a CRISPR/Cas9 system may comprise a guide sequence targeting a genomic locus (e.g., a target site in a hit gene), and a guide hairpin sequence coding for a Repeat: Anti-Repeat Duplex and a tetraloop. In some embodiments, the iBAR is inserted in the tetraloop serving as internal replicates. In the context of an endogenous CRISPR/Cas9 system, the crRNA hybridizes with the trans-activating crRNA (tracrRNA) to form a crRNA: tracrRNA duplex, which is loaded onto Cas9 to direct the cleavage of cognate DNA sequences bearing appropriate protospacer-adjacent motifs (PAMs). An endogenous crRNA sequence can be divided into guide (20 nt) and repeat (12 nt) regions, whereas an endogenous tracrRNA sequence can be divided into anti-repeat (14 nt) and three tracrRNA stem loops. In some embodiments, the sgRNA binds the target DNA to form a T-shaped architecture comprising a guide: target heteroduplex, a repeat: anti-repeat duplex, and stem loops 1–3. In some embodiments, the repeat and anti-repeat parts are connected by the tetraloop, and the repeat and anti-repeat form a repeat: anti-repeat duplex, connected with stem loop 1 by a single nucleotide (A51), whereas stem loops 1 and 2 are connected by a 5 nt single-stranded linker (nucleotides 63–67). In some embodiments, the guide sequence (nucleotides 1–20) and target DNA (nucleotides 1’–20’) form the guide: target heteroduplex via 20 Watson-Crick base pairs, and the repeat (nucleotides 21–32) and the anti-repeat (nucleotides 37–50) form the repeat: anti-repeat duplex via nine Watson-Crick base pairs (U22: A49–A26: U45 and G29: C40–A32: U37). In some embodiments, the tracrRNA tail (nucleotides 68–81 and 82–96) forms stem loops 2 and 3 via four and six Watson-Crick base pairs (A69: U80–U72: A77 and G82: C96–G87: C91), respectively. Nishimasu et al. describe a crystal structure of an exemplary CRISPR/Cas9 system (Nishimasu et al. “Crystal structure of Cas9 in complex with guide RNA and target DNA.” Cell. 2014; 156: 935–949), which is incorporated herein by reference in its entirety.

In some embodiments, the iBAR sequence is inserted in the tetraloop, or the loop region of the repeat: anti-repeat stem loop of an sgRNA. In some embodiments, the iBAR sequence of each sgRNA ^iBAR within the library is inserted in the loop region of the repeat-anti-repeat stem loop. The tetraloop of the Cas9 sgRNA scaffold is outside the Cas9-sgRNA ribonucleoprotein complex, which has been subject to alterations for various purposes without affecting the activity of its upstream guide sequence (Gilbert et al. Cell 159, 647-661 (2014) ; Zhu et al. Methods Mol Biol 1656, 175-181 (2017) ) . Applicant has previously demonstrated in WO2020125762 that a 6-nt-long iBAR (iBAR ₆) may be embedded in the tetraloop of a typical Cas9 sgRNA scaffold without affecting the gene editing efficiency of the sgRNA or increasing off-target effects, and without sequence bias in the iBAR ₆. The exemplary iBAR ₆ gives rise to 4,096 barcode combinations, which provides sufficient variations for a high throughput screen (see FIG. 1A of WO2020125762) .
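The figure of 4,096 barcode combinations follows from four possible nucleotides at each of six positions (4^6 = 4,096). A minimal sketch that enumerates them is given below; the function name `enumerate_ibars` is a hypothetical helper, not part of the present application or of WO2020125762.

```python
from itertools import product

def enumerate_ibars(length=6, alphabet="ACGT"):
    """Return every possible barcode of the given length over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=length)]

ibar6 = enumerate_ibars(6)
print(len(ibar6))  # 4 ** 6 = 4096
```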

Guide sequences

The guide sequence hybridizes with the target sequence (e.g., a target site in a hit gene) and directs sequence-specific binding of a CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about any of 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more (e.g., 100% complementary). A guide sequence that is “complementary” to a target site or a hit gene can be fully or partially complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to the target site or the hit gene. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, and algorithms based on the Burrows-Wheeler Transform. In certain embodiments, a guide sequence is about or more than about any of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. In some embodiments, the guide sequence comprises about 17 to about 23 nucleotides. The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay. For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions.
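The degree of complementarity referred to above can be illustrated with a deliberately simplified calculation. The sketch below assumes an ungapped, equal-length comparison between a guide RNA and the DNA strand it hybridizes with; it is a stand-in for, not an implementation of, the alignment algorithms named above. The function name `percent_complementarity` is hypothetical.

```python
# RNA base in the guide -> DNA base it pairs with on the target strand.
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}

def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that Watson-Crick pair with the target.

    Both sequences are written 5'->3', so the guide is compared against
    the reversed target strand. Ungapped, equal-length comparison only.
    """
    guide, target = guide_rna.upper(), target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(PAIR.get(g) == t for g, t in zip(guide, reversed(target)))
    return 100.0 * matches / len(guide)

full = percent_complementarity("GACU", "AGTC")     # every position pairs
partial = percent_complementarity("GACU", "AGTA")  # one mismatch in four
```

A threshold such as "at least about 90%" would then be a simple comparison against the returned value.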

In some embodiments, a guide sequence can be as short as about 10 nucleotides and as long as about 30 nucleotides. In some embodiments, the guide sequence is about any one of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. Synthetic guide sequences can be about 20 nucleotides long, but can be longer or shorter. By way of example, a guide sequence for a CRISPR/Cas9 system may consist of 20 nucleotides complementary to a target sequence (e.g., a target site in a hit gene), i.e., the guide sequence may be identical to the 20 nucleotides upstream of the PAM sequence except for the T/U difference between DNA and RNA. In some embodiments, the guide sequence comprises about 17 to about 23 nucleotides. In some embodiments, the guide sequence of each sgRNA or sgRNA ^iBAR within the library has the same length. In some embodiments, the guide sequences of at least two sgRNAs or sgRNAs ^iBAR within the library have different lengths. In some embodiments, the guide sequences within a set of sgRNA ^iBAR constructs have the same length. In some embodiments, the guide sequences within a set of sgRNA ^iBAR constructs have different lengths. In some embodiments, the guide sequences within one set of sgRNA ^iBAR constructs have different lengths from the guide sequences within another set of sgRNA ^iBAR constructs.
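As an illustration of taking the 20 nucleotides upstream of a PAM, the sketch below scans one DNA strand for a PAM and returns the adjacent 20-nt sequence written as RNA. It assumes the NGG PAM commonly associated with S. pyogenes Cas9, which is not specified in this passage, and it does not scan the reverse strand. The function name and the example sequence are hypothetical.

```python
import re

def guides_upstream_of_pam(dna, guide_len=20):
    """Return (start, guide RNA) for each NGG PAM found on the given strand."""
    dna = dna.upper()
    found = []
    # Zero-width lookahead so that overlapping PAMs are all reported.
    for m in re.finditer(r"(?=[ACGT]GG)", dna):
        pam_start = m.start()
        if pam_start >= guide_len:
            protospacer = dna[pam_start - guide_len:pam_start]
            # The guide is identical to the protospacer apart from T -> U.
            found.append((pam_start - guide_len, protospacer.replace("T", "U")))
    return found

# Hypothetical locus: 5 nt of flank, a 20-nt protospacer, then the PAM "AGG".
locus = "TTTTT" + "GACCTGAAGTTCATCTGCAC" + "AGG" + "TT"
candidates = guides_upstream_of_pam(locus)
```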

In some embodiments, the guide sequences within a set of sgRNA ^iBAR constructs are the same. In some embodiments, the guide sequences within a set of sgRNA ^iBAR constructs are the same, while the guide sequence within each set of sgRNA ^iBAR constructs is complementary to a different target site (e.g., different hit genes, or different target sites of the same hit gene) . In some embodiments, the guide sequences of at least two sets of sgRNA ^iBAR constructs are complementary to two different target sites of the same hit gene. In some embodiments, the guide sequences of 3 sets of sgRNA ^iBAR constructs are complementary to 3 different target sites of the same hit gene. In some embodiments, each hit gene is targeted by at least two (e.g., 2, 3, 4 or more, such as 3) guide sequences of at least two (e.g., 2, 3, 4 or more, such as 3) sets of sgRNA ^iBAR constructs in at least two (e.g., 2, 3, 4 or more, such as 3) different target sites. In some embodiments, the guide sequence within each set of sgRNA ^iBAR constructs is complementary to a different hit gene in the genome.

The guide sequence in an sgRNA construct or an sgRNA ^iBAR construct may be designed according to any known methods in the art. The guide sequence may target the coding region such as an exon or a splicing site, the 5’ untranslated region (UTR) or the 3’ untranslated region (UTR) of a gene of interest. For example, the reading frame of a gene could be disrupted by indels mediated by double-strand breaks (DSB) at a target site of a guide RNA. Alternatively, a guide RNA targeting the 5’ end of a coding sequence may be used to produce gene knockouts with high efficiency. The guide sequence may be designed and optimized according to certain sequence features for high on-target gene-editing activity and low off-target effects. For instance, the GC content of a guide sequence may be in the range of about 20% to about 70%, and sequences containing homopolymer stretches (e.g., TTTT, GGGG) may be avoided.
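The two sequence features mentioned above (GC content of about 20% to about 70%, and avoidance of homopolymer stretches such as TTTT or GGGG) can be checked with a short filter. This is a minimal sketch; the function name, the default thresholds as parameters, and the example sequences are assumptions made for illustration.

```python
import re

def passes_sequence_filters(guide, gc_min=0.20, gc_max=0.70, max_run=3):
    """True if the GC fraction lies within [gc_min, gc_max] and no base
    repeats more than `max_run` times in a row (TTTT, GGGG are rejected)."""
    guide = guide.upper()
    if not guide:
        return False
    gc = (guide.count("G") + guide.count("C")) / len(guide)
    has_long_run = re.search(r"(.)\1{%d,}" % max_run, guide) is not None
    return gc_min <= gc <= gc_max and not has_long_run

ok = passes_sequence_filters("GACCTGAAGTTCATCTGCAC")        # GC = 50%, no run
poly_t = passes_sequence_filters("GACCTTTTGTTCATCTGCAC")    # contains TTTT
low_gc = passes_sequence_filters("ATATATATATATATATATAT")    # GC = 0%
```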

The guide sequence may be designed to target any genomic locus of interest (e.g., any target site of any hit gene). In some embodiments, the guide sequence targets a protein-coding gene. In some embodiments, the guide sequence targets a gene encoding an RNA, such as a small RNA (e.g., microRNA, piRNA, siRNA, snoRNA, tRNA, rRNA and snRNA), a ribosomal RNA, or a long non-coding RNA (lincRNA). In some embodiments, the guide sequence targets a non-coding region of the genome. In some embodiments, the guide sequence targets a chromosomal locus. In some embodiments, the guide sequence targets an extrachromosomal locus. In some embodiments, the guide sequence targets a mitochondrial gene. In some embodiments, the guide sequence is complementary to a target site of any annotated genes in the genome (e.g., human genome). In some embodiments, the guide sequence targets a gene whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher), such as in cancer patients (e.g., based on literature or databases). In some embodiments, the guide sequence targets a gene whose RNA expression level is up-regulated or down-regulated by more than about 1.2-fold (e.g., more than about any of 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases). In some embodiments, the guide sequence targets a gene whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) and whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases). In some embodiments, the guide sequence targets a gene whose encoded protein is expressed within a cell or on cell surface (either in healthy cells or in cancer cells). In some embodiments, the guide sequence targets a region without any gene annotation in the genome (“non-gene region”). sgRNA or sgRNA ^iBAR constructs comprising or encoding such guide sequence complementary to a non-gene region can serve as negative control.
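The combined selection criterion described above (mutation frequency of at least about 5% together with a more than about 2-fold change in RNA expression, in either direction) can be expressed as a simple filter. The sketch below is illustrative only: the record layout, the function name, and the gene symbols and values are hypothetical and are not data from the present application.

```python
from dataclasses import dataclass

@dataclass
class GeneRecord:
    symbol: str
    mutation_frequency: float      # fraction of patients with a mutation
    expression_fold_change: float  # patient / healthy ratio, must be > 0

def select_hit_genes(records, min_mut_freq=0.05, min_fold=2.0):
    """Return symbols of genes meeting both thresholds."""
    hits = []
    for r in records:
        up = r.expression_fold_change > min_fold
        down = r.expression_fold_change < 1.0 / min_fold
        if r.mutation_frequency >= min_mut_freq and (up or down):
            hits.append(r.symbol)
    return hits

# Hypothetical records, for illustration only.
records = [
    GeneRecord("GENE_A", 0.12, 3.5),   # mutated and up-regulated
    GeneRecord("GENE_B", 0.02, 8.0),   # mutation frequency too low
    GeneRecord("GENE_C", 0.30, 0.25),  # mutated and down-regulated 4-fold
    GeneRecord("GENE_D", 0.40, 1.3),   # expression change too small
]
hits = select_hit_genes(records)
```

Down-regulation by more than 2-fold is treated here as a ratio below 0.5, which is one reasonable reading of the criterion.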

In some embodiments, the guide sequence is designed to repress or inactivate the expression of any hit gene or target gene of interest. The hit gene or target gene may be an endogenous gene or a transgene. In some embodiments, the hit gene or target gene may be known to be associated with a particular phenotype. In some embodiments, the hit gene or target gene is a gene that has not been implicated in a particular phenotype, such as a known gene that is not known to be associated with a particular phenotype, or an unknown gene that has not been characterized. In some embodiments, the region targeted by the guide sequence is located on a different chromosome from the hit gene or target gene.

Other sgRNA or sgRNA ^iBAR components

In some embodiments, the sgRNA or sgRNA ^iBAR comprises additional sequence element (s) that promotes formation of the CRISPR complex with the Cas protein. In some embodiments, the sgRNA or sgRNA ^iBAR comprises a second sequence comprising a repeat-anti-repeat stem loop. A repeat-anti-repeat stem loop comprises a tracr mate sequence fused to a tracr sequence that is complementary to the tracr mate sequence via a loop region.

Typically, in the context of an endogenous CRISPR/Cas9 system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. The tracr sequence, which may comprise or consist of all or a portion of a wild-type tracr sequence (e.g., about or more than about any of 20, 26, 32, 45, 48, 54, 63, 67, 85, or more nucleotides of a wild-type tracr sequence), may also form part of a CRISPR complex, such as by hybridization along at least a portion of the tracr sequence to all or a portion of a tracr mate sequence that is operably linked to the guide sequence. In some embodiments, the tracr sequence has sufficient complementarity to a tracr mate sequence to hybridize and participate in formation of a CRISPR complex. As with the target sequence, it is believed that complete complementarity is not needed, provided there is sufficient complementarity to be functional. In some embodiments, the tracr sequence has at least about any of 50%, 60%, 70%, 80%, 90%, 95% or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned. Determining optimal alignment is within the purview of one of skill in the art. For example, there are publicly and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, Smith-Waterman in Matlab, Bowtie, Geneious, Biopython and SeqMan. In some embodiments, the tracr sequence is about or more than about any of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. Any one of the known tracr mate sequences and tracr sequences derived from naturally occurring CRISPR system, such as the tracr mate sequence and tracr sequence from the S. pyogenes CRISPR/Cas9 system as described in US8697359 and those described herein, may be used.

In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a stem loop (also known as a hairpin) , known as the “repeat-anti-repeat stem loop. ”

In some embodiments, the loop region of the stem loop in an sgRNA construct without an iBAR sequence is four nucleotides in length, and such loop region is also referred to as the “tetraloop.” In some embodiments, the loop region has the sequence of GAAA. However, longer or shorter loop sequences may be used, or alternative sequences may be used, such as sequences including a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). In some embodiments, the sequence of the loop region is CAAA or AAAG. In some embodiments, the iBAR is inserted in the loop region, such as the tetraloop. For example, the iBAR sequence may be inserted before the first nucleotide, between the first nucleotide and the second nucleotide, between the second nucleotide and the third nucleotide, between the third nucleotide and the fourth nucleotide, or after the fourth nucleotide in the tetraloop. In some embodiments, the iBAR sequence replaces one or more nucleotides in the loop region.
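The five insertion positions listed above correspond to indices 0 through 4 of a four-nucleotide loop. A minimal sketch follows; the helper name `insert_ibar` and the example barcode CTCGAT are hypothetical, while the loop sequence GAAA is the one named above.

```python
def insert_ibar(loop, ibar, position):
    """Insert `ibar` into `loop` before index `position` (0..len(loop))."""
    if not 0 <= position <= len(loop):
        raise ValueError("position outside the loop")
    return loop[:position] + ibar + loop[position:]

# One variant per insertion point: before nt 1, between nt 1/2, 2/3, 3/4,
# and after nt 4 of the tetraloop.
variants = [insert_ibar("GAAA", "CTCGAT", p) for p in range(5)]
```

Replacing loop nucleotides rather than inserting between them would instead drop a slice of the loop before concatenation.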

In some embodiments, the sgRNA ^iBAR comprises two or more stem loops. In some embodiments, the sgRNA ^iBAR has two, three, four or five stem loops. In some embodiments, the sgRNA ^iBAR has at most five hairpins. In some embodiments, the sgRNA or sgRNA ^iBAR construct further includes a transcription termination sequence, such as a polyT sequence, for example six T nucleotides.

In some embodiments, wherein the Cas protein is Cas9, each sgRNA or sgRNA ^iBAR comprises a guide sequence fused to a second sequence comprising a repeat-anti-repeat stem loop that interacts with the Cas9. In some embodiments, the iBAR sequence is inserted in the loop region of the repeat-anti-repeat stem loop. In some embodiments, the iBAR sequence replaces one or more nucleotides in the loop region of the repeat-anti-repeat stem loop. In some embodiments, the second sequence of each sgRNA or sgRNA ^iBAR further comprises a stem loop 1, stem loop 2, and/or stem loop 3. In some embodiments, the iBAR sequence is inserted in the loop region of stem loop 1. In some embodiments, the iBAR sequence replaces one or more nucleotides in the loop region of stem loop 1. In some embodiments, the iBAR sequence is inserted in the loop region of stem loop 2. In some embodiments, the iBAR sequence replaces one or more nucleotides in the loop region of stem loop 2. In some embodiments, the iBAR sequence is inserted in the loop region of stem loop 3. In some embodiments, the iBAR sequence replaces one or more nucleotides in the loop region of stem loop 3.

In some embodiments, each sgRNA ^iBAR comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a double-stranded RNA (dsRNA) region that interacts with the Cas protein, and wherein the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence.

In a CRISPR/Cas9 system, a guide RNA can be used to guide the cleavage of a genomic DNA by the Cas9 nuclease. For example, the guide RNA may be composed of a nucleotide spacer of variable sequence (guide sequence) that targets the CRISPR/Cas system nuclease to a genomic location in a sequence-specific manner, and an invariant hairpin sequence that is constant among different guide RNAs and allows the guide RNA to bind to the Cas nuclease. In some embodiments, the CRISPR/Cas guide RNA comprises a CRISPR/Cas variable guide sequence that is homologous or complementary to a target genomic sequence (e.g., target site of a hit gene) in a host cell and an invariant hairpin sequence that when transcribed is capable of binding a Cas nuclease (e.g., Cas9), wherein the hairpin sequence codes for a Repeat: Anti-Repeat Duplex and a tetraloop, and an iBAR is embedded in the tetraloop region.

The guide sequence for a CRISPR/Cas9 guide RNA can be about any of 17-23, 18-22, or 19-21 nucleotides in length. The guide sequence can target the Cas nuclease to a genomic locus in a sequence-specific manner and can be designed following general principles known in the art. The invariant guide RNA hairpin sequences can be provided according to common knowledge in the art, for example, as disclosed by Nishimasu et al. (Nishimasu H, et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell. 2014; 156: 935–949) . Any invariant hairpin sequences may be used as long as they are capable of binding to a Cas nuclease once transcribed.

Previous studies showed that, although sgRNA with a 48-nt tracrRNA tail (referred to as sgRNA (+48)) is the minimal region for the Cas9-catalyzed DNA cleavage in vitro (Jinek et al., 2012), sgRNAs with extended tracrRNA tails, sgRNA (+67) and sgRNA (+85), may improve the Cas9 cleavage activity in vivo (Hsu et al., 2013). In some embodiments, the sgRNA or sgRNA ^iBAR comprises stem loop 1, stem loop 2, and/or stem loop 3. The stem loop 1, stem loop 2 and/or stem loop 3 regions may improve editing efficiency in a CRISPR/Cas9 system.

In some embodiments, the sgRNA comprises from 5’ to 3’: a guide sequence, a repeat-anti-repeat stem loop, a stem loop 1, a stem loop 2, and a stem loop 3. In some embodiments, the sgRNA ^iBAR comprises from 5’ to 3’: a guide sequence, a repeat-anti-repeat stem loop with an iBAR sequence inserted in the loop region, a stem loop 1, a stem loop 2, and a stem loop 3.
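The 5’-to-3’ arrangement described above can be sketched as a simple concatenation. The scaffold segments below are an assumption: they are the widely used S. pyogenes sgRNA scaffold written in DNA coding form, and are not sequences disclosed in this passage. Their lengths do agree with the nucleotide numbering given earlier (repeat 21–32, tetraloop 33–36, anti-repeat 37–50, A51, linker 63–67, stem loop 2 at 68–81, stem loop 3 at 82–96). The function name, the default insertion position, and the example guide and barcode are hypothetical.

```python
# Assumed scaffold segments (DNA coding form, 5'->3'); not from the source.
REPEAT = "GTTTTAGAGCTA"          # 12 nt, positions 21-32
TETRALOOP = "GAAA"               # 4 nt, positions 33-36
ANTI_REPEAT = "TAGCAAGTTAAAAT"   # 14 nt, positions 37-50
TAIL = (
    "A"                  # single linking nucleotide, position 51
    + "AGGCTAGTCCG"      # stem loop 1, positions 52-62
    + "TTATC"            # single-stranded linker, positions 63-67
    + "AACTTGAAAAAGTG"   # stem loop 2, positions 68-81
    + "GCACCGAGTCGGTGC"  # stem loop 3, positions 82-96
)

def build_sgrna_ibar(guide, ibar="", loop_position=2):
    """Concatenate guide, repeat, tetraloop (with optional iBAR), anti-repeat
    and the three tail stem loops. `loop_position` is an arbitrary default."""
    if not 0 <= loop_position <= len(TETRALOOP):
        raise ValueError("loop_position outside the tetraloop")
    loop = TETRALOOP[:loop_position] + ibar + TETRALOOP[loop_position:]
    return guide + REPEAT + loop + ANTI_REPEAT + TAIL

guide = "GACCTGAAGTTCATCTGCAC"                     # hypothetical 20-nt guide
plain = build_sgrna_ibar(guide)                    # sgRNA without an iBAR
barcoded = build_sgrna_ibar(guide, ibar="CTCGAT")  # sgRNA with a 6-nt iBAR
```

A transcription termination sequence such as six T nucleotides would be appended after the final stem loop in an expression construct.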

Vectors and vehicles

In some embodiments, the sgRNA construct comprises one or more regulatory elements operably linked to the guide RNA sequence. In some embodiments, the sgRNA ^iBAR construct comprises one or more regulatory elements operably linked to the guide RNA sequence and the iBAR sequence. Exemplary regulatory elements include, but are not limited to, promoters, enhancers, internal ribosomal entry sites (IRES) , and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences) . Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) . Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) .

The sgRNA or sgRNA ^iBAR constructs may be present in a vector. In some embodiments, the vector is suitable for replication and integration in eukaryotic cells, such as mammalian cells (e.g., cancer cells) . In some embodiments, the sgRNA or sgRNA ^iBAR construct is an expression vector, such as a viral vector or a plasmid. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, lentiviral vector, retroviral vectors, herpes simplex viral vector, and derivatives thereof. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York) , and in other virology and molecular biology manuals. It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. In some embodiments, the sgRNA or sgRNA ^iBAR construct is a lentiviral vector. In some embodiments, the sgRNA or sgRNA ^iBAR construct is a virus. In some embodiments, the sgRNA or sgRNA ^iBAR construct is an adenovirus or an adeno-associated virus. In some embodiments, the sgRNA or sgRNA ^iBAR construct is a lentivirus. In some embodiments, the vector further comprises a selection marker. In some embodiments, the vector further comprises one or more nucleotide sequences encoding one or more elements of the CRISPR/Cas system, such as a nucleotide sequence encoding a Cas nuclease (e.g., Cas9) . In some embodiments, there is provided a vector system comprising one or more vectors encoding nucleotide sequences encoding one or more elements of the CRISPR/Cas system, and a vector comprising any one of the sgRNA or sgRNA ^iBAR constructs described herein. 
A vector may include one or more of the following elements: an origin of replication, one or more regulatory sequences (e.g., promoters and/or enhancers) that regulate the expression of the polypeptide of interest, and/or one or more selectable marker genes (e.g., antibiotic resistance genes, or fluorescent protein-encoding genes) .

A number of viral based systems have been developed for gene transfer into mammalian cells. For example, retroviruses provide a convenient platform for gene delivery systems. The heterologous nucleic acid can be inserted into a vector and packaged in retroviral particles using techniques known in the art. The recombinant virus can then be isolated and delivered to the engineered mammalian cell in vitro or ex vivo. A number of retroviral systems are known in the art. In some embodiments, adenovirus vectors are used. A number of adenovirus vectors are known in the art. In some embodiments, lentivirus vectors are used. In some embodiments, self-inactivating lentiviral vectors are used. Self-inactivating lentiviral vectors can be packaged into lentiviruses with protocols known in the art. The resulting lentiviruses can be used to transduce a mammalian cell (e.g., cancer cell) using methods known in the art. Vectors derived from retroviruses such as lentivirus are suitable tools to achieve long-term gene transfer, because they allow long-term, stable integration of a transgene and its propagation in progeny cells. Lentiviral vectors also have low immunogenicity, and can transduce non-proliferating cells.

In some embodiments, the vector is a non-viral vector. In some embodiments, the vector is a transposon, such as a Sleeping Beauty transposon system, or a PiggyBac transposon system. In some embodiments, the vector is a polymer-based non-viral vector, including for example, poly(lactic-co-glycolic acid) (PLGA), poly(lactic acid) (PLA), poly(ethylene imine) (PEI), and dendrimers. In some embodiments, the vector is a cationic-lipid based non-viral vector, such as cationic liposome, lipid nanoemulsion, and solid lipid nanoparticle (SLN). In some embodiments, the vector is a peptide-based non-viral vector, such as poly-L-lysine. Any of the known non-viral vectors suitable for gene editing can be used for introducing the sgRNA or sgRNA ^iBAR-encoding nucleic acid to a cancer cell. See, for example, Yin H. et al. Nature Rev. Genetics (2014) 15: 521-555; Aronovich EL et al. “The Sleeping Beauty transposon system: a non-viral vector for gene therapy.” Hum. Mol. Genet. (2011) R1: R14-20; and Zhao S. et al. “PiggyBac transposon vectors: the tools of the human gene editing.” Transl. Lung Cancer Res. (2016) 5 (1): 120-125, which are incorporated herein by reference. In some embodiments, any one or more of the nucleic acids encoding the sgRNAs or sgRNAs ^iBAR described herein is introduced to a cancer cell by a physical method, including, but not limited to, electroporation, sonoporation, photoporation, magnetofection, and hydroporation.

In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR, and the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system (e.g., Cas nuclease such as Cas9) , are on separate vectors (e.g., viral vector such as lentiviral vector) . In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR, and the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system, are on the same vector. In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR and the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system are operably controlled by separate promoters. In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR and the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system are operably controlled by the same promoter. In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR and the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system are connected by one or more linking sequences such as IRES.

The nucleic acid can be cloned into the vector using any known molecular cloning methods in the art, including, for example, using restriction endonuclease sites and one or more selectable markers. In some embodiments, the nucleic acid is operably linked to a promoter. Varieties of promoters have been explored for gene expression in mammalian cells, and any of the promoters known in the art may be used in the present invention. Promoters may be roughly categorized as constitutive promoters or regulated promoters, such as inducible promoters.

In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR and/or the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system (e.g., Cas9) is operably linked to a constitutive promoter. Constitutive promoters allow heterologous genes (also referred to as transgenes) to be expressed constitutively in the host cells. Exemplary promoters contemplated herein include, but are not limited to, cytomegalovirus immediate-early promoter (CMV IE), human elongation factor-1 alpha promoter (hEF1α), ubiquitin C promoter (UbiC), phosphoglycerokinase promoter (PGK), simian virus 40 early promoter (SV40), chicken β-Actin promoter coupled with CMV early enhancer (CAGG), a Rous Sarcoma Virus (RSV) promoter, a polyoma enhancer/herpes simplex thymidine kinase (MC1) promoter, a beta actin (β-ACT) promoter, and a “myeloproliferative sarcoma virus enhancer, negative control region deleted, d1587rev primer-binding site substituted (MND)” promoter. The efficiencies of such constitutive promoters in driving transgene expression have been compared in many studies.

In some embodiments, the nucleic acid encoding the sgRNA or sgRNA ^iBAR and/or the one or more nucleic acids encoding the one or more elements of the CRISPR/Cas system (e.g., Cas9) is operably linked to an inducible promoter. Inducible promoters belong to the category of regulated promoters. The inducible promoter can be induced by one or more conditions, such as a physical condition, the microenvironment of the cancer cells (e.g., engineered cancer cells) , the physiological state of the cancer cells, an inducer (i.e., an inducing agent) , or a combination thereof. In some embodiments, the inducing condition does not induce the expression of endogenous genes in the engineered cancer cell, and/or in the subject that receives cancer cell therapy. In some embodiments, the inducing condition is selected from the group consisting of: inducer, irradiation (such as ionizing radiation, light) , temperature (such as heat) , redox state, tumor environment, and the activation state of the engineered cancer cell. In some embodiments, the inducible promoter can be an NFAT promoter, a […] promoter, or an NFκB promoter.

Library

The sgRNA libraries described herein comprise one or a plurality of sgRNA constructs, wherein each sgRNA construct comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene. The sgRNA libraries described herein may be designed to target one or a plurality of genomic loci (e.g., a plurality of target sites in one or more hit genes in the genome) according to the needs of a genetic screen. In some embodiments, a single sgRNA construct is designed to target each hit gene. In some embodiments, a plurality of (e.g., at least about 2, 3, 4, 5, 10, 20, 100, or more) sgRNA constructs with different guide sequences targeting a single hit gene may be designed. For example, such plurality of sgRNA constructs may comprise or encode guide sequences targeting different target sites of a single hit gene, such as 3 (or about 6 to about 12) different target sites of a single hit gene.
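The percent complementarity between a guide sequence and a target site can be illustrated with a minimal sketch. The function name `percent_complementarity`, the equal-length antiparallel alignment, and the exclusion of gaps and G:U wobble pairs are assumptions made for illustration only; they are not a definition taken from the present application.

```python
# Watson-Crick pairs between an RNA guide base and a DNA target base.
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percentage of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3' and must have equal length. The guide
    is aligned antiparallel to the target, so guide[i] faces target[-1 - i].
    Gaps and G:U wobble pairs are not modeled (a simplifying assumption).
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum((g, t) in PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * matches / len(guide)


if __name__ == "__main__":
    # Hypothetical 4-nt sequences, far shorter than a real guide.
    print(percent_complementarity("GCAU", "ATGC"))
    print(percent_complementarity("GCAA", "ATGC"))
```

Under these assumptions, a guide would satisfy an "at least about 90% complementary" threshold when the returned value is 90.0 or higher.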

An sgRNA library comprising one or a plurality of sgRNA ^iBAR constructs is also referred to herein as an sgRNA ^iBAR library, in which each sgRNA construct comprises or encodes an iBAR sequence. The sgRNA ^iBAR libraries described herein comprise one or a plurality of sgRNA ^iBAR constructs, wherein each sgRNA ^iBAR construct comprises or encodes an sgRNA ^iBAR, and wherein each sgRNA ^iBAR comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene. The sgRNA ^iBAR libraries described herein may be designed to target one or a plurality of genomic loci (e.g., a plurality of target sites in one or more hit genes in the genome) according to the needs of a genetic screen. In some embodiments, a single sgRNA ^iBAR construct is designed to target each hit gene. In some embodiments, a plurality of (e.g., at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, or more) sgRNA ^iBAR constructs with different guide sequences targeting a single hit gene may be designed. For example, such plurality of sgRNA ^iBAR constructs may comprise or encode guide sequences targeting different target sites of a single hit gene, such as 3 different target sites of a single hit gene.

In some embodiments, the sgRNA ^iBAR library described herein comprises one or a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., 3, 4, 5, or more, such as 4) sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein each sgRNA ^iBAR comprises a guide sequence and an iBAR sequence, wherein the guide sequences for the three or more sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a different target site of a hit gene (e.g., different hit genes, or different sites within the same hit gene) . In some embodiments, each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other. In some embodiments, a single set of sgRNA ^iBAR constructs is designed to target each hit gene. In some embodiments, the sgRNA ^iBAR library comprises a plurality of (e.g., at least about 2, 3, 4, 5, 10, 20, or more) sets of sgRNA ^iBAR constructs with different guide sequences targeting a single hit gene. In some embodiments, the sgRNA ^iBAR library comprises at least 3 sets of sgRNA ^iBAR constructs designed to target 3 different target sites of every hit gene, wherein each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs. In some embodiments, the sgRNA ^iBAR library comprises at least about 100 sets of sgRNA ^iBAR constructs, such as at least about any of 200, 300, 400, 800, 1,000, 2,000, 3,000, 5,000, 10,000, 15,000, 19,000, 20,000, 40,000, 50,000, 100,000, 150,000, 200,000 or more sets of sgRNA ^iBAR constructs. 
In some embodiments, the sgRNA ^iBAR library comprises about 100 to about 30,000 sets of sgRNA ^iBAR constructs, such as about 1000 to about 4000, about 1000 to about 6000, or about 3000 to about 5000 sets of sgRNA ^iBAR constructs.
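The relationship between hit genes, sets, and individual sgRNA ^iBAR constructs described above can be sketched as follows. This is an illustrative data model only: the names `Construct`, `build_ibar_library`, and `count_sets`, the placeholder guide labels, and the example iBAR strings are all hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Construct:
    gene: str   # hit gene targeted by the guide
    guide: str  # guide sequence, shared by every construct in a set
    ibar: str   # internal barcode, unique within a set


def build_ibar_library(guides_by_gene, ibars):
    """Pair every guide with every iBAR.

    One guide together with all of its iBARs forms one set, so constructs
    in a set share the guide and differ only in the iBAR sequence.
    """
    if len(set(ibars)) != len(ibars):
        raise ValueError("iBAR sequences within a set must differ from each other")
    library = []
    for gene, guides in guides_by_gene.items():
        for guide, ibar in product(guides, ibars):
            library.append(Construct(gene, guide, ibar))
    return library


def count_sets(library):
    """A set is identified by its (gene, guide) pair."""
    return len({(c.gene, c.guide) for c in library})


if __name__ == "__main__":
    # Placeholder labels stand in for real guide sequences.
    guides = {"GENE_A": ["guide_1", "guide_2", "guide_3"]}
    ibars = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]  # hypothetical barcodes
    lib = build_ibar_library(guides, ibars)
    print(len(lib), count_sets(lib))
```

With 3 guides per hit gene and 4 iBARs per guide, as in the embodiment above, each hit gene contributes 3 sets and 12 constructs in this sketch.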

In some embodiments, the sgRNA library or sgRNA ^iBAR library comprises at least about any of 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 15,000, 19,000, 20,000, 38,000, 39,000, 40,000, 50,000, 100,000, 150,000, 155,000, 200,000 or more sgRNA constructs or sgRNA ^iBAR constructs. In some embodiments, the sgRNA library or sgRNA ^iBAR library comprises at least about 100 (e.g., at least about any of 200, 300, 400, 600, 1000, 1200, 3000, 6000, 10,000, 20,000, or more) sgRNA constructs or sgRNA ^iBAR constructs, such as at least about 300 or about 400 sgRNA constructs or sgRNA ^iBAR constructs. In some embodiments, the sgRNA library comprises about 1000 to about 300,000 sgRNA constructs, such as about 6000 to about 14,000, about 1000 to about 20,000, about 1000 to about 5000, about 10,000 to about 200,000, about 15,000 to about 20,000, about 100,000 to about 300,000, or about 150,000 to about 180,000 sgRNA constructs. In some embodiments, the sgRNA ^iBAR library comprises about 1000 to about 1,200,000 sgRNA ^iBAR constructs, such as about 1000 to about 20,000, about 10,000 to about 18,000, about 1000 to about 5000, about 10,000 to about 200,000, about 15,000 to about 20,000, about 100,000 to about 300,000, about 300,000 to about 1,200,000, or about 150,000 to about 180,000 sgRNA ^iBAR constructs. In some embodiments, the sgRNA ^iBAR library comprises at least about any of 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 400, 500, 1,000, 2,000, 3,000, 5,000, 10,000, 15,000, 19,000, 20,000, 38,000, 50,000, 100,000, 150,000, 200,000 or more sets of sgRNA ^iBAR constructs, such as about 1000 to about 4000 sets of sgRNA ^iBAR constructs. In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets at least about any of 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 15,000, 19,000, 20,000, 38,000, 50,000 or more genes in a cell or an organism. In some embodiments, the organism is human. 
In some embodiments, the sgRNA library or the sgRNA ^iBAR library is a whole-genome library for protein-coding genes and/or non-coding RNAs. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is a whole-genome library for every annotated gene. In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets at least about any of 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of the genes in a cell or an organism. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is a targeted library, which targets selected genes in a signaling pathway or associated with a cellular process, such as sensitivity or resistance to anti-cancer drug-mediated killing, cell proliferation, cell cycle, transcriptional regulation, ubiquitination, apoptosis, immune response such as autoimmunity, tumor metastasis, tumor malignant transformation, etc. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is used for a genome-wide screen associated with a particular modulated phenotype, such as sensitivity or resistance to anti-cancer drug-mediated killing. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is used for a genome-wide screen to identify at least one target gene associated with a particular modulated phenotype, such as a target gene in a cancer cell that modulates the activity of the cancer cell in response to an anti-cancer drug. In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets “cancer-related genes, ” e.g., genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients, and/or genes whose RNA expression level is up-regulated or down-regulated by at least about 1.2-fold (e.g., at least about any of 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more, such as about 2-fold) in cancer patients (e.g., based on literature or databases) . 
In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets genes whose encoded mRNA and/or protein is expressed within cells (in healthy cells or in cancer cells) . In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets genes whose encoded protein is expressed on the cell surface (in healthy cells or in cancer cells) . In some embodiments, the sgRNA library or the sgRNA ^iBAR library targets genes i) whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients (e.g., based on literature or databases) , ii) whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) , and iii) whose encoded mRNA or protein is expressed within the cell, or whose encoded protein is expressed on the cell surface, either in healthy cells or in cancer cells. Thus in some embodiments, the sgRNA library comprising a plurality of sgRNA constructs comprises or encodes sgRNAs with guide sequences complementary to target sites of cancer-related genes, such as target sites of about 1323 colorectal cancer-related genes in the human genome with DNA mutation frequency ≥5% and RNA expression level up- or down-regulated by more than 2-fold from patients with stage III and IV colorectal cancer, with gene product either expressed in cell or on cell surface. 
In some embodiments, the sgRNA ^iBAR library comprising a plurality of sgRNA ^iBAR constructs comprises or encodes sgRNAs ^iBAR with guide sequences complementary to target sites of cancer-related genes, such as target sites of about 1323 colorectal cancer-related genes in the human genome with DNA mutation frequency ≥5% and RNA expression level up- or down-regulated by more than 2-fold from patients with stage III and IV colorectal cancer, with gene product either expressed in cell or on cell surface. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is designed to target a eukaryotic genome, such as a mammalian genome. Exemplary genomes of interest include genomes of a rodent (mouse, rat, hamster, guinea pig) , a domesticated animal (e.g., cow, sheep, cat, dog, horse, or rabbit) , a non-human primate (e.g., monkey) , fish (e.g., zebrafish) , non-vertebrate (e.g., Drosophila melanogaster and Caenorhabditis elegans) , and human.
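The three selection criteria for cancer-related genes recited above (mutation frequency, expression fold change, and expression of the gene product) can be expressed as a simple filter. The sketch below is illustrative only: the `GeneRecord` fields, the function names, and the convention that fold change is a tumor-to-normal ratio (so that down-regulation by more than 2-fold corresponds to a ratio below 0.5) are assumptions, and any records passed to it would need to come from the literature or databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    mutation_frequency: float      # fraction of patients with a DNA mutation
    expression_fold_change: float  # tumor / normal RNA level ratio, > 0
    expressed: bool                # product found in the cell or on its surface


def is_cancer_related(gene, min_mut_freq=0.05, min_fold=2.0):
    """Apply criteria i) to iii): mutation frequency, fold change, expression."""
    up = gene.expression_fold_change > min_fold
    down = gene.expression_fold_change < 1.0 / min_fold
    return gene.mutation_frequency >= min_mut_freq and (up or down) and gene.expressed


def select_hit_genes(genes, min_mut_freq=0.05, min_fold=2.0):
    """Return the symbols of all records that pass every criterion."""
    return [g.symbol for g in genes if is_cancer_related(g, min_mut_freq, min_fold)]
```

The default thresholds mirror the ≥5% and more-than-2-fold values given above; other embodiments (e.g., the 1.2-fold threshold) could be modeled by changing `min_fold`.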

The guide sequences of the sgRNA libraries or the sgRNA ^iBAR libraries may be designed using any known algorithms that identify CRISPR/Cas target sites in user-defined lists with a high degree of targeting specificity in the human genome, such as Genomic Target Scan (GT-Scan) (see O’Brien et al., Bioinformatics (2014) 30: 2673-2675) , DeepCRISPR, CasFinder, CHOPCHOP, CRISPRscan, etc. In some embodiments, at least about any of 100, 400, 500, 1,000, 3,000, 5,000, 10,000, 15,000, 19,000, 20,000, 50,000, 100,000, 150,000, 155,000, 200,000, or more sgRNA constructs or sgRNA ^iBAR constructs can be generated on a single array. This approach can also be scaled up to enable genome-wide screens by the synthesis of multiple sgRNA libraries or sgRNA ^iBAR libraries in parallel. The exact number of sgRNA constructs in an sgRNA library, or the exact number of sgRNA ^iBAR constructs (or sets of sgRNA ^iBAR constructs) in an sgRNA ^iBAR library, can depend on whether the screen 1) targets genes or regulatory elements, and 2) targets the complete genome or a subgroup of the genes in the genome.

In some embodiments, the sgRNA library or the sgRNA ^iBAR library is designed to target every PAM sequence overlapping a gene in a genome, wherein the PAM sequence corresponds to the Cas protein. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is designed to target a subset of the PAM sequences found in the genome, wherein the PAM sequence corresponds to the Cas protein.
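Enumerating every PAM sequence that overlaps a gene can be sketched for the NGG PAM recognized by wild-type Cas9. The function names, the 20-nt guide length, and the requirement of a full-length protospacer immediately 5' of the PAM are assumptions for illustration; dedicated design tools such as those listed above also score specificity, which this sketch does not attempt.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_protospacers(seq: str, guide_len: int = 20):
    """List (strand, protospacer, pam) for every NGG PAM on both strands.

    A site is reported only when guide_len bases are available immediately
    5' of the PAM. Sequences are returned 5'->3' on the strand searched.
    """
    hits = []
    plus = seq.upper()
    for strand, s in (("+", plus), ("-", reverse_complement(plus))):
        # The lookahead reports overlapping PAMs, e.g. both sites in "AGGG".
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= guide_len:
                hits.append((strand, s[pam_start - guide_len:pam_start], m.group(1)))
    return hits
```

Targeting only a subset of the PAM sequences, as in the second embodiment above, would amount to filtering the returned list.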

In some embodiments, the sgRNA library comprises one or more control sgRNA constructs that do not target any genomic loci in a genome. In some embodiments, sgRNA constructs that do not target putative genomic genes can be included in an sgRNA library as negative controls. In some embodiments, the sgRNA ^iBAR library comprises one or more control sgRNA ^iBAR constructs that do not target any genomic loci in a genome. In some embodiments, sgRNA ^iBAR constructs that do not target putative genomic genes can be included in an sgRNA ^iBAR library as negative controls. In some embodiments, the sgRNA library (or sgRNA ^iBAR library) comprises one or more control sgRNA constructs (or control sgRNA ^iBAR constructs) that target non-cancer related genes, e.g., genes whose expression (RNA level or protein level) does not differ by at least 1.2-fold (e.g., at least about any of 1.5, 2, 2.5, 3, 4, 5 folds, or more) between cancer patients and healthy individuals, such as genes whose expression levels differ by less than 2-fold between cancer patients and healthy individuals; and/or genes whose mutation frequency is less than about 5% (e.g., less than about any of 4%, 3%, 2%, or 1%) in cancer patients.

The sgRNA constructs and libraries described herein may be prepared using any known nucleic acid synthesis and/or molecular cloning methods in the art. In some embodiments, the sgRNA library is synthesized by electrochemical means on arrays (e.g., CustomArray, Twist, Gen9) , DNA printing (e.g., Agilent) , or solid phase synthesis of individual oligos (e.g., by IDT) . The sgRNA constructs can be amplified by PCR and cloned into an expression vector (e.g., a lentiviral vector) . In some embodiments, the lentiviral vector further encodes one or more components of the CRISPR/Cas-based genetic editing system, such as the Cas protein, e.g., Cas9.

The present invention in some embodiments provides isolated nucleic acids encoding any of the sgRNA constructs, sgRNA ^iBAR constructs, sets of sgRNA ^iBAR constructs, sgRNA library, or sgRNA ^iBAR library described herein. Also provided are vectors (e.g., non-viral vector, or viral vector such as lentiviral vector) and virus (e.g., lentivirus) comprising any of the nucleic acids encoding any of the sgRNA constructs, sgRNA ^iBAR constructs, sets of sgRNA ^iBAR constructs, sgRNA library, or sgRNA ^iBAR library described herein.

Cas protein

The sgRNA constructs or sgRNA ^iBAR constructs described herein may be designed to operate with any one of the naturally-occurring or engineered CRISPR/Cas systems known in the art. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Type I CRISPR/Cas system. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Type II CRISPR/Cas system. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Type III CRISPR/Cas system. Exemplary CRISPR/Cas systems can be found in WO2013176772, WO2014065596, WO2014018423, WO2016011080, US8697359, US8932814, US10113167B2, the disclosures of which are incorporated herein by reference in their entireties for all purposes.

In certain embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Cas protein derived from a CRISPR/Cas type I, type II, or type III system, which has an RNA-guided polynucleotide binding and/or nuclease activity. Examples of such Cas proteins are recited in, e.g., WO2014144761, WO2014144592, WO2013176772, US20140273226, and US20140273233, which are incorporated herein by reference in their entireties.

In certain embodiments, the Cas protein is derived from a type II CRISPR-Cas system. In certain embodiments, the Cas protein is or is derived from a Cas9 protein. In certain embodiments, the Cas protein is or is derived from a bacterial Cas9 protein, including those identified in WO2014144761.

In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with Cas9 (also known as Csn1 and Csx12) , a homolog thereof, or a modified version thereof. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with two or more (e.g., 2, 3, 4, 5, or more) Cas proteins. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Cas9 protein from S. pyogenes or S. pneumoniae. Cas enzymes are known in the art; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2.

The Cas protein (also referred to herein as “Cas nuclease” ) provides a desired activity, such as target binding, target nicking or cleaving activity. In certain embodiments, the desired activity is target binding. In certain embodiments, the desired activity is target nicking or target cleaving. In certain embodiments, the desired activity also includes a function provided by a polypeptide that is covalently fused to a Cas protein or a nuclease-deficient Cas protein. Examples of such a desired activity include a transcription regulation activity (either activation or repression) , an epigenetic modification activity, or a target visualization/identification activity.

In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a Cas nuclease that cleaves the target sequence, including double-strand cleavage and single-strand cleavage. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a catalytically inactive Cas ( “dCas” ) . In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a dCas of a CRISPR activation ( “CRISPRa” ) system, wherein the dCas is fused to a transcriptional activator. In some embodiments, the sgRNA construct or the sgRNA ^iBAR construct is operable with a dCas of a CRISPR interference (CRISPRi) system. In some embodiments, the dCas is fused to a repressor domain, such as a KRAB domain. Such CRISPR/Cas systems can be used to modulate (e.g., induce, repress, increase, or reduce) gene expression.

In certain embodiments, the Cas protein is a mutant of a wild type Cas protein (such as Cas9) or a fragment thereof. A Cas9 protein generally has at least two nuclease (e.g., DNase) domains. For example, a Cas9 protein can have a RuvC-like nuclease domain and an HNH-like nuclease domain. The RuvC and HNH domains work together to cut both strands in a target site to make a double-stranded break in the target polynucleotide. (Jinek et al., Science 337: 816-21) . In certain embodiments, a mutant Cas9 protein is modified to contain only one functional nuclease domain (either a RuvC-like or an HNH-like nuclease domain) . For example, in certain embodiments, the mutant Cas9 protein is modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., the nuclease activity is absent) . In some embodiments where one of the nuclease domains is inactive, the mutant is able to introduce a nick into a double-stranded polynucleotide (such protein is termed a “nickase” ) but not able to cleave the double-stranded polynucleotide. In certain embodiments, the Cas protein is modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. In certain embodiments, the Cas protein is truncated or modified to optimize the activity of the effector domain. In certain embodiments, both the RuvC-like nuclease domain and the HNH-like nuclease domain are modified or eliminated such that the mutant Cas9 protein is unable to nick or cleave the target polynucleotide. In certain embodiments, a Cas9 protein that lacks some or all nuclease activity relative to a wild-type counterpart, nevertheless, maintains target recognition activity to a greater or lesser extent.

In certain embodiments, the Cas protein is a fusion protein comprising a naturally-occurring Cas or a variant thereof fused to another polypeptide or an effector domain. The other polypeptide or effector domain may be, for example, a cleavage domain, a transcriptional activation domain, a transcriptional repressor domain, or an epigenetic modification domain. In certain embodiments, the fusion protein comprises a modified or mutated Cas protein in which all the nuclease domains have been inactivated or deleted. In certain embodiments, the RuvC and/or HNH domains of the Cas protein are modified or mutated such that they no longer possess nuclease activity.

In certain embodiments, the effector domain of the fusion protein is a cleavage domain obtained from any endonuclease or exonuclease with desirable properties.

In certain embodiments, the effector domain of the fusion protein is a transcriptional activation domain. In general, a transcriptional activation domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (e.g., transcription factors, RNA polymerases, etc. ) to increase and/or activate transcription of a gene. In certain embodiments, the transcriptional activation domain is a herpes simplex virus VP16 activation domain, VP64 (which is a tetrameric derivative of VP16) , an NFκB p65 activation domain, p53 activation domains 1 and 2, a CREB (cAMP response element binding protein) activation domain, an E2A activation domain, or an NFAT (nuclear factor of activated T-cells) activation domain. In certain embodiments, the transcriptional activation domain is Gal4, Gcn4, MLL, Rtg3, Gln3, Oaf1, Pip2, Pdr1, Pdr3, Pho4, or Leu3. The transcriptional activation domain may be wild type, or a modified or truncated version of the original transcriptional activation domain.

In certain embodiments, the effector domain of the fusion protein is a transcriptional repressor domain, such as inducible cAMP early repressor (ICER) domains, Kruppel-associated box A (KRAB-A) repressor domains, YY1 glycine rich repressor domains, Sp1-like repressors, E(spl) repressors, IκB repressor, or MeCP2.

In certain embodiments, the effector domain of the fusion protein is an epigenetic modification domain which alters gene expression by modifying the histone structure and/or chromosomal structure, such as a histone acetyltransferase domain, a histone deacetylase domain, a histone methyltransferase domain, a histone demethylase domain, a DNA methyltransferase domain, or a DNA demethylase domain.

In certain embodiments, the Cas protein further comprises at least one additional domain, such as a nuclear localization signal (NLS) , a cell-penetrating or translocation domain, and a marker domain (e.g., a fluorescent protein marker) .

The Cas protein can be introduced into cancer cells as (i) a Cas protein, (ii) an mRNA encoding the Cas protein, or (iii) a linear or circular DNA encoding the Cas protein. The Cas protein or construct encoding the Cas protein may be purified, or non-purified in a composition. Methods of introducing a protein or nucleic acid construct into a host cell are well known in the art, and are applicable to all methods described herein which require introduction of a Cas protein or construct thereof into a cancer cell. In certain embodiments, the Cas protein is delivered into a cancer cell as a protein. In certain embodiments, the Cas protein is constitutively expressed from an mRNA or a DNA in a host cancer cell (e.g., engineered cancer cell) . In certain embodiments, the expression of Cas protein from mRNA or DNA is inducible or induced in a host cancer cell. In certain embodiments, a Cas protein can be introduced into a host cancer cell as a Cas protein: sgRNA complex using recombinant technology known in the art. Exemplary methods of introducing a Cas protein or construct thereof have been described, e.g., in WO2014144761, WO2014144592, and WO2013176772, which are incorporated herein by reference in their entireties.

In some embodiments, the method uses a CRISPR/Cas9 system. Cas9 is a nuclease from the microbial type II CRISPR (clustered regularly interspaced short palindromic repeats) system, which has been shown to cleave DNA when paired with a single-guide RNA (sgRNA) . The sgRNA directs Cas9 to complementary regions in the target gene, which may result in site-specific double-strand breaks (DSBs) that can be repaired in an error-prone fashion by cellular non-homologous end joining (NHEJ) machinery. Wild-type Cas9 primarily cleaves genomic sites at which the sequence matching the sgRNA guide sequence is followed by a PAM sequence (NGG) . NHEJ-mediated repair of Cas9-induced DSBs induces a wide range of mutations initiated at the cleavage site, which are typically small (<10 bp) insertions/deletions (indels) but can include larger (>100 bp) indels.

Cancer cell library

The cancer cell library described herein comprises a plurality of (e.g., at least about any of 2, 3, 4, 5, 10, 100, 1×10 ³, 1×10 ⁴, 1×10 ⁵, 1×10 ⁶, 1×10 ⁷, 2×10 ⁷, 1×10 ⁸ or more) cancer cells, wherein each of the plurality of cancer cells has a mutation (e.g., inactivating mutation) at a hit gene in the genome (e.g., human genome) , and wherein the hit genes in at least two of the plurality of cancer cells are different from each other.

In some embodiments, the cancer cell library comprises a plurality of cancer cells that have mutations (e.g., inactivating mutations) in at least about any of 2, 3, 4, 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, or more hit genes in a cell or organism. In some embodiments, the organism is human. In some embodiments, the cancer cell library comprises a plurality of cancer cells that have mutations (e.g., inactivating mutations) at about 100 to about 30,000 hit genes, such as about 500 to about 5000, or about 1000 to about 1500 hit genes. In some embodiments, the cancer cell library comprises at least about any of 2, 3, 4, 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 1×10 ⁴, 2×10 ⁴, 5×10 ⁴, 1×10 ⁵, 2×10 ⁵, 1×10 ⁶, 5×10 ⁶, 1×10 ⁷, 1.5×10 ⁷, 2×10 ⁷, 1×10 ⁸, 1×10 ⁹, 1×10 ¹⁰, or more cancer cells. In some embodiments, at least two cancer cells within the cancer cell library have mutations (e.g., inactivating mutation) at different target sites (e.g., different hit genes, or different sites within the same hit gene) . In some embodiments, each cancer cell within the cancer cell library has a mutation (e.g., inactivating mutation) at a different hit gene. In some embodiments, each cancer cell within the cancer cell library has a mutation (e.g., inactivating mutation) at a different target site (e.g., can be within the same hit gene, or within different hit genes) . In some embodiments, the cancer cell library does not contain cancer cells that have mutation (e.g., inactivating mutation) at the same hit gene, such as inactivating mutation at the same target site of the same hit gene, or inactivating mutations at different target sites of the same hit gene. In some embodiments, the cancer cell library does not contain cancer cells that have mutation (e.g., inactivating mutation) at the same target site. 
In some embodiments, the plurality of (e.g., at least about 2, 3, 4, 5, 10, 100, 500, 1000, 2000, 5000, 10000, 2×10 ⁷, or more) cancer cells within the cancer cell library have a mutation (e.g., inactivating mutation) at the same hit gene, such as inactivating mutation at the same target site of the same hit gene, or inactivating mutations at different target sites of the same hit gene. In some embodiments, the cancer cell library comprises a plurality of cancer cells that contain mutations (e.g., inactivating mutations) in at least about 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%, 95%, or more hit genes in the genome. In some embodiments, the cancer cell library comprises a plurality of cancer cells that contain mutations (e.g., inactivating mutations) at all genes in the genome (also referred to herein as “whole-genome cancer cell library” ) , such as all annotated genes of the human genome. In some embodiments, for each annotated gene in the genome or for each hit gene, there are at least two (e.g., 2, 3, 4, 5, or more, such as 3) cancer cells in the cancer cell library that each contains a mutation (e.g., inactivating mutation) in a different target site of the same hit gene, e.g., cancer cell A contains a mutation (e.g., inactivating mutation) in target site A’ of gene X, cancer cell B contains a mutation (e.g., inactivating mutation) in target site B’ of gene X, and cancer cell C contains a mutation (e.g., inactivating mutation) in target site C’ of gene X. In some embodiments, the cancer cell library is a targeted library, which contains mutations (e.g., inactivating mutations) at selected genes in a signaling pathway or associated with a cellular process, such as sensitivity or resistance to anti-cancer drug-mediated killing, cell proliferation, cell cycle, transcriptional regulation, ubiquitination, apoptosis, immune response such as autoimmune, tumor metastasis, tumor malignant transformation, etc. 
In some embodiments, the cancer cell library is used for a genome-wide screen associated with a particular modulated phenotype, such as sensitivity or resistance to anti-cancer drug-mediated killing. In some embodiments, the cancer cell library is used for a genome-wide screen to identify at least one target gene associated with a particular modulated phenotype, such as a target gene in a cancer cell that modulates the activity of the cancer cell in response to anti-cancer drug treatment. In some embodiments, the cancer cell library is a mammalian cancer cell library. Exemplary genomes of interest covered by the cancer cell library include genomes of a rodent (mouse, rat, hamster, guinea pig) , a domesticated animal (e.g., cow, sheep, cat, dog, horse, or rabbit) , a non-human primate (e.g., monkey) , fish (e.g., zebrafish) , non-vertebrate (e.g., Drosophila melanogaster and Caenorhabditis elegans) , and human. In some embodiments, the cancer cell library is a human cancer cell library, such as a human colorectal cancer cell library.

In some embodiments, the cancer cell library contains mutations at “cancer-related genes, ” e.g., genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients, and/or genes whose RNA expression level is up-regulated or down-regulated by at least about 1.2-fold (e.g., at least about any of 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more, such as about 2-fold) in cancer patients (e.g., based on literature or databases) . In some embodiments, the cancer cell library contains mutations at genes whose encoded mRNA and/or protein is expressed within cells (in healthy cells or in cancer cells) . In some embodiments, the cancer cell library contains mutations at genes whose encoded protein is expressed on the cell surface (in healthy cells or in cancer cells) . In some embodiments, the cancer cell library contains mutations at genes i) whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients (e.g., based on literature or databases) , ii) whose RNA expression level is up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100 folds, or more) in cancer patients (e.g., based on literature or databases) , and iii) whose encoded mRNA or protein is expressed within the cell, or whose encoded protein is expressed on the cell surface, either in healthy cells or in cancer cells. In some embodiments, the cancer cell library contains mutations at about 1323 colorectal cancer-related genes in the human genome with DNA mutation frequency ≥5% and RNA expression level up- or down-regulated by more than 2-fold from patients with stage III and IV colorectal cancer, with gene product either expressed in cell or on cell surface.

In some embodiments, a plurality of (e.g., about 2, 3, 4, 5, 10, 100, 500, 1000, 2000, 5000, 10000, or more) cancer cells within a cancer cell library have a mutation (e.g., inactivating mutation) at the same hit gene. Such a cancer cell library is also referred to as “having X-fold coverage for the hit gene,” wherein “X” is the number of cancer cells with a mutation (e.g., inactivating mutation) at the same hit gene. For example, a cancer cell library that targets about 1000 hit genes and comprises about 2×10⁷ cancer cells has about 20,000-fold coverage for each hit gene. In some embodiments, the cancer cell library described herein has at least about 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 100-fold, 200-fold, 500-fold, 1,000-fold, 2,000-fold, 5,000-fold, 10,000-fold, or more fold coverage of each hit gene (e.g., cancer-related genes), such as an average of about 600-fold to about 12,000-fold, about 600-fold to about 1200-fold, or about 1200-fold to about 12,000-fold for each hit gene. In some embodiments, the Cas9 ⁺ sgRNA cancer cell library has an average of about 600-fold to about 1200-fold coverage for each sgRNA. In some embodiments, the Cas9 ⁺ sgRNA (or mutagenic agent-induced mutation) cancer cell library described herein has an average of about 600-fold to about 1200-fold coverage of each hit gene (e.g., cancer-related genes). In some embodiments, the Cas9 ⁺ sgRNA ^iBAR cancer cell library has an average of about 100-fold to about 1,000-fold, such as about 1000-fold, coverage for each sgRNA ^iBAR. In some embodiments, the Cas9 ⁺ sgRNA ^iBAR cancer cell library has an average of about 400-fold to about 4000-fold, such as about 4000-fold, coverage for each set of sgRNAs ^iBAR. In some embodiments, the Cas9 ⁺ sgRNA ^iBAR cancer cell library described herein has an average of about 1200-fold to about 12,000-fold, such as about 12,000-fold, coverage of each hit gene (e.g., cancer-related genes).
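The fold-coverage arithmetic in the example above can be sketched as follows. This is a minimal illustration only; it assumes every cell carries a mutation at exactly one hit gene and that cells are spread evenly across hit genes, and the function name `fold_coverage` is hypothetical.

```python
def fold_coverage(total_cells: int, n_targets: int) -> float:
    """Average number of cells per target (hit gene or sgRNA).

    Assumes one mutated target per cell and an even distribution
    of cells across targets (illustrative simplification).
    """
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    return total_cells / n_targets


# Worked example from the text: about 1000 hit genes, about 2x10^7 cells.
coverage = fold_coverage(20_000_000, 1000)
print(f"{coverage:,.0f}-fold coverage per hit gene")  # 20,000-fold coverage per hit gene
```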

Mutations at hit genes

In some embodiments, all annotated genes in the genome (e.g., human genome) are selected as hit genes. In some embodiments, genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) in cancer patients (e.g., based on literature or databases) are selected as hit genes. In some embodiments, genes whose RNA expression levels are up-regulated or down-regulated by at least about 1.2-fold (e.g., at least about any of 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, or 100 fold, or higher, such as about 2-fold) in cancer patients (e.g., based on literature or databases) are selected as hit genes. In some embodiments, genes whose DNA mutation frequency is at least about 5% (e.g., at least about any of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or higher) and whose RNA expression levels are up-regulated or down-regulated by more than about 2-fold (e.g., more than about any of 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 50, or 100 fold, or more) in cancer patients (e.g., based on literature or databases), such as in patients with stage III and/or IV colorectal cancer, are selected as hit genes. In some embodiments, a hit gene is further selected on the basis that the encoded mRNA or protein is expressed within a cell, or that the encoded protein is expressed on the cell surface, either in healthy cells or in cancer cells.
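The combined selection criteria above (mutation frequency of at least about 5%, expression change of more than about 2-fold in either direction, and expression of the gene product) can be sketched as a simple filter. The record layout, the gene labels, and the numeric values below are hypothetical and serve only to illustrate the thresholds; expression change is assumed to be given as a tumor/normal ratio.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    symbol: str                # hypothetical gene label
    mutation_frequency: float  # fraction of patients carrying a DNA mutation
    expression_ratio: float    # tumor/normal RNA expression ratio (> 0)
    expressed: bool            # product expressed within the cell or on its surface


def is_hit_gene(gene, min_mutation_frequency=0.05, min_fold=2.0):
    """Apply the three criteria: mutation frequency, fold change, expression."""
    up_regulated = gene.expression_ratio > min_fold
    down_regulated = gene.expression_ratio < 1.0 / min_fold
    return (
        gene.mutation_frequency >= min_mutation_frequency
        and (up_regulated or down_regulated)
        and gene.expressed
    )


# Made-up records, one per outcome of the filter.
candidates = [
    GeneRecord("GENE_A", 0.12, 3.1, True),   # mutated, up-regulated, expressed
    GeneRecord("GENE_B", 0.08, 0.4, True),   # mutated, down-regulated, expressed
    GeneRecord("GENE_C", 0.02, 5.0, True),   # mutation frequency too low
    GeneRecord("GENE_D", 0.30, 1.3, True),   # expression change too small
    GeneRecord("GENE_E", 0.30, 4.0, False),  # product not expressed
]
hit_genes = [g.symbol for g in candidates if is_hit_gene(g)]
print(hit_genes)  # ['GENE_A', 'GENE_B']
```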

In some embodiments, the mutation at a hit gene is a pathogenic or inactivating mutation. An inactivating mutation described herein can be any mutation, such as insertion, deletion (indels) , substitution, frame shift, chromosomal rearrangement, or combinations thereof, that leads to complete abolishment or elimination of a gene’s expression (transcription and/or translation) and/or function. Inactivating mutations in some embodiments can completely abolish the transcription, translation, post-translation modification, association with other molecules (e.g., other molecules in a protein complex) , and/or function (e.g., signal transduction or receptor activation) of a gene. In some embodiments, the mutation at a hit gene is a mutation that reduces (e.g., reduces at least about any of 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more) or affects (e.g., disrupts) one or more of hit gene transcription, hit gene translation, hit gene mRNA processing, hit gene mRNA stability, hit gene mRNA function, hit gene protein function, association with other molecules (e.g., other molecules in a protein complex) , and hit gene post-translation modification. The mutation (e.g., inactivating mutation) at a hit gene can be within one or more of regulatory region such as enhancer, promoter, 5’ untranslated region (UTR) , 3’UTR, or the coding region such as an exon or a splicing site, of a hit gene. A hit gene described herein can be any genomic sequence, such as a protein-encoding gene, a gene encoding an RNA, such as a small RNA (e.g., microRNA, piRNA, siRNA, snoRNA, tRNA, rRNA and snRNA) , a ribosomal RNA, a long non-coding RNA (lincRNA) , or a mitochondrial gene. The hit gene may be known to be associated with a particular phenotype (e.g., cancer phenotype) ; or has not been implicated in a particular phenotype, such as a known gene that is not known to be associated with a particular phenotype, or an unknown gene that has not been characterized. 
In some embodiments, the hit gene is a genomic sequence that does not encode anything, or not yet known to encode anything.

Pathogenic inactivating (loss-of-function) mutations of certain genes can be determined by review of experimental evidence within the published scientific literature and review of critical regions that may be disrupted, including but not limited to frameshift mutations, missense mutations, truncating mutations, deletions, copy number variations, nonsense mutations, and loss or deletion of the gene. Pathogenic or inactivating mutations include, but are not limited to, homozygous deletions, bi-allelic (double hit) mutations, splice site mutations (e.g., a 2nd or an additional splice site mutation), frameshift mutations and nonsense mutations in the coding region, and missense mutations with confirmed impact.

In some embodiments, the cancer cell library is generated by subjecting (e.g., contacting) an initial population of cancer cells to mutagenic agents. Mutagenic agents can be classified into three categories: physical (e.g., gamma rays, ultraviolet radiations) , chemical (e.g., ethyl methane sulphonate or EMS) and transposable elements (such as transposons, retrotransposons, T-DNA, retroviruses) .

In some embodiments, the cancer cell library is generated by subjecting an initial population of cancer cells to gene editing. Any known gene editing methods can be used for generating cancer cell libraries described herein, such as Zinc-finger nucleases (ZFNs) , transcription activator-like effector nucleases (TALENs) , and CRISPR/Cas-based methods for gene editing or genome engineering. See, e.g., Gaj et al. (Trends Biotechnol. 2013; 31 (7) : 397–405) . In some embodiments, the cancer cell library is generated by subjecting an initial population of cancer cells to gene editing via CRISPR/Cas-based methods.

In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA library or an sgRNA ^iBAR library described herein; and ii) a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., Cas9), under a condition that allows introduction of the sgRNA constructs or sgRNA ^iBAR constructs and the Cas component into the initial population of cancer cells and generation of mutations at the hit genes. Hence in some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA library comprising a plurality of sgRNA constructs, wherein each sgRNA construct comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a corresponding hit gene; and ii) a Cas component comprising a Cas protein (e.g., Cas9) or a nucleic acid encoding the Cas protein, under a condition that allows introduction of the sgRNA constructs and the Cas component into the initial population of cancer cells and generation of mutations at the hit genes. 
In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells with i) an sgRNA ^iBAR library comprising a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprises three or more (e.g., 3, 4, 5, or more, such as 4) sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein the guide sequences for the three or more sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the three or more sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary (e.g., at least about any of 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a different target site of a hit gene (e.g., different hit genes, or different sites within the same hit gene); and ii) a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein, under a condition that allows introduction of the sgRNA ^iBAR constructs and the Cas component into the initial population of cancer cells and generation of mutations at the hit genes. In some embodiments, each set of sgRNA ^iBAR constructs comprises four sgRNA ^iBAR constructs, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other. In some embodiments, the sgRNA library or the sgRNA ^iBAR library, and the Cas component, are introduced into the initial population of cancer cells via separate vectors (e.g., lentiviral vectors) or separate viruses. In some embodiments, the sgRNA library or the sgRNA ^iBAR library, and the Cas component, are introduced into the initial population of cancer cells via the same vector or the same virus. 
In some embodiments, the sgRNA library or the sgRNA ^iBAR library is introduced into the initial population of cancer cells via lentiviral vectors or lentiviruses, and the Cas component is introduced into the initial population of cancer cells as mRNA encoding the Cas component (e.g., Cas9) . In some embodiments, the initial population of cancer cells already each carries a Cas component (e.g., transgenic Cas9, or Cas9 introduced as mRNA; hereinafter also referred to as “Cas9 ⁺ cancer cells” ) , and the sgRNA library or the sgRNA ^iBAR library is then introduced into each cell via a vector (e.g., lentiviral vector) or virus (e.g., lentivirus) .
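The structure of an sgRNA ^iBAR library described above (sets of constructs sharing one guide sequence, each construct carrying a different iBAR) can be sketched as a small data structure. This is an illustration under stated assumptions: the 6-nucleotide barcode length, the random assignment of barcodes, the placeholder guide labels, and the function name `make_ibar_sets` are all hypothetical and are not taken from the present description.

```python
import itertools
import random


def make_ibar_sets(guides, ibars_per_set=4, ibar_length=6, seed=0):
    """Pair each guide with `ibars_per_set` mutually different iBAR barcodes.

    Returns {guide: [(guide, ibar), ...]}; every construct in a set shares
    the guide and differs only in its barcode.
    """
    rng = random.Random(seed)
    # All possible barcodes of the assumed length (4**6 = 4096 for 6 nt).
    pool = ["".join(p) for p in itertools.product("ACGT", repeat=ibar_length)]
    library = {}
    for guide in guides:
        barcodes = rng.sample(pool, ibars_per_set)  # sampled without replacement
        library[guide] = [(guide, barcode) for barcode in barcodes]
    return library


# Placeholder guide labels stand in for real guide sequences.
library = make_ibar_sets(["GUIDE_1", "GUIDE_2", "GUIDE_3"])
for guide, constructs in library.items():
    print(guide, [ibar for _, ibar in constructs])
```

Because the barcodes within a set are sampled without replacement, the four constructs of a set can serve as internal replicates of the same guide.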

In some embodiments, the cancer cell library only comprises the sgRNA library or the sgRNA ^iBAR library described herein and does not comprise a Cas component (e.g., Cas9), i.e., the hit genes targeted by the sgRNA library or the sgRNA ^iBAR library are not inactivated in the cancer cell library until a Cas component (e.g., Cas9) is further introduced. Cancer cell libraries only comprising an sgRNA library or an sgRNA ^iBAR library described herein are referred to hereinafter as “sgRNA cancer cell library” or “sgRNA ^iBAR cancer cell library.” In some embodiments, the cancer cell library comprises both the sgRNA library or the sgRNA ^iBAR library, and the Cas component (e.g., Cas9), i.e., the cancer cell library comprises inactivated hit genes. In some embodiments, the initial population of cancer cells expresses a Cas protein. In some embodiments, the cancer cell library is generated by contacting an initial population of cancer cells expressing a Cas protein with an sgRNA library or an sgRNA ^iBAR library described herein, which results in a cancer cell library comprising inactivated hit genes. Cancer cell libraries comprising an sgRNA library or an sgRNA ^iBAR library described herein, and a Cas9 component (e.g., Cas9 protein, or a nucleic acid encoding it) are referred to hereinafter as “Cas9 ⁺ sgRNA cancer cell library” or “Cas9 ⁺ sgRNA ^iBAR cancer cell library.”

In some embodiments, the Cas component (e.g., Cas9) is introduced into the cancer cells before the introduction of the sgRNA library or the sgRNA ^iBAR library. In some embodiments, the cancer cells are sorted to obtain Cas ⁺ cancer cells before the introduction of the sgRNA library or the sgRNA ^iBAR library. In some embodiments, the sgRNA library or the sgRNA ^iBAR library is introduced into the cancer cells before the introduction of the Cas component (e.g., Cas9) . In some embodiments, the cancer cells are sorted to obtain sgRNA ⁺ or sgRNA ^iBAR+ cancer cells before the introduction of the Cas component (e.g., Cas9) . In some embodiments, the Cas component (e.g., Cas9) and the sgRNA library or the sgRNA ^iBAR library are introduced into the cancer cells at the same time. In some embodiments, the cancer cells are sorted to obtain Cas ⁺ sgRNA ⁺ cancer cells (Cas ⁺ sgRNA ⁺ cancer cell library) or Cas ⁺sgRNA ^iBAR+ cancer cells (Cas ⁺ sgRNA ^iBAR+ cancer cell library) , before the drug-treatment.

In some embodiments, at least about 50% (such as at least about any of 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more) of the sgRNA constructs in the sgRNA library, or the sgRNA ^iBAR constructs in the sgRNA ^iBAR library, or the sets of sgRNA ^iBAR constructs in the sgRNA ^iBAR library, are introduced into the initial population of cancer cells, or Cas9 ⁺ cancer cells described herein. In some embodiments, at least about 95% (e.g., at least about any of 96%, 97%, 98%, 99%, or more) of the sgRNA constructs in the sgRNA library, or the sgRNA ^iBAR constructs in the sgRNA ^iBAR library, or the sets of sgRNA ^iBAR constructs in the sgRNA ^iBAR library, are introduced into the initial population of cancer cells, or Cas9 ⁺ cancer cells. In some embodiments, the hit gene inactivating efficiency by the sgRNA library or the sgRNA ^iBAR library is at least about 80%, such as at least about any of 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more. In some embodiments, the hit gene inactivating efficiency by the sgRNA library or the sgRNA ^iBAR library is at least about 90%.

In some embodiments, the cancer cell library comprises one or a plurality of (e.g., about 2, 3, 4, 5, 8, 10, 100, 250, 400, 500, 1,000, 2,000, 5,000, 10,000, or more) cancer cells that comprise the same sgRNA construct or the same sgRNA ^iBAR construct, which targets the same target site of a hit gene. Such cancer cell library is also referred to as “having X-fold coverage for the sgRNA/sgRNA ^iBAR” or “having X-fold coverage for each sgRNA/sgRNA ^iBAR, ” wherein “X” is the number of cancer cells expressing the same sgRNA or sgRNA ^iBAR. In some embodiments, the cancer cell library has about 1 to about 12,000 fold coverage for each sgRNA or sgRNA ^iBAR, or each set of sgRNA ^iBAR, such as any of about 1,000 to about 5,000, about 1 to about 1,000, about 10 to about 100, about 50 to about 500, about 80 to about 200, about 100 to about 400, about 100 to about 800, about 100 to about 1,000, or about 300 to about 600 fold coverage of each sgRNA or sgRNA ^iBAR, or each set of sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 100-fold, 400-fold, 500-fold, 1,000-fold, 2,000-fold, 5,000-fold, 10,000-fold, or more fold coverage of each sgRNA or sgRNA ^iBAR, or each set of sgRNA ^iBAR.

In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 200-, 400-, 500-, 1,000-, 5,000-, or more fold) coverage for each sgRNA or mutation (e.g., mutagenic agent-induced mutation) . In some embodiments, each hit gene is targeted by about 6 to about 12 different sgRNAs, or has mutations in at least 2 (e.g., about 6 to about 12) different target sites. In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 200-, 300-, 400-, 500-, 1,000-, 5,000-, or more fold) coverage for each hit gene, such as about 600-fold to about 1200-fold coverage for each hit gene.

In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 200-, 400-, 500-, 1,000-, 5,000-, or more fold) coverage for each sgRNA ^iBAR, such as about 100-fold to about 1000-fold, or about 1000-fold coverage for each sgRNA ^iBAR. In some embodiments, the cancer cell library has at least about 400-fold (e.g., at least about any of 800-, 1000-, 2000-, 4000-, 16,000-, or more fold) coverage for each set of sgRNA ^iBAR, such as about 400-fold to about 4000-fold, or about 4000-fold coverage for each set of sgRNAs ^iBAR. In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 200-, 400-, 500-, 1,000-, 5,000-, or more fold) coverage for the sgRNA ^iBAR library, such as about 100-fold to about 1000-fold, or about 1000-fold coverage for the sgRNA ^iBAR library. In some embodiments, the cancer cell library has at least about 400-fold (e.g., at least about any of 800-, 1000-, 2000-, 4000-, 10,000-, 16,000-, or more fold) coverage for each hit gene, such as about 1200-fold to about 12,000-fold coverage for each hit gene, or about 12,000-fold coverage for each hit gene. In some embodiments, the sgRNA ^iBAR library targets every annotated gene in the genome (i.e., the sgRNA ^iBAR library is a whole-genome sgRNA ^iBAR library). In some embodiments, the cancer cell library has at least about 100-fold (e.g., at least about any of 400-fold, 800-fold, 1000-fold, or 1200-fold) coverage for the whole-genome sgRNA ^iBAR library.
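The relationship between the coverage figures above can be illustrated with a short calculation. The sketch assumes four sgRNA ^iBAR constructs per set, as in some embodiments, and three sets per hit gene; the latter value is an inference from the ratio of the per-gene and per-set ranges (e.g., 12,000 / 4000) and is not stated explicitly here. The function name is hypothetical.

```python
def coverage_hierarchy(per_construct, ibars_per_set=4, sets_per_gene=3):
    """Scale per-construct coverage up to per-set and per-hit-gene coverage.

    `sets_per_gene=3` is an assumption inferred from the stated ranges.
    """
    per_set = per_construct * ibars_per_set
    per_gene = per_set * sets_per_gene
    return per_set, per_gene


for per_construct in (100, 1000):
    per_set, per_gene = coverage_hierarchy(per_construct)
    print(f"{per_construct}-fold per sgRNA-iBAR -> "
          f"{per_set}-fold per set -> {per_gene}-fold per hit gene")
```

Under these assumptions, 100-fold and 1000-fold coverage per sgRNA ^iBAR correspond to 400-fold and 4000-fold per set, and to 1200-fold and 12,000-fold per hit gene, matching the ranges recited above.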

Endogenous mutation (s)

In some embodiments, the cancer cells in the initial population of cancer cells or in the final cancer cell library may comprise endogenous mutation (s) not generated by the CRISPR/Cas system or mutagenic agents (e.g., EMS) , such as naturally occurring mutations, or mutations in cancer cells that do not meet the hit gene selection criteria (e.g., DNA mutation frequency is at least about 5%, and/or RNA expression level is up-regulated or down-regulated by more than about 2-fold in cancer patients, and/or the encoded RNA/protein is expressed within cell or the encoded protein is expressed on the cell surface) . Endogenous mutation (s) should not affect the target gene identification methods described herein, as the profiles of sgRNAs or hit gene mutations in the post-treatment cancer cell population are compared to a control cancer cell population comprising the same endogenous mutation (s) .
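The comparison of sgRNA profiles between a post-treatment population and a control population can be sketched as a per-guide log2 fold change of depth-normalized read counts. This is one common, simple way to express such a comparison and is offered only as an assumption-laden illustration, not as the analysis method of the present application; the guide labels and counts are made up.

```python
import math


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Per-guide log2 ratio of counts-per-million, treated vs. control.

    A pseudocount avoids division by zero for guides absent in one sample.
    """
    treated_total = sum(treated.values())
    control_total = sum(control.values())
    if treated_total == 0 or control_total == 0:
        raise ValueError("both samples must contain reads")
    result = {}
    for guide, control_count in control.items():
        t = treated.get(guide, 0) / treated_total * 1e6 + pseudocount
        c = control_count / control_total * 1e6 + pseudocount
        result[guide] = math.log2(t / c)
    return result


# Hypothetical read counts; both populations share the same endogenous
# background, so only guide-dependent differences remain in the ratio.
control_counts = {"sg_A": 100, "sg_B": 100, "sg_C": 100}
treated_counts = {"sg_A": 300, "sg_B": 100, "sg_C": 20}
lfc = log2_fold_changes(treated_counts, control_counts)
for guide, value in sorted(lfc.items()):
    print(guide, round(value, 2))
```

A positive value indicates enrichment of a guide after treatment, and a negative value indicates depletion.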

Cancer cells

In some embodiments, there is provided a method of editing a genomic locus in a cancer cell, comprising introducing into a host cancer cell (e.g., initial cancer cell, unmodified cancer cell) a guide RNA construct comprising a guide sequence targeting a genomic locus (e.g., a target site of a hit gene) and a guide hairpin sequence coding for a Repeat: Anti-Repeat Duplex and a tetraloop, wherein an iBAR is embedded in the tetraloop serving as internal replicates, expressing the guide RNA that targets the genomic locus in the host cancer cell, and thereby editing the targeted genomic locus (e.g., hit gene) in the presence of a Cas nuclease (e.g., Cas9) .

In some embodiments, there is provided a cancer cell library prepared by transfecting any one of the sgRNA libraries or the sgRNA ^iBAR libraries described herein to a plurality of host cancer cells (e.g., an initial population of cancer cells, with or without Cas component) , wherein the sgRNA constructs or the sgRNA ^iBAR constructs are present in viral vectors (e.g., lentiviral vectors) or viruses (e.g., lentiviruses) . In some embodiments, the method further comprises introducing into the initial population of cancer cells a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein, e.g., as Cas9 mRNA. In some embodiments, the multiplicity of infection (MOI) between the viral vectors or viruses and the host cancer cells (e.g., initial population of cancer cells) during the transfection is at least about 1. In some embodiments, the MOI is at least about any one of 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, or higher. In some embodiments, the MOI is about 1, about 1.5, about 2, about 2.5, about 3, about 3.5, about 4, about 4.5, about 5, about 5.5, about 6, about 6.5, about 7, about 7.5, about 8, about 8.5, about 9, about 9.5, or about 10. In some embodiments, the MOI is about any one of 1-10, 1-3, 3-5, 5-10, 2-9, 3-8, 4-6, or 2-5. In some embodiments, the MOI between the viral vectors or viruses and the host cancer cells (e.g., initial population of cancer cells) during transfection is less than 1, such as less than about any of 0.8, 0.5, 0.3, or lower. In some embodiments, the MOI is about 0.3 to about 1. In some embodiments, the viral sgRNA library or the viral sgRNA ^iBAR library is contacted with the initial population of cancer cells at an MOI of at least about 2, such as at least about 3.
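The effect of the MOI values above on the number of constructs received per cell can be illustrated with a Poisson model, a standard approximation that assumes viral particles enter cells independently and at random. The model and the function name are assumptions added for illustration.

```python
import math


def infection_fractions(moi):
    """Expected fractions of cells with 0, exactly 1, or 2+ viral integrations
    under a Poisson model with mean `moi`."""
    uninfected = math.exp(-moi)
    single = moi * math.exp(-moi)
    return {
        "uninfected": uninfected,
        "single": single,
        "multiple": 1.0 - uninfected - single,
    }


for moi in (0.3, 1.0, 3.0):
    fractions = infection_fractions(moi)
    print(moi, {k: round(v, 3) for k, v in fractions.items()})
```

Under this model, a low MOI leaves most cells uninfected and few cells with multiple constructs, whereas an MOI of about 3 leaves few cells uninfected but gives most cells more than one construct.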

In some embodiments, one or more vectors driving expression of one or more elements of a CRISPR/Cas system are introduced into a host cancer cell (e.g., the initial population of cancer cells) such that expression of the elements of the CRISPR system directs formation of a CRISPR complex with an sgRNA molecule or an sgRNA ^iBAR molecule described herein at one or more target sites of one or more hit genes. In some embodiments, the host cancer cell (e.g., the initial population of cancer cells) has been introduced a Cas nuclease (e.g., Cas9 mRNA) or is engineered to stably express CRISPR/Cas nuclease.

In some embodiments, the host cancer cell (e.g., the initial population of cancer cells) is a cancer cell line, such as a pre-established cancer cell line. The host cancer cells and cancer cell lines may be human cancer cells or cancer cell lines, or they may be non-human, mammalian cancer cells or cancer cell lines. In some embodiments, the host cancer cell is difficult to transfect with a viral vector, such as lentiviral vector, at a low MOI (e.g., lower than 1, 0.5, or 0.3) . In some embodiments, the host cancer cell is difficult to edit using a CRISPR/Cas system at low MOI (e.g., lower than 1, 0.5, or 0.3) . In some embodiments, the host cancer cell is available at a limited quantity. In some embodiments, the host cancer cell is obtained from a tumor sample from an individual (e.g., human cancer patient) .

The methods described herein are suitable for identifying sensitive or resistant target genes in a variety of cancer cells, including both solid cancer and hematologic cancer, as well as cancers of all stages, including early stage cancer, non-metastatic cancer, primary cancer, advanced cancer, locally advanced cancer, metastatic cancer, or cancer in remission. In some embodiments, the solid or hematologic cancer can be of any of stages I, II, III, and IV, according to the American Joint Committee on Cancer (AJCC) staging groups.

In some embodiments, the cancer is a solid cancer selected from the group consisting of colon cancer, rectal cancer, renal-cell carcinoma, liver cancer, non-small cell carcinoma of the lung, cancer of the small intestine, cancer of the esophagus, melanoma, bone cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular malignant melanoma, uterine cancer, breast cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, testicular cancer, uterine cancer, carcinoma of the fallopian tubes, carcinoma of the endometrium, carcinoma of the cervix, carcinoma of the vagina, carcinoma of the vulva, Hodgkin's Disease, non-Hodgkin's lymphoma (NHL) , cutaneous T-cell lymphoma (CTCL) , cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, cancer of the urethra, cancer of the penis, solid tumors of childhood, cancer of the bladder, cancer of the kidney or ureter, carcinoma of the renal pelvis, neoplasm of the central nervous system (CNS) , primary CNS lymphoma, tumor angiogenesis, spinal axis tumor, brain stem glioma, pituitary adenoma, Kaposi's sarcoma, epidermoid cancer, squamous cell cancer, T-cell lymphoma, environmentally induced cancers, combinations of said cancers, and metastatic lesions of said cancers.

In some embodiments, the cancer is a hematologic cancer chosen from one or more of acute myeloid leukemia (AML) , chronic lymphocytic leukemia (CLL) , acute leukemia, acute lymphoid leukemia (ALL) , B-cell acute lymphoid leukemia (B-ALL) , T-cell acute lymphoid leukemia (T-ALL) , chronic myelogenous leukemia (CML) , B cell prolymphocytic leukemia, blastic plasmacytoid dendritic cell neoplasm (BPDCN) , Burkitt’s lymphoma, diffuse large B cell lymphoma, follicular lymphoma, hairy cell leukemia, small cell-or a large cell-follicular lymphoma, malignant lymphoproliferative conditions, MALT lymphoma, mantle cell lymphoma, marginal zone lymphoma, multiple myeloma, myelodysplasia and myelodysplastic syndrome, non-Hodgkin's lymphoma, Hodgkin's lymphoma, plasmablastic lymphoma, plasmacytoid dendritic cell neoplasm, Waldenstrom macroglobulinemia, or pre-leukemia.

In some embodiments, the cancer cells are derived from cancer cell lines. The cancer cells in some embodiments are obtained from a xenogeneic source, for example, from mouse, rat, non-human primate, and pig. In some embodiments, the cancer cells are human cancer cells. In some aspects, the cancer cells are primary cells, such as those isolated directly from a subject and/or isolated from a subject and frozen. In some embodiments, the initial population of cancer cells is homogenous. In some embodiments, the initial population of cancer cells is heterogeneous, such as primary cancer cells, or comprising same cancer cells of mixed stages, or mixed cell lines of the same cancer type (such as colorectal cancer) . In some embodiments, after collecting cancer cells from a subject, the cancer cells are sorted to obtain a subset of cancer cells, e.g., using the immunomagnetic bead method. In some embodiments, cancer cells are obtained from a patient directly following a cancer treatment (e.g., with an anti-cancer agent) . It is contemplated within the context of the present invention to collect cancer cells during their recovery phase, as host cancer cells, or to test hit gene expression level change.

In some embodiments, the cancer cell is a stage III or IV colorectal cancer cell. In some embodiments, the initial population of cancer cells is HCT116 (human colon cancer cell line) . In some embodiments, the initial population of cancer cells is SW480 (human colorectal adenocarcinoma cell line) . In some embodiments, the colorectal cancer is any of advanced colon cancer, malignant colon cancer, metastatic colon cancer, stage I, II, III, or IV colon cancer, a colon cancer characterized with a genomic instability, a colon cancer characterized with an alteration of a pathway, a colon cancer classified under the colon cancer subtype (CCS) system as CCS1, CCS2, or CCS3, a colon cancer classified under colorectal cancer assigner (CRCA system) as stem-like, goblet-like, inflammatory, transit-amplifying, or enterocyte subtype, a colon cancer classified under the colon cancer molecular subtype (CCMS) system as C1, C2, C3, C4, C5, or C6 subtype, a colon cancer classified under the CRC intrinsic subtype (CRCIS) system as Type A, Type B, or Type C subtype, or a colon cancer classified under the colorectal cancer subtyping consortium (CRCSC) classification system as CMS1, CMS2, CMS3, or CMS4. In some embodiments, the colon cancer has a microsatellite instability (MSI) status of MSI-high or MSI-low. In some embodiments, the cancer cells are obtained from an individual (e.g., human) who has previously undergone a therapy (e.g., chemotherapy, radiation, surgery or immunomodulatory therapy) . In some embodiments, the individual does not respond to a previous therapy (e.g., chemotherapy, radiation, surgery or immunomodulatory therapy) .

Cancer cells, such as the initial population of cancer cells, or the cancer cell library described herein, can be cultured using any suitable methods or media known in the art. See, e.g., Cree, Ian A. (Ed. ) , “Cancer Cell Culture. Methods and Protocols, 2 ^nd Edition, ” 2011, Springer Science + Business Media, New York, NY, USA.

Anti-cancer drug treatments and obtaining cancer cells that are resistant to the anti-cancer drug

The methods described herein comprise subjecting the cancer cell library described herein (e.g., cancer cell library generated by mutagenic agent (s) , Cas9 ⁺ sgRNA cancer cell library, or Cas9 ⁺ sgRNA ^iBAR cancer cell library) to treatment with an anti-cancer drug, and obtaining a cancer cell from the post-treatment cancer cell library that is resistant to the killing of the anti-cancer drug. In some embodiments, the method comprises contacting the cancer cell library described herein with an anti-cancer drug, and growing the cancer cell library to obtain a post-treatment cancer cell population. Also see Example 1 and FIG. 2 for exemplary methods.

Anti-cancer drugs

Any agent for treating cancer can be used herein as an anti-cancer drug. Anti-cancer drugs include, but are not limited to, anticancer substances for all types and stages of cancer and cancer treatments (chemotherapeutic, proliferative, acute, genetic, spontaneous etc. ) , anti-proliferative agents, chemosensitizing agents, anti-inflammatory agents (including steroidal and non-steroidal anti-inflammatory agents and anti-pyretic agents) , antioxidants, hormones, immunosuppressants, enzyme inhibitors, cell growth inhibitors and anti-adhesion molecules, inhibitors of DNA, RNA or protein synthesis, anti-angiogenic factors, antisecretory factors, radioactive agents. In some embodiments, the anti-cancer drug is a small molecule drug. In some embodiments, the anti-cancer drug is an antibody. In some embodiments, the anti-cancer drug is an antibody drug conjugate (ADC) .

In some embodiments, the anti-cancer drug is a PARP inhibitor. In some embodiments, the PARP inhibitor is any of talazoparib, veliparib, pamiparib, olaparib, rucaparib, CEP 9722, E7016, iniparib, or 3-aminobenzamide.

Contacting a cancer cell library with an anti-cancer drug step

In some embodiments, treatment with an anti-cancer agent (hereinafter also referred to as “the anti-cancer drug treatment step,” “the anti-cancer drug treatment step b),” or “step b)”) comprises a single step of contacting the cancer cell library with an anti-cancer drug. In some embodiments, step b) comprises contacting the cancer cell library with an anti-cancer drug at a concentration of at least about IC5 (e.g., at least about any of IC10, IC20, IC30, IC40, IC50, IC60, IC70, IC80, IC90, IC95, or higher, or about IC20 to about IC95) for at least about 1 (e.g., at least about any of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, or more) doubling time. “IC50,” or half maximal inhibitory concentration, refers to the concentration of an inhibitory substance (e.g., anti-cancer drug) needed to inhibit, in vitro, a given biological process (e.g., cancer cell proliferation) or biological component (e.g., cancer cell) by 50%. Similarly, IC70, or 70% inhibitory concentration, herein refers to the concentration of an anti-cancer drug needed to inhibit cancer cell proliferation by 70% (or to kill 70% of cancer cells). In some embodiments, a drug toxicity curve is measured to determine the anti-cancer drug concentration before treatment step b). Briefly, a series of anti-cancer drug concentrations are tested on a population of cancer cells (e.g., the initial population of cancer cells that are not modified), the cells are allowed to grow for a few (e.g., 3) doubling times in the presence of the anti-cancer drug, and cell survival percentage or cell killing rate is then plotted against anti-cancer drug concentration to obtain the inhibitory concentration (IC; e.g., IC50, IC70, etc.). In some embodiments, an ATP assay (e.g., Luminescent Cell Viability Assay) can be conducted for measuring the drug toxicity curve. Cell killing rate or death rate can also be tested using any other known methods, such as propidium iodide (PI) staining.

In some embodiments, the cell culture medium containing the anti-cancer drug is changed once, twice, 3, 4, 5, 6, or more times every day, or every 2, 3, 4, 5, 6, 7, 8, 9, 10, or more days, with the anti-cancer drug continuously provided. In some embodiments, the cell culture medium is changed after every doubling time, e.g., the cell culture medium is changed twice per day when one doubling time is 12 hours. In some embodiments, the cell culture medium is changed after at least about 2 doubling times, such as at least about any of 3, 4, 5, 6, 7, or more doubling times. In some embodiments, the cell culture medium containing the anti-cancer drug is changed every 3 days. For example, the cell culture medium containing the anti-cancer drug is changed every 3 days, while the doubling time is about 20 to about 40 hours, such as about 21 hours, or about 38 hours.
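
As an illustration of how an inhibitory concentration might be read off the drug toxicity curve described above, the following minimal sketch interpolates linearly on the log of the concentration between measured points. The concentrations, inhibition percentages, and the function name `inhibitory_concentration` are hypothetical and chosen only for illustration; a fitted dose-response model could equally be used.

```python
import math

def inhibitory_concentration(concs, pct_inhibition, target_pct):
    """Interpolate, on log10(concentration), the dose giving target_pct inhibition."""
    points = sorted(zip(concs, pct_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        # Use the first pair of neighbouring points that brackets the target.
        if y_lo <= target_pct <= y_hi and y_hi > y_lo:
            frac = (target_pct - y_lo) / (y_hi - y_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("target inhibition is outside the measured range")

# Hypothetical toxicity curve: drug concentration (nM) vs. % inhibition of proliferation
concs = [1, 10, 100, 1000, 10000]
inhib = [5, 20, 50, 80, 95]

ic50 = inhibitory_concentration(concs, inhib, 50)
ic70 = inhibitory_concentration(concs, inhib, 70)
print(f"IC50 ~ {ic50:.0f} nM, IC70 ~ {ic70:.0f} nM")
```

Interpolating on the log scale reflects the usual practice of testing concentrations in a dilution series; the estimate is only as good as the spacing of the tested concentrations around the target inhibition level.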

In some embodiments, the anti-cancer drug treatment step b) comprises contacting the cancer cell library with an anti-cancer drug at a concentration of about IC50 to about IC70 (e.g., about any of IC50, IC55, IC60, IC65, IC70, or any values in-between) for about 9 to about 10 (e.g., about any of 9, 9.5, 10, or any values in-between) doubling times. In some embodiments, the anti-cancer drug treatment step b) comprises contacting the cancer cell library with an anti-cancer drug at a concentration of about IC50 to about IC70 (e.g., about any of IC50, IC55, IC60, IC65, IC70, or any values in-between) for about 15 to about 16 (e.g., about any of 15, 15.5, 16, or any values in-between) doubling times. In some embodiments, the anti-cancer drug treatment step b) comprises contacting the cancer cell library with an anti-cancer drug at a concentration of about IC50 to about IC70 (e.g., about any of IC50, IC55, IC60, IC65, IC70, or any values in-between) for about 18 to about 19 (e.g., about any of 18, 18.5, 19, or any values in-between) doubling times.

In some embodiments, the anti-cancer drug treatment step comprises (or consists essentially of, or consists of) contacting the cancer cell library with the anti-cancer agent for at least about 24 hours, such as at least about any of 30 hours, 36 hours, 48 hours, 50 hours, 52 hours, 54 hours, 56 hours, 58 hours, 60 hours, 62 hours, 64 hours, 66 hours, 68 hours, 70 hours, 72 hours, 74 hours, 76 hours, 78 hours, 80 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 12 days, 14 days, 16 days, 18 days, 20 days, 24 days, 30 days, or longer. In some embodiments, the anti-cancer drug treatment step comprises (or consists essentially of, or consists of) contacting the cancer cell library with an anti-cancer drug for about 6 to about 10 days, about 12 to about 14 days, about 14 to about 16 days, or about 22 to about 26 days.
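
Because treatment length is expressed above both in doubling times and in hours or days, a small conversion can help relate the two. The sketch below is only arithmetic; the function names are hypothetical, and the doubling times (21 and 38 hours) and doubling counts (9.5 and 15.5) are taken from the example values mentioned above.

```python
def treatment_days(n_doublings, doubling_time_hours):
    """Wall-clock duration, in days, of a treatment lasting n_doublings."""
    return n_doublings * doubling_time_hours / 24.0

def doublings_between_medium_changes(change_interval_days, doubling_time_hours):
    """Number of doubling times elapsed between two medium changes."""
    return change_interval_days * 24.0 / doubling_time_hours

for doubling_time in (21, 38):          # hours
    for n_doublings in (9.5, 15.5):
        days = treatment_days(n_doublings, doubling_time)
        print(f"{n_doublings} doublings at {doubling_time} h/doubling = {days:.1f} days")

# Medium changed every 3 days with a 21-hour doubling time
print(f"{doublings_between_medium_changes(3, 21):.2f} doublings per medium change")
```

For instance, 9.5 doubling times at 21 hours per doubling correspond to about 8.3 days, and 15.5 doubling times at 38 hours per doubling correspond to about 24.5 days.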

The longer the anti-cancer drug contacting time, and/or the higher the anti-cancer drug concentration, the harsher the treatment condition.

In some embodiments, during the anti-cancer drug contacting step, the cancer cells (e.g., those not sensitive or less sensitive to anti-cancer drug killing) continue to grow. In some embodiments, the cancer cells are passaged every 1, 2, 3, 4, 5, or more (such as 3) doubling times, while keeping the same or similar (e.g., within about 10% difference) library fold coverage for each hit gene (or for each mutation, or sgRNA, or sgRNA ^iBAR) for continuous anti-cancer drug treatment. In some embodiments, cancer cells are passaged when reaching about 90% confluence.
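
The coverage requirement during passaging can be made concrete with a short calculation. This sketch assumes that "fold coverage" means the average number of cells carried per library element (e.g., per sgRNA); the library size, coverage, and function names are hypothetical.

```python
def cells_to_passage(n_library_elements, fold_coverage):
    """Cells to carry forward so each library element keeps the given average coverage."""
    return n_library_elements * fold_coverage

def coverage_is_maintained(reference_cells, passaged_cells, tolerance=0.10):
    """True if the passaged cell number is within the tolerance of the reference."""
    return abs(passaged_cells - reference_cells) <= tolerance * reference_cells

# Hypothetical library: 10,000 elements kept at 1,000-fold coverage
target = cells_to_passage(10_000, 1_000)
print(target)                                     # cells to re-seed at each passage
print(coverage_is_maintained(target, 9_200_000))  # within about 10%
print(coverage_is_maintained(target, 8_500_000))  # coverage dropped too far
```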

Growing the anti-cancer drug treated cancer cell library to obtain a post-treatment cancer cell population step

In some embodiments, obtaining a cancer cell from the anti-cancer drug treated cancer cell library that is resistant to the anti-cancer drug (hereinafter also referred to as “the post-treatment cancer cell population obtaining step, ” “the cancer cell obtaining step c) , ” or “step c) ” ) comprises a single step of growing the anti-cancer drug treated cancer cell library to obtain a post-treatment cancer cell population. In some embodiments, the obtained post-treatment cancer cell population is an alive population, i.e., resistant to anti-cancer drug killing.

In some embodiments, step b) and growing cells in step c) can happen at the same time or overlap; for example, drug treatment can overlap with the cell growth period, hereinafter also referred to as the “treatment/growth step.” See, e.g., Example 1. For example, in some embodiments, the cancer cell library is contacted with an anti-cancer drug (step b)) by providing the anti-cancer drug in the culture medium; the cancer cells are allowed to grow (step c)) while being treated continuously with the anti-cancer drug containing medium (step b)); the anti-cancer drug containing medium can be changed every few hours or days, such as every 3 days (step c)); and cancer cells are collected after a certain number of doubling times (e.g., about 9 to about 10 doubling times, or about 15 to about 16 doubling times) to obtain a post-treatment cancer cell population (step c)). In some embodiments, the cancer cells are passaged every 1, 2, 3, 4, 5, or more (such as 3) doubling times, while keeping the same or similar (e.g., within about 10% difference) library fold coverage for each hit gene (or for each mutation, or sgRNA, or sgRNA ^iBAR) for continuous anti-cancer drug treatment. In some embodiments, cancer cells are passaged when reaching about 90% confluence.

In some embodiments, growing the anti-cancer drug treated cancer cell library to obtain a post-treatment cancer cell population comprises a “recovery step” after the anti-cancer drug treatment, i.e., the post-treated cancer cells are grown in a fresh medium without any anti-cancer drug. Hence in some embodiments, step c) comprises a recovery step comprising growing the treated cancer cells without the presence of anti-cancer drug after the anti-cancer drug contacting step b) . In some embodiments, the recovery step comprises growing the cancer cells after contacting the cancer cell library with an anti-cancer drug for at least about 24 hours, such as at least about any of 26 hours, 28 hours, 30 hours, 32 hours, 34 hours, 36 hours, 38 hours, 40 hours, 48 hours, 52 hours, 56 hours, 60 hours, 64 hours, 68 hours, 72 hours, 78 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 12 days, 14 days, 16 days, 18 days, 20 days, 24 days, 30 days, or longer.

The culturing condition during the “recovery step” should be suitable for cancer cell growth and/or proliferation. In some embodiments, the culturing condition does not induce cancer cells to a specific phenotype during expansion/growth. Such culture conditions are well known in the art, for example, culturing in a 37℃, 5% CO₂ incubator. Also see Cree, Ian A. id. In some embodiments, the culture medium is a cancer cell complete medium. In some embodiments, the culture condition is the same as that for the cancer cell library before anti-cancer drug treatment. The type of culture media for successful culture can vary depending on the types of cancer cells. In some embodiments, the culture medium is further supplemented with an agent for selectable markers, e.g., to select cancer cells that do not lose transgenes or mutations during proliferation.

In some embodiments, “obtaining a post-treatment cancer cell population” comprises (or consists essentially of, or consists of) a simple “harvest step, ” i.e., removing culture medium (can contain dead cells or floating cells) and collecting the remaining cancer cells after anti-cancer drug treatment/growth step, or collecting the remaining cancer cells after the recovery step. The cancer cell harvest step in some embodiments comprises collecting the post-treatment/growth or post-recovery cancer cells into a container (e.g., Falcon tubes, EP tubes, or centrifugation tubes) for storage or for later experiments. In some embodiments, the harvest step comprises washing the obtained cancer cells, so that the cancer cells are in suitable condition for storage (e.g., 4℃, -20℃, or -80℃ storage) or later experiments (e.g., cell lysis, PCR, or sequencing) . For example, for adherent cancer cells, after removing culture medium (contains dead cells or floating cells) , remaining cancer cells in the cell culture container (e.g., cell culture dish) are dissociated using trypsin and collected (e.g., transferred to a fresh container) . The obtained post-treatment cancer cell population will be alive cancer cells, or those resistant to anti-cancer drug killing.

Optional enrichment step

If one desires to obtain an alive (or drug resistant) population of post-treatment/growth or post-recovery cancer cells from non-adherent cancer cells (e.g., hematopoietic cancer) , or if an enriched (or purer) alive population of post-treatment/growth or post-recovery cancer cells is desired (e.g., from adherent or non-adherent cancer cells) , the method of “obtaining a post-treatment cancer cell population” can comprise an “enrichment step, ” comprising sorting the cancer cells to obtain purely alive cancer cell population. In some embodiments, “obtaining a post-treatment cancer cell population” comprises sorting the post-treatment/growth or post-recovery cancer cell population to obtain an alive cancer cell population, i.e., a post-treatment cancer cell population that is resistant to the anti-cancer drug (hereinafter also referred to as “alive enrichment” ) .

In some embodiments, the enrichment step further comprises staining the post-treatment/growth or post-recovery cancer cells with a cell viability marker (e.g., dye) before sorting. Methods and reagents for assessing cell viability are well known in the art, e.g., fluorescence-based or colorimetric (enzymatic) assays. For example, in membrane permeability-based assays, staining with DAPI, propidium iodide (PI), 7-AAD, or amine-reactive dyes indicates dead cells, while acridine orange stains viable cells more efficiently. Carboxyfluorescein diacetate (CFDA) is a nonfluorescent, cell permeable dye that is hydrolyzed to form the fluorescent molecule carboxyfluorescein by nonspecific intracellular esterases present only in viable cells. CFDA-SE is a derivative of CFDA that is better retained in viable cells upon hydrolysis. Tetramethylrhodamine ethyl esters (TMRE) and tetramethylrhodamine methyl esters (TMRM) localize to mitochondria in healthy cells and to the cytoplasm in dying cells. JC-1 is a commonly used potentiometric dye. In healthy cells, JC-1 localizes to the mitochondria, where it forms red fluorescent aggregates. Upon breakdown of the mitochondrial membrane potential, JC-1 diffuses throughout the cell and exists as a green fluorescent monomer. BrdU incorporation into newly synthesized DNA indicates live cells.

In some embodiments, the enrichment step further comprises staining the post-treatment/growth or post-recovery cancer cells with propidium iodide (PI) before sorting, wherein PI staining indicates cell death. Thus in some embodiments, the enrichment step comprises sorting the post-treatment/growth or post-recovery cancer cells that are PI-negative (no PI staining) , thus obtaining a post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) . Any cell sorting methods can be used herein, such as Fluorescence-activated cell sorting (FACS) , Magnetic-activated cell sorting (MACS) , microfluidic cell-sorting, buoyancy-activated cell sorting (BACS) , etc.
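
The PI-negative selection described above amounts to a threshold gate on PI fluorescence. The sketch below illustrates the idea on a handful of made-up events; the event records, the intensity values, and the gate value are all hypothetical, and in practice the gate would typically be set from appropriate staining controls on the sorter itself.

```python
def gate_pi_negative(events, pi_threshold):
    """Keep events whose PI fluorescence is below the gate (PI-negative, i.e., alive)."""
    return [event for event in events if event["pi"] < pi_threshold]

# Hypothetical events: one record per cell, PI intensity in arbitrary units
events = [
    {"cell_id": 1, "pi": 120.0},
    {"cell_id": 2, "pi": 8500.0},
    {"cell_id": 3, "pi": 95.0},
    {"cell_id": 4, "pi": 12000.0},
]
PI_THRESHOLD = 1000.0  # assumed gate position, for illustration only

alive = gate_pi_negative(events, PI_THRESHOLD)
alive_fraction = len(alive) / len(events)
print([event["cell_id"] for event in alive], alive_fraction)
```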

Hence in some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) , and harvesting cancer cells by removing the cell culture medium containing the anti-cancer drug (and dead floating cells) and collecting the remaining adherent cancer cells (e.g., by trypsinization) , thus obtaining a post-treatment cancer cell population. For adherent cancer cells, these are largely alive or all alive, or resistant to the anti-cancer drug.

In some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) , removing the cell culture medium containing the anti-cancer drug (and dead floating cells) , growing the remaining adherent cancer cells in a cell culture medium not containing the anti-cancer drug (a recovery step) , and harvesting cancer cells by removing the cell culture medium and collecting the remaining adherent cancer cells (e.g., by trypsinization) , thus obtaining a post-treatment cancer cell population. For adherent cancer cells, these are largely alive or all alive, or resistant to the anti-cancer drug.

In some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) , optionally removing the cell culture medium containing the anti-cancer drug, staining the remaining cancer cells with a cell viability marker (e.g., PI) , sorting cancer cells that are alive (PI-negative, e.g., by FACS) , thus obtaining a post-treatment cancer cell population. The obtained post-treatment cancer cell population are enriched alive cancer cells, or resistant to the anti-cancer drug. For non-adherent cancer cells (e.g., hematopoietic cancer cells) , cell culture medium is not removed before staining and sorting, or a centrifugation step is added to collect all cancer cells (mixture of alive and dead cells) when getting rid of the cell culture medium.

In some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (e.g., for about 9 to about 10 doubling time, or for about 15 to about 16 doubling time) , removing the cell culture medium containing the anti-cancer drug (and dead floating cells) , growing the remaining adherent cancer cells in a cell culture medium not containing the anti-cancer drug (a recovery step) , removing the cell culture medium, staining the remaining cancer cells with a cell viability marker (e.g., PI) , sorting cancer cells that are alive (PI-negative, e.g., by FACS) , thus obtaining a post-treatment cancer cell population that is enriched alive cancer cells and is resistant to the anti-cancer drug.

Optional second treatment step

In some embodiments, the cancer cell library is subjected to two treatment steps. In some embodiments, the method described herein comprises a second treatment step comprising contacting the post-initial treatment cancer cells (with or without further culturing during a recovery step, or with or without sorting alive cancer cells during an enrichment step) with the anti-cancer drug. In some embodiments, the treatment conditions of the two treatment steps are the same, i.e., the anti-cancer drug concentrations are the same, and the treatment periods are the same. In some embodiments, the treatment conditions of the two treatment steps are different. In some embodiments, the second treatment step is harsher than the first treatment step, i.e., the cancer cells are contacted with a higher concentration of anti-cancer drug in the second treatment step, such as at least about any of 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 15-fold, or 20-fold higher concentration compared to that in the first treatment step; and/or the cancer cells are contacted with the anti-cancer drug for a longer period, such as about any of 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 10 hours, 12 hours, 24 hours, 36 hours, 48 hours, 60 hours, 72 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, or 10 days longer as compared to the first treatment step.
In some embodiments, the second treatment step is milder than the first treatment step, i.e., the cancer cells are contacted with lower concentration of anti-cancer drug in the second treatment step, such as at least about any of 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 15-fold, or 20-fold lower concentration compared to that in the first treatment step; and/or the cancer cells are contacted with the anti-cancer drug with a shorter period, such as about any of 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 10 hours, 12 hours, 24 hours, 36 hours, 48 hours, 60 hours, 72 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, or 10 days less as compared to the first treatment step.

Optional second recovery step

In some embodiments, after the second treatment step (or after an enrichment step for alive cancer cells after the second treatment step), the method further comprises an additional recovery step comprising growing the post-second treatment cancer cells in a fresh medium without any anti-cancer drug. In some embodiments, the second recovery step has the same culturing condition as the first recovery step, e.g., the same culture duration. In some embodiments, the second recovery step has a different culturing condition from that of the first recovery step. In some embodiments, the second recovery step is longer than the first recovery step, such as at least about any of 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 10 hours, 12 hours, 24 hours, 36 hours, 48 hours, 60 hours, 72 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, or 10 days longer than the first recovery step. In some embodiments, the second recovery step is shorter than the first recovery step, such as at least about any of 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 10 hours, 12 hours, 24 hours, 36 hours, 48 hours, 60 hours, 72 hours, 84 hours, 96 hours, 5 days, 6 days, 7 days, 8 days, 9 days, or 10 days shorter than the first recovery step.

Optional second enrichment step

In some embodiments, the cancer cell library is subjected to two enrichment steps. In some embodiments, the method of obtaining a post-treatment cancer cell population described herein further comprises sorting the post-second treatment/growth or post-second recovery cancer cells to obtain a purely alive cancer cell population. In some embodiments, the method comprises sorting the post-second treatment/growth or post-second recovery cancer cell population to obtain an alive cancer cell population, i.e., a post-treatment cancer cell population that is resistant to the anti-cancer drug (hereinafter also referred to as “second alive enrichment” ) . In some embodiments, the second enrichment method is the same as the first enrichment method, e.g., cells are labeled with the same cell viability marker (e.g., both stained with PI) , cells are sorted with the same sorting method (e.g., both using FACS) . In some embodiments, the second enrichment method is different from the first enrichment method, e.g., cells are labeled with different cell viability markers (e.g., PI vs. DAPI staining in two enrichment steps, or based on morphology under the microscope in the second enrichment step) , and/or cells are sorted using different sorting methods (e.g., FACS vs. manually sorting, or by rinsing away dead floating cells) .

Hence in some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (first treatment step, e.g., for about 9 to about 10 doubling time), removing the cell culture medium containing the anti-cancer drug (and dead floating cells), growing the remaining adherent cancer cells in a cell culture medium not containing the anti-cancer drug (first recovery step), removing the cell culture medium not containing the anti-cancer drug, contacting the remaining cancer cells (adherent cancer cells, largely or all alive) with the anti-cancer drug (second treatment step, e.g., for about 15 to about 16 doubling time), removing the cell culture medium containing the anti-cancer drug (and dead floating cells), growing the remaining adherent cancer cells in a cell culture medium not containing the anti-cancer drug (second recovery step), and harvesting cancer cells by removing the cell culture medium (and dead floating cells if any) and collecting the remaining adherent cancer cells (e.g., by trypsinization), thus obtaining a post-treatment cancer cell population that is resistant to the anti-cancer drug. In some embodiments, between the first treatment step and the first recovery step, between the first recovery step and the second treatment step, between the second treatment step and the second recovery step, and/or between the second recovery step and the harvesting step, the method can comprise one or more enrichment steps, such as by staining the cancer cells with a cell viability marker (e.g., PI) and sorting cancer cells that are alive (PI-negative, first enrichment step, e.g., by FACS).

In some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (first treatment step, e.g., for about 9 to about 10 doubling time), removing the cell culture medium containing the anti-cancer drug (and dead floating cells), growing the remaining adherent cancer cells in a cell culture medium not containing the anti-cancer drug (first recovery step), removing the cell culture medium not containing the anti-cancer drug, contacting the remaining cancer cells (adherent cancer cells, largely or all alive) with the anti-cancer drug (second treatment step, e.g., for about 15 to about 16 doubling time), and harvesting cancer cells by removing the cell culture medium containing the anti-cancer drug (and dead floating cells) and collecting the remaining adherent cancer cells (e.g., by trypsinization), thus obtaining a post-treatment cancer cell population that is resistant to the anti-cancer drug. In some embodiments, between the first treatment step and the first recovery step, between the first recovery step and the second treatment step, and/or between the second treatment step and the harvesting step, the method can comprise one or more enrichment steps, such as by staining the cancer cells with a cell viability marker (e.g., PI) and sorting cancer cells that are alive (PI-negative, first enrichment step, e.g., by FACS).

In some embodiments, the anti-cancer drug treatment step b) and the cancer cell obtaining step c) comprises: contacting the cancer cell library with the anti-cancer drug while allowing alive cancer cells to grow (first treatment step, e.g., for about 9 to about 10 doubling time) , optionally removing the cell culture medium containing the anti-cancer drug, staining the remaining cancer cells with a cell viability marker (e.g., PI) , sorting cancer cells that are alive (PI-negative, first enrichment step, e.g., by FACS) , optionally growing the sorted alive cancer cells in a cell culture medium not containing the anti-cancer drug (optional first recovery step) , contacting the sorted alive cancer cells with the anti-cancer drug (second treatment step, e.g., for about 15 to about 16 doubling time) and allowing alive cancer cells to grow, optionally removing the cell culture medium containing the anti-cancer drug, optionally staining the remaining cancer cells with a cell viability marker (e.g., PI) , sorting cancer cells that are alive (PI-negative, second enrichment step, e.g., by FACS) , thus obtaining a post-treatment cancer cell population that is resistant to the anti-cancer drug. In some embodiments, the method further comprises growing the sorted alive cancer cells after the second enrichment step in cell culture medium not containing the anti-cancer drug (optional second recovery step) , before harvesting the cancer cells by removing the cell culture medium (and floating dead cells if any) and collecting the remaining adherent cancer cells (e.g., by trypsinization) .

Hit gene identification

The method described herein comprises identifying the hit gene in the post-treatment cancer cell population that is resistant to the anti-cancer drug (“hit gene identification step”). In some embodiments, the hit gene identified from the post-treatment cancer cell population that is resistant to the anti-cancer drug is considered as the target gene whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug.

In some embodiments, the hit gene identification step comprises: i) identifying a sequence comprising the hit gene mutation (e.g., inactivating mutation) in the post-treatment cancer cell population obtained from “the cancer cell obtaining step c)”; and ii) identifying the hit gene corresponding to the sequence comprising the hit gene mutation (e.g., inactivating mutation). In some embodiments, the sequence comprising the hit gene mutation (e.g., inactivating mutation) is identified by sequencing, e.g., PCR-sequencing (e.g., Sanger sequencing), or genome-sequencing (or DNA-seq, such as next-generation sequencing or “NGS”). For example, in some embodiments, the sequences (nucleic acid fragments, PCR fragments, or whole-genome) of the post-treatment cancer cell population that is resistant to the anti-cancer drug are identified by sequencing; by comparing them to the wild-type (or healthy individual) genomic sequence, or to the genomic sequence of the initial population of cancer cells, sequence(s) comprising the hit gene mutation(s) (e.g., inactivating mutation(s)) can be identified and mapped to the hit gene(s). In some embodiments, the hit gene identification step further comprises isolating genomic DNA or RNA from the post-treatment cancer cell population from step c). In some embodiments, the hit gene identification step further comprises PCR amplification of nucleic acid sequence comprising the hit gene mutation (e.g., inactivating mutation).

In some embodiments, the cancer cell library described herein comprises the sgRNA constructs or the sgRNA ^iBAR constructs against hit genes described herein. Thus in some embodiments, the hit gene identification step comprises: i) identifying the sgRNA sequence or the sgRNA ^iBAR sequence in the post-treatment cancer cell population obtained from “the cancer cell obtaining step c) ” ; and ii) identifying the hit gene corresponding to (targeted by) the guide sequence of the sgRNA or the sgRNA ^iBAR. In some embodiments, the sgRNA sequence or the sgRNA ^iBAR sequence is identified by RNA sequencing (RNA-seq) , e.g., RNA NGS. In some embodiments, the hit gene identification step comprises: i) identifying the nucleic acid sequence encoding the sgRNA or the sgRNA ^iBAR in the post-treatment cancer cell population obtained from “the cancer cell obtaining step c) ” ; and ii) identifying the hit gene corresponding to the guide sequence encoded by the nucleic acid sequence. In some embodiments, the nucleic acid sequence encoding the sgRNA or the sgRNA ^iBAR is identified by sequencing, e.g., PCR-sequencing (e.g., Sanger sequencing) , or genome-sequencing (DNA-seq) , e.g., NGS. In some embodiments, the iBAR sequences can be used for identifying the sgRNA ^iBAR sequences or the nucleic acid sequences encoding the sgRNA ^iBAR. In some embodiments, the hit gene identification step further comprises isolating genomic DNA or RNA from the post-treatment cancer cell population obtained from “the cancer cell obtaining step c) . ” In some embodiments, the hit gene identification step further comprises PCR amplification of nucleic acid sequence encoding the sgRNA or the sgRNA ^iBAR.
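
To illustrate how guide sequences recovered by sequencing can be mapped back to hit genes, the following minimal sketch counts reads that carry a known guide immediately after a fixed flanking sequence and sums the counts per gene. The flanking sequence, guide sequences, gene names, and reads are hypothetical; real pipelines would also handle sequencing errors, reverse-complemented reads, and iBAR barcodes.

```python
from collections import Counter

def count_guides(reads, guide_to_gene, flank, guide_len=20):
    """Count reads whose guide (the bases right after `flank`) is in the library."""
    counts = Counter()
    for read in reads:
        pos = read.find(flank)
        if pos == -1:
            continue  # flanking sequence not found in this read
        start = pos + len(flank)
        guide = read[start:start + guide_len]
        if guide in guide_to_gene:
            counts[guide] += 1
    return counts

def counts_per_gene(guide_counts, guide_to_gene):
    """Sum guide counts over all guides targeting the same gene."""
    gene_counts = Counter()
    for guide, n in guide_counts.items():
        gene_counts[guide_to_gene[guide]] += n
    return gene_counts

FLANK = "CACCG"  # hypothetical constant sequence upstream of the guide
guide_to_gene = {
    "ACGTACGTACGTACGTACGT": "GENE_A",  # hypothetical guides and gene names
    "TTGCAAGGCCTTAAGGCCAA": "GENE_B",
}
reads = [
    "GAAA" + FLANK + "ACGTACGTACGTACGTACGT" + "GTTTTAGAGC",
    "GAAA" + FLANK + "TTGCAAGGCCTTAAGGCCAA" + "GTTTTAGAGC",
    "GAAA" + FLANK + "ACGTACGTACGTACGTACGT" + "GTTTTAGAGC",
    "GAAA" + FLANK + "GGGGGGGGGGGGGGGGGGGG" + "GTTTTAGAGC",  # guide not in library
    "TTTTTTTTTTTTTTTTTTTTTTTT",                               # no flank
]

guide_counts = count_guides(reads, guide_to_gene, FLANK)
gene_counts = counts_per_gene(guide_counts, guide_to_gene)
print(dict(gene_counts))
```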

Methods for DNA-seq, RNA-seq, PCR-sequencing (e.g., Sanger sequencing) , DNA/RNA extraction, cDNA preparation, and data analysis are well known in the art, and can be used herein as appropriate to identify the hit gene (s) in the post-treatment cancer cell population that is resistant to the anti-cancer drug. The sequencing data can be analyzed and aligned to the genome using any known methods in the art.

Target gene identification

In some embodiments, the hit gene identified in the post-treatment cancer cell population that is resistant to the anti-cancer drug is considered as the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. In some embodiments, the hit genes identified in the post-treatment cancer cell population that is resistant to the anti-cancer drug (i.e., alive post-treatment cancer cell population) are target genes whose mutations (e.g., inactivation) make the cancer cells resistant to the anti-cancer drug.

In some embodiments, the hit gene (s) identified in the post-treatment cancer cell population that is resistant to the anti-cancer drug is further compared to a control, and/or is further ranked and/or filtered with a predetermined threshold level.

In some embodiments, identifying the target gene comprises: i) obtaining sequences comprising the hit gene mutations (e.g., inactivating mutations) in the post-treatment cancer cell population obtained from step c) ; ii) ranking the sequences comprising the hit gene mutations (e.g., inactivating mutations) based on sequence counts; and iii) identifying the hit gene corresponding to a sequence comprising the hit gene mutation (e.g., inactivating mutation) ranked above a predetermined threshold level. In some embodiments, the ranking step comprises adjusting the rank of each sequence comprising the hit gene mutation (e.g., inactivating mutation) based on data consistency among all sequences comprising the hit gene mutation (e.g., inactivating mutation) corresponding to the same hit gene (or same target site of the same hit gene) . For example, data inconsistency (such as different directions of fold changes relative to control) will increase variance of the sequences comprising the hit gene mutation (e.g., inactivating mutation) corresponding to the same hit gene and lower the rank of such hit gene. In some embodiments, the hit gene is identified to correspond to sequence (s) comprising the hit gene mutations (e.g., inactivating mutation (s) ) that rank consistently better than expected for permuted sequences under null hypothesis based on an RRA or α-RRA algorithm. In some embodiments, the predetermined threshold level is an FDR of value “X” (e.g., 0.1) , and the hit gene corresponding to a sequence comprising the hit gene mutation (e.g., inactivating mutation) with FDR ≤ “X” is identified as the target gene. In some embodiments, the predetermined threshold level is an enrichment or depletion of value “X” -fold (e.g., about 2-fold) , and the hit gene corresponding to a sequence comprising the hit gene mutation (e.g., inactivating mutation) with enrichment or depletion ≥ “X” -fold is identified as the target gene. 
In some embodiments, the sequence comprising the hit gene mutation (e.g., inactivating mutation) is identified by sequencing, e.g., Sanger sequencing or genome-sequencing (or DNA-seq, such as NGS).
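
The two kinds of predetermined threshold mentioned above (an FDR value and a fold enrichment or depletion) can be illustrated as follows. This is a sketch under stated assumptions: the per-gene counts and p-values are hypothetical, the p-values are assumed to come from an upstream statistical test, the Benjamini-Hochberg procedure is used here as one common way to obtain FDR-adjusted values, and a pseudocount of 1 is an arbitrary choice.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control counts; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene data: (treated count, control count, p-value)
genes = {
    "GENE_A": (399, 99, 0.001),
    "GENE_B": (150, 140, 0.5),
    "GENE_C": (20, 90, 0.03),
    "GENE_D": (180, 100, 0.04),
}
names = list(genes)
fdr = dict(zip(names, benjamini_hochberg([genes[g][2] for g in names])))
lfc = {g: log2_fold_change(genes[g][0], genes[g][1]) for g in names}

FDR_THRESHOLD = 0.1    # example value "X" for the FDR criterion
FOLD_THRESHOLD = 2.0   # example value "X" for the fold criterion
hits_by_fdr = [g for g in names if fdr[g] <= FDR_THRESHOLD]
hits_by_fold = [g for g in names if abs(lfc[g]) >= math.log2(FOLD_THRESHOLD)]
print(hits_by_fdr, hits_by_fold)
```

Taking the absolute value of the log2 fold change treats enrichment and depletion symmetrically, so a 2-fold depletion passes the same fold threshold as a 2-fold enrichment.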

In some embodiments, the cancer cell library described herein comprises the sgRNA constructs or the sgRNA ^iBAR constructs against hit genes described herein. Thus in some embodiments, identifying the target gene comprises: i) obtaining sgRNA sequences or sgRNA ^iBAR sequences in the post-treatment cancer cell population obtained from step c) ; ii) ranking the corresponding guide sequences of the sgRNA sequences or the sgRNA ^iBAR sequences based on sequence counts; and iii) identifying the hit gene corresponding to a guide sequence ranked above a predetermined threshold level. In some embodiments, the ranking comprises adjusting the rank of each guide sequence of the sgRNA sequence or the sgRNA ^iBAR sequence based on data consistency among all guide sequences corresponding to the same hit gene (or same target site of the same hit gene) . For example, data inconsistency (such as different direction of fold change relative to control) will increase variance of the guide sequences corresponding to the same hit gene and lower the rank of such hit gene. In some embodiments, the hit gene is identified to correspond to guide sequence (s) that rank consistently better than expected for permuted guide sequences under null hypothesis based on an RRA or α-RRA algorithm. In some embodiments, the predetermined threshold level is an FDR of value “X” (e.g., 0.1) , and the hit gene corresponding to a guide sequence with FDR ≤ “X” is identified as the target gene. In some embodiments, the predetermined threshold level is an enrichment or depletion of value “X” -fold (e.g., about 2-fold) , and the hit gene corresponding to a guide sequence with enrichment or depletion ≥ “X” -fold is identified as the target gene. In some embodiments, the sgRNA sequence or the sgRNA ^iBAR sequence is identified by RNA-seq, e.g., RNA NGS. In some embodiments, the nucleic acid sequences encoding the sgRNAs or the sgRNAs ^iBAR are identified by genome-sequencing (DNA-seq) , e.g., NGS.

In some embodiments, the cancer cell library described herein comprises the sgRNA ^iBAR constructs against hit genes described herein. In some embodiments, identifying the target gene comprises: i) obtaining sgRNA ^iBAR sequences in the post-treatment cancer cell population obtained from step c) ; ii) ranking the corresponding guide sequences of the sgRNA ^iBAR sequences based on sequence counts, wherein the ranking comprises adjusting the rank of each guide sequence based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence; and iii) identifying the hit gene corresponding to a guide sequence ranked above a predetermined threshold level. In some embodiments, the hit gene is identified to correspond to guide sequence (s) that rank (s) consistently better than expected for permuted guide sequences under null hypothesis based on an RRA or α-RRA algorithm. In some embodiments, the predetermined threshold level is an FDR of value “X” (e.g., 0.1) , and the hit gene corresponding to a guide sequence with FDR ≤ “X” is identified as the target gene. In some embodiments, the predetermined threshold level is at least about 2-fold enrichment or depletion.

In some embodiments, the sequence counts of sequences comprising the hit gene mutations (e.g., inactivating mutations) or guide RNAs are determined from statistical analysis. In some embodiments, the sequence counts of guide RNAs and the corresponding iBAR sequences are determined from statistical analysis. See FIG. 3 for an exemplary target gene identification workflow. Statistical methods may be used to determine the identity of the sequences comprising the hit gene mutations (e.g., inactivating mutations) , the sgRNA molecules, or the sgRNA ^iBAR molecules that are enriched or depleted in the post-treatment cancer cell population. In some embodiments, more than one (e.g., 2, 3, or more) biological or technical replicate is conducted for an anti-cancer drug treated cancer cell library. In some embodiments, more than one (e.g., 2, 3, or more) biological or technical replicate is conducted for a control cancer cell population. In some embodiments, sequences comprising the hit gene mutations (e.g., inactivating mutations) or guide RNAs from the two or more (e.g., 2, 3, 4, or more) replicates of the anti-cancer drug treated group (or control group) are combined to calculate mean and variance among replicates of the anti-cancer drug treated group (or control group) . Exemplary statistical methods include, but are not limited to, linear regression, generalized linear regression and hierarchical regression. In some embodiments, the sequence counts are subject to normalization methods, such as total count normalization, or median ratio normalization. In some embodiments, e.g., for positive screens, median ratio normalization is preferred. In some embodiments, for example, for sequence counts that follow a normal distribution, the sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, MAGeCK (Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554 (2014) ) is used to rank sequences comprising the hit gene mutations (e.g., inactivating mutations) or guide RNA sequences, and/or to identify target genes. In some embodiments, MAGeCK ^iBAR (Zhu et al., Genome Biol. 2019; 20: 20) is used to rank sequences comprising the hit gene mutations (e.g., inactivating mutations) or guide RNA sequences, and/or to identify target genes.
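By way of illustration only, median ratio normalization of sequence counts can be sketched as follows. The function name `median_ratio_normalize`, the sample labels, and the counts are hypothetical and are not taken from MAGeCK or from the present application; the sketch assumes every sample lists the same sequences in the same order and that at least one sequence has a non-zero count in every sample.

```python
import math
from statistics import median

def median_ratio_normalize(counts):
    """counts: dict mapping sample name -> list of raw sequence counts.

    Returns (normalized counts, size factor per sample).
    """
    samples = list(counts)
    n_seq = len(counts[samples[0]])
    # Log geometric mean of each sequence across samples; sequences with a
    # zero count in any sample are skipped when estimating size factors.
    log_geo_means = []
    for i in range(n_seq):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    size_factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][i]) - g
                      for i, g in enumerate(log_geo_means) if g is not None]
        # The size factor is the median ratio of a sample to the reference.
        size_factors[s] = math.exp(median(log_ratios))
    normalized = {s: [c / size_factors[s] for c in counts[s]] for s in samples}
    return normalized, size_factors

# Hypothetical counts: the treated sample was sequenced twice as deeply.
raw = {"control": [100, 200, 300, 400], "treated": [200, 400, 600, 800]}
normalized, size_factors = median_ratio_normalize(raw)
```

In this hypothetical example the two samples differ only in sequencing depth, so the normalized counts of the two samples coincide.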

In some embodiments, identifying the target gene whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug is based on the difference between the profiles of sgRNAs (or sgRNAs ^iBAR) or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population. In some embodiments, the identification of the target gene is based on the difference between the profiles of hit gene mutations in the post-treatment cancer cell population and the control cancer cell population. In some embodiments, the identification of the target gene is based on the difference between the profiles of sgRNAs (or sgRNAs ^iBAR) in the post-treatment cancer cell population and the control cancer cell population. In some embodiments, the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug. In some embodiments, the profiles of sgRNAs (or sgRNAs ^iBAR) or hit gene mutations in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing (NGS) , such as DNA-seq or RNA-seq. In some embodiments, the profiles of sgRNAs (or sgRNAs ^iBAR) comprise sequence counts of the sgRNAs (or sgRNAs ^iBAR) , or sequence counts of the corresponding guide sequences of the sgRNAs (or sgRNAs ^iBAR) . In some embodiments, the profiles of sgRNAs (or sgRNAs ^iBAR) comprise sequence counts of the nucleic acids encoding the sgRNAs (or sgRNAs ^iBAR) , or sequence counts of the nucleic acids encoding the guide sequences of the corresponding sgRNAs (or sgRNAs ^iBAR) . In some embodiments, the profiles of the hit gene mutations comprise sequence counts of the sequences comprising the hit gene mutations. In some embodiments, the methods described herein further comprise culturing a same cancer cell library under the same condition without contacting with the anti-cancer drug.

In some embodiments, the sequence counts (e.g., the sequence counts of sgRNAs or sgRNAs ^iBAR or guide sequences thereof, the sequence counts of nucleic acid sequences encoding the sgRNAs or sgRNAs ^iBAR or guide sequences thereof, or sequence counts of sequences comprising the hit gene mutations) obtained from the post-treatment cancer cell population from step c) are compared to corresponding sequence counts obtained from a control cancer cell population or a control cancer cell library, e.g., to provide fold changes (e.g., actual fold changes, or derivatives of fold changes such as log2 or log10 fold changes) , for significance tests (e.g., FDR, p-value) , for distribution statistics, and/or to provide gene or sequence rankings via scoring and/or deriving. In some embodiments, the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug, e.g., continuously cultured under the same culture condition for the same amount of time as the test group (treated with anti-cancer drug) from the beginning of the test until final sample harvest. In some embodiments, the control cancer cell population is the entire same cancer cell library cultured in the same condition without being subjected to treatment with the anti-cancer drug, and without being subjected to any selecting, recovering, or obtaining method in step b) and step c) , hereinafter also referred to as “control cancer cell library.” In some embodiments, the control cancer cell population is obtained from a same cancer cell library cultured in the same condition without being subjected to treatment with the anti-cancer drug, and subjected to the same obtaining method in step c) .
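As a non-limiting sketch of the comparison described above, log2 fold changes between post-treatment and control sequence counts may be computed as follows. The function name, the guide labels, the counts, and the pseudocount of 1 (added here to avoid division by zero) are illustrative assumptions and are not specified by the present application.

```python
import math

def log2_fold_changes(treated, control, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (control + pseudocount)) per sequence.

    treated, control: dicts mapping sequence label -> normalized count.
    Sequences missing from the treated sample are treated as count 0.
    """
    return {
        seq: math.log2((treated.get(seq, 0) + pseudocount)
                       / (control[seq] + pseudocount))
        for seq in control
    }

# Hypothetical normalized counts for two guide sequences.
control_counts = {"guide_1": 99, "guide_2": 99}
treated_counts = {"guide_1": 399, "guide_2": 24}
lfc = log2_fold_changes(treated_counts, control_counts)
```

A positive value indicates enrichment relative to control and a negative value indicates depletion.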

In some embodiments, the methods described herein further comprise culturing a same cancer cell library under the same condition without contacting with the anti-cancer drug, and optionally subjecting it to the same obtaining method as in “the cancer cell obtaining step c) ” to obtain a control cancer cell population, wherein identifying the hit gene corresponding to the sequence comprising the hit gene mutation (e.g., inactivating mutation) or the guide sequence of the sgRNA or sgRNA ^iBAR in the control cancer cell population or control cancer cell library, but not in the post-treatment cancer cell population obtained from step c) , identifies the hit gene as the target gene. For example, for a cancer cell library comprising mutations A, B, and C in separate cancer cells, if only mutation A is identified from the post-treatment cancer cell population, the absence of mutations B and C from this post-treatment cancer cell population indicates that hit genes B and C are the target genes, e.g., conferring sensitivity to anti-cancer drug killing when mutated.
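The presence/absence comparison in the preceding example can be sketched as a simple set difference. The function name `dropout_hits` is hypothetical, and the labels A, B, and C merely mirror the example above.

```python
def dropout_hits(control_mutations, post_treatment_mutations):
    """Mutations identified in the control but not in the post-treatment population."""
    return sorted(set(control_mutations) - set(post_treatment_mutations))

# Mutations A, B, and C are present in the control; only A survives treatment.
candidates = dropout_hits(["A", "B", "C"], ["A"])
```

Here `candidates` contains B and C, corresponding to hit genes whose mutation may confer sensitivity to the anti-cancer drug.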

In some embodiments, the post-treatment cancer cell population obtained is alive cancer cells, which are resistant to the anti-cancer drug. In some embodiments, identifying the target gene comprises comparing the sgRNA (or sgRNA ^iBAR or guide sequence thereof, or nucleic acid encoding sgRNA or sgRNA ^iBAR or guide sequence thereof) sequence counts obtained from the post-treatment cancer cell population with sgRNA (or sgRNA ^iBAR or guide sequence thereof, or nucleic acid encoding sgRNA or sgRNA ^iBAR or guide sequence thereof) sequence counts obtained from the control cancer cell population, wherein: i) the hit genes whose corresponding sgRNA (or sgRNA ^iBAR) guide sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA (or sgRNA ^iBAR) guide sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. 
In some embodiments, the sgRNA (or sgRNA ^iBAR or guide sequence thereof, or nucleic acid encoding sgRNA or sgRNA ^iBAR or guide sequence thereof) sequence counts are subject to median ratio normalization followed by mean-variance modeling. In some embodiments, identifying the target gene comprises comparing the hit gene mutation sequence counts obtained from the post-treatment cancer cell population with hit gene mutation sequence counts obtained from the control cancer cell population, wherein: i) the hit genes whose corresponding hit gene mutation sequences are identified as enriched in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding hit gene mutation sequences are identified as depleted in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug. In some embodiments, the hit gene mutation sequence counts are subject to median ratio normalization followed by mean-variance modeling.
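For illustration, the classification of hit genes by FDR and fold change may be sketched as follows. This sketch applies both thresholds together, which is only one of the "and/or" combinations contemplated above; the function name, the gene labels, and the statistics are hypothetical.

```python
def classify_hits(gene_stats, fdr_max=0.1, min_fold=2.0):
    """Split hit genes into resistance-conferring and sensitivity-conferring sets.

    gene_stats: dict mapping gene -> (fold_change, fdr), where fold_change is
    the post-treatment count divided by the control count.
    """
    resistant, sensitive = [], []
    for gene, (fold_change, fdr) in gene_stats.items():
        if fdr > fdr_max:
            continue  # not significant at the chosen FDR threshold
        if fold_change >= min_fold:
            resistant.append(gene)   # enriched among surviving cells
        elif fold_change <= 1.0 / min_fold:
            sensitive.append(gene)   # depleted among surviving cells
    return resistant, sensitive

# Hypothetical gene-level statistics.
stats = {
    "GENE_A": (8.0, 0.01),   # enriched, significant
    "GENE_B": (0.2, 0.05),   # depleted, significant
    "GENE_C": (5.0, 0.40),   # enriched, not significant
    "GENE_D": (1.1, 0.01),   # significant but below the fold threshold
}
resistant_genes, sensitive_genes = classify_hits(stats)
```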

In some embodiments, the sgRNA library is an sgRNA ^iBAR library. In some embodiments, the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence. In some embodiments, the variance of each guide sequence or sequence comprising the hit gene mutation (e.g., inactivating mutation) is adjusted based on data consistency among the same gene. “Data consistency” as used herein refers to consistency of sequencing results of the same guide sequences (e.g., sequence counts, normalized sequence counts, rankings, or fold changes) corresponding to different iBAR sequences in a screening experiment; or consistency of sequencing results of different hit gene mutations such as inactivating mutations (e.g., at different target sites of the same hit gene) or different sgRNA sequences corresponding to the same gene. A true hit from a screen theoretically should have biologically relevant performance similarities, such as similar normalized sequence counts, rankings, and/or fold changes corresponding to sgRNA ^iBAR constructs having the same guide sequence, but different iBARs; and/or similar normalized sequence counts, rankings, and/or fold changes corresponding to the same gene but different hit gene mutation sequences such as inactivating mutation sequences (e.g., at different target sites of the hit gene) or different sgRNA sequences. Also see WO2020125762 for how mean-variance modeling can be conducted, and how the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence.

In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions (e.g., increased vs. reduced, increased vs. unchanged, or reduced vs. unchanged are all considered as different directions) with respect to each other. In some embodiments, the data consistency among the different hit gene mutation (e.g., inactivating mutation) sequences or different sgRNA sequences corresponding to the same gene is determined based on the direction of the fold change of each hit gene mutation (e.g., inactivating mutation) sequence or each sgRNA sequence, wherein the variance of the hit gene mutation (e.g., inactivating mutation) sequence or the guide sequence is increased if the fold changes of the different hit gene mutation (e.g., inactivating mutation) sequences or the different sgRNA sequences are in different directions with respect to each other. Such data inconsistency-resulted variance increase can help rule out rare but dramatically changed hit gene mutation (e.g., inactivating mutation) /sgRNA/sgRNA ^iBAR sequences in positive screens under high MOI. For example, for the iBAR system, due to the high MOI during library construction, there can be “free riders” of false-positive sgRNAs associated with sgRNAs against true-positive hit genes. The “free rider” described herein refers to sgRNAs targeting irrelevant sequences (e.g., irrelevant hit genes) that are mis-associated with sgRNAs targeting true-positive hit genes to enter the same cancer cells. In some embodiments, the variance of sgRNAs ^iBAR is modified based on the enrichment directions of different iBARs for each guide sequence within a set of sgRNA ^iBAR constructs. 
If all iBARs of one set of sgRNA ^iBAR constructs (i.e., all iBARs corresponding to the same guide sequence) present the same direction of fold change, i.e., all greater or less than that of the control group, then the variance of the set of sgRNA ^iBAR constructs (or the variance of the guide sequence) would be unchanged. If iBARs of one set of sgRNA ^iBAR constructs (or iBARs corresponding to the same guide sequence) reveal inconsistent directions of fold change relative to control, then the corresponding guide sequence is penalized by increasing its variance. In some embodiments, the final adjusted variance for inconsistent sgRNAs ^iBAR is the model-estimated variance (e.g., by mean-variance modeling) plus the experimental variance calculated from the anti-cancer drug treated sample (s) and the control group (s) . In some embodiments, a hit gene comprises two or more (e.g., 2, 3, 4, 5, or more, such as 3) hit gene mutations (e.g., inactivating mutations) , or a hit gene is targeted by two or more (e.g., 2, 3, 4, 5, or more, such as 3) different guide sequences at different target sites (e.g., two or more different sgRNAs, or two or more sets of sgRNA ^iBAR constructs each comprising a guide sequence targeting different target sites) . In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence and to the same hit gene is both determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the corresponding iBAR sequences are in different directions with respect to each other, and the variance of the guide sequence (or the variance of the hit gene) is further increased if the two or more (e.g., 2, 3, 4, 5, or more, such as 3) different guide sequences targeting the same hit gene have fold changes in different directions with respect to each other. 
For example, for sgRNA A and sgRNA B targeting different target sites of the same hit gene X, if the guide sequences of sgRNA A and sgRNA B are both enriched, or both depleted, compared to control, the variance of each guide sequence or the hit gene does not change; if the guide sequence of sgRNA A is enriched while the guide sequence of sgRNA B is depleted compared to control, the variance of each guide sequence or the hit gene is increased. In some embodiments, the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the same hit gene is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of each guide sequence targeting the same hit gene (or the variance of the hit gene) is increased if the fold changes of the iBAR sequences corresponding to the same hit gene are in different directions with respect to each other. For example, where 3 sets of sgRNAs ^iBAR (4 sgRNAs ^iBAR in each set) target 3 different target sites of the same hit gene, if all 12 iBAR sequences are identified as enriched compared to control, the variances of all 3 guide sequences remain unchanged; if some iBAR sequences are identified as enriched while others are identified as unchanged or depleted compared to control, the variances of all 3 guide sequences are increased.
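The direction-based variance adjustment described above can be sketched as follows, assuming log2 fold changes as input. The function names, the tolerance used to call a fold change "unchanged", and the numeric values are hypothetical; the sketch follows the embodiment in which the adjusted variance for inconsistent iBARs is the model-estimated variance plus the experimental variance.

```python
def direction(log2_fold_change, tol=0.0):
    """+1 for enriched, -1 for depleted, 0 for unchanged relative to control."""
    if log2_fold_change > tol:
        return 1
    if log2_fold_change < -tol:
        return -1
    return 0

def adjusted_variance(ibar_log2_fold_changes, model_variance, experimental_variance):
    """Penalize a guide sequence whose iBARs disagree in direction of fold change."""
    directions = {direction(fc) for fc in ibar_log2_fold_changes}
    if len(directions) == 1:
        return model_variance  # consistent iBARs: variance unchanged
    return model_variance + experimental_variance  # inconsistent: variance increased

# Four hypothetical iBARs per guide sequence.
consistent = adjusted_variance([1.2, 0.8, 2.0, 1.5], 4.0, 3.0)
inconsistent = adjusted_variance([1.2, -0.8, 2.0, 1.5], 4.0, 3.0)
```

The same comparison of directions can be applied across the guide sequences targeting one hit gene, e.g., by pooling all 12 iBAR fold changes of the 3 sets in the example above.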

In some embodiments, the sequences comprising hit gene mutations (e.g., inactivating mutations) at different target sites of the same hit gene whose fold changes among corresponding target sites are shown in different directions, the sgRNAs or sgRNAs ^iBAR targeting different target sites of the same hit gene whose fold changes among corresponding target sites are shown in different directions, or the sgRNAs whose fold changes among corresponding iBARs are shown in different directions, can be penalized through the increased variance leading to lower scores and rankings for certain hit genes. For example, where 3 sets of sgRNAs ^iBAR (4 sgRNAs ^iBAR in each set) target 3 different target sites of the same hit gene, if all 12 iBAR sequences are identified as enriched compared to control, the hit gene has low variance and hence high ranking and/or score (e.g., high ranking drug resistant gene, with high resistance score) ; if some iBAR sequences are identified as enriched while others are identified as unchanged or depleted compared to control, the hit gene has high variance and hence low ranking and/or score (e.g., low ranking drug resistant gene, with low resistance score) .

In a set of sgRNA ^iBAR constructs, the ranking for the guide sequence may be adjusted based on the consistency of enrichment directions of a pre-determined threshold number x of different iBAR sequences in the set, wherein x is an integer between 1 and y, and y is the total number of different iBAR sequences in the set. For example, if at least x iBAR sequences of the sgRNA ^iBAR set present the same direction of fold change, i.e., all greater or less than that of the control cancer cell population, then the ranking (or variance) of the guide sequence is unchanged. However, if more than y-x different iBAR sequences reveal inconsistent directions of fold change, then the sgRNA ^iBAR set is penalized by lowering its ranking, e.g., by increasing its variance. In some embodiments, the ranking for the sequences containing the hit gene mutations (e.g., inactivating mutations) or the guide sequences may be adjusted (or further adjusted) based on the consistency of enrichment directions of a pre-determined threshold number x of different hit gene mutations (e.g., inactivating mutations) or different guide sequences corresponding to the same hit gene, wherein x is an integer between 1 and y, and y is the total number of different hit gene mutations or different guide sequences corresponding to the same hit gene. For example, if at least x hit gene mutations (e.g., inactivating mutations) or x guide sequences corresponding to the same hit gene present the same direction of fold change, i.e., all greater or less than that of the control cancer cell population, then the ranking (or variance) is unchanged. However, if more than y-x different hit gene mutations (e.g., inactivating mutations) or more than y-x different guide sequences reveal inconsistent directions of fold change, then the sequences comprising the hit gene mutations (e.g., inactivating mutations) or the guide sequences are penalized by lowering their ranking, e.g., by increasing their variance.
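A minimal sketch of the threshold rule above is given below, assuming log2 fold changes as input and assuming that only enriched or depleted (not unchanged) sequences count toward the threshold x. The function name and the values are hypothetical.

```python
def passes_consistency(log2_fold_changes, x):
    """True if at least x sequences share one direction of fold change.

    Only enriched (> 0) or depleted (< 0) sequences count toward x;
    unchanged sequences (== 0) count toward neither direction.
    """
    n_enriched = sum(1 for fc in log2_fold_changes if fc > 0)
    n_depleted = sum(1 for fc in log2_fold_changes if fc < 0)
    return max(n_enriched, n_depleted) >= x

# A hypothetical set of y = 4 iBARs: three enriched, one depleted.
fold_changes = [1.0, 0.5, 2.0, -0.3]
keeps_rank_at_x3 = passes_consistency(fold_changes, x=3)
keeps_rank_at_x4 = passes_consistency(fold_changes, x=4)
```

With x = 3 the set keeps its ranking; with x = 4 it would be penalized, e.g., by increasing its variance.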

In some embodiments, the P-value of each sequence comprising a hit gene mutation (e.g., inactivating mutation) , or the P-value of each guide sequence of sgRNA or sgRNA ^iBAR, is calculated using the mean and variance (e.g., experimental variance, model-estimated variance, or modified variance based on data inconsistency) of the treatment group compared to those of the control group.

Robust Rank Aggregation (RRA; Kolde R et al. Bioinformatics. 2012; 28: 573–580) or modified RRA (e.g., α-RRA in MAGeCK; Li W et al. Genome Biol. 2014; 15: 554) is one of the tools available in the art for statistics and ranking; it can detect genes that are ranked consistently better than expected under the null hypothesis of uncorrelated inputs, assign a significance score to each gene, and combine ranking lists into a single ranking. It assumes that all informative normalized ranks come from a distribution strongly skewed toward zero, and detects such distributions using the binomial probability calculated under the assumed uniform distribution of ranks. A P-value assigned to each element in the aggregated list is used to rank genes and describes how much better the element was ranked than expected, making the randomly ranked genes less significant. The underlying probabilistic model makes the RRA algorithm parameter free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make this approach robust and compelling for many settings.
Briefly, in RRA and α-RRA, for each sequence comprising a hit gene mutation (e.g., inactivating mutation) , each sgRNA guide sequence, or each sgRNA ^iBAR guide sequence (hereinafter also referred to as “hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequence” ) corresponding to a hit gene (e.g., when there are 3 sgRNAs targeting the same hit gene) , the algorithm looks at how such sequence is positioned in a normalized ranked list of all hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences (obtained from a post-treatment cancer cell population, or control cancer cell population/control cancer cell library) and compares this to the baseline case where all hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences are randomly shuffled ( “permuted sequences” ) . As a result, a P-value is assigned to each hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequence corresponding to its hit gene, showing how much better it is positioned in the ranked lists than expected by chance. This P-value is used both for re-ranking the hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences corresponding to hit genes and for deciding their significance. A person skilled in the art will understand that other tools can also be used for such statistics and ranking. In some embodiments, RRA or α-RRA is employed to calculate the final score of each hit gene in order to obtain the ranking of hit genes based on mean and variance (e.g., modified variance) of every hit gene.
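The rank-aggregation score can be sketched as follows. This is a simplified illustration of the ρ score described by Kolde et al., with an optional cutoff in the spirit of α-RRA; it is not the MAGeCK implementation, the function names and rank values are hypothetical, and the permutation step used to convert ρ scores into P-values is omitted. The sketch relies on the fact that the k-th smallest of n uniform ranks follows a Beta(k, n+1-k) distribution, whose cumulative probability equals a binomial tail sum.

```python
from math import comb

def order_statistic_cdf(u, k, n):
    """P(k-th smallest of n independent uniform(0, 1) ranks <= u)."""
    return sum(comb(n, j) * u**j * (1 - u) ** (n - j) for j in range(k, n + 1))

def rho_score(normalized_ranks, alpha=None):
    """Smallest order-statistic probability for one hit gene.

    normalized_ranks: rank / (total number of sequences) for each sequence
    of the gene. If alpha is given, only sequences ranked within the top
    alpha fraction are scored (an alpha-RRA style cutoff).
    """
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    k_max = n if alpha is None else sum(1 for r in ranks if r <= alpha)
    if k_max == 0:
        return 1.0  # no sequence of this gene is in the top alpha fraction
    return min(order_statistic_cdf(ranks[k - 1], k, n) for k in range(1, k_max + 1))

# Three sequences per gene: one gene ranks consistently near the top,
# the other has a single top-ranked sequence and two scattered ones.
consistent_gene = rho_score([0.01, 0.02, 0.03])
scattered_gene = rho_score([0.01, 0.5, 0.9])
```

A lower ρ score indicates that the sequences of a gene rank consistently better than expected under the uniform null model.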

In some embodiments, sequences comprising the hit gene mutations (e.g., inactivating mutations) , sgRNA guide sequences, or sgRNA ^iBAR guide sequences (hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences) are ranked based on P-values calculated using the mean and variance (e.g., modified variance adjusted for data inconsistency) from the negative binomial (NB) distribution model, which is used to estimate the probability of every hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequence across biological/experimental replicates and treatment vs. control groups; the RRA or α-RRA algorithm is then applied to identify positively or negatively selected hit genes corresponding to the top ranking (e.g., top α%, such as top 5%) hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences. A lower RRA score corresponds to a stronger enrichment of the hit genes. In some embodiments, such top-ranking hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences with P-values lower than a threshold (e.g., P-value<0.25) are selected, and the corresponding hit genes are identified as the target gene. In some embodiments, such top-ranking hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences with FDRs no higher than a threshold (e.g., FDR≤0.1) are selected, and the corresponding hit genes are identified as the target gene. In some embodiments, when multiple hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences are designed for the same hit gene, only the top hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences of one gene are considered in the RRA or α-RRA calculation.
RRA or α-RRA assumes that if a hit gene has no effect on cancer cell sensitivity/resistance to anti-cancer drug treatment, then hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences corresponding to such hit gene should be uniformly distributed across the ranked list of all hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences obtained from the cancer cell library. In some embodiments, all hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences are ranked and compared by RRA or α-RRA among treatment and control groups according to their relative ranking in each group and the different distributions of the groups. All hit genes covered by the cancer cell library are ranked by comparing the skew in beta distribution of the hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences to the uniform null hypothesis model, and hit genes whose corresponding hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequence rankings are consistently higher than expected, with statistical significance (P-value) by permutation test and/or acceptable FDR by the Benjamini-Hochberg procedure, are prioritized in RRA or α-RRA (lower RRA score) . Such RRA or α-RRA analysis can significantly reduce or eliminate false positives due to perturbations in experiments or sampling. In some embodiments, hit genes are ranked based on ranking scores of corresponding hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences obtained by median ratio normalization followed by mean-variance modeling. In some embodiments, hit genes are further ranked by RRA or α-RRA, taking into consideration multiple hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences for the same hit gene.
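The Benjamini-Hochberg procedure mentioned above can be sketched as follows. The function name and the P-values are hypothetical; the sketch returns one adjusted value (FDR) per input P-value, in the original order.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (FDRs) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene-level P-values.
p_values = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q <= 0.1]
```

In this hypothetical example, the first three genes pass an FDR threshold of 0.1 and the fourth does not.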

In some embodiments, the predetermined threshold level is an FDR value from a permutation test of all hit gene mutation (e.g., inactivating mutation) /sgRNA guide/sgRNA ^iBAR guide sequences obtained from an experiment (treatment or control) . In some embodiments, the FDR value is determined by considering the maximum potential true target genes in a specific screen (e.g., a specific pathway involved in response to the anti-cancer drug treatment) . In some embodiments, the threshold is the top β% of sequence counts (normalized or not) obtained from a cancer cell library, and the corresponding hit gene is identified as the target gene.

Any target identification method known in the art can be used herein. For example, the Empirical Bayesian method (which identifies targets by likelihood) or an algorithm based thereon can be used, such as casTLE (cas9 High Throughput maximum Likelihood Estimator), which uses an Empirical Bayesian framework to account for multiple sources of variability, including reagent efficacy and off-target effects, in the analysis of large-scale genomic perturbation screens, and provides casTLE scores for ranking and threshold cutoff (Morgens, D. W. et al. (2016) Nat Biotechnol 34, 634-636). In some embodiments, log2 ratio difference and p-value from a t-test can be used to identify target genes. For example, RIGER (Luo, J. et al. (2009) Cell 137, 835-848) ranks shRNAs according to their differential effects between two classes of samples, then identifies the genes targeted by the shRNAs at the top of the list, thereby identifying genes essential to the difference between the classes; LFC and P-value can be used for ranking and threshold cutoff. In some embodiments, the probability mass function of the binomial distribution (or an algorithm based thereon) can be used for target gene identification, for example STARS (Doench, J.G., et al. (2016) Nat Biotechnol 34, 184-191), in which STAR Scores can be used for ranking and threshold cutoff. In some embodiments, a negative binomial model-based and α-RRA algorithm can be used for target gene identification, such as MAGeCK (Li, W. et al. (2014) Genome Biol 15, 554), in which RRA Scores can be used for ranking and threshold cutoff. In some embodiments, a β-binomial modeling-based algorithm can be used for target gene identification, such as CRISPRBetaBinomial (CB²) (Jeong, H.H. et al. (2019) Genome Res 29, 999-1008), in which P-value or FDR can be used for ranking and threshold cutoff.
In some embodiments, such as during stringent positive screens, sgRNA or sgRNA ^iBAR raw read count ranking, normalized read count ranking, and/or log2 fold change between treatment group and control group can be used for target gene identification, e.g., hit genes corresponding to the top X% of read counts are identified as target genes.
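By way of illustration only, the read-count-based identification described above (normalized read counts, log2 fold change between treatment and control, and a top X% cutoff) can be sketched as follows. The guide names, gene names, read counts, pseudocount, and the per-million scaling are all hypothetical assumptions chosen for the example; they are not taken from the present application, and the sketch does not reproduce any of the cited tools.

```python
import math

def normalize(counts, scale=1_000_000):
    """Scale raw read counts so that each sample sums to `scale` reads."""
    total = sum(counts.values())
    return {guide: count * scale / total for guide, count in counts.items()}

def log2_fold_changes(treated, control, pseudocount=1.0):
    """Per-guide log2 fold change of treated vs. control normalized counts."""
    t, c = normalize(treated), normalize(control)
    return {
        guide: math.log2((t.get(guide, 0.0) + pseudocount) / (c[guide] + pseudocount))
        for guide in c
    }

def top_percent_hits(lfc, guide_to_gene, top_percent):
    """Genes targeted by the guides in the top `top_percent` of the ranking."""
    ranked = sorted(lfc, key=lfc.get, reverse=True)  # most enriched first
    n_top = max(1, math.ceil(len(ranked) * top_percent / 100))
    return {guide_to_gene[guide] for guide in ranked[:n_top]}

# Hypothetical data: four guides targeting two made-up genes.
guide_to_gene = {"sgA1": "GENE_A", "sgA2": "GENE_A", "sgB1": "GENE_B", "sgB2": "GENE_B"}
control = {"sgA1": 100, "sgA2": 100, "sgB1": 100, "sgB2": 100}
treated = {"sgA1": 700, "sgA2": 100, "sgB1": 100, "sgB2": 100}

lfc = log2_fold_changes(treated, control)
hits = top_percent_hits(lfc, guide_to_gene, top_percent=25)
print(hits)
```

In this sketch a positive log2 fold change corresponds to enrichment in the treatment group and a negative value to depletion; ranking in ascending order instead would select depleted guides.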

In some embodiments, the target gene identification is a positive screening, i.e., by identifying hit gene mutation (e.g., inactivating mutation) sequences or guide sequences that are enriched in the post-treatment cancer cell population. In some embodiments, the target gene identification is a negative screening, i.e., by identifying hit gene mutation (e.g., inactivating mutation) sequences or guide sequences that are depleted in the post-treatment cancer cell population. Hit gene mutation (e.g., inactivating mutation) sequences or guide sequences that are enriched in the post-treatment cancer cell population rank high based on sequence counts or fold changes, while hit gene mutation (e.g., inactivating mutation) sequences or guide sequences that are depleted in the post-treatment cancer cell population rank low based on sequence counts or fold changes. In some embodiments, the enrichment or depletion is relative to the total sequence counts obtained from the post-treatment cancer cell population. In some embodiments, the enrichment or depletion is relative to the corresponding sequence counts in a control cancer cell population or control cancer cell library, such as a control cancer cell population obtained from a same cancer cell library not treated with the anti-cancer drug. In some embodiments, the enrichment or depletion is calculated based on RRA or α-RRA algorithm.

In some embodiments, the method comprises subjecting the cancer cell library from step a) to at least two (e.g., at least 2, 3, 4, 5, 6, 7, 8, 10, or more) separate different treatments with the anti-cancer drug in step b), and in step c) growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., all alive, resistant to the anti-cancer drug), identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment; and obtaining one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. In some embodiments, identifying the target gene comprises identifying one or more hit genes in the post-treatment cancer cell populations obtained from at least two (e.g., at least 3, 4, 5, 6, 7, 8, 10, or more) separate different treatments with the anti-cancer drug, wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as enriched in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) in all separate different treatments are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as depleted in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) in all separate different treatments are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.

In some embodiments, the method comprises subjecting the cancer cell library from step a) to at least two (e.g., at least 2, 3, 4, 5, 6, 7, 8, 10, or more) separate different treatments with the anti-cancer drug in step b), and in step c) growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment (e.g., all alive, resistant to the anti-cancer drug), identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment; and combining the one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. In some embodiments, identifying the target gene comprises identifying one or more hit genes in the post-treatment cancer cell populations obtained from at least two (e.g., at least 3, 4, 5, 6, 7, 8, 10, or more) separate different treatments with the anti-cancer drug, wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as enriched in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) in at least one treatment are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as depleted in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) in at least one treatment are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.
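As a non-limiting sketch, the two ways of consolidating hit genes across separate treatments described above (hit genes identified in all treatments versus hit genes identified in at least one treatment) correspond to a set intersection and a set union, respectively. The gene names and the (fold change, FDR) values below are hypothetical, and applying both the FDR ≤ 0.1 and the at-least-2-fold criteria together is only one of the "and/or" options recited above.

```python
def significant_hits(results, max_fdr=0.1, min_fold=2.0):
    """Genes passing both cutoffs; `results` maps gene -> (fold_change, fdr)."""
    return {
        gene
        for gene, (fold, fdr) in results.items()
        if fdr <= max_fdr and fold >= min_fold
    }

def hits_in_all_treatments(treatments, **cutoffs):
    """Intersection: genes called as hits in every treatment."""
    per_treatment = [significant_hits(r, **cutoffs) for r in treatments]
    return set.intersection(*per_treatment) if per_treatment else set()

def hits_in_any_treatment(treatments, **cutoffs):
    """Union: genes called as hits in at least one treatment."""
    per_treatment = [significant_hits(r, **cutoffs) for r in treatments]
    return set().union(*per_treatment)

# Hypothetical enrichment results from two separate treatments.
treatment_1 = {"GENE_A": (4.0, 0.01), "GENE_B": (3.0, 0.05), "GENE_C": (1.2, 0.50)}
treatment_2 = {"GENE_A": (5.0, 0.02), "GENE_B": (1.5, 0.30), "GENE_C": (2.5, 0.08)}

strict = hits_in_all_treatments([treatment_1, treatment_2])
permissive = hits_in_any_treatment([treatment_1, treatment_2])
print(sorted(strict), sorted(permissive))
```

The intersection favors specificity (a gene must reproduce under every treatment condition), whereas the union favors sensitivity; the same functions apply to depletion if the fold value passed in is the fold of depletion.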

In some embodiments, the methods described herein comprise subjecting the cancer cell library from step a) to two separate treatments b1) and b2) : b1) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time; b2) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time; c1) growing the cancer cell library from treatment b1) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; c2) growing the cancer cell library from treatment b2) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b1) , d2) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b2) , and d3) obtaining one or more hit genes identified from both treatment b1) and treatment b2) , thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. 
In some embodiments, identifying the target gene comprises identifying one or more hit genes in the post-treatment cancer cell populations obtained from two separate treatments b1) and b2) , wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as enriched in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) in both treatments b1) and b2) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as depleted in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) in both treatments b1) and b2) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.

In some embodiments, the methods described herein comprise subjecting the cancer cell library from step a) to two separate treatments b1) and b2) : b1) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time; b2) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time; c1) growing the cancer cell library from treatment b1) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; c2) growing the cancer cell library from treatment b2) to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) ; d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b1) , d2) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b2) , and d3) combining the one or more hit genes identified from treatment b1) and treatment b2) , thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug. 
In some embodiments, identifying the target gene comprises identifying one or more hit genes in the post-treatment cancer cell populations obtained from two separate treatments b1) and b2) , wherein: i) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as enriched in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold enrichment, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more enrichment) in either treatment b1) or b2) are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or ii) the hit genes whose corresponding sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations are identified as depleted in the post-treatment cancer cell population that is resistant to the anti-cancer drug (alive) compared to the control cancer cell population with an FDR ≤ 0.1 (e.g., FDR ≤ any of 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, 0.001, or less) (and/or with at least about 2-fold depletion, such as at least about any of 3-, 4-, 5-, 10-, 20-, 50-, 100-fold, or more depletion) in either treatment b1) or b2) are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.

In some embodiments, the method comprises identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to two or more (e.g., 2, 3, 4, 5, or more) anti-cancer drugs. In some embodiments, the two or more different anti-cancer drugs target the same cancer target (e.g., PARP). In some embodiments, the two or more different anti-cancer drugs target different cancer targets (e.g., one targets PARP, one targets a non-PARP target). In some embodiments, the method comprises: i) separately identifying a set of one or more target genes whose mutations make the cancer cells sensitive to an anti-cancer drug using any of the methods described herein (e.g., can comprise one or more separate different treatments), for two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs when treated alone; and ii) obtaining one or more target genes present in every set of target genes identified for each anti-cancer drug, thereby identifying target genes whose mutations make the cancer cells sensitive to a combination treatment of the two or more different anti-cancer drugs. In some embodiments, the method comprises: i) separately identifying a set of one or more target genes whose mutations make the cancer cells resistant to an anti-cancer drug using any of the methods described herein (e.g., can comprise one or more separate different treatments), for two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs when treated alone; and ii) obtaining one or more target genes present in a combination of sets of target genes identified for all anti-cancer drugs, thereby identifying target genes whose mutations make the cancer cells resistant to a combination treatment of the two or more different anti-cancer drugs. 
In some embodiments, the method comprises: a) providing a cancer cell library described herein; b) contacting the cancer cell library with a combination of two or more (e.g., 2, 3, 4, 5, or more) different anti-cancer drugs (e.g., contacting at the same time, contacting with an overlapping period of time, or contacting sequentially) ; c) growing the cancer cell library to obtain a post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug (s) ) ; and identifying the target gene based on the difference between the profiles of sgRNAs or sgRNAs ^iBAR or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population, using any of the target gene identification methods described herein.
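For illustration, the per-drug consolidation described above treats sensitivity and resistance asymmetrically: genes sensitizing to the combination are those present in every per-drug set, while genes conferring resistance to the combination are drawn from the combination of all per-drug sets. A minimal sketch follows; the drug labels and gene names are hypothetical placeholders, and reading "combination of sets" as a set union is an interpretive assumption.

```python
def combination_sensitive_genes(sensitive_by_drug):
    """Genes present in every per-drug set of drug sensitive genes."""
    gene_sets = list(sensitive_by_drug.values())
    return set.intersection(*gene_sets) if gene_sets else set()

def combination_resistant_genes(resistant_by_drug):
    """Genes present in the combined (union) per-drug sets of drug resistant genes."""
    return set().union(*resistant_by_drug.values())

# Hypothetical per-drug results obtained from single-drug screens.
sensitive_by_drug = {
    "drug_1": {"GENE_A", "GENE_B"},
    "drug_2": {"GENE_B", "GENE_C"},
}
resistant_by_drug = {
    "drug_1": {"GENE_X"},
    "drug_2": {"GENE_Y", "GENE_Z"},
}

combo_sensitive = combination_sensitive_genes(sensitive_by_drug)
combo_resistant = combination_resistant_genes(resistant_by_drug)
print(sorted(combo_sensitive), sorted(combo_resistant))
```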

In some embodiments, the method further comprises ranking the identified target genes, wherein the target gene ranking is based on the degree of enrichment or depletion (e.g., fold of enrichment, fold of depletion, enrichment FDR, or depletion FDR) of the sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations in the post-treatment cancer cell population compared to the control cancer cell population. In some embodiments, the target gene ranking is further adjusted based on data consistency among all sequences comprising the hit gene mutation (e.g., inactivating mutation) corresponding to the same target gene. In some embodiments, the sgRNA library is an sgRNA ^iBAR library, and the target gene ranking is further adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence of the target gene, and/or based on data consistency among all guide sequences corresponding to (e.g., the same or different target sites of) the same target gene. In some embodiments, an RRA or α-RRA algorithm is used for ranking the identified target genes. In some embodiments, the ranking of the identified target genes is i) based on data consistency among all sequences comprising the hit gene mutation (e.g., inactivating mutation) corresponding to the same target gene; or ii) based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence of the target gene; and/or iii) based on data consistency among all guide sequences of sgRNAs or sgRNAs ^iBAR corresponding to (e.g., the same or different target sites of) the same target gene; wherein the identified target genes are ranked from high to low based on the degree of data consistency from high to low. In some embodiments, the post-treatment cancer cell population is an alive population, i.e., resistant to the anti-cancer drug. 
In some embodiments, the method further comprises assigning a sensitivity score or a resistance score to the identified target gene, wherein target genes whose mutations make the cancer cells resistant to the anti-cancer drug are ranked from high to low based on the fold of enrichment (or based on enrichment FDR -the smaller the FDR, the higher the ranking; or based on the degree of data consistency –the higher the degree of data consistency, the higher the ranking) of the sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population, and each target gene is assigned a resistance score from high to low accordingly; and/or wherein target genes whose mutations make the cancer cells sensitive to the anti-cancer drug are ranked from high to low based on the fold of depletion (or based on depletion FDR -the smaller the FDR, the higher the ranking; or based on the degree of data consistency –the higher the degree of data consistency, the higher the ranking) of the sgRNA or sgRNA ^iBAR guide sequences or hit gene mutations in the post-treatment cancer cell population (e.g., alive, resistant to the anti-cancer drug) compared to the control cancer cell population, and each target gene is assigned a sensitivity score from high to low accordingly.
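The ranking and scoring described above can be sketched, under stated assumptions, as a sort followed by assignment of scores that decrease with rank. The present application does not specify a numeric scoring scheme; the rank-based score used here (the top-ranked of n genes receives n, the last receives 1) and the gene names and values are hypothetical.

```python
def rank_and_score(metric_by_gene, higher_is_better=True):
    """Return (gene, score) pairs ordered from highest to lowest rank.

    With `higher_is_better=True` the metric is, e.g., fold of enrichment or
    depletion; with `higher_is_better=False` it is, e.g., an FDR, where a
    smaller value ranks higher.
    """
    ordered = sorted(metric_by_gene, key=metric_by_gene.get, reverse=higher_is_better)
    n = len(ordered)
    # Hypothetical scoring scheme: score decreases by one per rank position.
    return [(gene, n - position) for position, gene in enumerate(ordered)]

# Hypothetical fold-enrichment values for three drug resistant genes.
fold_enrichment = {"GENE_A": 8.0, "GENE_B": 2.5, "GENE_C": 20.0}
resistance_scores = rank_and_score(fold_enrichment)

# The same genes ranked by hypothetical enrichment FDR (smaller ranks higher).
enrichment_fdr = {"GENE_A": 0.01, "GENE_B": 0.09, "GENE_C": 0.001}
resistance_scores_by_fdr = rank_and_score(enrichment_fdr, higher_is_better=False)
print(resistance_scores, resistance_scores_by_fdr)
```

Sensitivity scores can be obtained in the same way by passing fold of depletion or depletion FDR for the drug sensitive genes.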

In some embodiments, the method further comprises validating the identified target gene by: a) modifying a cancer cell by creating a mutation (e.g., inactivating mutation) in the target gene in the cancer cell; and b) determining the sensitivity or resistance of the modified cancer cell to the anti-cancer drug. In some embodiments, the method comprises subjecting the modified cancer cell to any of the anti-cancer drug treatment steps b) and optionally any of the cancer cell obtaining step c) described herein. Any cell viability assay known in the art and described herein can be used to determine the sensitivity or resistance of the modified cancer cell to the anti-cancer drug. When the modified cancer cells are a homogeneous population (i.e., comprising the same mutation(s) such as inactivating mutation(s)), more cell viability assays can be used, such as metabolic activity-based assays, e.g., resazurin (oxidation-reduction (redox) indicator), tetrazolium salts MTT and XTT, dihydrorhodamines, -calceins, or -fluoresceins, or luminescent ATP assays. The mutation (e.g., inactivating mutation) in the target gene can be generated by any method known in the art and described herein, such as by a mutagenic agent, or TALEN-, ZFN-, or CRISPR/Cas-mediated gene editing (e.g., using Cas and an sgRNA against the target gene). In some embodiments, the cancer cell before creating a mutation (e.g., inactivating mutation) in the target gene contains an endogenous mutation, such as an endogenous mutation that frequently occurs in cancer cells.

III. Methods of treating cancer and/or selecting patients

The present invention in another aspect provides methods of treating a cancer in an individual, methods of selecting an individual suffering from a cancer for an anti-cancer drug treatment, and methods of excluding an individual suffering from a cancer from an anti-cancer drug treatment, based on any of the target genes described herein, or based on one or more target genes identified using any of the target gene identification methods described herein.

An “aberration” at a gene (e.g., target gene, drug sensitive gene, drug resistant gene) refers to a genetic and/or epigenetic aberration of a gene, an aberrant expression level, and/or an aberrant activity level, and/or an aberrant modification level of the gene (or gene product, such as RNA or protein) that may lead to abnormal loss of function or reduced function and/or abnormal expression (e.g., reduced or absent) of the RNA and/or protein encoded by the gene. In some embodiments, a genetic aberration comprises a change to the nucleic acid (such as DNA or RNA) or protein sequence (i.e., mutation) or an aberrant epigenetic feature associated with the gene, including, but not limited to, coding, non-coding, regulatory, enhancer, silencer, promoter, intron, exon, and untranslated regions of the gene. In some embodiments, an aberration at a gene comprises a mutation of the gene, including, but not limited to, deletion, frameshift, insertion, indel, missense mutation, nonsense mutation, point mutation, silent mutation, splice site mutation, splice variant, and translocation. In some embodiments, the mutation may be a loss or deletion of the gene. In some embodiments, the mutation is a deleterious mutation. In some embodiments, an aberration at a gene comprises aberrant (e.g., reduced or absent) expression (e.g., mRNA or protein) of a gene compared to a control level. In some embodiments, an aberration at a gene comprises aberrant (e.g., reduced or abolished) activity of a gene product (e.g., RNA or protein) compared to a control level, such as activation or inhibition of downstream targets. In some embodiments, an aberration at a gene comprises aberrant modification (e.g., increased, decreased, or mis-modification) of a gene (e.g., at DNA level or histone level) or gene product (e.g., RNA or protein) compared to a control level, such as post-translational modification (e.g., phosphorylation, ubiquitination). 
In some embodiments, an aberration at a gene comprises a copy number variation of the gene. In some embodiments, the copy number variation of the gene is caused by structural rearrangement of the genome, including deletions, duplications, inversion, and translocations. In some embodiments, an aberration at a gene comprises an aberrant epigenetic feature of the gene, including, but not limited to, DNA methylation, hydroxymethylation, increased or decreased histone binding, histone methylation, histone acetylation, chromatin remodeling, and the like. In some embodiments, the aberration is determined in comparison to a control or reference, such as a reference sequence (such as a nucleic acid sequence or a protein sequence) , a control expression (such as RNA or protein expression) level, a control activity (such as activation or inhibition of downstream targets) level, or a control modification (e.g., post-translational modification or epigenetic modification) level. In some embodiments, the aberrant expression level or the aberrant activity level in a gene may be below the control level (such as about any of 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%or more below the control level) . In some embodiments, the aberrant modification level in a gene (e.g., modification of DNA, nucleosome, RNA, or protein) may be below the control level (such as about any of 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%or more below the control level) , or above the control level (such as about any of 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%or more above the control level) . In some embodiments, the aberrant modification in a gene is a mis-modification, e.g., ubiquitination instead of phosphorylation. In some embodiments, the control level (e.g. expression level or activity level or modification level) is the median level (e.g. expression level or activity level or modification level) of a control population. 
In some embodiments, the control population is a population having the same cancer as the individual being/to be treated. In some embodiments, the control population is a healthy population that does not have the cancer, and optionally with comparable demographic characteristics (e.g. gender, age, ethnicity, etc. ) as the individual being/to be treated. In some embodiments, the control level (e.g. expression level or activity level or modification level) is a level (e.g. expression level or activity level or modification level) of a healthy tissue from the same individual. An aberration at a gene may be determined by comparing to a reference sequence, including epigenetic patterns of the reference sequence in a control sample. In some embodiments, the reference sequence is the sequence (DNA, RNA or protein sequence) corresponding to a fully functional allele of the corresponding gene, such as an allele (e.g. the prevalent allele) of the corresponding gene present in a healthy population of individuals that do not have the cancer, but may optionally have similar demographic characteristics (such as gender, age, ethnicity etc. ) as the individual being/to be treated.

An aberration at a target gene is herein also referred to as “target gene aberration, ” including but not limited to target gene mutation. An aberration at a drug sensitive gene is herein also referred to as “drug sensitive aberration, ” including but not limited to drug sensitive mutation, which makes the cancer cells sensitive to the anti-cancer drug. An aberration at a drug resistant gene is herein also referred to as “drug resistant aberration, ” including but not limited to drug resistant mutation, which makes the cancer cells resistant to the anti-cancer drug. An aberration at a patient gene is herein also referred to as “patient gene aberration, ” including but not limited to patient gene mutation. An aberration at a patient target gene is herein also referred to as “patient target gene aberration, ” including but not limited to patient target gene mutation.

The “status” of an aberration at a gene may refer to the presence or absence of the aberration at the gene, or the aberrant level (expression or activity or modification level) of the gene. In some embodiments, the presence of an aberration (such as a mutation) in one or more drug sensitive genes as compared to a control indicates that (a) the individual is more likely to respond to an anti-cancer drug treatment or (b) the individual is selected for an anti-cancer drug treatment. In some embodiments, the absence of an aberration (such as a mutation) in one or more drug sensitive genes compared to a control, indicates that (a) the individual is less likely to respond to an anti-cancer drug treatment or (b) the individual is not selected for an anti-cancer drug treatment. In some embodiments, an aberrant level (such as expression level or activity level or modification level) of one or more drug sensitive genes and/or one or more drug resistant genes is correlated with the likelihood of the individual to respond to treatment. For example, a larger deviation of the level (e.g. expression or activity or modification level) of one or more drug sensitive genes in the direction of reducing or abolishing the gene function indicates that the individual is more likely to respond to an anti-cancer drug treatment. In some embodiments, a prediction model (e.g., composite score) based on the level (s) (e.g. expression level or activity level or modification level) of one or more drug sensitive genes and/or one or more drug resistant genes is used to predict (a) the likelihood of the individual to respond to an anti-cancer drug treatment and (b) whether to select the individual for an anti-cancer drug treatment. In some embodiments, the prediction model, including, for example, coefficient for each level, may be obtained by statistical analysis, such as regression analysis, using clinical trial data.
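As one hedged illustration of a prediction model producing a composite score, the sketch below combines gene-level measurements with per-gene coefficients through a logistic function. The present application states only that coefficients may be obtained by statistical analysis such as regression analysis using clinical trial data; the logistic form, the gene labels, the coefficient values, and the expression levels (expressed as a fraction of a control median) are all hypothetical assumptions for the example and carry no clinical meaning.

```python
import math

def composite_score(levels, coefficients, intercept=0.0):
    """Logistic composite score in (0, 1) from gene levels and coefficients."""
    linear = intercept + sum(
        coefficient * levels[gene] for gene, coefficient in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients. A negative coefficient means that a lower level of
# the drug sensitive gene raises the score; a positive coefficient means that a
# lower level of the drug resistant gene lowers the score.
coefficients = {"SENSITIVE_GENE": -2.0, "RESISTANT_GENE": 1.5}

# Hypothetical levels, each as a fraction of the control median level.
patient_1 = {"SENSITIVE_GENE": 0.2, "RESISTANT_GENE": 1.0}
patient_2 = {"SENSITIVE_GENE": 1.0, "RESISTANT_GENE": 0.2}

score_1 = composite_score(patient_1, coefficients)
score_2 = composite_score(patient_2, coefficients)
print(score_1 > score_2)
```

In this sketch the patient with reduced drug sensitive gene function receives the higher score, consistent with the statement above that a larger deviation toward reduced function of a drug sensitive gene indicates a higher likelihood of response.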

In some embodiments, there is provided a method of treating a cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected for treatment based on that the individual has an aberration (e.g., carries a mutation) in a target gene ( “a drug sensitive gene” ) which makes the cancer cells sensitive to the anti-cancer drug ( “drug sensitive aberration” (such as “drug sensitive mutation” ) ) , and wherein the drug sensitive gene (or drug sensitive mutation) is identified using any of the target gene identification methods described herein. In some embodiments, there is provided a method of treating a colorectal cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of a PARPi, wherein the individual is selected for treatment based on that the individual has a drug sensitive aberration (e.g., carries a drug sensitive mutation) in a drug sensitive gene, and wherein the drug sensitive gene is selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1.

In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a cancer who may benefit from a treatment comprising administration of an anti-cancer drug, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes identified using any of the target gene identification methods described herein, wherein the presence of the one or more drug sensitive aberrations (e.g., drug sensitive mutations) in the sample identifies the individual as one who may benefit from the treatment. In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a colorectal cancer who may benefit from a treatment comprising administration of a PARPi, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1, wherein the presence of the one or more drug sensitive aberrations (e.g., drug sensitive mutations) in the sample identifies the individual as one who may benefit from the treatment.

In some embodiments, provided herein is a method of selecting a treatment for an individual (e.g., human) having a cancer, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes identified using any of the target gene identification methods described herein, wherein the presence of the one or more drug sensitive aberrations (e.g., drug sensitive mutations) in the sample identifies a treatment comprising administration of an anti-cancer drug as a suitable treatment for the individual. In some embodiments, provided herein is a method of selecting a treatment for an individual (e.g., human) having a colorectal cancer, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1, wherein the presence of the one or more drug sensitive aberrations (e.g., drug sensitive mutations) in the sample identifies a treatment comprising administration of a PARPi as a suitable treatment for the individual.

In some embodiments, there is provided a method of excluding an individual (e.g., human) suffering from a cancer from a treatment comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is excluded if the individual has an aberration (e.g., carries a mutation) in a target gene ( “a drug resistant gene” ) which makes the cancer cells resistant to the anti-cancer drug ( “drug resistant aberration” (such as “drug resistant mutation” ) ) , and wherein the drug resistant gene is identified using any of the target gene identification methods described herein. In some embodiments, there is provided a method of excluding an individual (e.g., human) suffering from a colorectal cancer from a treatment comprising administering to the individual an effective amount of a PARPi, wherein the individual is excluded if the individual has a drug resistant aberration (e.g., carries a drug resistant mutation) in a drug resistant gene, and wherein the drug resistant gene is selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.

In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a cancer who may not benefit from a treatment comprising administration of an anti-cancer drug, the method comprising detecting in a sample from the individual one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes identified using any of the target gene identification methods described herein, wherein the presence of the one or more drug resistant aberrations (e.g., drug resistant mutations) in the sample identifies the individual as one who may not benefit from the treatment. In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a colorectal cancer who may not benefit from a treatment comprising administration of a PARPi, the method comprising detecting in a sample from the individual one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, wherein the presence of the one or more drug resistant aberrations (e.g., drug resistant mutations) in the sample identifies the individual as one who may not benefit from the treatment.

In some embodiments, provided herein is a method of excluding a treatment from an individual (e.g., human) having a cancer, the method comprising detecting in a sample from the individual one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes identified using any of the target gene identification methods described herein, wherein the presence of the one or more drug resistant aberrations (e.g., drug resistant mutations) in the sample excludes a treatment comprising administration of an anti-cancer drug as a suitable treatment for the individual. In some embodiments, provided herein is a method of excluding a treatment for an individual (e.g., human) having a colorectal cancer, the method comprising detecting in a sample from the individual one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, wherein the presence of the one or more drug resistant aberrations (e.g., drug resistant mutations) in the sample excludes a treatment comprising administration of a PARPi as a suitable treatment for the individual.

In some embodiments, there is provided a method of treating a cancer in an individual (e.g., human), comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on: i) aberrations (e.g., mutations) in one or more target genes (“drug sensitive genes”) which make the cancer cells sensitive to the anti-cancer drug (“drug sensitive aberrations”), and ii) aberrations (e.g., mutations) in one or more target genes (“drug resistant genes”) which make the cancer cells resistant to the anti-cancer drug (“drug resistant aberrations”), wherein the drug sensitive genes and drug resistant genes are identified using any of the target gene identification methods described herein, and wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations and the drug resistant aberrations is above a composite score threshold level.

In some embodiments, the method of treating cancer, or selecting or excluding a cancer treatment for a patient further comprises detecting the one or more drug sensitive aberrations (e.g., drug sensitive mutations) and/or the one or more drug resistant aberrations (e.g., drug resistant mutations) in a sample from the individual (e.g., by NGS) . In some embodiments, the method further comprises identifying the one or more drug sensitive genes and/or the one or more drug resistant genes. In some embodiments, the method further comprises detecting aberrant (e.g., reduced or absent) expression (e.g., RNA or protein) of the one or more drug sensitive genes and/or the one or more drug resistant genes compared to a control level, such as by qPCR, RNA-seq, mass spectrometry, western blot, or any other RNA or protein expression level detection methods. In some embodiments, the method further comprises detecting aberrant modification at the one or more drug sensitive genes and/or the one or more drug resistant genes compared to a control level, such as epigenetic modification (e.g., DNA methylation, histone methylation, histone acetylation) or post-translational modification (e.g., phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, lipidation and proteolysis) . Any known methods for detecting modification (s) on DNA, nucleosome, RNA, or protein can be used herein, such as ChIP-seq, ChIP-qPCR, DNase-seq, MNase-seq, mass spectrometry, western blot, etc. In some embodiments, the method further comprises detecting aberrant (e.g., reduced or absent) activity of expression product (e.g., RNA or protein) of the one or more drug sensitive genes and/or the one or more drug resistant genes compared to a control level. 
Any suitable gene function/activity testing methods can be used herein, such as detecting signal transduction, activation status (e.g., phosphorylation status) of downstream pathway molecules, protein-protein binding affinity and/or specificity, metabolism, cell behavior (e.g., cell proliferation, death, cell cycle) , cytokine release, etc. In some embodiments, the method further comprises obtaining a composite score for the individual. In some embodiments, the composite score is based on one or more of drug sensitive aberrations and/or drug resistant aberrations, such as one or more of drug sensitive mutations, drug resistant mutations, aberrant expression of the drug sensitive genes, aberrant expression of the drug resistant genes, aberrant activity of expression products of the drug sensitive genes, aberrant activity of expression products of the drug resistant genes, aberrant modification of the drug sensitive genes (or gene product) , and aberrant modification of the drug resistant genes (or gene product) , etc. In some embodiments, the composite score is obtained by subtracting (the number of drug resistant genes with drug resistant aberrations carried by the patient) from (the number of drug sensitive genes with drug sensitive aberrations carried by the patient) , wherein the individual is selected for treatment if the composite score is above zero. In some embodiments, the severity of the drug sensitive mutation or drug resistant mutation in the patient adds weight to the composite score, for example, a drug sensitive mutation that affects the expression and/or activity of a drug sensitive gene adds more weight to the composite score compared to another drug sensitive mutation that affects less of the expression and/or activity of the same drug sensitive gene. 
In some embodiments, the degree of the aberrant expression of a drug sensitive gene or a drug resistant gene in the patient compared to a control level (e.g., healthy individual) adds weight to the composite score, for example, loss of expression of a drug sensitive gene adds more weight to the composite score compared to reduced expression of the same drug sensitive gene. In some embodiments, the degree of the aberrant activity (e.g., RNA or protein activity) of a drug sensitive gene or a drug resistant gene in the patient compared to a control level (e.g., healthy individual) adds weight to the composite score, for example, loss of protein activity (e.g., abolished binding) of a drug sensitive gene adds more weight to the composite score compared to reduced protein activity (e.g., reduced binding) of the same drug sensitive gene. In some embodiments, the degree of the aberrant modification (e.g., modification of DNA, nucleosome, RNA, or protein) of a drug sensitive gene or a drug resistant gene in the patient compared to a control level (e.g., healthy individual) adds weight to the composite score, for example, loss of protein phosphorylation of a drug sensitive gene (e.g., abolishes signal transduction) adds more weight to the composite score compared to reduced protein phosphorylation of the same drug sensitive gene. In some embodiments, the composite score is obtained by subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes) , wherein the individual is selected for treatment if the composite score is above zero. 
In some embodiments, the method further comprises ranking the drug sensitive genes and drug resistant genes identified using any of the target gene identification methods described herein, wherein the ranking of the drug resistant genes or drug sensitive genes is based on the degree of enrichment or the degree of depletion (e.g., fold of enrichment, fold of depletion, enrichment FDR, or depletion FDR) of the sgRNA or sgRNA^iBAR guide sequences (or sequences comprising the hit gene mutations) in the post-treatment cancer cell population (e.g., alive) compared to the control cancer cell population. In some embodiments, the ranking of the drug resistant genes or drug sensitive genes is further adjusted i) based on data consistency among the iBAR sequences in the sgRNA^iBAR sequences corresponding to the guide sequence of the same target gene, or ii) based on data consistency among all guide sequences corresponding to the same target gene (or same target site of the same target gene), or iii) based on data consistency among all sequences comprising the hit gene mutation (e.g., inactivating mutation) corresponding to the same target gene (or same target site of the same target gene). In some embodiments, an RRA or α-RRA algorithm is used for ranking the drug resistant genes and/or drug sensitive genes.
In some embodiments, the method further comprises assigning a sensitivity score to the identified drug sensitive gene, and/or a resistance score to the identified drug resistant gene, i) wherein drug resistant genes are ranked from high to low based on the fold of enrichment (or based on enrichment FDR -the smaller the FDR, the higher the ranking; or based on the degree of data consistency –the higher the degree of data consistency, the higher the ranking) of the sgRNA or sgRNA ^iBAR guide sequences (or sequences comprising the hit gene mutations) in the post-treatment cancer cell population (e.g., alive) compared to the control cancer cell population, and each drug resistant gene is assigned a resistance score from high to low accordingly; and/or ii) wherein drug sensitive genes are ranked from high to low based on the fold of depletion (or based on depletion FDR -the smaller the FDR, the higher the ranking; or based on the degree of data consistency –the higher the degree of data consistency, the higher the ranking) of the sgRNA or sgRNA ^iBAR guide sequences (or sequences comprising the hit gene mutations) in the post-treatment cancer cell population (e.g., alive) compared to the control cancer cell population, and each drug sensitive gene is assigned a sensitivity score from high to low accordingly.
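
The ranking, score assignment, and subtraction-based composite scores described above can be sketched in Python as follows. This is a minimal illustration, not the scoring procedure of the present application: the `ScreenHit` type, the function names, the linear rank-to-score mapping (N, N-1, ..., 1), and the fold-change values are all assumptions introduced for the example, and only the fold-of-enrichment/fold-of-depletion ranking criterion is shown (FDR-based and consistency-based ranking are omitted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenHit:
    gene: str
    fold_change: float  # fold of depletion (sensitive) or enrichment (resistant)


def assign_rank_scores(hits):
    """Rank hits from high to low fold change and score them N, N-1, ..., 1.

    The linear rank-to-score mapping is an assumption made for this sketch.
    """
    ranked = sorted(hits, key=lambda h: h.fold_change, reverse=True)
    return {h.gene: float(len(ranked) - idx) for idx, h in enumerate(ranked)}


def score_based_composite(sensitivity, resistance, aberrant_genes):
    """|sum of sensitivity scores| minus |sum of resistance scores|."""
    sens = sum(s for g, s in sensitivity.items() if g in aberrant_genes)
    res = sum(s for g, s in resistance.items() if g in aberrant_genes)
    return abs(sens) - abs(res)


def count_based_composite(sensitivity, resistance, aberrant_genes):
    """Number of aberrant sensitive genes minus number of aberrant resistant genes."""
    n_sens = sum(1 for g in sensitivity if g in aberrant_genes)
    n_res = sum(1 for g in resistance if g in aberrant_genes)
    return n_sens - n_res


# Made-up fold changes, for illustration only.
sensitivity_scores = assign_rank_scores(
    [ScreenHit("BRCA2", 8.0), ScreenHit("ATM", 4.0), ScreenHit("PTEN", 2.0)]
)
resistance_scores = assign_rank_scores(
    [ScreenHit("TP53BP1", 6.0), ScreenHit("PARP1", 5.0)]
)
patient_aberrant_genes = {"BRCA2", "PARP1"}

score = score_based_composite(
    sensitivity_scores, resistance_scores, patient_aberrant_genes
)
selected_for_treatment = score > 0  # threshold level of zero, as in the text
```

In this hypothetical case the score-based composite is above zero while the count-based composite is exactly zero, which illustrates that the two embodiments can lead to different selection outcomes for the same individual.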

Besides the above methods, the composite score can be calculated, and/or the composite score threshold level can be selected, using any methods known in the art. For example, see the response score or recombination proficiency score (RPS) in US20160369353; see also US20200254259 and US20180068083. The contents of each of these publications are incorporated herein by reference in their entirety.

In some embodiments, for a particular cancer type (e.g., colorectal cancer) and/or a particular anti-cancer drug (e.g., PARPi), parameter “m” is the total number of drug resistant genes and drug sensitive genes identified using any of the methods described herein; or is the total number of genes in the combination of drug resistant gene panel (i) AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, and drug sensitive gene panel (ii) ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1. In some embodiments, from a sample of an individual, one or more patient aberrations (e.g., mutation, or aberrant expression/activity/modification), such as one or more patient mutations (e.g., nonsynonymous, nonsense, missense, frameshift, insertion, deletion, stop-loss, stop-gain, mutation that results in mis-splicing, gene fusion, etc.), are identified in one or more patient genes that belong to the combination of drug resistant genes and drug sensitive genes identified using any of the methods described herein, or belong to the combination of panels (i) and (ii) above. In some embodiments, from a sample of an individual, no patient aberration (e.g., patient mutation) is identified in any patient gene that belongs to the combination of drug resistant genes and drug sensitive genes identified using any of the methods described herein, or belongs to the combination of panels (i) and (ii) above.
Patient gene (s) or patient aberrations (e.g., mutation, or aberrant expression/activity/modification) identified from a patient (e.g., patient sample, such as by NGS) that belong to drug resistant genes or drug sensitive genes identified using any of the methods described herein, or in the above (i) and (ii) panels of target genes, are hereinafter also referred to as “patient target gene (s) ” or “patient target aberration (s) ” (such as “patient target mutation (s) ” ) , respectively. Parameter “m” is an integer of at least 1, and is a constant for specific cancer type and specific anti-cancer drug.
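
As a non-limiting illustration of parameter “m” and of “patient target genes” for the colorectal cancer/PARPi panels (i) and (ii) above, the following Python sketch counts the genes in the two panels and sorts a patient's mutated genes into panel hits. The function name `patient_target_genes`, the case-insensitive matching of gene symbols, and the example input are assumptions made for this sketch; the gene lists are copied from panels (i) and (ii).

```python
# Panel (i): drug resistant genes, as listed in the text.
RESISTANT_PANEL = frozenset(g.upper() for g in """
AKT1 CDKN1A CKS1B CKS2 CTNNB1 DLG5 E2F3 E2F4 HDAC1 MAPK1
MYC RAC1 RAF1 RICTOR SMAD4 TP53 BRAF HSP90B1 PARP2 PARP1
PIK3CA EIF3A CCNA1 RBL1 ZMYND8 MED12 GCN1 Kras TP53BP1 CHD2
DOCK5 IGF1R ILK IRS1 RAPGEF1 EP300 TCF7L2 KMT2B CDKN2A CHEK1
CHEK2 RHEB SPTA1 PKMYT1 SIDT2 APC SETD2
""".split())

# Panel (ii): drug sensitive genes, as listed in the text.
SENSITIVE_PANEL = frozenset(g.upper() for g in """
ARID2 ATM BIRC6 BRCA1 BRCA2 CCNA2 CCND1 CDK2 FBXW7 HRAS
KAT2B NBN PBRM1 PTEN SKP2 SMAD7 TGFB2 TSC1 TSC2 ATR
RIF1 POLQ AXIN1 GSK3A GSK3B CHD7 SCAF4 FANCM NIPBL ATRX
STAG1 RAD51 RAD51B RAD51C RAD51D FANCL EXO1 DIDO1 LRBA FAM71A
HDAC2 PMS2 MSH6 MSH2 MLH1 WEE1
""".split())

# Parameter "m": total number of genes in the combination of panels (i) and (ii).
M = len(RESISTANT_PANEL) + len(SENSITIVE_PANEL)


def patient_target_genes(mutated_genes):
    """Return (sensitive hits, resistant hits) among a patient's mutated genes.

    Gene symbols are compared case-insensitively (an assumption of this sketch).
    Genes outside both panels are ignored.
    """
    normalized = {g.upper() for g in mutated_genes}
    sensitive_hits = sorted(normalized & SENSITIVE_PANEL)
    resistant_hits = sorted(normalized & RESISTANT_PANEL)
    return sensitive_hits, resistant_hits
```

For these two panels the sketch gives m = 93 (47 drug resistant genes plus 46 drug sensitive genes), and the two panels share no gene.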

In some embodiments, the composite score is calculated based on one or more patient-related parameters, such as i) the number of deleterious mutation (s) (e.g., nonsynonymous, nonsense, missense, frameshift, insertion, deletion, stop-loss, stop-gain, mutation that results in mis-splicing, gene fusion, etc. ) carried on each patient target gene identified (e.g., via NGS) from the patient (parameter “n” ) , ii) the estimated fraction of cells carrying a specific deleterious mutation in a specific patient target gene identified (e.g., via NGS) from the patient (parameter

) , iii) the log-scale (e.g., log2) fold change of expression level of a patient target gene in patient disease tissue vs. normal tissue (parameter “LFC” ) , etc. In some embodiments, the one or more patient-related parameters are derived based on data/information from patient sample, such as sequencing read counts. Parameter

denotes estimated fraction of cells carrying j ^th mutation in i ^th patient target gene identified from the patient.

“n” is an integer of at least 1, and is the total number of detected deleterious patient target mutations of the corresponding identified patient target gene. “j” is an integer, and 1 ≤ j ≤ n. “i” is an integer, and 0 ≤ i ≤ m. When i = 0, it indicates that from the sample of the individual, no deleterious mutation is identified in any patient gene that belongs to the combination of drug resistant genes and drug sensitive genes identified using any of the methods described herein, or belongs to the combination of gene panels (i) and (ii) above. In some embodiments, the fraction of cells carrying j^th mutation in i^th patient target gene is estimated based on the fraction of sequences comprising the j^th mutation among all sequences comprising a mutation in the i^th patient target gene identified from the patient sample. Parameter “LFC_i” denotes the log-scale (e.g., log2) fold change of expression level of i^th patient target gene in patient disease tissue vs. normal tissue. Expression level of a patient target gene can be measured using any known methods, such as RNA-seq, qPCR, mass spectrometry, western blot, FISH, immunofluorescence staining, etc.
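
The estimate of the fraction of cells carrying each mutation, as described above, can be sketched in Python as follows. The function name `mutation_cell_fractions`, the dictionary-based input, and the mutation labels are hypothetical; the calculation itself follows the text, i.e., reads comprising the j^th mutation divided by all reads comprising any mutation in the same patient target gene.

```python
def mutation_cell_fractions(mutant_read_counts):
    """Estimate, for one patient target gene, the fraction of cells per mutation.

    mutant_read_counts maps a mutation label to the number of sequencing reads
    comprising that mutation. Each fraction is that count divided by the total
    number of reads comprising any mutation in the gene.
    """
    total = sum(mutant_read_counts.values())
    if total == 0:
        return {}  # no mutated reads: no patient target mutation for this gene
    return {mut: count / total for mut, count in mutant_read_counts.items()}


# Hypothetical read counts for two mutations detected in one gene.
fractions = mutation_cell_fractions({"mut_1": 30, "mut_2": 10})
```

By construction the fractions for a given gene sum to 1 whenever at least one mutated read is present.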

In some embodiments, the composite score is calculated based on one or more gene-related parameters, such as i) the correlation (positive correlation or negative correlation) between a patient target gene and an anti-cancer drug treatment (e.g., at IC50) (parameter “r” ) , which is derived from machine learning (e.g., based on training models from public data on cell lines) , ii) the normalized weight of a patient target gene in response to an anti-cancer drug treatment (parameter

) , which is derived from machine learning (e.g., based on training models from public data on cell lines) , iii) the predicted impact of a deleterious mutation of a patient target gene (parameter “η” ; e.g., based on harmfulness prediction with public databases, such as aberrant gene or gene product activity) , iv) the ratio of net survival contribution of a patient target gene to total survival at a given time point according to the Kaplan-Meier survival curve (parameter

e.g., based on The Cancer Genome Atlas (TCGA) database and/or cBioPortal database) , v) the log-scale (e.g., log2) fold change of expression level of a patient target gene in disease tissue vs. normal tissue (parameter “LFC” ; e.g., based on patient databases, i.e., information collected from patients having the same cancer) , etc. In some embodiments, the one or more gene-related parameters are derived based on data in public or patient database (s) , for training the composite score model. Parameter “r _i” denotes the correlation (positive correlation or negative correlation) between i ^th patient target gene identified from the patient and an anti-cancer drug treatment (e.g., at IC50) , which is derived from machine learning. Parameter

denotes the normalized weight of i ^th patient target gene in response to an anti-cancer drug treatment (i.e., the contribution of the loss-of-function of i ^th patient target gene to the anti-cancer drug treatment) , which is derived from machine learning. Parameter “η _ij” denotes the predicted impact of the j ^th deleterious mutation of i ^th patient target gene (e.g., based on harmfulness prediction with public databases, or is a manually assigned constant) . Parameter

denotes the ratio of net survival contribution of i^th patient target gene to total survival at a given time point according to the Kaplan-Meier survival curve (e.g., based on TCGA and/or cBioPortal databases). Parameter “LFC_i” denotes the log-scale (e.g., log2) fold change of expression level of i^th patient target gene in disease tissue vs. normal tissue (e.g., based on a patient database, i.e., information collected from patients having the same cancer). “i” is an integer, and 0 ≤ i ≤ m. “j” is an integer, and 1 ≤ j ≤ n.

In some embodiments, the composite score is calculated based on one or more pathway-related parameters, such as i) the estimated weight of a patient target gene in pathway (s) and/or regulatory network (s) involving the patient target gene (parameter

e.g., based on public database (s) such as KEGG and InterProScan) , ii) the normalized weight of a patient target gene in anti-cancer drug-related pathway (s) (parameter “ψ” ; e.g., based on public database (s) ) , etc. In some embodiments, the one or more pathway-related parameters are derived based on data in public database (s) , for training the composite score model. Parameter

denotes the estimated weight of i ^th patient target gene in pathway (s) and/or regulatory network (s) involving i ^th patient target gene (e.g., based on public database (s) such as KEGG and InterProScan) . Parameter “ψ _i” denotes the normalized weight of i ^th patient target gene in anti-cancer drug-related pathway (s) , e.g., based on public database (s) . “i” is an integer, and 0 ≤ i ≤ m.

In some embodiments, the composite score is calculated based on one or more parameters selected from one or more of patient-related parameters, gene-related parameters, and pathway-related parameters described herein.

In some embodiments, the composite score is calculated using Formula I:

wherein

a, b, and c are constants for model tuning (e.g., constants derived from trained model for corresponding anti-cancer drug) , wherein -1 ≤ a ≤ 1, -1 ≤ b ≤ 1, and -1 ≤ c ≤ 1;

m is the total number of drug resistant genes and drug sensitive genes identified using any of the target gene identification methods described herein or is the total number of target genes in the combination of panels (i) and (ii) described above;

n is the number of deleterious mutation (s) detected on i ^th patient target gene in the patient;

is the estimated fraction of patient cells carrying j ^th deleterious mutation in i ^th patient target gene, wherein

r _i is the correlation (positive correlation or negative correlation) between i ^th patient target gene and the anti-cancer drug treatment (e.g., at IC50) ;

is the normalized weight of i ^th patient target gene in response to the anti-cancer drug treatment;

η _ij is the predicted impact of the j ^th deleterious mutation of i ^th patient target gene;

is the ratio of net survival contribution of i ^th patient target gene to total survival at a given time point according to the Kaplan-Meier survival curve;

LFC _i is the log-scale (e.g., log2) fold change of expression level of i ^th patient target gene in disease tissue vs. normal tissue;

is the estimated weight of i ^th patient target gene in pathway (s) and/or regulatory network (s) involving i ^th patient target gene;

ψ _i is the normalized weight of i ^th patient target gene in the anti-cancer drug-related pathway (s) ; wherein i and j are both integers, 0 ≤ i ≤ m, and 1 ≤ j ≤ n; and

Z (LFC _i) is the standard score ( “Z-score” ) of LFC _i:

wherein

is the median log-scale (e.g., log2) fold change of expression level of i ^th patient target gene in disease tissue vs. normal tissue (e.g., based on patient databases, i.e., information collected from patients having the same cancer) ; and

wherein σ _i is the standard deviation of log-scale (e.g., log2) fold change of expression level of i ^th patient target gene in disease tissue vs. normal tissue (e.g., based on patient databases, i.e., information collected from patients having the same cancer) .
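
For illustration, a standard score built from the quantities defined above (the patient's LFC_i, the median LFC of the same gene in a patient database, and the standard deviation σ_i) can be sketched in Python as follows. The form used here, (LFC_i minus median) divided by σ_i, is an assumption based on the usual definition of a standard score with the median as the center. The function name `lfc_z_score`, the use of the population standard deviation, and the cohort values are likewise assumptions of this sketch.

```python
import statistics


def lfc_z_score(patient_lfc, cohort_lfcs):
    """Standard score of a patient's LFC against a cohort of LFC values.

    Assumed form: (patient LFC - cohort median LFC) / cohort standard deviation.
    The population standard deviation is used here; the source does not say
    whether a population or sample estimate is intended.
    """
    center = statistics.median(cohort_lfcs)
    sigma = statistics.pstdev(cohort_lfcs)
    if sigma == 0:
        raise ValueError("cohort LFC values have zero spread; Z-score undefined")
    return (patient_lfc - center) / sigma


# Hypothetical cohort of log2 fold changes for one gene (median 0, sigma 1).
z = lfc_z_score(-2.0, [-1.0, 1.0, -1.0, 1.0])
```

A negative value indicates that the gene is expressed at a lower level in the patient's disease tissue, relative to normal tissue, than is typical for patients having the same cancer.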

In some embodiments, the composite score threshold level is 0. In some embodiments, if the composite score of the patient according to Formula I is above 0, the patient is suitable for (i.e., may benefit from) the anti-cancer drug treatment. In some embodiments, if the composite score of the patient according to Formula I is greater than or equal to a cutoff of at least 0.1 (e.g., 0.3), the patient is selected for or is recommended for the anti-cancer drug treatment. In some embodiments, if the composite score of the patient according to Formula I is more than 0 but less than 0.1, the patient is suitable for the anti-cancer drug treatment, but should be further evaluated using other method(s) (e.g., drug dosage test, cancer genetic testing (e.g., to look for additional synergistic mutations that may contribute to the anti-cancer drug treatment, or to verify the primary cancer type), etc.) or based on other information (e.g., the patient’s clinical record or known drug resistance, etc.) to determine whether the patient should be selected or recommended for the anti-cancer drug treatment. In some embodiments, if the composite score of the patient according to Formula I is below or equal to 0, the patient is not suitable for (i.e., may not benefit from) or is excluded from the anti-cancer drug treatment. In some embodiments, further evaluation using other method(s) (e.g., drug dosage test, cancer genetic testing (e.g., to look for additional synergistic mutations that may contribute to the anti-cancer drug treatment, or to verify the primary cancer type), etc.) or based on other information (e.g., the patient’s clinical record or known drug resistance, etc.) should be conducted if the composite score of the patient according to Formula I is equal to 0, or very close to 0 (e.g., -0.1 to 0), before completely ruling out the patient from receiving the anti-cancer drug treatment.
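
The score ranges described above can be summarized in a short Python sketch. The function name `interpret_composite_score`, the returned labels, and the default cutoffs (0.1 for recommendation and -0.1 for the "very close to 0" band, both taken from the examples in the text) are illustrative assumptions; the sketch takes an already computed composite score as input and does not implement Formula I.

```python
def interpret_composite_score(score, recommend_cutoff=0.1, near_zero_floor=-0.1):
    """Map a composite score to one of the outcomes described in the text.

    score >= recommend_cutoff      -> selected/recommended for treatment
    0 < score < recommend_cutoff   -> suitable, but evaluate further
    near_zero_floor <= score <= 0  -> not suitable, evaluate before ruling out
    score < near_zero_floor        -> not suitable / excluded
    """
    if score >= recommend_cutoff:
        return "recommended"
    if score > 0:
        return "suitable, further evaluation"
    if score >= near_zero_floor:
        return "not suitable, further evaluation before exclusion"
    return "excluded"
```

For example, under these assumed cutoffs a score of 0.3 is reported as recommended, a score of 0.05 as suitable pending further evaluation, and a score of -0.5 as excluded.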

Thus in some embodiments, there is provided a method of treating a cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on: i) one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes, and ii) one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes, wherein the drug sensitive genes and drug resistant genes are identified using any of the target gene identification methods described herein, and wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) is above a composite score threshold level; wherein the composite score is obtained by subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes) , and the composite score threshold level is zero. 
In some embodiments, there is provided a method of treating a cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on: i) one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes, and ii) one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes, wherein the drug sensitive genes and drug resistant genes are identified using any of the target gene identification methods described herein, and wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) according to Formula I is above zero (e.g., above or equal to at least 0.1 (e.g., 0.3) ) . In some embodiments, the method further comprises detecting the one or more drug sensitive aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) and the one or more drug resistant aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) in a sample from the individual.

In some embodiments, there is provided a method of treating a colorectal cancer in an individual (e.g., human) , comprising administering to the individual an effective amount of a PARPi, wherein the individual is selected based on: i) one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1, and ii) one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, and wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) is above a composite score threshold level. In some embodiments, the method further comprises detecting the one or more drug sensitive aberrations (e.g., drug sensitive mutations) and the one or more drug resistant aberrations (e.g., drug resistant mutations) in a sample from the individual. In some embodiments, the method further comprises obtaining a composite score for the individual. 
In some embodiments, the composite score is obtained by subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes) , wherein the individual is selected for treatment if the composite score is above zero. In some embodiments, the composite score is obtained according to Formula I, wherein the individual is selected for treatment if the composite score is above zero (e.g., above or equal to at least 0.1 (e.g., 0.3) ) .

In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a cancer who may benefit from a treatment comprising administration of an anti-cancer drug, the method comprising: detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes identified using any of the target gene identification methods described herein, and one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes identified using any of the target gene identification methods described herein, wherein a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) above a composite score threshold level identifies the individual as one who may benefit from the treatment. In some embodiments, provided herein is a method of identifying an individual (e.g., human) having a colorectal cancer who may benefit from a treatment comprising administration of a PARPi, the method comprising: detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1, and one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, wherein a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) above a composite score threshold level identifies the individual as one who may benefit from the treatment. In some embodiments, the method further comprises detecting the one or more drug sensitive aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) and the one or more drug resistant aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) in a sample from the individual. In some embodiments, the method further comprises obtaining a composite score for the individual. In some embodiments, the composite score is obtained by subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes), wherein a composite score above zero identifies the individual as one who may benefit from the treatment. In some embodiments, the composite score is obtained according to Formula I, wherein a composite score above zero (e.g., above or equal to at least 0.1 (e.g., 0.3)) identifies the individual as one who may benefit from the treatment.

In some embodiments, provided herein is a method of selecting a treatment for an individual (e.g., human) having a cancer, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes identified using any of the target gene identification methods described herein, and one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes identified using any of the target gene identification methods described herein, wherein a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) in the sample above a composite score threshold level identifies a treatment comprising administration of an anti-cancer drug as a suitable treatment for the individual. In some embodiments, provided herein is a method of selecting a treatment for an individual (e.g., human) having a colorectal cancer, the method comprising detecting in a sample from the individual one or more drug sensitive aberrations (e.g., drug sensitive mutations) in one or a plurality of drug sensitive genes selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1, and one or more drug resistant aberrations (e.g., drug resistant mutations) in one or a plurality of drug resistant genes selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2, wherein a composite score of the drug sensitive aberrations (e.g., drug sensitive mutations) and the drug resistant aberrations (e.g., drug resistant mutations) in the sample above a composite score threshold level identifies a treatment comprising administration of a PARPi as a suitable treatment for the individual. In some embodiments, the method further comprises detecting the one or more drug sensitive aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) and the one or more drug resistant aberrations (e.g., mutation, aberrant expression, aberrant activity, aberrant modification) in a sample from the individual. In some embodiments, the method further comprises obtaining a composite score for the individual. In some embodiments, the composite score is obtained by subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes), wherein a composite score above zero identifies the treatment comprising administration of the PARPi as a suitable treatment for the individual. In some embodiments, the composite score is obtained according to Formula I, wherein a composite score above zero (e.g., above or equal to at least 0.1 (e.g., 0.3)) identifies the treatment comprising administration of the PARPi as a suitable treatment for the individual.

IV. Modified cancer cells and methods of generation

One aspect of the present invention provides methods of generating modified cancer cells, such as modified cancer cells resistant to an anti-cancer drug, or sensitive to an anti-cancer drug. In some embodiments, the method of generating a modified cancer cell comprises inactivating one or more target genes identified by any of the target gene identification methods described herein in the cancer cell. Further provided are modified cancer cells generated by any of the methods described herein.

In some embodiments, the method of generating a modified cancer cell comprises creating one or more mutations (e.g., inactivating mutations) at one or more target genes identified by any of the target gene identification methods described herein. In some embodiments, the method comprises contacting an initial population of cancer cells with a mutagenic agent, and selecting modified cancer cells comprising one or more mutations (e.g., inactivating mutations) at one or more target genes identified herein. Methods of detecting such mutations are well known in the art, such as by PCR. In some embodiments, the method comprises creating one or more mutations (e.g., inactivating mutations) at one or more target genes identified herein in a cancer cell by gene editing, such as any gene editing methods known in the art or described herein, for example, non-homologous end joining (NHEJ)- or homologous recombination-mediated gene disruption, or ZFN-, TALEN-, or CRISPR/Cas-mediated gene disruption. In some embodiments, the method of generating a modified cancer cell comprises introducing an sgRNA construct into a host cancer cell, wherein the sgRNA construct comprises or encodes an sgRNA (e.g., an sgRNA, or a vector (e.g., viral vector such as lentiviral vector) carrying a nucleic acid encoding the sgRNA), wherein the sgRNA comprises a guide sequence that is complementary (e.g., at least about any of 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% complementary) to a target site in a target gene identified herein. In some embodiments, the method further comprises introducing a vector (e.g., viral vector such as lentiviral vector) carrying a nucleic acid encoding a Cas protein (e.g., Cas9), or a Cas (e.g., Cas9) mRNA, into the host cancer cell or the host cancer cell comprising said sgRNA construct. In some embodiments, the host cancer cell comprises a Cas component.
In some embodiments, the sgRNA construct against the target gene, and/or the Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., vector, or mRNA) , are introduced into the host cancer cell simultaneously. In some embodiments, the nucleic acid encoding the target gene sgRNA, and/or the nucleic acid encoding the Cas protein, are on the same vector, either under the same promoter control, or under separate promoter controls. In some embodiments, the nucleic acid encoding the target gene sgRNA, and/or the nucleic acid encoding the Cas protein are connected by one or more IRES linking sequences and under the same promoter control. In some embodiments, the nucleic acid encoding the target gene sgRNA, and/or the nucleic acid encoding the Cas protein, are on different vectors. In some embodiments, the sgRNA construct against the target gene, and/or the Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein (e.g., vector, or mRNA) , are introduced into the host cancer cell sequentially.

In some embodiments, when a population of host cancer cells (or initial population of cancer cells) are used for the production of modified cancer cells described herein, the methods also include one or more isolation and/or enrichment steps, for example, isolating and/or enriching cancer cells that comprise one or more mutations (e.g., inactivating mutations) in the target gene, the target gene sgRNA construct, or the Cas component, from the population of cancer cells contacted with any of the modifying agents described herein. Such isolation and/or enrichment steps can be performed using any known techniques in the art and described herein, such as magnetic-activated cell sorting (MACS) . Also see methods described in “optional enrichment step” subsections above.

In some embodiments, the target gene sgRNA construct, and/or the Cas component, are introduced into the host cancer cells by transducing/transfecting the nucleic acid (DNA or RNA) or vector encoding thereof (e.g., non-viral vector, or viral vector such as lentiviral vector), or a virus (e.g., lentivirus) comprising a nucleic acid encoding thereof. In some embodiments, the Cas component (e.g., Cas9 protein) is introduced into the host cancer cells by inserting proteins into the cell membrane while passing cells through a microfluidic system, such as CELL (see, for example, U.S. Patent Application Publication No. 20140287509).

Methods of introducing vectors (e.g., viral vectors) or isolated nucleic acids into a mammalian cell are known in the art. The nucleic acids or vectors described herein can be transferred into a cancer cell by physical, chemical, or biological methods.

Physical methods for introducing a vector (e.g., viral vector) into a cancer cell include calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, and the like. Methods for producing cells comprising vectors and/or exogenous nucleic acids are well-known in the art. See, for example, Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. In some embodiments, the vector (e.g., viral vector) is introduced into the cancer cell by electroporation.

Biological methods for introducing a vector into a cancer cell include the use of DNA and RNA vectors. Viral vectors have become the most widely used method for inserting genes into mammalian, e.g., human cells.

Chemical means for introducing a vector (e.g., viral vector) into a cancer cell include colloidal dispersion systems, such as macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. An exemplary colloidal system for use as a delivery vehicle in vitro is a liposome (e.g., an artificial membrane vesicle) .

In some embodiments, RNA molecules (e.g., sgRNA, or mRNA encoding Cas9) may be prepared by a conventional method (e.g., in vitro transcription) and then introduced into the cancer cell via known methods such as mRNA electroporation. See, e.g., Rabinovich et al., Human Gene Therapy 17: 1027-1035.

In some embodiments, the viral vectors (lentiviral vector) or viruses (e.g., lentiviruses) comprising the nucleic acid encoding any of the target gene sgRNAs, target gene sgRNAs ^iBAR, and/or Cas protein described herein are contacted with the host cancer cell (or initial cancer cell population) , e.g., at an MOI of at least about 1, such as at least about any of 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 8, 9, or 10. In some embodiments, the viral vectors (lentiviral vector) or viruses (e.g., lentiviruses) comprising the nucleic acid encoding any of the target gene sgRNAs, target gene sgRNAs ^iBAR, and/or Cas protein described herein are contacted with the host cancer cell (or initial cancer cell population) at an MOI of about 3.

In some embodiments, the transduced/transfected cancer cell is propagated ex vivo after introduction of the vector or isolated nucleic acid. In some embodiments, the transduced/transfected cancer cell is cultured to propagate for at least about any of 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 10 days, 12 days, or 14 days. In some embodiments, the transduced/transfected cancer cell is further evaluated or screened to select desired modified cancer cells described herein.

Reporter genes may be used for identifying potentially transfected/transduced cells and for evaluating the functionality of regulatory sequences. In general, a reporter gene is a gene that is not present in or expressed by the recipient organism or tissue and that encodes a polypeptide whose expression is manifested by some easily detectable property, e.g., enzymatic activity. Expression of the reporter gene is assayed at a suitable time after the DNA/RNA has been introduced into the recipient cells. Suitable reporter genes may include genes encoding luciferase, beta-galactosidase, chloramphenicol acetyl transferase, secreted alkaline phosphatase, or the green fluorescent protein (GFP) gene (e.g., Ui-Tei et al. FEBS Letters 479: 79-82 (2000)). Suitable expression systems are well known and may be prepared using known techniques or obtained commercially. Antibiotic selection markers can also be used to identify potentially transfected/transduced cells.

Other methods to confirm the presence of any of the nucleic acids described herein (e.g., sgRNA construct) or the presence of a mutation (e.g., inactivating mutation) in a target gene in a modified cancer cell, include, for example, molecular biological assays well known to those of skill in the art, such as Southern and Northern blotting, RT-PCR, PCR, DNA-seq, or RNA-seq; biochemical assays, such as detecting the presence or absence of a particular peptide, e.g., by immunological methods (such as ELISAs and Western blots) , Fluorescence-activated cell sorting (FACS) , or Magnetic-activated cell sorting (MACS) .

In some embodiments, there is provided a modified colorectal cancer cell comprising one or more mutations (e.g., inactivating mutations such as knock-out) in one or more target genes, wherein the target gene is selected from the group consisting of ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1. In some embodiments, there is provided a modified colorectal cancer cell comprising one or more mutations (e.g., inactivating mutations such as knock-out) in one or more target genes, wherein the target gene is selected from the group consisting of AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, Kras, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.

In some embodiments, there is provided a method of screening for an anti-cancer drug capable of treating a cancer (e.g., colorectal cancer) in an individual (e.g., human) , wherein the cancer comprises one or more drug resistant mutations in one or more drug resistant genes identified using any of the target gene identification methods described herein, the method comprising: a) providing a cancer cell library comprising the one or more drug resistant mutations in the one or more drug resistant genes, b) separately contacting the cancer cell library with one or a plurality of candidate anti-cancer drugs, wherein the candidate anti-cancer drug capable of inhibiting the growth of the cancer cell library above a certain threshold (e.g., inhibit at least about 10%, 20%, 30%, 40%, 50%, or more growth) is identified as the anti-cancer drug capable of treating the cancer in the individual.
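As a rough illustration of the thresholding step described above, the following sketch flags candidate drugs whose growth inhibition of the cancer cell library meets a chosen cut-off. The drug names, the growth readouts, and the definition of inhibition relative to an untreated control are hypothetical assumptions made for the example, not part of the disclosed method.

```python
def growth_inhibition(treated_growth, control_growth):
    """Fractional growth inhibition relative to an untreated control."""
    return 1.0 - treated_growth / control_growth


def passing_candidates(growth_by_drug, control_growth, threshold=0.10):
    """Return drugs whose inhibition is at least the threshold fraction."""
    return [
        drug
        for drug, growth in growth_by_drug.items()
        if growth_inhibition(growth, control_growth) >= threshold
    ]


# Hypothetical relative growth of the library under each candidate drug.
growth = {"drug_A": 0.95, "drug_B": 0.60, "drug_C": 0.30}
hits = passing_candidates(growth, control_growth=1.0, threshold=0.30)
print(hits)
```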

V. Kits and articles of manufacture

The present application further provides kits and articles of manufacture for use in any embodiment of the methods of identifying a target gene in a cancer cell described herein, such as using the sgRNA libraries or sgRNA ^iBAR libraries described herein. Also provided are kits and articles of manufacture for generating modified cancer cells sensitive or resistant to an anti-cancer drug.

In some embodiments, there is provided a kit for identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising any of the sgRNA libraries or sgRNA ^iBAR libraries described herein. In some embodiments, the kit further comprises a Cas protein or a nucleic acid encoding the Cas protein (e.g., Cas9) . In some embodiments, the kit further comprises one or more positive and/or negative control sets of sgRNA ^iBAR constructs, or one or more positive and/or negative control of sgRNA constructs. In some embodiments, the kit further comprises the anti-cancer drug, and/or the initial population of cancer cells, or cancer cells comprising the Cas component. In some embodiments, the kit further comprises data analysis software. In some embodiments, the kit comprises instructions for carrying out any one of the methods described herein.

In some embodiments, there is provided a kit for identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising any of the cancer cell libraries described herein, such as cancer cell libraries comprising mutations (e.g., inactivating mutations) in some or all hit genes in the genome (or in cancer-related genes) , or cancer cell libraries comprising any of the sgRNA libraries or sgRNA ^iBAR libraries described herein. In some embodiments, the kit further comprises a Cas protein or a nucleic acid encoding the Cas protein. In some embodiments, the kit further comprises the anti-cancer drug. In some embodiments, the kit further comprises a control cancer cell library, such as having one or more mutations (e.g., inactivating mutations) at non-gene region in the genome, or comprising one or more endogenous cancer mutations, or comprising one or more positive and/or negative control of sgRNA constructs or one or more positive and/or negative control sets of sgRNA ^iBAR constructs. In some embodiments, the kit further comprises data analysis software. In some embodiments, the kit comprises instructions for carrying out any one of the methods described herein.

The kit may contain additional components, such as containers, reagents, culturing media, primers, buffers, enzymes, and the like to facilitate execution of any one of the screening methods described herein. In some embodiments, the kit comprises reagents, buffers and vectors for introducing the sgRNA library or sgRNA ^iBAR library and the Cas protein or nucleic acid encoding the Cas protein to the cancer cell. In some embodiments, the kit comprises primers, reagents and enzymes (e.g., polymerase) for preparing a sequencing library of sequences comprising hit gene mutations (e.g., inactivating mutations) , sgRNA sequences, or sgRNA ^iBAR sequences extracted from the post-treatment cancer cell population.

The kits of the present application are in suitable packaging. Suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging (e.g., sealed Mylar or plastic bags) , and the like. Kits may optionally provide additional components such as buffers and interpretative information. The present application thus also provides articles of manufacture, which include vials (such as sealed vials) , bottles, jars, flexible packaging, and the like.

The article of manufacture can comprise a container and a label or package insert on or associated with the container. Suitable containers include, for example, bottles, vials, syringes, etc. The containers may be formed from a variety of materials such as glass or plastic. Generally, the container holds a composition (e.g., modified cancer cells sensitive or resistant to an anti-cancer drug), and may have a sterile access port. Package insert refers to instructions customarily included in commercial packages that contain information about the instructions and/or warnings concerning the use of such products. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters.

EXAMPLES

The examples and exemplary embodiments below are intended to be purely exemplary of the invention and should therefore not be considered to limit the invention in any way. The following examples and detailed description are offered by way of illustration and not by way of limitation.

Example 1. Identification of drug sensitive genes and drug resistant genes in cancer cells

This example provides exemplary methods for identifying drug sensitive genes and/or drug resistant genes. Briefly, a cancer cell library carrying sgRNA ^iBAR targeting cancer-related genes was constructed for Cas9-mediated gene knock-out (KO) . By examining anti-cancer drug (e.g., PARPi) killing efficacy of the Cas9 ⁺ sgRNA ^iBAR cancer cell library constructed, genes conferring resistant phenotype or sensitive phenotype to anti-cancer drug (e.g., PARPi) killing after KO can be identified. FIGs. 1-2 show the exemplary workflow.

1. Design and construction of sgRNA ^iBAR library

Based on public databases, genes with DNA mutation frequency ≥5% and RNA expression level up- or down-regulated by more than 2-fold in patients with stage III and IV colorectal cancer (expressed in cell, or on cell surface) were selected as library genes for further sgRNA ^iBAR design (total 1323 genes).

The sgRNA ^iBAR library was designed and constructed similarly as described in WO2020125762 and Zhu et al. (“Guide RNAs with embedded barcodes boost CRISPR-pooled screens,” Genome Biol. 2019; 20: 20), the contents of each of which are incorporated herein by reference in their entirety. Briefly, the 1323 genes selected above were retrieved from the UCSC human genome. sgRNAs targeting each gene were designed using the DeepRank algorithm (see Zhu et al.). Each gene had three different targeting sgRNAs, and four 6-bp iBARs (iBAR ₆s) were randomly assigned to each sgRNA (“sgRNA ^iBAR”). The internal barcode sequence was designed to be placed in the tetra loop of the gRNA scaffold outside of the Cas9-sgRNA ribonucleoprotein complex, which did not affect the activity of its upstream guide sequence. In addition, 500 control sgRNAs not targeting any human genes were designed as negative controls, and four iBAR ₆s were randomly assigned to each control sgRNA (“control sgRNA ^iBAR”). The designed CRISPR sgRNA ^iBAR library therefore included a total of 17876 sgRNAs ^iBAR (target and control).
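The library size stated above follows directly from the design parameters. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
GENES = 1323            # library genes selected from public databases
SGRNAS_PER_GENE = 3     # different targeting sgRNAs per gene
IBARS_PER_SGRNA = 4     # 6-bp internal barcodes assigned to each sgRNA
CONTROL_SGRNAS = 500    # non-targeting control sgRNAs

targeting = GENES * SGRNAS_PER_GENE * IBARS_PER_SGRNA
controls = CONTROL_SGRNAS * IBARS_PER_SGRNA
total = targeting + controls

print(targeting, controls, total)  # 15876 2000 17876
```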

DNA oligonucleotides encoding the sgRNAs ^iBAR were designed and synthesized (by Twist Bioscience), then PCR amplified. PCR products were purified with a PCR purification kit, then cloned via Golden Gate cloning into a lentiviral sgRNA ^iBAR-expressing backbone modified in house based on pLenti-sgRNA-Lib (addgene #53121) to obtain sgRNA ^iBAR plasmids, which encode 15876 sgRNAs ^iBAR covering 1323 human genes (3 sets of sgRNA ^iBAR for each gene targeting 3 different target sites, each set of sgRNA ^iBAR containing 4 sgRNAs ^iBAR), and 2000 control sgRNAs ^iBAR targeting 500 non-gene regions (1 set of sgRNA ^iBAR for each non-gene region, each set of sgRNA ^iBAR containing 4 sgRNAs ^iBAR).

In order to ensure the abundance of sgRNAs ^iBAR in the cancer cell library (at least 1000-fold coverage for each sgRNA ^iBAR), 10 electroporation reactions were performed using the sgRNA ^iBAR plasmids obtained above. For each electroporation reaction, 1 μL sgRNA ^iBAR plasmids were added into a sterile 1.5 mL Eppendorf tube, 50 μL competent cells (E. coli) were further added to the tube and swirled, then electroporation was conducted. 950 μL Super Optimal Broth (SOC) medium without antibiotics was immediately added to each reaction tube, gently pipetted to mix, then incubated in a shaker at 37℃ and 225 rpm for 1 hour. The resulting bacteria were transferred to 1 L LB liquid medium supplemented with Ampicillin and cultured overnight in a shaker at 37℃ and 225 rpm. The next day, plasmid extraction was performed on the obtained bacteria using Plasmid Purification Kit (QIAGEN, #12391).

sgRNA ^iBAR library lentiviruses were obtained using a standard protocol. Briefly, 1×10 ⁷ 293T cells were placed in a 150 mm cell culture dish, 20 mL cell culture medium was added, and the 293T cells were cultured overnight in a 37℃, 5% CO ₂ incubator. The next day, the culture medium was discarded and 10 mL fresh serum-free medium was added to the 293T cells. The transfection complex was prepared using serum-free medium (4 mL), sgRNA ^iBAR library plasmids obtained above (20 μg), pCMVR8.74 plasmid (20 μg), and pCMV-VSV-G plasmid (2 μg); after mixing, 105 μL PEI was added; after mixing again, the transfection complex was left to stand for 15 minutes at room temperature. The transfection complex was then added to the 293T cells in 10 mL fresh serum-free medium and incubated at 37℃, 5% CO ₂ for 6 hours. The cell medium was then discarded, 20 mL fresh complete medium was added to the 293T cells, and the cells were incubated at 37℃, 5% CO ₂. Three days later, the culture medium was collected and centrifuged at 1000 rpm, 4℃. The supernatant containing sgRNA ^iBAR library lentiviruses was collected, the viral titer was measured, and the supernatant was aliquoted for future use.

2. Construction of Cas9 ⁺ sgRNA ^iBAR cancer cell library

HCT116 (human colon cancer cell line) and SW480 (human colorectal adenocarcinoma cell line) were chosen for Cas9 ⁺ sgRNA ^iBAR cancer cell library construction and drug treatment.

2×10 ⁵ cancer cells from each cell line were seeded in a 6-well plate and cultured in a 37℃, 5% CO ₂ incubator. After 24 hours, 100 μL Cas9 packaged lentivirus was added into the cell medium, and the cancer cells were cultured in a 37℃, 5% CO ₂ incubator. After 24 hours, the medium was discarded, and fresh complete medium was added to the cancer cells. The cancer cells were allowed to grow for 7 days in a 37℃, 5% CO ₂ incubator, then sorted with FACS using the mCherry marker (carried on the Cas9-lentiviral vector). The sorted cancer cells with mCherry fluorescence were Cas9 expressing (Cas9 ⁺) cells, and were expanded for Cas9 ⁺ sgRNA ^iBAR library construction.

In order to ensure at least 1000-fold sgRNA ^iBAR coverage in the Cas9 ⁺ sgRNA ^iBAR cancer cell library, the sgRNA ^iBAR library lentivirus obtained above was added to 2×10 ⁷ Cas9 ⁺ cancer cells in medium (no antibiotics) at an MOI of 3 and gently mixed. The Cas9 ⁺ cancer cells were cultured for 24 hours in a 37℃, 5% CO ₂ incubator for infection. The next day, the medium was discarded, fresh complete medium was added to the Cas9 ⁺ cancer cells, and the cells were cultured in a 37℃, 5% CO ₂ incubator. The Cas9 ⁺ cancer cells were passaged every 3 days in fresh complete medium supplemented with Puromycin. Cas9 ⁺ cancer cells not successfully transduced with the sgRNA ^iBAR lentivirus would die. After two consecutive passages, the sgRNA ^iBAR cancer cell library was obtained (hereinafter also referred to as “Cas9 ⁺ sgRNA ^iBAR HCT116 library” and “Cas9 ⁺ sgRNA ^iBAR SW480 library”, respectively).
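The relationship between cell number and library coverage can be checked with a short calculation. This is a simplified sketch: it divides the number of cells by the library size of 17876 sgRNAs ^iBAR and deliberately ignores infection efficiency, MOI, and uneven representation of constructs, all of which affect coverage in practice. The function names are illustrative.

```python
LIBRARY_SIZE = 17876  # total sgRNAs^iBAR (target and control) in the library


def fold_coverage(n_cells, library_size=LIBRARY_SIZE):
    """Average number of cells per sgRNA^iBAR, assuming even representation."""
    return n_cells / library_size


def min_cells(target_fold, library_size=LIBRARY_SIZE):
    """Smallest cell number giving the requested average fold coverage."""
    return target_fold * library_size


print(min_cells(1000))                      # 17876000
print(round(fold_coverage(2 * 10**7), 1))   # roughly 1119-fold for 2x10^7 cells
```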

3. Screening of Cas9 ⁺ sgRNA ^iBAR cancer cell library treated with anti-cancer drug

Before treating the Cas9 ⁺ sgRNA ^iBAR cancer cell library with anti-cancer drugs (e.g., PARP inhibitors; PARPi) , drug toxicity curve was measured for each cancer cell line.

2000 HCT116 or SW480 cells were added per well in a 96-well plate, 100 μL medium was added per well, and the plate was incubated in a 37℃, 5% CO ₂ cell incubator. The next day, anti-cancer drug (e.g., PARPi) of various concentrations was added to each well, with 3 replicates for each concentration. The final drug concentrations were 33 μM, 11 μM, 3.7 μM, 1.23 μM, 0.41 μM, 0.14 μM, 0.05 μM, and 0.02 μM. After three doubling times in the presence of the anti-cancer drug, Luminescent Cell Viability Assay (ATP assay) was conducted to obtain the drug toxicity curve.
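The eight concentrations listed above are consistent with an approximately 3-fold serial dilution starting at 33 μM. The sketch below generates such a series; the 3-fold factor is inferred from the listed values rather than stated in the protocol, and the computed values agree with the listed ones only up to rounding.

```python
def serial_dilution(top_concentration, factor, n_points):
    """Concentrations obtained by repeatedly diluting by a fixed factor."""
    return [top_concentration / factor**i for i in range(n_points)]


# Assumed parameters: 33 uM top concentration, 3-fold steps, 8 points.
series = serial_dilution(33.0, 3.0, 8)
for concentration in series:
    print(f"{concentration:.3f} uM")
```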

Based on the obtained drug toxicity curve for each cancer cell line, drug concentrations corresponding to cell growth inhibition of IC ₅₀-IC ₇₀ were chosen for Cas9 ⁺ sgRNA ^iBAR cancer cell library screening. For example, the concentration of PARPi was 5 μM for HCT116 and 10 μM for SW480. 1×10 ⁶ Cas9 ⁺ sgRNA ^iBAR cancer cells were placed in a 150 mm cell culture dish and cultured in a 37℃, 5% CO ₂ cell incubator. The next day, the Cas9 ⁺ sgRNA ^iBAR cancer cells were treated with the anti-cancer drug (e.g., PARPi; test group) or DMSO (control group). Two biological replicates were set up for each group. Fresh cell medium (with drug or DMSO added) was changed every three days. The drug or control treatment continued, and cells were collected after treatment for 9-10 doubling times or for 15-16 doubling times (see FIG. 2). For adherent cells, dead cells would be floating in the culture medium; hence, adherent cells harvested by trypsinization were alive (or mostly alive) cells. During the entire screening process and cell collection process, the cell number was always at least about 1000-fold of the size of the sgRNA ^iBAR library for each replicate, i.e., at least about 1000 cells for each sgRNA ^iBAR.
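One simple way to read a screening concentration off a toxicity curve is to keep the tested concentrations whose growth inhibition falls in the 50% to 70% window. The sketch below assumes viability is expressed as a fraction of the untreated control; the viability values are hypothetical and are not data from this example.

```python
def concentrations_in_window(viability_by_conc, low=0.50, high=0.70):
    """Tested concentrations whose inhibition (1 - viability) lies in [low, high]."""
    return [
        conc
        for conc, viability in sorted(viability_by_conc.items())
        if low <= 1.0 - viability <= high
    ]


# Hypothetical viability (fraction of untreated control) per concentration in uM.
viability = {0.41: 0.95, 1.23: 0.85, 3.7: 0.45, 11.0: 0.35, 33.0: 0.10}
print(concentrations_in_window(viability))
```

A concentration between two tested points can then be chosen or confirmed experimentally.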

4. Identification and analysis of target genes

Genomic DNA was extracted from the post-treatment cancer cells collected above (mostly alive Cas9 ⁺ sgRNA ^iBAR cancer cells). For each cancer cell type, there was a “9-10 PDT test group,” a “15-16 PDT test group,” a “9-10 PDT control group,” and a “15-16 PDT control group”, with two biological replicates for each group. For each anti-cancer drug (e.g., PARPi), two different cell line libraries were tested (e.g., Cas9 ⁺ sgRNA ^iBAR HCT116 library and Cas9 ⁺ sgRNA ^iBAR SW480 library). sgRNA ^iBAR encoding fragments were PCR amplified from the extracted genome, purified, and prepared for NGS sequencing. The MAGeCK ^iBAR algorithm was used for sequencing data analysis (see Zhu et al., “Guide RNAs with embedded barcodes boost CRISPR-pooled screens,” Genome Biol. 2019; 20: 20; the content of which is incorporated herein by reference in its entirety), which contains three main parts: analysis preparation, statistical tests, and rank aggregation. Briefly, each sgRNA ^iBAR targeted gene was scored and ranked based on the degree of enrichment or depletion of each gene between the test group and the control group, in order to determine if such gene was a candidate gene with high confidence. See FIG. 3 for the target gene identification workflow. sgRNA ^iBAR encoding fragments would be depleted compared to control (negative screen) for candidate genes whose inactivation results in a sensitive phenotype to anti-cancer drug killing, while sgRNA ^iBAR encoding fragments would be enriched compared to control (positive screen) for candidate genes whose inactivation results in a resistant phenotype to anti-cancer drug killing. These top ranking candidates were found to be involved in cell proliferation, cell death, or cell cycle regulation.
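To make the notion of enrichment and depletion concrete, the sketch below computes a log2 fold change per sgRNA ^iBAR between test and control read counts and summarizes it per gene by the median. This is a simplified stand-in and not the MAGeCK ^iBAR algorithm, which additionally performs statistical tests and rank aggregation. The construct names, gene names, and read counts are hypothetical, and counts are assumed to be already normalized for sequencing depth.

```python
import math
from statistics import median


def log2_fold_change(test_count, control_count, pseudocount=1.0):
    """log2 ratio of test to control counts; the pseudocount avoids log(0)."""
    return math.log2((test_count + pseudocount) / (control_count + pseudocount))


def gene_scores(test_counts, control_counts, gene_of):
    """Median log2 fold change across all constructs targeting each gene."""
    per_gene = {}
    for construct, test in test_counts.items():
        lfc = log2_fold_change(test, control_counts[construct])
        per_gene.setdefault(gene_of[construct], []).append(lfc)
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical depth-normalized read counts for three constructs.
gene_of = {"sgA_1": "GENE_A", "sgA_2": "GENE_A", "sgB_1": "GENE_B"}
test = {"sgA_1": 99, "sgA_2": 49, "sgB_1": 399}
control = {"sgA_1": 399, "sgA_2": 199, "sgB_1": 99}

scores = gene_scores(test, control, gene_of)
print(scores)  # negative = depleted in test group, positive = enriched
```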

5. Results

Compared to control, candidate genes whose sgRNA ^iBAR encoding fragments are depleted in the harvested alive cells in either “9-10 PDT test group” or “15-16 PDT test group” and in either cell line library with FDR≤0.1 were categorized as drug sensitive genes whose inactivation makes the cancer cells sensitive to the anti-cancer drug. Exemplary drug sensitive genes (e.g., of PARPi) include, but are not limited to, ARID2, ATM, BIRC6, BRCA1, BRCA2, CCNA2, CCND1, CDK2, FBXW7, HRAS, KAT2B, NBN, PBRM1, PTEN, SKP2, SMAD7, TGFB2, TSC1, TSC2, ATR, RIF1, POLQ, AXIN1, GSK3A, GSK3B, CHD7, SCAF4, FANCM, NIPBL, ATRX, STAG1, RAD51, RAD51B, RAD51C, RAD51D, FANCL, EXO1, DIDO1, LRBA, FAM71A, HDAC2, PMS2, MSH6, MSH2, MLH1, and WEE1.

Candidate genes whose sgRNA ^iBAR encoding fragments were enriched, compared to control, in the harvested live cells in either the “9-10 PDT test group” or the “15-16 PDT test group” and in either cell line library with FDR ≤ 0.1 were categorized as drug resistant genes, i.e., genes whose inactivation makes the cancer cells resistant to the anti-cancer drug. Exemplary drug resistant genes (e.g., of PARPi) include, but are not limited to, AKT1, CDKN1A, CKS1B, CKS2, CTNNB1, DLG5, E2F3, E2F4, HDAC1, MAPK1, MYC, RAC1, RAF1, RICTOR, SMAD4, TP53, BRAF, HSP90B1, PARP2, PARP1, PIK3CA, EIF3A, CCNA1, RBL1, ZMYND8, MED12, GCN1, KRAS, TP53BP1, CHD2, DOCK5, IGF1R, ILK, IRS1, RAPGEF1, EP300, TCF7L2, KMT2B, CDKN2A, CHEK1, CHEK2, RHEB, SPTA1, PKMYT1, SIDT2, APC, and SETD2.
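The categorization rule stated above (FDR ≤ 0.1 in either PDT test group and in either cell line library) can be sketched as follows. This is an illustrative sketch only; the input layout and the function name are assumptions, and any FDR values passed in would come from the screen analysis rather than from this code.

```python
FDR_CUTOFF = 0.1  # threshold stated in the text

def classify_genes(screen_results):
    """screen_results: list of dicts, one per (cell line library, PDT group)
    comparison, each mapping gene -> (direction, fdr), where direction is
    'depleted' or 'enriched' relative to control (hypothetical layout).
    A gene passing the cutoff in ANY comparison is categorized."""
    sensitive, resistant = set(), set()
    for result in screen_results:
        for gene, (direction, fdr) in result.items():
            if fdr > FDR_CUTOFF:
                continue
            if direction == "depleted":
                sensitive.add(gene)   # inactivation sensitizes cells to the drug
            elif direction == "enriched":
                resistant.add(gene)   # inactivation confers resistance
    return sensitive, resistant
```

The sketch takes the union across comparisons, matching the "either ... or" wording above. How a gene that is depleted in one comparison and enriched in another should be handled is not addressed in the text, so the sketch simply reports it in both sets.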

A subset of the PARPi sensitive and resistant genes identified in the “15-16 PDT test group” in either the Cas9 ⁺ sgRNA ^iBAR HCT116 library or the Cas9 ⁺ sgRNA ^iBAR SW480 library is shown in Table 1, together with screening scores (reflecting the significance and extent of enrichment/depletion) and FDRs (reflecting significance).

Table 1. Drug sensitive or resistant genes of PARPi

The results obtained here, particularly the genes whose inactivation was found to confer cancer cell sensitivity to anti-cancer drug (e.g., PARPi) killing, provide valuable targets for cancer therapy as well as biomarkers for patient selection. Drug resistant genes, whose inactivation makes cancer cells resistant to the anti-cancer drug(s), would serve as biomarkers indicating that such patients should not be selected and/or that alternative cancer therapeutic agent(s) should be used.

6. Target gene verification

To verify the identified drug sensitive genes and drug resistant genes, a subset of genes were selected from PARPi sensitive genes and PARPi resistant genes (see Table 2) for experimental testing.

Briefly, nucleic acids encoding the sgRNAs targeting these genes were designed and synthesized. The forward strand and the reverse strand were allowed to anneal to form a double-stranded nucleic acid with overhangs on both ends. The lentiviral sgRNA-expressing backbone, modified in house based on pLenti-sgRNA-Lib (Addgene #53121), was enzymatically cleaved, and the double-stranded nucleic acid was ligated into the cleavage site to obtain sgRNA plasmids. Each sgRNA plasmid carries puromycin and ampicillin resistance genes.

To amplify the sgRNA plasmids, 2 μL sgRNA plasmid was added to 20 μL competent cells (E. coli) in a 1.5 mL Eppendorf tube, followed by a standard ice/heat-shock transformation protocol. The cells were grown in liquid LB in a 37℃ shaker for 1 hour, then spread onto an LB ^Amp+ plate and grown overnight at 37℃. The next day, 5-10 single clones were picked and grown overnight in LB ^Amp+ liquid medium in a 37℃ shaker. The following day, sgRNA plasmids were extracted with a kit and sequenced to verify their sequences.

sgRNA lentiviruses were then obtained using a standard protocol. Briefly, 5×10 ⁶ 293T cells were seeded in a 10 cm cell culture dish and cultured overnight in a 37℃, 5% CO ₂ incubator. The next day, the culture medium was discarded and fresh serum-free medium was added to the 293T cells. The transfection complex was prepared using serum-free medium (1 mL), the sgRNA plasmid purified above (10 μg), pCMVR8.74 plasmid (10 μg), and pCMV-VSV-G plasmid (1 μg); after mixing, 52.5 μL PEI was added. After further mixing, the transfection complex was left to stand for 15 minutes at room temperature. The transfection complex was then added to the 293T cells in fresh serum-free medium, and the cells were incubated at 37℃, 5% CO ₂ for 6-8 hours. The medium was then discarded, fresh complete medium was added to the 293T cells, and the cells were incubated at 37℃, 5% CO ₂. After 72 hours, the cell culture was collected and centrifuged at 200 g for 5 minutes. The supernatant containing the sgRNA lentiviruses was collected, filtered through a 0.45 μm filter, and stored at -80℃ for later use.

To construct cancer cell lines with target gene KO, 2×10 ⁵ SW620 cancer cells were seeded in a 6-well plate and cultured in a 37℃, 5% CO ₂ incubator. After 24 hours, 100 μL Cas9-packaged lentivirus was added to the cell medium, and the cancer cells were cultured in a 37℃, 5% CO ₂ incubator. After 24 hours, the medium was discarded and fresh complete medium was added to the cancer cells. The cancer cells were allowed to grow for 7 days in a 37℃, 5% CO ₂ incubator, then sorted by FACS using the mCherry marker (carried on the Cas9-lentiviral vector). The sorted cancer cells with mCherry fluorescence were Cas9 expressing (Cas9 ⁺) cells, and were expanded for Cas9 ⁺ sgRNA construction. 500 μL of the non-concentrated sgRNA lentiviruses obtained above was added to 2×10 ⁷ Cas9 ⁺ cancer cells in medium (no antibiotics) at an MOI of 3 and gently mixed. The Cas9 ⁺ cancer cells were cultured overnight in a 37℃, 5% CO ₂ incubator for infection. The next day, the medium was discarded, fresh complete medium was added to the Cas9 ⁺ cancer cells, and the cells were cultured in a 37℃, 5% CO ₂ incubator for 48 hours. Then 1 μL puromycin was added to the culture medium for selection. Cas9 ⁺ cancer cells not successfully transfected with sgRNA plasmids would die.

To test target gene KO efficiency (%), a subset of the puromycin-treated cancer cells from above was collected. Genomic DNA was extracted, and the target gene sequence was amplified and sequenced. KO efficiency was calculated with the Tracking of Indels by Decomposition (TIDE) web tool, which accurately reconstructs the spectrum of indels from the sequence traces and reports the detected indels and their frequencies; these were used as the KO efficiency. Results are summarized in Table 2.
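As a minimal illustration of how an indel spectrum can be condensed into a single KO efficiency value, the Python sketch below sums the frequencies of all non-zero indel sizes. It does not perform the trace decomposition itself, and the input format (indel size in bp mapped to percent of sequences) is an assumption, not the web tool's export format.

```python
def ko_efficiency(indel_spectrum):
    """indel_spectrum: dict mapping indel size in bp (negative = deletion,
    positive = insertion, 0 = unedited) to percent of sequences.
    Returns the summed percentage of sequences carrying any indel."""
    return sum(pct for size, pct in indel_spectrum.items() if size != 0)
```

For example, a hypothetical spectrum of 20% unedited, 50% with a 1 bp deletion, and 25% with a 1 bp insertion gives an efficiency of 75%. The percentages need not sum to 100, since part of a trace may remain unexplained by the decomposition.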

To test the response of the target gene KO cancer cells to PARPi treatment, 1000 cancer cells for each target gene KO were placed into a 96-well plate, culture medium was added to each well, and the plate was incubated in a 37℃, 5% CO ₂ cell incubator overnight. The next day, PARPi was prepared at various concentrations by 1:3 dilution and added to each well, with 3 replicates for each concentration. The final PARPi concentrations were 33.3 μM, 11.1 μM, 3.70 μM, 1.27 μM, 0.41 μM, 0.13 μM, 0.05 μM, and 0.015 μM. A control set of experiments was conducted with cancer cells not carrying any of the target gene KOs (“WT cancer cells”), cultured and treated with PARPi under the same conditions. After 2-3 doubling times in the presence of PARPi, a Luminescent Cell Viability Assay (ATP assay) was conducted to obtain the drug toxicity curve (see FIG. 4). IC50 results are shown in Table 2.
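The dose series and the IC50 readout can be sketched in Python as follows. The first function generates a nominal 1:3 dilution series starting at 33.3 μM, which approximately reproduces the concentrations listed above. The second estimates an IC50 by log-linear interpolation between the two concentrations that bracket 50% viability; this interpolation is an assumption made for illustration, since the curve-fitting method actually used is not stated.

```python
import math

def serial_dilution(top_conc, factor, n_points):
    """Nominal concentrations for an n-point serial dilution (same unit as top_conc)."""
    return [top_conc / factor**i for i in range(n_points)]

def ic50_by_interpolation(concs, viability_pct):
    """Estimate the concentration giving 50% viability (relative to untreated
    control) by interpolating linearly on a log10 concentration axis.
    Returns None if the curve never crosses 50%."""
    points = sorted(zip(concs, viability_pct))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        crosses = (v_lo - 50.0) * (v_hi - 50.0) <= 0
        if crosses and v_lo != v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

def ic50_fold_change(ic50_ko, ic50_wt):
    """Values below 1 suggest sensitization by the KO, above 1 resistance."""
    return ic50_ko / ic50_wt
```

Comparing the IC50 of each target gene KO with that of WT cancer cells as a ratio gives a single number per gene that can be set against the screening score.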

Table 2. Verification of drug sensitive genes and drug resistant genes of PARPi

As shown in FIG. 4 and Table 2, KO of the drug sensitive genes identified by the screen indeed conferred sensitivity to PARPi killing in cancer cells (e.g., see ATM, BRCA1, WEE1, etc.), and KO of the drug resistant genes identified by the screen conferred resistance to PARPi killing in cancer cells (e.g., see PARP1, MYC). Further, the IC50 fold change between target gene KO and WT cancer cells largely followed the screening results: target genes that were highly enriched or depleted in the screen (e.g., with a higher screen score; see Table 1) also showed a greater difference in IC50.

These target gene verification results demonstrate that the screening method described herein is effective in obtaining drug sensitive genes and/or drug resistant genes, and that the drug sensitive genes and drug resistant genes provided herein are of high accuracy and will be of great value in cancer diagnostics and therapy.

7. Discussion

The above method can be used in drug sensitive gene and/or drug resistant gene screening for any anti-cancer drugs (such as drugs targeting different pathways or the same pathway) and any cancer types. The obtained drug sensitive genes and/or drug resistant genes have significant implications in cancer therapy, patient selection, and new drug screening or design.

For example, if the diagnosis of a cancer patient indicates that, for a single pathway (e.g., targeted by PARPi, etc.): 1) the patient only has inactivating mutation(s) in target gene(s) whose inactivation confers sensitivity to pathway-targeting drug(s), then this patient is a perfect candidate for treatment with such drug(s); 2) the patient only has inactivating mutation(s) in target gene(s) whose inactivation confers resistance to pathway-targeting drug(s), then this patient may not be suitable for treatment with drug(s) targeting such pathway, and alternative treatment methods should be sought; 3) the patient has inactivating mutations both in target gene(s) whose inactivation confers resistance to pathway-targeting drug(s) and in target gene(s) whose inactivation confers sensitivity to pathway-targeting drug(s), then more analysis needs to be conducted, e.g., whether the drug sensitivity is sufficient to help kill cancer cells before drug resistance occurs, whether the genes conferring drug resistance are of less significance in cancer development than those conferring drug sensitivity, whether one pathway-targeting drug should be selected over another, whether one drug should be used before the other or both used together, whether an alternative treatment method exists, etc. The composite score of drug sensitive aberrations and drug resistant aberrations described herein (e.g., Formula I) may also contribute to the treatment decision.
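The three scenarios above can be written as a small decision function. This is an illustrative Python sketch, not a clinical tool; the function name and return strings are hypothetical, and the fourth outcome (no mutation in either gene set) is an added assumption because the text does not discuss it.

```python
def triage_single_pathway(mutated_genes, sensitive_genes, resistant_genes):
    """Map a patient's inactivated genes onto the three scenarios in the text.
    All arguments are iterables of gene symbols."""
    mutated = set(mutated_genes)
    has_sensitive = bool(mutated & set(sensitive_genes))
    has_resistant = bool(mutated & set(resistant_genes))
    if has_sensitive and not has_resistant:
        return "candidate for the pathway-targeting drug"          # scenario 1
    if has_resistant and not has_sensitive:
        return "seek alternative treatment"                        # scenario 2
    if has_sensitive and has_resistant:
        return "further analysis needed"                           # scenario 3
    return "no informative mutation"  # assumption: not covered in the text
```

In scenario 3, the function only flags the case; the follow-up questions listed above (timing of resistance, relative gene significance, drug ordering) are left to separate analysis.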

Target genes obtained for multiple anti-cancer drugs, such as anti-cancer drugs targeting the same or different pathways involved in cancer development, can be combined or overlapped to find common target genes. Gene functions and/or mechanisms of action can be further analyzed to make treatment decisions and/or for drug design/development. For example, if a patient carries an inactivating mutation in a gene whose inactivation confers sensitivity to drugs X, Y, and Z, then a combination therapy with drugs X, Y, and Z might confer synergistic anti-cancer activity. Similarly, if a patient carries inactivating mutations in different genes (of the same pathway or different pathways) whose inactivation confers sensitivity to drugs X, Y, and Z, then a combination therapy with drugs X, Y, and Z might confer synergistic anti-cancer activity. As another example, if a new drug can be designed to target various pathways involving target genes whose deletion confers sensitivity to known drugs, then the new drug might have a superior therapeutic effect compared to the known drugs.
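Overlapping the target gene sets of several drugs amounts to a set intersection. The Python sketch below assumes each drug's sensitive genes are available as a collection of gene symbols; the drug labels and gene names in any call are placeholders, not results from the screen.

```python
def common_target_genes(genes_by_drug):
    """genes_by_drug: dict mapping a drug label to an iterable of gene symbols
    identified for that drug. Returns the genes shared by every drug."""
    gene_sets = [set(genes) for genes in genes_by_drug.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)
```

A gene returned by this function is one whose inactivation confers sensitivity to every drug in the input, which is the situation described above for a possible combination of drugs X, Y, and Z.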

As an example, for various pathways targeted by, e.g., PARPi, other drugs for treating colorectal cancer, or other drugs for treating other cancer types but not yet tested/developed for colorectal cancer treatment, etc., if a patient is diagnosed as 1) having inactivating mutations in shared target gene(s) (or different genes with shared pathways) whose inactivation confers sensitivity to various pathway-targeting drugs, then a single therapy or, preferably, a combination therapy with such drugs can be used for cancer treatment; or 2) having inactivating mutations in shared target gene(s) (or different genes with shared pathways) whose inactivation confers resistance to various pathway-targeting drugs, then alternative treatment methods should be sought, or further analysis of gene function and/or mechanism of action should be conducted to determine whether combination therapy (e.g., one drug used before the other) using such pathway-targeting drugs may alleviate the drug resistance phenotype. For example, drug X, which will encounter drug resistance from the target gene mutation later during treatment, can be used first, and drug Y, which will encounter drug resistance from the target gene mutation early on but may be sufficiently effective, can be used only at the beginning or throughout the process, in combination with drug X.

Example 2. Composite score correctly reflects cancer killing efficacy by anti-cancer drug

This Example provides evidence that the composite score calculated using methods described herein (e.g., Formula I), based on drug sensitive genes and drug resistant genes of an anti-cancer agent (e.g., DNA damaging agents such as PARPi or ATRi) identified using screening methods described herein, correctly reflects, and can predict, the cancer killing efficacy of the corresponding anti-cancer agent.

1. Random selection of colorectal cancer samples for composite score calculation

A collection of colorectal cancer cell lines and patient-derived xenografts (PDXs) was tested for response to PARPi treatment by measuring cell viability (reflected as IC50) or PDX growth inhibition rate following standard methods (also see Example 1). 16 cancer samples (10 PDXs and 6 cancer cell lines; see FIG. 5A) covering a range of responses to PARPi treatment were selected for use in composite score calculation. Their corresponding cell viability response or PDX growth inhibition response is shown as “drug response” in FIG. 5C.

2. Detection, filtration, and annotation of mutations in selected cancer samples

The 16 cancer samples selected above were individually sequenced by NGS. For each sample, mutations were detected from the sequencing data. The raw mutation sites were filtered according to mutation quality to remove low-confidence mutation sites. The remaining high-quality mutation sites were mapped to their corresponding genes and annotated, based on database information, for the impact of the mutation on the function of the corresponding gene. Only mutation sites with a deleterious impact on the corresponding gene were retained for subsequent analysis.

3. Gene-level functional annotation and series of database information integration for further filtration

The remaining mutation sites from above were then annotated based on prevalence, clinical significance, curated impact, gene ontology, and pathway information etc. from both external and internal database sources. Low-clinical impact mutations were further filtered out. Then overall loss-of-function (LOF) probability was calculated for each gene mapped to the remaining mutations for each sample.

4. Composite score calculation

To calculate the composite score for each cancer sample and test its accuracy in predicting PARPi treatment response, a total of 51 PARPi sensitive genes and PARPi resistant genes, obtained from and/or verified in Example 1 and with at least one high-confidence deleterious mutation in any of the 16 selected cancer samples after filtration, were selected (“test gene panel”). Their corresponding LOF probabilities are shown in FIG. 5A. For each cancer sample, the gene-level LOF probability across the 51 PARPi sensitive/resistant genes was used to calculate the [symbol omitted] portion in Formula I. The gene-level and pathway-level contributions of mutations detected in the “test gene panel” to PARPi treatment response were quantified by integrating the corresponding weight [symbol omitted], the coefficient of correlation of each gene (r _i), and the pre-calculated weight for the related pathways in PARPi response [symbol omitted] to calculate the raw composite score. The raw composite score was further adjusted and scaled according to the sample type (i.e., cell line, PDX, patient) to generate the final composite score for each cancer sample (see FIGs. 5B and 5C, “composite score” row).

5. Results and conclusion

As can be seen from FIGs. 5B and 5C, when the composite score of the cancer sample according to Formula I was above 0, the cancer sample indeed showed sensitivity to PARPi killing, whereas when the composite score was below 0, the cancer sample indeed showed resistance to PARPi killing. Further, the higher the absolute value of the composite score, the better the prediction of the actual PARPi response for that cancer sample (see the “prediction” row based on composite score in FIG. 5C). For example, for cancer samples with a composite score above 0.1, the prediction is “true” for the sample’s actual sensitivity to PARPi killing (see PDX3, PDX6, PDX10, cell line 1, cell line 2, and cell line 6). For cancer samples with a more negative composite score (i.e., a larger absolute value of the composite score), the prediction is “true” for the sample’s actual resistance to PARPi killing (see PDX8, cell line 4, cell line 4). No sample that tested sensitive received a composite score below 0 according to Formula I, demonstrating the great prediction power of “true positives” of methods described herein. One sample that tested sensitive, cell line 3, received a composite score of 0. All samples that tested resistant received a composite score below or equal to 0 according to Formula I (demonstrating the great prediction power of “true positives” of methods described herein), with the exception of PDX9, which had a composite score of 0.011. Hence, for cancer samples with a composite score close to or equal to 0 (e.g., -0.1 to +0.1), the prediction of response to PARPi treatment based on the composite score was “ambiguous.” More evaluation may be needed to identify false positives and false negatives among predictions based on the composite score.

These findings demonstrate that drug sensitive genes and drug resistant genes of anti-cancer agents (e.g., DNA damaging agents such as PARPi or ATRi) identified using screening methods described herein, and composite scores obtained based on them using methods described herein, correctly reflect/can predict cancer killing efficacy by the corresponding anti-cancer agent, and can serve as tools for cancer diagnosis, treatment selection, and/or patient selection. For example, when the composite score of the patient according to Formula I is above 0, the patient may be suitable for (i.e., may benefit from) the anti-cancer drug (e.g., DNA damaging agent such as PARPi or ATRi) treatment. If the composite score of the patient according to Formula I is above or equal to at least 0.1 (e.g., 0.3) , the patient can be selected or recommended for the anti-cancer drug treatment. If the composite score of the patient according to Formula I is more than 0 but less than 0.1, the patient may be suitable for the anti-cancer drug treatment, but should be further evaluated using other method (s) (e.g., drug dosage test, cancer genetic testing (e.g., look for additional synergistic mutations that may contribute to the anti-cancer drug treatment, or verify the primary cancer type) , etc. ) or based on other information (e.g., patient’s clinical record or known drug resistance, etc. ) to determine whether the patient should be selected or recommended for the anti-cancer drug treatment. If the composite score of the patient according to Formula I is below or equal to 0, the patient may not be suitable for (i.e., may not benefit from) or should be excluded from the anti-cancer drug treatment. 
Further evaluation using other method(s) or based on other information described herein may be needed if the composite score of the patient according to Formula I is equal to 0, or very close to 0 (e.g., -0.1 to 0), before completely ruling out the patient from receiving the anti-cancer drug treatment.
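The score ranges described above can be summarized in a short Python sketch. The cutoffs 0.1 and -0.1 are the exemplary values given in the text; treating them as fixed boundaries, and treating -0.1 itself as part of the "very close to 0" band, are assumptions made for illustration. The function does not compute Formula I; it only interprets a score that has already been calculated.

```python
def interpret_composite_score(score, select_cutoff=0.1, near_zero_band=0.1):
    """Map a final composite score to the recommendation categories in the text."""
    if score >= select_cutoff:
        return "select or recommend for treatment"
    if score > 0:
        # between 0 and the cutoff: possibly suitable, prediction ambiguous
        return "may be suitable; evaluate further"
    if score >= -near_zero_band:
        # 0 or slightly negative: do not rule out without more evaluation
        return "may not be suitable; evaluate further before excluding"
    return "may not be suitable; exclude"
```

Under these assumptions, a sample such as PDX9 (composite score 0.011) falls into the "evaluate further" category rather than being selected outright, which matches the "ambiguous" prediction reported for scores near 0.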

Claims

A method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive or resistant to an anti-cancer drug, comprising:

a) providing a cancer cell library comprising a plurality of cancer cells, wherein each of the plurality of cancer cells has a mutation at a hit gene (“hit gene mutation”), wherein the hit genes in at least two of the plurality of cancer cells are different from each other;

wherein the cancer cell library is generated by contacting an initial population of cancer cells with i) a single-guide RNA ( “sgRNA” ) library comprising a plurality of sgRNA constructs, wherein each sgRNA construct comprises or encodes an sgRNA, and wherein each sgRNA comprises a guide sequence that is complementary to a target site in a corresponding hit gene; and ii) a Cas component comprising a Cas protein or a nucleic acid encoding the Cas protein, under a condition that allows introduction of the sgRNA constructs and the Cas component into the initial population of cancer cells and generation of the mutations at the hit genes;

b) contacting the cancer cell library with the anti-cancer drug;

c) growing the cancer cell library to obtain a post-treatment cancer cell population; and

d) identifying the target gene based on the difference between the profiles of sgRNAs or hit gene mutations in the post-treatment cancer cell population and a control cancer cell population.
The method of claim 1, wherein the control cancer cell population is obtained from the cancer cell library cultured under the same condition without contacting with the anti-cancer drug.
The method of claim 1 or 2, wherein the identification of the target gene is based on the difference between the profiles of sgRNAs in the post-treatment cancer cell population and the control cancer cell population.
The method of claim 3, wherein the profiles of sgRNAs in the post-treatment cancer cell population and the control cancer cell population are identified by next generation sequencing.
The method of claim 4, wherein the method comprises comparing the sgRNA sequence counts obtained from the post-treatment cancer cell population with sgRNA sequence counts obtained from the control cancer cell population, wherein:

i) the hit genes whose corresponding sgRNA guide sequences are identified as enriched in the post-treatment cancer cell population compared to the control cancer cell population with a false discovery rate (FDR) ≤ 0.1 are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or

ii) the hit genes whose corresponding sgRNA guide sequences are identified as depleted in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.
The method of any one of claims 1-5, wherein the sgRNA library and the Cas component are introduced into the initial population of cancer cells sequentially.
The method of any one of claims 1-6, wherein the Cas protein is Cas9.
The method of claim 7, wherein each sgRNA comprises the guide sequence fused to a second sequence, wherein the second sequence comprises a repeat-anti-repeat stem loop that interacts with the Cas9.
The method of claim 8, wherein the second sequence of each sgRNA further comprises a stem loop 1, a stem loop 2, and/or a stem loop 3.
The method of any one of claims 1-9, wherein each sgRNA further comprises an internal barcode (iBAR) sequence ( “sgRNA ^iBAR” ) , wherein each sgRNA ^iBAR is operable with the Cas protein to modify the hit gene.
The method of claim 10, wherein each sgRNA ^iBAR comprises in the 5’-to-3’ direction a first stem sequence and a second stem sequence, wherein the first stem sequence hybridizes with the second stem sequence to form a double-stranded RNA (dsRNA) region that interacts with the Cas protein, and wherein the iBAR sequence is disposed between the 3’ end of the first stem sequence and the 5’ end of the second stem sequence.
The method of claim 10 or 11, wherein the Cas protein is Cas9, and wherein the iBAR sequence of each sgRNA ^iBAR is inserted in the loop region of the repeat-anti-repeat stem loop.
The method of any one of claims 1-12, wherein each guide sequence comprises about 17 to about 23 nucleotides.
The method of any one of claims 10-13, wherein the sgRNA library is an sgRNA ^iBAR library, wherein the sgRNA ^iBAR library comprises a plurality of sets of sgRNA ^iBAR constructs, wherein each set of sgRNA ^iBAR constructs comprise four sgRNA ^iBAR constructs each comprising or encoding an sgRNA ^iBAR, wherein the guide sequences for the four sgRNA ^iBAR constructs are the same, wherein the iBAR sequence for each of the four sgRNA ^iBAR constructs is different from each other, and wherein the guide sequence of each set of sgRNA ^iBAR constructs is complementary to a different target site in the hit gene.
The method of any one of claims 1-14, wherein at least about 95% of the sgRNA constructs in the sgRNA library are introduced into the initial population of cancer cells.
The method of any one of claims 10-15, wherein the cancer cell library has at least about 100-fold coverage for each sgRNA ^iBAR.
The method of any one of claims 1-16, wherein the cancer cell library has at least about 400-fold coverage for each sgRNA.
The method of any one of claims 1-17, wherein the sgRNA library comprises at least about 400 sgRNA constructs.
The method of any one of claims 1-18, wherein each sgRNA construct in the sgRNA library is a plasmid.
The method of any one of claims 1-18, wherein each sgRNA construct in the sgRNA library is a viral vector.
The method of claim 20, wherein the viral vector is a lentiviral vector.
The method of claim 20 or 21, wherein the sgRNA library is contacted with the initial population of cancer cells at a multiplicity of infection (MOI) of at least about 2.
The method of any one of claims 1-22, wherein step b) comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling times.
The method of any one of claims 1-22, wherein step b) comprises contacting the cancer cell library with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling times.
The method of any one of claims 5-24, wherein the sgRNA sequence counts are subject to median ratio normalization followed by mean-variance modeling.
The method of claim 25, wherein the sgRNA library is an sgRNA ^iBAR library, and wherein the variance of each guide sequence is adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence.
The method of claim 26, wherein the data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to each guide sequence is determined based on the direction of the fold change of each iBAR sequence, wherein the variance of the guide sequence is increased if the fold changes of the iBAR sequences are in different directions with respect to each other.
The method of any one of claims 1-27, wherein the method comprises:

subjecting the cancer cell library from step a) to at least two separate different treatments with the anti-cancer drug in step b) ;

growing the cancer cell library to obtain a post-treatment cancer cell population from each treatment;

identifying the one or more hit genes in the post-treatment cancer cell population obtained from each treatment; and

combining the one or more hit genes identified from all treatments, thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug.
The method of any one of claims 1-28, wherein the method comprises:

subjecting the cancer cell library from step a) to two separate treatments b1) and b2) :

b1) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 9 to about 10 doubling time;

b2) contacting the cancer cell library from step a) with the anti-cancer drug at a concentration of about IC50 to about IC70 for about 15 to about 16 doubling time;

c1) growing the cancer cell library from treatment b1) to obtain a post-treatment cancer cell population;

c2) growing the cancer cell library from treatment b2) to obtain a post-treatment cancer cell population;

d1) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b1) ,

d2) identifying the one or more hit genes in the post-treatment cancer cell population obtained from treatment b2) , and

d3) combining the one or more hit genes identified from treatment b1) and treatment b2) , thereby identifying the target gene in the cancer cell whose mutation makes the cancer cell sensitive or resistant to the anti-cancer drug.
The method of claim 28 or 29, wherein

i) the hit genes whose corresponding sgRNA guide sequences are identified as enriched in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells resistant to the anti-cancer drug; and/or

ii) the hit genes whose corresponding sgRNA guide sequences are identified as depleted in the post-treatment cancer cell population compared to the control cancer cell population with an FDR ≤ 0.1 in at least one treatment are identified as target genes whose mutations make the cancer cells sensitive to the anti-cancer drug.
The method of any one of claims 1-30, comprising:

i) separately identifying a set of one or more target genes whose mutations make the cancer cells sensitive to an anti-cancer drug, for two or more different anti-cancer drugs when treated alone;

ii) obtaining one or more target genes present in every set of target genes identified for each anti-cancer drug, thereby identifying target genes whose mutations make the cancer cells sensitive to a combination treatment of the two or more different anti-cancer drugs;

and/or

i) separately identifying a set of one or more target genes whose mutations make the cancer cells resistant to an anti-cancer drug, for two or more different anti-cancer drugs when treated alone;

ii) obtaining one or more target genes present in a combination of sets of target genes identified for all anti-cancer drugs, thereby identifying target genes whose mutations make the cancer cells resistant to a combination treatment of the two or more different anti-cancer drugs.
The method of claim 31, wherein the two or more different anti-cancer drugs target the same cancer target.
The method of claim 31, wherein the two or more different anti-cancer drugs target different cancer targets.
The method of any one of claims 5-33, further comprising ranking the identified target genes, wherein the target gene ranking is based on the degree of enrichment or depletion of the sgRNA guide sequences in the post-treatment cancer cell population compared to the control cancer cell population.
The method of claim 34, wherein the sgRNA library is an sgRNA ^iBAR library, and wherein the target gene ranking is further adjusted based on data consistency among the iBAR sequences in the sgRNA ^iBAR sequences corresponding to the guide sequence of the target gene.
The method of claim 34 or 35, further comprising assigning a sensitivity score or a resistance score to the identified target gene,

wherein target genes whose mutations make the cancer cells resistant to the anti-cancer drug are ranked from high to low based on the fold of enrichment of the sgRNA guide sequences in the post-treatment cancer cell population compared to the control cancer cell population, and each target gene is assigned a resistance score from high to low accordingly; and/or

wherein target genes whose mutations make the cancer cells sensitive to the anti-cancer drug are ranked from high to low based on the fold of depletion of the sgRNA guide sequences in the post-treatment cancer cell population compared to the control cancer cell population, and each target gene is assigned a sensitivity score from high to low accordingly.
The method of any one of claims 1-36, wherein the anti-cancer drug is a PARP inhibitor.
The method of any one of claims 1-37, wherein the cancer cell is a colorectal cancer cell.
A method of identifying a target gene in a cancer cell whose mutation makes the cancer cell sensitive to a combination therapy comprising a first anti-cancer drug and a second anti-cancer drug, comprising:

i) identifying a first set of one or more target genes in a cancer cell whose mutation makes the cancer cell sensitive to the first anti-cancer drug according to the method of any one of claims 1-38;

ii) identifying a second set of one or more target genes in a cancer cell whose mutation makes the cancer cell sensitive to the second anti-cancer drug according to the method of any one of claims 1-38; and

iii) obtaining one or more target genes present in both the first set of target genes and the second set of target genes, thereby identifying the target gene whose mutation makes the cancer cell sensitive to the combination therapy.
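
By way of non-limiting illustration, step iii) amounts to a set intersection. The sketch below assumes that each screen has already produced a collection of gene names. The function name `combination_sensitive_genes` is hypothetical.

```python
def combination_sensitive_genes(first_set, second_set):
    """Return genes present in both sensitivity sets, sorted by name.

    first_set, second_set: iterables of gene names identified as sensitive
    to the first and the second anti-cancer drug, respectively.
    """
    return sorted(set(first_set) & set(second_set))
```

For instance, with placeholder inputs `["GENE_A", "GENE_B", "GENE_C"]` and `["GENE_B", "GENE_C", "GENE_D"]`, only the genes common to both lists are returned.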
A method of treating a cancer in an individual, comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected for treatment on the basis that the individual has an aberration in a target gene (“a drug sensitive gene”) which makes the cancer cells sensitive to the anti-cancer drug, and wherein the drug sensitive gene is identified using the method of any one of claims 1-39.
A method of excluding an individual suffering from a cancer from a treatment comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is excluded if the individual has an aberration in a target gene (“a drug resistant gene”) which makes the cancer cells resistant to the anti-cancer drug, and wherein the drug resistant gene is identified using the method of any one of claims 1-38.
A method of treating a cancer in an individual, comprising administering to the individual an effective amount of an anti-cancer drug, wherein the individual is selected based on:

i) aberrations in one or more target genes (“drug sensitive genes”) which make the cancer cells sensitive to the anti-cancer drug (“drug sensitive aberrations”), and

ii) aberrations in one or more target genes (“drug resistant genes”) which make the cancer cells resistant to the anti-cancer drug (“drug resistant aberrations”),

wherein the drug sensitive genes and drug resistant genes are identified using the method of any one of claims 1-39, and

wherein the individual is selected for treatment if a composite score of the drug sensitive aberrations and the drug resistant aberrations is above a composite score threshold level.
The method of claim 42, wherein the composite score is obtained by

(i) subtracting (the absolute value of the sum of the resistance scores of the drug resistant genes) from (the absolute value of the sum of the sensitivity scores of the drug sensitive genes), or

(ii) Formula I,

wherein the individual is selected for treatment if the composite score is above zero.
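
By way of non-limiting illustration, option (i) and the selection rule above can be sketched as follows. The sketch assumes the sensitivity and resistance scores of the aberrant genes found in the individual are supplied as plain numbers. Formula I is not reproduced in this text and is therefore not implemented. The function names `composite_score` and `is_selected` are hypothetical.

```python
def composite_score(sensitivity_scores, resistance_scores):
    """Option (i): |sum of sensitivity scores| minus |sum of resistance scores|."""
    return abs(sum(sensitivity_scores)) - abs(sum(resistance_scores))


def is_selected(sensitivity_scores, resistance_scores, threshold=0.0):
    """Select the individual if the composite score is above the threshold.

    The default threshold of zero follows the selection rule stated above.
    """
    return composite_score(sensitivity_scores, resistance_scores) > threshold
```

For example, sensitivity scores of 3.0 and 1.5 with a single resistance score of 2.0 give a composite score of 2.5, so the individual would be selected under the zero threshold.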