WO2024075756A1

WO2024075756A1 - Cell library and method for producing same

Info

Publication number: WO2024075756A1
Application number: PCT/JP2023/036147
Authority: WO
Inventors: 康則相澤; 知幸大野
Original assignee: 株式会社Logomix
Priority date: 2022-10-05
Filing date: 2023-10-04
Publication date: 2024-04-11

Abstract

Provided is a method for producing a library containing gene-modified cells. This library is useful in the functional evaluation of a specific region of a genome and the introduction, etc., of various types of mutations (such as SNPs) to a specific region of a genome.

Description

Cell library and method for producing same

The present invention relates to a cell library and a method for producing the same.

Since the CRISPR/Cas system was reported as a new genome editing tool, various studies using it have been conducted (for example, Patent Document 1). In genome editing using the CRISPR/Cas system, Cas9 nuclease introduces a double-strand break into the target region targeted by the guide RNA. It is known that the broken double-stranded DNA is repaired by Homology-Directed Repair (HDR) or Non-Homologous End-Joining (NHEJ). In HDR, any sequence can be incorporated into the target region by introducing into a cell, together with the CRISPR/Cas system, donor DNA having sequences homologous to the regions surrounding the target region.

A technique has been developed that uses genome modification technology to efficiently modify two or more alleles simultaneously by HDR (for example, Patent Document 2). Patent Document 2 discloses that it has been possible to efficiently modify two or more alleles simultaneously to create large-scale deletions of several hundred kb.

Patent Document 1: International Publication No. 2014/093661
Patent Document 2: International Publication No. 2021/206054

The present disclosure provides a cell library and a method for producing the same. More specifically, the present disclosure provides a cell library containing a plurality of modified cells having a rich variety of base sequences at a specific site of one allele in a cell having multiple alleles. The cell library may be a combination of aqueous compositions containing one type of modified cell.

(1) A library of modified cells, wherein:
the library comprises a combination of a plurality of aqueous compositions;
each aqueous composition comprises one type of modified cell;
each of the modified cells has a first allele and a second allele at a locus to be modified;
each of the modified cells has, at the same position of the first allele, a cassette containing a DNA fragment that differs between the aqueous compositions; and
preferably, said locus of the modified cell does not contain a recognition site or recombination sequence for a site-specific recombinase.
(2) The library according to (1) above, wherein each modified cell has a second allele that is disrupted or deleted in part or in whole.
(3) The library described in (2) above, wherein the second allele seamlessly lacks the part or all of the sequence.
(4) The library according to any one of (1) to (3) above, wherein the sequences of each modified cell other than the cassette containing the DNA fragment are substantially identical before and after modification.
(5) The library according to any of (1) to (4) above, wherein each of the sequences of the cassettes is composed of one or more modified portions (A) and one or more unmodified portions (B), each of the modified portions (A) has one or more modifications selected from the group consisting of sequence insertion, deletion, and substitution, the modifications of the one or more modified portions differ between the aqueous compositions in terms of the position or content of the modification, each of the one or more unmodified portions (B) is identical to the sequence of the corresponding site before modification, the unmodified portion (B1) on the centromere side of the cassette is seamlessly linked to the adjacent sequence (C1) on the centromere side of the cassette, the unmodified portion (Bt) on the telomere side of the cassette is seamlessly linked to the adjacent sequence (C2) on the telomere side of the cassette, and the region where the adjacent sequence (C1) and the unmodified portion (B1) are linked, and the region where the unmodified portion (Bt) and the adjacent sequence (C2) are linked, constitute a sequence identical to the sequence of the corresponding region before modification.
(6) The library according to any one of (1) to (5) above, wherein the modified cells do not contain a target sequence for a site-specific recombinase.
(7) The library according to any one of (1) to (6) above, wherein the library contains 50 or more types of the aqueous compositions.
(8) A method for producing a library of modified cells, comprising:
(α) providing a group of cells having a genome including a first allele and a second allele at a locus to be modified, the first allele and the second allele each including a cassette including a selection marker gene and a target nucleic acid sequence;
wherein the selection marker gene carried by the first allele and the selection marker gene carried by the second allele are distinguishably different, the target nucleic acid sequence is a target of a genome modification system and is designed so that the first allele and the second allele can be distinguishably cleaved by the genome modification system, and each selection marker gene is a negative selection marker gene that can be used for negative selection,
(β) introducing into the provided group of cells:
(x) a sequence-specific nucleic acid cleavage molecule that targets the unique base sequence contained in the first allele, or a genome modification system comprising a polynucleotide encoding the sequence-specific nucleic acid cleavage molecule;
(y) a plurality of types of second recombination donor DNAs {wherein each of the plurality of types of second recombination donor DNAs has an upstream homology arm having a base sequence homologous to a base sequence adjacent to the upstream side of the target site of (x) above, and a downstream homology arm having a base sequence homologous to a base sequence adjacent to the downstream side of the target region, and contains a modified base sequence between the upstream homology arm and the downstream homology arm, and the modified base sequence is different for each second recombination donor DNA and is unique to each second recombination donor DNA},
(γ) after the step (β), selecting cells that do not express the selection marker gene contained in the first allele;
whereby a library of modified cells comprising a plurality of cells is produced, wherein, in the plurality of cells obtained, the first allele has a modified base sequence unique to each cell and the second allele has a sequence common to the cells.
(9) The method according to (8) above, further comprising the step of:
in a cell having a genome including a first allele and a second allele at a locus to be modified, replacing the sequence to be replaced that is included in the first allele and the second allele with a cassette including a selection marker gene and a target nucleic acid sequence, thereby removing the sequence to be replaced from the first allele and the second allele.
(10) The method according to (9) above, wherein each of the modified base sequences of the first allele has one or more mutations selected from the group consisting of addition, insertion, substitution, and deletion of bases relative to the sequence to be replaced of the first allele.
(11) The method according to (9) or (10) above, wherein the sequence to be replaced is a coding region for a protein, and the modified base sequence of the first allele has one or more mutations selected from the group consisting of addition, insertion, substitution, and deletion of bases relative to the sequence to be replaced of the first allele.
(12) The method according to (10) or (11) above, wherein the modified base sequence has a sequence identity of 80% or more with the sequence to be replaced.
(13) The method according to any one of (8) to (12) above, further comprising the step of removing the cassette from the second allele.
(14) The method according to (13) above, wherein the cassette is seamlessly removed.
(15) A library of modified cells comprising a plurality of modified cells, produced by the method according to any one of (8) to (14) above.

FIG. 1 shows an overview of one example of a scheme for constructing a library of the present invention.
FIG. 2 shows an example of a scheme for preparing a library of the first modified cell of the present invention.
FIG. 3 is a diagram showing an example of the positional relationship between a modified portion and a non-modified portion in a modified base sequence in a modified cell library, in which the modified base sequence contains the modified portion and the non-modified portion. In this way, a library containing a variety of modified cells having various mutations at various positions in the modified base sequence can be constructed.
FIG. 4 shows the characteristics (disadvantages) of the genome modification method using site-specific recombinase.

[Definition]
The terms "polynucleotide" and "nucleic acid" are used interchangeably and refer to a nucleotide polymer in which nucleotides are linked by phosphodiester bonds. A "polynucleotide" and a "nucleic acid" may be DNA, RNA, or a combination of DNA and RNA. A "polynucleotide" and a "nucleic acid" may be a polymer of natural nucleotides, a polymer of natural nucleotides and non-natural nucleotides (such as analogs of natural nucleotides, nucleotides in which at least one of the base moiety, sugar moiety, and phosphate moiety is modified (e.g., phosphorothioate backbone), etc.), or a polymer of non-natural nucleotides.

The base sequence of a "polynucleotide" or "nucleic acid" is written in the generally accepted single letter code unless otherwise specified. The base sequence is written from the 5' to the 3' side unless otherwise specified. The nucleotide residues that make up a "polynucleotide" or "nucleic acid" may be written simply as adenine, thymine, cytosine, guanine, or uracil, etc., or by their single letter codes.

The term "gene" refers to a polynucleotide that contains at least one open reading frame that encodes a particular protein. A gene can contain both exons and introns.

The terms "polypeptide", "peptide" and "protein" are used interchangeably and refer to a polymer of amino acids linked by amide bonds. A "polypeptide", "peptide" or "protein" may be a polymer of natural amino acids, a polymer of natural and non-natural amino acids (e.g., chemical analogues or modified derivatives of natural amino acids), or a polymer of non-natural amino acids. Unless otherwise specified, amino acid sequences are written from the N-terminus to the C-terminus.

The term "allele" refers to a set of base sequences present at the same locus on a chromosomal genome. In some embodiments, a diploid cell has two alleles at the same locus, and a triploid cell has three alleles at the same locus. In some embodiments, additional alleles may be formed by an abnormal copy of the chromosome or an abnormal additional copy of the locus.

The terms "genome modification" and "genome editing" are used interchangeably and refer to the induction of a mutation at a desired position (target region) on a genome. Genome modification may include the use of a sequence-specific nucleic acid cleaving molecule designed to cleave the target region DNA. In a preferred embodiment, genome modification may include the use of a nuclease engineered to cleave the target region DNA. In a preferred embodiment, genome modification may include the use of a nuclease engineered to cleave a target sequence having a specific base sequence in the target region (e.g., TALEN or ZFN). In a preferred embodiment, genome modification may use a sequence-specific endonuclease having only one cleavage site in the genome, such as a meganuclease (e.g., a restriction enzyme with 16-base sequence specificity (theoretically occurring once in every 4¹⁶ bases), a restriction enzyme with 17-base sequence specificity (theoretically occurring once in every 4¹⁷ bases), or a restriction enzyme with 18-base sequence specificity (theoretically occurring once in every 4¹⁸ bases)), to cleave a target sequence having a specific base sequence in the target region. Typically, a double-strand break (DSB) is induced in the DNA of the target region by a site-specific nuclease, and the genome is then repaired by endogenous processes of the cell, such as Homology-Directed Repair (HDR) and Non-Homologous End-Joining (NHEJ). NHEJ is a repair mechanism that joins the ends of a double-strand break without using donor DNA, and insertions and/or deletions (indels) are frequently introduced during repair. HDR is a repair mechanism that uses donor DNA and also makes it possible to introduce desired mutations into the target region. A preferred example of a genome modification technique is the CRISPR/Cas system.
Examples of meganucleases include I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, PI-PspI, F-SceI, F-SceII, F-SuvI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaIII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiIII, I-DirI, I-DmoI, I-HmuI, I-HmuIII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NclIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquiIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, I-UarAP, I-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, and PI-TliII, as well as functional derivatives thereof. A meganuclease selected from this group and its cleavage site (or recognition site) can be used. Preferably, a meganuclease that is a restriction enzyme having sequence specificity of 18 bases or more and its cleavage site (or recognition site) are used, in particular a meganuclease that does not cleave the genome of a cell at one or more sites and its cleavage site.
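The theoretical frequencies quoted for 16- to 18-base recognition sites can be checked with a short calculation. The following is a minimal sketch, not part of the disclosure: it assumes equal frequencies of the four bases, counts one strand only, and uses an approximate haploid human genome size of 3.1 × 10⁹ bp, which is an assumed round figure. The function name `expected_sites` is hypothetical.

```python
# Assumption: approximate haploid human genome size (round figure, not from the source).
GENOME_SIZE_BP = 3.1e9


def expected_sites(n_bases: int, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Expected number of chance occurrences of one specific n-base site.

    Assumes a random sequence with equal base frequencies and counts
    a single strand only, so the result is genome_size / 4**n.
    """
    return genome_size_bp / 4 ** n_bases


for n in (16, 17, 18):
    # 4**n is the average spacing (in bases) between chance occurrences.
    print(f"{n}-base site: once per {4 ** n} bases, "
          f"expected sites per genome = {expected_sites(n):.3f}")
```

Under these assumptions the expected number of chance sites falls below one already at 16 bases and keeps shrinking by a factor of four per additional base, which is consistent with the stated preference for specificity of 18 bases or more.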

The term "target region" refers to a genomic region that is subject to genome modification. "Deletion" includes deletions of one or more bases and deletions of one or more genes relative to a reference genome. The deletion can be a deletion of 100 bp or more, 200 bp or more, 300 bp or more, 400 bp or more, 500 bp or more, 600 bp or more, 700 bp or more, 800 bp or more, 900 bp or more, 1 kbp or more, 10 kbp or more, 50 kbp or more, 100 kbp or more, 200 kbp or more, 300 kbp or more, 400 kbp or more, 500 kbp or more, or 1 Mbp or more. The deletion can be a deletion of 1 Mbp or less, 700 kbp or less, 600 kbp or less, or 500 kbp or less. The deletion may be a deletion of 10 kbp to 600 kbp, 100 kbp to 600 kbp, or 100 kbp to 500 kbp.

The term "donor DNA" refers to DNA used to repair double-stranded DNA breaks and capable of homologous recombination with DNA surrounding a target region. The donor DNA contains a base sequence upstream and a base sequence downstream of the target region (e.g., a base sequence adjacent to the target region) as homology arms. In this specification, a homology arm consisting of a base sequence upstream of a target region (e.g., a base sequence adjacent to the upstream side) may be referred to as an "upstream homology arm", and a homology arm consisting of a base sequence downstream of a target region (e.g., a base sequence adjacent to the downstream side) may be referred to as a "downstream homology arm". The donor DNA may contain a desired base sequence between the upstream homology arm and the downstream homology arm. The length of each homology arm is preferably 300 bp or more, and is usually about 500 to 3000 bp. The lengths of the upstream homology arm and the downstream homology arm may be the same or different from each other. If the target region is successfully induced to undergo homologous recombination with the donor DNA after sequence-dependent cleavage, the sequence between the upstream and downstream base sequences of the target region will be replaced with the sequence sandwiched between the upstream and downstream base sequences of the donor DNA.
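The replacement described above can be illustrated with a toy string model. This is a minimal sketch under stated assumptions, not the method of the disclosure: sequences are plain strings, homology is treated as exact matching, and the arms are only a few bases long for readability, whereas the text calls for arms of 300 bp or more. The names `build_donor` and `simulate_hdr` are hypothetical.

```python
def build_donor(upstream_arm: str, insert: str, downstream_arm: str) -> str:
    """Donor DNA layout: upstream homology arm + desired sequence + downstream homology arm."""
    return upstream_arm + insert + downstream_arm


def simulate_hdr(genome: str, upstream_arm: str, insert: str, downstream_arm: str) -> str:
    """Replace the genomic sequence lying between the two arms with the donor insert.

    Toy model: arms must match the genome exactly, and the downstream arm
    is searched for only after the end of the upstream arm.
    """
    up = genome.find(upstream_arm)
    if up == -1:
        raise ValueError("upstream homology arm not found in genome")
    region_start = up + len(upstream_arm)
    down = genome.find(downstream_arm, region_start)
    if down == -1:
        raise ValueError("downstream homology arm not found after upstream arm")
    # Everything between the arms is swapped for the insert; the arms stay unchanged.
    return genome[:region_start] + insert + genome[down:]


# Toy example with short, made-up sequences.
genome = "AAAACCCC" + "GGGGGGGG" + "TTTTAAAA"
edited = simulate_hdr(genome, "AAAACCCC", "GATTACA", "TTTTAAAA")
print(edited)
```

Because the arms themselves are left untouched, the junctions on both sides of the inserted sequence are identical to the original flanking sequence in this model.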

"Upstream" of a target region means the DNA region located on the 5' side of a reference nucleotide strand in the double-stranded DNA of the target region. "Downstream" of a target region means the DNA located on the 3' side of the reference nucleotide strand. It is arbitrary which strand of the double strand is used as the reference nucleotide strand. However, for convenience, when the target region contains a protein coding sequence, the reference nucleotide strand is usually the sense strand. In general, a promoter is located upstream of a protein coding sequence. A terminator is located downstream of a protein coding sequence.

The term "sequence-specific nucleic acid cleaving molecule" refers to a molecule that can recognize a specific nucleic acid sequence and cleave a nucleic acid at said specific nucleic acid sequence. A sequence-specific nucleic acid cleaving molecule is a molecule that has the activity of cleaving a nucleic acid in a sequence-specific manner (sequence-specific nucleic acid cleaving activity).

The term "target sequence" refers to a DNA sequence in a genome that is to be cleaved by a sequence-specific nucleic acid cleaving molecule. When the sequence-specific nucleic acid cleaving molecule is a Cas protein, the target sequence refers to a DNA sequence in a genome that is to be cleaved by the Cas protein. When a Cas9 protein is used as the Cas protein, the target sequence must be adjacent to the 5' side of a protospacer adjacent motif (PAM). The target sequence is usually selected as a sequence of 17 to 30 bases (preferably 18 to 25 bases, more preferably 19 to 22 bases, and even more preferably 20 bases) adjacent to and immediately preceding the 5' side of the PAM. A known design tool such as CRISPR DESIGN (crispr.mit.edu/) can be used to design the target sequence.
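As an illustration of this selection rule, the following minimal sketch scans both strands of a DNA string for 20-base candidate target sequences that lie immediately 5' of an "NGG" PAM, the PAM of S. pyogenes Cas9. It is not a substitute for a design tool: it ignores off-target effects and assumes an uppercase sequence containing only A, C, G, and T. The function names `revcomp` and `find_targets` are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(seq: str, target_len: int = 20):
    """Return (strand, start, target, pam) for every target followed by NGG.

    `start` is the 0-based index on the scanned strand, i.e. for "-" hits
    it is an index into the reverse complement of `seq`.
    """
    # The lookahead lets overlapping candidate sites be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{target_len}}})([ACGT]GG))")
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


# Made-up example sequence: one 20-base target followed by the PAM "AGG".
example = "GATTACAGATTACAGATTAC" + "AGG" + "TTTT"
for hit in find_targets(example):
    print(hit)
```

The `target_len` parameter can be varied within the 17 to 30 base range given above.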

The term "Cas protein" refers to a CRISPR-associated protein. In a preferred embodiment, the Cas protein forms a complex with a guide RNA and exhibits endonuclease activity or nickase activity. Examples of Cas proteins include, but are not limited to, Cas9 protein, Cpf1 protein, C2c1 protein, C2c2 protein, and C2c3 protein. Cas proteins include wild-type Cas proteins and their homologs (paralogs and orthologs), as well as mutants thereof, so long as they cooperate with a guide RNA to exhibit endonuclease activity or nickase activity.
In a preferred embodiment, the Cas protein is involved in the class 2 CRISPR/Cas system, more preferably in the type II CRISPR/Cas system. A preferred example of the Cas protein is the Cas9 protein. Another preferred example of the Cas protein is the Cas3 protein.

The term "Cas9 protein" refers to a Cas protein involved in the type II CRISPR/Cas system. The Cas9 protein forms a complex with a guide RNA and exhibits the activity of cleaving DNA in a target region in cooperation with the guide RNA. The Cas9 protein includes wild-type Cas9 protein and its homologs (paralogs and orthologs), as well as mutants thereof, so long as it has the above-mentioned activity. The wild-type Cas9 protein has a RuvC domain and an HNH domain as nuclease domains, but the Cas9 protein in this specification may have either the RuvC domain or the HNH domain inactivated. Cas9 in which either the RuvC domain or the HNH domain is inactivated introduces a single-stranded break (nick) into double-stranded DNA. Therefore, when Cas9 in which either the RuvC domain or the HNH domain has been inactivated is used to cleave double-stranded DNA, a modified system can be constructed in which Cas9 target sequences are set for each of the sense and antisense strands, and nicks in the sense and antisense strands are generated at positions sufficiently close to each other, thereby inducing double-stranded cleavage.
The species of organism from which the Cas9 protein is derived is not particularly limited, but preferred examples include bacteria belonging to the genus Streptococcus, Staphylococcus, Neisseria, or Treponema. More specifically, preferred examples include Cas9 proteins derived from S. pyogenes, S. thermophilus, S. aureus, N. meningitidis, or T. denticola. In a preferred embodiment, the Cas9 protein is a Cas9 protein derived from S. pyogenes.

The amino acid sequences of various Cas proteins and information on their coding sequences can be obtained from various databases such as GenBank, UniProt, and Addgene. For example, the amino acid sequence of the Cas9 protein of S. pyogenes can be that registered in Addgene as plasmid number 42230. An example of the amino acid sequence of the Cas9 protein of S. pyogenes is shown in SEQ ID NO:1.

The terms "guide RNA" and "gRNA" are used interchangeably and refer to an RNA that can form a complex with a Cas protein and guide the Cas protein to a target region. In a preferred embodiment, the guide RNA comprises CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA). The crRNA is involved in binding to a target region on the genome, and the tracrRNA is involved in binding to the Cas protein. In a preferred embodiment, the crRNA comprises a spacer sequence and a repeat sequence, and the spacer sequence binds to the complementary strand of the target sequence in the target region. In a preferred embodiment, the tracrRNA comprises an anti-repeat sequence and a 3' tail sequence. The anti-repeat sequence is complementary to the repeat sequence of the crRNA and forms base pairs with it, and the 3' tail sequence usually forms three stem loops.
The guide RNA may be a single guide RNA (sgRNA) in which the 5' end of the tracrRNA is linked to the 3' end of the crRNA, or the crRNA and the tracrRNA may be separate RNA molecules in which the repeat sequence and the anti-repeat sequence form base pairs. In a preferred embodiment, the guide RNA is an sgRNA.

The crRNA repeat sequence and tracrRNA sequence can be appropriately selected depending on the type of Cas protein, and those derived from the same bacterial species as the Cas protein can be used.
For example, when using Cas9 protein derived from S. pyogenes, the length of the sgRNA can be about 50 to 220 nucleotides (nt), preferably about 60 to 180 nt, more preferably about 80 to 120 nt. The length of the crRNA can be about 25 to 70 bases including the spacer sequence, preferably about 25 to 50 nt. The length of the tracrRNA can be about 10 to 130 nt, preferably about 30 to 80 nt.
The repeat sequence of the crRNA may be the same as that in the bacterial species from which the Cas protein is derived, or may be one in which a part of the 3' end has been deleted. The tracrRNA may have the same sequence as the mature tracrRNA in the bacterial species from which the Cas protein is derived, or may be a truncated type in which the 5' end and/or the 3' end of the mature tracrRNA has been truncated. For example, the tracrRNA may be a truncated type in which about 1 to 40 nucleotide residues have been removed from the 3' end of the mature tracrRNA. The tracrRNA may also be a truncated type in which about 1 to 80 nucleotide residues have been removed from the 5' end of the mature tracrRNA. The tracrRNA may also be a truncated type in which, for example, about 1 to 20 nucleotide residues have been removed from the 5' end and about 1 to 40 nucleotide residues have been removed from the 3' end.
Various crRNA repeat sequences and tracrRNA sequences for sgRNA design have been proposed, and those skilled in the art can design sgRNAs based on known techniques (e.g., Jinek et al. (2012) Science, 337, 816-21; Mali et al. (2013) Science, 339: 6121, 823-6; Cong et al. (2013) Science, 339: 6121, 819-23; Hwang et al. (2013) Nat. Biotechnol. 31: 3, 227-9; Jinek et al. (2013) eLife, 2, e00471).

The terms "protospacer adjacent motif" and "PAM" are used interchangeably and refer to a sequence recognized by the Cas protein during DNA cleavage by the Cas protein. The sequence and position of the PAM vary depending on the type of Cas protein. For example, in the case of the Cas9 protein, the PAM must be immediately adjacent to the 3' side of the target sequence. The sequence of the PAM corresponding to the Cas9 protein varies depending on the bacterial species from which the Cas9 protein is derived. For example, the PAM corresponding to the Cas9 protein of S. pyogenes is "NGG", the PAM corresponding to the Cas9 protein of S. thermophilus is "NNAGAA", the PAM corresponding to the Cas9 protein of S. aureus is "NNGRRT" or "NNGRR(N)", the PAM corresponding to the Cas9 protein of N. meningitidis is "NNNNGATT", and the PAM corresponding to the Cas9 protein of T. denticola is "NAAAAC" (where "R" is A or G; "N" is A, T, G, or C).
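The degenerate PAM notation can be made concrete by expanding the codes "R" and "N" into character classes. The following minimal sketch checks whether a PAM for a given species starts immediately 3' of a target sequence. The PAM strings are those listed in the text; writing "NNGRR(N)" as "NNGRRN" is an assumed reading of the parenthesized base, and the names `PAM_BY_SPECIES`, `pam_regex`, and `has_pam` are hypothetical.

```python
import re

# PAM consensus sequences as listed in the text ("R" = A or G; "N" = A, T, G, or C).
PAM_BY_SPECIES = {
    "S. pyogenes": ["NGG"],
    "S. thermophilus": ["NNAGAA"],
    "S. aureus": ["NNGRRT", "NNGRRN"],  # assumption: "NNGRR(N)" written out as NNGRRN
    "N. meningitidis": ["NNNNGATT"],
    "T. denticola": ["NAAAAC"],
}

# Only the degenerate codes used in the text are expanded here.
_CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a degenerate PAM string into a regular expression."""
    return re.compile("".join(_CODES[base] for base in pam))


def has_pam(seq: str, target_end: int, species: str) -> bool:
    """True if a PAM for `species` begins at index `target_end` of `seq`,
    i.e. immediately 3' of a target sequence ending at `target_end`."""
    return any(pam_regex(p).match(seq, target_end) for p in PAM_BY_SPECIES[species])


print(has_pam("ACGTTGG", 4, "S. pyogenes"))  # the bases after index 4 are "TGG"
```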

The terms "spacer sequence" and "guide sequence" are used interchangeably and refer to a sequence contained in a guide RNA that can bind to a complementary strand of a target sequence. Usually, the spacer sequence is the same sequence as the target sequence (with the exception that T in the target sequence becomes U in the spacer sequence). In an embodiment of the present invention, the spacer sequence may contain one or more base mismatches with the target sequence. When multiple base mismatches are contained, the mismatches may be located adjacent to each other or may be located distant from each other. In a preferred embodiment, the spacer sequence may contain 1 to 5 base mismatches with the target sequence. In a particularly preferred embodiment, the spacer sequence may contain one base mismatch with the target sequence.
In the guide RNA, the spacer sequence is positioned on the 5' side of the crRNA.
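The relationship between target sequence and spacer sequence, and the counting of mismatches, can be sketched as follows. This is a minimal illustration only; the function names `target_to_spacer` and `count_mismatches` are hypothetical, and sequences are assumed to be of equal length and written 5' to 3'.

```python
def target_to_spacer(target_dna: str) -> str:
    """Spacer RNA with the same sequence as the DNA target, with T replaced by U."""
    return target_dna.upper().replace("T", "U")


def count_mismatches(spacer_rna: str, target_dna: str) -> int:
    """Number of positions at which the spacer differs from the perfect-match spacer."""
    if len(spacer_rna) != len(target_dna):
        raise ValueError("spacer and target must have the same length")
    expected = target_to_spacer(target_dna)
    return sum(a != b for a, b in zip(spacer_rna.upper(), expected))


target = "GATTACAGATTACAGATTAC"            # made-up 20-base target sequence
perfect = target_to_spacer(target)
print(perfect, count_mismatches(perfect, target))
```

A spacer with 1 to 5 mismatches, as allowed in the preferred embodiment, would return a value of 1 to 5 from `count_mismatches`.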

The term "operably linked", when used with respect to a polynucleotide, means that a first base sequence is positioned sufficiently close to a second base sequence that the first base sequence can affect the second base sequence or a region under the control of the second base sequence. For example, "a polynucleotide is operably linked to a promoter" means that the polynucleotide is linked such that it is expressed under the control of the promoter.

The term "expressible state" refers to a state in which a polynucleotide can be transcribed in a cell into which it has been introduced.
The term "expression vector" refers to a vector containing a target polynucleotide and equipped with a system that allows the target polynucleotide to be expressed in a cell into which the vector is introduced. For example, "Cas protein expression vector" refers to a vector that can express Cas protein in a cell into which the vector is introduced. Also, for example, "guide RNA expression vector" refers to a vector that can express guide RNA in a cell into which the vector is introduced.

In this specification, sequence identity (or homology) between base sequences or amino acid sequences is determined by aligning the two sequences, with gaps placed at insertion and deletion sites, so that the number of matching bases or amino acids is maximized, and then calculating the ratio of matching bases or amino acids to the entire base sequence or amino acid sequence excluding gaps in the resulting alignment. Sequence identity between base sequences or amino acid sequences can be determined using various homology search software known in the art. The method for obtaining the sequence identity value of base sequences is not particularly limited; for example, it can be obtained by a BLAT search provided in the UCSC Genome Browser, a known homology search tool.
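The ratio described above can be computed from a finished alignment as in the following minimal sketch. It takes two already-aligned strings as input (the alignment itself is assumed to come from separate software) and reads "excluding gaps" as dropping every alignment column in which either sequence has a gap, which is one possible interpretation. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Made-up alignment: one gap column and one mismatch among eight gap-free columns.
print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```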

In this specification, for convenience, positions in the human genome are given as positions in the hg38 genome sequence, which is used as the reference genome. hg38 is a reference genome released by the University of California, Santa Cruz (UCSC) in December 2013. The reference genome was created by combining various genomes, and it does not mean that a human having exactly this genome exists. However, by comparing the fragmentary sequence information decoded from the genomic DNA of a human individual with the reference genome, the decoded fragments can be linked on a computer into a continuous sequence, and the sequence of the genomic DNA of that individual can be estimated. In this way, the genomic DNA of an individual is usually decoded by matching its sequence to the reference genome. A position or region corresponding to a specific position or specific region of the hg38 genome sequence means the position or region, in the genome of another individual having a different specific sequence, that is linked to that specific position or region. Specifically, a position or region whose sequence is, on the basis of sequence identity, characteristic of that position or region corresponds to the specific position or region of the hg38 genome sequence. The corresponding position can be determined by aligning partial sequences of the two genomic DNAs. Even if the specific sequences differ, the correspondence between the two genomic DNAs can be determined by alignment as long as they are orthologous or share sequence identity. In a region rich in paralogs generated by gene duplication, determining the correspondence from individual sequences alone may not be sufficient to establish the true correspondence between the two genomes, which makes it harder to decipher the sequence of a region where similar sequences have accumulated. When determining the corresponding sequence, the correspondence between the two genomes can be clarified by requiring high sequence identity. In addition, when the specific region is a large region containing multiple genes, synteny can be taken into consideration. Synteny refers to the conservation of the physical positional relationship of orthologs on the genome. Synteny can exist between individuals and between species. Therefore, the specific region can be determined by taking synteny into consideration.

[Library creation method]
The method for preparing a library may include preparing cells. The prepared cells are the pre-modification cells of the present disclosure and may be referred to as "reference cells" because they can serve as a reference for comparison with the modified cells. The cells are preferably cloned cells, established cells, or immortalized cells. In a preferred embodiment, the cells may include a single type of cell. The present disclosure provides a library of cells in which only a specific region of a specific locus is replaced with a target sequence. Because unifying all sequences other than the target sequence makes the technical significance of the replacement clear, it is preferable that the cells consist of a single type of cell. The single type of cell is a cloned cell.

A schematic overview of the first preparation mode is shown in FIG.
The method for preparing the library may include obtaining a first intermediate cell by using the following genome modification method for the cell. The first intermediate cell is a cell having a genome including a first allele and a second allele at a locus to be modified, and the first allele and the second allele each have a cassette including a selection marker gene and a target nucleic acid sequence. In the first intermediate cell, the selection marker gene of the first allele and the selection marker gene of the second allele are distinguishably different, the target nucleic acid sequence is a target of the genome modification system, and is designed so that the first allele and the second allele can be distinguishably cleaved by the genome modification system, and each selection marker gene is a negative selection marker gene that can be used for negative selection. The negative selection marker gene may also be used for positive selection (e.g., a visualization marker gene, etc.). The method for preparing the library may include obtaining a library of modified cells (sometimes referred to as a "library of first modified cells" or simply "first library") from the first intermediate cell. The modified cells contained in the first library may be referred to as "first modified cells".
The first intermediate cell can be obtained by a person skilled in the art. Although not particularly limited, it can be easily prepared by using the genome modification method described below. Obtaining a modified cell from the intermediate cell can be achieved by cleaving the first cassette or its vicinity in the presence of donor DNA for introducing a modified base sequence.

A schematic overview of the second construction mode is shown in FIG.
The method for producing a library may include obtaining a first intermediate cell using the following genome modification method for a cell. The method for producing a library may include obtaining a second intermediate cell from the first intermediate cell. The method for producing a library may include obtaining a library from the second intermediate cell. Here, the first intermediate cell is as described above. The second intermediate cell is a cell having a genome including a first allele and a second allele at a locus to be modified, the first allele having a cassette including a selection marker gene and a target nucleic acid sequence, and the second allele not including the cassette. The second intermediate cell can be produced by removing the cassette from the second allele of the first intermediate cell. Removal of the cassette can be appropriately performed by a person skilled in the art using a genome modification method. Specifically, a genome modification system capable of cleaving a target sequence contained in a second allele in the presence of a donor DNA having an upstream homology arm capable of homologous recombination with the upstream of the cassette and a downstream homology arm capable of homologous recombination with the downstream of the cassette upon cleavage can be applied to the first intermediate cell, thereby obtaining a second intermediate cell from the first intermediate cell. The method for producing a library can include obtaining a library of modified cells (sometimes referred to as a "library of second modified cells" or simply a "second library") from the second intermediate cell. The modified cells contained in the second library can be sometimes referred to as "second modified cells". Obtaining the second intermediate cell from the first intermediate cell can be achieved by cleaving a sequence in or near the second cassette in the presence of a donor DNA for removing the cassette. 
Obtaining the modified cell from the second intermediate cell can be achieved by cleaving the first cassette in the presence of a donor DNA for introducing a modified base sequence.
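
The sequence of cell states in the second mode can be sketched as a small data model. This is only an illustrative sketch, not the method of the disclosure: the class names, the function names, and the labels such as "marker_A" and "variant_1" are hypothetical, and each allele is reduced to a label plus an optional selection marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Allele:
    sequence_label: str            # hypothetical label for what occupies the modified site
    marker: Optional[str] = None   # selection marker of the cassette, None if no cassette


@dataclass(frozen=True)
class Cell:
    first: Allele
    second: Allele


def to_second_intermediate(cell: Cell) -> Cell:
    """Remove the cassette from the second allele only."""
    if cell.first.marker is None or cell.second.marker is None:
        raise ValueError("expected a first intermediate cell with a cassette in both alleles")
    return Cell(first=cell.first, second=Allele("no cassette"))


def build_library(cell: Cell, variants: List[str]) -> List[Cell]:
    """Replace the cassette in the first allele with each modified base sequence."""
    if cell.first.marker is None:
        raise ValueError("the first allele must still carry a cassette")
    return [Cell(first=Allele(v), second=cell.second) for v in variants]


# First intermediate cell: both alleles carry a cassette with a distinguishable marker.
first_intermediate = Cell(Allele("cassette", "marker_A"), Allele("cassette", "marker_B"))
second_intermediate = to_second_intermediate(first_intermediate)
library = build_library(second_intermediate, ["variant_1", "variant_2", "variant_3"])
```

In this model, every cell of the library has lost the marker of the first allele, which is the property that negative selection against the cassette marker would rely on.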

Variation of the second production form
In the second production form, the second intermediate cell was produced by removing the cassette from the second allele of the first intermediate cell. In a variation of the second production form, the cassette in the second allele of the first intermediate cell is removed to return the second allele to the sequence before modification, and a third intermediate cell is obtained in which the first allele has a cassette containing a selection marker gene and a target nucleic acid sequence, and the second allele has the sequence before modification. In a variation of the second production form, a library of modified cells is then obtained by applying a donor DNA for library production to the first allele of the third intermediate cell. In this embodiment, the modified cells contained in the library of modified cells contain a modified base sequence in the first allele, and the second allele has the sequence before modification. The operation of returning the second allele of the first intermediate cell to the sequence before modification can be achieved by cutting within or near the second cassette in the presence of a donor DNA consisting of an upstream homology arm capable of homologous recombination with the upstream of the second cassette and a downstream homology arm capable of homologous recombination with the downstream of the cassette.

The present disclosure provides a first intermediate cell, a second intermediate cell, a composition comprising these intermediate cells, a first modified cell, a first library, a second modified cell, and a second library.

In general, in genome editing, in the presence of a donor DNA having an upstream homology arm (e.g., having a complementary sequence) capable of homologous recombination with the upstream of the target region and a downstream homology arm (e.g., having a complementary sequence) capable of homologous recombination with the downstream of the target region, a cut is typically made in the target region, and the target region is replaced with a sequence sandwiched between the upstream and downstream homology arms in the donor DNA. If the sequence sandwiched between the upstream and downstream homology arms does not exist, the target region is deleted (or seamlessly deleted). In genome editing, cuts may be made at multiple locations in the target region. Typically, it is beneficial to make cuts in both the target region adjacent to the upstream homology arm and the target region adjacent to the downstream homology arm.
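
The replacement and seamless-deletion outcomes described above can be illustrated with a toy string model. This is a sketch under simplifying assumptions: the function name `apply_donor` and the short sequences are hypothetical, the homology arms are required to match the genome exactly, and the cleavage step and repair efficiency are not modeled.

```python
def apply_donor(genome: str, upstream_arm: str, downstream_arm: str, payload: str = "") -> str:
    """Replace the region between the two homology arms with `payload`.

    An empty payload models a donor with nothing between the arms,
    which deletes the target region seamlessly.
    """
    up_start = genome.find(upstream_arm)
    if up_start < 0:
        raise ValueError("upstream homology arm not found in genome")
    up_end = up_start + len(upstream_arm)
    down_start = genome.find(downstream_arm, up_end)
    if down_start < 0:
        raise ValueError("downstream homology arm not found after the upstream arm")
    return genome[:up_end] + payload + genome[down_start:]


# Toy genome: upstream flank "AAACCC", target region "GGG", downstream flank "TTTACGT".
genome = "AAACCC" + "GGG" + "TTTACGT"
replaced = apply_donor(genome, "AAACCC", "TTTACGT", payload="TAG")
deleted = apply_donor(genome, "AAACCC", "TTTACGT")
```

Here `replaced` carries "TAG" in place of the target region, and `deleted` joins the two flanks directly.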

In a preferred embodiment, in the first intermediate cell, the target region of each of the first allele and the second allele is replaced with a cassette, and the first allele and the second allele may have a deletion of the target region. In another preferred embodiment, a cassette is inserted into the target region of each of the first allele and the second allele, and the first allele and the second allele may not have a deletion of the target region. In a preferred embodiment, the cassette is inserted into a non-functional target region, and the insertion therefore does not result in a loss of function or deficiency associated with the disruption of the target region.

The cells used in the genome modification method of this embodiment are not particularly limited, and may be cells having a haploid chromosomal genome or a chromosomal genome that is diploid or higher. The cells may be diploid, triploid, or tetraploid or higher. Examples of cells include, but are not limited to, eukaryotic cells. The cells may be plant cells, animal cells, or fungal cells. The animal cells may be, but are not limited to, cells of humans, non-human mammals (e.g., non-human primates such as monkeys, as well as dogs, cats, cows, horses, sheep, goats, llamas, and rodents), birds, reptiles, amphibians, fish, and other vertebrates.

Examples of such cells include pluripotent cells (e.g., pluripotent stem cells such as embryonic stem cells (ES cells) and induced pluripotent stem cells (iPS cells)), hematopoietic stem cells, hematopoietic progenitor cells, bone marrow cells, spleen cells, common myeloid progenitor cells, immune cells (e.g., T cells, B cells, NK cells, NKT cells, macrophages, monocytes, neutrophils, eosinophils, basophils), erythrocytes, megakaryocytes, cardiac cells, cardiomyocytes, cardiac fibroblasts, pancreatic beta cells, corneal cells (e.g., corneal epithelial cells and corneal endothelial cells), epidermal cells, dermal cells, adipocytes, chondrocytes, osteocytes, osteoclasts, osteoblasts, mesenchymal stem cells (e.g., adipose-derived, bone marrow-derived, placenta-derived, and umbilical cord-derived), dental pulp cells, tendon cells, ligament cells, nerve cells (e.g., cone cells, astrocytes, and granule cells), glial cells, Purkinje cells, retinal ganglion cells, retinal cells, optic nerve cells, and neural stem cells. In some preferred embodiments, the cells can be primary cells. In some preferred embodiments, the cells can be immortalized cells or cell lines. In some preferred embodiments, the cells are human cells.

In some embodiments, the cell may be an isolated cell, a cloned cell, or a cell line. The cell may be an immortalized cell. In some preferred embodiments, the cell is a cloned cell. In some preferred embodiments, the cell is a cell line. In some preferred embodiments, the cell is an immortalized cell. In some embodiments, the cell is a primary somatic cell. It will be understood by those skilled in the art that the cell is appropriately selected depending on the intended use.

In one embodiment, the cells (or library) can be frozen in a cell cryoprotectant. In one embodiment, the cell cryoprotectant containing the cells can be provided in a non-frozen state or, preferably, in a frozen state. The cell cryoprotectant containing the cells in a frozen state (also called a "frozen stock") can be used as a research cell bank (RCB), master cell bank (MCB), or working cell bank (WCB). Thus, the present invention provides a research cell bank (RCB), master cell bank (MCB), or working cell bank (WCB) that includes the above-mentioned frozen stock.

[Genome modification method]
The method described below (see, for example, UKiS; International Publication No. 2021/206054) can be preferably used to produce the above-mentioned cells. This method is preferably used to produce a first intermediate cell from a cell, particularly in step S1 shown in FIG.

In one embodiment, in the present invention, a method including the following (a) and (b) can be used to prepare the first intermediate cell:
(a) introducing the following (i) and (ii) into a cell containing two or more alleles to introduce a selection marker gene into each of the two or more alleles;
(i) a sequence-specific nucleic acid cleaving molecule capable of targeting and cleaving a target region in two or more alleles of the chromosomal genome, or a genome modification system comprising a polynucleotide encoding the sequence-specific nucleic acid cleaving molecule;
(ii) Two or more types of donor DNA for selection markers, each of which has an upstream homology arm having a base sequence capable of homologous recombination with a base sequence on the upstream side of the target region and a downstream homology arm having a base sequence capable of homologous recombination with a base sequence on the downstream side of the target region, and contains a base sequence of a selection marker gene between the upstream homology arm and the downstream homology arm, wherein the two or more donor DNAs for selection markers each have a selection marker gene that is distinguishable from each other, the selection marker gene is unique for each type of donor DNA for selection markers, and the number of types of donor DNA for selection markers is equal to or greater than the number of alleles to be subjected to genome modification;
(b) after step (a), selecting cells that express all of the introduced selection marker genes, the two or more alleles each having undergone homologous recombination with a different type of donor DNA for selection markers so that the selection marker genes introduced into the alleles are distinguishably different from one another (a step for positive selection).

(Step (a))
In step (a), (i) and (ii) are introduced into a cell containing the chromosome.

<(i) Genome modification system>
"Genome modification system" refers to a molecular mechanism capable of modifying a desired target region. The genome modification system includes a sequence-specific nucleic acid cleavage molecule that targets a target region of a chromosomal genome, or a polynucleotide that encodes the sequence-specific nucleic acid cleavage molecule. More specifically, the genome modification system can cleave at least one site, preferably two sites, in or near the target region.

The target region to be subjected to genome modification can be any region on the genome having one or more alleles. The size of the target region is not particularly limited. The genome modification method of this embodiment can modify a larger region than conventional methods. The target region may be, for example, 10 kbp or more. The target region may be, for example, 100 bp or more, 200 bp or more, 400 bp or more, 800 bp or more, 1 kbp or more, 2 kbp or more, 3 kbp or more, 4 kbp or more, 5 kbp or more, 8 kbp or more, 10 kbp or more, 20 kbp or more, 40 kbp or more, 80 kbp or more, 100 kbp or more, 200 kbp or more, 300 kbp or more, 400 kbp or more, 500 kbp or more, 600 kbp or more, 700 kbp or more, 800 kbp or more, 900 kbp or more, or 1 Mbp or more, or any of the above values or less. In one embodiment, the target region is deleted in the modified cell.

The sequence-specific nucleic acid cleaving molecule is not particularly limited as long as it has sequence-specific nucleic acid cleaving activity, and may be a synthetic organic compound or a biopolymer compound such as a protein. An example of a protein having sequence-specific site cleavage activity is a sequence-specific endonuclease.

A sequence-specific endonuclease is an enzyme that can cleave nucleic acids at a specific sequence. A sequence-specific endonuclease can cleave double-stranded DNA at a specific sequence. Examples of sequence-specific endonucleases include, but are not limited to, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and Cas proteins.

ZFNs are artificial nucleases that contain a nucleic acid cleavage domain conjugated to a binding domain that contains a zinc finger array. Examples of cleavage domains include the cleavage domain of the type II restriction enzyme FokI. Zinc finger nucleases capable of cleaving a target sequence can be designed by known methods.

TALENs are artificial nucleases that contain a DNA-binding domain of a transcription activator-like (TAL) effector in addition to a DNA-cleavage domain (e.g., a FokI domain). TALE constructs capable of cleaving a target sequence can be designed by known methods (e.g., Zhang, Feng et al. (2011) Nature Biotechnology 29 (2)).

When a Cas protein is used as the sequence-specific nucleic acid cleavage molecule, the genome modification system includes a CRISPR/Cas system. That is, the genome modification system preferably includes a Cas protein and a guide RNA having a base sequence homologous to a base sequence in the target region. The guide RNA may include a sequence homologous to a sequence in the target region (target sequence) as a spacer sequence. The guide RNA may be capable of binding to DNA in the target region, and does not need to have a sequence completely identical to the target sequence. This binding may be formed under physiological conditions in the cell nucleus. The guide RNA may include, for example, 0 to 3 base mismatches with respect to the target sequence. The number of mismatches is preferably 0 to 2 bases, more preferably 0 to 1, and even more preferably no mismatches. The guide RNA may be designed based on a known method. The genome modification system is preferably a CRISPR/Cas system, and preferably includes a Cas protein and a guide RNA. The Cas protein is preferably a Cas9 protein.
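
The mismatch tolerance described above (0 to 3 mismatches, preferably fewer) can be checked with a simple position-by-position comparison. This is a minimal sketch under assumptions: the function names and the 20-nt example sequences are hypothetical, the spacer and the target are assumed to have the same length and orientation, and U in the RNA spacer is treated as equivalent to T in the DNA target.

```python
def count_mismatches(spacer: str, target: str) -> int:
    """Count positions at which a guide RNA spacer differs from the DNA target sequence."""
    s = spacer.upper().replace("U", "T")   # compare RNA and DNA in one alphabet
    t = target.upper().replace("U", "T")
    if len(s) != len(t):
        raise ValueError("spacer and target must have the same length")
    return sum(1 for a, b in zip(s, t) if a != b)


def spacer_is_acceptable(spacer: str, target: str, max_mismatches: int = 3) -> bool:
    """Return True if the spacer has at most `max_mismatches` mismatches with the target."""
    return count_mismatches(spacer, target) <= max_mismatches


target = "GACGTTACCGGATCAAGCTT"          # hypothetical target sequence (DNA)
perfect = "GACGUUACCGGAUCAAGCUU"         # same sequence written as RNA
two_off = "GACGTTACCGGATCAAGCAA"         # differs at the last two positions
many_off = "CTGCTTACCGGATCAAGCAA"        # differs at six positions
```

Setting `max_mismatches` to 2, 1, or 0 corresponds to the progressively more preferred ranges given in the text.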

The sequence-specific endonuclease may be introduced into the cell as a protein, or may be introduced into the cell as a polynucleotide encoding the sequence-specific endonuclease. For example, the mRNA of the sequence-specific endonuclease may be introduced, or an expression vector of the sequence-specific endonuclease may be introduced. In the expression vector, the coding sequence of the sequence-specific endonuclease (sequence-specific endonuclease gene) is functionally linked to a promoter. The promoter is not particularly limited, and for example, various pol II promoters can be used. Examples of pol II promoters include, but are not limited to, the CMV promoter, the EF1 promoter (EF1α promoter), the SV40 promoter, the MSCV promoter, the hTERT promoter, the β-actin promoter, the CAG promoter, and the CBh promoter.

The promoter may be an inducible promoter. An inducible promoter is a promoter that can induce expression of a polynucleotide functionally linked to the promoter only in the presence of an inducer that drives the promoter. Inducible promoters include promoters that induce gene expression by heating, such as heat shock promoters. Inducible promoters also include promoters in which the inducer that drives the promoter is a drug. Such drug-inducible promoters include, for example, cumate operator sequences, lambda operator sequences (e.g., 12×λOp), tetracycline-based inducible promoters, and the like. Tetracycline-based inducible promoters include, for example, promoters that drive gene expression in the presence of tetracycline or a derivative thereof (e.g., doxycycline), or reverse tetracycline-controlled transactivator (rtTA). Tetracycline-based inducible promoters include, for example, the TRE3G promoter.

Any known expression vector can be used without particular restriction. Examples of expression vectors include plasmid vectors and viral vectors. When the sequence-specific endonuclease is a Cas protein, the expression vector may contain a guide RNA coding sequence (guide RNA gene) in addition to the coding sequence of the Cas protein (Cas protein gene). In this case, the guide RNA coding sequence (guide RNA gene) is preferably functionally linked to a pol III promoter. Examples of pol III promoters include mouse and human U6-snRNA promoters, human H1-RNase P RNA promoters, and human valine-tRNA promoters.

<(ii) Donor DNA for selection marker>
The donor DNA for a selection marker is a donor DNA for knocking in a selection marker into a target region. The donor DNA for a selection marker contains the base sequence of one or more selection marker genes between an upstream homology arm having a base sequence homologous to a base sequence adjacent to the upstream side of the target region and a downstream homology arm having a base sequence homologous to a base sequence adjacent to the downstream side of the target region.

The donor DNA for the selection marker may have a length of, but is not limited to, 1 kb or more, 2 kb or more, 3 kb or more, 4 kb or more, 5 kb or more, 6 kb or more, 7 kb or more, 8 kb or more, 9 kb or more, 9.5 kb or more, or 10 kb or more. The donor DNA for the selection marker may have a length of, but is not limited to, 50 kb or less, 45 kb or less, 40 kb or less, 35 kb or less, 30 kb or less, 25 kb or less, 20 kb or less, 15 kb or less, 14 kb or less, 13 kb or less, 12 kb or less, 11 kb or less, 10 kb or less, 9 kb or less, 8 kb or less, 7 kb or less, 6 kb or less, 5 kb or less, or 4 kb or less.

A "selection marker" refers to a protein that allows cells to be selected based on the presence or absence of its expression. A selection marker gene is a gene that encodes a selection marker. When cells expressing the selection marker are selected from a cell population in which expressing and non-expressing cells are mixed, the selection marker is called a "positive selection marker" or a "selection marker for positive selection". When cells not expressing the selection marker are selected from such a population, the selection marker is called a "negative selection marker" or a "selection marker for negative selection". Selection markers being different from each other means that they can be distinguished from each other (e.g., they are distinguishably different); for example, they can be distinguished at least in physiological properties, such as the drug resistance they confer on cells into which they are introduced, or in other physicochemical properties. In other words, it means that each selection marker can be detected in a manner distinguishable from the other selection markers, or that cells can be selected with a drug in a manner distinguishable from the other selection markers. Furthermore, the selection marker gene being unique to each type of donor DNA for selection markers means that the selection marker gene carried by one type of donor DNA for selection markers is not contained in the other types of donor DNA for selection markers or, when it is contained in multiple types of donor DNA, is configured so that it is not expressed from two or more types of donor DNA at the same time. In this case, the two or more types of donor DNA may be identical except for the selection marker, or may differ in sequence and/or structure other than the selection marker.

The positive selection marker is not particularly limited as long as it allows the selection of cells expressing it. Examples of positive selection marker genes include drug resistance genes, fluorescent protein genes, luminescent enzyme genes, and chromogenic enzyme genes.

The negative selection marker is not particularly limited as long as it is capable of selecting cells that do not express it. Examples of negative selection marker genes include suicide genes (such as thymidine kinase), fluorescent protein genes, luminescent enzyme genes, and chromogenic enzyme genes. When the negative selection marker gene is a gene that has a negative effect on cell survival (such as a suicide gene), the negative selection marker gene can be functionally linked to an inducible promoter. By functionally linking the negative selection marker gene to an inducible promoter, the negative selection marker gene can be expressed only when it is desired to remove cells that have the negative selection marker gene. When the negative selection marker gene has little negative effect on cell survival, such as when it is an optically detectable marker gene (visible marker gene) that is fluorescent, luminescent, or chromogenic, it may be expressed constitutively.

Examples of drug resistance genes include, but are not limited to, a puromycin resistance gene, a blasticidin resistance gene, a geneticin resistance gene, a neomycin resistance gene, a tetracycline resistance gene, a kanamycin resistance gene, a zeocin resistance gene, a hygromycin resistance gene, and a chloramphenicol resistance gene.
Examples of fluorescent protein genes include, but are not limited to, green fluorescent protein (GFP) gene, yellow fluorescent protein (YFP) gene, red fluorescent protein (RFP) gene, and the like.
Examples of the luminescent enzyme gene include, but are not limited to, the luciferase gene.
Examples of chromogenic enzyme genes include, but are not limited to, β-galactosidase gene, β-glucuronidase gene, alkaline phosphatase gene, and the like.
Examples of suicide genes include, but are not limited to, herpes simplex virus thymidine kinase (HSV-TK), inducible caspase 9, and the like.

The selection marker gene contained in the selection marker donor DNA is preferably a positive selection marker gene. In other words, cells expressing the selection marker can be selected as cells in which the selection marker gene has been knocked in.

The upstream homology arm has a base sequence capable of homologous recombination with a base sequence upstream of the target region in the genome to be modified, for example, a base sequence homologous to a base sequence adjacent to the upstream side of the target sequence. The downstream homology arm has a base sequence capable of homologous recombination with a base sequence downstream of the target region in the genome to be modified, for example, a base sequence homologous to a base sequence adjacent to the downstream side of the target sequence. The length and sequence of the upstream homology arm and the downstream homology arm are not particularly limited as long as they are capable of homologous recombination with the surrounding region of the target region. The upstream homology arm and the downstream homology arm do not necessarily have to completely match the upstream or downstream sequence of the target region as long as they can perform homologous recombination. For example, the upstream homology arm can be a sequence having 90% or more sequence identity (homology) with the base sequence adjacent to the upstream side of the target region, and it is preferable that the sequence identity is 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more. For example, the downstream homology arm can be a sequence having 90% or more sequence identity (homology) with the base sequence adjacent to the downstream side of the target region, and preferably has 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more sequence identity. In addition, if at least one of the upstream homology arm and the downstream homology arm is closer to the cleavage site in the target region, the efficiency of allele modification can be further increased. 
Here, "close" can mean that the distance between the two sequences is 100 bp or less, 50 bp or less, 40 bp or less, 30 bp or less, 20 bp or less, or 10 bp or less.
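
The sequence identity thresholds for the homology arms can be illustrated with an ungapped comparison. This is a sketch under assumptions: the function names and the 20-nt sequences are hypothetical, the arm and the genomic flank are taken to be already aligned and of equal length, and real identity calculations typically use gapped alignment.

```python
def percent_identity(arm: str, flank: str) -> float:
    """Ungapped percent identity between a homology arm and the genomic flank."""
    if not arm or len(arm) != len(flank):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(arm.upper(), flank.upper()) if a == b)
    return 100.0 * matches / len(arm)


def arm_meets_threshold(arm: str, flank: str, min_identity: float = 90.0) -> bool:
    """Return True if the arm reaches the chosen identity threshold (90% by default)."""
    return percent_identity(arm, flank) >= min_identity


flank = "ACGTACGTACACGTACGTAC"        # hypothetical sequence adjacent to the target region
arm_one_off = "TCGTACGTACACGTACGTAC"  # one substitution out of 20 positions
arm_three_off = "TGCTACGTACACGTACGTAC"  # three substitutions out of 20 positions
```

With these inputs, the arm with one substitution passes the 90% threshold and the arm with three substitutions does not; raising `min_identity` applies the stricter preferred values.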

In the donor DNA for selection markers, the selection marker gene is located between the upstream homology arm and the downstream homology arm. As a result, when the donor DNA for selection markers is introduced into a cell together with the genome modification system described in (i) above, the selection marker gene is introduced into the target region by HDR. When a gene is disrupted by this introduction, it is called gene knockout, and when a desired gene is introduced, it is called gene knock-in; a gene can be knocked out while another gene is knocked in.

The selection marker gene is preferably functionally linked to a promoter so that it is expressed under the control of an appropriate promoter. The promoter can be appropriately selected depending on the type of cell into which the donor DNA is introduced. Examples of promoters include SRα promoter, SV40 early promoter, retroviral LTR, CMV (cytomegalovirus) promoter, RSV (Rous sarcoma virus) promoter, HSV-TK (herpes simplex virus thymidine kinase) promoter, EF1α promoter, metallothionein promoter, and heat shock promoter. The donor DNA for the selection marker may have any control sequence such as an enhancer, a polyA addition signal, or a terminator.

The donor DNA for the selection marker may have an insulator sequence. An "insulator" refers to a sequence that blocks or alleviates the influence of the adjacent chromosomal environment and ensures or enhances the independence of the transcriptional regulation of the DNA sandwiched between the regions. An insulator is defined by its enhancer blocking effect (the effect of blocking the effect of the enhancer on promoter activity by inserting it between an enhancer and a promoter) and its suppression effect on position effect (the effect of preventing the expression of the introduced gene from being affected by the position on the genome where it is inserted by sandwiching both sides of the introduced gene with insulators). The donor DNA for the selection marker may have an insulator sequence between the upstream arm and the selection marker gene (or between the upstream arm and the promoter that controls the selection marker gene). The donor DNA for the selection marker may have an insulator sequence between the downstream arm and the selection marker gene.

The donor DNA for the selection marker may be linear or circular, but is preferably circular. Preferably, the donor DNA for the selection marker is a plasmid. The donor DNA for the selection marker may contain any sequence in addition to the above sequences. For example, it may contain a spacer sequence in all or part of the sequences between the upstream homology arm, the insulator, the selection marker gene, and the downstream homology arm.

In step (a), donor DNA for selection markers is introduced into cells in a number equal to or greater than the number of alleles to be modified in the genome. Different types of donor DNA for selection markers have mutually different (distinguishable) types of selection marker genes. In one embodiment, different types of donor DNA for selection markers do not have completely identical selection marker genes or sets thereof. That is, a first type of donor DNA for selection markers has a first type of selection marker gene, a second type of donor DNA for selection markers has a second type of selection marker gene, a third type of donor DNA for selection markers has a third type of selection marker gene, and so on for subsequent types of donor DNA for selection markers. When there are two alleles to be modified in the genome, there are two or more types of donor DNA for selection markers. When there are three alleles to be modified in the genome, there are three or more types of donor DNA for selection markers. In one aspect, one donor DNA for a selection marker may have two or more different (distinguishable) selection markers (even in this case, the different types of donor DNA for a selection marker must have different (distinguishable) types (e.g., unique) of selection marker genes). In one aspect, the donor DNA for a selection marker does not have a recombination sequence of a site-specific recombinase (e.g., a loxP sequence recombined by Cre recombinase and its variants). In another aspect, the method of the present invention does not use a site-specific recombinase and its recombination sequence (e.g., a loxP sequence recombined by Cre recombinase and its variants). When a site-specific recombinase is used, one recombination sequence of the site-specific recombinase usually remains in the edited genome. 
In contrast, in one aspect, the modified genome of the cell obtained by the method of the present invention does not have a recombination sequence (which is foreign) of a site-specific recombinase.

The number of types of donor DNA for selection markers may be equal to or greater than the number of alleles to be targeted for genome modification, with no particular upper limit. By using donor DNA for selection markers of a number equal to or greater than the number of alleles to be targeted for genome modification, two or more alleles can be stably modified. From the viewpoint of the selection operation in step (b) described below, the number of types of donor DNA for selection markers is preferably equal to the number of alleles to be targeted for genome modification or about 1 to 2 more, and more preferably equal to the number of alleles to be targeted for genome modification.
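
The two conditions on the donor set (at least as many donor types as target alleles, and a distinguishable marker for each type) can be written as a simple check. This is a sketch under assumptions: the class and function names are hypothetical, and the check simplifies the text by treating any shared marker as a problem, whereas the description also allows a shared marker that is configured not to be expressed from two donor types at once.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MarkerDonor:
    name: str     # hypothetical donor label
    marker: str   # selection marker gene carried between the homology arms


def validate_donor_set(donors: List[MarkerDonor], n_alleles: int) -> List[str]:
    """Return a list of problems; an empty list means the set passes both checks."""
    problems = []
    if len(donors) < n_alleles:
        problems.append(
            f"{len(donors)} donor type(s) for {n_alleles} alleles: need at least {n_alleles}"
        )
    markers = [d.marker for d in donors]
    if len(set(markers)) != len(markers):
        problems.append("selection markers are not unique across donor types")
    return problems


ok_set = [MarkerDonor("donor_1", "puromycin_resistance"),
          MarkerDonor("donor_2", "blasticidin_resistance")]
too_few = ok_set[:1]
duplicated = [MarkerDonor("donor_1", "puromycin_resistance"),
              MarkerDonor("donor_2", "puromycin_resistance")]
```

For a diploid target, `validate_donor_set(ok_set, 2)` returns no problems, while the other two sets each report one.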

The method of introducing (i) and (ii) into cells is not particularly limited, and known methods can be used without particular limitation. Examples of methods of introducing (i) and (ii) into cells include, but are not limited to, viral infection, lipofection, microinjection, calcium phosphate, DEAE-dextran, electroporation, and particle gun. By introducing (i) and (ii) into cells, the DNA in the target region is cleaved by the sequence-specific nucleic acid cleavage molecule (i) above, and then the selection marker in the donor DNA for selection marker (ii) is knocked into the target region by HDR. At this time, two or more types of donor DNA for selection marker can be knocked into two or more alleles of the target region randomly when the upstream homology arm and downstream homology arm of each are identical. However, two or more types of donor DNA for selection markers can modify each of the two or more alleles as long as the donor DNA for selection markers has a base sequence of a homology arm that can undergo homologous recombination with the upstream and downstream sequences of the target region of each of the two or more alleles, and therefore does not need to have completely identical base sequences of the homology arms. In one embodiment, the donor DNA for selection markers may have base sequences of the upstream and downstream homology arms that are more identical to the upstream and downstream sequences of the target region of each allele (e.g., may be optimized in this way).

In one embodiment, the donor DNA for the selection marker has an upstream homology arm and a downstream homology arm, and has a selection marker gene between the upstream homology arm and the downstream homology arm, and preferably may further have a target sequence for an endonuclease (a base sequence-specific nucleic acid cleavage molecule) such as a cleavage site for a meganuclease. In this embodiment, in a preferred embodiment, the selection marker includes a selection marker gene for positive selection and a marker gene for negative selection. In another preferred embodiment, the selection marker includes a selection marker for positive selection, but may not include a negative selection marker gene separately from this. In a preferred embodiment, the selection marker gene for positive selection can also be used for negative selection, and such a marker gene may include a visualization marker gene.
A set of two or more donor DNAs for selection markers is a combination of the above donor DNAs for selection markers, each of which has a selection marker gene for positive selection that can be distinguished from the others. Each donor DNA in the set may further have a target sequence for an endonuclease (a base sequence-specific nucleic acid cleavage molecule), such as a cleavage site for a meganuclease; the target sequences may differ from each other, but are preferably the same (or cleavable by the same base sequence-specific nucleic acid cleavage molecule). The length of the donor DNA for selection markers is as described above, but may be, for example, 5 kbp or more, 8 kbp or more, or 10 kbp or more.

(Step (b))
After step (a), step (b) is performed. In step (b), cells in which mutually distinguishable selection marker genes, or a combination thereof, have been introduced into two or more alleles are selected based on the expression of those distinguishable selection marker genes. More specifically, in step (b), cells are selected that express all of the distinguishable selection marker genes introduced into the two or more alleles by homologous recombination of the different types of donor DNA for selection markers with those alleles. In one aspect, in step (b), cells in which the different donor DNAs for selection markers have been introduced and each allele has been modified are selected based on the expression of all the selection marker genes that are contained in the two or more donor DNAs for selection markers and integrated into the chromosomal genome. In one aspect, in step (b), cells are selected based on all the selection marker genes contained in the two or more donor DNAs for selection markers. In one aspect, in step (b), cells in which each allele has been modified by the introduction of a distinguishable donor DNA for selection marker are selected based on the expression of all the selection marker genes (marker genes for positive selection) that are contained in the two or more types of donor DNA for selection marker and that have been integrated into the chromosomal genome. In one aspect, the cells obtained in step (b) have different marker genes for positive selection in each allele. In one aspect, the cells obtained in step (b) have a common marker gene for positive selection in each allele. Here, in one aspect, single-cell cloning is not performed in step (b) (although single-cell cloning may or may not be performed after cells in which two or more alleles have been modified have been selected in step (b)).
In one aspect, in step (b), cells are selected based on the expression of multiple distinguishable marker genes for positive selection introduced into the respective alleles. In one aspect, step (b) is not performed by a method that estimates the number of modified alleles from the expression intensity of a single selection marker gene (e.g., the expression intensity or fluorescence intensity of a fluorescent protein). When cells are selected by such a method, the gene expression level varies from cell to cell, which makes it difficult to completely separate cells in which two or more alleles are modified from cells in which only one allele is modified; such a method therefore requires single-cell cloning in step (b).

In step (b), cells may be selected as appropriate depending on the type of selection marker gene used in step (a). In this case, cells are selected based on the expression of all of the selection marker genes used in step (a).

For example, when the selection marker gene is a positive selection marker gene, cells expressing all of the selection marker genes that have been incorporated into the chromosomal genome to be modified can be selected; for example, cells expressing the same number of positive selection markers as the number of alleles to be modified can be selected. When the positive selection marker gene is a drug resistance gene, cells expressing the positive selection marker can be selected by culturing the cells in a medium containing the drug. When the positive selection marker gene is a fluorescent protein gene, a luminescent enzyme gene, or a chromogenic enzyme gene, cells expressing the positive selection marker can be selected by selecting cells that exhibit fluorescence, luminescence, or color due to the fluorescent protein, luminescent enzyme, or chromogenic enzyme. In this process, when the same number of types of donor DNA for selection markers as the number of alleles to be modified are incorporated into the genome, that number of alleles has been modified. In an n-ploid cell, the number of alleles to be modified is n or less, and when n or more types of donor DNA for selection markers are incorporated into the genome, at least the alleles to be modified (two or more alleles) have been modified. In one embodiment, the number of alleles to be modified is n, the corresponding number of types of donor DNA for selection markers are incorporated into the chromosomal genome, and all alleles are modified. In one embodiment, because the number of types of donor DNA for selection markers used in this step is equal to or greater than the number of alleles to be modified, the number of positive selection markers expressed by a cell indicates that the corresponding number of alleles has been reliably modified.
From the viewpoint of increasing the efficiency of cell selection in step (b), it is preferable that the number of alleles to be modified is the same as the number of types of donor DNA for selection markers.
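The selection rule of step (b) can be illustrated with a minimal Python sketch. It assumes that each cell is summarized by the set of positive selection markers it expresses. The function names and the marker labels `puroR`, `blastR`, and `hygR` are illustrative placeholders and are not taken from the present disclosure.

```python
from typing import Iterable


def distinct_markers_expressed(expressed: Iterable[str], donor_markers: Iterable[str]) -> int:
    """Count how many distinct donor-derived positive markers a cell expresses."""
    return len(set(expressed) & set(donor_markers))


def passes_step_b(expressed: Iterable[str], donor_markers: Iterable[str], n_target_alleles: int) -> bool:
    """Keep a cell only if it expresses as many distinct donor markers as there are target alleles.

    Assumption (hypothetical model): one distinguishable marker per modified allele.
    """
    donor_set = set(donor_markers)
    if len(donor_set) < n_target_alleles:
        raise ValueError("need at least as many donor types as target alleles")
    return distinct_markers_expressed(expressed, donor_set) >= n_target_alleles


# Diploid example: two donor types, two alleles to modify.
print(passes_step_b({"puroR", "blastR"}, ["puroR", "blastR"], 2))
print(passes_step_b({"puroR"}, ["puroR", "blastR"], 2))
```

When the number of donor types equals the number of target alleles, passing this check is the same as expressing every marker that was used, which is the preferred case described above.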

As described above, in the genome modification method of this embodiment, by inducing HDR using n types of donor DNA for selection markers to modify n alleles in an n-ploid cell, it is possible to efficiently obtain cells in which all alleles possessed by the cell have been modified. Furthermore, because it is possible to reliably obtain cells in which all alleles have been modified, it is possible to efficiently obtain cells in which the target region has been modified even if the target region is large in size (e.g., 10 kbp or more). This makes large-scale genome modification possible.

In a preferred embodiment, the donor DNA for selection marker may contain a negative selection marker gene in addition to the positive selection marker gene between the upstream homology arm and the downstream homology arm. In a preferred embodiment, in the donor DNA for selection marker, the positive selection marker gene may be a marker gene that can also be used for negative selection (a marker gene that can be used for both positive and negative selection). In a preferred embodiment, the positive selection marker gene may be a drug resistance gene. In a preferred embodiment, the positive selection marker gene may be a visualization marker gene that can also be used for negative selection, etc.

The donor DNA for the selection marker contains a negative selection marker gene in addition to a positive selection marker gene between the upstream homology arm and the downstream homology arm, and may further contain a target nucleic acid sequence. The target nucleic acid sequence is a sequence that can be cleaved by the above-mentioned genome modification system. The target nucleic acid sequence is preferably an allele-specific sequence, which makes it possible to cleave only one of the cassette of the first allele (the first cassette) and the cassette of the second allele (the second cassette). Inducing cleavage in an allele-specific manner in this way enables selective editing of only one allele. In a preferred embodiment, the donor DNA for the selection marker contains one target nucleic acid sequence between the upstream homology arm and the downstream homology arm. In a preferred embodiment, the donor DNA for the selection marker contains a first target nucleic acid sequence and a second target nucleic acid sequence between the upstream homology arm and the downstream homology arm, and contains a selection marker gene between the first target nucleic acid sequence and the second target nucleic acid sequence. Another donor DNA for a selection marker contains a third target nucleic acid sequence and a fourth target nucleic acid sequence between the upstream homology arm and the downstream homology arm, and contains a selection marker gene between the third target nucleic acid sequence and the fourth target nucleic acid sequence. The first target nucleic acid sequence and the second target nucleic acid sequence may be the same or different, and the third target nucleic acid sequence and the fourth target nucleic acid sequence may be the same or different.
For example, if the third target nucleic acid sequence and the fourth target nucleic acid sequence are designed not to be cleaved when the first target nucleic acid sequence and the second target nucleic acid sequence are cleaved, and/or the first target nucleic acid sequence and the second target nucleic acid sequence are designed not to be cleaved when the third target nucleic acid sequence and the fourth target nucleic acid sequence are cleaved, only one of the first cassette and the second cassette can be specifically cleaved, and one of the cassettes can be selectively edited. It will be apparent that when the first cassette and the second cassette are edited simultaneously, the first to fourth target nucleic acid sequences may be the same.
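The design rule for the first to fourth target nucleic acid sequences can be sketched in Python as follows. This is a simplified model under stated assumptions: a cassette is cleaved only when the site recognized by the nuclease exactly equals one of its target sequences, and mismatch tolerance of real nucleases is not modeled. The labels `SITE_1` to `SITE_4` and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def cleaved_cassettes(recognized_site: str, cassettes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the cassettes whose target sequences include the recognized site."""
    return [name for name, sites in cassettes.items() if recognized_site in sites]


# Allele-specific design: the two cassettes carry different target sequences.
allele_specific = {"first": ("SITE_1", "SITE_2"), "second": ("SITE_3", "SITE_4")}

# Shared design: all four target sequences are the same, for simultaneous editing.
shared = {"first": ("SITE_1", "SITE_1"), "second": ("SITE_1", "SITE_1")}

print(cleaved_cassettes("SITE_1", allele_specific))  # only the first cassette is cut
print(cleaved_cassettes("SITE_1", shared))           # both cassettes are cut
```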

In one embodiment, in step (b), modified cells can be selected from a pool containing cells obtained by step (a) without cloning the cells. By eliminating the cloning step, the time required for the process can be reduced.

The first intermediate cell can be obtained from the pre-modified cell by the above steps (a) and (b) (see step S1 in FIG. 1). In step (a), the donor DNA for the selection marker contains a positive selection marker gene and a negative selection marker gene, or a marker gene that can be used for both positive and negative selection, between the upstream homology arm and the downstream homology arm for the target sequence. The donor DNA for the selection marker preferably contains two target nucleic acid sequences between the upstream homology arm and the downstream homology arm for the target sequence. The positive selection marker gene and the negative selection marker gene, or the marker gene that can be used for both positive and negative selection, are preferably present between the two target nucleic acid sequences. In this way, the first intermediate cell can be obtained by positive selection.

The first intermediate cell is a cell having a genome including a first allele and a second allele at a locus to be modified, and having a cassette including a selection marker gene and a target nucleic acid sequence in each of the first allele and the second allele. In a preferred embodiment, in the first intermediate cell, the selection marker gene of the first allele and the selection marker gene of the second allele are distinguishably different. In a preferred embodiment, in the first intermediate cell, the target nucleic acid sequence is a target of a genome modification system, and is designed so that the first allele and the second allele can be cleaved by the genome modification system in a distinguishable manner. In a preferred embodiment, in the first intermediate cell, each selection marker gene is a negative selection marker gene that can be used for negative selection. Although a positive selection marker gene is useful when obtaining the first intermediate cell, the positive selection marker gene is not necessary in the process after obtaining the first intermediate cell. Therefore, the positive selection marker gene may be removed. The removal can be performed, for example, by a genome editing technique. In this way, the first intermediate cell does not need to have a positive selection marker.

A second intermediate cell may be obtained from the first intermediate cell (see step S3 in FIG. 1). The second intermediate cell can be prepared by removing the second cassette from the first intermediate cell. The cassette can be removed by specifically cleaving the target nucleic acid sequence inside the second cassette in the presence of a donor DNA (cassette removal donor DNA) that preferably includes an upstream homology arm capable of homologous recombination with the upstream of the second cassette and a downstream homology arm capable of homologous recombination with the downstream of the cassette. By cleaving the target nucleic acid sequence in the presence of a donor DNA that includes an upstream homology arm capable of homologous recombination with the upstream of the cassette and a downstream homology arm capable of homologous recombination with the downstream of the cassette, the upstream and downstream of the cassette can be seamlessly linked when removing the cassette. The second intermediate cell thus obtained has a cassette including a selection marker gene and a target nucleic acid sequence in the first allele, but does not include the second cassette.

The library of the present disclosure can be prepared from the first intermediate cell and the second intermediate cell (see steps S2 and S4 in FIG. 1). The first intermediate cell and the second intermediate cell (collectively referred to as "intermediate cells") each have a cassette containing a selection marker gene and a target nucleic acid sequence in the first allele. In the library preparation process, the first cassette can be removed and a modified base sequence introduced in its place; that is, the first cassette can be replaced with a modified base sequence. This replacement can be performed by a genome modification system. Specifically, the target nucleic acid sequence inside the first cassette can be specifically cleaved in the presence of a donor DNA for introducing a modified base sequence (also referred to as a library preparation donor DNA) that includes an upstream homology arm capable of homologous recombination with the region upstream of the first cassette and a downstream homology arm capable of homologous recombination with the region downstream of the cassette. The library preparation donor DNA has a modified base sequence between the upstream homology arm and the downstream homology arm. In general, the genomic region lying between the site where the upstream homology arm undergoes homologous recombination and the site where the downstream homology arm undergoes homologous recombination is replaced with the sequence lying between the upstream homology arm and the downstream homology arm of the library preparation donor DNA. A library preparation donor DNA carrying, between its upstream and downstream homology arms, the sequence that is to take the place of that region can therefore preferably be used, and the cassette can be replaced with a modified base sequence by the above operation. The library preparation donor DNA can be a group of DNAs containing various modified base sequences.
In this way, the cassette of each intermediate cell can be replaced with various modified base sequences. When the cassette is replaced, the negative selection marker gene in the cassette is removed, so the absence of expression of the negative selection marker gene can be used as an indicator to obtain cells in which the cassette has been replaced with a modified base sequence. In one embodiment, the library preparation donor DNA is linear (not circular), which has the advantage of making the library preparation donor DNA easy to prepare (see, for example, disadvantage 2 in FIG. 4). A person skilled in the art can confirm that the cassette has been replaced with the modified base sequence by well-known conventional techniques, for example, by the presence or absence of cleavage by a restriction enzyme, the presence or absence of PCR amplification (e.g., junction PCR), or sequencing. The modified base sequence may be 0 bases long, i.e., absent, but is preferably 1 base or more long. The length of the modified base sequence is not particularly limited, but may be, for example, 10 to 1 million bases, 10 to 500,000 bases, 10 to 100,000 bases, 10 to 20,000 bases, 10 to 15,000 bases, or 10 to 10,000 bases. The length may also be, for example, 3 bases or more, 10 bases or more, 30 bases or more, 50 bases or more, or 100 bases or more. The modified base sequences of the respective library preparation donor DNAs may be the same or different, and may independently have any of the above lengths.
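The replacement of the region between the two homology arms can be illustrated with a toy Python model in which an allele is a plain string. This is only a sketch: homologous recombination is reduced to exact substring matching, and the sequences, the `[NEG_MARKER]` placeholder, and all function names are hypothetical.

```python
from typing import Iterable, List


def replace_between_arms(allele: str, upstream_arm: str, downstream_arm: str, payload: str) -> str:
    """Replace whatever lies between the two homology arms with the donor payload."""
    up_start = allele.find(upstream_arm)
    if up_start == -1:
        raise ValueError("upstream homology arm not found")
    up_end = up_start + len(upstream_arm)
    down_start = allele.find(downstream_arm, up_end)
    if down_start == -1:
        raise ValueError("downstream homology arm not found")
    return allele[:up_end] + payload + allele[down_start:]


def build_library(allele: str, upstream_arm: str, downstream_arm: str, payloads: Iterable[str]) -> List[str]:
    """Apply one donor payload per library member to the same intermediate allele."""
    return [replace_between_arms(allele, upstream_arm, downstream_arm, p) for p in payloads]


def lost_negative_marker(allele: str, negative_marker: str) -> bool:
    """Cells whose cassette was replaced no longer carry the negative selection marker."""
    return negative_marker not in allele


UP, DOWN, CASSETTE = "ACGTAC", "TTGCAA", "[NEG_MARKER]"
intermediate = "GG" + UP + CASSETTE + DOWN + "CC"

# An empty payload models a modified base sequence that is 0 bases long.
library = build_library(intermediate, UP, DOWN, ["AAA", "AGA", ""])
print(library)
print(all(lost_negative_marker(member, CASSETTE) for member in library))
```

Every member of the resulting list differs only in the payload, while the flanking sequence is unchanged, which mirrors the use of loss of the negative selection marker as the indicator of successful replacement.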

When the intermediate cell has three or more alleles in the target region, it is sufficient that it has a unique negative marker gene that allows at least the first allele to be distinguished. In this way, cassettes other than the first allele can be removed, or an operation can be performed to replace the cassette of the first allele with a modified base sequence while maintaining the cassettes other than the first allele. Therefore, the intermediate cell is a cell in which only the first allele has a unique negative marker gene that allows at least the first allele to be distinguished, and such a cell can be selected and used as the intermediate cell.

The method of the present disclosure does not have to solve any of the disadvantages 1 to 3 shown in FIG. 4, but preferably solves one or more of the disadvantages 1 to 3 shown in FIG. 4. Specifically, the method of the present disclosure has a higher efficiency of recombination of a foreign gene into a target genome than the Gateway™ method and the LoxP/Cre method. In a preferred embodiment of the present disclosure, the donor DNA for introducing a modified base sequence is linear and not circular. In a preferred embodiment of the present disclosure, the modified cell does not have a recognition site for a site-specific recombinase. In a preferred embodiment of the method of the present disclosure, the donor DNA for introducing a modified base sequence is linear and the modified cell does not have a recognition site for a site-specific recombinase. The characteristic of not having a recognition site for a site-specific recombinase is beneficial, for example, when seamlessly linking an introduction cassette and a genome.

The library of the present disclosure includes a combination of a plurality of aqueous compositions, wherein:
each aqueous composition comprises one type of modified cell;
each modified cell has a first allele and a second allele at a locus to be modified (a target locus or a locus of interest); and
each modified cell has, at the same position of the first allele, a cassette containing a DNA fragment that differs between aqueous compositions.
Such a library can be obtained as follows. For example, the target nucleic acid sequence of the first cassette of the intermediate cell is cleaved in the presence of library preparation donor DNAs containing various modified base sequences. The first cassette of the intermediate cell is then replaced with a modified base sequence by the DNA damage repair mechanism of the cell. The cells having the modified base sequences can be subjected to single-cell cloning. Single-cell cloning may include forming a large number of droplets or aqueous compositions each containing one cell, culturing them, and producing, in each droplet or aqueous composition, a cell clone derived from that one cell. In this way, a plurality (or a large number) of aqueous compositions containing cell clones are obtained. A combination of such a plurality (or a large number) of aqueous compositions can be used as a library of modified cells.

In some embodiments of the intermediate or modified cells, the first allele is missing a portion or all of the target region (initially in the genome). In some embodiments of the intermediate or modified cells, the second allele is missing a portion or all of the target region. In some preferred embodiments of the intermediate or modified cells, the first allele and the second allele are missing a portion or all of the target region, more preferably all of the target region.

In one embodiment of the first intermediate cell, the first allele has the entire target region replaced by the first cassette. In one embodiment of the first intermediate cell, the second allele has the entire target region replaced by the second cassette. In one preferred embodiment of the first intermediate cell, the first allele and the second allele have the entire target region replaced by the first cassette and the second cassette, respectively.

In the second intermediate cell, the second cassette is removed. In a preferred embodiment, in the second intermediate cell, the upstream and downstream of the target region of the second allele (initially on the genome) are seamlessly linked. Seamless linking means that the upstream and downstream are linked without the addition of new bases. In the seamless linking of the upstream and downstream, it is preferred that the upstream and downstream are linked without the addition or loss of bases.
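The preferred form of seamless linking, in which no bases are added or lost, can be expressed as a simple string check. The following Python sketch assumes that the original allele and the coordinates of the target region are known; the sequences and the function name are hypothetical.

```python
def is_seamless_deletion(original: str, edited: str, start: int, end: int) -> bool:
    """True if `edited` is `original` with original[start:end] removed and nothing added or lost."""
    if not 0 <= start <= end <= len(original):
        raise ValueError("target region coordinates are out of range")
    return edited == original[:start] + original[end:]


original = "AAACCC" + "TTTTTT" + "GGGAAA"  # the middle block stands for the target region

print(is_seamless_deletion(original, "AAACCCGGGAAA", 6, 12))    # exact junction
print(is_seamless_deletion(original, "AAACCCATGGGAAA", 6, 12))  # extra bases at the junction
print(is_seamless_deletion(original, "AAACCGGGAAA", 6, 12))     # one flanking base lost
```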

In one embodiment of the modified cell, the first allele comprises a modified base sequence, and the modified base sequence has one or more mutations selected from the group consisting of addition, insertion, substitution, and deletion of bases in a target region on the original genome (also called a replaced sequence). The modified cell can be advantageously compared with the original cell (reference cell) to evaluate the effect of the mutation. In addition, since the library contains various cells that differ in their mutations, the function of each mutation site in the target region can advantageously be evaluated by comparing the cells with one another.

In one embodiment of the first modified cell, the second allele comprises a cassette containing a selection marker. The selection marker may comprise a positive selection marker gene. In one embodiment of the second modified cell, the second allele is seamlessly linked upstream and downstream of the replaced sequence. In a preferred embodiment of the modified cell, the replaced sequence of the first allele and the replaced sequence of the second allele have corresponding sequences. Having corresponding sequences means that the start and end points of the sequences are at the same position on the genome. The corresponding sequences typically have a high identity (e.g., 80% or more, 90% or more, or 95% or more) and are the same length.
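For corresponding sequences of the same length, the identity percentages mentioned above can be computed position by position. The following Python sketch assumes an ungapped comparison, which is a simplification; the present disclosure does not specify how identity is calculated, and the function name and sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two sequences of equal, non-zero length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Nine of ten positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```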

In one embodiment, each of the cassette sequences in the modified cells consists of one or more modified portions (A) and one or more unmodified portions (B) (see, for example, FIG. 3). Each of the modified portions (A) has one or more modifications selected from the group consisting of sequence insertion, deletion, and substitution, and the modifications (A) of the one or more modified portions differ between each aqueous composition in terms of the position or content of the modification. Each of the one or more unmodified portions (B) is identical to the sequence of the corresponding site before modification. Here, when the pre-modification sequence replaced by the insertion cassette and the sequence in the insertion cassette are aligned at the same position, the two nucleic acid sequences become the sequences of the corresponding sites. The unmodified portion (B1) on the centromere side of the cassette is seamlessly linked to the adjacent sequence (C1) on the centromere side of the cassette, and the unmodified portion (Bt) on the telomere side of the cassette is seamlessly linked to the adjacent sequence (C2) on the telomere side of the cassette, and the region where the adjacent sequence (C1) and the unmodified portion (B1) are linked, and the region where the unmodified portion (Bt) and the adjacent sequence (C2) are linked may constitute the same sequence as the sequence of the corresponding region before modification. To incorporate such a cassette into DNA, an intermediate cell may be modified using a donor DNA for library production having the structure of the cassette between an upstream homology arm and a downstream homology arm. In a preferred embodiment, the total length of the modified portion (A) may be 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, or 5% or less of the length of the insertion cassette. 
A library containing modified cells of this embodiment may be a library having different mutations at different positions within the same region, and may be preferably used, for example, to investigate which mutation at which position changes cell activity, or to screen for cells having desired properties. The donor DNA for introducing a modified base sequence may have the above-mentioned cassette structure between the upstream homology arm and the downstream homology arm. The donor DNA for introducing a modified base sequence may be included in a library of donor DNA for introducing a modified base sequence having different cassette structures.
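The cassette layout of modified portions (A) and unmodified portions (B) can be sketched in Python by annotating each segment at design time. The `Segment` class, the function names, and the example sequences are hypothetical; a deletion would be represented here by a modified segment of length zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    kind: str       # "A" for a modified portion, "B" for an unmodified portion
    sequence: str


def modified_fraction(segments: List[Segment]) -> float:
    """Total length of the modified portions (A) divided by the cassette length."""
    total = sum(len(s.sequence) for s in segments)
    if total == 0:
        raise ValueError("cassette is empty")
    modified = sum(len(s.sequence) for s in segments if s.kind == "A")
    return modified / total


def has_unmodified_flanks(segments: List[Segment]) -> bool:
    """Check that the cassette starts and ends with unmodified portions (B1 and Bt)."""
    return bool(segments) and segments[0].kind == "B" and segments[-1].kind == "B"


cassette = [Segment("B", "A" * 40), Segment("A", "GGGGG"), Segment("B", "C" * 55)]
print(modified_fraction(cassette))      # 5 modified bases out of 100
print(has_unmodified_flanks(cassette))
```

Keeping unmodified portions at both ends of the cassette is what lets the junctions with the adjacent sequences (C1) and (C2) reproduce the sequence present before modification.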

In some embodiments, the modified base sequence in the modified cell consists entirely of mutations.

In one embodiment, there is no recombination sequence (recognition site) for a site-specific recombinase inside or outside (near) the insertion cassette. In one embodiment, no modification is made outside the insertion cassette, or the base sequence outside the insertion cassette is the same as that of the cell before modification. In this way, modified cells that do not contain modifications other than the desired modification can be obtained, and unanticipated effects of modifications other than the desired modification can be eliminated (see, for example, disadvantage 3 in FIG. 4).

The library is not particularly limited, but preferably has 4 or more types, 5 or more types, 6 or more types, 7 or more types, 8 or more types, 9 or more types, 10 or more types, 11 or more types, 12 or more types, 13 or more types, 14 or more types, 15 or more types, 16 or more types, 17 or more types, 18 or more types, 19 or more types, 20 or more types, 25 or more types, 30 or more types, 35 or more types, 40 or more types, 45 or more types, 50 or more types, 60 or more types, 70 or more types, 80 or more types, or 90 or more types, or more than 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 types, of modified cells (or aqueous compositions containing modified cells).

A library typically includes a combination of multiple aqueous compositions, each of which includes one type of cell. In some embodiments, a library includes the multiple aqueous compositions separately from one another. In some embodiments, depending on the purpose, a library may include the multiple aqueous compositions as a mixture. For example, when screening for cells with high proliferation or viability, a library may include the multiple aqueous compositions as a mixture. Even in such a case, cells with high proliferation or viability become enriched during culture and can be obtained by analyzing the cells after culture.

In one embodiment, the genomic sequences of each of the modified cells contained in a library are designed to be identical, except for the modified base sequence (or DNA fragment). In one embodiment, the genomic sequences of each of the modified cells contained in a library are substantially identical, except for the modified base sequence (or DNA fragment). Being substantially identical allows for the presence of differences due to mutations that may occur between cells after simply subculturing cloned cells 10 times in an environment suitable for cell culture (normal environment).

Because the elements other than the modified base sequence are the same, it is suitable for evaluating the effects of the modified base sequence, for example. By removing the cassette in the second allele, the possibility of an effect due to the cassette can be minimized. By seamlessly removing the cassette in the second allele, the possibility of an effect due to the introduction of additional new bases into the second allele can be minimized.

In another embodiment, the cassette in the second allele of the intermediate cell (the second cassette) may also be replaced with a second modified base sequence. The second modified base sequence may be a sequence common to the modified cells (i.e., the same sequence across the modified cells) or may be a sequence that differs for each aqueous composition. In some cases, the second modified base sequence may differ for each cell even within the same aqueous composition. The second modified base sequence may be the same as or different from the modified base sequence in the first allele (the first modified base sequence). The second modified base sequence may be designed in the same manner as the modified base sequence in the first allele, and may be introduced in the same manner as the modified base sequence is introduced into the first allele. The second cassette may be replaced with the second modified base sequence by specifically cleaving the vicinity of the second cassette (preferably the target nucleic acid sequence in the second cassette) in the presence of a second library preparation donor DNA. Because the content of the second modified base sequence and the method of its introduction are the same as those described for the first modified base sequence, reference is made to that description and a repeated explanation is omitted here.

Application example 1
Application example 1 is an application example to the analysis of a specific region of a genome. According to the present disclosure, it is possible to identify important bases in a specific region of a genome by introducing various mutations into the base sequence of the specific region of a genome and observing the gain or loss of function of the specific region due to the mutation. Examples of the specific region include a region of unknown function, a promoter region, an enhancer region, a region corresponding to an intron, a region corresponding to a 5' untranslated region (UTR), a region corresponding to a 3' untranslated region (UTR), and a region encoding a non-coding RNA.

Application example 2
Application Example 2 is an application example for regulating the expression level of a protein or RNA. In this application example, a region involved or suspected to be involved in regulating the expression level of the protein or RNA, such as a transcriptional control region of the protein or RNA (including a region suspected to be involved in transcriptional control) and a translational control region of the protein (including a region suspected to be involved in translational control), is modified to regulate the expression level of the protein. Thereby, in Application Example 2, a modified cell in which the expression level of the protein or RNA is regulated can be obtained. The regulation can be an increase or decrease in the expression level. The RNA can be mRNA, tRNA, rRNA, or other non-coding RNA (e.g., microRNA).

Application example 3
Application Example 3 is an application example to a coding region of a protein or RNA. According to the present disclosure, it is possible to identify important amino acids or important sequences in the function of the protein or RNA by introducing various mutations into the region encoding the protein or RNA and observing the functional modification of the protein or RNA due to the mutation (e.g., gain or loss of function). In addition, for example, by observing the gain or loss of function of the protein or RNA due to the mutation, it is possible to obtain a mutant protein or RNA having an improved or reduced function or a new function and a modified cell expressing the mutant protein or RNA. When various modifications are made to a part or all of a protein or RNA, it is desirable to seamlessly link a part or all of the protein or RNA including the modified site in frame to the region encoding the protein or RNA. The RNA may be mRNA, tRNA, rRNA, or other non-coding RNA (e.g., microRNA). According to Application Example 3, in combination with Application Example 2 above, a modified cell expressing the mutant protein or RNA, in which the expression level is regulated, can also be obtained.

Application example 4
Application Example 4 is an application to the screening of cells with high proliferation or viability. According to the present disclosure, a library containing various modified cells having different mutations in genomic regions that may be involved in cell proliferation or viability can be obtained. Such a library may contain separate aqueous compositions containing the various types of cells, or may be a mixture of aqueous compositions containing the various types of cells. It is preferable that the mixture contains equal amounts of each modified cell. When the mixture is cultured in an environment suitable for cell culture or in the presence of selective pressure on the cells, the relative concentration of cells with high proliferation or viability increases, and the relative concentration of cells with low proliferation or viability decreases. After culture, cells with high proliferation or viability are therefore enriched, which makes these cells easier to obtain. Screening can also be performed under conditions where a specific selective pressure is applied. In this way, cells with high proliferation or viability under the specific selective pressure can be screened. The selective pressure is not particularly limited, but examples include poor nutrition, high salt concentration, low salt concentration, high temperature, low temperature, low oxygen, and the presence of drugs (e.g., physiologically active substances such as poisons and antibiotics). Modification of an existing gene in the genome can be achieved by replacing the gene with a modified gene, or by simply inserting a modified nucleotide sequence into a safe harbor region (e.g., the AAVS1 locus, the ROSA26 locus, the CLYBL locus, the CXCR4 locus, and the CCR5 locus).
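The enrichment described above can be illustrated numerically. The following Python sketch assumes, purely for illustration, that each modified cell grows exponentially and independently at a fixed rate; the variant names and growth rates are hypothetical and are not measured values from the present disclosure.

```python
def relative_abundance(growth_rates: dict, generations: float) -> dict:
    """Start from equal amounts of each variant and return each variant's
    fraction of the pool after `generations` units of culture, assuming
    independent exponential growth (growth rate = doublings per unit)."""
    sizes = {name: 2.0 ** (rate * generations)
             for name, rate in growth_rates.items()}
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# Hypothetical relative growth rates under some selective pressure.
rates = {"variant_A": 1.0, "variant_B": 0.8, "variant_C": 1.2}

before = relative_abundance(rates, 0)   # equal amounts at the start
after = relative_abundance(rates, 10)   # after 10 units of culture

for name in rates:
    print(name, round(before[name], 3), round(after[name], 3))
```

Under these assumptions the fastest-growing variant comes to dominate the mixture, while the slowest-growing variant becomes rare. This is the basis on which cells with high proliferation or viability are concentrated after culture.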

In one embodiment, the present invention provides a cell in which two or more alleles of a chromosomal genome have been modified, with each of the two or more alleles having a mutually different (distinguishable) selectable marker gene. In one aspect, the cell may be a cell of a unicellular organism. In one aspect, the cell may be an isolated cell. In one aspect, the cell may be a cell selected from the group consisting of a pluripotent cell and a pluripotent stem cell (such as an embryonic stem cell and an induced pluripotent stem cell). In one aspect, the cell may be a tissue stem cell. In one aspect, the cell may be a somatic cell. In one aspect, the cell may be a germline cell (e.g., a germ cell). In one aspect, the cell may be a cell line. In one aspect, the cell may be an immortalized cell. In one aspect, the cell may be a cancer cell. In one aspect, the cell may be a non-cancerous cell. In one aspect, the cell may be a cell of a diseased patient. In one aspect, the cell may be a cell of a healthy individual. In one embodiment, the cell may be an animal cell (e.g., a human cell), such as an insect cell (e.g., a silkworm cell), HEK293 cell, HEK293T cell, Expi293F™ cell, FreeStyle™ 293F cell, Chinese hamster ovary cell (CHO cell), CHO-S cell, CHO-K1 cell, and ExpiCHO cell, and cells derived from these cells. In one preferred embodiment, in the above cell, all alleles of the target region of the chromosomal genome are modified, and the modified regions each have a different (distinguishable) selection marker gene from each other.

In one embodiment, a method is provided for culturing cells in which two or more alleles of a chromosomal genome have been modified and each of the two or more alleles has a mutually different (distinguishable) selection marker gene. When the selection marker genes are drug resistance marker genes, the cells can be cultured in the presence of the drug corresponding to each of the drug resistance marker genes. The cells can be cultured under conditions suitable for the maintenance or growth of the cells.

In one embodiment, the present invention provides a non-human organism having a chromosomal genome with two or more modified alleles, each of which has a selectable marker gene that is different (distinguishable) from those of the other alleles. In one aspect, the non-human organism may be a unicellular organism. In some aspects, the non-human organism is a yeast (e.g., a fission yeast or budding yeast, e.g., a species of the genus Saccharomyces, such as Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Saccharomyces fragilis, Saccharomyces rouxii, a species of the genus Candida, such as Candida utilis, Candida tropicalis, a species of the genus Pichia, a species of the genus Kluyveromyces, a species of the genus Yarrowia ... In one embodiment, the non-human organism may be a yeast selected from the group consisting of yeasts of the genera Arrowia, Hansenula, and Endomyces. In one embodiment, the non-human organism may be a filamentous fungus (e.g., Aspergillus, Trichoderma, Humicola, Acremonium, Fusarium, and Penicillium). In one embodiment, the non-human organism may be a multicellular organism. In one embodiment, the non-human organism may be a non-human animal. In one embodiment, the non-human organism may be a plant. In one preferred embodiment, all alleles of the target region of the chromosomal genome of the non-human organism are modified, and the modified regions each have a selectable marker gene that is different (distinguishable) from each other.

In the cell, one or more desired genes, such as genes necessary for cell survival or proliferation, may be contained or collected in another region of the chromosomal genome. The other region may be, for example, a safe harbor region (e.g., the AAVS1 locus, the ROSA26 locus, the CLYBL locus, the CXCR4 locus, and the CCR5 locus). The other region may be, for example, a region of (ii) above having a deletion.

As shown in FIG. 1, a first intermediate cell is obtained from a pre-modified cell (referred to as a reference cell). The process of obtaining the first intermediate cell from the pre-modified cell is called step S1. A library of first modified cells (hereinafter simply referred to as the "first library") can be produced from the first intermediate cell. The process of obtaining the first library from the first intermediate cell is called step S2. A second intermediate cell can be obtained from the first intermediate cell. The process of obtaining the second intermediate cell from the first intermediate cell is called step S3. A library of second modified cells (hereinafter simply referred to as the "second library") can be produced from the second intermediate cell. The process of obtaining the second library from the second intermediate cell is called step S4.
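The relationship between steps S1 to S4 can be summarized as a small lookup structure. The following Python sketch is only an illustrative restatement of the paragraph above; the dictionary and function names are hypothetical.

```python
# Each step maps a starting material to its product, as described for FIG. 1.
STEPS = {
    "S1": ("reference cell", "first intermediate cell"),
    "S2": ("first intermediate cell", "first library"),
    "S3": ("first intermediate cell", "second intermediate cell"),
    "S4": ("second intermediate cell", "second library"),
}


def route_to(product: str, start: str = "reference cell") -> list:
    """Return the ordered step names leading from `start` to `product`."""
    route = []
    current = product
    while current != start:
        # In this scheme each product is obtained by exactly one step.
        step = next(name for name, (_, dst) in STEPS.items() if dst == current)
        route.append(step)
        current = STEPS[step][0]
    return route[::-1]


print(route_to("first library"))   # ['S1', 'S2']
print(route_to("second library"))  # ['S1', 'S3', 'S4']
```

As the output indicates, the first library and the second library share step S1 and diverge at the first intermediate cell.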

Reference cells are prepared. Reference cells are the cells from which the library of the present disclosure is created, and are, for example, eukaryotic cells. Reference cells may be natural cells or modified cells. Reference cells are typically diploid, but may also be triploid or higher.

For example, a first intermediate cell can be produced as shown in FIG. 2. Specifically, the target region on the genome of the reference cell is replaced with a cassette containing a selection marker gene. In FIG. 2, two types of donor DNA and the target region are subjected to homologous recombination. The two types of donor DNA contain distinguishably different drug selection marker genes (for positive selection) and distinguishably different visualization marker genes (for negative selection), and each has target sequences for CRISPR/Cas9 (gRNA1 to gRNA4) at both ends. After homologous recombination, drug selection is performed with two types of drugs in order to select cells in which fragments derived from the two types of donor DNA have replaced the target regions on the paternal and maternal chromosomes, respectively (see step (1) in FIG. 2). Cells that survive the selection are cells (first intermediate cells) that have distinguishably different drug resistance genes in the target regions of the two alleles. The first intermediate cells can be subjected to single cell cloning. Whether the cassette containing the selection marker gene has been inserted at the desired position can then be confirmed.

Next, either the paternal or the maternal cassette is removed. For example, as shown in FIG. 2, the target sequences (gRNA1 and gRNA2) located at both ends of the cassette that replaced the paternally derived target region can be cut by the CRISPR/Cas9 system in the presence of the cassette removal donor DNA. The cassette removal donor DNA consists of an upstream homology arm capable of homologous recombination with the region upstream of the target region on the paternally derived allele and a downstream homology arm capable of homologous recombination with the region downstream of the target region. Then, by using a cell sorter to select cells in which the GFP signal has disappeared, a second intermediate cell can be obtained that has a genome in which the regions upstream and downstream of the target region in the paternally derived allele are seamlessly linked (i.e., the entire target region has been lost).

A library of second modified cells (second library) can be prepared from the second intermediate cells. The target sequences (gRNA3 and gRNA4) located at both ends of the maternal cassette of the second intermediate cells can be cleaved by the CRISPR/Cas9 system in the presence of donor DNA for introducing modified base sequences (donor DNA for library preparation). The donor DNA for introducing modified base sequences has an upstream homology arm capable of homologous recombination with the region upstream of the target region on the maternal allele and a downstream homology arm capable of homologous recombination with the region downstream of the target region, and contains a modified base sequence between the upstream homology arm and the downstream homology arm. In this way, modified cells having a genome containing a modified base sequence between the regions upstream and downstream of the target region in the maternal allele can be obtained. By preparing donor DNAs for introducing modified base sequences that each have a different modified base sequence, modified cells having different modified base sequences can be obtained, thereby obtaining a second library.

As described above, the donor DNA for introducing a modified base sequence consists of one or more modified portions (A) and one or more unmodified portions (B), and other than the modified portion (A), it can have a sequence that is the same as the sequence of the corresponding region of the genome before modification.
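The layout of the donor DNA for introducing a modified base sequence (upstream homology arm, modified base sequence, downstream homology arm) can be sketched as follows. This Python sketch assumes, for simplicity, that the modifications are substitutions only, so that the modified and original sequences have equal length; all sequences and function names are hypothetical and far shorter than practical homology arms.

```python
def build_donor(upstream_arm: str, modified_sequence: str, downstream_arm: str) -> str:
    """Concatenate the upstream homology arm, the modified base sequence,
    and the downstream homology arm into one donor sequence."""
    return upstream_arm + modified_sequence + downstream_arm


def modified_positions(original: str, modified: str) -> list:
    """Return 0-based positions of the modified portion (A); every other
    position belongs to the unmodified portion (B). Substitutions only."""
    if len(original) != len(modified):
        raise ValueError("this sketch handles equal-length sequences only")
    return [i for i, (a, b) in enumerate(zip(original, modified)) if a != b]


# Hypothetical sequences, used only for illustration.
upstream_arm = "AAAA"
downstream_arm = "TTTT"
original_region = "ACGTAC"
modified_region = "ACCTAC"   # one substitution at position 2

donor = build_donor(upstream_arm, modified_region, downstream_arm)
print(donor)                                                  # AAAAACCTACTTTT
print(modified_positions(original_region, modified_region))  # [2]
```

Repeating this construction with different modified base sequences gives one donor DNA per member of the library, with all positions outside the modified portion (A) identical to the sequence before modification.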

Claims

1. A library of modified cells, wherein:
the library comprises a combination of a plurality of aqueous compositions;
each aqueous composition comprises one type of modified cell;
each of the modified cells has a first allele and a second allele at a locus to be modified; and
each of the modified cells has, at the same position of the first allele, a cassette containing a DNA fragment that differs between the aqueous compositions.
2. The library of claim 1, wherein each modified cell has a second allele that is partially or completely disrupted or deleted.
3. The library of claim 1 or 2, wherein the second allele seamlessly lacks the part or all of the sequence.
4. The library according to any one of claims 1 to 3, wherein the sequences of each modified cell other than the cassette containing the DNA fragment are substantially identical before and after modification.
5. The library according to any one of claims 1 to 4, wherein each of the sequences of the cassettes is composed of one or more modified portions (A) and one or more unmodified portions (B), each of the modified portions (A) has one or more modifications selected from the group consisting of sequence insertion, deletion, and substitution, the modifications of the one or more modified portions differ between the aqueous compositions in terms of the position or content of the modification, each of the one or more unmodified portions (B) is identical to the sequence of the corresponding site before modification, the unmodified portion (B1) on the centromere side of the cassette is seamlessly linked to the adjacent sequence (C1) on the centromere side of the cassette, the unmodified portion (Bt) on the telomere side of the cassette is seamlessly linked to the adjacent sequence (C2) on the telomere side of the cassette, and the region where the adjacent sequence (C1) and the unmodified portion (B1) are linked, and the region where the unmodified portion (Bt) and the adjacent sequence (C2) are linked constitute the same sequence as the sequence of the corresponding region before modification.
6. The library according to any one of claims 1 to 5, wherein the modified cells do not contain a target sequence for a site-specific recombinase.
7. The library according to any one of claims 1 to 6, wherein the library contains 50 or more types of aqueous compositions.
8. A method for producing a library of modified cells, comprising:
(α) providing a group of cells having a genome including a first allele and a second allele at a locus to be modified, the first allele and the second allele each including a cassette including a selection marker gene and a target nucleic acid sequence,
wherein the selection marker gene carried by the first allele and the selection marker gene carried by the second allele are distinguishably different, the target nucleic acid sequence is a target of a genome modification system and is designed so that the first allele and the second allele can be distinguishably cleaved by the genome modification system, and each selection marker gene is a negative selection marker gene that can be used for negative selection;
(β) introducing into the provided group of cells:
(x) a sequence-specific nucleic acid cleavage molecule that targets the unique base sequence contained in the first allele, or a genome modification system comprising a polynucleotide encoding the sequence-specific nucleic acid cleavage molecule; and
(y) a plurality of types of second recombination donor DNAs {wherein each of the plurality of types of second recombination donor DNAs has an upstream homology arm having a base sequence homologous to a base sequence adjacent to the upstream side of the target site of (x) above, and a downstream homology arm having a base sequence homologous to a base sequence adjacent to the downstream side of the target region, and contains a modified base sequence between the upstream homology arm and the downstream homology arm, and the modified base sequence is different for each second recombination donor DNA and is unique to each second recombination donor DNA}; and
(γ) after the step (β), selecting cells that do not express the selection marker gene contained in the first allele,
whereby a library of modified cells comprising a plurality of cells is produced, wherein in the plurality of cells obtained, a first allele has a modified base sequence unique to each cell, and a second allele has a sequence common to the cells.
9. The method according to claim 8, further comprising:
in a cell having a genome including a first allele and a second allele at a locus to be modified, replacing the replaced sequence included in the first allele and the second allele with a cassette including a selection marker gene and a target nucleic acid sequence, thereby removing the replaced sequence from the first allele and the second allele.
10. The method of claim 9, wherein each of the modified base sequences of the first allele has one or more mutations selected from the group consisting of base addition, insertion, substitution, and deletion relative to the replaced sequence of the first allele.
11. The method according to any one of claims 8 to 10, wherein the modified sequence is a coding region for a protein, and the modified base sequence of the first allele has one or more mutations selected from the group consisting of addition, insertion, substitution, and deletion of bases relative to the replaced sequence of the first allele.
12. The method according to claim 10 or 11, wherein the modified base sequence has a sequence identity of 80% or more with the modified sequence.
13. The method according to any one of claims 8 to 12, further comprising, between step (α) and step (β), removing the cassette from the second allele.
14. The method of claim 13, wherein the cassette is seamlessly removed.
15. A library of modified cells comprising a plurality of modified cells, produced by the method of any one of claims 8 to 14.