CN114395585A - Compositions for base editing - Google Patents

Compositions for base editing

Info

Publication number
CN114395585A
Authority
CN
China
Prior art keywords
cbe
gene
encoding
cas9n
aav
Legal status
Granted
Application number
CN202210031173.4A
Other languages
Chinese (zh)
Other versions
CN114395585B (en)
Inventor
张学礼
毕昌昊
王玉杰
Current Assignee
Tianjin Institute of Industrial Biotechnology of CAS
Original Assignee
Tianjin Institute of Industrial Biotechnology of CAS
Application filed by Tianjin Institute of Industrial Biotechnology of CAS
Priority to CN202210031173.4A
Publication of CN114395585A
Application granted
Publication of CN114395585B
Legal status: Active

Classifications

    • C12N15/86 Viral vectors
    • C07K14/4703 Inhibitors; Suppressors
    • C12N15/1131 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against viruses
    • C12N5/0602 Vertebrate cells
    • C12N7/00 Viruses; Bacteriophages; Compositions thereof; Preparation or purification thereof
    • C12N9/1276 RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)
    • C12Y305/04002 Adenine deaminase (3.5.4.2)
    • C07K2319/00 Fusion polypeptide
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2510/00 Genetically modified cells
    • C12N2750/14121 Viruses as such, e.g. new isolates, mutants or their genomic sequences
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector; viral genome or elements thereof as genetic vector
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Toxicology (AREA)
  • Immunology (AREA)
  • Cell Biology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The present invention relates to compositions for base editing, in the field of AAV delivery of split base editors, and uses thereof. The composition provided by the invention for constructing a recombinant adeno-associated virus vector and/or for base editing can be CBE-C, ABE-C or PE-C, where CBE-C is a composition for constructing a Cytosine Base Editor (CBE), ABE-C is a composition for constructing an Adenine Base Editor (ABE), and PE-C is a composition for constructing a prime editor (PE). The novel split Cytosine Base Editor (CBE), Adenine Base Editor (ABE) and prime editor (PE) provided by the invention can be loaded into adeno-associated virus (AAV) vectors and successfully delivered to target cells to achieve gene editing at specific sites.

Description

Compositions for base editing
Technical Field
The present invention relates to the field of AAV delivery of split base editors, and in particular to compositions for base editing and uses thereof.
Background
The single-base editing system is a base-substitution technology developed from CRISPR/Cas9. By fusing a catalytically inactive Cas protein, or a Cas protein with single-strand cleavage (nickase) activity, to a base deaminase, it introduces targeted base mutations under the guidance of a specific gRNA without causing double-strand breaks, and with relatively high editing efficiency. Commonly used base editors include the Cytosine Base Editor (CBE), which converts C to T; the Adenine Base Editor (ABE), which converts A to G; and the GBE base editor, which converts C to G in mammalian cells. The CBE can introduce a premature stop codon into a gene and thereby knock it out. The prime editor (PE) can achieve conversion between any bases at a specific site. It consists mainly of a Cas9 nickase fused to a reverse transcriptase (RT), together with a modified gRNA: a template carrying the desired mutation and a Primer Binding Site (PBS) are appended to the 3' end of the gRNA to give a prime editing guide RNA (pegRNA). The pegRNA directs the PE system to a specific genomic site; the non-complementary strand nicked by the Cas9n-H840A nickase anneals to the PBS carried by the pegRNA and is extended by the reverse transcriptase along the reverse transcription template (RT template), so that base substitutions, insertions and deletions can be introduced at the target site.
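The knockout route mentioned above rests on standard codon logic rather than anything specific to this patent: on the coding strand, a single C-to-T change turns the glutamine codons CAA and CAG and the arginine codon CGA into the stop codons TAA, TAG and TGA. The Python sketch below scans an in-frame coding sequence for such codons. The function name `stop_codon_candidates` and the toy sequence are illustrative assumptions; the sketch deliberately ignores PAM availability and the position of the editor's activity window, which in practice decide whether a candidate codon is actually editable.

```python
# Coding-strand codons that one C-to-T conversion turns into a stop codon.
# (TGG/Trp can also become a stop, but only through C-to-T edits on the
# template strand, which appear as G-to-A here; that case is not modelled.)
C_TO_T_STOPS = {"CAA": "TAA", "CAG": "TAG", "CGA": "TGA"}


def stop_codon_candidates(cds: str) -> list[tuple[int, str, str]]:
    """Return (codon_index, original, edited) for each in-frame codon that a
    single C-to-T edit on the coding strand converts into a premature stop.

    `cds` is read in frame from its first nucleotide; codon indices are
    0-based and a trailing partial codon is ignored. PAM and editing-window
    constraints are not checked.
    """
    cds = cds.upper()
    hits = []
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3]
        if codon in C_TO_T_STOPS:
            hits.append((start // 3, codon, C_TO_T_STOPS[codon]))
    return hits


if __name__ == "__main__":
    toy_cds = "ATGCAAGGTCGATGG"  # hypothetical: Met-Gln-Gly-Arg-Trp
    for idx, before, after in stop_codon_candidates(toy_cds):
        print(f"codon {idx}: {before} -> {after}")
```

For the toy sequence, codon 1 (CAA) and codon 3 (CGA) are reported, while the Trp codon TGG is left out because it cannot be reached by a coding-strand C-to-T edit.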
Diseases caused by single-base mutations account for a large proportion of human genetic diseases, so the development and application of base editors can contribute greatly to their treatment. Base editing is now widely used in gene therapy, animal model construction, precision animal breeding and gene function analysis, providing a powerful tool for basic and applied research. A gene therapy vector introduces a foreign gene into target cells in vivo so that it can function there; its carrying capacity and delivery efficiency strongly affect the therapeutic effect. Such vectors can be classified as viral or non-viral. Viral vectors, which use a virus to introduce the target gene into recipient cells, have been widely used in cells, animal models and clinical applications. Data from ClinicalTrials.gov show that AAV is the most commonly used in vivo gene therapy vector. AAV is non-pathogenic, has been shown to be less immunogenic than other viral vectors, and is a relatively safe gene therapy vector. In addition, AAV-based gene therapy is expected to provide long-lasting or even lifelong clinical benefit after a single administration, so-called "one-time therapy". AAV is preferably administered directly to the target tissue, i.e., into the relevant lesion or the desired site of biological activity.
AAV belongs to the genus Dependovirus of the family Parvoviridae. It is a replication-defective single-stranded DNA virus with no autonomous replication ability and replicates only with the help of a helper virus such as adenovirus or herpesvirus. AAV particles are 18-26 nm in diameter, and the genome of about 4.7 kb consists of two Inverted Terminal Repeats (ITRs) and two genes, rep and cap. The ITRs are the only cis-acting elements in the AAV genome; they are required for viral packaging and chiefly direct genome replication and virion assembly. A recombinant AAV (rAAV) vector is produced by genetically engineering naturally occurring AAV so that the rep and cap genes are replaced by a therapeutic transgene and only the ITR sequences at both ends are retained; it can be used to deliver artificial transgenes. The exogenous genome carried by rAAV is limited to about 4.7 kb, and exceeding this limit markedly reduces the viral titer. This cargo limit is the main factor restricting the broad application of AAV, so how to insert a CRISPR/Cas9 system or a base editor into an AAV vector, achieve efficient virus assembly and deliver it to the target site where it can function is one of the key problems in current gene therapy. The base editors in common use are mainly CBE, ABE, GBE and the prime editing (PE) system. The main elements of CBE, ABE and GBE include a deaminase, a Cas9 nuclease and, where applicable, a uracil DNA glycosylase inhibitor (UGI), while the PE system comprises a Cas9 protein and a reverse transcriptase. The coding sequences of these editors all exceed 5.5 kb, and together with the gRNA sequence and its promoter they far exceed the packaging capacity of AAV.
Therefore, there is an urgent need in the art to develop and optimize a system of AAV delivery base editors to achieve efficient gene editing.
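The size problem described above can be made concrete with the coordinates this document gives for SEQ ID No.1. The sketch below assumes the unsplit CBE coding sequence is at least 5,460 nt long (the second UGI copy ends at nucleotide 5460), treats nucleotides 1-2331 as the N-terminal half (the CBE-C1 split boundary), and uses the approximate 4.7 kb limit from the text. It ignores promoters, polyA signals, the gRNA cassette and the intein segments, so the headroom values are upper bounds, not a packaging design; the constant and function names are illustrative.

```python
AAV_LIMIT_NT = 4700  # approximate rAAV cargo limit stated in the text


def headroom(cargo_nt: int, limit_nt: int = AAV_LIMIT_NT) -> int:
    """Nucleotides left for promoter, polyA, gRNA cassette, intein, etc.

    A negative value means the coding sequence alone exceeds the limit.
    """
    return limit_nt - cargo_nt


# Lengths derived from SEQ ID No.1 coordinates quoted in this document.
FULL_CBE_MIN_NT = 5460   # second UGI copy ends at nucleotide 5460
SPLIT_AFTER_NT = 2331    # last nucleotide of Cas9n-1-511 (CBE-C1 split)

segments = {
    "unsplit CBE": FULL_CBE_MIN_NT,
    "N-terminal half (nt 1-2331)": SPLIT_AFTER_NT,
    "C-terminal half (nt 2332-5460)": FULL_CBE_MIN_NT - SPLIT_AFTER_NT,
}

if __name__ == "__main__":
    for name, size in segments.items():
        print(f"{name}: {size} nt, headroom {headroom(size)} nt")
```

Under these assumptions the unsplit coding sequence is already 760 nt over the limit, whereas the two halves (2,331 and 3,129 nt) each leave room for regulatory elements and an intein segment, which is the motivation for splitting the editor across two vectors.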
Disclosure of Invention
The technical problem to be solved by the invention is how to improve the packaging efficiency of adeno-associated virus vectors so that base editors can be delivered and achieve high gene editing efficiency.
To solve the above technical problem, in a first aspect the present invention provides a gene composition for constructing a recombinant adeno-associated virus vector and/or for base editing, wherein the composition is CBE-C, ABE-C or PE-C; the CBE-C is a composition for constructing a Cytosine Base Editor (CBE), the ABE-C is a composition for constructing an Adenine Base Editor (ABE), and the PE-C is a composition for constructing a prime editor (PE);
the CBE-C is CBE-C1, CBE-C2, CBE-C3, CBE-C4, CBE-C5 or CBE-C6;
the CBE-C1 consists of a CBE-C1-511 encoding gene and a CBE-C512-1368 encoding gene; the CBE-C1-511 encoding gene encodes a fusion protein named CBE-C1-511, which is formed by fusing an N-terminal fragment of Cas9 nickase (Cas9 nickase, Cas9n) named Cas9n-1-511, a cytosine deaminase and an N-terminal fragment of an intein; the CBE-C512-1368 encoding gene encodes a fusion protein named CBE-C512-1368, which is formed by fusing a C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-512-1368 and a Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-511 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2331 of SEQ ID No.1, and the Cas9n-512-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2332-4902 of SEQ ID No.1;
the CBE-C2 consists of a CBE-C1-507 encoding gene and a CBE-C508-1368 encoding gene; the CBE-C1-507 encoding gene encodes a fusion protein named CBE-C1-507, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-507, the cytosine deaminase and the N-terminal fragment of the intein; the CBE-C508-1368 encoding gene encodes a fusion protein named CBE-C508-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-508-1368 and the Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-507 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2319 of SEQ ID No.1, and the Cas9n-508-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2320-4902 of SEQ ID No.1;
the CBE-C3 consists of a CBE-C1-503 encoding gene and a CBE-C504-1368 encoding gene; the CBE-C1-503 encoding gene encodes a fusion protein named CBE-C1-503, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-503, the cytosine deaminase and the N-terminal fragment of the intein; the CBE-C504-1368 encoding gene encodes a fusion protein named CBE-C504-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-504-1368 and the Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-503 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2307 of SEQ ID No.1, and the Cas9n-504-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2308-4902 of SEQ ID No.1;
the CBE-C4 consists of a CBE-C1-502 encoding gene and a CBE-C503-1368 encoding gene; the CBE-C1-502 encoding gene encodes a fusion protein named CBE-C1-502, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-502, the cytosine deaminase and the N-terminal fragment of the intein; the CBE-C503-1368 encoding gene encodes a fusion protein named CBE-C503-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-503-1368 and the Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-502 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2304 of SEQ ID No.1, and the Cas9n-503-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2305-4902 of SEQ ID No.1;
the CBE-C5 consists of a CBE-C1-501 encoding gene and a CBE-C502-1368 encoding gene; the CBE-C1-501 encoding gene encodes a fusion protein named CBE-C1-501, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-501, the cytosine deaminase and the N-terminal fragment of the intein; the CBE-C502-1368 encoding gene encodes a fusion protein named CBE-C502-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-502-1368 and the Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-501 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2301 of SEQ ID No.1, and the Cas9n-502-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2302-4902 of SEQ ID No.1;
the CBE-C6 consists of a CBE-C1-498 encoding gene and a CBE-C499-1368 encoding gene; the CBE-C1-498 encoding gene encodes a fusion protein named CBE-C1-498, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-498, the cytosine deaminase and the N-terminal fragment of the intein; the CBE-C499-1368 encoding gene encodes a fusion protein named CBE-C499-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-499-1368 and the Uracil Glycosylase Inhibitor (UGI); the Cas9n-1-498 is the protein encoded by an encoding gene whose coding sequence is nucleotides 802-2292 of SEQ ID No.1, and the Cas9n-499-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2293-4902 of SEQ ID No.1;
the ABE-C consists of an ABE-C1-511 encoding gene and an ABE-C512-1368 encoding gene; the ABE-C1-511 encoding gene encodes a fusion protein named ABE-C1-511, which is formed by fusing an N-terminal fragment of Cas9 nickase (Cas9 nickase, Cas9n) named Cas9n-1-511, an adenine deaminase and the N-terminal fragment of the intein; the ABE-C512-1368 encoding gene encodes a fusion protein named ABE-C512-1368, which is formed by fusing the C-terminal fragment of the intein and a C-terminal fragment of the Cas9 nickase named Cas9n-512-1368; the Cas9n-1-511 is the protein encoded by an encoding gene whose coding sequence is nucleotides 1210-2739 of SEQ ID No.3, and the Cas9n-512-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 2740-5310 of SEQ ID No.3;
the PE-C is PE-C1, PE-C2 or PE-C3;
the PE-C1 consists of a PE-C1-1022 encoding gene and a PE-C1023-1368 encoding gene; the PE-C1-1022 encoding gene encodes a fusion protein named PE-C1-1022, which is formed by fusing an N-terminal fragment of Cas9 nickase (Cas9 nickase, Cas9n) named Cas9n-1-1022 and the N-terminal fragment of the intein; the PE-C1023-1368 encoding gene encodes a fusion protein named PE-C1023-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-1023-1368 and a Reverse Transcriptase (RT); the Cas9n-1-1022 is the protein encoded by an encoding gene whose coding sequence is nucleotides 22-3084 of SEQ ID No.4, and the Cas9n-1023-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 3085-4122 of SEQ ID No.4;
the PE-C2 consists of a PE-C1-1026 encoding gene and a PE-C1027-1368 encoding gene; the PE-C1-1026 encoding gene encodes a fusion protein named PE-C1-1026, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-1026 and the N-terminal fragment of the intein; the PE-C1027-1368 encoding gene encodes a fusion protein named PE-C1027-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-1027-1368 and the Reverse Transcriptase (RT); the Cas9n-1-1026 is the protein encoded by an encoding gene whose coding sequence is nucleotides 22-3096 of SEQ ID No.4, and the Cas9n-1027-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 3097-4122 of SEQ ID No.4;
the PE-C3 consists of a PE-C1-1068 encoding gene and a PE-C1069-1368 encoding gene; the PE-C1-1068 encoding gene encodes a fusion protein named PE-C1-1068, which is formed by fusing an N-terminal fragment of the Cas9 nickase named Cas9n-1-1068 and the N-terminal fragment of the intein; the PE-C1069-1368 encoding gene encodes a fusion protein named PE-C1069-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of the Cas9 nickase named Cas9n-1069-1368 and the Reverse Transcriptase (RT); the Cas9n-1-1068 is the protein encoded by an encoding gene whose coding sequence is nucleotides 22-3222 of SEQ ID No.4, and the Cas9n-1069-1368 is the protein encoded by an encoding gene whose coding sequence is nucleotides 3223-4122 of SEQ ID No.4.
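The split points above follow a simple codon arithmetic, which makes it possible to cross-check every claimed nucleotide range against the residue numbers in the fragment names. In each case the N-terminal range spans one codon fewer than the named residue span, which is consistent with (this is our inference, not a statement in the patent) Cas9 residue 1 not being encoded at that position of the fusion, so the first listed nucleotide is taken as the start of residue 2. The sketch below also assumes the PE coordinates all refer to SEQ ID No.4; `split_coordinates`, `CLAIMED` and `mismatches` are hypothetical names.

```python
CAS9_RESIDUES = 1368  # SpCas9 length implied by the fragment names


def split_coordinates(res2_start: int, split_after: int,
                      cas9_end: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return 1-based inclusive nucleotide ranges (N_fragment, C_fragment).

    res2_start:  first nucleotide of the codon for Cas9 residue 2 (assumption)
    split_after: last Cas9 residue kept in the N-terminal fragment
    cas9_end:    last nucleotide of the Cas9 coding region
    """
    n_end = res2_start + 3 * (split_after - 1) - 1
    return (res2_start, n_end), (n_end + 1, cas9_end)


# name: (res2_start, split_after, cas9_end, claimed N range, claimed C range)
CLAIMED = {
    "CBE-C1 (SEQ ID No.1)": (802, 511, 4902, (802, 2331), (2332, 4902)),
    "CBE-C2 (SEQ ID No.1)": (802, 507, 4902, (802, 2319), (2320, 4902)),
    "CBE-C3 (SEQ ID No.1)": (802, 503, 4902, (802, 2307), (2308, 4902)),
    "CBE-C4 (SEQ ID No.1)": (802, 502, 4902, (802, 2304), (2305, 4902)),
    "CBE-C5 (SEQ ID No.1)": (802, 501, 4902, (802, 2301), (2302, 4902)),
    "CBE-C6 (SEQ ID No.1)": (802, 498, 4902, (802, 2292), (2293, 4902)),
    "ABE-C (SEQ ID No.3)": (1210, 511, 5310, (1210, 2739), (2740, 5310)),
    "PE-C1 (SEQ ID No.4)": (22, 1022, 4122, (22, 3084), (3085, 4122)),
    "PE-C2 (SEQ ID No.4)": (22, 1026, 4122, (22, 3096), (3097, 4122)),
    "PE-C3 (SEQ ID No.4)": (22, 1068, 4122, (22, 3222), (3223, 4122)),
}


def mismatches() -> list[str]:
    """Names whose claimed ranges differ from the derived ones, or whose
    C-terminal range does not encode exactly residues split_after+1..1368."""
    bad = []
    for name, (r2, k, end, n_claim, c_claim) in CLAIMED.items():
        n_rng, c_rng = split_coordinates(r2, k, end)
        c_len = c_rng[1] - c_rng[0] + 1
        residues_ok = c_len % 3 == 0 and c_len // 3 == CAS9_RESIDUES - k
        if (n_rng, c_rng) != (n_claim, c_claim) or not residues_ok:
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(mismatches() or "all split coordinates are self-consistent")
```

Every claimed pair of ranges is reproduced by the same formula, which is a useful sanity check when reading the machine-translated coordinates.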
In the present invention, the Cytosine Base Editor (CBE), the Adenine Base Editor (ABE) and the prime editor (PE) may be DNA molecules, such as vectors.
Further, in the above gene composition, the cytosine deaminase is A1) or A2):
A1) the protein encoded by the coding gene shown at nucleotides 22-705 of SEQ ID No.1;
A2) the protein encoded by the coding gene shown at nucleotides 85-678 of SEQ ID No.2;
the adenine deaminase is the protein encoded by the coding gene shown at nucleotides 22-1113 of SEQ ID No.3;
the Reverse Transcriptase (RT) is the protein encoded by the coding gene shown at nucleotides 4222-6318 of SEQ ID No.4;
the Uracil Glycosylase Inhibitor (UGI) is the protein encoded by the coding gene shown at nucleotides 5212-5460 of SEQ ID No.1.
The amino acid sequence of the N-terminal segment of the intein is SEQ ID No.5, and the amino acid sequence of the C-terminal segment of the intein is SEQ ID No. 6.
Here, the cytosine deaminase of A1) can be used in a BE4max CBE base editor, and the cytosine deaminase of A2) can be used in an hA3A-BE3 base editor.
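The two halves are reconnected at the protein level by the intein. The sketch below models protein trans-splicing in an idealized way: the N-half polypeptide ends with the intein N-terminal segment (IntN), the C-half begins with the intein C-terminal segment (IntC), and splicing excises IntN and IntC and joins the two editor portions (exteins) with a peptide bond. The class and function names and the short placeholder sequences are hypothetical; the real intein segments are SEQ ID No.5 and SEQ ID No.6, which are not reproduced here, and the model ignores splicing chemistry, junction-residue requirements and efficiency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NHalf:
    extein: str  # editor portion, e.g. deaminase + Cas9n N-terminal fragment
    int_n: str   # intein N-terminal segment (SEQ ID No.5 in the patent)

    def polypeptide(self) -> str:
        return self.extein + self.int_n  # IntN sits at the C-terminus


@dataclass(frozen=True)
class CHalf:
    int_c: str   # intein C-terminal segment (SEQ ID No.6 in the patent)
    extein: str  # editor portion, e.g. Cas9n C-terminal fragment + UGI

    def polypeptide(self) -> str:
        return self.int_c + self.extein  # IntC sits at the N-terminus


def trans_splice(n_half: NHalf, c_half: CHalf) -> tuple[str, str]:
    """Idealized trans-splicing: returns (ligated editor, excised intein)."""
    return n_half.extein + c_half.extein, n_half.int_n + c_half.int_c


if __name__ == "__main__":
    # Hypothetical placeholder sequences, not the patent's.
    n = NHalf(extein="MSSETGPV", int_n="NNNN")
    c = CHalf(int_c="CCCC", extein="SKKRKV")
    print(n.polypeptide(), c.polypeptide())
    print(trans_splice(n, c))
```

The point of the model is the ordering: IntN must be fused after the N-terminal editor portion and IntC before the C-terminal portion, so that the excised intein leaves a single continuous editor polypeptide.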
The CBE-C1-511 coding gene is formed by fusing the Cas9n-1-511 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C512-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-512-1368 coding gene and the uracil glycosylase inhibitor (UGI) coding gene.
The CBE-C1-507 coding gene is formed by fusing the Cas9n-1-507 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C508-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-508-1368 coding gene and the UGI coding gene.
The CBE-C1-503 coding gene is formed by fusing the Cas9n-1-503 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C504-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-504-1368 coding gene and the UGI coding gene.
The CBE-C1-502 coding gene is formed by fusing the Cas9n-1-502 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C503-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-503-1368 coding gene and the UGI coding gene.
The CBE-C1-501 coding gene is formed by fusing the Cas9n-1-501 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C502-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-502-1368 coding gene and the UGI coding gene.
The CBE-C1-498 coding gene is formed by fusing the Cas9n-1-498 coding gene, the cytosine deaminase coding gene and the intein N-terminal fragment coding gene, and the CBE-C499-1368 coding gene is formed by fusing the intein C-terminal fragment coding gene, the Cas9n-499-1368 coding gene and the UGI coding gene.
The ABE-C1-511 coding gene is formed by fusing the Cas9N-1-511 coding gene, the adenine deaminase coding gene and the intein N-terminal segment coding gene, and the ABE-C512-1368 coding gene is formed by fusing the intein C-terminal segment coding gene and the Cas9N-512-1368 coding gene.
The PE-C1-1022 coding gene is formed by fusing the coding gene of the Cas9N-1-1022 and the coding gene of the N-terminal fragment of the intein, and the PE-C1023-1368 coding gene is formed by fusing the coding gene of the C-terminal fragment of the intein, the coding gene of the Cas9N-1023-1368 and the coding gene of the Reverse Transcriptase (RT).
The PE-C1-1026 coding gene is formed by fusing the coding gene of the Cas9N-1-1026 and the coding gene of the N-terminal fragment of the intein, and the PE-C1027-1368 coding gene is formed by fusing the coding gene of the C-terminal fragment of the intein, the coding gene of the Cas9N-1027-1368 and the coding gene of the Reverse Transcriptase (RT).
The PE-C1-1068 coding gene is formed by fusing the coding gene of the Cas9N-1-1068 and the coding gene of the N-terminal fragment of the intein, and the PE-C1069-1368 coding gene is formed by fusing the coding gene of the C-terminal fragment of the intein, the coding gene of the Cas9N-1069-1368 and the coding gene of the Reverse Transcriptase (RT).
In the invention, the coding gene of the N-terminal segment of the intein is a DNA molecule with a nucleotide sequence of SEQ ID No.9, and the coding gene of the C-terminal segment of the intein is a DNA molecule with a nucleotide sequence of SEQ ID No. 10.
The coding gene of the cytosine deaminase is a DNA molecule with the nucleotide sequence of positions 22-705 of SEQ ID No.1, or a DNA molecule with the nucleotide sequence of positions 85-678 of SEQ ID No.2. The coding gene of the uracil glycosylase inhibitor (UGI) is a DNA molecule with the nucleotide sequence of positions 4933-5181 of SEQ ID No.1 (identical to positions 5212-5460 of SEQ ID No.1). The coding gene of the adenine deaminase is a DNA molecule with the nucleotide sequence of positions 22-519 and/or positions 616-1113 of SEQ ID No.3. The coding gene of the Reverse Transcriptase (RT) is a DNA molecule with the nucleotide sequence of positions 4222-6318 of SEQ ID No.4.
The encoding gene of Cas9n-1-511 is a DNA molecule with the nucleotide sequence of positions 802-2331 of SEQ ID No.1, and the encoding gene of Cas9n-512-1368 is a DNA molecule with the nucleotide sequence of positions 2332-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-507 is a DNA molecule with the nucleotide sequence of positions 802-2319 of SEQ ID No.1, and the encoding gene of Cas9n-508-1368 is a DNA molecule with the nucleotide sequence of positions 2320-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-503 is a DNA molecule with the nucleotide sequence of positions 802-2307 of SEQ ID No.1, and the encoding gene of Cas9n-504-1368 is a DNA molecule with the nucleotide sequence of positions 2308-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-502 is a DNA molecule with the nucleotide sequence of positions 802-2304 of SEQ ID No.1, and the encoding gene of Cas9n-503-1368 is a DNA molecule with the nucleotide sequence of positions 2305-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-501 is a DNA molecule with the nucleotide sequence of positions 802-2301 of SEQ ID No.1, and the encoding gene of Cas9n-502-1368 is a DNA molecule with the nucleotide sequence of positions 2302-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-498 is a DNA molecule with the nucleotide sequence of positions 802-2292 of SEQ ID No.1, and the encoding gene of Cas9n-499-1368 is a DNA molecule with the nucleotide sequence of positions 2293-4902 of SEQ ID No.1.
The encoding gene of Cas9n-1-511 is a DNA molecule with the nucleotide sequence of positions 1210-2739 of SEQ ID No.3, and the encoding gene of Cas9n-512-1368 is a DNA molecule with the nucleotide sequence of positions 2740-5310 of SEQ ID No.3.
The encoding gene of Cas9n-1-1022 is a DNA molecule with the nucleotide sequence of positions 22-3084 of SEQ ID No.4; the encoding gene of Cas9n-1023-1368 is a DNA molecule with the nucleotide sequence of positions 3085-4122 of SEQ ID No.4.
The encoding gene of Cas9n-1-1026 is a DNA molecule with the nucleotide sequence of positions 22-3096 of SEQ ID No.4; the encoding gene of Cas9n-1027-1368 is a DNA molecule with the nucleotide sequence of positions 3097-4122 of SEQ ID No.4.
The encoding gene of Cas9n-1-1068 is a DNA molecule with the nucleotide sequence of positions 22-3222 of SEQ ID No.4; the encoding gene of Cas9n-1069-1368 is a DNA molecule with the nucleotide sequence of positions 3223-4122 of SEQ ID No.4.
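The coordinates listed above follow a simple arithmetic rule, which the sketch below recomputes. The helper `split_coordinates` is hypothetical and written only for illustration. It assumes, as inferred from the numbers rather than stated in the text, that the Cas9 coding sequence in each SEQ ID begins at codon 2 (no initiator Met), so a fragment ending at residue r ends at nucleotide cds_start + 3(r - 1) - 1. The CDS start values 802, 1210 and 22 are the first Cas9 nucleotides quoted above for SEQ ID No.1, No.3 and No.4.

```python
# Sketch: nucleotide coordinates of split Cas9n coding fragments.
# ASSUMPTION (inferred from the listed coordinates, not stated in the text):
# the Cas9 ORF in each SEQ ID omits the initiator Met, so `cds_start` is the first
# nucleotide of residue 2 and residue r ends at cds_start + 3 * (r - 1) - 1.
from typing import Tuple

CAS9_LENGTH = 1368  # residues in full-length SpCas9

Range = Tuple[int, int]  # 1-based, inclusive


def split_coordinates(cds_start: int, split_site: int,
                      length: int = CAS9_LENGTH) -> Tuple[Range, Range]:
    """Return (N-fragment, C-fragment) nucleotide ranges for a split after `split_site`."""
    if not 2 <= split_site < length:
        raise ValueError("split site must fall inside the protein")
    n_end = cds_start + 3 * (split_site - 1) - 1  # last nt of residue `split_site`
    c_end = cds_start + 3 * (length - 1) - 1      # last nt of residue `length`
    return (cds_start, n_end), (n_end + 1, c_end)


# Values transcribed from the paragraphs above: (SEQ ID, CDS start, site) -> (N end, C end).
LISTED = {
    ("SEQ ID No.1", 802, 511): (2331, 4902),
    ("SEQ ID No.1", 802, 507): (2319, 4902),
    ("SEQ ID No.1", 802, 503): (2307, 4902),
    ("SEQ ID No.1", 802, 502): (2304, 4902),
    ("SEQ ID No.1", 802, 501): (2301, 4902),
    ("SEQ ID No.1", 802, 498): (2292, 4902),
    ("SEQ ID No.3", 1210, 511): (2739, 5310),
    ("SEQ ID No.4", 22, 1022): (3084, 4122),
    ("SEQ ID No.4", 22, 1026): (3096, 4122),
    ("SEQ ID No.4", 22, 1068): (3222, 4122),
}

if __name__ == "__main__":
    for (seq_id, start, site), (n_end, c_end) in LISTED.items():
        (_, got_n_end), (got_c_start, got_c_end) = split_coordinates(start, site)
        # The C fragment always starts one nucleotide after the N fragment ends.
        assert (got_n_end, got_c_start, got_c_end) == (n_end, n_end + 1, c_end), (seq_id, site)
    print("all listed split coordinates are consistent")
```

Because every fragment boundary falls on a codon boundary, the same rule can be used to derive primer positions for a new split site before any cloning is attempted.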
In order to solve the above technical problems, in a second aspect, the present invention provides an expression cassette composition related to the above gene composition, wherein the expression cassette composition is any one of the following:
B1) an expression cassette comprising a gene encoding CBE-C1-511 as described above and an expression cassette comprising a gene encoding CBE-C512-1368 as described above;
B2) an expression cassette comprising a gene encoding CBE-C1-507 as described above and an expression cassette comprising a gene encoding CBE-C508-1368 as described above;
B3) an expression cassette comprising a gene encoding CBE-C1-503 as described above and an expression cassette comprising a gene encoding CBE-C504-1368 as described above;
B4) an expression cassette comprising a gene encoding CBE-C1-502 as described above and an expression cassette comprising a gene encoding CBE-C503-1368 as described above;
B5) an expression cassette comprising a gene encoding CBE-C1-501 as described above and an expression cassette comprising a gene encoding CBE-C502-1368 as described above;
B6) an expression cassette comprising a gene encoding CBE-C1-498 as described above and an expression cassette comprising a gene encoding CBE-C499-1368 as described above;
B7) an expression cassette comprising a gene encoding ABE-C1-511 as described above and an expression cassette comprising a gene encoding ABE-C512-1368 as described above;
B8) an expression cassette containing the gene encoding PE-C1-1022 described above and an expression cassette containing the gene encoding PE-C1023-1368 described above;
B9) an expression cassette containing the gene encoding PE-C1-1026 as described above and an expression cassette containing the gene encoding PE-C1027-1368 as described above;
B10) an expression cassette containing the gene encoding PE-C1-1068 and an expression cassette containing the gene encoding PE-C1069-1368.
In order to solve the above technical problem, in a third aspect, the present invention provides a vector composition related to the above gene composition, wherein the vector composition is any one of the following:
v1) a vector containing the above-mentioned CBE-C1-511 encoding gene and a vector containing the above-mentioned CBE-C512-1368 encoding gene;
v2) a vector containing the above CBE-C1-507 encoding gene and a vector containing the above CBE-C508-1368 encoding gene;
v3) a vector containing the gene encoding CBE-C1-503 as described above and a vector containing the gene encoding CBE-C504-1368 as described above;
v4) a vector containing the above-mentioned CBE-C1-502 encoding gene and a vector containing the above-mentioned CBE-C503-1368 encoding gene;
v5) a vector containing the above-mentioned CBE-C1-501 encoding gene and a vector containing the above-mentioned CBE-C502-1368 encoding gene;
v6) a vector containing the above-mentioned CBE-C1-498 encoding gene and a vector containing the above-mentioned CBE-C499-1368 encoding gene;
v7) a vector containing the above-mentioned gene encoding ABE-C1-511 and a vector containing the above-mentioned gene encoding ABE-C512-1368;
v8) a vector containing the gene encoding PE-C1-1022 and a vector containing the gene encoding PE-C1023-1368;
v9) a vector containing the above PE-C1-1026 encoding gene and a vector containing the above PE-C1027-1368 encoding gene;
v10) a vector containing the above-mentioned gene encoding PE-C1-1068 and a vector containing the above-mentioned gene encoding PE-C1069-1368.
Further, in the above vector composition, the vector containing the CBE-C1-511 encoding gene in v1), the vector containing the CBE-C1-507 encoding gene in v2), the vector containing the CBE-C1-503 encoding gene in v3), the vector containing the CBE-C1-502 encoding gene in v4), the vector containing the CBE-C1-501 encoding gene in v5) and the vector containing the CBE-C1-498 encoding gene in v6) further contain an sgRNA gene;
the vector containing the ABE-C512-1368 encoding gene in v7) further contains an sgRNA gene;
the vector containing the PE-C1023-1368 encoding gene in v8), the vector containing the PE-C1027-1368 encoding gene in v9) or the vector containing the PE-C1069-1368 encoding gene in v10) further contains a pegRNA gene.
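The ten vector pairs v1)-v10) and the placement of the guide gene can be captured in a small data structure, for example when tracking constructs. This is a hypothetical bookkeeping model; the class `SplitPair` and its field names are illustrative only. The guide placement it encodes (sgRNA on the N-terminal vector for CBE, sgRNA on the C-terminal vector for ABE, pegRNA on the C-terminal vector for PE) follows the vector composition above and the split design described in this document.

```python
# Hypothetical data model (not from the patent) of the ten split-vector pairs
# v1)-v10) and of which half carries the guide RNA gene.
from dataclasses import dataclass
from typing import List

CAS9_LENGTH = 1368


@dataclass(frozen=True)
class SplitPair:
    system: str      # "CBE", "ABE" or "PE"
    split_site: int  # last Cas9 residue carried by the N-terminal half
    guide: str       # "sgRNA" or "pegRNA"
    guide_on: str    # half whose vector carries the guide gene: "N" or "C"

    @property
    def n_part(self) -> str:
        return f"{self.system}-C1-{self.split_site}"

    @property
    def c_part(self) -> str:
        # Derived from the split site, so the two halves can never be mismatched.
        return f"{self.system}-C{self.split_site + 1}-{CAS9_LENGTH}"


PAIRS: List[SplitPair] = (
    [SplitPair("CBE", s, "sgRNA", "N") for s in (511, 507, 503, 502, 501, 498)]
    + [SplitPair("ABE", 511, "sgRNA", "C")]
    + [SplitPair("PE", s, "pegRNA", "C") for s in (1022, 1026, 1068)]
)

if __name__ == "__main__":
    for i, p in enumerate(PAIRS, start=1):
        carrier = p.n_part if p.guide_on == "N" else p.c_part
        print(f"v{i}) {p.n_part} + {p.c_part}; {p.guide} gene on the {carrier} vector")
```

A frozen dataclass is used so that each pair is immutable and hashable, which makes it safe to use as a dictionary key when attaching sequencing or efficiency results to a construct.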
In the present invention, the vector may be a recombinant adeno-associated virus vector.
In order to solve the above-mentioned problems, the present invention provides, in a fourth aspect, a recombinant microorganism comprising the above-mentioned gene composition or comprising the above-mentioned expression cassette composition or comprising the above-mentioned vector composition.
Further, in the above recombinant microorganism, the recombinant microorganism is selected from Escherichia coli or adeno-associated virus.
In order to solve the above-mentioned problems, in a fifth aspect, the present invention provides a recombinant cell comprising the above-mentioned gene composition or comprising the above-mentioned expression cassette composition or comprising the above-mentioned vector composition.
Further, in the above recombinant cell, the cell may be a mammalian cell.
In the present invention, the mammalian cell may be a non-human mammalian cell.
In order to solve the above technical problems, in a sixth aspect, the present invention provides the use of the above gene composition or the above expression cassette composition or the above vector composition for constructing a recombinant adeno-associated virus vector and/or for base editing.
Recombinant adeno-associated viruses (adeno-associated virus particles) containing the above composition also fall within the scope of protection of the invention.
The invention splits the Cas9 protein at different sites for different base editing systems. For the CBE system, the Cas9 protein is split at six sites (residues 511, 507, 503, 502, 501 and 498); the N-terminal fragment of the intein is fused to the end of the Cas9 N-terminal half, the C-terminal fragment of the intein is fused to the beginning of the Cas9 C-terminal half, and U6 and its gRNA sequence are placed in reverse orientation at the C-terminal end of the Cas9N vector. The ABE system retains good activity when the Cas9 protein is split at residue 511, and U6 and its gRNA sequence are placed in reverse orientation at the C-terminal end of the Cas9C vector. Because the components of the PE system differ from those of ABE and CBE, different split sites are used: the Cas9 protein of the PE system is split at amino acids 1022, 1026 and 1068, and the N-terminal and C-terminal fragments of the intein are fused to the end of the N-terminal half and the beginning of the C-terminal half, respectively. U6 and its pegRNA are placed at the C-terminal end of the Cas9C vector, so that after splitting the sequence between the ITRs of each vector is about 4.7 kb. To further shorten the sequence between the ITRs of the AAV vectors, the promoter of the PE system was replaced with the smaller EF-1α promoter.
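The roughly 4.7 kb between-ITR target mentioned above amounts to a simple length budget. The sketch below is a minimal illustration, not the authors' design tool: the function names are hypothetical, and only coding-region lengths that can be derived from the SEQ ID No.4 coordinates quoted in this document are used. Promoter, intein fragment, U6-pegRNA and polyA lengths are deliberately not assumed and would have to be added from the actual vector sequences.

```python
# Minimal sketch of the ~4.7 kb between-ITR size budget described above.
# Only coding lengths derivable from the SEQ ID No.4 coordinates are used; promoter,
# intein, U6-pegRNA and polyA lengths must be added from the real vector sequences.
from typing import Dict

ITR_BUDGET_NT = 4700  # approximate target stated in the text


def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide range."""
    return end - start + 1


def remaining_budget(element_lengths: Dict[str, int], budget: int = ITR_BUDGET_NT) -> int:
    """Nucleotides left between the ITRs; a negative result means over budget."""
    return budget - sum(element_lengths.values())


# C-terminal PE half split at residue 1026 (coordinates of SEQ ID No.4).
pe_c_half = {
    "Cas9n-1027-1368": span(3097, 4122),  # 1026 nt
    "RT": span(4222, 6318),               # 2097 nt
}

if __name__ == "__main__":
    # nt left for the intein C fragment, EF-1 alpha promoter, U6-pegRNA, polyA, etc.
    print(remaining_budget(pe_c_half))  # 1577
```

The budget is kept as a parameter because the text gives only an approximate figure; the same function can be rerun with a different limit or with the full element list once the real sequences are known.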
By splitting, the base editor/prime editing systems are loaded into AAV (adeno-associated virus) vectors, enabling delivery of the editors into target cells; experiments verified that the split editors edit specific sites with high efficiency.
Drawings
FIG. 1 shows the editing efficiency of the BE4max base editor split at residue 511 of Cas9(D10A).
FIG. 2 shows the editing efficiency of the BE4max base editor split at residue 507 of Cas9(D10A).
FIG. 3 shows the editing efficiency of the BE4max base editor split at residue 503 of Cas9(D10A).
FIG. 4 shows the editing efficiency of the BE4max base editor split at residue 502 of Cas9(D10A).
FIG. 5 shows the editing efficiency of the BE4max base editor split at residue 501 of Cas9(D10A).
FIG. 6 shows the editing efficiency of the BE4max base editor split at residue 498 of Cas9(D10A).
FIG. 7 shows the editing efficiency of the non-split BE4max base editor (control group).
FIG. 8 shows the editing efficiency of the hA3A-BE3 base editor split at residue 511 of Cas9(D10A).
FIG. 9 shows the editing efficiency of the hA3A-BE3 base editor split at residue 507 of Cas9(D10A).
FIG. 10 shows the editing efficiency of the hA3A-BE3 base editor split at residue 503 of Cas9(D10A).
FIG. 11 shows the editing efficiency of the hA3A-BE3 base editor split at residue 502 of Cas9(D10A).
FIG. 12 shows the editing efficiency of the hA3A-BE3 base editor split at residue 501 of Cas9(D10A).
FIG. 13 shows the editing efficiency of the hA3A-BE3 base editor split at residue 498 of Cas9(D10A).
FIG. 14 shows the editing efficiency of the non-split hA3A-BE3 base editor (control group).
FIG. 15 shows the editing efficiency of the ABE base editor split at residue 511 of Cas9(D10A).
FIG. 16 shows the editing efficiency of the non-split ABE base editor (control group).
FIG. 17 shows the editing efficiency of the prime editor (PE) split at residue 1022 of Cas9(H840A).
FIG. 18 shows the editing efficiency of the prime editor (PE) split at residue 1026 of Cas9(H840A).
FIG. 19 shows the editing efficiency of the prime editor (PE) split at residue 1068 of Cas9(H840A).
FIG. 20 shows the editing efficiency of the non-split PE (control group).
FIG. 21 is a map of the recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-502.
FIG. 22 is a map of the recombinant AAV-C viral vector AAV-C-BE4maxCas9C503-1368.
FIG. 23 is a map of the recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-502.
FIG. 24 is a map of the recombinant AAV-N viral vector AAV-N-SpRYABECas9C1-511.
FIG. 25 is a map of the recombinant AAV-C viral vector AAV-C-SpRYABECas9C512-1368.
FIG. 26 is a map of the recombinant AAV-N viral vector AAV-N-PE-Cas9C1-1026-EF-1α.
FIG. 27 is a map of the recombinant AAV-C viral vector AAV-C-PE-Cas9C1027-1368-pegRNA-EF-1α.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The examples provided below serve as a guide for further modifications by a person skilled in the art and do not constitute a limitation of the invention in any way.
The experimental procedures in the following examples are conventional unless otherwise indicated, and are carried out according to the techniques or conditions described in the literature in the field or according to the product instructions. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified. All quantitative tests in the following examples were performed in three replicates, and the results were averaged.
pCMV_BE4max_P2A_GFP used in the invention was purchased from Addgene (catalog #112099).
pCMV_hA3A-BE3 used in the invention was purchased from Addgene (catalog #113410).
pCMV-T7-ABEmax(7.10)-SpRY-P2A-EGFP (RTW5025) used in the invention was purchased from Addgene (catalog #140003).
pCMV-PE2 used in the invention was purchased from Addgene (catalog #132775).
HEK293T cells (denoted 293T in the examples) are stored in the laboratory and have been disclosed in: Dongdong Zhao, Ju Li, Siwei Li, Xiuqing Xin, Muzi Hu, Marcus A. Price, Susan J. Rosser, Changhao Bi, Xueli Zhang. New base editors change C to A in bacteria and C to G in mammalian cells. Nature Biotechnology, 2020-07. This biological material is available to the public from the Applicant and may be used only to repeat the experiments of the invention, not for other purposes.
The AAV-N and AAV-C viral vectors used in the invention were purchased from Wuhan Vast Ling Biotech, Inc. (catalog numbers P18828 and P18882, respectively).
Example 1 Transfection of AAV vectors after splitting of the CBE base editors
1.1 Transfection of AAV vectors after splitting of the BE4max base editor
To verify the effect of the new split sites on the activity of the CBE base editor, the base editor pCMV_BE4max_P2A_GFP (abbreviated as the BE4max base editor in the invention) was selected for splitting. The full-length plasmid of the BE4max base editor contains the DNA molecule with the nucleotide sequence shown as SEQ ID No.1, in which positions 22-705 encode cytosine deaminase APOBEC-1, positions 802-4902 encode Cas9(D10A), and positions 4933-5181 and 5212-5460 encode Uracil Glycosylase Inhibitor (UGI).
The Cas9 protein of BE4max was split at six sites (511, 507, 503, 502, 501 and 498), and the editing activity of each split editor was then analyzed. In mammalian cells, the N-terminal and C-terminal vectors of the Cas9 protein split at each site were co-transfected, and editing activity was analyzed at a specific gRNA target site in ZNF410. The results show that, after co-transfection, the split Cas9 protein recovers the function of the full-length Cas9 protein and efficiently edits the ZNF410 gRNA target site.
The experimental procedure was as follows. HEK293T cells were seeded in 24-well plates at 5×10^5 cells per well; when the cells in each well reached 40-60% confluence, a plasmid expressing the Cas9 N-terminal vector and a plasmid expressing the Cas9 C-terminal vector with the same split site were co-transfected to obtain recombinant HEK293T cells. To construct these vectors, the full-length plasmid of the BE4max base editor was used as the template, primers were designed according to each split site, and PCR products containing the target fragments were amplified and ligated into the adeno-associated virus vectors by homologous recombination. Specifically, the PCR products containing the Cas9(N) encoding gene and the cytosine deaminase encoding gene were ligated into the AAV-N viral vector by homologous recombination to obtain the recombinant AAV-N viral vectors AAV-N-BE4maxCas9C1-511, AAV-N-BE4maxCas9C1-507, AAV-N-BE4maxCas9C1-503, AAV-N-BE4maxCas9C1-502, AAV-N-BE4maxCas9C1-501 and AAV-N-BE4maxCas9C1-498; the PCR products containing the Cas9(C) encoding gene were each ligated into the AAV-C viral vector by homologous recombination to obtain the recombinant AAV-C viral vectors AAV-C-BE4maxCas9C512-1368, AAV-C-BE4maxCas9C508-1368, AAV-C-BE4maxCas9C504-1368, AAV-C-BE4maxCas9C503-1368, AAV-C-BE4maxCas9C502-1368 and AAV-C-BE4maxCas9C499-1368.
The 20-nt spacer (N20) of the gRNA targeting the target sequence was cloned into each of the above recombinant AAV-N viral vectors by Golden Gate assembly using BsmBI recognition sites (enzyme purchased from NEB). The gRNA target sequence is 5'-TAGCCTCCGAAAACATCTGG-3' (named ZNF410 gRNA) and is located in the ZNF410 gene (GenBank accession KJ903016.1, 1425 bp).
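Golden Gate cloning of a 20-nt spacer is commonly done with a pair of annealed oligos, and the insert must not contain a BsmBI recognition site (CGTCTC or its reverse complement GAGACG), otherwise it would be re-cut during assembly. The sketch below illustrates that design step for the ZNF410 spacer. The function names are hypothetical, and the 4-nt overhangs "CACC"/"AAAC" are an ASSUMPTION borrowed from a common U6-sgRNA vector convention; the overhangs actually required by the AAV-N vectors are not given in the text.

```python
# Sketch: annealed-oligo design for cloning a 20-nt spacer by BsmBI Golden Gate.
# ASSUMPTION: the "CACC"/"AAAC" overhangs are a common U6-sgRNA convention; the
# overhangs actually required by the AAV-N vectors are not stated in the text.
from typing import Tuple

BSMBI_SITE = "CGTCTC"  # BsmBI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_oligos(spacer: str, top_overhang: str = "CACC",
                  bottom_overhang: str = "AAAC") -> Tuple[str, str]:
    """Return (top, bottom) oligos whose annealed duplex has the given 5' overhangs."""
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T spacer")
    top = top_overhang + spacer
    bottom = bottom_overhang + revcomp(spacer)
    # An internal BsmBI site (either strand) would be re-cut during Golden Gate assembly.
    for oligo in (top, bottom):
        if BSMBI_SITE in oligo or revcomp(BSMBI_SITE) in oligo:
            raise ValueError("oligo contains an internal BsmBI site")
    return top, bottom


if __name__ == "__main__":
    top, bottom = spacer_oligos("TAGCCTCCGAAAACATCTGG")  # ZNF410 gRNA spacer
    print(top)     # CACCTAGCCTCCGAAAACATCTGG
    print(bottom)  # AAACCCAGATGTTTTCGGAGGCTA
```

Keeping the overhangs as parameters means the same helper can be reused once the actual overhangs of the AAV-N backbone are confirmed from its map.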
The resulting recombinant AAV viral vectors were sequenced, and the sequencing results show the following. The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-511 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2331 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-511 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C512-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2332-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-512-1368 and Uracil Glycosylase Inhibitor (UGI).
The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-507 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2319 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-507 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C508-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2320-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-508-1368 and Uracil Glycosylase Inhibitor (UGI).
The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-503 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2307 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-503 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C504-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2308-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-504-1368 and Uracil Glycosylase Inhibitor (UGI).
The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-502 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2304 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-502 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C503-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2305-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-503-1368 and Uracil Glycosylase Inhibitor (UGI).
The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-501 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2301 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-501 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C502-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2302-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-502-1368 and Uracil Glycosylase Inhibitor (UGI).
The recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-498 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915 and 6431 of AAV-N with the DNA molecule shown as nucleotides 1-2292 of SEQ ID No.1 and replacing the small fragment between nucleotides 7375 and 7395 of AAV-N with the ZNF410 gRNA sequence, keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, cytosine deaminase APOBEC-1, the N-terminal fragment of Cas9 nickase named Cas9N-1-498 and the N-terminal fragment of the intein. The recombinant AAV-C viral vector AAV-C-BE4maxCas9C499-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041 and 6767 of AAV-C with the DNA molecule shown as nucleotides 2293-5523 of SEQ ID No.1 and deleting nucleotides 7253-7611 of AAV-C, keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal fragment of the intein, the C-terminal fragment of Cas9 nickase named Cas9n-499-1368 and Uracil Glycosylase Inhibitor (UGI).
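Each vector above is described by operations ("replace nucleotides a-b", "delete nucleotides c-d") whose coordinates all refer to the original backbone numbering. A standard way to apply several such edits without the first one shifting the coordinates of the next is to apply them from the 3' end backwards. The helpers below are hypothetical sketches of that technique; since the AAV-N/AAV-C backbone sequences are not given here, the demo uses a toy string, plus a length-only calculation for AAV-C-BE4maxCas9C512-1368 using the coordinates quoted above.

```python
# Sketch: applying "replace a-b" / "delete c-d" operations whose coordinates all
# refer to the ORIGINAL backbone numbering. Backbone sequences are not given in the
# text, so the demo uses a toy string.
from typing import List, Tuple

Edit = Tuple[int, int, str]  # (start, end, replacement); 1-based inclusive; "" = deletion


def apply_edits(seq: str, edits: List[Edit]) -> str:
    """Apply non-overlapping edits from the 3' end backwards so coordinates stay valid."""
    limit = len(seq) + 1  # each edit must end before the start of the edit to its right
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not 1 <= start <= end < limit:
            raise ValueError(f"invalid or overlapping edit {start}-{end}")
        seq = seq[:start - 1] + replacement + seq[end:]
        limit = start
    return seq


def net_length_change(edits: List[Tuple[int, int, int]]) -> int:
    """Net change in length for (start, end, inserted_length) edits."""
    return sum(inserted - (end - start + 1) for start, end, inserted in edits)


if __name__ == "__main__":
    print(apply_edits("AAAACCCCGGGGTTTT", [(5, 8, "xx"), (13, 16, "")]))  # AAAAxxGGGG
    # AAV-C-BE4maxCas9C512-1368: replace 4041-6767 with SEQ ID No.1 nt 2332-5523,
    # and delete 7253-7611 (both in original AAV-C numbering).
    print(net_length_change([(4041, 6767, 5523 - 2332 + 1), (7253, 7611, 0)]))  # 106
```

Processing right-to-left is preferred over recalculating offsets after each edit because the coordinates in the description can then be copied verbatim.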
The N-terminal fragment of the intein is the polypeptide with the amino acid sequence shown as SEQ ID No.5, and the C-terminal fragment of the intein is the polypeptide with the amino acid sequence shown as SEQ ID No.6. Cytosine deaminase APOBEC-1 is the protein encoded by positions 22-705 of SEQ ID No.1, and Uracil Glycosylase Inhibitor (UGI) is the protein encoded by positions 4933-5181 of SEQ ID No.1 or by positions 5212-5460 of SEQ ID No.1.
The N-terminal fragment of Cas9 nickase named Cas9N-1-511 is the protein encoded by positions 802-2331 of SEQ ID No.1, and the C-terminal fragment of Cas9 nickase named Cas9N-512-1368 is the protein encoded by positions 2332-4902 of SEQ ID No.1; the N-terminal fragment named Cas9N-1-507 is the protein encoded by positions 802-2319 of SEQ ID No.1, and the C-terminal fragment named Cas9N-508-1368 is the protein encoded by positions 2320-4902 of SEQ ID No.1; the N-terminal fragment named Cas9N-1-503 is the protein encoded by positions 802-2307 of SEQ ID No.1, and the C-terminal fragment named Cas9N-504-1368 is the protein encoded by positions 2308-4902 of SEQ ID No.1; the N-terminal fragment named Cas9N-1-502 is the protein encoded by positions 802-2304 of SEQ ID No.1, and the C-terminal fragment named Cas9N-503-1368 is the protein encoded by positions 2305-4902 of SEQ ID No.1; the N-terminal fragment named Cas9N-1-501 is the protein encoded by positions 802-2301 of SEQ ID No.1, and the C-terminal fragment named Cas9N-502-1368 is the protein encoded by positions 2302-4902 of SEQ ID No.1; the N-terminal fragment named Cas9N-1-498 is the protein encoded by positions 802-2292 of SEQ ID No.1, and the C-terminal fragment named Cas9N-499-1368 is the protein encoded by positions 2293-4902 of SEQ ID No.1.
As examples, the maps of the recombinant AAV-N viral vector AAV-N-BE4maxCas9C1-502 and the recombinant AAV-C viral vector AAV-C-BE4maxCas9C503-1368 are shown in FIG. 21 and FIG. 22, respectively. The pair of recombinant vectors carrying the Cas9 N-terminal and C-terminal halves from the same split site was co-transfected into HEK293T cells.
Each plasmid was transfected at 0.5 μg, with 3 μl of PEI. Each plasmid combination was transfected in three replicates, and the full-length (non-split) BE4max plasmid and gRNA were transfected as a control. After 72 hours, genomic DNA of the recombinant HEK293T cells was extracted with a quick DNA extraction solution (Epicentre, USA); a 200-300 bp region around the edited site was amplified by PCR with Taq DNA polymerase (Kangshiji, China), and the PCR products were subjected to high-throughput sequencing (Kingzhi, China) to calculate the editing efficiency. The recombinant vector combinations carrying the Cas9N and Cas9C halves are those listed as groups 1-6 in Table 1.
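Editing efficiency from amplicon sequencing is, at its simplest, the fraction of reads carrying the edited base at the target position, averaged over the replicates (this document states that quantitative tests were run in triplicate and averaged). The sketch below illustrates that calculation only. It is not the pipeline used by the sequencing provider; the function names are hypothetical, and it ASSUMES reads are already aligned to the amplicon without indels. Dedicated tools such as CRISPResso2 additionally handle alignment, indels and quality filtering.

```python
# Sketch: per-site editing efficiency from amplicon reads, averaged over replicates.
# ASSUMPTION: reads are pre-aligned to the amplicon without indels (same length and
# coordinates); real amplicon-analysis pipelines also handle indels and quality.
from statistics import mean
from typing import Iterable, Sequence


def editing_efficiency(reads: Iterable[str], reference: str,
                       position: int, edited_base: str) -> float:
    """Fraction of usable reads carrying `edited_base` at 0-based `position`."""
    total = edited = 0
    for read in reads:
        if len(read) != len(reference):
            continue  # skip reads that cannot be compared position by position
        total += 1
        if read[position] == edited_base:
            edited += 1
    if total == 0:
        raise ValueError("no usable reads")
    return edited / total


def replicate_mean(efficiencies: Sequence[float]) -> float:
    """Mean efficiency across replicates (three per combination in this document)."""
    return mean(efficiencies)


if __name__ == "__main__":
    # Toy reads only, not data from the patent: C->T at 0-based position 4.
    ref = "ACGTC"
    reads = ["ACGTC", "ACGTT", "ACGTT", "ACGTC", "ACG"]
    print(editing_efficiency(reads, ref, 4, "T"))  # 0.5 ("ACG" is skipped)
```

For a base editor the same function can be called once per C (or A) in the editing window to produce a per-position efficiency profile.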
Table 1 Groups of recombinant expression vectors used to transfect HEK293T cells
Group number Split site Recombinant vector combination
1 511 AAV-N-BE4maxCas9C1-511 and AAV-C-BE4maxCas9C512-1368
2 507 AAV-N-BE4maxCas9C1-507 and AAV-C-BE4maxCas9C508-1368
3 503 AAV-N-BE4maxCas9C1-503 and AAV-C-BE4maxCas9C504-1368
4 502 AAV-N-BE4maxCas9C1-502 and AAV-C-BE4maxCas9C503-1368
5 501 AAV-N-BE4maxCas9C1-501 and AAV-C-BE4maxCas9C502-1368
6 498 AAV-N-BE4maxCas9C1-498 and AAV-C-BE4maxCas9C499-1368
1.2 Transfection of AAV vectors after splitting of the hA3A-BE3 system
Following the split sites and procedure of 1.1, the CBE base editor pCMV_hA3A-BE3 (abbreviated as hA3A-BE3 in the invention) was selected for splitting verification. The full-length plasmid of the hA3A-BE3 CBE base editor contains the DNA molecule with the nucleotide sequence shown as SEQ ID No.2, in which positions 55-648 encode cytosine deaminase APOBEC3A, positions 745-4845 encode Cas9(D10A), and positions 4876-5124 and 5155-5403 encode Uracil Glycosylase Inhibitor (UGI).
The Cas9 protein of hA3A-BE3 was split at the same six sites (511, 507, 503, 502, 501 and 498), and the editing activity was then analyzed. In mammalian cells, the N-terminal and C-terminal vectors of the Cas9 protein split at each site were co-transfected, and editing activity was analyzed at the specific ZNF410 gRNA target site. The results show that, after co-transfection, the split Cas9 protein recovers the function of the full-length Cas9 protein and efficiently edits the ZNF410 gRNA target site.
The experimental procedure was the same as in 1.1, except that the full-length plasmid of the hA3A-BE3 base editor was used as the PCR template and suitable primers were designed for this base editor to obtain PCR products containing the target sequences.
The PCR products containing the target sequences were each ligated into the AAV viral vectors by homologous recombination. Specifically, the PCR products containing the Cas9(N) encoding gene and the cytosine deaminase encoding gene were each ligated into the AAV-N viral vector by homologous recombination to obtain the recombinant AAV-N viral vectors AAV-N-hA3ABE3Cas9C1-511, AAV-N-hA3ABE3Cas9C1-507, AAV-N-hA3ABE3Cas9C1-503, AAV-N-hA3ABE3Cas9C1-502, AAV-N-hA3ABE3Cas9C1-501 and AAV-N-hA3ABE3Cas9C1-498. For each split site, the recombinant AAV-C viral vector paired with the recombinant AAV-N viral vector has the same sequence as in 1.1 and was therefore not constructed again.
The recombinant vectors were sequenced. The sequencing results show that the recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-511 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2274 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-511, and the N-terminal fragment of the intein.
The recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-507 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2262 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-507, and the N-terminal fragment of the intein.
The recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-503 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2250 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-503, and the N-terminal fragment of the intein.
The recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-502 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2247 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-502, and the N-terminal fragment of the intein.
The recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-501 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2244 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-501, and the N-terminal fragment of the intein.
The recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-498 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2235 of SEQ ID No. 2, replacing the small fragment between nucleotides 7375-7395 of AAV-N with the ZNF410 gRNA sequence, and keeping the other nucleotides of AAV-N unchanged; it expresses the sgRNA, the cytosine deaminase APOBEC3A, the N-terminal fragment of the Cas9 nickase named Cas9N-1-498, and the N-terminal fragment of the intein.
The N-terminal segment of the intein is the polypeptide with the amino acid sequence shown in SEQ ID No. 5, and the C-terminal segment of the intein is the polypeptide with the amino acid sequence shown in SEQ ID No. 6. The cytosine deaminase APOBEC3A is the protein encoded by nucleotides 55-648 of SEQ ID No. 2.
The N-terminal fragments of the Cas9 nickase are encoded by the following regions of SEQ ID No. 2: Cas9N-1-511, nucleotides 745-2274; Cas9N-1-507, nucleotides 745-2262; Cas9N-1-503, nucleotides 745-2250; Cas9N-1-502, nucleotides 745-2247; Cas9N-1-501, nucleotides 745-2244; Cas9N-1-498, nucleotides 745-2235.
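The coordinates listed above follow a simple pattern: moving the split site by one residue moves the end of the N-terminal insert by one codon (3 nt). The short Python sketch below checks that every listed end coordinate fits end = 744 + 3 × (split site − 1), taking 745 as the first Cas9 nucleotide in SEQ ID No. 2. This relation is inferred from the listed values rather than stated in the source, and the function and constant names are hypothetical.

```python
# Hypothetical helper: end coordinate (in SEQ ID No. 2) of the N-terminal
# insert for a Cas9 split after residue `split_site`. The rule is inferred
# from the coordinates listed in this example, not taken from the source.

CAS9_START_SEQ2 = 745  # first Cas9 nucleotide in SEQ ID No. 2 (per the text)


def n_insert_end(split_site: int, cas9_start: int = CAS9_START_SEQ2) -> int:
    # One residue = one codon = 3 nt; the "- 1" terms reproduce the listed values.
    return cas9_start - 1 + 3 * (split_site - 1)


# End coordinates of the N-terminal inserts (nucleotides 1-end of SEQ ID No. 2)
REPORTED_ENDS = {511: 2274, 507: 2262, 503: 2250, 502: 2247, 501: 2244, 498: 2235}

if __name__ == "__main__":
    for site, end in REPORTED_ENDS.items():
        predicted = n_insert_end(site)
        status = "ok" if predicted == end else "MISMATCH"
        print(f"split at {site}: listed end {end}, predicted {predicted} ({status})")
```

All six listed split sites give "ok", so the same rule could be used to predict insert boundaries for other candidate split sites in this plasmid.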
As an example, a map of the recombinant AAV-N viral vector AAV-N-hA3ABE3Cas9C1-502 is shown in FIG. 23.
1.3 Editing efficiency
Experimental results: the ZNF410 gRNA site was selected as the target site for analysis, and the editing efficiency of CBEs split at different sites on ZNF410 was determined; the results are shown in Table 2. In FIG. 1, the edited site is the 7th base (C) from the left, and 20% of it is converted to T; in FIG. 2, the 8th C from the left, 22%; in FIG. 3, the 7th C, 49%; in FIG. 4, the 7th C, 58%; in FIG. 5, the 7th C, 56%; in FIG. 6, the 7th C, 26%; in FIG. 7, the 7th C, 50%; in FIG. 8, the 8th C, 29%; in FIG. 9, the 8th C, 18%; in FIG. 10, the 8th C, 62%; in FIG. 11, the 8th C, 63%; in FIG. 12, the 8th C, 63%; in FIG. 13, the 8th C, 23%; in FIG. 14, the 8th C, 54%.
These results show that after the Cas9 protein is split at site 511, 507, 503, 502, 501 or 498, co-transfection of the N-terminal and C-terminal parts still yields Cas9 activity and effective editing of the ZNF410 site. The splits at sites 503, 502 and 501 give the highest editing efficiencies, 49%, 58% and 56%, respectively. Thus the split CBE system can effectively edit a specific gene locus.
Table 2. Editing efficiency of CBEs split at different sites on ZNF410
Example 2 Transfection of AAV vectors after splitting the SpRY ABEmax system
To verify the effect of the split site on the activity of an ABE base editor, the base editor pCMV-T7-ABEmax(7.10)-SpRY-P2A-EGFP (RTW5025) (abbreviated as SpRY ABEmax in the invention) was selected for splitting. The full-length SpRY ABEmax plasmid contains the DNA molecule with the nucleotide sequence shown in SEQ ID No. 3; nucleotides 22-519 and 616-1113 of SEQ ID No. 3 encode the adenine deaminase TadA, and nucleotides 1210-5310 encode Cas9(D10A). The N-terminal and C-terminal SpRY ABEmax vectors, with the Cas9 protein split at residue 511, were co-transfected into mammalian cells, and editing activity at the 36H gRNA target site was analyzed. After co-transfection, the split Cas9 protein retained the function of full-length Cas9 and efficiently edited the 36H gRNA target site.
2.1 Experimental procedures
HEK293T cells were seeded in 24-well plates at 5 × 10⁵ cells per well. When the cells in each well reached 40%-60% confluence, a plasmid expressing the Cas9 N-terminal vector and a plasmid expressing the Cas9 C-terminal vector for the above Cas9 split site were co-transfected to obtain recombinant HEK293T cells. The Cas9N and Cas9C plasmids were each constructed by PCR: using the full-length SpRY ABEmax base editor plasmid stored in the laboratory as the template, primers were designed according to the split site to amplify PCR products containing the target fragments, which were then each ligated into adeno-associated virus vectors by homologous recombination. Specifically, the PCR product containing the Cas9(N) coding gene and the adenine deaminase coding gene was ligated into the AAV-N viral vector by homologous recombination to obtain the recombinant AAV-N viral vector AAV-N-SpRYABECas9C1-511, and the PCR product containing the Cas9(C) coding gene was ligated into the AAV-C viral vector by homologous recombination to obtain the recombinant AAV-C viral vector AAV-C-SpRYABECas9C512-1368.
The 20-nt spacer (N20) of the gRNA targeting the target sequence was cloned into the above AAV-C recombinant vector by Golden Gate assembly, using BsmBI (NEB) as the restriction enzyme. The target sequence of the gRNA is 5'-GCATAGACTGCGGGGCGGGC-3' (named 36H gRNA), located in the 36H genomic sequence (GenBank accession no. AC093241.4).
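The Golden Gate step can be sketched as designing a pair of annealed oligonucleotides that carry the 20-nt spacer plus 4-nt overhangs matching the BsmBI-cut vector. The overhangs used below (CACC on the top strand, AAAC on the bottom strand) are a common convention for U6-driven sgRNA cassettes but are an assumption here, since the source does not give the overhangs of this AAV-C vector. The sketch also rejects spacers containing an internal BsmBI site (CGTCTC or its reverse complement GAGACG), which would be cut during assembly. All function names are hypothetical.

```python
# Sketch of spacer-oligo design for BsmBI Golden Gate cloning.
# ASSUMPTION: vector overhangs CACC (top) / AAAC (bottom); not given in the source.

BSMBI_SITE = "CGTCTC"
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_bsmbi_site(seq: str) -> bool:
    """True if the sequence contains a BsmBI site on either strand."""
    s = seq.upper()
    return BSMBI_SITE in s or revcomp(BSMBI_SITE) in s


def spacer_oligos(spacer: str, top_oh: str = "CACC", bottom_oh: str = "AAAC") -> tuple[str, str]:
    """Return (top, bottom) oligos, both written 5'->3'."""
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T spacer")
    if has_bsmbi_site(spacer):
        raise ValueError("spacer contains an internal BsmBI site")
    return top_oh + spacer, bottom_oh + revcomp(spacer)


if __name__ == "__main__":
    top, bottom = spacer_oligos("GCATAGACTGCGGGGCGGGC")  # 36H gRNA spacer
    print("top:   ", top)
    print("bottom:", bottom)
```

Under the assumed overhangs, the 36H spacer gives the top oligo CACCGCATAGACTGCGGGGCGGGC and the bottom oligo AAACGCCCGCCCCGCAGTCTATGC; with a different vector, only `top_oh` and `bottom_oh` would change.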
The recombinant viral vectors were sequenced. The results show that the recombinant AAV-N viral vector AAV-N-SpRYABECas9C1-511 is a recombinant expression vector obtained by replacing the fragment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 1-2739 of SEQ ID No. 3, deleting nucleotides 7286-7644 of AAV-N, and keeping the other nucleotides of AAV-N unchanged; it expresses the adenine deaminase TadA, the N-terminal fragment of the Cas9 nickase named Cas9N-1-511, and the N-terminal fragment of the intein. The recombinant vector AAV-C-SpRYABECas9C512-1368 is a recombinant expression vector obtained by replacing the fragment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 2740-5373 of SEQ ID No. 3, replacing the fragment between nucleotides 7342-7362 of AAV-C with the 36H gRNA sequence, and keeping the other nucleotides of AAV-C unchanged; it expresses the sgRNA, the C-terminal fragment of the intein, and the C-terminal fragment of the Cas9 nickase named Cas9n-512-1368.
The N-terminal segment of the intein is the polypeptide with the amino acid sequence shown in SEQ ID No. 5, and the C-terminal segment of the intein is the polypeptide with the amino acid sequence shown in SEQ ID No. 6. The adenine deaminase TadA is the protein encoded by nucleotides 22-519 and 616-1113 of SEQ ID No. 3.
The N-terminal fragment of the Cas9 nickase named Cas9N-1-511 is encoded by nucleotides 1210-2739 of SEQ ID No. 3, and the C-terminal fragment of the Cas9 nickase named Cas9n-512-1368 is encoded by nucleotides 2740-5310 of SEQ ID No. 3.
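A quick arithmetic check, sketched in Python below, confirms that these coordinates divide the Cas9(D10A) coding region (nucleotides 1210-5310 of SEQ ID No. 3) into two parts that are each a whole number of codons and together account for the full region, and that the C-terminal part has the 857 codons expected for residues 512-1368. The helper name is hypothetical; the coordinates are those given in the text.

```python
# Reading-frame check for the SpRY ABEmax split at residue 511/512,
# using the SEQ ID No. 3 coordinates given in the text.


def codon_count(start: int, end: int) -> int:
    """Number of codons in the inclusive nucleotide range start..end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length // 3


FULL_CAS9 = (1210, 5310)  # Cas9(D10A) coding region
N_PART = (1210, 2739)     # Cas9N-1-511
C_PART = (2740, 5310)     # Cas9n-512-1368

full = codon_count(*FULL_CAS9)
n_part = codon_count(*N_PART)
c_part = codon_count(*C_PART)

print(full, n_part, c_part)       # 1367 510 857
assert n_part + c_part == full    # no codons are lost or added by the split
assert c_part == 1368 - 512 + 1   # matches the residue range in the name
```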
Maps of the recombinant AAV-N viral vector AAV-N-SpRYABECas9C1-511 (FIG. 24) and the recombinant AAV-C viral vector AAV-C-SpRYABECas9C512-1368 (FIG. 25) are shown as examples.
The recombinant expression vectors were transfected into HEK293T cells at 0.5 μg with 3 μl of PEI. Each plasmid combination was transfected in three replicates, and the full-length ABE plasmid together with the gRNA was transfected as a control. After 72 hours, genomic DNA of the recombinant HEK293T cells was extracted with a rapid DNA extraction solution (Epicentre, USA), a 200-300 bp region around the edited site was PCR-amplified with Taq DNA polymerase (Kangji, China), and the PCR products were subjected to high-throughput sequencing (Jinweizhi, China) to calculate the editing efficiency.
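The source does not describe how editing efficiency was computed from the sequencing reads. As an illustration only, the sketch below assumes reads that are already aligned to the amplicon without insertions or deletions, and reports, at one position, the fraction of reads carrying the edited base (for ABE, A to G). Real analyses also handle alignment, indels and quality filtering; the function name and the toy reads are hypothetical and are not data from this study.

```python
from collections import Counter


def base_conversion_rate(reads, position, alt_base):
    """Fraction of reads covering `position` (0-based) that carry `alt_base`.

    ASSUMPTION: reads are pre-aligned to the amplicon with no indels, so the
    same index refers to the same reference position in every read.
    """
    calls = Counter(read[position] for read in reads if len(read) > position)
    covering = sum(calls.values())
    if covering == 0:
        raise ValueError("no reads cover this position")
    return calls[alt_base] / covering


if __name__ == "__main__":
    # Toy reads (invented for illustration); index 4 is the 5th base from the left.
    reads = ["GCATAGACTG", "GCATGGACTG", "GCATGGACTG", "GCATAGACTG"]
    rate = base_conversion_rate(reads, position=4, alt_base="G")
    print(f"A>G at the 5th base: {rate:.0%}")  # A>G at the 5th base: 50%
```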
2.2 Experimental results
The 36H gRNA site was selected for analysis, and the editing efficiency of SpRY ABEmax split at this site on the 36H gRNA target was determined. After the Cas9 protein was split between amino acids 511 and 512, co-transfection of the N-terminal and C-terminal parts still yielded Cas9 activity and effectively edited the 36H site: the edited base is the fifth A from the left, with an A>G editing efficiency of 42% (FIG. 15), compared with 54% for the non-split control group (A>G, FIG. 16). Thus the split SpRY ABEmax system can effectively edit specific gene sites.
Example 3 Transfection of AAV vectors after splitting the PE editor
3.1 Construction of expression vectors for the pegRNA gene, the Cas9n gene and the reverse transcriptase gene
To verify the effect of the split site on the activity of the PE editor, the pCMV-PE2 prime editor (abbreviated as PE in the invention) was selected for splitting. The PE editor contains the DNA molecule with the nucleotide sequence shown in SEQ ID No. 4; nucleotides 22-4122 of SEQ ID No. 4 encode Cas9(H840A), and nucleotides 4222-6318 encode the reverse transcriptase (RT). The PE split sites are residues 1015, 1022, 1026, 1029, 1040, 1054 and 1068 of the Cas9 protein. For each split site, the N-terminal and C-terminal Cas9 vectors of the PE system were co-transfected into mammalian cells, and editing activity at the VEGFA site was analyzed. After co-transfection, the split Cas9 protein retained the function of full-length Cas9 and efficiently edited the VEGFA site.
The experimental procedure was as follows: HEK293T cells were seeded in 24-well plates at 5 × 10⁵ cells per well, and when the cells in each well reached 40%-60% confluence, a plasmid expressing the Cas9 N-terminal vector and a plasmid expressing the Cas9 C-terminal vector for the same Cas9 split site were co-transfected to obtain recombinant HEK293T cells. The Cas9N and Cas9C plasmids were each constructed by PCR: using the full-length PE editor plasmid stored in the laboratory as the template, suitable primer pairs were designed according to the split sites to amplify PCR products containing the target fragments, and the PCR products were each ligated into AAV viral vectors by homologous recombination. Specifically, the PCR products containing the Cas9(N) coding gene were each ligated into the AAV-N viral vector by homologous recombination to obtain the recombinant AAV-N viral vectors AAV-N-PECas9C1-1015, AAV-N-PECas9C1-1022, AAV-N-PECas9C1-1026, AAV-N-PECas9C1-1029, AAV-N-PECas9C1-1040, AAV-N-PECas9C1-1054 and AAV-N-PECas9C1-1068; the PCR products containing the Cas9(C) coding gene were ligated into the AAV-C viral vector by homologous recombination to obtain the recombinant AAV-C viral vectors AAV-C-PECas9C1016-1368, AAV-C-PECas9C1023-1368, AAV-C-PECas9C1027-1368, AAV-C-PECas9C1030-1368, AAV-C-PECas9C1041-1368, AAV-C-PECas9C1055-1368 and AAV-C-PECas9C1069-1368.
The sequencing results show that the recombinant AAV-N viral vector AAV-N-PECas9C1-1015 is a recombinant expression vector in which the DNA molecule shown in nucleotides 22-3063 of SEQ ID No. 4 replaces the segment between 3915-. The recombinant AAV-C viral vector AAV-C-PECas9C1016-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3064-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1016-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1022 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3084 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1022 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1023-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3085-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1023-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1026 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3096 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1026 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1027-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3097-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1027-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1029 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3105 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1029 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1030-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3106-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1030-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1040 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3138 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1040 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1041-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3139-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1041-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1054 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3180 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1054 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1055-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3181-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1055-1368, and the reverse transcriptase (RT).
The recombinant AAV-N viral vector AAV-N-PECas9C1-1068 is a recombinant expression vector obtained by replacing the segment between nucleotides 3915-6431 of AAV-N with the DNA molecule shown in nucleotides 22-3222 of SEQ ID No. 4, deleting the segment between nucleotides 7299-7667, and keeping the other nucleotides of AAV-N unchanged; it expresses the N-terminal segment of the Cas9 nickase named Cas9N-1-1068 and the N-terminal segment of the intein. The recombinant AAV-C viral vector AAV-C-PECas9C1069-1368 is a recombinant expression vector obtained by replacing the segment between nucleotides 4041-6767 of AAV-C with the DNA molecule shown in nucleotides 3223-6318 of SEQ ID No. 4 and keeping the other nucleotides of AAV-C unchanged; it expresses the C-terminal segment of the intein, the C-terminal segment of the Cas9 nickase named Cas9n-1069-1368, and the reverse transcriptase (RT).
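For the PE constructs, each AAV-N insert runs from nucleotide 22 of SEQ ID No. 4 to the end of the last N-terminal codon, and each AAV-C insert starts at the next nucleotide and runs to 6318, the end of the RT coding region. The Python sketch below reproduces the listed insert coordinates from the split site using the relation N-insert end = 21 + 3 × (site − 1), which is inferred from the listed values rather than stated in the source, and prints how the coding payload is divided between the two vectors. The names are hypothetical.

```python
# Hypothetical helper: (AAV-N insert, AAV-C insert) coordinates in SEQ ID No. 4
# for a PE split after residue `site`. The rule is inferred from the listed values.

CAS9_START = 22   # first Cas9(H840A) nucleotide in SEQ ID No. 4
RT_END = 6318     # last nucleotide of the RT coding region

LISTED_N_ENDS = {1015: 3063, 1022: 3084, 1026: 3096, 1029: 3105,
                 1040: 3138, 1054: 3180, 1068: 3222}


def split_inserts(site: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((N start, N end), (C start, C end)) as inclusive coordinates."""
    n_end = CAS9_START - 1 + 3 * (site - 1)
    return (CAS9_START, n_end), (n_end + 1, RT_END)


if __name__ == "__main__":
    for site, listed_end in LISTED_N_ENDS.items():
        (n_start, n_end), (c_start, c_end) = split_inserts(site)
        assert n_end == listed_end, site
        print(f"{site}: AAV-N {n_start}-{n_end} ({n_end - n_start + 1} nt), "
              f"AAV-C {c_start}-{c_end} ({c_end - c_start + 1} nt)")
```

Moving the split toward the C terminus shifts coding sequence from the AAV-C insert to the AAV-N insert: the N-terminal insert grows from 3042 nt (split at 1015) to 3201 nt (split at 1068), while the C-terminal insert shrinks from 3255 nt to 3096 nt.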
A 161-bp double-stranded DNA fragment (the pegRNA gene) was synthesized by the same company; the sequence of one of its strands is SEQ ID No. 7 (shown as an image in the original publication). In SEQ ID No. 7, the wavy underline marks the target sequence (located in the VEGFA genomic sequence, GenBank accession no. AC103801.2), the single underline marks the RT template sequence, the double underline marks the PBS (primer binding site) sequence, and the dotted underline marks the homology arms.
A. Using the recombinant AAV-C viral vector AAV-C-PECas9C1016-1368 as the template and AAV-C-F (ttcctgcccgaccttgcggc) and AAV-C-R (cggtgtttcgtcctttccacaagatata) as primers, PCR amplification was performed to obtain the target vector fragment. The target vector fragment was joined to the 161-bp double-stranded DNA fragment by homologous recombination to obtain the recombinant expression vector AAV-C-PECas9C1016-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1016-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1023-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1023-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1023-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1027-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1027-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1027-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1030-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1030-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1030-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1041-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1041-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1041-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1055-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1055-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1055-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368 in A was replaced with AAV-C-PECas9C1069-1368, and the other operations were the same as in A, giving the recombinant expression vector AAV-C-PECas9C1069-1368-pegRNA, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1069-1368, and the reverse transcriptase (RT).
3.2 Replacement of the promoter in the 3.1 expression vectors
Because the original promoter sequence in the 3.1 expression vectors is long, it was replaced with the shorter EF-1α core promoter for the PE system. The EF-1α core promoter has the sequence 5'-ctagatcagggtaccgggcagagcgcacatcgcccacagtccccgagaagttggggggaggggtcggcaattgatccggtgcctagagaaggtggcgcggggtaaactgggaaagtgatgtcgtgtactggctccgcctttttcccgagggtgggggagaaccgtatataagtgcagtagtcgccgtgaacgttctttttcgcaacgggtttgccgccagaacacagtggcaccggtccaac-3' (SEQ ID No. 8; the 15-bp homology arms are underlined in the original publication).
B. Using the recombinant expression vector AAV-C-PECas9C1016-1368-pegRNA as the template and AAV-V-F (5'-gttggaccggtgccaccatga-3') and AAV-V-R (5'-ggtaccctgatctagaggccgc-3') as primers, PCR amplification was performed to obtain the target vector fragment. The target vector fragment was joined to the EF-1α core promoter with the nucleotide sequence of SEQ ID No. 8 by homologous recombination to obtain the recombinant expression vector AAV-C-PECas9C1016-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1016-1368, and the reverse transcriptase (RT).
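Homology-based assembly of the EF-1α core promoter into the PCR-linearized vector relies on the 15-bp homology arms of SEQ ID No. 8. The Python sketch below checks the junction defined by primer AAV-V-R: at the vector end produced by this primer, the top strand reads as the reverse complement of the primer, and its last 15 nt should equal the first 15 nt of SEQ ID No. 8. The function names are hypothetical, and only the first 21 nt of SEQ ID No. 8 are used.

```python
# Check the 15-bp homology arm at the AAV-V-R junction (sequences from the text).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_matches(vector_end_top: str, insert: str, arm_len: int = 15) -> bool:
    """True if the last `arm_len` nt of the vector end equal the insert's first `arm_len` nt."""
    return vector_end_top[-arm_len:].lower() == insert[:arm_len].lower()


AAV_V_R = "ggtaccctgatctagaggccgc"
SEQ8_START = "ctagatcagggtaccgggcag"  # 5' end of SEQ ID No. 8

vector_end = revcomp(AAV_V_R)  # top-strand sequence at this end of the linearized vector
print(vector_end)                           # gcggcctctagatcagggtacc
print(arm_matches(vector_end, SEQ8_START))  # True
```

The shared 15 nt (ctagatcagggtacc) are what the recombination enzyme anneals at this junction.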
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1023-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1023-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1023-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1027-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1027-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1027-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1030-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1030-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1030-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1041-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1041-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1041-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1055-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1055-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1055-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-C-PECas9C1069-1368-pegRNA, and the other operations were the same as in B, giving the recombinant expression vector AAV-C-PECas9C1069-1368-pegRNA-EF-1α, which expresses the pegRNA, the C-terminal fragment of the intein, the C-terminal fragment of the Cas9 nickase named Cas9n-1069-1368, and the reverse transcriptase (RT).
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1015, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1015-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1015 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1022, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1022-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1022 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1026, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1026-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1026 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1029, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1029-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1029 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1040, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1040-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1040 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1054, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1054-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1054 and the N-terminal fragment of the intein.
AAV-C-PECas9C1016-1368-pegRNA in B was replaced with AAV-N-PECas9C1-1068, and the other operations were the same as in B, giving the recombinant expression vector AAV-N-PECas9C1-1068-EF-1α, which expresses the N-terminal fragment of the Cas9 nickase named Cas9N-1-1068 and the N-terminal fragment of the intein.
The N-terminal segment of the intein is polypeptide with an amino acid sequence shown as SEQ ID No.5, and the C-terminal segment of the intein is polypeptide with an amino acid sequence shown as SEQ ID No. 6. The Reverse Transcriptase (RT) is the protein encoded by the gene coding for position 4222-6318 of SEQ ID No. 4.
The N-terminal fragment of the Cas9 nickase named Cas9N-1-1015 is a protein with the nucleotide sequence encoded by the coding gene at 1210-3063 site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1016-1368 is a protein with the nucleotide sequence coded by the coding gene at the 3063-4122 th position of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1-1022 is a protein with the nucleotide sequence being encoded by the 1210-3084 coding gene of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1023-1368 is a protein with the nucleotide sequence encoded by the encoding gene at the No. 3085-4122 position of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1-1026 is a protein with a nucleotide sequence which is encoded by an encoding gene at 1210-3096 th site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1027-1368 is a protein with the nucleotide sequence encoded by the encoding gene at the 3097-4122 th site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1-1029 is a protein with the nucleotide sequence being encoded by the encoding gene at 1210-3105 of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1030-1368 is a protein with the nucleotide sequence coded by a coding gene at the 3106-4122 position of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1-1040 is a protein with the nucleotide sequence encoded by the coding gene at 1210-3138 th site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1041-1368 is a protein with the nucleotide sequence encoded by the encoding gene at the 3139-4122 th site of SEQ ID No. 4; the name is Cas9N-1-1054, the N-terminal fragment of Cas9 nickase is protein with the nucleotide sequence being encoded by the 1210-3180 coding gene of SEQ ID No. 
4; the N-terminal fragment of the Cas9 nickase named Cas9N-1055-1368 is a protein with a nucleotide sequence which is encoded by an encoding gene at the 3181-4122 th site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1-1068 is a protein with the nucleotide sequence being encoded by the encoding gene at 1210-3222 site of SEQ ID No. 4; the N-terminal fragment of the Cas9 nickase named Cas9N-1069-1368 is a protein with the nucleotide sequence encoded by the encoding gene at the 3223-4122 position of SEQ ID No. 4.
Maps of the recombinant AAV-N viral vector AAV-N-PE-Cas9C1-1026-EF-1α (FIG. 26) and the recombinant AAV-C viral vector AAV-C-PE-Cas9C1027-1368-pegRNA-EF-1α (FIG. 27) are shown as examples.
For each split site, the recombinant vectors expressing the corresponding N-terminal and C-terminal Cas9 fragments were co-transfected into HEK293T cells at 0.5 μg each, using 3 μl of PEI. Each plasmid combination was transfected in 3 replicates, and the full-length PE plasmid together with the gRNA was transfected as a control. After 72 hours, genomic DNA of the transfected HEK293T cells was extracted with a rapid DNA extraction solution (Epicentre, USA). A 200-300 bp region around the edited site was amplified by PCR with Taq DNA polymerase (Kangshiji, China), and the PCR products were subjected to high-throughput sequencing (Kingzhi, China) to calculate the editing efficiency. The N-terminal and C-terminal vector pairs used are the combinations numbered 1-7 in Table 3.
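The text does not spell out how editing efficiency is computed from the sequencing reads. One common approach, shown here as a hedged Python sketch, is to count, at each target position, the fraction of reads carrying the intended base and then average the replicates. It assumes the reads have already been aligned to the amplicon without indels, so that a fixed offset marks the edit window; the function names and the toy reads are hypothetical.

```python
from statistics import mean


def per_base_conversion(reads: list[str], start: int, alt: str) -> list[float]:
    """Fraction of reads carrying each base of `alt`, starting at 0-based `start`.

    Assumes every read is pre-aligned to the amplicon reference with no indels,
    so position `start + i` in a read corresponds to the same reference base.
    """
    if not reads:
        return [0.0] * len(alt)
    return [
        sum(1 for read in reads if read[start + i] == base) / len(reads)
        for i, base in enumerate(alt)
    ]


def replicate_mean(replicates: list[list[float]]) -> list[float]:
    """Average per-base rates position by position across replicate transfections."""
    return [mean(values) for values in zip(*replicates)]


if __name__ == "__main__":
    # Toy reads: positions 2-4 (1-based) hold the CCT target; GGG is the intended edit.
    reads = ["ACCTA", "AGGGA", "ACGGA", "ACCGA"]
    print(per_base_conversion(reads, start=1, alt="GGG"))  # [0.25, 0.5, 0.75]
```

Reporting one rate per base, rather than a single all-or-nothing rate for the whole CCT>GGG change, mirrors how the figures below present three separate percentages.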
Table 3. Groups of recombinant expression vectors used to transfect HEK293T cells
Group number Split site Recombinant vector combination
1 1015 AAV-N-PE-Cas9C1-1015-EF-1 alpha and AAV-C-PE-Cas9C1016-1368-pegRNA-EF-1 alpha
2 1022 AAV-N-PE-Cas9C1-1022-EF-1 alpha and AAV-C-PE-Cas9C1023-1368-pegRNA-EF-1 alpha
3 1026 AAV-N-PE-Cas9C1-1026-EF-1 alpha and AAV-C-PE-Cas9C1027-1368-pegRNA-EF-1 alpha
4 1029 AAV-N-PE-Cas9C1-1029-EF-1 alpha and AAV-C-PE-Cas9C1030-1368-pegRNA-EF-1 alpha
5 1040 AAV-N-PE-Cas9C1-1040-EF-1 alpha and AAV-C-PE-Cas9C1041-1368-pegRNA-EF-1 alpha
6 1054 AAV-N-PE-Cas9C1-1054-EF-1 alpha and AAV-C-PE-Cas9C1055-1368-pegRNA-EF-1 alpha
7 1068 AAV-N-PE-Cas9C1-1068-EF-1 alpha and AAV-C-PE-Cas9C1069-1368-pegRNA-EF-1 alpha
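The vector names in Table 3 follow a regular pattern: for a split after residue s, the AAV-N vector carries Cas9 residues 1 to s, and the AAV-C vector carries residues s + 1 to 1368 together with the pegRNA cassette. The small Python sketch below regenerates the seven combinations from the split sites; it only reproduces the naming convention visible in the table and makes no claim about vector contents beyond it.

```python
# Sketch of the naming convention used in Table 3.
CAS9_LENGTH = 1368
SPLIT_SITES = (1015, 1022, 1026, 1029, 1040, 1054, 1068)


def vector_pair(site: int) -> tuple[str, str]:
    """Return the (AAV-N, AAV-C) vector names for a split after residue `site`."""
    n_vector = f"AAV-N-PE-Cas9C1-{site}-EF-1 alpha"
    c_vector = f"AAV-C-PE-Cas9C{site + 1}-{CAS9_LENGTH}-pegRNA-EF-1 alpha"
    return n_vector, c_vector


if __name__ == "__main__":
    for group, site in enumerate(SPLIT_SITES, start=1):
        n_vector, c_vector = vector_pair(site)
        print(group, site, n_vector, "and", c_vector)
```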
The experimental results are as follows. The VEGFA site was chosen as the gRNA target for analysis, and the efficiency with which PE split at the different sites edited VEGFA was determined; the results are shown in Table 4. In each of FIGs. 17-20, the edited bases are the CCT at positions 2-4 from the left of the first row. In FIG. 17 (split site 1022), the frequencies of conversion of these three bases to GGG are 6%, 7% and 10%, respectively; in FIG. 18 (split site 1026), 8%, 10% and 13%; in FIG. 19 (split site 1068), 5%, 7% and 11%; and in FIG. 20 (unsplit control), 19%, 18% and 21%.
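The per-base rates above can be checked against Table 4: for each figure, the efficiency listed in the table equals the highest of the three per-base conversion rates, which is also the rate at position 4. The Python sketch below encodes the values from the text and performs that comparison; reading Table 4 as reporting the maximum per-base rate is an interpretation, not something the source states.

```python
# Per-base CCT>GGG conversion rates (%) at positions 2, 3 and 4, as stated in the text.
FIGURE_RATES = {
    "FIG. 17 (split 1022)": (6, 7, 10),
    "FIG. 18 (split 1026)": (8, 10, 13),
    "FIG. 19 (split 1068)": (5, 7, 11),
    "FIG. 20 (unsplit control)": (19, 18, 21),
}
# Editing efficiency (%) listed for the same figures in Table 4.
TABLE4_EFFICIENCY = {
    "FIG. 17 (split 1022)": 10,
    "FIG. 18 (split 1026)": 13,
    "FIG. 19 (split 1068)": 11,
    "FIG. 20 (unsplit control)": 21,
}


def max_rate_matches_table() -> dict[str, bool]:
    """For each figure, check whether the Table 4 value equals the highest per-base rate."""
    return {
        fig: max(rates) == TABLE4_EFFICIENCY[fig]
        for fig, rates in FIGURE_RATES.items()
    }


if __name__ == "__main__":
    for fig, ok in max_rate_matches_table().items():
        print(fig, "matches" if ok else "differs")
```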
These results show that when Cas9 is split at residue 1022, 1026 or 1068, co-transfection of the N-terminal and C-terminal fragments restores Cas9 activity and the VEGFA site is edited effectively, so the split PE system can edit a specific genomic locus. In contrast, editing activity is low when Cas9 is split at the other four sites (1015, 1029, 1040 and 1054), and these split sites are not suitable for subsequent applications.
Table 4. Efficiency of VEGFA editing by PE split at different sites
Split site Number of edited sites Mutation type Editing efficiency Corresponding figure
1015 1 CCT>GGG 7% None
1022 1 CCT>GGG 10% FIG. 17
1026 1 CCT>GGG 13% FIG. 18
1029 1 CCT>GGG 6% None
1040 1 CCT>GGG 6% None
1054 1 CCT>GGG 5% None
1068 1 CCT>GGG 11% FIG. 19
Unsplit control 1 CCT>GGG 21% FIG. 20
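To compare the split variants with the unsplit PE, each efficiency in Table 4 can be expressed as a fraction of the 21% control. This Python sketch does that and ranks the split sites; the three sites retained in the text (1022, 1026 and 1068) come out on top, at roughly 48% to 62% of the control activity. No cut-off value is stated in the source, so the sketch only ranks and does not classify.

```python
# Editing efficiencies (%) at VEGFA from Table 4, keyed by Cas9 split site.
SPLIT_EFFICIENCY = {1015: 7, 1022: 10, 1026: 13, 1029: 6, 1040: 6, 1054: 5, 1068: 11}
UNSPLIT_CONTROL = 21


def relative_activity(efficiency: float, control: float = UNSPLIT_CONTROL) -> float:
    """Editing efficiency of a split PE as a fraction of the unsplit control."""
    return efficiency / control


def rank_sites() -> list[int]:
    """Split sites ordered from highest to lowest efficiency (ties keep table order)."""
    return sorted(SPLIT_EFFICIENCY, key=SPLIT_EFFICIENCY.get, reverse=True)


if __name__ == "__main__":
    for site in rank_sites():
        eff = SPLIT_EFFICIENCY[site]
        print(f"split {site}: {eff}% ({relative_activity(eff):.0%} of unsplit control)")
```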
The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that the invention can be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
Sequence listing
<110> Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences
<120> Composition for base editing
<160> 10
<170> SIPOSequenceListing 1.0
<210> 1
<211> 5523
<212> DNA
<213> Artificial Sequence
<400> 1
ccaaagaaga agcggaaagt ctcctcagag actgggcctg tcgccgtcga tccaaccctg 60
cgccgccgga ttgaacctca cgagtttgaa gtgttctttg acccccggga gctgagaaag 120
gagacatgcc tgctgtacga gatcaactgg ggaggcaggc actccatctg gaggcacacc 180
tctcagaaca caaataagca cgtggaggtg aacttcatcg agaagtttac cacagagcgg 240
tacttctgcc ccaataccag atgtagcatc acatggtttc tgagctggtc cccttgcgga 300
gagtgtagca gggccatcac cgagttcctg tccagatatc cacacgtgac actgtttatc 360
tacatcgcca ggctgtatca ccacgcagac ccaaggaata ggcagggcct gcgcgatctg 420
atcagctccg gcgtgaccat ccagatcatg acagagcagg agtccggcta ctgctggcgg 480
aacttcgtga attattctcc tagcaacgag gcccactggc ctaggtaccc acacctgtgg 540
gtgcgcctgt acgtgctgga gctgtattgc atcatcctgg gcctgccccc ttgtctgaat 600
atcctgcgga gaaagcagcc ccagctgacc ttctttacaa tcgccctgca gtcttgtcac 660
tatcagaggc tgccacccca catcctgtgg gccacaggcc tgaagtctgg aggatctagc 720
ggaggatcct ctggcagcga gacaccagga acaagcgagt cagcaacacc agagagcagt 780
ggcggcagca gcggcggcag cgacaagaag tacagcatcg gcctggccat cggcaccaac 840
tctgtgggct gggccgtgat caccgacgag tacaaggtgc ccagcaagaa attcaaggtg 900
ctgggcaaca ccgaccggca cagcatcaag aagaacctga tcggagccct gctgttcgac 960
agcggcgaaa cagccgaggc cacccggctg aagagaaccg ccagaagaag atacaccaga 1020
cggaagaacc ggatctgcta tctgcaagag atcttcagca acgagatggc caaggtggac 1080
gacagcttct tccacagact ggaagagtcc ttcctggtgg aagaggataa gaagcacgag 1140
cggcacccca tcttcggcaa catcgtggac gaggtggcct accacgagaa gtaccccacc 1200
atctaccacc tgagaaagaa actggtggac agcaccgaca aggccgacct gcggctgatc 1260
tatctggccc tggcccacat gatcaagttc cggggccact tcctgatcga gggcgacctg 1320
aaccccgaca acagcgacgt ggacaagctg ttcatccagc tggtgcagac ctacaaccag 1380
ctgttcgagg aaaaccccat caacgccagc ggcgtggacg ccaaggccat cctgtctgcc 1440
agactgagca agagcagacg gctggaaaat ctgatcgccc agctgcccgg cgagaagaag 1500
aatggcctgt tcggaaacct gattgccctg agcctgggcc tgacccccaa cttcaagagc 1560
aacttcgacc tggccgagga tgccaaactg cagctgagca aggacaccta cgacgacgac 1620
ctggacaacc tgctggccca gatcggcgac cagtacgccg acctgtttct ggccgccaag 1680
aacctgtccg acgccatcct gctgagcgac atcctgagag tgaacaccga gatcaccaag 1740
gcccccctga gcgcctctat gatcaagaga tacgacgagc accaccagga cctgaccctg 1800
ctgaaagctc tcgtgcggca gcagctgcct gagaagtaca aagagatttt cttcgaccag 1860
agcaagaacg gctacgccgg ctacattgac ggcggagcca gccaggaaga gttctacaag 1920
ttcatcaagc ccatcctgga aaagatggac ggcaccgagg aactgctcgt gaagctgaac 1980
agagaggacc tgctgcggaa gcagcggacc ttcgacaacg gcagcatccc ccaccagatc 2040
cacctgggag agctgcacgc cattctgcgg cggcaggaag atttttaccc attcctgaag 2100
gacaaccggg aaaagatcga gaagatcctg accttccgca tcccctacta cgtgggccct 2160
ctggccaggg gaaacagcag attcgcctgg atgaccagaa agagcgagga aaccatcacc 2220
ccctggaact tcgaggaagt ggtggacaag ggcgcttccg cccagagctt catcgagcgg 2280
atgaccaact tcgataagaa cctgcccaac gagaaggtgc tgcccaagca cagcctgctg 2340
tacgagtact tcaccgtgta taacgagctg accaaagtga aatacgtgac cgagggaatg 2400
agaaagcccg ccttcctgag cggcgagcag aaaaaggcca tcgtggacct gctgttcaag 2460
accaaccgga aagtgaccgt gaagcagctg aaagaggact acttcaagaa aatcgagtgc 2520
ttcgactccg tggaaatctc cggcgtggaa gatcggttca acgcctccct gggcacatac 2580
cacgatctgc tgaaaattat caaggacaag gacttcctgg acaatgagga aaacgaggac 2640
attctggaag atatcgtgct gaccctgaca ctgtttgagg acagagagat gatcgaggaa 2700
cggctgaaaa cctatgccca cctgttcgac gacaaagtga tgaagcagct gaagcggcgg 2760
agatacaccg gctggggcag gctgagccgg aagctgatca acggcatccg ggacaagcag 2820
tccggcaaga caatcctgga tttcctgaag tccgacggct tcgccaacag aaacttcatg 2880
cagctgatcc acgacgacag cctgaccttt aaagaggaca tccagaaagc ccaggtgtcc 2940
ggccagggcg atagcctgca cgagcacatt gccaatctgg ccggcagccc cgccattaag 3000
aagggcatcc tgcagacagt gaaggtggtg gacgagctcg tgaaagtgat gggccggcac 3060
aagcccgaga acatcgtgat cgaaatggcc agagagaacc agaccaccca gaagggacag 3120
aagaacagcc gcgagagaat gaagcggatc gaagagggca tcaaagagct gggcagccag 3180
atcctgaaag aacaccccgt ggaaaacacc cagctgcaga acgagaagct gtacctgtac 3240
tacctgcaga atgggcggga tatgtacgtg gaccaggaac tggacatcaa ccggctgtcc 3300
gactacgatg tggaccatat cgtgcctcag agctttctga aggacgactc catcgacaac 3360
aaggtgctga ccagaagcga caagaaccgg ggcaagagcg acaacgtgcc ctccgaagag 3420
gtcgtgaaga agatgaagaa ctactggcgg cagctgctga acgccaagct gattacccag 3480
agaaagttcg acaatctgac caaggccgag agaggcggcc tgagcgaact ggataaggcc 3540
ggcttcatca agagacagct ggtggaaacc cggcagatca caaagcacgt ggcacagatc 3600
ctggactccc ggatgaacac taagtacgac gagaatgaca agctgatccg ggaagtgaaa 3660
gtgatcaccc tgaagtccaa gctggtgtcc gatttccgga aggatttcca gttttacaaa 3720
gtgcgcgaga tcaacaacta ccaccacgcc cacgacgcct acctgaacgc cgtcgtggga 3780
accgccctga tcaaaaagta ccctaagctg gaaagcgagt tcgtgtacgg cgactacaag 3840
gtgtacgacg tgcggaagat gatcgccaag agcgagcagg aaatcggcaa ggctaccgcc 3900
aagtacttct tctacagcaa catcatgaac tttttcaaga ccgagattac cctggccaac 3960
ggcgagatcc ggaagcggcc tctgatcgag acaaacggcg aaaccgggga gatcgtgtgg 4020
gataagggcc gggattttgc caccgtgcgg aaagtgctga gcatgcccca agtgaatatc 4080
gtgaaaaaga ccgaggtgca gacaggcggc ttcagcaaag agtctatcct gcccaagagg 4140
aacagcgata agctgatcgc cagaaagaag gactgggacc ctaagaagta cggcggcttc 4200
gacagcccca ccgtggccta ttctgtgctg gtggtggcca aagtggaaaa gggcaagtcc 4260
aagaaactga agagtgtgaa agagctgctg gggatcacca tcatggaaag aagcagcttc 4320
gagaagaatc ccatcgactt tctggaagcc aagggctaca aagaagtgaa aaaggacctg 4380
atcatcaagc tgcctaagta ctccctgttc gagctggaaa acggccggaa gagaatgctg 4440
gcctctgccg gcgaactgca gaagggaaac gaactggccc tgccctccaa atatgtgaac 4500
ttcctgtacc tggccagcca ctatgagaag ctgaagggct cccccgagga taatgagcag 4560
aaacagctgt ttgtggaaca gcacaagcac tacctggacg agatcatcga gcagatcagc 4620
gagttctcca agagagtgat cctggccgac gctaatctgg acaaagtgct gtccgcctac 4680
aacaagcacc gggataagcc catcagagag caggccgaga atatcatcca cctgtttacc 4740
ctgaccaatc tgggagcccc tgccgccttc aagtactttg acaccaccat cgaccggaag 4800
aggtacacca gcaccaaaga ggtgctggac gccaccctga tccaccagag catcaccggc 4860
ctgtacgaga cacggatcga cctgtctcag ctgggaggtg acagcggcgg gagcggcggg 4920
agcgggggga gcactaatct gagcgacatc attgagaagg agactgggaa acagctggtc 4980
attcaggagt ccatcctgat gctgcctgag gaggtggagg aagtgatcgg caacaagcca 5040
gagtctgaca tcctggtgca caccgcctac gacgagtcca cagatgagaa tgtgatgctg 5100
ctgacctctg acgcccccga gtataagcct tgggccctgg tcatccagga ttctaacggc 5160
gagaataaga tcaagatgct gagcggagga tccggaggat ctggaggcag caccaacctg 5220
tctgacatca tcgagaagga gacaggcaag cagctggtca tccaggagag catcctgatg 5280
ctgcccgaag aagtcgaaga agtgatcgga aacaagcctg agagcgatat cctggtccat 5340
accgcctacg acgagagtac cgacgaaaat gtgatgctgc tgacatccga cgccccagag 5400
tataagccct gggctctggt catccaggat tccaacggag agaacaaaat caaaatgctg 5460
tctggcggct caaaaagaac cgccgacggc agcgaattcg agcccaagaa gaagaggaaa 5520
gtc 5523
<210> 2
<211> 5466
<212> DNA
<213> Artificial Sequence
<400> 2
aagaggaccg ccgatggctc tgagttcgag agccccaaga agaagcggaa ggtggaggca 60
tctccagcaa gcggaccaag gcacctgatg gacccccaca tcttcacctc taactttaac 120
aatggcatcg gcaggcacaa gacatacctg tgctatgagg tggagcgcct ggacaacggc 180
accagcgtga agatggatca gcacagaggc ttcctgcaca accaggccaa gaatctgctg 240
tgcggcttct acggccggca cgcagagctg agatttctgg acctggtgcc tagcctgcag 300
ctggatccag cccagatcta tagggtgacc tggttcatca gctggtcccc atgcttttcc 360
tggggatgtg caggagaggt gcgcgccttc ctgcaggaga atacacacgt gcggctgaga 420
atctttgccg cccggatcta cgactatgat cctctgtaca aggaggccct gcagatgctg 480
agagacgcag gagcccaggt gtccatcatg acctatgatg agttcaagca ctgctgggac 540
acatttgtgg atcaccaggg ctgtcccttt cagccttggg acggactgga tgagcactcc 600
caggccctgt ctggcaggct gagggccatc ctgcagaacc agggcaattc tggcggatct 660
agcggtggat ctagcggctc tgagacccct ggaacatccg aatccgccac tccagagagc 720
agcggaggct cttctggagg atcagacaag aagtacagca tcggcctggc catcggcacc 780
aactctgtgg gctgggccgt gatcaccgac gagtacaagg tgcccagcaa gaaattcaag 840
gtgctgggca acaccgaccg gcacagcatc aagaagaacc tgatcggagc cctgctgttc 900
gacagcggcg aaacagccga ggccaccgcc ctgaagagaa ccgccagaag aagatacacc 960
agacggaaga accggatctg ctatctgcaa gagatcttca gcaacgagat ggccaaggtg 1020
gacgacagct tcttccacag actggaagag tccttcctgg tggaagagga taagaagcac 1080
gagcggcacc ccatcttcgg caacatcgtg gacgaggtgg cctaccacga gaagtacccc 1140
accatctacc acctgagaaa gaaactggtg gacagcaccg acaaggccga cctgcggctg 1200
atctatctgg ccctggccca catgatcaag ttccggggcc acttcctgat cgagggcgac 1260
ctgaaccccg acaacagcga cgtggacaag ctgttcatcc agctggtgca gacctacaac 1320
cagctgttcg aggaaaaccc catcaacgcc agcggcgtgg acgccaaggc catcctgtct 1380
gccagactga gcaagagcag acggctggaa aatctgatcg cccagctgcc cggcgagaag 1440
aagaatggcc tgttcggaaa cctgattgcc ctgagcctgg gcctgacccc caacttcaag 1500
agcaacttcg acctggccga ggatgccaaa ctgcagctga gcaaggacac ctacgacgac 1560
gacctggaca acctgctggc ccagatcggc gaccagtacg ccgacctgtt tctggccgcc 1620
aagaacctgt ccgacgccat cctgctgagc gacatcctga gagtgaacac cgagatcacc 1680
aaggcccccc tgagcgcctc tatgatcaag agatacgacg agcaccacca ggacctgacc 1740
ctgctgaaag ctctcgtgcg gcagcagctg cctgagaagt acaaagagat tttcttcgac 1800
cagagcaaga acggctacgc cggctacatt gacggcggag ccagccagga agagttctac 1860
aagttcatca agcccatcct ggaaaagatg gacggcaccg aggaactgct cgtgaagctg 1920
aacagagagg acctgctgcg gaagcagcgg accttcgaca acggcagcat cccccaccag 1980
atccacctgg gagagctgca cgccattctg cggcggcagg aagattttta cccattcctg 2040
aaggacaacc gggaaaagat cgagaagatc ctgaccttcc gcatccccta ctacgtgggc 2100
cctctggcca ggggaaacag cagattcgcc tggatgacca gaaagagcga ggaaaccatc 2160
accccctgga acttcgagga agtggtggac aagggcgctt ccgcccagag cttcatcgag 2220
cggatgacca acttcgataa gaacctgccc aacgagaagg tgctgcccaa gcacagcctg 2280
ctgtacgagt acttcaccgt gtataacgag ctgaccaaag tgaaatacgt gaccgaggga 2340
atgagaaagc ccgccttcct gagcggcgag cagaaaaagg ccatcgtgga cctgctgttc 2400
aagaccaacc ggaaagtgac cgtgaagcag ctgaaagagg actacttcaa gaaaatcgag 2460
tgcttcgact ccgtggaaat ctccggcgtg gaagatcggt tcaacgcctc cctgggcaca 2520
taccacgatc tgctgaaaat tatcaaggac aaggacttcc tggacaatga ggaaaacgag 2580
gacattctgg aagatatcgt gctgaccctg acactgtttg aggacagaga gatgatcgag 2640
gaacggctga aaacctatgc ccacctgttc gacgacaaag tgatgaagca gctgaagcgg 2700
cggagataca ccggctgggg caggctgagc cggaagctga tcaacggcat ccgggacaag 2760
cagtccggca agacaatcct ggatttcctg aagtccgacg gcttcgccaa cagaaacttc 2820
atgcagctga tccacgacga cagcctgacc tttaaagagg acatccagaa agcccaggtg 2880
tccggccagg gcgatagcct gcacgagcac attgccaatc tggccggcag ccccgccatt 2940
aagaagggca tcctgcagac agtgaaggtg gtggacgagc tcgtgaaagt gatgggccgg 3000
cacaagcccg agaacatcgt gatcgaaatg gccagagaga accagaccac ccagaaggga 3060
cagaagaaca gccgcgagag aatgaagcgg atcgaagagg gcatcaaaga gctgggcagc 3120
cagatcctga aagaacaccc cgtggaaaac acccagctgc agaacgagaa gctgtacctg 3180
tactacctgc agaatgggcg ggatatgtac gtggaccagg aactggacat caaccggctg 3240
tccgactacg atgtggacca tatcgtgcct cagagctttc tgaaggacga ctccatcgac 3300
aacaaggtgc tgaccagaag cgacaagaac cggggcaaga gcgacaacgt gccctccgaa 3360
gaggtcgtga agaagatgaa gaactactgg cggcagctgc tgaacgccaa gctgattacc 3420
cagagaaagt tcgacaatct gaccaaggcc gagagaggcg gcctgagcga actggataag 3480
gccggcttca tcaagagaca gctggtggaa acccggcaga tcacaaagca cgtggcacag 3540
atcctggact cccggatgaa cactaagtac gacgagaatg acaagctgat ccgggaagtg 3600
aaagtgatca ccctgaagtc caagctggtg tccgatttcc ggaaggattt ccagttttac 3660
aaagtgcgcg agatcaacaa ctaccaccac gcccacgacg cctacctaaa cgccgtcgtg 3720
ggaaccgccc tgatcaaaaa gtaccctaag ctggaaagcg agttcgtgta cggcgactac 3780
aaggtgtacg acgtgcggaa gatgatcgcc aagagcgagc aggaaatcgg caaggctacc 3840
gccaagtact tcttctacag caacatcatg aactttttca agaccgagat taccctggcc 3900
aacggcgaga tccggaagcg gcctctgatc gagacaaacg gcgaaaccgg ggagatcgtg 3960
tgggataagg gccgggattt tgccaccgtg cggaaagtgc tgagcatgcc ccaagtgaat 4020
atcgtgaaaa agaccgaggt gcagacaggc ggcttcagca aagagtctat cctgcccaag 4080
aggaacagcg ataagctgat cgccagaaag aaggactggg accctaagaa gtacggcggc 4140
ttcgacagcc ccaccgtggc ctattctgtg ctggtggtgg ccaaagtgga aaagggcaag 4200
tccaagaaac tgaagagtgt gaaagagctg ctggggatca ccatcatgga aagaagcagc 4260
ttcgagaaga atcccatcga ctttctggaa gccaagggct acaaagaagt gaaaaaggac 4320
ctgatcatca agctgcctaa gtactccctg ttcgagctgg aaaacggccg gaagagaatg 4380
ctggcctctg ccggcgaact gcagaaggga aacgaactgg ccctgccctc caaatatgtg 4440
aacttcctgt acctggccag ccactatgag aagctgaagg gctcccccga ggataatgag 4500
cagaaacagc tgtttgtgga acagcacaag cactacctgg acgagatcat cgagcagatc 4560
agcgagttct ccaagagagt gatcctggcc gacgctaatc tggacaaagt gctgtccgcc 4620
tacaacaagc accgggataa gcccatcaga gagcaggccg agaatatcat ccacctgttt 4680
accctgacca atctgggagc ccctgccgcc ttcaagtact ttgacaccac catcgaccgg 4740
aagaggtaca ccagcaccaa agaggtgctg gacgccaccc tgatccacca gagcatcacc 4800
ggcctgtacg agacacggat cgacctgtct cagctgggag gtgacagcgg cgggagcggc 4860
gggagcgggg ggagcactaa tctgagcgac atcattgaga aggagactgg gaaacagctg 4920
gtcattcagg agtccatcct gatgctgcct gaggaggtgg aggaagtgat cggcaacaag 4980
ccagagtctg acatcctggt gcacaccgcc tacgacgagt ccacagatga gaatgtgatg 5040
ctgctgacct ctgacgcccc cgagtataag ccttgggccc tggtcatcca ggattctaac 5100
ggcgagaata agatcaagat gctgagcgga ggatccggag gatctggagg cagcaccaac 5160
ctgtctgaca tcatcgagaa ggagacaggc aagcagctgg tcatccagga gagcatcctg 5220
atgctgcccg aagaagtcga agaagtgatc ggaaacaagc ctgagagcga tatcctggtc 5280
cataccgcct acgacgagag taccgacgaa aatgtgatgc tgctgacatc cgacgcccca 5340
gagtataagc cctgggctct ggtcatccag gattccaacg gagagaacaa aatcaaaatg 5400
ctgtctggcg gctcaaaaag aaccgccgac ggcagcgaat tcgagcccaa gaagaagagg 5460
aaagtc 5466
<210> 3
<211> 5373
<212> DNA
<213> Artificial Sequence
<400> 3
ccaaagaaga agcggaaagt ctctgaagtc gagtttagcc acgagtattg gatgaggcac 60
gcactgaccc tggcaaagcg agcatgggat gaaagagaag tccccgtggg cgccgtgctg 120
gtgcacaaca atagagtgat cggagaggga tggaacaggc caatcggccg ccacgaccct 180
accgcacacg cagagatcat ggcactgagg cagggaggcc tggtcatgca gaattaccgc 240
ctgatcgatg ccaccctgta tgtgacactg gagccatgcg tgatgtgcgc aggagcaatg 300
atccacagca ggatcggaag agtggtgttc ggagcacggg acgccaagac cggcgcagca 360
ggctccctga tggatgtgct gcaccacccc ggcatgaacc accgggtgga gatcacagag 420
ggaatcctgg cagacgagtg cgccgccctg ctgagcgatt tctttagaat gcggagacag 480
gagatcaagg cccagaagaa ggcacagagc tccaccgact ctggaggatc tagcggagga 540
tcctctggaa gcgagacacc aggcacaagc gagtccgcca caccagagag ctccggcggc 600
tcctccggag gatcctctga ggtggagttt tcccacgagt actggatgag acatgccctg 660
accctggcca agagggcacg cgatgagagg gaggtgcctg tgggagccgt gctggtgctg 720
aacaatagag tgatcggcga gggctggaac agagccatcg gcctgcacga cccaacagcc 780
catgccgaaa ttatggccct gagacagggc ggcctggtca tgcagaacta cagactgatt 840
gacgccaccc tgtacgtgac attcgagcct tgcgtgatgt gcgccggcgc catgatccac 900
tctaggatcg gccgcgtggt gtttggcgtg aggaacgcaa aaaccggcgc cgcaggctcc 960
ctgatggacg tgctgcacta ccccggcatg aatcaccgcg tcgaaattac cgagggaatc 1020
ctggcagatg aatgtgccgc cctgctgtgc tatttctttc ggatgcctag acaggtgttc 1080
aatgctcaga agaaggccca gagctccacc gactccggag gatctagcgg aggctcctct 1140
ggctctgaga cacctggcac aagcgagagc gcaacacctg aaagcagcgg gggcagcagc 1200
ggggggtcag acaagaagta cagcatcggc ctggccatcg gcaccaactc tgtgggctgg 1260
gccgtgatca ccgacgagta caaggtgccc agcaagaaat tcaaggtgct gggcaacacc 1320
gaccggcaca gcatcaagaa gaacctgatc ggagccctgc tgttcgacag cggcgaaaca 1380
gccgagagaa cccggctgaa gagaaccgcc agaagaagat acaccagacg gaagaaccgg 1440
atctgctatc tgcaagagat cttcagcaac gagatggcca aggtggacga cagcttcttc 1500
cacagactgg aagagtcctt cctggtggaa gaggataaga agcacgagcg gcaccccatc 1560
ttcggcaaca tcgtggacga ggtggcctac cacgagaagt accccaccat ctaccacctg 1620
agaaagaaac tggtggacag caccgacaag gccgacctgc ggctgatcta tctggccctg 1680
gcccacatga tcaagttccg gggccacttc ctgatcgagg gcgacctgaa ccccgacaac 1740
agcgacgtgg acaagctgtt catccagctg gtgcagacct acaaccagct gttcgaggaa 1800
aaccccatca acgccagcgg cgtggacgcc aaggccatcc tgtctgccag actgagcaag 1860
agcagacggc tggaaaatct gatcgcccag ctgcccggcg agaagaagaa tggcctgttc 1920
ggaaacctga ttgccctgag cctgggcctg acccccaact tcaagagcaa cttcgacctg 1980
gccgaggatg ccaaactgca gctgagcaag gacacctacg acgacgacct ggacaacctg 2040
ctggcccaga tcggcgacca gtacgccgac ctgtttctgg ccgccaagaa cctgtccgac 2100
gccatcctgc tgagcgacat cctgagagtg aacaccgaga tcaccaaggc ccccctgagc 2160
gcctctatga tcaagagata cgacgagcac caccaggacc tgaccctgct gaaagctctc 2220
gtgcggcagc agctgcctga gaagtacaaa gagattttct tcgaccagag caagaacggc 2280
tacgccggct acattgacgg cggagccagc caggaagagt tctacaagtt catcaagccc 2340
atcctggaaa agatggacgg caccgaggaa ctgctcgtga agctgaacag agaggacctg 2400
ctgcggaagc agcggacctt cgacaacggc agcatccccc accagatcca cctgggagag 2460
ctgcacgcca ttctgcggcg gcaggaagat ttttacccat tcctgaagga caaccgggaa 2520
aagatcgaga agatcctgac cttccgcatc ccctactacg tgggccctct ggccagggga 2580
aacagcagat tcgcctggat gaccagaaag agcgaggaaa ccatcacccc ctggaacttc 2640
gaggaagtgg tggacaaggg cgcttccgcc cagagcttca tcgagcggat gaccaacttc 2700
gataagaacc tgcccaacga gaaggtgctg cccaagcaca gcctgctgta cgagtacttc 2760
accgtgtata acgagctgac caaagtgaaa tacgtgaccg agggaatgag aaagcccgcc 2820
ttcctgagcg gcgagcagaa aaaggccatc gtggacctgc tgttcaagac caaccggaaa 2880
gtgaccgtga agcagctgaa agaggactac ttcaagaaaa tcgagtgctt cgactccgtg 2940
gaaatctccg gcgtggaaga tcggttcaac gcctccctgg gcacatacca cgatctgctg 3000
aaaattatca aggacaagga cttcctggac aatgaggaaa acgaggacat tctggaagat 3060
atcgtgctga ccctgacact gtttgaggac agagagatga tcgaggaacg gctgaaaacc 3120
tatgcccacc tgttcgacga caaagtgatg aagcagctga agcggcggag atacaccggc 3180
tggggcaggc tgagccggaa gctgatcaac ggcatccggg acaagcagtc cggcaagaca 3240
atcctggatt tcctgaagtc cgacggcttc gccaacagaa acttcatgca gctgatccac 3300
gacgacagcc tgacctttaa agaggacatc cagaaagccc aggtgtccgg ccagggcgat 3360
agcctgcacg agcacattgc caatctggcc ggcagccccg ccattaagaa gggcatcctg 3420
cagacagtga aggtggtgga cgagctcgtg aaagtgatgg gccggcacaa gcccgagaac 3480
atcgtgatcg aaatggccag agagaaccag accacccaga agggacagaa gaacagccgc 3540
gagagaatga agcggatcga agagggcatc aaagagctgg gcagccagat cctgaaagaa 3600
caccccgtgg aaaacaccca gctgcagaac gagaagctgt acctgtacta cctgcagaat 3660
gggcgggata tgtacgtgga ccaggaactg gacatcaacc ggctgtccga ctacgatgtg 3720
gaccatatcg tgcctcagag ctttctgaag gacgactcca tcgacaacaa ggtgctgacc 3780
agaagcgaca agaaccgggg caagagcgac aacgtgccct ccgaagaggt cgtgaagaag 3840
atgaagaact actggcggca gctgctgaac gccaagctga ttacccagag aaagttcgac 3900
aatctgacca aggccgagag aggcggcctg agcgaactgg ataaggccgg cttcatcaag 3960
agacagctgg tggaaacccg gcagatcaca aagcacgtgg cacagatcct ggactcccgg 4020
atgaacacta agtacgacga gaatgacaag ctgatccggg aagtgaaagt gatcaccctg 4080
aagtccaagc tggtgtccga tttccggaag gatttccagt tttacaaagt gcgcgagatc 4140
aacaactacc accacgccca cgacgcctac ctgaacgccg tcgtgggaac cgccctgatc 4200
aaaaagtacc ctaagctgga aagcgagttc gtgtacggcg actacaaggt gtacgacgtg 4260
cggaagatga tcgccaagag cgagcaggaa atcggcaagg ctaccgccaa gtacttcttc 4320
tacagcaaca tcatgaactt tttcaagacc gagattaccc tggccaacgg cgagatccgg 4380
aagcggcctc tgatcgagac aaacggcgaa accggggaga tcgtgtggga taagggccgg 4440
gattttgcca ccgtgcggaa agtgctgagc atgccccaag tgaatatcgt gaaaaagacc 4500
gaggtgcaga caggcggctt cagcaaagag tctatcagac ccaagaggaa cagcgataag 4560
ctgatcgcca gaaagaagga ctgggaccct aagaagtacg gcggcttcct gtggcccacc 4620
gtggcctatt ctgtgctggt ggtggccaaa gtggaaaagg gcaagtccaa gaaactgaag 4680
agtgtgaaag agctgctggg gatcaccatc atggaaagaa gcagcttcga gaagaatccc 4740
atcgactttc tggaagccaa gggctacaaa gaagtgaaaa aggacctgat catcaagctg 4800
cctaagtact ccctgttcga gctggaaaac ggccggaaga gaatgctggc ctctgccaag 4860
cagctgcaga agggaaacga actggccctg ccctccaaat atgtgaactt cctgtacctg 4920
gccagccact atgagaagct gaagggctcc cccgaggata atgagcagaa acagctgttt 4980
gtggaacagc acaagcacta cctggacgag atcatcgagc agatcagcga gttctccaag 5040
agagtgatcc tggccgacgc taatctggac aaagtgctgt ccgcctacaa caagcaccgg 5100
gataagccca tcagagagca ggccgagaat atcatccacc tgtttaccct gaccagactg 5160
ggagccccta gagccttcaa gtactttgac accaccatcg accccaagca gtacagaagc 5220
accaaagagg tgctggacgc caccctgatc caccagagca tcaccggcct gtacgagaca 5280
cggatcgacc tgtctcagct gggaggtgac tctggcggct caaaaagaac cgccgacggc 5340
agcgaattcg agcccaagaa gaagaggaaa gtc 5373
<210> 4
<211> 6318
<212> DNA
<213> Artificial Sequence
<400> 4
ccaaagaaga agcggaaagt cgacaagaag tacagcatcg gcctggacat cggcaccaac 60
tctgtgggct gggccgtgat caccgacgag tacaaggtgc ccagcaagaa attcaaggtg 120
ctgggcaaca ccgaccggca cagcatcaag aagaacctga tcggagccct gctgttcgac 180
agcggcgaaa cagccgaggc cacccggctg aagagaaccg ccagaagaag atacaccaga 240
cggaagaacc ggatctgcta tctgcaagag atcttcagca acgagatggc caaggtggac 300
gacagcttct tccacagact ggaagagtcc ttcctggtgg aagaggataa gaagcacgag 360
cggcacccca tcttcggcaa catcgtggac gaggtggcct accacgagaa gtaccccacc 420
atctaccacc tgagaaagaa actggtggac agcaccgaca aggccgacct gcggctgatc 480
tatctggccc tggcccacat gatcaagttc cggggccact tcctgatcga gggcgacctg 540
aaccccgaca acagcgacgt ggacaagctg ttcatccagc tggtgcagac ctacaaccag 600
ctgttcgagg aaaaccccat caacgccagc ggcgtggacg ccaaggccat cctgtctgcc 660
agactgagca agagcagacg gctggaaaat ctgatcgccc agctgcccgg cgagaagaag 720
aatggcctgt tcggaaacct gattgccctg agcctgggcc tgacccccaa cttcaagagc 780
aacttcgacc tggccgagga tgccaaactg cagctgagca aggacaccta cgacgacgac 840
ctggacaacc tgctggccca gatcggcgac cagtacgccg acctgtttct ggccgccaag 900
aacctgtccg acgccatcct gctgagcgac atcctgagag tgaacaccga gatcaccaag 960
gcccccctga gcgcctctat gatcaagaga tacgacgagc accaccagga cctgaccctg 1020
ctgaaagctc tcgtgcggca gcagctgcct gagaagtaca aagagatttt cttcgaccag 1080
agcaagaacg gctacgccgg ctacattgac ggcggagcca gccaggaaga gttctacaag 1140
ttcatcaagc ccatcctgga aaagatggac ggcaccgagg aactgctcgt gaagctgaac 1200
agagaggacc tgctgcggaa gcagcggacc ttcgacaacg gcagcatccc ccaccagatc 1260
cacctgggag agctgcacgc cattctgcgg cggcaggaag atttttaccc attcctgaag 1320
gacaaccggg aaaagatcga gaagatcctg accttccgca tcccctacta cgtgggccct 1380
ctggccaggg gaaacagcag attcgcctgg atgaccagaa agagcgagga aaccatcacc 1440
ccctggaact tcgaggaagt ggtggacaag ggcgcttccg cccagagctt catcgagcgg 1500
atgaccaact tcgataagaa cctgcccaac gagaaggtgc tgcccaagca cagcctgctg 1560
tacgagtact tcaccgtgta taacgagctg accaaagtga aatacgtgac cgagggaatg 1620
agaaagcccg ccttcctgag cggcgagcag aaaaaggcca tcgtggacct gctgttcaag 1680
accaaccgga aagtgaccgt gaagcagctg aaagaggact acttcaagaa aatcgagtgc 1740
ttcgactccg tggaaatctc cggcgtggaa gatcggttca acgcctccct gggcacatac 1800
cacgatctgc tgaaaattat caaggacaag gacttcctgg acaatgagga aaacgaggac 1860
attctggaag atatcgtgct gaccctgaca ctgtttgagg acagagagat gatcgaggaa 1920
cggctgaaaa cctatgccca cctgttcgac gacaaagtga tgaagcagct gaagcggcgg 1980
agatacaccg gctggggcag gctgagccgg aagctgatca acggcatccg ggacaagcag 2040
tccggcaaga caatcctgga tttcctgaag tccgacggct tcgccaacag aaacttcatg 2100
cagctgatcc acgacgacag cctgaccttt aaagaggaca tccagaaagc ccaggtgtcc 2160
ggccagggcg atagcctgca cgagcacatt gccaatctgg ccggcagccc cgccattaag 2220
aagggcatcc tgcagacagt gaaggtggtg gacgagctcg tgaaagtgat gggccggcac 2280
aagcccgaga acatcgtgat cgaaatggcc agagagaacc agaccaccca gaagggacag 2340
aagaacagcc gcgagagaat gaagcggatc gaagagggca tcaaagagct gggcagccag 2400
atcctgaaag aacaccccgt ggaaaacacc cagctgcaga acgagaagct gtacctgtac 2460
tacctgcaga atgggcggga tatgtacgtg gaccaggaac tggacatcaa ccggctgtcc 2520
gactacgatg tggacgctat cgtgcctcag agctttctga aggacgactc catcgacaac 2580
aaggtgctga ccagaagcga caagaaccgg ggcaagagcg acaacgtgcc ctccgaagag 2640
gtcgtgaaga agatgaagaa ctactggcgg cagctgctga acgccaagct gattacccag 2700
agaaagttcg acaatctgac caaggccgag agaggcggcc tgagcgaact ggataaggcc 2760
ggcttcatca agagacagct ggtggaaacc cggcagatca caaagcacgt ggcacagatc 2820
ctggactccc ggatgaacac taagtacgac gagaatgaca agctgatccg ggaagtgaaa 2880
gtgatcaccc tgaagtccaa gctggtgtcc gatttccgga aggatttcca gttttacaaa 2940
gtgcgcgaga tcaacaacta ccaccacgcc cacgacgcct acctgaacgc cgtcgtggga 3000
accgccctga tcaaaaagta ccctaagctg gaaagcgagt tcgtgtacgg cgactacaag 3060
gtgtacgacg tgcggaagat gatcgccaag agcgagcagg aaatcggcaa ggctaccgcc 3120
aagtacttct tctacagcaa catcatgaac tttttcaaga ccgagattac cctggccaac 3180
ggcgagatcc ggaagcggcc tctgatcgag acaaacggcg aaaccgggga gatcgtgtgg 3240
gataagggcc gggattttgc caccgtgcgg aaagtgctga gcatgcccca agtgaatatc 3300
gtgaaaaaga ccgaggtgca gacaggcggc ttcagcaaag agtctatcct gcccaagagg 3360
aacagcgata agctgatcgc cagaaagaag gactgggacc ctaagaagta cggcggcttc 3420
gacagcccca ccgtggccta ttctgtgctg gtggtggcca aagtggaaaa gggcaagtcc 3480
aagaaactga agagtgtgaa agagctgctg gggatcacca tcatggaaag aagcagcttc 3540
gagaagaatc ccatcgactt tctggaagcc aagggctaca aagaagtgaa aaaggacctg 3600
atcatcaagc tgcctaagta ctccctgttc gagctggaaa acggccggaa gagaatgctg 3660
gcctctgccg gcgaactgca gaagggaaac gaactggccc tgccctccaa atatgtgaac 3720
ttcctgtacc tggccagcca ctatgagaag ctgaagggct cccccgagga taatgagcag 3780
aaacagctgt ttgtggaaca gcacaagcac tacctggacg agatcatcga gcagatcagc 3840
gagttctcca agagagtgat cctggccgac gctaatctgg acaaagtgct gtccgcctac 3900
aacaagcacc gggataagcc catcagagag caggccgaga atatcatcca cctgtttacc 3960
ctgaccaatc tgggagcccc tgccgccttc aagtactttg acaccaccat cgaccggaag 4020
aggtacacca gcaccaaaga ggtgctggac gccaccctga tccaccagag catcaccggc 4080
ctgtacgaga cacggatcga cctgtctcag ctgggaggtg actctggagg atctagcgga 4140
ggatcctctg gcagcgagac accaggaaca agcgagtcag caacaccaga gagcagtggc 4200
ggcagcagcg gcggcagcag caccctaaat atagaagatg agtatcggct acatgagacc 4260
tcaaaagagc cagatgtttc tctagggtcc acatggctgt ctgattttcc tcaggcctgg 4320
gcggaaaccg ggggcatggg actggcagtt cgccaagctc ctctgatcat acctctgaaa 4380
gcaacctcta cccccgtgtc cataaaacaa taccccatgt cacaagaagc cagactgggg 4440
atcaagcccc acatacagag actgttggac cagggaatac tggtaccctg ccagtccccc 4500
tggaacacgc ccctgctacc cgttaagaaa ccagggacta atgattatag gcctgtccag 4560
gatctgagag aagtcaacaa gcgggtggaa gacatccacc ccaccgtgcc caacccttac 4620
aacctcttga gcgggctccc accgtcccac cagtggtaca ctgtgcttga tttaaaggat 4680
gcctttttct gcctgagact ccaccccacc agtcagcctc tcttcgcctt tgagtggaga 4740
gatccagaga tgggaatctc aggacaattg acctggacca gactcccaca gggtttcaaa 4800
aacagtccca ccctgtttaa tgaggcactg cacagagacc tagcagactt ccggatccag 4860
cacccagact tgatcctgct acagtacgtg gatgacttac tgctggccgc cacttctgag 4920
ctagactgcc aacaaggtac tcgggccctg ttacaaaccc tagggaacct cgggtatcgg 4980
gcctcggcca agaaagccca aatttgccag aaacaggtca agtatctggg gtatcttcta 5040
aaagagggtc agagatggct gactgaggcc agaaaagaga ctgtgatggg gcagcctact 5100
ccgaagaccc ctcgacaact aagggagttc ctagggaagg caggcttctg tcgcctcttc 5160
atccctgggt ttgcagaaat ggcagccccc ctgtaccctc tcaccaaacc ggggactctg 5220
tttaattggg gcccagacca acaaaaggcc tatcaagaaa tcaagcaagc tcttctaact 5280
gccccagccc tggggttgcc agatttgact aagccctttg aactctttgt cgacgagaag 5340
cagggctacg ccaaaggtgt cctaacgcaa aaactgggac cttggcgtcg gccggtggcc 5400
tacctgtcca aaaagctaga cccagtagca gctgggtggc ccccttgcct acggatggta 5460
gcagccattg ccgtactgac aaaggatgca ggcaagctaa ccatgggaca gccactagtc 5520
attctggccc cccatgcagt agaggcacta gtcaaacaac cccccgaccg ctggctttcc 5580
aacgcccgga tgactcacta tcaggccttg cttttggaca cggaccgggt ccagttcgga 5640
ccggtggtag ccctgaaccc ggctacgctg ctcccactgc ctgaggaagg gctgcaacac 5700
aactgccttg atatcctggc cgaagcccac ggaacccgac ccgacctaac ggaccagccg 5760
ctcccagacg ccgaccacac ctggtacacg gatggaagca gtctcttaca agagggacag 5820
cgtaaggcgg gagctgcggt gaccaccgag accgaggtaa tctgggctaa agccctgcca 5880
gccgggacat ccgctcagcg ggctgaactg atagcactca cccaggccct aaagatggca 5940
gaaggtaaga agctaaatgt ttatactgat agccgttatg cttttgctac tgcccatatc 6000
catggagaaa tatacagaag gcgtgggtgg ctcacatcag aaggcaaaga gatcaaaaat 6060
aaagacgaga tcttggccct actaaaagcc ctctttctgc ccaaaagact tagcataatc 6120
cattgtccag gacatcaaaa gggacacagc gccgaggcta gaggcaaccg gatggctgac 6180
caagcggccc gaaaggcagc catcacagag actccagaca cctctaccct cctcatagaa 6240
aattcatcac cctctggcgg ctcaaaaaga accgccgacg gcagcgaatt cgagcccaag 6300
aagaagagga aagtctaa 6318
<210> 5
<211> 116
<212> PRT
<213> Artificial Sequence
<400> 5
Cys Leu Ser Tyr Glu Thr Glu Ile Leu Thr Val Glu Tyr Gly Leu Leu
1 5 10 15
Pro Ile Gly Lys Ile Val Glu Lys Arg Ile Glu Cys Thr Val Tyr Ser
20 25 30
Val Asp Asn Asn Gly Asn Ile Tyr Thr Gln Pro Val Ala Gln Trp His
35 40 45
Asp Arg Gly Glu Gln Glu Val Phe Glu Tyr Cys Leu Glu Asp Gly Ser
50 55 60
Leu Ile Arg Ala Thr Lys Asp His Lys Phe Met Thr Val Asp Gly Gln
65 70 75 80
Met Leu Pro Ile Asp Glu Ile Phe Glu Arg Glu Leu Asp Leu Met Arg
85 90 95
Val Asp Asn Leu Pro Asn Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly
100 105 110
Ser Glu Phe Glu
115
<210> 6
<211> 35
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 6
Ile Lys Ile Ala Thr Arg Lys Tyr Leu Gly Lys Gln Asn Val Tyr Asp
1 5 10 15
Ile Gly Val Glu Arg Asp His Asn Phe Ala Leu Lys Asn Gly Phe Ile
20 25 30
Ala Ser Asn
35
<210> 7
<211> 161
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
aaggacgaaa caccggatgt ctgcaggcca gatgagtttt agagctagaa atagcaagtt 60
aaaataaggc tagtccgtta tcaacttgaa aaagtggcac cgagtcggtg caatgtgcca 120
tctggagcgg gcatctggcc tgcagattcc tgcccgacct t 161
<210> 8
<211> 212
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
gggcagagcg cacatcgccc acagtccccg agaagttggg gggaggggtc ggcaattgat 60
ccggtgccta gagaaggtgg cgcggggtaa actgggaaag tgatgtcgtg tactggctcc 120
gcctttttcc cgagggtggg ggagaaccgt atataagtgc agtagtcgcc gtgaacgttc 180
tttttcgcaa cgggtttgcc gccagaacac ag 212
<210> 9
<211> 348
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
tgcctgtcct acgagacaga gatcctgaca gtggagtatg gcctgctgcc aatcggcaag 60
atcgtggaga agaggatcga gtgtaccgtg tactctgtgg ataacaatgg caacatctat 120
acacagcccg tggcacagtg gcacgatagg ggagagcagg aggtgttcga gtattgcctg 180
gaggacggca gcctgatcag ggcaaccaag gaccacaagt tcatgacagt ggatggccag 240
atgctgccca tcgacgagat tttcgagcgg gagctggacc tgatgagagt ggataacctg 300
cctaatagcg gaggcagtaa aagaacagca gacgggagtg agtttgag 348
<210> 10
<211> 105
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
atcaagattg ctacacggaa atacctggga aagcagaacg tgtacgacat cggcgtggag 60
cgggatcaca acttcgccct gaagaatggc tttatcgcca gcaat 105

Claims (10)

1. A gene composition for use in the construction of a recombinant adeno-associated viral vector and/or for base editing, characterized in that: the composition is CBE-C, ABE-C or PE-C, wherein CBE-C is a composition for constructing a cytosine base editor, ABE-C is a composition for constructing an adenine base editor, and PE-C is a composition for constructing a prime editor,
the CBE-C is CBE-C1, CBE-C2, CBE-C3, CBE-C4, CBE-C5 or CBE-C6,
the CBE-C1 consists of a CBE-C1-511 encoding gene and a CBE-C512-1368 encoding gene, the CBE-C1-511 encoding gene encodes a fusion protein named CBE-C1-511, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-511, a cytosine deaminase and an N-terminal fragment of an intein, the CBE-C512-1368 encoding gene encodes a fusion protein named CBE-C512-1368, which is formed by fusing a C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-512-1368 and a uracil glycosylase inhibitor, Cas9N-1-511 is a protein encoded by a gene whose coding sequence is nucleotides 802-2331 of SEQ ID No.1, and Cas9N-512-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2332-4902 of SEQ ID No.1,
the CBE-C2 consists of a CBE-C1-507 encoding gene and a CBE-C508-1368 encoding gene, the CBE-C1-507 encoding gene encodes a fusion protein named CBE-C1-507, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-507, the cytosine deaminase and the N-terminal fragment of the intein, the CBE-C508-1368 encoding gene encodes a fusion protein named CBE-C508-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-508-1368 and the uracil glycosylase inhibitor, Cas9N-1-507 is a protein encoded by a gene whose coding sequence is nucleotides 802-2319 of SEQ ID No.1, and Cas9N-508-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2320-4902 of SEQ ID No.1,
the CBE-C3 consists of a CBE-C1-503 encoding gene and a CBE-C504-1368 encoding gene, the CBE-C1-503 encoding gene encodes a fusion protein named CBE-C1-503, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-503, the cytosine deaminase and the N-terminal fragment of the intein, the CBE-C504-1368 encoding gene encodes a fusion protein named CBE-C504-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-504-1368 and the uracil glycosylase inhibitor, Cas9N-1-503 is a protein encoded by a gene whose coding sequence is nucleotides 802-2307 of SEQ ID No.1, and Cas9N-504-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2308-4902 of SEQ ID No.1,
the CBE-C4 consists of a CBE-C1-502 encoding gene and a CBE-C503-1368 encoding gene, the CBE-C1-502 encoding gene encodes a fusion protein named CBE-C1-502, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-502, the cytosine deaminase and the N-terminal fragment of the intein, the CBE-C503-1368 encoding gene encodes a fusion protein named CBE-C503-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-503-1368 and the uracil glycosylase inhibitor, Cas9N-1-502 is a protein encoded by a gene whose coding sequence is nucleotides 802-2304 of SEQ ID No.1, and Cas9N-503-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2305-4902 of SEQ ID No.1,
the CBE-C5 consists of a CBE-C1-501 encoding gene and a CBE-C502-1368 encoding gene, the CBE-C1-501 encoding gene encodes a fusion protein named CBE-C1-501, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-501, the cytosine deaminase and the N-terminal fragment of the intein, the CBE-C502-1368 encoding gene encodes a fusion protein named CBE-C502-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-502-1368 and the uracil glycosylase inhibitor, Cas9N-1-501 is a protein encoded by a gene whose coding sequence is nucleotides 802-2301 of SEQ ID No.1, and Cas9N-502-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2302-4902 of SEQ ID No.1,
the CBE-C6 consists of a CBE-C1-498 encoding gene and a CBE-C499-1368 encoding gene, the CBE-C1-498 encoding gene encodes a fusion protein named CBE-C1-498, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-498, the cytosine deaminase and the N-terminal fragment of the intein, the CBE-C499-1368 encoding gene encodes a fusion protein named CBE-C499-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-499-1368 and the uracil glycosylase inhibitor, Cas9N-1-498 is a protein encoded by a gene whose coding sequence is nucleotides 802-2292 of SEQ ID No.1, and Cas9N-499-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2293-4902 of SEQ ID No.1;
the ABE-C consists of an ABE-C1-511 encoding gene and an ABE-C512-1368 encoding gene, the ABE-C1-511 encoding gene encodes a fusion protein named ABE-C1-511, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-511, an adenine deaminase and the N-terminal fragment of the intein, the ABE-C512-1368 encoding gene encodes a fusion protein named ABE-C512-1368, which is formed by fusing the C-terminal fragment of the intein and a C-terminal fragment of Cas9 nickase named Cas9N-512-1368, Cas9N-1-511 is a protein encoded by a gene whose coding sequence is nucleotides 1210-2739 of SEQ ID No.3, and Cas9N-512-1368 is a protein encoded by a gene whose coding sequence is nucleotides 2740-5310 of SEQ ID No.3;
the PE-C is PE-C1, PE-C2 or PE-C3,
the PE-C1 consists of a PE-C1-1022 encoding gene and a PE-C1023-1368 encoding gene, the PE-C1-1022 encoding gene encodes a fusion protein named PE-C1-1022, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-1022 and the N-terminal fragment of the intein, the PE-C1023-1368 encoding gene encodes a fusion protein named PE-C1023-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-1023-1368 and a reverse transcriptase, Cas9N-1-1022 is a protein encoded by a gene whose coding sequence is nucleotides 22-3084 of SEQ ID No.4, and Cas9N-1023-1368 is a protein encoded by a gene whose coding sequence is nucleotides 3085-4122 of SEQ ID No.4,
the PE-C2 consists of a PE-C1-1026 encoding gene and a PE-C1027-1368 encoding gene, the PE-C1-1026 encoding gene encodes a fusion protein named PE-C1-1026, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-1026 and the N-terminal fragment of the intein, the PE-C1027-1368 encoding gene encodes a fusion protein named PE-C1027-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-1027-1368 and the reverse transcriptase, Cas9N-1-1026 is a protein encoded by a gene whose coding sequence is nucleotides 22-3096 of SEQ ID No.4, and Cas9N-1027-1368 is a protein encoded by a gene whose coding sequence is nucleotides 3097-4122 of SEQ ID No.4,
the PE-C3 consists of a PE-C1-1068 encoding gene and a PE-C1069-1368 encoding gene, the PE-C1-1068 encoding gene encodes a fusion protein named PE-C1-1068, which is formed by fusing an N-terminal fragment of Cas9 nickase named Cas9N-1-1068 and the N-terminal fragment of the intein, the PE-C1069-1368 encoding gene encodes a fusion protein named PE-C1069-1368, which is formed by fusing the C-terminal fragment of the intein, a C-terminal fragment of Cas9 nickase named Cas9N-1069-1368 and the reverse transcriptase, Cas9N-1-1068 is a protein encoded by a gene whose coding sequence is nucleotides 22-3222 of SEQ ID No.4, and Cas9N-1069-1368 is a protein encoded by a gene whose coding sequence is nucleotides 3223-4122 of SEQ ID No.4.
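The split-site coordinates in claim 1 follow one arithmetic pattern. The sketch below assumes that each Cas9 nickase ORF encodes residues 2-1368, with no initiator methionine. It also assumes the ORF starts at nucleotide 802 of SEQ ID No.1, 1210 of SEQ ID No.3 and 22 of SEQ ID No.4. The claims do not state this; it is inferred from the ranges themselves.

Under that assumption, residue r occupies nucleotides start + 3(r - 2) through start + 3(r - 2) + 2. The function names `residue_span` and `split_ranges` are hypothetical helpers for illustration. In the `CLAIMED` table, each N-fragment end point is taken as the nucleotide just before the recited C-fragment start.

```python
# Sketch: derive Cas9 N-/C-fragment nucleotide ranges from a split residue.
# Assumption (inferred, not stated in the claims): the Cas9 nickase ORF starts
# at residue 2 (no initiator Met) at `cas9_start` and ends at residue 1368.

FIRST_RESIDUE = 2     # assumed first encoded Cas9 residue
LAST_RESIDUE = 1368   # last Cas9 residue


def residue_span(cas9_start: int, residue: int) -> tuple[int, int]:
    """Return the 1-based, inclusive nucleotide span of one Cas9 residue."""
    first = cas9_start + 3 * (residue - FIRST_RESIDUE)
    return first, first + 2


def split_ranges(cas9_start: int, split_after: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Nucleotide ranges of the N- and C-terminal fragments for a split after `split_after`."""
    n_end = residue_span(cas9_start, split_after)[1]
    c_end = residue_span(cas9_start, LAST_RESIDUE)[1]
    return (cas9_start, n_end), (n_end + 1, c_end)


# (ORF start, last residue of N-fragment) -> (N-fragment range, C-fragment range)
CLAIMED = {
    (802, 511): ((802, 2331), (2332, 4902)),    # CBE-C1, SEQ ID No.1
    (802, 507): ((802, 2319), (2320, 4902)),    # CBE-C2
    (802, 503): ((802, 2307), (2308, 4902)),    # CBE-C3
    (802, 502): ((802, 2304), (2305, 4902)),    # CBE-C4
    (802, 501): ((802, 2301), (2302, 4902)),    # CBE-C5
    (802, 498): ((802, 2292), (2293, 4902)),    # CBE-C6
    (1210, 511): ((1210, 2739), (2740, 5310)),  # ABE-C, SEQ ID No.3
    (22, 1022): ((22, 3084), (3085, 4122)),     # PE-C1, SEQ ID No.4
    (22, 1026): ((22, 3096), (3097, 4122)),     # PE-C2
    (22, 1068): ((22, 3222), (3223, 4122)),     # PE-C3
}

if __name__ == "__main__":
    for (start, split), expected in CLAIMED.items():
        got = split_ranges(start, split)
        status = "ok" if got == expected else "MISMATCH"
        print(f"start={start} split={split}: {got} {status}")
```

Every recited C-fragment ends at the same nucleotide within its sequence (4902, 5310 or 4122). That is consistent with a single full-length Cas9 ORF being cut at different residues.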
2. The gene composition of claim 1, wherein:
the cytosine deaminase is A1) or A2),
A1) a protein encoded by a gene whose coding sequence is nucleotides 22-705 of SEQ ID No. 1;
A2) a protein encoded by a gene whose coding sequence is nucleotides 55-648 of SEQ ID No. 2;
the adenine deaminase is a protein encoded by a gene whose coding sequence is nucleotides 22-519 of SEQ ID No. 3;
the reverse transcriptase (RT) is a protein encoded by a gene whose coding sequence is nucleotides 4222-6318 of SEQ ID No. 4;
the uracil glycosylase inhibitor (UGI) is a protein encoded by a gene whose coding sequence is nucleotides 5212-5460 of SEQ ID No. 1;
the amino acid sequence of the N-terminal fragment of the intein is SEQ ID No. 5; the amino acid sequence of the C-terminal fragment of the intein is SEQ ID No. 6.
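Claim 2 gives the two intein fragments only as amino acid sequences. The listing also has two DNA entries whose lengths match them exactly: SEQ ID No.9 (348 nt, 116 codons) and SEQ ID No.10 (105 nt, 35 codons).

The claims do not say that these are the corresponding coding sequences, so treat that pairing as an assumption. The sketch below tests the assumption by translating each DNA entry with the standard genetic code. The helper names `clean` and `translate` are hypothetical. The protein strings are one-letter transcriptions of the three-letter listings for SEQ ID No.5 and No.6.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (ttt, ttc, tta, ttg, tct, ...).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(seq: str) -> str:
    """Drop the spaces used for 10-nt grouping in the sequence listing."""
    return "".join(seq.split()).lower()


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    dna = clean(dna)
    if len(dna) % 3:
        raise ValueError(f"length {len(dna)} is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_ID_9 = ("tgcctgtcct acgagacaga gatcctgaca gtggagtatg gcctgctgcc aatcggcaag"
            "atcgtggaga agaggatcga gtgtaccgtg tactctgtgg ataacaatgg caacatctat"
            "acacagcccg tggcacagtg gcacgatagg ggagagcagg aggtgttcga gtattgcctg"
            "gaggacggca gcctgatcag ggcaaccaag gaccacaagt tcatgacagt ggatggccag"
            "atgctgccca tcgacgagat tttcgagcgg gagctggacc tgatgagagt ggataacctg"
            "cctaatagcg gaggcagtaa aagaacagca gacgggagtg agtttgag")
SEQ_ID_10 = ("atcaagattg ctacacggaa atacctggga aagcagaacg tgtacgacat cggcgtggag"
             "cgggatcaca acttcgccct gaagaatggc tttatcgcca gcaat")

# One-letter transcriptions of the three-letter listings.
SEQ_ID_5 = ("CLSYETEILTVEYGLL" "PIGKIVEKRIECTVYS" "VDNNGNIYTQPVAQWH"
            "DRGEQEVFEYCLEDGS" "LIRATKDHKFMTVDGQ" "MLPIDEIFERELDLMR"
            "VDNLPNSGGSKRTADG" "SEFE")
SEQ_ID_6 = "IKIATRKYLGKQNVYD" "IGVERDHNFALKNGFI" "ASN"

if __name__ == "__main__":
    pairs = [("SEQ ID No.9", SEQ_ID_9, "SEQ ID No.5", SEQ_ID_5),
             ("SEQ ID No.10", SEQ_ID_10, "SEQ ID No.6", SEQ_ID_6)]
    for dna_name, dna, prot_name, prot in pairs:
        print(f"{dna_name} translates to {prot_name}: {translate(dna) == prot}")
```

A full codon-by-codon match supports reading SEQ ID No.9 and No.10 as the intein-N and intein-C coding sequences. The sequence listing itself, however, does not annotate them that way.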
3. An expression cassette composition related to the gene composition of claim 1 or 2, which is any one of the following:
B1) an expression cassette comprising a gene encoding CBE-C1-511 as described above and an expression cassette comprising a gene encoding CBE-C512-1368 as described above;
B2) an expression cassette comprising a gene encoding CBE-C1-507 as described above and an expression cassette comprising a gene encoding CBE-C508-1368 as described above;
B3) an expression cassette comprising a gene encoding CBE-C1-503 as described above and an expression cassette comprising a gene encoding CBE-C504-1368 as described above;
B4) an expression cassette comprising a gene encoding CBE-C1-502 as described above and an expression cassette comprising a gene encoding CBE-C503-1368 as described above;
B5) an expression cassette comprising a gene encoding CBE-C1-501 as described above and an expression cassette comprising a gene encoding CBE-C502-1368 as described above;
B6) an expression cassette comprising a gene encoding CBE-C1-498 as described above and an expression cassette comprising a gene encoding CBE-C499-1368 as described above;
B7) an expression cassette comprising a gene encoding ABE-C1-511 as described above and an expression cassette comprising a gene encoding ABE-C512-1368 as described above;
B8) an expression cassette containing the gene encoding PE-C1-1022 described above and an expression cassette containing the gene encoding PE-C1023-1368 described above;
B9) an expression cassette containing the gene encoding PE-C1-1026 as described above and an expression cassette containing the gene encoding PE-C1027-1368 as described above;
B10) an expression cassette containing the gene encoding PE-C1-1068 and an expression cassette containing the gene encoding PE-C1069-1368.
4. A vector composition related to the gene composition of claim 1 or 2, which is any one of the following:
v1) a vector containing the above-mentioned CBE-C1-511 encoding gene and a vector containing the above-mentioned CBE-C512-1368 encoding gene;
v2) a vector containing the above CBE-C1-507 encoding gene and a vector containing the above CBE-C508-1368 encoding gene;
v3) a vector containing the gene encoding CBE-C1-503 as described above and a vector containing the gene encoding CBE-C504-1368 as described above;
v4) a vector containing the above-mentioned CBE-C1-502 encoding gene and a vector containing the above-mentioned CBE-C503-1368 encoding gene;
v5) a vector containing the above-mentioned CBE-C1-501 encoding gene and a vector containing the above-mentioned CBE-C502-1368 encoding gene;
v6) a vector containing the above-mentioned CBE-C1-498 encoding gene and a vector containing the above-mentioned CBE-C499-1368 encoding gene;
v7) a vector containing the above-mentioned gene encoding ABE-C1-511 and a vector containing the above-mentioned gene encoding ABE-C512-1368;
v8) a vector containing the gene encoding PE-C1-1022 and a vector containing the gene encoding PE-C1023-1368;
v9) a vector containing the above PE-C1-1026 encoding gene and a vector containing the above PE-C1027-1368 encoding gene;
v10) a vector containing the above-mentioned gene encoding PE-C1-1068 and a vector containing the above-mentioned gene encoding PE-C1069-1368.
5. The vector composition of claim 4, wherein:
in v1) to v6), the vector containing the gene encoding CBE-C1-511, CBE-C1-507, CBE-C1-503, CBE-C1-502, CBE-C1-501 or CBE-C1-498, respectively, further contains an sgRNA gene;
in v7), the vector containing the gene encoding ABE-C512-1368 further contains an sgRNA gene;
in v8), v9) or v10), the vector containing the PE-C1023-1368, PE-C1027-1368 or PE-C1069-1368 encoding gene, respectively.
6. A recombinant microorganism comprising the gene composition of claim 1 or 2, the expression cassette composition of claim 3, or the vector composition of claim 4 or 5.
7. The recombinant microorganism of claim 6, wherein the recombinant microorganism is Escherichia coli or an adeno-associated virus.
8. A recombinant cell comprising the gene composition of claim 1 or 2, the expression cassette composition of claim 3, or the vector composition of claim 4 or 5.
9. The recombinant cell of claim 8, wherein: the cell is a mammalian cell.
10. Use of the gene composition of claim 1 or 2, the expression cassette composition of claim 3, or the vector composition of claim 4 or 5 for constructing a recombinant adeno-associated viral vector and/or for base editing.
CN202210031173.4A 2022-01-12 2022-01-12 Composition for base editing Active CN114395585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210031173.4A CN114395585B (en) 2022-01-12 2022-01-12 Composition for base editing

Publications (2)

Publication Number Publication Date
CN114395585A true CN114395585A (en) 2022-04-26
CN114395585B CN114395585B (en) 2024-03-08

Family

ID=81231250

Country Status (1)

Country Link
CN (1) CN114395585B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106011104A (en) * 2015-05-21 2016-10-12 清华大学 Method for carrying out gene editing and expression regulation by utilizing Cas splitting system
CN109929839A (en) * 2017-12-18 2019-06-25 华东师范大学 Detatching single base gene editing system and its application
US20200347407A1 (en) * 2017-12-18 2020-11-05 East China Normal University Split single-base gene editing systems and application thereof
CN113874501A (en) * 2018-11-01 2021-12-31 中国科学院遗传与发育生物学研究所 Targeted mutagenesis using base editor
CN111117985A (en) * 2020-01-23 2020-05-08 中山大学 Method for splitting Cas9 and application thereof
CN112708605A (en) * 2021-01-14 2021-04-27 中山大学 Proteome obtained by splitting Cas9 protein and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONG-JIUNN JEFFERY TRUONG ET AL.: "Development of an intein-mediated split–Cas9 system for gene therapy", Nucleic Acids Research, vol. 43, no. 13, pages 6450-6458, XP055791410, DOI: 10.1093/nar/gkv601 *
XIAOFENG DAI ET AL.: "Inducible CRISPR genome-editing tool: classifications and future trends", Critical Reviews in Biotechnology, pages 1-15 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant