WO2022007803A1 - An improved RNA editing method - Google Patents

An improved RNA editing method

Info

Publication number
WO2022007803A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
target
base
arrna
residue
Prior art date
Application number
PCT/CN2021/104801
Other languages
English (en)
French (fr)
Inventor
袁鹏飞
易泽轩
刘能银
Original Assignee
博雅辑因(北京)生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 博雅辑因(北京)生物科技有限公司
Priority to EP21838580.5A (EP4177345A1)
Priority to CA3185231A (CA3185231A1)
Priority to JP2023501188A (JP2023532375A)
Priority to CN202180047686.8A (CN116194582A)
Priority to KR1020237004198A (KR20230035362A)
Priority to US18/015,054 (US20230272379A1)
Priority to AU2021305359A (AU2021305359A1)
Publication of WO2022007803A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111 General methods applicable to biologically active non-coding nucleic acids
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86 Viral vectors
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/11 Antisense
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2740/00 Reverse transcribing RNA viruses
    • C12N2740/00011 Details
    • C12N2740/10011 Retroviridae
    • C12N2740/16011 Human Immunodeficiency Virus, HIV
    • C12N2740/16041 Use of virus, viral particle or viral elements as a vector
    • C12N2740/16043 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04004 Adenosine deaminase (3.5.4.4)
    • C12Y305/04005 Cytidine deaminase (3.5.4.5)

Definitions

  • The invention belongs to the field of gene editing, in particular RNA editing, and includes introducing a deaminase recruiting RNA (dRNA, also called arRNA) or a construct encoding the arRNA into a host cell, and editing the target RNA at the target residue position in the host cell.
  • dRNA deaminase recruiting RNA
  • arRNA deaminase recruiting RNA (also called dRNA)
  • CRISPR Clustered regularly interspaced short palindromic repeats (WO2014018423A3)
  • Many researchers and biotech companies are also working to bring the technology to the clinic.
  • Professor Deng Hongkui of Peking University and his collaborators published an article reporting, for the first time, the results of clinical trials that used CRISPR technology to edit stem cells and infuse them into patients to treat AIDS and leukemia, a major contribution to the clinical translation of the technology.
  • Although CRISPR technology has great application potential, it also has a series of defects that make its translation from the research stage to clinical therapy difficult.
  • One of the problems is the central enzyme that CRISPR technology uses: Cas9.
  • CRISPR-based DNA editing requires exogenous expression of Cas9 or another nuclease with a similar function, which causes the following problems.
  • Nucleases that require exogenous expression typically have large molecular weights, which drastically reduces the efficiency of delivering them into the body via viral vectors.
  • Cas9 expression has been shown in multiple studies to carry a potential carcinogenic risk.
  • p53 is the most studied tumor suppressor gene. Haapaniemi et al.
  • Exogenously expressed Cas9 is usually of bacterial origin, such as from Staphylococcus aureus or Streptococcus pyogenes, rather than naturally occurring in humans or other mammals, so it may elicit an immune response in patients.
  • Charlesworth et al. found IgG antibodies to Cas9 in human serum (Charlesworth et al., 2019).
  • On the one hand, the exogenously expressed nuclease may be neutralized and thereby lose its activity; on the other hand, the immune response may cause damage or even toxicity to the patient, or hinder further intervention.
  • To avoid the potential risks of DNA editing, scientists have also developed a strong interest in RNA editing.
  • The genetic information in DNA must be transcribed into RNA and then translated into protein in order to perform its normal physiological functions, which is called the central dogma of biology.
  • Editing at the RNA level can therefore change the final biological function while avoiding damage to the genome.
  • ADAR Adenosine deaminases acting on RNA
  • An RNA editing technology called REPAIR (RNA Editing for Programmable A to I Replacement) achieves A to I editing of target RNA through exogenous expression of a Cas13-ADAR fusion protein and a single guide RNA (sgRNA) (Cox et al., 2017).
  • Cas13 binds the sgRNA to perform the targeting function, bringing the fusion protein to the site to be edited, while the ADAR deaminase domain catalyzes the A to I edit.
  • This method, like CRISPR technology, still requires the expression of exogenous proteins and therefore cannot solve the problems caused by foreign protein expression.
  • RESTORE (Recruiting Endogenous ADAR to Specific Transcripts for Oligonucleotide-mediated RNA Editing, WO2020001793A1): similar to LEAPER, RESTORE removes the dependence on foreign proteins. Unlike LEAPER, however, RESTORE first of all requires the presence of IFN-γ to achieve high editing efficiency, and IFN-γ is a key factor in determining the development and severity of autoimmunity (Pollard et al., 2013), which greatly limits the application of this technology in the medical field.
  • A guide RNA is also used in RESTORE. It must be a chemically synthesized oligonucleotide, and a large number of chemical modifications must be introduced artificially to ensure its stability. Some of these modifications may be toxic or immunogenic, and some may give the same base chain different conformations, so that RNA of the same sequence may exist in dozens of different conformational combinations.
  • LEAPER can function not only through chemically synthesized RNA but also through delivery to patient cells by vectors such as adeno-associated virus (AAV) and lentivirus, which makes it more flexible in the choice of delivery method.
  • AAV adeno-associated virus
  • Residues or sequences upstream and downstream of the A to I editing site
  • In DNA editing, the edited site is passed to all progeny cells through replication, so even if the editing efficiency at the DNA level is relatively low, the edited cells can be enriched, for example by screening progeny cells.
  • In RNA editing, by contrast, the resulting edits are not inherited. On the one hand, off-target edits cannot be passed on to progeny, which makes RNA-level editing safer than DNA editing; on the other hand, this makes the efficiency of RNA editing more important.
  • ADAR is the key enzyme in the catalytic reaction, whether in REPAIR (WO2019005884A1), RESTORE (WO2020001793A1) or LEAPER (WO2020074001A1).
  • ADAR proteins include ADAR1 (two isoforms, p110 and p150), ADAR2 and ADAR3 (catalytically inactive).
  • The catalytic substrate of the ADAR protein is double-stranded RNA. ADAR removes the -NH2 group from the adenosine (A) nucleobase, changing A to inosine (I), which is recognized as guanosine (G) and pairs with cytidine (C).
  • Due to the specific nature of ADAR, some of the same factors influence the RNA editing efficiency of the REPAIR, RESTORE and LEAPER systems.
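  • As an illustration only (not part of the application), the A to I conversion and the reading of I as G can be sketched in a few lines of Python. The function names and the five-base example sequence are hypothetical, and inosine is written as the letter "I" purely for this sketch.

```python
def deaminate_adenosine(rna: str, index: int) -> str:
    """Return the RNA with the adenosine at `index` converted to inosine (I)."""
    if rna[index] != "A":
        raise ValueError(f"expected A at index {index}, found {rna[index]}")
    return rna[:index] + "I" + rna[index + 1:]


def read_by_cell(rna: str) -> str:
    """Inosine is recognized as guanosine, so read every I as G."""
    return rna.replace("I", "G")


# Hypothetical five-base transcript with the target A at index 2.
edited = deaminate_adenosine("CUAGC", 2)
decoded = read_by_cell(edited)
```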
  • The bases immediately 5' (upstream) and 3' (downstream) of the edited adenosine, that is, the target residue (target A), in the mRNA have a significant impact on editing efficiency.
  • The motif formed, in 5' to 3' order, by the upstream residue, the target residue and the downstream residue is called the "three-base motif".
  • With A as the target residue, the three-base motif can have 16 combinations, namely AAA, AAU, AAC, AAG, UAA, UAU, UAC, UAG, CAA, CAU, CAC, CAG, GAA, GAU, GAC and GAG.
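  • The count of 16 follows from four choices for the upstream residue times four for the downstream residue around a fixed target A. A minimal sketch that enumerates them is shown here; the function name is hypothetical, and the base order "AUCG" is chosen only so the output follows the order of the list above.

```python
from itertools import product

BASES = "AUCG"


def three_base_motifs(target: str = "A") -> list[str]:
    """All upstream/target/downstream triplets (5' to 3') for a fixed target residue."""
    return [up + target + down for up, down in product(BASES, repeat=2)]


motifs = three_base_motifs("A")
```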
  • The triplet preference of the REPAIR system is slightly different because it uses a Cas13-ADAR fusion protein.
  • The REPAIR system had the lowest editing efficiency for the three-base motif GAC and the highest for UAU, a difference of about 2- to 3-fold.
  • SA1Q (triangle in the figure): the specific implementation method is to fuse the catalytic domain of human ADAR1 to the C-terminal domain of human O6-alkylguanine-DNA-alkyl transferase (hAGT) (SNAP-tag), with the glutamic acid at amino acid 835 mutated to glutamine (EQ), and then covalently cross-link the fusion to the guide RNA through the SNAP-tag (Keppler, A.
  • The specific implementation method is to fuse the catalytic domain of human ADAR2 to the C-terminal domain of human O6-alkylguanine-DNA-alkyl transferase (hAGT) (SNAP-tag), with the glutamic acid at amino acid 1310 mutated to glutamine (EQ), and then covalently cross-link the fusion to guide RNAs through the SNAP-tag (Keppler, A. et al., 2003; Stafforst, T., et al., 2012).
  • The triplet preference of RNA editing systems that use a deaminase limits the scope of application of existing RNA editing technology.
  • Existing RNA editing technology is almost ineffective at positions where the upstream residue of the three-base motif is G, which greatly reduces its usefulness in treating disease.
  • If the residue upstream of a disease-causing mutation happens to be G, it is difficult to correct the mutation using known RNA editing methods.
  • The problem solved by the present invention concerns three-base motifs other than those preferred in the prior art, for example motifs other than UAG: by adjusting the sequence of the deaminase recruiting RNA (dRNA or arRNA), the target RNA can be edited precisely, breaking through the limitation of triplet preference without any modification of the existing deaminase.
  • When the upstream residue of the three-base motif is G or C, the editing efficiency is greatly improved.
  • The application provides a method of editing a target RNA at a target residue position in a host cell, comprising introducing a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA into the host cell, wherein the arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue or the downstream residue.
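  • The idea of a complementary RNA sequence that carries deliberate mismatches can be sketched as follows. This is a toy illustration under stated assumptions, not the applicant's design procedure: the function name, the seven-base target and the default choices (C opposite the target A, G opposite the upstream residue) are hypothetical, and the result is written 5' to 3' as the antiparallel partner of the target.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_complementary_rna(target: str, target_index: int,
                             target_opposite: str = "C",
                             upstream_opposite: str = "G") -> str:
    """Build a complementary RNA (5' to 3') with mismatches at two chosen positions.

    Every target base gets its Watson-Crick partner except the target adenosine
    and its 5' neighbor, which face `target_opposite` and `upstream_opposite`.
    """
    if not 0 < target_index < len(target) - 1:
        raise ValueError("target residue needs an upstream and a downstream neighbor")
    if target[target_index] != "A":
        raise ValueError("target residue must be adenosine (A)")
    # Bases facing each target position, listed in target (5' to 3') order.
    opposite = [WATSON_CRICK[base] for base in target]
    opposite[target_index] = target_opposite
    opposite[target_index - 1] = upstream_opposite
    # The strands pair antiparallel, so reverse to write the result 5' to 3'.
    return "".join(reversed(opposite))


# Hypothetical target containing the motif GAU, with the target A at index 3.
arrna = design_complementary_rna("ACGAUCU", 3)
```

  • In this toy case the three bases facing GAU are ACG; a real arRNA would be far longer than seven nucleotides.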
  • In certain embodiments, the upstream residue of the three-base motif is G. In certain embodiments, the upstream residue of the three-base motif is A. In certain embodiments, the upstream residue of the three-base motif is C. In certain embodiments, the downstream residue of the three-base motif is C. In certain embodiments, the downstream residue of the three-base motif is U. In certain embodiments, the downstream residue of the three-base motif is A. In certain embodiments, the three-base motif is selected from the group consisting of GAG, GAC, GAA, GAU, AAG, AAC, AAA, AAU, CAG, CAC, CAA, CAU, UAA, UAC and UAU.
  • In certain embodiments, the base opposite the upstream residue in the complementary RNA is G.
  • In certain embodiments, the upstream residue of the three-base motif is G, and the base opposite the upstream residue in the complementary RNA is A.
  • In some embodiments, the three-base motif is GAU, and the three complementary bases in the complementary RNA sequence directly opposite the three-base motif are ACG or ACA.
  • In certain embodiments, the three-base motif is GAU, and the three complementary bases in the complementary RNA sequence directly opposite the three-base motif are ACG.
  • In some embodiments, the three-base motif is GAA, and the three complementary bases in the complementary RNA sequence directly opposite the three-base motif are UCA, CCG, CCC, or UCC. In certain embodiments, the three-base motif is GAA, and the three complementary bases directly opposite the three-base motif are UCA. In some embodiments, the three-base motif is GAC, and the three complementary bases directly opposite the three-base motif are GCG or GCA. In certain embodiments, the three-base motif is GAC, and the three complementary bases directly opposite the three-base motif are GCG.
  • In some embodiments, the three-base motif is GAG, and the three complementary bases in the complementary RNA sequence directly opposite the three-base motif are CCG, CCA, CCC, UCC, or UCG.
  • In certain embodiments, the three-base motif is GAG, and the three complementary bases in the complementary RNA sequence directly opposite the three-base motif are CCG.
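  • To see which positions are mismatched in the single-triplet embodiments above (GAU/ACG, GAA/UCA, GAC/GCG, GAG/CCG), the sketch below pairs each motif residue with the base facing it. It assumes both triplets are written 5' to 3' and pair antiparallel, which the text does not state explicitly; the names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Motif -> opposite triplet, taken from the single-triplet embodiments above.
SINGLE_TRIPLET_EMBODIMENTS = {"GAU": "ACG", "GAA": "UCA", "GAC": "GCG", "GAG": "CCG"}


def facing_bases(motif: str, opposite: str) -> dict:
    """Map each motif position to (motif base, facing base, is Watson-Crick pair)."""
    if len(motif) != 3 or len(opposite) != 3:
        raise ValueError("both sequences must be three bases long")
    facing = opposite[::-1]  # antiparallel: the 3' end of one strand faces the 5' end of the other
    labels = ("upstream", "target", "downstream")
    return {label: (m, o, (m, o) in WATSON_CRICK)
            for label, m, o in zip(labels, motif, facing)}


report = {motif: facing_bases(motif, opposite)
          for motif, opposite in SINGLE_TRIPLET_EMBODIMENTS.items()}
```

  • Under that assumption, each of the four pairs places a C opposite the target A, a G or A opposite the upstream G, and a Watson-Crick partner opposite the downstream residue.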
  • In some embodiments, the complementary RNA sequence comprises cytidine (C), adenosine (A) or uridine (U) directly opposite the target adenosine in the target RNA.
  • In some embodiments, the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA.
  • In some embodiments, the complementary RNA sequence, when hybridized to the target RNA, further comprises one or more mismatches, each of which is opposite a non-target adenosine in the target RNA.
  • In some embodiments, the mismatched nucleosides opposite the one or more non-target adenosines are guanosines.
  • In some embodiments, the upstream residue of the three-base motif is G, and the base opposite the upstream residue in the complementary RNA is G or A.
  • In some embodiments, the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • In some embodiments, the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G or A, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • In some embodiments, the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA, the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G or A, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • In some embodiments, the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA, the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • The editing efficiency of RNA is improved by at least 90% to 1100% compared to the prior art, for example, by at least 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900% or 1000%.
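  • For orientation, a percentage improvement can be converted to an absolute efficiency as sketched below. This assumes the percentages are relative improvements over a baseline editing efficiency, which the text does not spell out; the function name and the baseline values are hypothetical and are not data from the application.

```python
def improved_efficiency(baseline: float, improvement_percent: float) -> float:
    """Efficiency after a relative improvement, e.g. 100% doubles the baseline."""
    return baseline * (1 + improvement_percent / 100)


# Hypothetical baseline of 5% editing efficiency.
doubled = improved_efficiency(0.05, 100)      # a 100% improvement is a 2-fold change
twelvefold = improved_efficiency(0.05, 1100)  # a 1100% improvement is a 12-fold change
```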
  • the target adenosine (A) in the target RNA is deaminated by Adenosine Deaminase Acting on RNA (ADAR).
  • ADAR Adenosine Deaminase Acting on RNA
  • The adenosine deaminase is a native ADAR or a homologous protein thereof.
  • The adenosine deaminase is a functional variant that has been modified but retains adenosine deaminase activity, e.g., a variant obtained by modifying native ADAR or a homologous protein thereof with one or more site mutations.
  • The adenosine deaminase is a fusion protein comprising an ADAR catalytic domain, the catalytic domain of an ADAR homologous protein, or a functional variant of an adenosine deaminase.
  • The fusion protein comprising the catalytic domain of an ADAR protein is a fusion of a Cas13 protein that has lost catalytic activity through mutation with an ADAR functional domain, an ADAR homologous protein functional domain, or an adenosine deaminase functional variant.
  • the deaminase having cytidine deaminase activity is exogenously introduced into the host cell or expressed in the host cell by a construct into which the deaminase is introduced.
  • The fusion protein comprising the catalytic domain of an ADAR protein is a fusion protein comprising a λN peptide and an ADAR functional domain, the catalytic domain of an ADAR homologous protein, or an adenosine deaminase functional variant.
  • the fusion protein comprising the catalytic domain of an ADAR protein is a SNAP-tag-tagged ADAR or a SNAP-tag-tagged ADAR functional variant.
  • the ADAR is ADAR1 and/or ADAR2.
  • the ADAR is one or more ADARs selected from the group consisting of hADAR1, hADAR2, mouse ADAR1 and mouse ADAR2.
  • the ADAR is expressed by the host cell.
  • The ADAR occurs naturally or endogenously in the host cell, e.g., naturally or endogenously in a eukaryotic cell.
  • the ADAR protein is exogenously introduced into the host cell.
  • the ADAR or a construct encoding the ADAR is introduced into a host cell.
  • the construct is selected from any one including, but not limited to, linear nucleic acids, plasmids, viruses, and the like.
  • The ADAR includes the above-mentioned natural ADAR, its homologous proteins, a modified adenosine deaminase functional variant that retains adenosine deaminase activity (e.g., a variant based on the natural ADAR or its homologous protein, modified by one or more site mutations but still having adenosine deaminase activity), or a fusion protein comprising an ADAR catalytic domain, the catalytic domain of its homologous protein, or an adenosine deaminase functional variant.
  • the method does not comprise introducing any protein into the host cell.
  • the ADAR is ADAR1 and/or ADAR2.
  • the ADAR is one or more ADARs selected from the group consisting of hADAR1, hADAR2, mouse ADAR1 and mouse ADAR2.
  • Another aspect of the present application provides a method of editing a target RNA at a target residue position in a host cell, wherein the target residue is cytidine, and the arRNA recruits to the target RNA a deaminase having cytidine deaminase activity (in this application, "deaminase having cytidine deaminase activity" and "cytidine deaminase" have the same meaning and are used interchangeably), so that the target cytidine in the target RNA is deaminated.
  • A deaminase having cytidine deaminase activity, or a construct encoding such a deaminase, is introduced into the host cell.
  • The arRNA comprises a complementary RNA sequence that hybridizes to a target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the target residue is cytidine (C), and wherein the complementary RNA sequence comprises mismatches directly opposite the upstream and/or downstream residues on the target RNA.
  • The three-base motif in which the target cytidine is located is selected from any of the following: GCG, GCC, GCA, GCU, ACG, ACC, ACA, ACU, CCG, CCC, CCA, CCU, UCA, UCC, UCU and UCG.
  • the arRNA comprises unpaired nucleotides at positions corresponding to the target residues of the target RNA to form mismatches with the target residues.
  • the complementary RNA sequence in the arRNA that can hybridize to the target RNA comprises cytidine, adenosine or uridine directly opposite the target cytidine in the target RNA.
  • the complementary RNA sequence comprises uridine directly opposite the target cytidine.
  • the arRNA comprises one or more unpaired nucleotides at a non-target editing site corresponding to the target RNA to form one or more mismatches with the non-target site of the target RNA.
  • In some embodiments, the upstream residue of the three-base motif is G, and the base opposite the upstream residue in the complementary RNA is G.
  • In some embodiments, the downstream residue of the three-base motif is A, and the base opposite the downstream residue in the complementary RNA is U or A.
  • In some embodiments, the three-base motif is ACA, and the complementary RNA sequence comprises AUU or GUU opposite the three-base motif.
  • In some embodiments, the three-base motif is ACA, and the complementary RNA sequence comprises AUU opposite the three-base motif.
  • In some embodiments, the three-base motif is UCA, and the complementary RNA sequence comprises AUA, GUA, or CUA opposite the three-base motif.
  • In some embodiments, the three-base motif is UCA, and the complementary RNA sequence comprises AUA opposite the three-base motif. In some embodiments, the three-base motif is GCA, and the complementary RNA sequence comprises UUG or UCG opposite the three-base motif. In some embodiments, the three-base motif is GCA, and the complementary RNA sequence comprises UUG opposite the three-base motif. In some embodiments, the three-base motif is CCA, and the complementary RNA sequence comprises AUG opposite the three-base motif.
  • The deaminase with cytidine deaminase activity is obtained by genetically modifying an ADAR protein, or a fusion protein comprising an ADAR catalytic domain, so that it acquires C to U catalytic activity.
  • The cytidine deaminase is a modified ADAR2 and comprises an ADAR2 catalytic domain carrying one or more mutations selected from E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T.
  • The cytidine deaminase is a fusion protein comprising an ADAR2 catalytic domain carrying all of the following mutations: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T.
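  • The mutation list uses the usual point-mutation notation: wild-type amino acid, position, replacement amino acid (E488Q is glutamic acid 488 changed to glutamine). A small sketch that splits the list into structured records is given here; the function and constant names are hypothetical.

```python
import re

ADAR2_MUTATIONS = ("E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/"
                   "K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse 'E488Q/V351G/...' into (wild-type residue, position, mutant residue)."""
    parsed = []
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if match is None:
            raise ValueError(f"not a point mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        parsed.append((wild_type, int(position), mutant))
    return parsed


mutations = parse_mutations(ADAR2_MUTATIONS)
```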
  • the deaminase having cytidine deaminase activity further comprises a targeting domain.
  • the targeting domain includes, but is not limited to, any one selected from the group consisting of a Cas13 protein that has been mutated to lose catalytic activity, a λN peptide, and a SNAP-tag. In certain embodiments, the targeting domain comprises a Cas13 protein that has been mutated to lose catalytic activity.
  • the fusion protein comprises a Cas13 protein that has been mutated to inactivate catalytic activity and an ADAR2 catalytic domain having cytidine deaminase activity.
  • the deaminase having cytidine deaminase activity is exogenously introduced into the host cell or expressed in the host cell by a construct into which the deaminase is introduced.
  • the method comprises introducing the cytidine deaminase or fusion protein, or a construct encoding the cytidine deaminase or fusion protein, into a cell comprising the target RNA, wherein the construct encoding the cytidine deaminase or fusion protein is selected from any one including, but not limited to, linear nucleic acids, plasmids, and viruses.
  • the target residue in the three-base motif in the target RNA is cytidine
  • the upstream residue of the three-base motif is selected from the nucleotides G, C, A and U, in the preferred order G>C>A≈U.
  • the arRNA is single-stranded RNA.
  • the complementary RNA sequence is completely single-stranded.
  • the arRNA comprises one or more (eg, 1, 2, 3, or more) double-stranded regions and/or one or more stem-loop regions.
  • the arRNA consists solely of the complementary RNA sequence.
  • the length of the arRNA is about 20-260 nucleotides, for example, any of 40-260, 45-250, 50-240, 60-230, 65-220, 70-220, 70-210, 70-200, 70-190, 70-180, 70-170, 70-160, 70-150, 70-140, 70-130, 70-120, 70-110, 70-100, 70-90, 70-80, 75-200, 80-190, 85-180, 90-170, 95-160, 100-200, 100-150, 100-175, 110-200, 110-175, 110-150, or 105-140 nucleotides.
  • the arRNA is about 60-200 nucleotides in length (e.g., about any of 60-150, 65-140, 68-130, or 70-120).
  • the arRNA further comprises an ADAR recruitment domain.
  • the arRNA comprises one or more chemical modifications.
  • the chemical modification comprises methylation and/or phosphorothioation, e.g., 2'-O-methylation (2'-O-Me) and/or internucleotide phosphorothioate linkages.
  • the first and last 3 or 5 nucleotides of the arRNA comprise a 2'-O-Me modification, and/or the linkages between its first and last 3, 4 or 5 nucleotides comprise phosphorothioate linkage modifications.
  • one or more or all of the uridines in the arRNA comprise a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain 3'-phosphorothioate linkage modifications.
  • the arRNA does not contain any chemical modifications.
  • the present invention also provides the edited RNA produced by the method for editing a target RNA provided by the present invention or a host cell comprising the edited RNA.
  • the methods of editing target RNAs at target residue positions in host cells can be used in the treatment or prevention of diseases or disorders in an individual. Accordingly, the present invention also provides a method for treating or preventing a disease or disorder in an individual, comprising using any one of the aforementioned methods for editing a target RNA at a target residue position in a host cell to edit the target RNA associated with the disease or disorder in a cell of the individual.
  • the disease or disorder is an inherited genetic disease or a disease or disorder associated with one or more acquired genetic mutations (eg, drug resistance).
  • the present invention also provides a deaminase recruiting RNA (arRNA) that can be used in the methods provided by the present invention and that deaminates a target residue in a target RNA by recruiting a deaminase acting on the RNA. The arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue and/or the downstream residue of the target RNA.
  • the arRNA comprises C directly opposite to the target adenosine in the target RNA.
  • the arRNA further comprises one or more mismatches when hybridized to the target RNA, each of the mismatches being opposite a non-target adenosine in the target RNA.
  • the mismatched nucleosides opposite the one or more non-target adenosines are guanosines.
  • the three-base motif is GAU, and the triple complementary base in the arRNA directly opposite the three-base motif is ACG or ACA.
  • the three-base motif is GAU, and the triple complementary base in the arRNA directly opposite the three-base motif is ACG. In certain embodiments, the three-base motif is GAA, and the triple complementary base in the arRNA directly opposite the three-base motif is UCA, CCG, CCC, or UCC. In certain embodiments, the three-base motif is GAA, and the triple complementary base in the arRNA directly opposite the three-base motif is UCA. In certain embodiments, the three-base motif is GAC, and the triple complementary base in the arRNA directly opposite the three-base motif is GCG or GCA.
  • the three-base motif is GAC, and the triple complementary base in the arRNA directly opposite the three-base motif is GCG.
  • the three-base motif is GAG, and the triple complementary base in the arRNA directly opposite the three-base motif is CCG, CCA, CCC, UCC, or UCG.
  • the three-base motif is GAG, and the triple complementary base in the arRNA directly opposite the three-base motif is CCG.
  • the length of the arRNA is about 20-260 nucleotides, such as any of 40-260, 45-250, 50-240, 60-230, 65-220, 70-220, 70-210, 70-200, 70-190, 70-180, 70-170, 70-160, 70-150, 70-140, 70-130, 70-120, 70-110, 70-100, 70-90, 70-80, 75-200, 80-190, 85-180, 90-170, 95-160, 100-200, 100-150, 100-175, 110-200, 110-175, 110-150, or 105-140 nucleotides.
  • the arRNA is about 60-200 nucleotides in length (e.g., about any of 60-150, 65-140, 68-130, or 70-120).
  • the arRNA further comprises an ADAR recruitment domain.
  • the arRNA comprises one or more chemical modifications.
  • the chemical modification comprises methylation and/or phosphorothioation, e.g., 2'-O-methylation (2'-O-Me) and/or internucleotide phosphorothioate linkages.
  • the first and last 3 or 5 nucleotides of the arRNA comprise a 2'-O-Me modification, and/or the linkages between its first and last 3, 4 or 5 nucleotides comprise phosphorothioate linkage modifications.
  • one or more or all of the uridines in the arRNA comprise a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain 3'-phosphorothioate linkage modifications.
  • the arRNA does not contain any chemical modifications.
  • the present invention also provides a viral vector, plasmid or linear nucleic acid chain comprising any of the above-mentioned arRNAs provided by the present invention, and the arRNA does not contain any chemical modification.
  • the present invention also provides a library comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the present invention also provides a composition comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the present invention also provides a host cell comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the host cell comprising any of the above arRNAs provided herein is a eukaryotic cell.
  • Figure 1 Triad preference of the REPAIR system (Cox et al., 2017).
  • Figure 2 Triad preference of the SNAP-ADAR system (Vogel et al., 2018).
  • Figure 4 The basic process of the LEAPER system and the improvement of this case.
  • Figure 6 Design of the 16 triple complementary bases corresponding to the three-base motifs according to the arRNA design principles of the LEAPER system in the prior art, and the corresponding results.
  • Figures 11A-11C Editing efficiency assays for GAN three-base motifs, including GAU (Figure 11A), GAG (Figure 11B), and GAC (Figure 11C).
  • Figures 13A-13D Improvement in editing efficiency after improving the arRNA design according to this case, for the three-base motifs GAA (Figure 13A), GAU (Figure 13B), GAG (Figure 13C), and GAC (Figure 13D).
  • Figure 15 shows a test of the C to U editing system, where the target residue is C, and the effect of changes in upstream residues and bases opposite to target C in the triple complementary base is tested for editing efficiency.
  • "/" in the figure represents that no corresponding plasmid or arRNA was added, only the same volume of water was added.
  • Figure 16 shows the results of repeating some of the data in Figure 15. "/" in the figure represents that no corresponding plasmid or arRNA was added, only the same volume of water was added.
  • Figure 18 takes the data from Figure 15 in which the mRNA three-base motif is N*CA (as shown on the horizontal axis) and the arRNA triple complementary base is GUU, for comparison with the data in Figure 17.
  • Figures 19A-19B show pairing analysis of each of the three-base motifs and triple complementary bases used in Figures 18 and 17. The three-base motifs and triple complementary bases in Figure 19A were used to obtain the results in Figure 18, and those in Figure 19B were used to obtain the results in Figure 17.
  • Figure 21 Comparison of editing efficiency for multiple and single mismatches using the reporter system. These are the results of the same assay shown in Figure 20, expressed as mean fluorescence intensity (MFI).
  • Figures 22A-22D show editing efficiency testing of differently designed arRNAs for the ACA (Figure 22A), TCA (Figure 22B), CCA (Figure 22C) and GCA (Figure 22D) three-base motifs.
  • the present invention provides a method for editing a target RNA at a target residue position in a host cell, comprising introducing a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA into the host cell.
  • the arRNA comprises a complementary RNA sequence that hybridizes to its target RNA to form a double-stranded RNA, which recruits a deaminase acting on the RNA to deaminate the target residue in the target RNA; after deamination, the type of base in the residue changes.
  • the present application provides a method for editing target RNA in which, through the design of the arRNA relative to the target RNA, the editing efficiency of prior-art ADAR-based RNA editing systems is significantly improved for three-base motifs other than UAG that do not conform to the natural preference of ADAR, breaking the long-standing limitation on editing site selection in RNA editing applications.
  • the scope and effect of treating diseases by RNA editing methods can be greatly expanded, so that more diseases, such as more genetic diseases caused by gene mutations, have the opportunity to be safely and effectively treated by RNA editing methods.
  • diseases caused by G->A mutations that may be treated by RNA editing therapy in the future can have a more flexible choice of the three-base motif in which the mutation site is located.
  • for example, when the three-base motif in which the mutation site is located is GAU, the editing efficiency of the prior art cannot meet therapeutic requirements at all, whereas the editing efficiency achieved by the method provided in this application is at least 10 times that of the prior art.
  • the method of the present invention can also improve the editing efficiency of the RNA editing system for different three-base motifs whose target residue is C.
  • the present application provides a method for editing a target RNA at a target residue position in a host cell, comprising introducing into the host cell a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA, wherein the arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue and/or the downstream residue of the target RNA.
  • a “target RNA” as used herein is a pre-edited RNA.
  • base and “residue” refer to nucleobases such as “adenine”, “guanine”, “cytosine”, “thymine”, “uracil” and “hypoxanthine”.
  • adenosine and “guanosine”, “cytidine”, “thymidine”, “uridine” and “inosine” refer to a nucleobase attached to the sugar moiety of ribose or deoxyribose.
  • nucleoside refers to a nucleobase linked to a ribose or deoxyribose sugar.
  • nucleotide refers to the respective nucleobase-ribosyl-phosphate or nucleobase-deoxyribosyl-phosphate.
  • A adenosine and adenine
  • G guanosine and guanine
  • C cytosine and cytidine
  • U uracil and uridine
  • T Thymine and thymidine
  • I inosine and hypoxanthine
  • nucleotide residue or “residue”.
  • nucleobase, base, nucleoside, nucleotide, nucleotide residue and residue are used interchangeably unless the context clearly dictates otherwise.
  • complementarity of nucleic acids refers to the ability of one nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base pairing. Percent complementarity represents the percentage of residues in a nucleic acid molecule that can hydrogen bond (i.e., Watson-Crick base pair) with another nucleic acid molecule (e.g., about 5, 6, 7, 8, 9, or 10 out of 10 being approximately 50%, 60%, 70%, 80%, 90% and 100% complementary, respectively). "Perfectly complementary" means that all contiguous residues of a nucleic acid sequence hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence.
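As a rough illustration of the percent complementarity defined above, the following Python sketch counts Watson-Crick pairs between two equal-length RNA strands. The function name `percent_complementarity`, the restriction to the A/C/G/U alphabet, and the assumption of gap-free antiparallel alignment are simplifications introduced here, not part of the application.

```python
# Canonical Watson-Crick pairs for RNA (assumption: only A, C, G, U occur).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand: str, other: str) -> float:
    """Return the percentage of positions that form Watson-Crick pairs.

    Both strands are written 5'->3'. They pair antiparallel, so the first
    base of `strand` faces the last base of `other`.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (a, b) in WATSON_CRICK for a, b in zip(strand, reversed(other))
    )
    return 100.0 * paired / len(strand)


if __name__ == "__main__":
    # 5 of 10 positions pair, matching the "5 out of 10 is 50%" example.
    print(percent_complementarity("AAAAAAAAAA", "UUUUUGGGGG"))
```

A real alignment would also have to handle bulges and strands of unequal length, which this sketch deliberately ignores.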
  • substantially complementary means a degree of complementarity of at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
  • for a single base or a single nucleotide, according to the Watson-Crick base pairing principle, when A is paired with T or U, or C with G or I, the pairing is called complementary or matched, and vice versa; all other base pairings are called non-complementary or mismatched.
  • Hybridization refers to a reaction in which one or more polynucleotides react to form a complex stabilized by hydrogen bonding between the bases of nucleotide residues. Said hydrogen bonding can occur by Watson-Crick base pairing, Hoogsteen binding or in any other sequence-specific manner. A sequence capable of hybridizing to a given sequence is referred to as the "complement" of the given sequence.
  • RNA editing refers to the phenomenon of base insertion, deletion or substitution in RNA.
  • One enzyme commonly used in many systems for RNA editing is adenosine deaminase acting on RNA (ADAR), variants thereof, or complexes comprising functional domains thereof.
  • the ADAR family of proteins binds to the double-stranded region of specific RNAs and removes the -NH2 group from the adenosine (A) nucleobase, changing A to inosine (I), which is recognized as guanosine (G) and pairs with cytidine (C) during subsequent cellular translation.
  • A->I (adenosine-to-inosine) RNA editing is the most common type of RNA editing in animals, and is widely involved in a variety of gene regulatory mechanisms at the transcriptional and post-transcriptional levels, such as changing amino acid sequences at the transcriptome level and regulating mRNA splicing, mRNA stability and circular RNA formation, among others (Nishikura K. 2010).
  • Three ADAR proteins are known in mammalian cells: ADAR1 (two isoforms, p110 and p150), ADAR2, and ADAR3 (catalytically inactive).
  • Approaches to RNA editing include fusing antisense RNAs to R/G motifs (ADAR-recruiting RNA scaffolds) to edit target RNAs by overexpressing ADAR1 or ADAR2 proteins in mammalian cells, and using dCas13-ADAR to precisely target and edit RNA. Editing at the RNA level avoids genome damage on the one hand and, on the other, can still change the final biological function.
  • deaminase recruiting RNA (arRNA) refers to an RNA that recruits ADAR, an ADAR variant, or certain complexes comprising domains thereof, to deaminate a target adenosine or a target cytidine in RNA.
  • target RNA refers to an RNA sequence with which the deaminase recruiting RNA sequence is designed to have complete or substantial complementarity, the target RNA comprising target residues.
  • target residues herein refer to nucleotide residues that are modified by RNA editing, e.g., by introduction of ADAR enzymes and arRNA.
  • Hybridization between the target sequence and the arRNA forms a double-stranded RNA (dsRNA) region containing the target residue, which recruits adenosine deaminase (ADAR) or a variant thereof to act on the target residue, causing deamination of the target residue.
  • Three-base motif means the sequence of three consecutive bases consisting of the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA.
  • the "target residue” is located at the "editing site”, and therefore can be used interchangeably unless otherwise specified.
  • the upstream and downstream residues in the three-base motif often determine whether the RNA editing of the target residue can be edited with high efficiency.
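The definition of the three-base motif can be sketched in a few lines of Python. This is only an illustration: the function name `three_base_motif`, the use of a 0-based index, and the example sequence are assumptions made here and do not come from the application.

```python
def three_base_motif(target_rna: str, target_index: int) -> str:
    """Return upstream residue + target residue + downstream residue (5'->3').

    `target_index` is the 0-based position of the target residue in
    `target_rna`, which is written 5'->3'.
    """
    if not 0 < target_index < len(target_rna) - 1:
        raise ValueError("target residue needs both a 5' and a 3' neighbor")
    return target_rna[target_index - 1 : target_index + 2]


if __name__ == "__main__":
    rna = "CCGAUGG"  # hypothetical target RNA fragment
    motif = three_base_motif(rna, 3)  # target residue is the A at index 3
    print(motif, "excluded" if motif == "UAG" else "in scope")
```

The final check mirrors the condition in the application that the three-base motif is not UAG.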
  • the three bases in the complementary RNA sequence that are directly opposite the three-base motif in the target RNA, that is, the base directly opposite the target residue (referred to herein as the "target base") together with the 5' nearest-neighbor residue and the 3' nearest-neighbor residue of that base, form a triplet that is referred to herein as a "triple complementary base".
  • the hybridization between the target RNA and the arRNA forms a double-stranded RNA (dsRNA) region containing the target residue, which recruits a deaminase acting on the RNA, which deaminates the target residue.
  • the methods provided by the present invention include designing an arRNA and introducing the arRNA or a construct encoding the arRNA into a host cell.
  • the double-stranded RNA formed by the hybridization of the complementary RNA sequence in the arRNA sequence and the target RNA can recruit a deaminase acting on the RNA to deaminate the target residue in the target RNA. After deamination, the type of base in the residue can change.
  • Adenosine (A) can be converted to inosine (I) by deamination, and I is recognized as guanosine (G), enabling A to G editing.
  • Cytidine (C) can be converted to uridine (U) by deamination, enabling C to U editing.
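The two base changes described above (A to I, read as G, and C to U) can be written as a small lookup, shown here as a minimal Python sketch. The names `DEAMINATION`, `READ_AS`, `deaminate` and `as_read` are hypothetical helpers, and the model ignores everything about enzyme recruitment and efficiency.

```python
# Product of deamination for each editable base.
DEAMINATION = {"A": "I", "C": "U"}
# Inosine is recognized as guanosine by the cellular machinery.
READ_AS = {"I": "G"}


def deaminate(rna: str, index: int) -> str:
    """Return `rna` with the base at 0-based `index` deaminated."""
    base = rna[index]
    if base not in DEAMINATION:
        raise ValueError(f"{base!r} is not a deamination target")
    return rna[:index] + DEAMINATION[base] + rna[index + 1 :]


def as_read(rna: str) -> str:
    """Return the sequence as it is recognized after editing (I read as G)."""
    return "".join(READ_AS.get(base, base) for base in rna)


if __name__ == "__main__":
    print(as_read(deaminate("GAU", 1)))  # A to G editing
    print(as_read(deaminate("ACA", 1)))  # C to U editing
```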
  • RNA editing exhibits a triplet preference, as shown, for example, in Figures 2 and 3.
  • a lower triplet preference for three-base motifs whose upstream residues are guanosine (G) is a common feature of current ADAR-based RNA editing methods.
  • the open literature also shows a clear triplet preference. Precisely because of this limitation, in order to meet the needs of practical applications and obtain higher editing efficiency, the various deaminase-based RNA editing systems in the prior art must, wherever possible, select three-base motifs with high triplet preference for editing. This limits the scope of RNA editing applications.
  • the present application provides improved methods for editing target RNAs at target residue positions in host cells, including introducing more mismatches in the arRNA at bases directly opposite the three-base motif, which significantly improves the editing efficiency of prior-art ADAR-based RNA editing systems for target bases in three-base motifs that do not conform to the triplet preference of the deaminase, breaking the long-standing limitation on the selection of editing sites in RNA editing applications.
  • the application provides a method of editing a target RNA at a target residue position in a host cell, comprising introducing a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA into the host cell, wherein the arRNA contains a complementary RNA sequence that hybridizes to the target RNA to form a double-stranded RNA, which recruits a deaminase acting on the RNA to deaminate the target residue in the target RNA.
  • the target residue is located in a three-base motif in the target RNA comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA.
  • the triplet formed by the sequential connection of the upstream residue, the target residue and the downstream residue is called a "three-base motif".
  • all three base motifs are described in a 5' to 3' manner.
  • the three bases in the complementary RNA sequence relative to the three base motif in the target RNA are also in the order of 5' to 3'.
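Because both the three-base motif and the triple complementary base are written 5' to 3', the two strands face each other antiparallel: the first base of the triple complementary base sits opposite the downstream residue, and its last base sits opposite the upstream residue. The Python sketch below reports, position by position, whether a given pair of triplets is matched or mismatched. The function name `pairing` and the restriction to canonical A-U and G-C pairs are assumptions made for illustration.

```python
# Canonical Watson-Crick pairs for RNA (assumption: only A, C, G, U occur).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing(motif: str, triplet: str) -> dict:
    """Classify each position of a three-base motif against the arRNA triplet.

    `motif` is upstream + target + downstream in the target RNA (5'->3').
    `triplet` is the triple complementary base in the arRNA (5'->3'), so
    triplet[2] faces the upstream residue and triplet[0] the downstream one.
    """
    if len(motif) != 3 or len(triplet) != 3:
        raise ValueError("both arguments must be exactly three bases long")
    result = {}
    for i, label in enumerate(("upstream", "target", "downstream")):
        target_base, arrna_base = motif[i], triplet[2 - i]
        state = "match" if (target_base, arrna_base) in WATSON_CRICK else "mismatch"
        result[label] = (target_base, arrna_base, state)
    return result


if __name__ == "__main__":
    # GAU opposite ACG, one of the combinations listed in this application.
    for position, info in pairing("GAU", "ACG").items():
        print(position, info)
```

For GAU opposite ACG this reports a G-G mismatch at the upstream residue, an A-C mismatch at the target, and a U-A match at the downstream residue.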
  • the present application provides a method for editing a target RNA at a target residue position in a host cell, comprising introducing a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA into the host cell, wherein the arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue or the downstream residue of the target RNA.
  • the present application provides a method of editing a target RNA at a target residue position in a host cell, comprising introducing into the host cell a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA, wherein the arRNA comprises a complementary RNA sequence that hybridizes to a target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG.
  • the complementary RNA sequence comprises a
  • the upstream residue of the three-base motif is G. In certain embodiments, the upstream residue of the three-base motif is A. In certain embodiments, the upstream residue of the three-base motif is C. In certain embodiments, the downstream residue of the three-base motif is C. In certain embodiments, the downstream residue of the three-base motif is U. In certain embodiments, the downstream residue of the three-base motif is A. In certain embodiments, the three-base motif is selected from the group consisting of GAG, GAC, GAA, GAU, AAG, AAC, AAA, AAU, CAG, CAC, CAA, CAU, UAA, UAC and UAU. In certain embodiments, the three-base motif is GAU.
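The 15 motifs listed above are exactly the NAN triplets with UAG removed, which can be checked by enumeration. The short Python sketch below does this; the function name `candidate_motifs` and its parameters are hypothetical and only serve the illustration.

```python
from itertools import product

BASES = "GACU"


def candidate_motifs(target: str, excluded=("UAG",)) -> list:
    """List all three-base motifs (5'->3') with `target` in the middle."""
    motifs = (up + target + down for up, down in product(BASES, repeat=2))
    return [m for m in motifs if m not in excluded]


if __name__ == "__main__":
    adenosine_motifs = candidate_motifs("A")
    print(len(adenosine_motifs), adenosine_motifs)
    # With cytidine as the target residue nothing is excluded.
    print(len(candidate_motifs("C")))
```

With adenosine as the target residue the enumeration yields 15 motifs, and with cytidine it yields all 16 NCN motifs.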
  • the three-base motif is GAG. In certain embodiments, the three-base motif is GAA. In certain embodiments, the three-base motif is GAC. In some embodiments, the upstream residue in the target RNA is selected from the nucleotides G, C, A and U, preferably in the order G>C≈A>U. In some embodiments, the complementary RNA sequence comprises cytidine (C), adenosine (A) or uridine (U) directly opposite the target adenosine in the target RNA. In some specific embodiments, the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA.
  • the complementary RNA sequence, when hybridized to the target RNA, further comprises one or more mismatches, each of which is opposite a non-target adenosine in the target RNA.
  • the mismatched nucleosides opposite the one or more non-target adenosines are guanosines.
  • the three-base motif is GAU, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is ACG or ACA.
  • the three-base motif is GAU, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is ACG.
  • the three-base motif is GAA, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is UCA, CCG, CCC, or UCC. In certain embodiments, the three-base motif is GAA, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is UCA. In some embodiments, the three-base motif is GAC, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is GCG or GCA. In certain embodiments, the three-base motif is GAC, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is GCG.
  • the three-base motif is GAG, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is CCG, CCA, CCC, UCC, or UCG.
  • the three-base motif is GAG, and the triple complementary base in the complementary RNA sequence directly opposite the three-base motif is CCG.
  • the residue upstream of the three-base motif is G, and the base opposite the upstream residue in the complementary RNA is G or A.
  • the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G or A, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA, the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G or A, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • the complementary RNA sequence comprises a C directly opposite the target adenosine in the target RNA, the upstream residue of the three-base motif is G, the base opposite the upstream residue in the complementary RNA is G, and the downstream residue of the three-base motif is strictly complementary to the opposite base in the complementary RNA.
  • the target adenosine (A) in the target RNA is deaminated by an adenosine deaminase acting on RNA (ADAR).
  • the adenosine deaminase is a native ADAR or a homologous protein thereof.
  • the adenosine deaminase is a modified adenosine deaminase functional variant that retains adenosine deaminase activity, e.g., a variant based on native ADAR or a homologous protein thereof that is modified by one or more site mutations but still possesses adenosine deaminase activity.
  • the adenosine deaminase is a fusion protein comprising an ADAR catalytic domain or a catalytic domain of a homologous protein or a functional variant of an adenosine deaminase.
  • the fusion protein comprising the catalytic domain of an ADAR protein is a fusion protein comprising a Cas13 protein that has lost catalytic activity by mutation and an ADAR functional domain, an ADAR homologous protein functional domain, or an adenosine deaminase functional variant.
  • the deaminase having cytidine deaminase activity is exogenously introduced into the host cell or expressed in the host cell by a construct into which the deaminase is introduced.
  • the fusion protein comprising the catalytic domain of an ADAR protein is a fusion protein comprising a λN peptide and an ADAR functional domain or a homologous protein catalytic domain or adenosine deaminase functional variant thereof.
  • the fusion protein comprising the catalytic domain of an ADAR protein is a SNAP-tag-tagged ADAR or a SNAP-tag-tagged ADAR functional variant.
  • the ADAR is ADAR1 and/or ADAR2.
  • the ADAR is one or more ADARs selected from the group consisting of hADAR1, hADAR2, mouse ADAR1 and mouse ADAR2.
  • the ADAR is expressed by the host cell.
  • the ADAR occurs naturally or endogenously in the host cell, eg, naturally or endogenously in a eukaryotic cell.
  • the ADAR protein is exogenously introduced into the host cell.
  • the ADAR or a construct encoding the ADAR is introduced into a host cell.
  • the constructs include, but are not limited to, linear nucleic acids, plasmids, viruses, and the like.
  • the ADAR includes the above-mentioned natural ADAR, a homologous protein thereof, a modified adenosine deaminase functional variant that retains adenosine deaminase activity (e.g., a variant based on the natural ADAR or its homologous protein that is modified by one or more site mutations but still has adenosine deaminase activity), or a fusion protein comprising an ADAR catalytic domain, a homologous protein catalytic domain thereof, or an adenosine deaminase functional variant.
  • the fusion protein comprising an ADAR catalytic domain, a homologous protein catalytic domain thereof, or an adenosine deaminase functional variant is a fusion protein comprising a targeting domain together with the ADAR catalytic domain, the homologous protein catalytic domain thereof, or the adenosine deaminase functional variant.
  • the targeting domain is selected from any one including, but not limited to, a Cas13 protein that has been mutated to lose catalytic activity, a λN peptide, and a SNAP-tag.
  • the ADAR is one or more ADARs selected from the group consisting of hADAR1, hADAR2, mouse ADAR1 and mouse ADAR2. In some embodiments, the method does not comprise introducing any protein into the host cell. In certain embodiments, the ADAR is ADAR1 and/or ADAR2.
  • Another aspect of the present application provides a method of editing a target RNA at a target residue position in a host cell, comprising introducing into the host cell a deaminase recruiting RNA (arRNA) or a construct encoding the arRNA, wherein the arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the target residue is cytidine (C), wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue and/or the downstream residue of the target RNA, and wherein the method further comprises introducing into the host cell a deaminase having cytidine deaminase activity, or a cytidine deaminase, or a construct encoding the deaminase.
  • the deaminase with cytidine deaminase activity is a deaminase obtained by genetically modifying an ADAR protein or a fusion protein comprising an ADAR catalytic domain to obtain C to U catalytic activity.
  • the deaminase having cytidine deaminase activity further comprises a targeting domain.
  • the three-base motif in which the target cytidine is located is selected from any of the following: GCG, GCC, GCA, GCU, ACG, ACC, ACA, ACU, CCG, CCC, CCA, CCU, UCA, UCC, UCU and UCG.
  • the arRNA comprises unpaired nucleotides at positions corresponding to the target residues of the target RNA to form mismatches with the target residues.
  • the complementary RNA sequence in the arRNA that can hybridize to the target RNA comprises cytidine, adenosine or uridine directly opposite the target cytidine in the target RNA.
  • the complementary RNA sequence comprises a cytidine directly opposite the target cytidine.
  • the arRNA comprises one or more unpaired nucleotides at a non-target editing site corresponding to the target RNA to form one or more mismatches with the non-target site of the target RNA.
  • In Example 4, the editing efficiency of cytidine to uridine was measured both with a single mismatch at the target residue of the three-base motif and with mismatches at multiple residues of the three-base motif, and the results are shown in Figure 22. When the upstream residue of the three-base motif is A or U, multiple mismatches achieve editing efficiency comparable to that of a single mismatch at the target residue. When the upstream residue of the three-base motif is G, the editing efficiency is extremely low, and introducing more mismatches in this case significantly improves the C-to-U editing efficiency.
  • the upstream residue of the three-base motif is a G
  • the complementary RNA sequence comprises a G directly opposite the upstream residue.
  • the complementary RNA sequence comprises an AUU or GUU opposite the three-base motif.
  • the three-base motif is ACA, and wherein the complementary RNA sequence preferably comprises an AUU opposite to the three-base motif.
  • the three-base motif is UCA, and wherein the complementary RNA sequence comprises AUA, GUA, or CUA opposite the three-base motif.
  • the three-base motif is UCA, and wherein the complementary RNA sequence preferably comprises an AUA opposite the three-base motif.
  • the three-base motif is GCA, and wherein the complementary RNA sequence comprises UUG or UCG opposite the three-base motif.
  • the three-base motif is GCA, and wherein the complementary RNA sequence preferably comprises a UUG opposite the three-base motif.
  • the three-base motif is CCA, and wherein the complementary RNA sequence comprises an AUG opposite the three-base motif.
  • the target residue in the three-base motif in the target RNA is cytidine, and the upstream residue of the three-base motif is a nucleotide selected from G, C, A and U, in the preferred order G>C>A≈U.
  • the arRNA deaminates the target cytidine (C) in the target RNA and converts it to uridine by recruiting a deaminase having cytidine deaminase activity to the target RNA.
  • the cytidine deaminase is an adenosine deaminase or an adenosine deaminase homologous protein variant modified (e.g., through amino acid deletion or mutation at one or more sites) to have cytidine deamination activity.
  • the modified adenosine deaminase with cytidine deamination activity comprises one or more mutations disclosed in the prior art, such as the adenosine deaminase fragment with cytidine deamination activity disclosed in Abudayyeh et al., 2019.
  • the modified adenosine deaminase with cytidine deamination activity is an ADAR2 comprising one or more mutations selected from the group consisting of: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T.
  • the modified adenosine deaminase with cytidine deamination activity is ADAR2 comprising all of the following mutations: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T.
  • the cytidine deaminase is a fusion protein comprising an ADAR2 catalytic domain carrying all of the following mutations: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T.
  • the deaminase having cytidine deaminase activity further comprises a targeting domain.
  • the targeting domain includes, but is not limited to, any one selected from the group consisting of a Cas13 protein that has been mutated to lose catalytic activity, a λN peptide, and a SNAP-tag.
  • the method comprises introducing the cytidine deaminase or the fusion protein or a construct encoding the adenosine deaminase or the fusion protein into a host cell.
  • the constructs include, but are not limited to, linear nucleic acids, plasmids, viruses, and the like.
  • the arRNA is single-stranded RNA.
  • the complementary RNA sequence is completely single-stranded.
  • the arRNA comprises one or more (e.g., 1, 2, 3, or more) double-stranded regions and/or one or more stem-loop regions.
  • the arRNA consists solely of the complementary RNA sequence.
  • the complementary RNA sequence has one or more mismatches with the target sequence in addition to the triple complementary bases.
  • one or more wobble pairings can occur when the complementary RNA sequence hybridizes to the target sequence.
  • one or more unilateral protrusions may appear when the complementary RNA sequence hybridizes to the target sequence.
  • one or more wobble pairings and one or more unilateral protrusions may occur when the complementary RNA sequence hybridizes to the target sequence.
  • the length of the arRNA is about 20-260 nucleotides, for example, the length of the arRNA is less than or equal to about 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more nucleotides.
  • the length of the complementary RNA sequence is 40-260, 45-250, 50-240, 60-230, 65-220, 70-220, 70-210, 70-200, 70-190, 70-180, 70-170, 70-160, 70-150, 70-140, 70-130, 70-120, 70-110, 70-100, 70-90, 70-80, 75-200, 80-190, 85-180, 90-170, 95-160, 100-200, 100-150, 100-175, 110-200, 110-175, 110-150, or 105-140 nucleotides.
  • the arRNA is about 60-200 nucleotides in length (e.g., about any of 60-150, 65-140, 68-130, or 70-120).
  • the arRNA further comprises an ADAR recruitment domain.
  • the arRNA comprises one or more chemical modifications.
  • the chemical modification comprises methylation and/or phosphorothioation, e.g., 2'-O-methylation (2'-O-Me) and/or internucleotide phosphorothioate linkages.
  • the first and last 3 or 5 nucleotides of the arRNA comprise a 2'-O-Me modification, and/or the linkages between the first and last 3 or 5 nucleotides contain a phosphorothioate linkage modification.
  • one or more or all of the uridines in the arRNA comprise a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain 3'-phosphorothioate linkage modifications.
  • the arRNA does not contain any chemical modifications.
  • the target RNA is an RNA selected from the group consisting of precursor RNA, messenger RNA, ribosomal RNA, transfer RNA, long non-coding RNA and small RNA.
  • editing at target residues in the target RNA by the methods described herein results in missense mutations, premature stop codons, aberrant splicing or alternative splicing in the target RNA, or reverses missense mutations, premature stop codons, aberrant splicing, or alternative splicing in the target RNA.
  • editing of target residues in the target RNA by the methods described herein results in point mutation, truncation, elongation and/or misfolding of the protein encoded by the target RNA, or yields a functional, full-length, properly folded and/or wild-type protein by reversing missense mutations, premature stop codons, aberrant splicing, or alternative splicing in the target RNA.
  • the host cell is a eukaryotic cell.
  • the host cell is a mammalian cell.
  • the host cells are human or mouse cells.
  • an edited RNA or a host cell comprising the edited RNA can be generated. Accordingly, the present invention also provides the edited RNA produced by the method for editing a target RNA provided by the present invention or a host cell comprising the edited RNA.
  • the methods of editing target RNAs at target residue positions in host cells can be used in the treatment or prevention of diseases or disorders in an individual. Accordingly, the present invention also provides a method for treating or preventing a disease or disorder in an individual, comprising using any one of the aforementioned methods for editing a target RNA at a target residue position in a host cell to edit the target RNA associated with the disease or disorder in a cell of the individual.
  • the disease or disorder is an inherited genetic disease or a disease or disorder associated with one or more acquired genetic mutations (e.g., drug resistance).
  • the present invention also provides an RNA (arRNA) that deaminates a target residue in a target RNA by recruiting a deaminase acting on RNA, which can be used in the methods provided by the present invention. The arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue (upstream residue) of the target residue in the target RNA, the target residue, and the 3' nearest-neighbor residue (downstream residue) of the target residue in the target RNA, wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue and/or the downstream residue of the target RNA.
  • the target residue of the three-base motif in the target RNA targeted by the arRNA is adenosine, and the upstream residue in the target RNA is a nucleotide selected from G, C, A and U, in the preferred order G>C≈A>U.
  • the three-base motif is selected from the group consisting of GAG, GAC, GAA, GAU, AAG, AAC, AAA, AAU, CAG, CAC, CAA, CAU, UAA, UAC and UAU.
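The motif lists in this application follow a simple combinatorial rule: the upstream and downstream residues each range over G, A, C and U, and for a target adenosine the UAG motif is excluded. The following is a minimal Python sketch of that enumeration, offered only as an illustration; the function name `three_base_motifs` and its parameters are hypothetical and do not come from the application.

```python
from itertools import product

BASES = "GACU"  # candidate upstream and downstream residues


def three_base_motifs(target, excluded=()):
    """List every upstream/target/downstream motif (5'->3') for one target base."""
    motifs = [up + target + down for up, down in product(BASES, repeat=2)]
    return [motif for motif in motifs if motif not in excluded]


# Target adenosine: all NAN motifs except UAG, as in the list above.
adenosine_motifs = three_base_motifs("A", excluded=("UAG",))
# Target cytidine: all 16 NCN motifs.
cytidine_motifs = three_base_motifs("C")

print(len(adenosine_motifs), adenosine_motifs)
print(len(cytidine_motifs), cytidine_motifs)
```

Excluding UAG leaves 15 adenosine motifs, while the cytidine case keeps all 16 combinations.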
  • the arRNA comprises cytidine (C), adenosine (A) or uridine (U) directly opposite to the target adenosine in the target RNA.
  • the arRNA comprises a C directly opposite the target adenosine in the target RNA.
  • the arRNA further comprises one or more mismatches when hybridized to the target RNA, each of the mismatches being opposite a non-target adenosine in the target RNA.
  • the mismatched nucleosides opposite the one or more non-target adenosines are guanosines.
  • the upstream residue of the three-base motif is a G
  • the base opposite the upstream residue in the complementary RNA is either a G or an A.
  • the three-base motif is GAU, and the triple complementary bases of the arRNA directly opposite the three-base motif are ACG or ACA. In certain embodiments, the three-base motif is GAU, and the triple complementary bases of the arRNA directly opposite the three-base motif are ACG. In certain embodiments, the three-base motif is GAA, and the triple complementary bases of the arRNA directly opposite the three-base motif are UCA, CCG, CCC, or UCC. In certain embodiments, the three-base motif is GAA, and the triple complementary bases of the arRNA directly opposite the three-base motif are UCA.
  • the three-base motif is GAC, and the triple complementary bases of the arRNA directly opposite the three-base motif are GCG or GCA. In certain embodiments, the three-base motif is GAC, and the triple complementary bases of the arRNA directly opposite the three-base motif are GCG. In certain embodiments, the three-base motif is GAG, and the triple complementary bases of the arRNA directly opposite the three-base motif are CCG, CCA, CCC, UCC, or UCG. In certain embodiments, the three-base motif is GAG, and the triple complementary bases of the arRNA directly opposite the three-base motif are CCG. In certain embodiments, the arRNA comprises one or more mismatches, each of which is opposite a non-target adenosine in the target RNA.
  • the target residue in the three-base motif in the target RNA targeted by the arRNA can be cytidine (C), referred to as target cytidine.
  • the upstream residue of the three-base motif is a nucleotide selected from the group consisting of G, C, A and U, preferably in the order G>C>A≈U.
  • the three-base motif in which the target cytidine is located is selected from any of the following: GCG, GCC, GCA, GCU, ACG, ACC, ACA, ACU, CCG, CCC, CCA, CCU, UCA, UCC, UCU and UCG.
  • the upstream residue of the three-base motif is a G, and wherein the base opposite the upstream residue in the complementary RNA is a G.
  • the residue downstream of the three-base motif is A, and wherein the base opposite the downstream residue in the complementary RNA is U or A.
  • the three-base motif is ACA, and wherein the complementary RNA sequence comprises an AUU, or GUU, opposite the three-base motif.
  • the three-base motif is ACA, and wherein the complementary RNA sequence comprises an AUU opposite the three-base motif.
  • the three-base motif is UCA, and wherein the complementary RNA sequence comprises AUA, GUA, or CUA opposite the three-base motif.
  • the three-base motif is UCA, and wherein the complementary RNA sequence comprises an AUA opposite the three-base motif. In some embodiments, the three-base motif is GCA, and wherein the complementary RNA sequence comprises UUG or UCG opposite the three-base motif. In some embodiments, the three-base motif is GCA, and wherein the complementary RNA sequence comprises a UUG opposite the three-base motif. In some embodiments, the three-base motif is CCA, and wherein the complementary RNA sequence comprises an AUG opposite the three-base motif. In certain embodiments, the arRNA comprises unpaired nucleotides at positions corresponding to the target residues of the target RNA to form mismatches with the target residues.
  • the complementary RNA sequence in the arRNA that can hybridize to the target RNA comprises cytidine, adenosine or uridine directly opposite the target cytidine in the target RNA. In certain embodiments, the complementary RNA sequence comprises a cytidine directly opposite the target cytidine. In certain embodiments, the arRNA comprises one or more unpaired nucleotides at a non-target editing site corresponding to the target RNA to form one or more mismatches with the non-target site of the target RNA.
  • the arRNA is single-stranded RNA.
  • the complementary RNA sequence is completely single-stranded.
  • the arRNA comprises one or more (e.g., 1, 2, 3, or more) double-stranded regions and one or more stem-loop regions.
  • the arRNA comprises one or more (e.g., 1, 2, 3, or more) double-stranded regions.
  • the arRNA comprises one or more (e.g., 1, 2, 3, or more) stem-loop regions.
  • the arRNA comprises a region capable of forming an intramolecular stem-loop structure for recruiting ADAR enzymes.
  • the arRNA does not comprise a region capable of forming an intramolecular stem-loop structure for recruiting ADAR enzymes.
  • the arRNA consists solely of the complementary RNA sequence.
  • one or more wobble pairings can occur when the complementary RNA sequence hybridizes to the target sequence.
  • one or more unilateral protrusions may appear when the complementary RNA sequence hybridizes to the target sequence.
  • one or more wobble pairs and one or more unilateral protrusions may occur when the complementary RNA sequence hybridizes to the target sequence.
  • the length of the arRNA is about 20-260 nucleotides, for example, the length of the arRNA is less than or equal to about 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more nucleotides.
  • the length of the complementary RNA sequence is 40-260, 45-250, 50-240, 60-230, 65-220, 70-220, 70-210, 70-200, 70-190, 70-180, 70-170, 70-160, 70-150, 70-140, 70-130, 70-120, 70-110, 70-100, 70-90, 70-80, 75-200, 80-190, 85-180, 90-170, 95-160, 100-200, 100-150, 100-175, 110-200, 110-175, 110-150, or 105-140 nucleotides.
  • the arRNA is about 60-200 nucleotides in length (e.g., about any of 60-150, 65-140, 68-130, or 70-120).
  • the arRNA further comprises an ADAR recruitment domain.
  • the arRNA comprises one or more chemical modifications.
  • the chemical modification comprises methylation and/or phosphorothioation, e.g., 2'-O-methylation (2'-O-Me) and/or internucleotide phosphorothioate linkages.
  • the first and last 3 or 5 nucleotides of the arRNA comprise a 2'-O-Me modification, and/or the linkages between the first and last 3 or 5 nucleotides contain a phosphorothioate linkage modification.
  • one or more or all of the uridines in the arRNA comprise a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain a 2'-O-Me modification.
  • the targeting nucleoside and/or nucleosides adjacent to the 5' and/or 3' end of the targeting nucleoside contain 3'-phosphorothioate linkage modifications.
  • the arRNA does not contain any chemical modifications.
  • the present invention also provides a viral vector, plasmid or linear nucleic acid chain comprising any of the above-mentioned arRNAs provided by the present invention, and the arRNA does not contain any chemical modification.
  • the present invention also provides a library comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the present invention also provides a composition comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the present invention also provides a host cell comprising any of the above-mentioned arRNAs provided by the present invention or any of the above-mentioned viral vectors, plasmids or linear nucleic acid chains provided by the present invention.
  • the host cell comprising any of the above arRNAs provided herein is a eukaryotic cell.
  • a small piece of arRNA that is partially or fully complementary to the target RNA containing the target adenosine (A) is exogenously transferred, and the RNA is used to recruit endogenous ADARs to edit the target A from A to I.
  • the arRNA is synthesized in vitro and has a length of 71nt-111nt.
  • in the arRNA used in the present invention, the three bases directly opposite the three-base motif in the target sequence have weakened complementarity to the three-base motif; that is, in addition to the mismatch with target A, the three bases directly opposite the three-base motif in the arRNA also contain a mismatched base opposite the upstream residue and/or the downstream residue. It is this change that breaks the three-base motif preference, allowing existing and future editing methods using ADAR to edit three-base motifs whose upstream residue is G, or other three-base motifs other than UAG, more freely and more efficiently.
  • Example 1 Construction of three base motif reporter system and corresponding arRNA
  • the clones with correct sequencing results were plasmid extracted and packaged into lentiviruses.
  • 293T cells were infected with these lentiviruses packaged with genes encoding different three-base motifs. After 48 hours of infection, 16 types of 293T cells that can transcribe mRNAs (target RNAs) containing different three-base motifs were obtained, which are the final three-base motif reporter system cells, and their names are as shown in Table 2.
  • the three base motifs are the same.
  • the arRNAs were prepared by chemical synthesis.
  • for the 16 kinds of arRNA, the design principle is as follows: each arRNA is a single-stranded RNA reverse complementary to the RNA fragment spanning from 55 nt 3' downstream to 25 nt 5' upstream of the target A in the three-base motif on the mRNA, in which the base opposite the target A in the three-base motif is C, and the bases opposite the upstream residue and the downstream residue are each selected from A, C, G or U.
  • the specific sequence is shown in Table 3.
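The design principle described above can be sketched in Python. This is a minimal illustration, not the applicant's design tool: it assumes the window is read as 25 nt upstream plus the target A plus 55 nt downstream, the function name `design_arrna` and its parameters are hypothetical, and the toy target sequence is invented (the real sequences are those of Table 3).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_arrna(target_rna, target_index, upstream_len=25, downstream_len=55,
                 opposite_upstream=None, opposite_downstream=None):
    """Build an arRNA (5'->3') and its triple complementary bases.

    target_index is the 0-based position of the target A in target_rna.
    Bases left as None stay Watson-Crick complementary to the target.
    """
    if upstream_len < 1 or downstream_len < 1:
        raise ValueError("the window must include both neighbouring residues")
    if target_rna[target_index] != "A":
        raise ValueError("target residue must be adenosine")
    start = target_index - upstream_len
    end = target_index + downstream_len + 1
    if start < 0 or end > len(target_rna):
        raise ValueError("window extends beyond the target RNA")
    arrna = list(reverse_complement(target_rna[start:end]))
    # After reverse complementation the base facing the target A sits at
    # index downstream_len; its 5' neighbour faces the downstream residue
    # and its 3' neighbour faces the upstream residue.
    center = downstream_len
    arrna[center] = "C"  # C opposite the target A
    if opposite_downstream is not None:
        arrna[center - 1] = opposite_downstream
    if opposite_upstream is not None:
        arrna[center + 1] = opposite_upstream
    return "".join(arrna), "".join(arrna[center - 1:center + 2])


# Toy target with a GAU motif; the flanking bases are placeholders only.
toy_target = "C" * 24 + "GAU" + "C" * 54
print(design_arrna(toy_target, 25)[1])                         # complementary design
print(design_arrna(toy_target, 25, opposite_upstream="G")[1])  # G opposite upstream G
```

With these assumptions the complementary design for GAU gives the triplet ACC, and placing G opposite the upstream G gives ACG, which matches the 5'-3' naming used for the arRNAs in this example.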
  • the 16 target RNAs described in Example 1 all carry the GFP green fluorescent protein nucleic acid sequence at the 3' end of the target sequence.
  • the sequence can be translated correctly and fluoresces green normally.
  • when the three-base motif is UAG, translation stops there because UAG is a stop codon, and GFP therefore cannot be translated.
  • the A in the UAG three-base motif was edited by the LEAPER system. If the editing is successful, UAG will be converted into UIG, and UIG will be recognized as UGG during the translation process, so that the translation will no longer be terminated, so that the downstream GFP can be translated normally. Therefore, we can roughly judge the editing efficiency of different arRNAs through the size of the GFP positive ratio.
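The reporter logic above (UAG is converted to UIG, which is recognized as UGG during translation) can be written out as a short Python sketch. It is an illustration only: the function names are hypothetical, and it models nothing beyond the stop-codon readout described in this example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_adenosine(codon, position):
    """Deaminate the adenosine at `position` (0-based) to inosine (I)."""
    if codon[position] != "A":
        raise ValueError("target residue must be adenosine")
    return codon[:position] + "I" + codon[position + 1:]


def as_read_in_translation(codon):
    """Inosine is recognized as guanosine during translation."""
    return codon.replace("I", "G")


def gfp_can_be_translated(codon):
    """Downstream GFP is translated only if the codon is not read as a stop."""
    return as_read_in_translation(codon) not in STOP_CODONS


unedited = "UAG"
edited = edit_adenosine(unedited, 1)
print(unedited, gfp_can_be_translated(unedited))
print(edited, as_read_in_translation(edited), gfp_can_be_translated(edited))
```

Only edited transcripts allow read-through in this model, which is why the GFP-positive ratio serves as a rough proxy for editing efficiency.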
  • RNAi MAX reagent Invitrogen 13778150
  • the cells were 293T cells transcribing mRNA containing a UAG three-base motif. After 72 hours of incubation with arRNA (48 hours post-transfection), FITC channel intensity was analyzed by flow cytometry. The results are shown in Figure 7, where UT is a control without any transfection, and Vech is a control to which only RNAiMAX transfection reagent was added, without any dRNA.
  • the editing efficiency of the UAG three-base motif obtained with the LEAPER system, using a reporter system similar to that in this example, is shown in Figure 9 (Qu et al., 2019).
  • the horizontal axis number in Figure 9 is the arRNA sequence name, which corresponds to the subscript part of the horizontal axis name in Figures 7 and 8.
  • the arRNAs in Figures 7 and 8 are in the same order as the arRNAs in Figure 9. Since chemically synthesized arRNA is used for transfection in this example, and plasmid transfection is used in FIG. 9 , the overall editing efficiency in this example is relatively high, but the overall trend is the same as that in FIG. 9 .
  • the editing efficiency is highest when the base opposite target A in the arRNA is C and the bases opposite upstream residue U and downstream residue G are A and C, which pair with U and G, respectively; that is, the corresponding arRNA is arRNA CCA.
  • when more mismatches are introduced, the editing efficiency not only does not improve significantly, but decreases.
  • the results in this example are basically consistent with those reported in the literature, that is: for editing of the UAG three-base motif, introducing more mismatches into the three bases on the arRNA opposite the three-base motif does not improve editing efficiency.
  • After 72 h (48 h after transfection), samples were collected with TRIzol and RNA was extracted (TRIzol Reagent, Ambion, REF 15596026). 1 µg of RNA was reverse transcribed in a 20 µL reaction (One-Step gDNA Removal and cDNA Synthesis SuperMix, TransGen Biotech, AT311-02). 1 µL of the reverse transcription product was used for PCR amplification with the following primer pair: gaggtgagtacggtgtgcGACGAGCTGTACAAGCTGCAGGG (SEQ ID NO: 1) and gagttggatgctggatggTGGTGCAGATGAACTTCAGGGTCAG (SEQ ID NO: 2) (lowercase letters indicate the primer adapters required by the Hi-Tom kit), and the library was constructed with the Hi-Tom kit (Novogene, REF PT045).
  • next-generation sequencing was performed according to the following steps, and the editing efficiency of A->G in the editing site was analyzed.
  • the constructed sequencing library was used for high-throughput sequencing in PE150 mode through NovaSeq6000 platform.
  • the raw data obtained by high-throughput sequencing were quality-controlled with fastp (v0.19.6), and low-quality sequences, sequences with adapter sequences, and sequences containing polyG were filtered out.
  • the resulting high-quality sequencing data were split by sample according to the corresponding barcode sequences using a self-developed splitting script and aligned to the sequence of the amplified target region with BWA (v0.7.17-r1188); SAMtools (v1.9) was used for format conversion to generate BAM files, to collect alignment statistics, and to re-sort and index the files.
  • RNA editing sites were detected using JACUSA (v1.3.0) with the parameters: call-1 -aB,R,D,I,Y,M:4 -C ACGT -c 2 -p 1 -P UNSTRANDED -R -u DirMult-CE. After filtering out high-frequency point mutations that appeared in both control and treated samples, three times the average frequency of mutations other than A->G was used as a threshold, and the portion of the A->G mutation frequency at the editing site that exceeded the threshold was taken as the frequency at which the true target A was mutated to G.
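The background correction in the last step can be illustrated with a small Python sketch. It assumes that "the portion above the threshold" means subtracting the threshold from the observed A->G frequency and flooring the result at zero; that reading, the function name and the numbers in the usage lines are assumptions made for illustration and are not data from this application.

```python
def corrected_editing_frequency(a_to_g_frequency, background_frequencies,
                                multiplier=3.0):
    """Return (corrected A->G frequency, threshold).

    background_frequencies are the frequencies of mutations other than A->G;
    the threshold is `multiplier` times their average.
    """
    if not background_frequencies:
        raise ValueError("at least one background frequency is required")
    mean_background = sum(background_frequencies) / len(background_frequencies)
    threshold = multiplier * mean_background
    # Only the part of the signal above the threshold counts as true editing.
    corrected = max(0.0, a_to_g_frequency - threshold)
    return corrected, threshold


# Hypothetical frequencies for illustration only.
corrected, threshold = corrected_editing_frequency(0.256, [0.001, 0.002, 0.003])
print(round(threshold, 6), round(corrected, 6))
```

An editing site whose A->G frequency does not exceed the threshold is reported as zero under this reading.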
  • the arRNA design method of the present invention shows unexpected editing efficiency; the results are shown in Figure 11.
  • the efficiency trend of the arRNA design method of the present invention is particularly obvious on GAU.
  • the arRNA ACC designed according to the methods commonly used in the prior art basically has no editing efficiency, which is consistent with the literature reports.
  • when the base in the arRNA opposite the 5' upstream residue G in the three-base motif is an unpaired base, i.e., a base other than C, for example when arRNA ACG is used, the editing efficiency can be greatly improved.
  • the improvement is more than 10-fold over the conventional prior-art design (arRNA ACC); consistent with previous reports, the arRNA ACC designed according to the prior art has extremely low editing efficiency.
  • the arRNA designs in this example that appropriately weaken complementarity to the three-base motif, such as arRNA ACA, arRNA CCU and arRNA UCC, also showed significantly higher editing efficiency than the conventional prior-art design arRNA ACC.
  • for GAU, the editing efficiency is highest when the base opposite target A in the arRNA is C, the base opposite the downstream residue is A (the complementary base of the downstream residue U), and the upstream residue G is opposite the mismatched base G rather than the complementary base C (i.e., arRNA ACG).
  • the arRNA GCC designed according to the inherent principles of the prior art has basically no editing efficiency, while the arRNA GCG and arRNA GCA introduced with more mismatches have significantly higher editing efficiency.
  • for GAC, the editing efficiency is highest when the base opposite target A in the arRNA is C, the base opposite the downstream residue is G (the complementary base of the downstream residue C), and the base opposite the upstream residue G is the mismatched base G (i.e., arRNA GCG).
  • the arRNA CCC designed according to the fixed mode of the prior art is not the most efficient, while arRNA CCG and arRNA CCA, whose complementarity is appropriately weakened, show significantly higher editing efficiency.
  • for GAG, the editing efficiency is highest when the base opposite target A in the arRNA is C, the base opposite the downstream residue is C (the complementary base of the downstream residue G), and the base opposite the upstream residue G is the mismatched base G (i.e., arRNA CCG).
  • the editing efficiency of the arRNA UCC designed according to the fixed mode of the prior art is not high, but when the base opposite target A in the arRNA is C, the base opposite the downstream residue A is the complementary base U, and the base opposite the upstream residue G is the mismatched base A, the editing efficiency (arRNA UCA) is improved.
  • in the arRNA designed according to the prior art, the base opposite target A is C and the other two bases are designed according to the principle of complementary base pairing; in this case, the base opposite the upstream residue G of target A is C.
  • the base paired with the upstream residue G of the target A is A.
  • the base paired with the upstream residue G of target A is G.
  • the three-base motifs GAU, GAC, and GAA are the three with the weakest editing efficiency, and their efficiency is close to zero (Figure 3). Therefore, these motifs should be avoided as much as possible in the process of RNA editing.
  • the present invention breaks this limitation by creatively introducing more mismatched bases opposite the three-base motif into the arRNA. According to the embodiments of the present invention, among the three bases in the arRNA opposite the three-base motif, when the base opposite target A is C and the base opposite the upstream and/or downstream residue is a mismatched base, editing efficiency can be significantly improved.
  • the base opposite the downstream residue is usually a complementary base; editing efficiency is higher when the upstream residue G is opposite the mismatched base A, and is highest when the upstream residue G is opposite the mismatched base G.
  • the catalytic domain of ADAR2 was mutagenized, and the mutation sites were the same as those of r16 in that reference (dADAR2(E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T) r16, https://benchling.com/s/seq-19Ytwwh0i0vSIbyXYZ95).
  • ADAR2 XmaI restriction site was donated by Professor Wei Wensheng's laboratory
  • AscI restriction site was donated by Professor Wei Wensheng's laboratory
  • the ADAR2 gene mutated in the catalytic domain was named ADAR2-r16.
  • the full-length cDNA sequence of ADAR2-r16 is shown in Table 6.
  • the BFP cDNA sequence was cloned into the pCDH-CMV plasmid vector through the multiple cloning site behind the CMV promoter (the pCDH-CMV plasmid backbone was a gift from Kazuhiro Oka, Addgene plasmid#72265; http://n2t.net/addgene:72265; RRID: Addgene_72265).
  • the C to U editing site in the reporter system is the base C at position 199 of the BFP sequence, then positions 199, 200 and 201 are CAC, corresponding to the 66th histidine.
  • the 198th, 199th, and 200th bases of the sequence are CCA, named BFP-CCA, abbreviated as C*.
  • when the C at position 199 is edited to U, the amino acid at position 66 changes, converting the BFP fluorescent protein from its original blue fluorescence to green fluorescence, so that the signal can be detected in the FITC (fluorescein isothiocyanate) channel by flow cytometry.
  • a site-directed mutagenesis kit (Site-Directed Mutagenesis Kit, NEB E0554S)
  • the three bases at 198, 199, and 200 were: GCA, named BFP-GCA, abbreviated as G*; ACA, named BFP-ACA, abbreviated as A*; TCA is named BFP-TCA, abbreviated as T*.
  • the C at position 199 was mutated to T, and the CTA was named BFP-CUA, abbreviated as CUA.
  • the term "arRNA" in this example has the same meaning as the term "dRNA" used herein, and the two can be used interchangeably.
  • the base opposite to the target residue in the three-base motif of the arRNA is located in the middle of the arRNA, and the 5' upstream and 3' downstream extend to both sides by the same length. Due to the limitation of synthesis length, in this example, RNA with a length of 91 nt was selected for in vitro synthesis.
  • the four synthetic arRNAs are abbreviated as A*, U*, G*, and C*, respectively.
  • the specific sequences of the four synthetic arRNAs are shown in Table 5 below.
  • the design of the four arRNAs in this batch of experiments only changed the target base relative to the target residue C, that is, A, U, G, and C at position 46.
  • the base at position 47 of arRNA (corresponding to position 198 of the reporter system) was designed according to the BFP sequence before mutation, namely CCA, in the four arRNAs.
  • arRNAs containing different triple complementary bases were synthesized when the target residue of the three-base motif was cytidine and the downstream residue was adenosine.
  • the specific sequences are shown in Table 8.
  • the 46th nucleotide of arRNA is fixed as U, and the 45th and 47th nucleotides are A, U, G, and C, respectively, so there are 16 types in total.
  • Each arRNA is named according to the following principles: all arRNA names start with "arRNA", followed by the following superscripts to display the triple complementary bases on the arRNA. On the basis that the arRNA target base corresponding to the mRNA target residue C is U, the triple complementary base is displayed, and the display sequence of the triple complementary base is the sequence of 5'-3'.
  • the upstream residue of the target residue C is C
  • the arRNA base opposite it, i.e. the 3' nearest-neighbor residue of the arRNA target base, is G
  • the arRNA target base opposite the target residue C is U
  • the downstream residue of the target residue C is A
  • the arRNA base opposite it, i.e. the 5' nearest-neighbor residue of the arRNA target base, is U
  • the triple complementary base contained in the arRNA is therefore UUG.
  • the antisense RNA is named arRNA UUG (with UUG written as a superscript).
  • ADAR2-r16-293T cells were plated in a 6-well plate at a density of 300,000 cells/well. 24 hours after plating, the cells were transfected with Lipofectamine™ 3000 Transfection Reagent (Invitrogen, Catalog number: L3000015), and the transfection steps were carried out according to the instructions. Two replicate experiments were performed with different amounts of Lipofectamine 3000 transfection reagent: Repeat 1 used 3.75 μL and Repeat 2 used 7.5 μL of transfection reagent per well.
  • the mRNA row represents the BFP reporter system plasmid added to the corresponding well
  • the arRNA row represents the arRNA added to the corresponding well.
  • the three bases at positions 198, 199, and 200 are CCA in the original sequence; when the C at position 198 is changed to A, T, or G, the amino acid at position 65 remains threonine, so the difference at position 198 among the four reporter systems BFP-GCA, BFP-CCA, BFP-ACA, and BFP-TCA does not change the function of the original protein.
  • the background GFP signal MFI of the reporter system was about 5 × 10^4 (the reporter system was marked as U*, the arRNA was marked as /; and the reporter system was marked as A*, and the arRNA was marked as /).
  • the MFI of the GFP signal was about 2.4 × 10^6 to 3.1 × 10^6, which was about 100 times higher than the background value. Therefore, when the C at position 199 changes to U at the RNA level, it causes about a 100-fold increase in the GFP signal MFI.
  • the applicant integrated the four plasmids BFP-GCA, BFP-ACA, BFP-TCA, and BFP-CCA through lentiviral packaging.
  • Lipofectamine 3000 was used as described above because plasmids needed to be transfected at the same time. In the triplet preference test, only arRNA needed to be transfected and no plasmid transfection was required, so Lipofectamine™ RNAiMAX Transfection Reagent (Invitrogen, Catalog number: 13778100) was used.
  • 293T or ADAR2-r16-293T cells containing the different reporter systems were plated in 12-well plates at a density of 150,000 cells/well. 24 hours after plating, 15 pmol of arRNA was transfected with RNAiMAX reagent. 48 hours after transfection, the FITC channel signal intensity was measured by FACS and the percentage of GFP+ cells was counted.
  • in the triple complementary base of the arRNA, the only mismatch is with the target C, the mismatched base opposite the target C is U, and the bases opposite the upstream and downstream residues of the target C are fully matched, i.e.:
  • when the reporter system is BFP-GCA, the triple complementary base in the arRNA is UUC
  • when the reporter system is BFP-ACA, the triple complementary base in the arRNA is UUU
  • when the reporter system is BFP-TCA, the triple complementary base in the arRNA is UUA
  • when the reporter system is BFP-CCA, the triple complementary base in the arRNA is UUG
  • untreated means not adding arRNA control
  • random RNA sequence means adding 91nt random sequence RNA control (see Table 8 Ran-91 for the specific sequence)
  • arRNA means adding corresponding matching arRNA according to the above rules. From Figure 17, we can see that when the triplet base is TCA or ACA, the system has higher editing efficiency, and when the triplet base is GCA or CCA, the editing efficiency is almost zero.
  • Figure 19A shows the pairing between the mRNA three-base motifs used in Figure 18 and the arRNA triple complementary bases
  • Figure 19B shows the pairing between the mRNA three-base motifs used in Figure 17 and the arRNA triple complementary bases.
  • the difference between the two is that in the former (Fig. 19A) the arRNA base opposite the upstream residue of target C is always G, so the upstream residue is mismatched with the arRNA in every case except when the upstream residue of target C is C; whereas in the latter (Fig. 19B) the arRNA bases opposite the upstream residue of target C are all strictly complementary bases. Therefore, we speculate that the above contradiction arises from the mismatch between the upstream residue of the three-base motif and the arRNA, or from a change in triplet preference.
  • the controls in the 4 figures are the same samples; "91nt random sequence" is the control with a 91-nt random-sequence RNA added, "Vector only" is the control with only RNAiMAX transfection reagent and no RNA added, "Opti-DMEM medium" is the control with only the same volume of Opti-DMEM and no RNAiMAX transfection reagent added, and "untreated" is the control without transfection. arRNA UAG, arRNA UUG, arRNA UCG, and arRNA UGG have exactly the same sequences as CCA-arRNA UAG, CCA-arRNA UUG, CCA-arRNA UCG, and CCA-arRNA UGG, respectively, but were synthesized in two different batches.
  • the preference when multiple mismatches are introduced is shown: when the three-base motif is ACA, editing efficiency is higher when the triple complementary base in the arRNA is AUU or GUU, and relatively higher with AUU; when the three-base motif is UCA (TCA in the plasmid), editing efficiency is higher when the triple complementary base is AUA, GUA, or CUA, and highest with AUA; when the three-base motif is GCA, editing efficiency is higher when the triple complementary base is UUG or UCG, and highest with UUG; when the three-base motif is CCA, editing efficiency is higher when the triple complementary base is AUG.
  • this example also compares the editing efficiency when mismatches opposite the upstream and/or downstream residues are present with the editing efficiency when only the target residue is mismatched.
  • the results are also shown in Figure 22. When the upstream residue of the three-base motif is A or U, introducing a mismatch directly opposite the upstream and/or downstream residue can reach an editing efficiency equivalent to that obtained with only a single mismatch at the target residue.
  • the three-base motif is ACA
  • UCA the editing efficiency of AUA with only a single mismatch with the target residue
  • the triple-complementary base UUA is the triple-complementary base UUA and the mismatch with the upstream and/or downstream residues directly opposite.
  • when the three-base motif is GCA
  • the efficiency of the triple complementary base UUC, which has only a single mismatch with the target residue, is close to 0
  • whereas with UUG and UCG, which introduce a mismatch directly opposite the upstream and/or downstream residue, the editing efficiency can be several times to more than 10 times that of UUC.
  • the three-base motif is CCA
  • the AUG that introduces the mismatch of the upstream residue and/or the directly opposite of the downstream residue also has similar editing efficiency to UCG.
  • the preferred order of upstream residues in the three-base motif for introducing a mismatch is G>C>A≈U; that is, when the upstream residue of the three-base motif is G, introducing a mismatched G opposite the upstream residue can significantly improve the editing efficiency.
  • the editing efficiency can be improved to 6% to 8% GFP+, but for CCA, even after introducing mismatches directly opposite the upstream and/or downstream residues, the highest efficiency did not exceed 2.5% GFP+.
  • this case greatly improves the editing ability of GCA by introducing additional base mismatches.
  • This case breaks the long-standing limitation of editing site selection in RNA editing applications.
  • the present invention enables more genetic diseases caused by gene mutations to be treated more safely and efficiently by means of RNA editing.
  • m indicates that the base to its right carries a 2'-O-methyl (2'-O-Me) modification; * indicates that the two nucleotides before and after it are connected by a phosphorothioate bond; underlining marks the 3 bases that lie directly opposite the three-base motif on the target RNA when the arRNA hybridizes to the target RNA
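
The naming rule in the list above pairs each three-base motif (written 5' to 3') with an antiparallel triple complementary base (also written 5' to 3'), as in the CCA and UUG example. A minimal Python sketch of that rule follows. The function names are hypothetical and introduced only for illustration; the sketch assumes strict Watson-Crick pairing at the two flanking positions and a chosen base (U by default) opposite the target residue C.

```python
# Watson-Crick partners for RNA bases (G-U wobble is not treated as a match here).
WC_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def strict_triplet(motif: str) -> str:
    """Return the fully complementary triplet (5'->3') for a motif given 5'->3'.

    The two strands are antiparallel, so the motif is read in reverse:
    the 5' base of the triplet faces the downstream residue of the motif.
    """
    motif = motif.upper().replace("T", "U")  # plasmid notation (T) -> RNA (U)
    return "".join(WC_PARTNER[base] for base in reversed(motif))


def triplet_with_target_base(motif: str, target_base: str = "U") -> str:
    """Keep the flanking bases complementary and place target_base in the middle."""
    strict = strict_triplet(motif)
    return strict[0] + target_base + strict[2]


if __name__ == "__main__":
    for motif in ("GCA", "ACA", "TCA", "CCA"):
        print(motif, triplet_with_target_base(motif))
```

Under these assumptions the four reporter motifs GCA, ACA, TCA, and CCA map to UUC, UUU, UUA, and UUG, which agrees with the single-mismatch pairs listed above.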


Abstract

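Provided is a method for editing a target RNA at a target residue position in a host cell, the method comprising introducing a deaminase-recruiting RNA (arRNA), or a construct encoding the arRNA, into the host cell, wherein the arRNA comprises a complementary RNA sequence that hybridizes to the target RNA, wherein the target residue is located in a three-base motif comprising the 5' nearest-neighbor residue of the target residue in the target RNA (the upstream residue), the target residue, and the 3' nearest-neighbor residue of the target residue in the target RNA (the downstream residue), wherein the three-base motif is not UAG, and wherein the complementary RNA sequence comprises a mismatch directly opposite the upstream residue or the downstream residue on the target RNA. Also provided are an arRNA for use in the method, an RNA obtained by the method, a host cell comprising the RNA, and use of the method in treating disease.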

Description

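An improved RNA editing method
Technical Field
The present invention belongs to the field of gene editing, specifically the field of RNA editing, and comprises introducing a deaminase-recruiting RNA (dRNA, also called arRNA), or a construct encoding the arRNA, into a host cell to edit a target RNA at a target residue position in the host cell.
Background
CRISPR technology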
近年来,以CRISPR(Clustered regularly interspaced short palindromic repeats,WO2014018423A3)为首的基因组编辑技术正在飞速发展,并对生物以及医学诸多领域产生了深远的影响。许多科研工作者和生物技术公司也在致力于将该技术推上临床。2019年9月,北京大学邓宏魁教授与其合作者发表文章,首次报道了利用CRISPR技术编辑干细胞并将其回输至患者,以治疗其艾滋病和白血病的临床实验结果,为CRISPR技术在基因治疗方向的转化做出了巨大的贡献。
尽管CRISPR技术存在极大的潜在应用前景,但该技术也存在一系列缺陷,导致该技术从科研阶段向临床治疗应用中的转化步履维艰。问题之一便是CRISPR技术用到的核心作用酶:Cas9。基于CRISPR的DNA编辑技术,必须引入外源表达Cas9或拥有相似功能的其它核酸酶,从而造成了以下几个问题。首先,需要外源表达的核酸酶通常具有较大分子量,这使得通过病毒载体将其递送至体内的效率急剧下降。其次,Cas9的表达在多人的研究中被证实存在潜在致癌风险。p53是研究最多的抑癌基因,Haapaniemi等人研究发现,Cas9系统能够激活p53引起的DNA损伤(Haapaniemi et al.,2018),而Enache等人也发现Cas9蛋白的过表达能够选择性富集p53失活突变的细胞(Enache et al.,2020)。此外,Adikusuma发现Cas9编辑后的小鼠受精卵中存在大量大片段DNA缺失(Adikusuma et al.,2018),而Cullot等人更是发现,Cas9进行编辑后基因组上会出现上百万碱基的大片段缺失,更重要的是,这些缺失的片段中包括5个原癌基因和7个抑癌基因(Cullot et al.,2019)。最后,外源表达的Cas9通常是细菌来源的,例如金黄葡萄球菌或酿脓链球菌,而非人类或哺乳动物天然存在的,这使得其可能引起患者体内的免疫反应。Charlesworth等人 研究发现在人血清中存在Cas9的IgG抗体(Charlesworth et al.,2019)。这一方面可能会使外源表达的核酸酶被中和,从而失去应有的活性,另一方面也可能会对患者自身造成损伤甚至毒性或者阻碍进一步的干预治疗。
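A-to-I editing at the RNA level
To avoid the potential risks of DNA editing, scientists have also taken a strong interest in RNA editing. The genetic information stored in DNA must be transcribed into RNA and further translated into protein to perform its normal physiological functions; this is known as the central dogma of biology. Compared with editing at the DNA level, editing at the RNA level avoids damage to the genome while still changing the final biological function. One common type of RNA editing is the editing of adenosine (A) to inosine (I) mediated by ADAR (adenosine deaminases acting on RNA). In 2017, the group of Professor Feng Zhang at the Massachusetts Institute of Technology reported an RNA editing technology called REPAIR (RNA Editing for Programmable A to I Replacement), which likewise achieves targeted A-to-I editing of a target RNA through exogenous expression of a Cas13-ADAR fusion protein and a single guide RNA (sgRNA) (Cox et al., 2017). In this method, Cas13 binds the sgRNA and performs the targeting function, bringing the fusion protein to the site to be edited, while the ADAR deaminase domain catalyzes the A-to-I edit. However, like CRISPR, this method still requires expression of an exogenous protein and therefore cannot solve the problems caused by exogenous protein expression.
To solve the above problems and better apply nucleic acid editing in medicine, there is an urgent need for a new nucleic acid editing technology, in particular one that does not depend on exogenous protein expression. In July 2019, the group of Professor Wei Wensheng at the School of Life Sciences, Peking University, published the article "Programmable RNA editing by recruiting endogenous ADAR using engineered RNAs" in Nature Biotechnology, reporting for the first time a new nucleic acid editing technology: LEAPER (Leveraging Endogenous ADAR for Programmable Editing of RNA; Qu et al., 2019) (WO2020074001A1). Unlike CRISPR (WO2014018423A3) and REPAIR (WO2019005884A1), this technology in principle removes the dependence on overexpression of an exogenous nuclease, which gives it a greater advantage in translation to medicine. However, it can only edit adenosine (A) to inosine (I), that is, adenosine (A) to guanosine (G), because inosine is read as guanosine during protein translation, so its applications remain limited. Like CRISPR, it also requires an RNA as a guide, in order to recruit the endogenous enzyme (ADAR) to the site to be edited. This guide RNA is named arRNA (ADAR-recruiting RNA).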
2019年1月,Thorsten Stafforst课题组也曾报道与LEAPER技术类似的核酸编辑技术,名为RESTORE(Recruiting Endogenous ADAR to Specific Transcripts for Oligonucleotide-mediated RNA Editing,WO2020001793A1)。与LEAPER类似,RESTORE也能够摆脱 对外源蛋白的依赖。但与LEAPER不同的是,首先,RESTORE技术需要在IFN-γ存在的前提下才能有较高的编辑效率,而IFN-γ是决定自体免疫发展和严重程度的关键因子(Pollard et al.,2013),这使得该技术在医学领域的应用大打折扣。另一方面,RESTORE技术中同样也用到一段向导RNA,而其使用的向导RNA必须为化学合成的寡核苷酸,并且所述合成的寡核苷酸需要人为引入大量的化学修饰以保证其稳定性。在这些化学修饰中,有一部分修饰可能会存在潜在的毒性或是免疫原性,也有一部分修饰会导致相同碱基链的不同构象,使得相同序列的RNA可能有数十种不同的构象组合。相比之下,LEAPER技术不仅可以通过化学合成RNA完成,也可以通过腺相关病毒(AAV)、慢病毒等载体递送至患者细胞内发挥功能,这使得其在递送手段的选择上更加灵活多变。
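Upstream and downstream residues or sequences of the A-to-I editing site
In DNA editing, an edited site is passed to all progeny cells through replication, so even if DNA-level editing is relatively inefficient, edited cells can be enriched, for example by screening progeny cells. In contrast, edits made during RNA editing are not inherited. On the one hand, off-target edits in RNA editing therefore cannot be passed to progeny, which makes RNA-level editing safer than DNA editing; on the other hand, this makes the efficiency of RNA editing all the more important. In A-to-I RNA editing, the REPAIR (WO2019005884A1), RESTORE (WO2020001793A1), and LEAPER (WO2020074001A1) systems all use ADAR as the key enzyme catalyzing the reaction. Mammalian cells have three types of ADAR protein: ADAR1 (two isoforms, p110 and p150), ADAR2, and ADAR3 (catalytically inactive). The catalytic substrate of ADAR proteins is double-stranded RNA. ADAR removes the -NH2 group from the adenosine (A) nucleobase, converting A to inosine (I), which is recognized as guanosine (G) and pairs with cytidine (C) in subsequent cellular processes such as reverse transcription and translation, or during intracellular replication of viral RNA. Because of the specific properties of ADAR, some of the same factors affect the RNA editing efficiency of the REPAIR, RESTORE, and LEAPER systems. One of these is the upstream and downstream residues and sequence around the editing site. For the edited adenosine (target A), i.e., the target residue here, the identity of the adjacent 5' upstream and 3' downstream bases in the mRNA has a marked effect on editing efficiency. For convenience, we call the motif formed by joining, in 5' to 3' order, the 5' upstream base adjacent to the target residue (the upstream residue), the target residue, and the 3' downstream base adjacent to the target residue (the downstream residue) the "three-base motif". Because the upstream and downstream residues of target A can each be A, U, C, or G, there are 16 possible three-base motifs: AAA, AAU, AAC, AAG, UAA, UAU, UAC, UAG, CAA, CAU, CAC, CAG, GAA, GAU, GAC, and GAG. The REPAIR, RESTORE, and LEAPER systems each edit different three-base motifs with different efficiencies; this dependence of editing efficiency on the three-base motif is referred to herein as "triplet preference".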
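
The count of 16 three-base motifs follows directly from letting the upstream and downstream residues each range over four bases. As a minimal illustration (the function name is hypothetical and not part of the original text), the enumeration can be sketched in Python:

```python
from itertools import product

BASES = "AUCG"  # order chosen to reproduce the listing order used in the text


def three_base_motifs(target: str = "A") -> list[str]:
    """List every upstream-target-downstream motif (5'->3') for a fixed target base."""
    return [upstream + target + downstream
            for upstream, downstream in product(BASES, repeat=2)]


if __name__ == "__main__":
    motifs = three_base_motifs("A")
    print(len(motifs), motifs)
```

Calling `three_base_motifs("C")` gives the corresponding 16 motifs for a cytidine target, such as GCA or UCG.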
在REPAIR体系中,由于采用了Cas13-ADAR的融合蛋白,该体系的三连偏好性略有不同。如图1所示(Cox et al,2017),REPAIR体系对三碱基基序GAC的编辑效率最低,而对UAU编辑效率最高,相差约2~3倍。
在RESTORE体系中,作者没有直接展示对三碱基基序偏好性的数据,但是文章中曾引用另一篇文章(Vogel et al.,2018),并说明可能与该体系偏好一致(Merkle et al.,2019)。如图2所示(Vogel et al.,2018),图中三角形为SA1Q,具体实施方法为将人ADAR1催化结构域与人O 6-烷基鸟嘌呤DNA烷基转移酶(O 6-alkylguanine-DNA-alkyl transferase,hAGT)C端结构域(SNAP-tag)进行融合,并在835位氨基酸进行谷氨酸到谷氨酰胺的突变(E-Q),再通过SNAP-tag与向导RNA进行共价交联(Keppler,A.et al.,2003;Stafforst,T.,et al.,2012);图中方块形为SA2Q具体实施方法为将人ADAR2催化结构域与人O 6-烷基鸟嘌呤DNA烷基转移酶(O 6-alkylguanine-DNA-alkyl transferase,hAGT)C端结构域(SNAP-tag)进行融合,并在1310位氨基酸进行谷氨酸到谷氨酰胺的突变(E-Q),再通过SNAP-tag与向导RNA进行共价交联(Keppler,A.et al.,2003;Stafforst,T.,et al.,2012)。可以看到在两种不同的ADAR中,其三连偏好性具有趋势类似的明显差异。在5'上游残基是G的时候,即GAA、GAU、GAC、GAG中其编辑效率通常远低于其它三碱基基序,甚至接近未编辑时的水平,而UAG是其中编辑效率最高的几个三碱基基序之一。如图2所示,该体系对UAG的编辑效率最多可达到其对上游残基为G的三碱基基序的编辑效率的10倍。
在LEAPER体系中,作者直接测试了该体系的三连偏好性(Qu et al.,2019)。如图3所示,在LEAPER体系中,由于该体系与RESTORE体系相同,均采用完整的未经修饰和修改的ADAR,所以并不难理解,其呈现出与RESTORE体系类似的三连偏好性。从图3中我们可以看出,LEAPER体系中编辑效率最低的也是GAA、GAU、GAC、GAG的三碱基基序,并且接近于零,而效率最高的三碱基基序是UAG,LEAPER体系对UAG的编辑效率同样最多可达到其对上游残基为G的三碱基基序的编辑效率的10倍以上。综上所述,在REPAIR体系中,由于采用了外源过表达的Cas13-ADAR,因此其三连偏好性略有不同。而在LEAPER体系与RESTORE体系中,由于均采用未经修饰和更改的ADAR,其三连偏好性呈现出近似的模式。在所有三碱基基序中,利用这类以未经修饰和更改的ADAR进行编辑时,其针对UAG的编辑效率最高或者是最高的几个三碱基基序之一。而当5’上游残基为G的时候,其编辑效率明显降低,甚至接近于零,两者之间相差10倍以上。这说明在现有技术中,利用内源ADAR进行RNA编辑的体系,几乎无法对三碱基基序中5’上游 残基为G的位点进行编辑。
Summary of the Invention
在现有技术中,利用脱氨基酶进行RNA编辑的技术体系的三连偏好性限制了现有RNA编辑技术的应用范围。例如,现有的RNA编辑技术对于三碱基基序中上游残基为G的位点近乎束手无策,而这使得该体系在疾病治疗的应用之中大打折扣。当我们面对遗传病时,如果其致病基因突变的上游残基刚好是G,我们便很难使用已知的RNA编辑手段进行纠正和治疗。而本发明所要解决的问题,正是针对现有技术中所偏好的三碱基基序以外的其它三碱基基序,例如UAG以外的其它三碱基基序,通过对用于招募脱氨基酶到达靶标RNA进行精准编辑的招募RNA(deaminase recruiting RNA,dRNA或arRNA)序列进行调整,从而在无需对现有脱氨基酶做任何修饰或更改的情况下突破三连偏好性的限制,使三碱基基序中上游残基为G或者C时编辑效率大大提高。
因此,一方面,本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基或下游残基直接相对的错配。
在一些实施方案中,本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基和下游残基直接相对的错配。
在某些实施方案中,所述三碱基基序的上游残基为G。在某些实施方案中,所述三碱基基序的上游残基为A。在某些实施方案中,所述三碱基基序的上游残基为C。在某些实施方案中,所述三碱基基序的下游残基为C。在某些实施方案中,所述三碱基基序的下游残基为U。在某些实施方案中,所述三碱基基序的下游残基为A。在某些实施方案中,所述三碱基基序选自GAG,GAC,GAA,GAU,AAG,AAC,AAA,AAU,CAG,CAC, CAA,CAU,UAA,UAC和UAU。
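According to the method provided in the present application, in some embodiments, when the upstream residue of the three-base motif is G, the base in the complementary RNA opposite the upstream residue is G. In some embodiments, when the upstream residue of the three-base motif is G, the base in the complementary RNA opposite the upstream residue is A. In some embodiments, the three-base motif is GAU, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are ACG or ACA. In some embodiments, the three-base motif is GAU, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are ACG. In some embodiments, the three-base motif is GAA, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are UCA, CCG, CCC, or UCC. In certain embodiments, the three-base motif is GAA, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are UCA. In some embodiments, the three-base motif is GAC, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are GCG or GCA. In certain embodiments, the three-base motif is GAC, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are GCG. In some embodiments, the three-base motif is GAG, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are CCG, CCA, CCC, UCC, or UCG. In certain embodiments, the three-base motif is GAG, and the triple complementary bases in the complementary RNA sequence directly opposite the three-base motif are CCG.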
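
To see which positions are mismatched in each of the triplets listed above, the motif and the triple complementary bases can be compared position by position. The following Python sketch is illustrative only: the dictionary simply restates the motif and triplet pairs from the paragraph above, the function name is hypothetical, and anything other than a Watson-Crick pair (including G-U wobble) is counted as a mismatch.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Motif (5'->3') -> triple complementary bases (5'->3') named in the text above.
LISTED_TRIPLETS = {
    "GAU": ["ACG", "ACA"],
    "GAA": ["UCA", "CCG", "CCC", "UCC"],
    "GAC": ["GCG", "GCA"],
    "GAG": ["CCG", "CCA", "CCC", "UCC", "UCG"],
}


def pairing(motif: str, triplet: str) -> dict[str, str]:
    """Classify each motif position as 'match' or 'mismatch'.

    Strands are antiparallel: the 3' base of the triplet faces the upstream
    residue, and the 5' base of the triplet faces the downstream residue.
    """
    opposite = {
        "upstream": (motif[0], triplet[2]),
        "target": (motif[1], triplet[1]),
        "downstream": (motif[2], triplet[0]),
    }
    return {position: "match" if pair in WATSON_CRICK else "mismatch"
            for position, pair in opposite.items()}


if __name__ == "__main__":
    for motif, triplets in LISTED_TRIPLETS.items():
        for triplet in triplets:
            print(motif, triplet, pairing(motif, triplet))
```

For example, GAU with ACG is reported as mismatched at the upstream residue (G opposite G) and at the target (A opposite C), with the downstream residue matched.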
在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的胞苷(C),腺苷(A)或尿苷(U)。在一些特定的实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C。
根据本发明所述的方法,在一些实施方案中,所述互补RNA序列与靶标RNA杂交时还包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。在某些实施方案中,与一个或多个非靶标腺苷相对的错配核苷为鸟苷。
在一些实施方案中,所述三碱基基序上游残基为G,且其中所述互补RNA中与所述上游残基相对的碱基为G或A。在一些实施方案中,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。在一些实施方案中,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G或A,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G或A,所述三碱基基序的下游残基与所述互补RN A中相对的碱基严格互补。在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。
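In the above RNA editing methods of the present application, the RNA editing efficiency is improved by at least 90% to 1100% relative to the prior art, for example by at least 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, or 1000%.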
在一些实施方案中,所述靶RNA中的靶标腺苷(A)通过腺苷脱氨酶(Adenosine Deaminase Acting on RNA,ADAR)脱氨基。在某些实施方案中,所述腺苷脱氨酶为天然ADAR或其同源蛋白。在某些实施方案中,所述腺苷脱氨酶为经过修饰但保留了腺苷脱氨酶活性的腺苷脱氨酶功能变体,例如在天然ADAR或其同源蛋白基础上经一个或多个位点突变修饰但仍然具有腺苷脱氨酶活性的变体。在某些实施方案中,所述腺苷脱氨酶为包含ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为包含经突变丧失催化活性的Cas13蛋白与ADAR功能结构域或ADAR同源蛋白功能结构域或腺苷脱氨酶功能变体的融合蛋白。在一些实施方案中,所述具有胞苷脱氨酶活性的脱氨酶通过外源引入所述宿主细胞中或通过导入该脱氨酶的构建体在宿主细胞中表达。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为包含λN肽与ADAR功能结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为SNAP-tag标记的ADAR或SNAP-tag标记的ADAR功能变体。在某些实施方案中,所述ADAR是ADAR1和/或ADAR2。在一些实施方案中,ADAR是选自由hADAR1,hADAR2,小鼠ADAR1和小鼠ADAR2构成的组的一种或多种ADAR。
在某些实施方案中,所述ADAR由所述宿主细胞表达。在某些实施方案中,ADAR天然地或内源地存在于宿主细胞中,例如,天然地或内源地存在于真核细胞中。在某些实施方案中,所述ADAR蛋白经外源引入所述宿主细胞中。在某些实施方案中,将所述ADAR或编码所述ADAR的构建体引入宿主细胞。在一些实施方案中,所述构建体选自包括但不限于以下的任一项:线性核酸、质粒、病毒等。在上述方法中,所述ADAR包括上述天然ADAR、其同源蛋白、经过修饰但保留了腺苷脱氨酶活性的腺苷脱氨酶功能变体(例如在天然ADAR或其同源蛋白基础上经一个或多个位点突变修饰但仍然具有腺苷脱氨酶活性的变体)或包含ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功 能变体的融合蛋白。在一些实施方案中,所述方法不包含将任何蛋白质引入宿主细胞中。在某些实施方案中,所述ADAR是ADAR1和/或ADAR2。在一些实施方案中,ADAR是选自由hADAR1,hADAR2,小鼠ADAR1和小鼠ADAR2构成的组的一种或多种ADAR。
本申请另一方面提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其中所述靶标残基是胞苷,所述arRNA招募作用于RNA的具有胞苷脱氨酶活性的脱氨酶(或称为“胞苷脱氨酶”,在本申请中,具有胞苷脱氨酶活性的脱氨酶和胞苷脱氨酶表示相同含义,可互换使用),以使所述靶标RNA中的靶标胞苷脱氨。在一些实施方案中,将具有胞苷脱氨酶活性的脱氨酶或包含编码具有胞苷脱氨酶活性的脱氨酶的的构建体引入宿主细胞。在所述方法中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中靶标残基为胞苷(C),其中所述互补RNA序列包含与所述靶标RNA上的上游残基和/或下游残基直接相对的错配。
在一些实施方案中,所述靶标胞苷所在的三碱基基序选自以下任一项:GCG,GCC,GCA,GCU,ACG,ACC,ACA,ACU,CCG,CCC,CCA,CCU,UCA,UCC,UCU和UCG。在一些实施方案中,所述arRNA在对应于靶标RNA的靶标残基的位置包含非配对核苷酸,以形成和靶标残基的错配。在一些实施方案中,所述arRNA中可与所述靶标RNA杂交的互补RNA序列包含与所述靶RNA中的所述靶标胞苷直接相对的胞苷,腺苷或尿苷。在某些实施方案中,所述互补RNA序列包含与所述靶标胞苷直接相对的尿苷。在某些实施方案中,所述arRNA在对应于靶RNA的非靶标编辑位点包含一个或多个非配对核苷酸,以形成与靶标RNA的非靶标位点的一个或多个错配。
在一些实施方案中,所述三碱基基序上游残基为G,且其中所述互补RNA中与所述上游残基相对的碱基为G。在一些实施方案中,所述三碱基基序下游残基为A,且其中所述互补RNA中与所述下游残基相对的碱基为U或A。在一些实施方案中,所述三碱基基序为ACA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUU或GUU。在一些实施方案中,所述三碱基基序为ACA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUU。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUA、GUA或CUA。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUA。在一些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的U UG或UCG。在一些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的UUG。在一些实施方案中,所述三碱基基序为CCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUG。
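In some embodiments, the deaminase with cytidine deaminase activity is a deaminase that has acquired C-to-U catalytic activity through genetic modification of an ADAR protein or of a fusion protein comprising an ADAR catalytic domain. In certain embodiments, the cytidine deaminase is a modified ADAR2 and comprises an ADAR2 catalytic domain with one or more mutations selected from the following: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T. In certain embodiments, the cytidine deaminase is a fusion protein comprising an ADAR2 catalytic domain with all of the following mutations: E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T. In some embodiments, the deaminase with cytidine deaminase activity further comprises a targeting domain. In certain embodiments, the targeting domain includes, but is not limited to, any one selected from: a Cas13 protein that has lost catalytic activity through mutation, a λN peptide, and a SNAP-tag. In some embodiments, the fusion protein comprises a Cas13 protein that has lost catalytic activity through mutation and an ADAR2 catalytic domain with cytidine deaminase activity. In some embodiments, the deaminase with cytidine deaminase activity is introduced exogenously into the host cell or is expressed in the host cell by introducing a construct encoding the deaminase.
In certain embodiments, the method comprises introducing the cytidine deaminase or fusion protein, or a construct encoding the cytidine deaminase or fusion protein, into a cell containing the target RNA, wherein the construct encoding the cytidine deaminase or fusion protein is selected from, but not limited to, any one of: linear nucleic acids, plasmids, and viruses. In certain embodiments, the target residue in the three-base motif in the target RNA is cytidine, and the upstream residue of the three-base motif is a nucleotide selected from G, C, A, and U, in the preferred order G>C>A≈U.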
根据本申请的上述方法,所述arRNA为单链RNA。在一些实施方案中,所述互补RNA序列完全是单链。在某些实施方案中,所述arRNA包含一个或多个(例如,1、2、3或更多个)双链区和/或一个或多个茎环区。在某些实施方案中,所述arRNA仅由所述互补RNA序列组成。
根据本发明所述的方法,在一些实施方案中,所述arRNA的长度为约20-260个核苷酸,例如所述arRNA的长度为40-260、45-250、50-240、60-230、65-220、70-220、70-210、70-200、70-190、70-180、70-170、70-160、70-150、70-140、70-130、70-120、70-110、70-100、70-90、70-80、75-200、80-190、85-180、90-170、95-160、100-200、100-150、100-175、110-200、110-175、110-150或105-140个核苷酸中的任一项。在一些实施方案中, 所述arRNA长约60-200个核苷酸(例如约60-150、65-140、68-130或70-120中的任何一个)。在一些实施方案中,所述arRNA还包含ADAR招募结构域。
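According to the method of the present invention, in some embodiments, the arRNA comprises one or more chemical modifications. In some embodiments, the chemical modifications comprise methylation and/or phosphorothioation, for example 2'-O-methylation (2'-O-Me) and/or internucleotide phosphorothioate bonds. In certain embodiments, the first and last 3 or 5 nucleotides of the arRNA comprise 2'-O-Me modifications, and/or the linkages between its first and last 3, 4, or 5 nucleotides comprise phosphorothioate bond modifications. In certain embodiments, one or more or all of the uridines in the arRNA comprise 2'-O-Me modifications. In certain embodiments, the targeting nucleoside in the arRNA and/or the nucleosides adjacent to the 5' end and/or 3' end of the targeting nucleoside (for example, the one or two nucleosides directly adjacent at the 5' end and/or 3' end) comprise 2'-O-Me modifications. In certain embodiments, the targeting nucleoside in the arRNA and/or the nucleosides adjacent to the 5' end and/or 3' end of the targeting nucleoside (for example, the one or two nucleosides directly adjacent at the 5' end and/or 3' end) comprise 3'-phosphorothioate bond modifications. In certain embodiments, the arRNA does not comprise any chemical modification.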
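
One of the modification patterns described above (2'-O-Me on the first and last few nucleotides, phosphorothioate bonds on the first and last few internucleotide linkages) can be written out in the notation this document uses for its sequence tables, where "m" precedes a 2'-O-Me base and "*" marks a phosphorothioate bond. The sketch below is an assumption-laden illustration: the function name and defaults are hypothetical, and it is not claimed to reproduce how the sequences in the tables were actually annotated.

```python
def annotate_end_modifications(seq: str, n_ome: int = 3, n_ps: int = 3) -> str:
    """Render seq with 'm' before 2'-O-Me bases and '*' at phosphorothioate bonds.

    n_ome: number of nucleotides at each end carrying 2'-O-Me.
    n_ps:  number of internucleotide linkages at each end that are phosphorothioate.
    """
    n = len(seq)
    if n < 2 * max(n_ome, n_ps) + 1:
        raise ValueError("sequence too short for the requested end modifications")
    parts = []
    for i, base in enumerate(seq):
        modified = i < n_ome or i >= n - n_ome
        parts.append(("m" if modified else "") + base)
        # Linkage i joins nucleotide i and nucleotide i + 1; there are n - 1 linkages.
        if i < n - 1 and (i < n_ps or i >= n - 1 - n_ps):
            parts.append("*")
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_end_modifications("ACGUACGUAC"))
```

Setting `n_ome=5` or `n_ps=4` gives the other end-modification variants mentioned in the paragraph above.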
本发明还提供所述经本发明提供的编辑靶标RNA的方法所产生的经编辑的RNA或包含经编辑的RNA的宿主细胞。
本发明所提供的在宿主细胞的靶标残基位置编辑靶标RNA的方法可用于个体中治疗或预防疾病或病症中。因此本发明还提供一种用于在个体中治疗或预防疾病或病症的方法,包括使用前述本发明所提供的任意一种在宿主细胞的靶标残基位置编辑靶标RNA的方法编辑个体细胞中与所述疾病或病症相关的靶标RNA。在一些实施方案中,所述疾病或病症是遗传性基因疾病或与一种或多种获得性基因突变(例如,药物抗性)相关的疾病或病症。
本发明还提供了可以在本发明所提供的方法中使用的一种通过招募作用于RNA的脱氨酶对靶标RNA中的靶标残基脱氨基的RNA(arRNA),其包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中的靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中的靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与靶标RNA的上游残基和/或下游残基直接相对的错配。
根据本发明所提供的arRNA,所述arRNA包含与所述靶标RNA中的所述靶标腺苷直接相对的C。在某些实施方案中,所述arRNA与靶标RNA杂交时还包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。在某些实施方案中,与一个或多个非靶标腺苷相对的错配核苷为鸟苷。在某些实施方案中,所述三碱基基序是GAU,并且 其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为ACG或ACA。在某些实施方案中,所述三碱基基序是GAU,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为ACG。在某些实施方案中,所述三碱基基序是GAA,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为UCA、CCG、CCC或UCC。在某些实施方案中,所述三碱基基序是GAA,所述arRNA中与所述三碱基基序直接相对的三连互补碱基为UCA。在某些实施方案中,所述三碱基基序是GAC,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为GCG或GCA。在某些实施方案中,所述三碱基基序是GAC,所述arRNA与所述三碱基基序直接相对的三连互补碱基为GCG。在某些实施方案中,所述三碱基基序是GAG,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为CCG、CCA、CCC、UCC或UCG。在某些实施方案中,所述三碱基基序是GAG,所述arRNA中与所述三碱基基序直接相对的三连互补碱基为CCG。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA的长度为约20-260个核苷酸,例如40-260、45-250、50-240、60-230、65-220、70-220、70-210、70-200、70-190、70-180、70-170、70-160、70-150、70-140、70-130、70-120、70-110、70-100、70-90、70-80、75-200、80-190、85-180、90-170、95-160、100-200、100-150、100-175、110-200、110-175、110-150或105-140个核苷酸中的任一项。在一些实施方案中,所述arRNA长约60-200个核苷酸(例如约60-150、65-140、68-130或70-120中的任何一个)。在一些实施方案中,所述arRNA还包含ADAR招募结构域。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA包含一种或多种化学修饰。在一些实施方案中所述化学修饰包含甲基化和/或硫代磷酸化,例如2′-O-甲基化(2′-O-Me)和/或核苷酸间硫代磷酸酯键。在某些实施方案中,所述arRNA的首尾的3个或5个核苷酸包含2'-O-Me修饰,和/或其首尾的3个、4个或5个核苷酸间的连接含有硫代磷酸酯键修饰。在某些实施方案中,所述arRNA中的一个或多个或所有尿苷包含2′-O-Me修饰。在某些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含2'-O-Me修饰。在某些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含3'-硫代磷酸酯键修饰。在某些实施方案中,所述arRNA不包含任何化学修饰。
本发明还提供一种病毒载体、质粒或线性核酸链,其包含本发明所提供的上述任一 种arRNA,且所述arRNA不包含任何化学修饰。本发明还提供一种文库,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。本发明还提供一种组合物,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。本发明还提供一种宿主细胞,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。在一些实施方案中,所述包含本发明所提供的上述任一种arRNA的宿主细胞是一种真核细胞。
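Brief Description of the Drawings
Figure 1: Triplet preference of the REPAIR system (Cox et al., 2017).
Figure 2: Triplet preference of the SNAP-ADAR system (Vogel et al., 2018).
Figure 3: Triplet preference of the LEAPER system (Qu et al., 2019).
Figure 4: Basic workflow of the LEAPER system and the improvement made in this case.
Figure 5: Construction of reporter systems for the 16 three-base motifs.
Figure 6: Design of the 16 triple complementary bases, i.e., those corresponding to the three-base motifs under the prior-art LEAPER arRNA design principle.
Figure 7: First test of the UAG three-base motif reporter system.
Figure 8: Repeat experiment with the UAG three-base motif reporter system.
Figure 9: Test of the UAG three-base motif reporter system as reported in the LEAPER literature (Qu et al., 2019).
Figure 10: Measurement of editing efficiency for the UAG three-base motif.
Figures 11A-11C: Measurement of editing efficiency for GAN three-base motifs, including GAU (Figure 11A), GAG (Figure 11B), and GAC (Figure 11C).
Figure 12: The improved arRNA design of this case.
Figures 13A-13D: Improvement in editing efficiency after the arRNA design is modified according to this case, including arRNA design improvements for the three-base motifs GAA (Figure 13A), GAU (Figure 13B), GAG (Figure 13C), and GAC (Figure 13D).
Figure 14: Reporter1 plasmid map and sequence.
Figure 15: Test of the C-to-U editing system, in which the target residue is C, examining how changes in the upstream residue and in the base of the triple complementary bases opposite target C affect editing efficiency. In the figure, "/" means that the corresponding plasmid or arRNA was not added and only the same volume of water was added.
Figure 16: Repeat results for part of the data in Figure 15. In the figure, "/" means that the corresponding plasmid or arRNA was not added and only the same volume of water was added.
Figure 17: Test of the C-to-U editing system. Results when the triple complementary bases of the arRNA contain only the mismatch with target C, the mismatched base opposite target C being U, and fully match the upstream and downstream residues of target C.
Figure 18: Data selected from Figure 15 in which the mRNA three-base motif is N*CA (as shown on the horizontal axis) and the arRNA triple complementary bases are GUU, for comparison with the data in Figure 17.
Figures 19A-19B: Pairing analysis of the three-base motifs and triple complementary bases used in Figures 18 and 17. The three-base motifs and triple complementary bases in Figure 19A were used to obtain the results in Figure 18, and those in Figure 19B were used to obtain the results in Figure 17.
Figure 20: Comparison, using the reporter system, of editing efficiency with multiple mismatches and with a single mismatch; results are shown as %GFP. The base opposite the target residue C of the three-base motif is C, and the base opposite the downstream residue of the target residue is U. In the figure, "mRNA 5' base" denotes the upstream residue of the three-base motif. For all other bases not mentioned, the mRNA and the arRNA form strictly complementary pairs.
Figure 21: Comparison, using the reporter system, of editing efficiency with multiple mismatches and with a single mismatch; the same experiment as in Figure 20, with results shown as mean fluorescence intensity (MFI).
Figures 22A-22D: Editing efficiency of differently designed arRNAs on the ACA (Figure 22A), TCA (Figure 22B), CCA (Figure 22C), and GCA (Figure 22D) three-base motifs.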
Detailed Description of the Invention
本发明提供一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞。所述arRNA包含互补RNA序列,所述互补RNA序列与其靶标RNA杂交,形成双链RNA,招募作用于RNA的脱氨酶以使靶标RNA中的靶标残基脱氨基,经过脱氨基作用后所述残基中的碱基类型发生改变。本申请提供了一种编辑靶标RNA的方法,其中通过对arRNA与所述靶标RNA的设计显著提升了现有技术中使用ADAR的RNA编辑系统对不符合ADAR自然偏好性的UAG以外的其他三碱基基序的编辑效率,打破了RNA编辑应用中长久以来在编辑位点选择上存在的限制。通过本申请的方法,可以大大拓展通过RNA编辑方法治疗疾病的范围和效果,使更多疾病,例如更多因基因突变引起的遗传性疾病有机会通过RNA编辑的方法得到安全有效的治疗。通过使用本申请提供的方法和/或arRNA,未来RNA编辑疗法所能治疗的G->A突变引起的疾病,其突变位点所在的三碱基基序可以有更灵活的选择。例如,当所述突 变位点所在三碱基基序为GAU时,现有技术的编辑效率根本无法达到治疗要求,而通过本申请提供的方法所达到的编辑效率则超过了现有技术的至少10倍。此外,由于经过适当改造后的ADAR蛋白可以进行C->U的RNA碱基编辑,因此本发明的方法还可以提升RNA编辑系统对靶标残基为C的不同三碱基基序的编辑效率。
因此,本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基和/或下游残基直接相对的错配。
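The "target RNA" described herein is the RNA to be edited. In the present application, "base" and "residue" refer to a nucleobase, such as "adenine", "guanine", "cytosine", "thymine", "uracil", and "hypoxanthine". The terms "adenosine", "guanosine", "cytidine", "thymidine", "uridine", and "inosine" refer to a nucleobase linked to the sugar moiety of ribose or deoxyribose. The term "nucleoside" refers to a nucleobase linked to ribose or deoxyribose. The term "nucleotide" refers to the respective nucleobase-ribosyl-phosphate or nucleobase-deoxyribosyl-phosphate. Sometimes the terms adenosine and adenine (abbreviated "A"), guanosine and guanine (abbreviated "G"), cytosine and cytidine (abbreviated "C"), uracil and uridine (abbreviated "U"), thymine and thymidine (abbreviated "T"), and inosine and hypoxanthine (abbreviated "I") are used interchangeably to refer to the corresponding nucleobase, nucleoside, or nucleotide. Within a nucleic acid strand, the 3' hydroxyl of one nucleotide and the 5' phosphate of the next nucleotide form a 3',5'-phosphodiester bond; a nucleotide incorporated in this way, having lost one hydroxyl (-OH) at its 3' position, is referred to herein as a "nucleotide residue" or "residue". Sometimes the terms nucleobase, base, nucleoside, nucleotide, nucleotide residue, and residue are used interchangeably, unless the context clearly requires otherwise.
As used herein, "complementarity" of nucleic acids refers to the ability of one nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base pairing. Percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairs) with another nucleic acid molecule (for example, about 5, 6, 7, 8, 9, or 10 out of 10 being about 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). "Fully complementary" means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of any of at least about 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions. For a single base or single nucleotide, under the Watson-Crick base-pairing rules, A paired with T or U, or C paired with G or I, is called complementary or matched, and vice versa; all other base pairings are called non-complementary or mismatched.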
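
The definition of percent complementarity above can be made concrete with a small calculation. The following Python sketch is a simplified illustration under stated assumptions: the function name is hypothetical, both strands are given 5' to 3' with equal length and aligned antiparallel without gaps, and only the single-base pairs named in the definition (A with T or U, C with G or I) count as complementary.

```python
# Pairs treated as complementary, following the single-base rule stated in the text.
COMPLEMENTARY_PAIRS = {
    ("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
    ("C", "G"), ("G", "C"), ("C", "I"), ("I", "C"),
}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percentage of positions in strand_a that pair with strand_b.

    Both strands are written 5'->3'; strand_b is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]
    paired = sum((x, y) in COMPLEMENTARY_PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


if __name__ == "__main__":
    print(percent_complementarity("AAAAACCCCC", "GGGGGAAAAA"))
```

In this toy input, 5 of the 10 positions pair, which corresponds to the "5 out of 10 is about 50%" case in the definition.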
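"Hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. A sequence capable of hybridizing to a given sequence is called the "complementary sequence" of the given sequence.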
术语“RNA编辑”(RNA editing)是在RNA上发生的碱基插入、缺失或替换等现象。许多用于RNA编辑的系统中常常会使用的一种酶是作用于RNA的腺苷脱氨酶(Adenosine deaminases acting on RNA,ADAR)、其变体或包含其功能结构域的复合物。ADAR蛋白家族能结合到特定RNA的双链区域,它可以从腺苷(A)核苷碱基中去除-NH2基团,将A变为肌苷(I),后者在翻译过程中被识别为鸟苷(G)并且在随后的细胞翻译过程中与胞苷(C)配对。A->I(Adenosine-to-inosine)的RNA编辑是动物中最普遍的RNA编辑类型,广泛的参与转录水平和转录后水平的多种基因调控机制,比如在转录组水平改变氨基酸序列,调控mRNA剪切、mRNA稳定性和环状RNA形成等等(Nishkura K.2010)。在哺乳动物细胞中,有三种类型的ADAR蛋白,ADAR1(两个同种型,p110和p150),ADAR2和ADAR3(无催化活性)。研究人员将λN肽与人类ADAR1或ADAR2脱氨酶结构域相融合来构建λN-ADARDD系统,该系统可由BoxB茎环和反义RNA组成的融合RNA来引导,结合特定的RNA靶标。该方法可以在靶标A碱基处将靶标A编辑为I(引入A-C错配),从而导致从A至G的RNA碱基编辑。用于RNA编辑的其他方法包括将反义RNA融合至R/G基序(ADAR-招募RNA支架)以通过在哺乳动物细胞中过表达ADAR1或ADAR2蛋白来编辑靶标RNA,以及利用dCas13-ADAR精确打靶和编辑RNA。RNA层面的编辑,一方面避开了基因组损伤的同时,另一方面又能在最终生物功能上做出改变。
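The terms "deaminase-recruiting RNA", "dRNA", "arRNA", and "ADAR-recruiting RNA" are used interchangeably herein and refer to an RNA capable of recruiting ADAR, an ADAR variant, or certain complexes comprising its domains, to deaminate a target adenosine or a target cytidine in an RNA. In the context of this application, "target RNA" refers to an RNA sequence to which the deaminase-recruiting RNA sequence is designed to have full or substantial complementarity; the target RNA contains the target residue. "Target residue" herein refers to a nucleotide residue that is modified by RNA editing, for example by introducing an ADAR enzyme and an arRNA. Hybridization between the target sequence and the arRNA forms a double-stranded RNA (dsRNA) region containing the target residue, which recruits an adenosine deaminase (ADAR) or a variant thereof that acts on the target residue, and the enzyme or its variant deaminates the target residue.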
“三碱基基序”表示包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基)的三个连续碱基序列。在本申请“三 碱基基序”的语境中,“靶标残基”位于“编辑位点”,因此如无特别说明,可以互换使用。三碱基基序中上游残基和下游残基往往决定了针对靶标残基的RNA编辑是否可以以较高的效率进行编辑。例如,针对不同的三碱基基序,在REPAIR(WO2019005884A1)、RESTORE(WO2020001793A1)、LEAPER(WO2020074001A1)等RNA编辑体系中均有不同的编辑效率,这种针对不同三碱基基序编辑效率不同的情况在本文中称为“三连偏好性”。
互补RNA序列中与所述靶标RNA中三碱基基序直接相对的三个碱基,即与所述靶标残基直接相对的碱基(本申请中称为“靶向碱基”),以及所述碱基的5'最近邻残基和所述碱基的3'最近邻残基所组成的三连基序,在本文中被称为“三连互补碱基”。
在本文中,所有三碱基基序和三连互补碱基均为5'到3'的顺序。
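In the method of the present application, hybridization between the target RNA and the arRNA forms a double-stranded RNA (dsRNA) region containing the target residue, which recruits a deaminase acting on RNA, and this enzyme deaminates the target residue. The method provided by the present invention comprises designing an arRNA and introducing the arRNA, or a construct encoding the arRNA, into a host cell. The double-stranded RNA formed by hybridization of the complementary RNA sequence in the arRNA with the target RNA can recruit a deaminase acting on RNA to deaminate the target residue in the target RNA, and deamination can change the base type of that residue. Through deamination, adenosine (A) can be converted to inosine (I), and I is recognized as guanosine (G), achieving A-to-G editing. Likewise, deamination of cytidine (C) converts it to uridine (U), achieving C-to-U editing.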
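
The arRNA design step described above can be sketched in code. This is a minimal illustration rather than the applicant's actual design procedure: the function names and parameters are hypothetical, the arRNA is taken as the reverse complement of a window of the target RNA centered on the target residue (the examples in this document use a 91-nt arRNA with the targeting base at position 46), and the three bases opposite the three-base motif are then replaced by a chosen triple complementary base.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_arrna(target_rna: str, target_index: int, flank: int, triplet: str) -> str:
    """Build an arRNA (5'->3') of length 2 * flank + 1 centered on the target residue.

    target_index: 0-based index of the target residue in target_rna.
    flank:        number of nucleotides kept on each side of the target residue.
    triplet:      triple complementary bases (5'->3') placed opposite the motif.
    """
    if flank < 1 or len(triplet) != 3:
        raise ValueError("flank must be >= 1 and triplet must have 3 bases")
    start, end = target_index - flank, target_index + flank + 1
    if start < 0 or end > len(target_rna):
        raise ValueError("window extends beyond the target RNA")
    arrna = reverse_complement(target_rna[start:end])
    # Index `flank` of the arRNA faces the target residue; swap in the triplet around it.
    return arrna[:flank - 1] + triplet + arrna[flank + 2:]


if __name__ == "__main__":
    # Toy target RNA containing a GAU motif; the target A is at index 3.
    print(design_arrna("CCGAUGG", target_index=3, flank=3, triplet="ACG"))
```

With `flank=45` the result is 91 nt long and the base opposite the target residue sits at position 46 (1-based), matching the layout used for the synthetic arRNAs in the examples.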
RNA编辑存在三连偏好性,例如图2及图3所示。对上游残基为鸟苷(G)的三碱基基序更低的三连偏好性是目前以ADAR为基础的RNA编辑方法的共性。同样的,在C至U的编辑中,公开文献也显示了明显的三连偏好性。正是由于这种三连偏好性的限制,为满足实际应用的需要,获得较高的编辑效率,现有技术中的各种基于脱氨酶的RNA编辑系统必须尽量选取一些三连偏好性更高的三碱基基序进行编辑。这就束缚了RNA编辑的应用范围。本申请提供了改进的、在宿主细胞的靶标残基位置编辑靶标RNA的方法,包括在arRNA中与三碱基基序直接相对的碱基处引入更多错配,显著提升现有技术中使用ADAR的RNA编辑系统对不符合脱氨酶三连偏好性的三碱基基序中靶标碱基的编辑效率,打破了RNA编辑应用中长久以来在编辑位点选择上存在的限制。
因此,一方面,本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含互补RNA序列,所述互补RNA序列与靶标RNA杂交,形成双链RNA,招募作用于RNA的脱氨酶以使靶标RNA中的靶标残基脱氨基。所述靶标残基位于靶标RNA中的 一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基)。从5'到3',由所述上游残基、靶标残基及下游残基依次连接而成的三连体被称为“三碱基基序”。在本申请中,所有三碱基基序都是以5'到3'的方式描述的。而与所述靶标RNA中三碱基基序相对的互补RNA序列中的三碱基也为5'到3'的顺序。
本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基或下游残基直接相对的错配。
在一些实施方案中,本申请提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基和下游残基直接相对的错配。
在某些实施方案中,所述三碱基基序的上游残基为G。在某些实施方案中,所述三碱基基序的上游残基为A。在某些实施方案中,所述三碱基基序的上游残基为C。在某些实施方案中,所述三碱基基序的下游残基为C。在某些实施方案中,所述三碱基基序的下游残基为U。在某些实施方案中,所述三碱基基序的下游残基为A。在某些实施方案中,所述三碱基基序选自GAG,GAC,GAA,GAU,AAG,AAC,AAA,AAU,CAG,CAC,CAA,CAU,UAA,UAC和UAU。在某些实施方案中,所述三碱基基序为GAU。在某些实施方案中,所述三碱基基序为GAG。在某些实施方案中,所述三碱基基序为GAA。在某些实施方案中,所述三碱基基序为GAC。在一些实施方案中,所述靶标RNA中的上游残基选自G,C,A和U的核苷酸,优选次序为G>C≈A>U。在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的胞苷(C),腺苷(A)或尿苷(U)。在一些特定的实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C。
根据本发明所述的方法,在一些实施方案中,所述互补RNA序列与靶标RNA杂交时还包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。在某些实施方案中,与一个或多个非靶标腺苷相对的错配核苷为鸟苷。在一些实施方案中,所述三碱基基序是GAU,并且其中所述互补RNA序列包含与所述三碱基基序直接相对的三连互补碱基为ACG或ACA。在一些实施方案中,所述三碱基基序是GAU,并且其中所述互补RNA序列包含与所述三碱基基序直接相对的三连互补碱基为ACG。在一些实施方案中,所述三碱基基序是GAA,并且其中所述互补RNA序列包含与所述三碱基基序直接相对的三连互补碱基为UCA、CCG、CCC或UCC。在某些实施方案中,所述三碱基基序是GAA,所述互补RNA序列中与所述三碱基基序直接相对的三连互补碱基为UCA。在一些实施方案中,所述三碱基基序是GAC,并且其中所述互补RNA序列包含与所述三碱基基序直接相对的三连互补碱基为GCG或GCA。在某些实施方案中,所述三碱基基序是GAC,所述互补RNA序列与所述三碱基基序直接相对的三连互补碱基为GCG。在一些实施方案中,所述三碱基基序是GAG,并且其中所述互补RNA序列包含与所述三碱基基序直接相对的三连互补碱基为CCG、CCA、CCC、UCC或UCG。在某些实施方案中,所述三碱基基序是GAG,所述互补RNA序列中与所述三碱基基序直接相对的三连互补碱基为CCG。在一些实施方案中,所述三碱基基序上游残基为G,且其中所述互补RNA中与所述上游残基相对的碱基为G或A。在一些实施方案中,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。在一些实施方案中,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G或A,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G或A,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。在一些实施方案中,所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的C,所述三碱基基序上游残基为G,其中所述互补RNA中与所述上游残基相对的碱基为G,所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。通过本申请的方法,RNA的编辑效率相对于现有技术提高至少90%至1100%,例如提高至少100%、200%、300%、400%、500%、600%、700%、800%、900%、1000%等。
在一些实施方案中,所述靶RNA中的靶标腺苷(A)通过腺苷脱氨酶(Adenosine D eaminase Acting on RNA,ADAR)脱氨基。在某些实施方案中,所述腺苷脱氨酶为天然ADAR或其同源蛋白。在某些实施方案中,所述腺苷脱氨酶为经过修饰但保留了腺苷 脱氨酶活性的腺苷脱氨酶功能变体,例如在天然ADAR或其同源蛋白基础上经一个或多个位点突变修饰但仍然具有腺苷脱氨酶活性的变体。在某些实施方案中,所述腺苷脱氨酶为包含ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为包含经突变丧失催化活性的Cas13蛋白与ADAR功能结构域或ADAR同源蛋白功能结构域或腺苷脱氨酶功能变体的融合蛋白。在一些实施方案中,所述具有胞苷脱氨酶活性的脱氨酶通过外源引入所述宿主细胞中或通过导入该脱氨酶的构建体在宿主细胞中表达。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为包含λN肽与ADAR功能结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述包含ADAR蛋白催化结构域的融合蛋白为SNAP-tag标记的ADAR或SNAP-tag标记的ADAR功能变体。在某些实施方案中,所述ADAR是ADAR1和/或ADAR2。在一些实施方案中,ADAR是选自由hADAR1,hADAR2,小鼠ADAR1和小鼠ADAR2构成的组的一种或多种ADAR。
在某些实施方案中,所述ADAR由所述宿主细胞表达。在某些实施方案中,ADAR天然地或内源地存在于宿主细胞中,例如,天然地或内源地存在于真核细胞中。在某些实施方案中,所述ADAR蛋白经外源引入所述宿主细胞中。在某些实施方案中,将所述ADAR或编码所述ADAR的构建体引入宿主细胞。在一些实施方案中,所述构建体包括但不限于线性核酸、质粒、病毒等。在上述方法中,所述ADAR包括上述天然ADAR、其同源蛋白、经过修饰但保留了腺苷脱氨酶活性的腺苷脱氨酶功能变体(例如在天然ADAR或其同源蛋白基础上经一个或多个位点突变修饰但仍然具有腺苷脱氨酶活性的变体)或包含ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述包含ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白为包括靶向结构域和所述ADAR催化结构域或其同源蛋白催化结构域或腺苷脱氨酶功能变体的融合蛋白。在某些实施方案中,所述靶向结构域选自包含但不限于以下的任一项:经突变丧失催化活性的Cas13蛋白、λN肽、SNAP-tag。在一些实施方案中,ADAR是选自由hADAR1,hADAR2,小鼠ADAR1和小鼠ADAR2构成的组的一种或多种ADAR。在一些实施方案中,所述方法不包含将任何蛋白质引入宿主细胞中。在某些实施方案中,所述ADAR是ADAR1和/或ADAR2。
本申请另一方面提供了一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所 述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),靶标残基为胞苷(C),其中所述互补RNA序列包含与所述靶标RNA上的上游残基和/或下游残基直接相对的错配,并且所述方法进一步包括将具有胞苷脱氨酶活性的脱氨酶或胞苷脱氨酶或编码该脱氨酶的构建体引入宿主细胞。在一些实施方案中,所述具有胞苷脱氨酶活性的脱氨酶是对ADAR蛋白或包含ADAR催化结构域的融合蛋白进行基因修饰后获得C到U催化活性的脱氨酶。在一些实施方案中,所述具有胞苷脱氨酶活性的脱氨酶还包含靶向结构域。
在一些实施方案中,所述靶标胞苷所在的三碱基基序选自以下任一项:GCG,GCC,GCA,GCU,ACG,ACC,ACA,ACU,CCG,CCC,CCA,CCU,UCA,UCC,UCU和UCG。在一些实施方案中,所述arRNA在对应于靶标RNA的靶标残基的位置包含非配对核苷酸,以形成和靶标残基的错配。在一些实施方案中,所述arRNA中可与所述靶标RNA杂交的互补RNA序列包含与所述靶RNA中的所述靶标胞苷直接相对的胞苷,腺苷或尿苷。在某些实施方案中,所述互补RNA序列包含与所述靶标胞苷直接相对的胞苷。在某些实施方案中,所述arRNA在对应于靶RNA的非靶标编辑位点包含一个或多个非配对核苷酸,以形成与靶标RNA的非靶标位点的一个或多个错配。实施例4中分别检测了三碱基基序中只有靶标残基单个错配的情况及三碱基基序中有多个残基错配的情况下,胞苷到尿苷的编辑效率,结果如图22所示。可见在三碱基基序上游残基为A或U时,多个错配可以达到与只有靶标残基单个错配的情况相当的编辑效率,而在三碱基基序的上游残基为G时,只有靶标残基单个错配的情况下编辑效率极低,而此时引入更多错配可以显著提升C到U的编辑效率。因此,在一些实施方案中,所述三碱基基序的上游残基为G,并且其中所述互补RNA序列包含与所述上游残基直接相对的G。在一些实施方案中为ACA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUU或GUU。在一些实施方案中,所述三碱基基序为ACA,并且其中所述互补RNA序列优选地包含与所述三碱基基序相对的AUU。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUA、GUA或CUA。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列优选地包含与所述三碱基基序相对的AUA。在某些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的UUG或UCG。在一些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列优选地包含与所述三碱基基序相对的UUG。在一些实施方案中,所述三碱基基序为CCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUG。在某些实施方案 中,所述靶标RNA中的三碱基基序中的靶标残基为胞苷,所述三碱基基序的上游残基选自G,C,A和U的核苷酸,优选的次序是G>C>A≈U。
在一些实施方案中,所述arRNA通过招募具有胞苷脱氨酶活性的脱氨酶至靶标RNA对靶标RNA中的靶标胞苷(C)脱氨基并使其转变为尿苷。其中所述胞苷脱氨酶为经过修饰(例如经过一个或多个位点氨基酸缺失或突变)具有胞苷脱氨基活性的腺苷脱氨酶或腺苷脱氨酶同源蛋白变体。在某些实施方案中,所述经修饰具有胞苷脱氨基活性的腺苷脱氨酶包含现有技术中公开的,例如Abudayyeh et al.,2019中公开的一个或多个突变的、具有胞苷脱氨基活性的腺苷脱氨酶片段。在某些实施方案中,所述经修饰具有胞苷脱氨基活性的腺苷脱氨酶为包含选自如下一个或多个突变的ADAR2:E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T。在某些特定的实施方案中所述经修饰具有胞苷脱氨基活性的腺苷脱氨酶为包含以下全部突变的ADAR2:E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T。在某些实施方案中,胞苷脱氨酶为包含如下全部突变的ADAR2催化结构域的融合蛋白:E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T。在一些实施方案中,所述具有胞苷脱氨酶活性的脱氨酶还包含靶向结构域的。在某些实施方案中,所述靶向结构域包含但不限于选自以下的任一项:经突变丧失催化活性的Cas13蛋白、λN肽、SNAP-tag。
在某些实施方案中,所述方法包括将所述胞苷脱氨酶或所述融合蛋白或编码所述腺苷脱氨酶或所述融合蛋白的构建体引入宿主细胞。在某些实施方案中,所述构建体包括但不限于线性核酸、质粒、病毒等。
根据本申请的上述方法,所述arRNA为单链RNA。在一些实施方案中,所述互补RNA序列完全是单链。在某些实施方案中,所述arRNA包含一个或多个(例如,1、2、3或更多个)双链区和/或一个或多个茎环区。在某些实施方案中,所述arRNA仅由所述互补RNA序列组成。
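According to the method of the invention, in some embodiments there are two or more mismatches between the complementary RNA sequence and the target sequence. In some embodiments, the complementary RNA sequence has one or more further mismatches with the target sequence outside the triplet complementary bases. In some embodiments, one or more wobble base pairs may occur when the complementary RNA sequence hybridizes with the target sequence. In some embodiments, one or more unilateral bulges may occur when the complementary RNA sequence hybridizes with the target sequence. In some embodiments, one or more wobble base pairs and one or more unilateral bulges may occur when the complementary RNA sequence hybridizes with the target sequence.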
根据本发明所述的方法,在一些实施方案中,所述arRNA的长度为约20-260个核苷酸,例如所述arRNA的长度小于或等于约30、40、45、50、55、60、65、70、75、80、85、90、95、100、105、110、120、130、140、150、160、170、180、190、200或更多个核苷酸中的任一项。在某些实施方案中,互补RNA序列的长度为40-260、45-250、50-240、60-230、65-220、70-220、70-210、70-200、70-190、70-180、70-170、70-160、70-150、70-140、70-130、70-120、70-110、70-100、70-90、70-80、75-200、80-190、85-180、90-170、95-160、100-200、100-150、100-175、110-200、110-175、110-150或105-140个核苷酸中的任一项。在一些实施方案中,所述arRNA长约60-200个核苷酸(例如约60-150、65-140、68-130或70-120中的任何一个)。在一些实施方案中,所述arRNA还包含ADAR招募结构域。
根据本发明所述的方法,在一些实施方案中,所述arRNA包含一种或多种化学修饰。在一些实施方案中所述化学修饰包含甲基化和/或硫代磷酸化,例如2′-O-甲基化(2′-O-Me)和/或核苷酸间硫代磷酸酯键。在某些实施方案中,所述arRNA的首尾的3个或5个核苷酸包含2'-O-Me修饰,和/或其首尾的3个或5个核苷酸间的连接含有硫代磷酸酯键修饰。在某些实施方案中,所述arRNA中的一个或多个或所有尿苷包含2′-O-Me修饰。在某些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含2'-O-Me修饰。在某些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含3'-硫代磷酸酯键修饰。在某些实施方案中,所述arRNA不包含任何化学修饰。
根据本发明所述的方法,在一些实施方案中,所述靶RNA是选自信使RNA前体、信使RNA、核糖体RNA、转运RNA、长链非编码RNA和小RNA的RNA。在一些实施方案中,本发明所述的方法在靶标RNA的靶标残基上进行编辑可导致靶标RNA的错义突变、提前出现的终止密码子、异常剪接或可变剪接,或逆转靶标RNA中的错义突变、提前出现的终止密码子、异常剪接或可变剪接。在一些实施方案中,本发明所述的方法在靶标RNA中编辑靶标残基可导致靶标RNA编码的蛋白的点突变、截短、延长和/或错误折叠,或通过逆转靶标RNA的错义突变、提前出现的终止密码子、异常剪接、或可变剪接获得功能性的、全长、正确折叠的和/或野生型的蛋白质。
根据本发明所述的方法,在一些实施方案中,所述宿主细胞是真核细胞。在某些实 施方案中,所述宿主细胞是哺乳动物细胞。在某些实施方案中,所述宿主细胞是人或小鼠细胞。
使用本发明所提供的任意一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,可以产生经编辑的RNA或包含经编辑的RNA的宿主细胞。因此本发明还提供所述经本发明提供的编辑靶标RNA的方法所产生的经编辑的RNA或包含经编辑的RNA的宿主细胞。
本发明所提供的在宿主细胞的靶标残基位置编辑靶标RNA的方法可用于个体中治疗或预防疾病或病症中。因此本发明还提供一种用于在个体中治疗或预防疾病或病症的方法,包括使用前述本发明所提供的任意一种在宿主细胞的靶标残基位置编辑靶标RNA的方法编辑个体细胞中与所述疾病或病症相关的靶标RNA。在一些实施方案中,所述疾病或病症是遗传性基因疾病或与一种或多种获得性基因突变(例如,药物抗性)相关的疾病或病症。
本发明还提供了可以在本发明所提供的方法中使用的一种通过招募作用于RNA的脱氨酶对靶标RNA中的靶标残基脱氨基的RNA(arRNA),其包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中的靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中的靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与靶标RNA的上游残基和/或下游残基直接相对的错配。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA靶向的靶标RNA中的三碱基基序的靶标残基是腺苷,所述靶标RNA中的上游残基是选自G,C,A和U的核苷酸,优选的G>C≈A>U。在一些实施方案中,所述三碱基基序选自GAG,GAC,GAA,GAU,AAG,AAC,AAA,AAU,CAG,CAC,CAA,CAU,UAA,UAC和UAU。在某些实施方案中,所述arRNA包含与所述靶标RNA中的所述靶标腺苷直接相对的胞苷(C),腺苷(A)或尿苷(U)。在一些特定的实施方案中,所述arRNA包含与所述靶标RNA中的所述靶标腺苷直接相对的C。在某些实施方案中,所述arRNA与靶标RNA杂交时还包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。在某些实施方案中,与一个或多个非靶标腺苷相对的错配核苷为鸟苷。在一些实施方案中,所述三碱基基序的上游残基为G,并且所述互补RNA中与所述上游残基相对的碱基为G或A。在某些实施方案中,所述三碱基基序是GAU,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为ACG或ACA。在某些实施方案中,所述三碱基基序是GAU,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为ACG。在 某些实施方案中,所述三碱基基序是GAA,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为UCA、CCG、CCC或UCC。在某些实施方案中,所述三碱基基序是GAA,所述arRNA中与所述三碱基基序直接相对的三连互补碱基为UCA。在某些实施方案中,所述三碱基基序是GAC,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为GCG或GCA。在某些实施方案中,所述三碱基基序是GAC,所述arRNA与所述三碱基基序直接相对的三连互补碱基为GCG。在某些实施方案中,所述三碱基基序是GAG,并且其中所述arRNA包含与所述三碱基基序直接相对的三连互补碱基为CCG、CCA、CCC、UCC或UCG。在某些实施方案中,所述三碱基基序是GAG,所述arRNA中与所述三碱基基序直接相对的三连互补碱基为CCG。在某些实施方案中,所述arRNA包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA靶向的靶标RNA中的三碱基基序中的靶标残基可以是胞苷(C),称为靶标胞苷。在某些实施方案中,所述三碱基基序的上游残基是选自G,C,A和U的核苷酸,优选次序为G>C>A≈U。在某些实施方案中,所述靶标胞苷所在的三碱基基序选自以下任一项:GCG,GCC,GCA,GCU,ACG,ACC,ACA,ACU,CCG,CCC,CCA,CCU,UCA,UCC,UCU和UCG。在某些实施方案中,所述三碱基基序的上游残基为G,且其中所述互补RNA中与所述上游残基相对的碱基为G。在一些实施方案中,所述三碱基基序下游残基为A,且其中所述互补RNA中与所述下游残基相对的碱基为U或A。在一些实施方案中,所述三碱基基序为ACA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUU、或GUU。在一些实施方案中,所述三碱基基序为ACA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUU。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUA、GUA或CUA。在一些实施方案中,所述三碱基基序为UCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUA。在一些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的UUG或UCG。在一些实施方案中,所述三碱基基序为GCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的UUG。在一些实施方案中,所述三碱基基序为CCA,并且其中所述互补RNA序列包含与所述三碱基基序相对的AUG。在某些实施方案中,所述arRNA在对应于靶标RNA的靶标残基的位置包含非配对核苷酸,以形成和靶标残基的错配。在某些实施方案中,所述arRNA中可与所述靶标RNA杂交的互补RNA序列包含与所述靶RNA中的所述靶标胞苷直接相对的胞苷,腺苷或尿苷。在某些实施方案中,所述互补RN A序列包含与所述靶标胞苷直接相对的胞苷。在某些实施方案中,所述arRNA在对应于靶RNA的非靶标编辑位点包含一个或多个非配对核苷酸,以形成与靶标RNA的非靶标位点的一个或多个错配。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA为单链RNA。在一些实施方案中,所述互补RNA序列完全是单链。在某些实施方案中,所述arRNA包含一个或多个(例如,1、2、3或更多个)双链区和一个或多个茎环区。在某些实施方案中,所述arRNA包含一个或多个(例如,1、2、3或更多个)双链区。在某些实施方案中,所述arRNA包含一个或多个(例如,1、2、3或更多个)茎环区。在某些实施方案中,所述arRNA包含能够形成用于招募ADAR酶的分子内茎环结构的区域。在某些实施方案中,所述arRNA不包含能够形成用于招募ADAR酶的分子内茎环结构的区域。在某些实施方案中,所述arRNA仅由所述互补RNA序列组成。
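According to the arRNA provided by the invention, in some embodiments, one or more wobble base pairs may occur when the complementary RNA sequence hybridizes with the target sequence. In some embodiments, one or more unilateral bulges may occur when the complementary RNA sequence hybridizes with the target sequence. In some embodiments, one or more wobble base pairs and one or more unilateral bulges may occur when the complementary RNA sequence hybridizes with the target sequence.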
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA的长度为约20-260个核苷酸,例如所述arRNA的长度小于或等于约30、40、45、50、55、60、65、70、75、80、85、90、95、100、105、110、120、130、140、150、160、170、180、190、200或更多个核苷酸中的任一项。在某些实施方案中,互补RNA序列的长度为40-260、45-250、50-240、60-230、65-220、70-220、70-210、70-200、70-190、70-180、70-170、70-160、70-150、70-140、70-130、70-120、70-110、70-100、70-90、70-80、75-200、80-190、85-180、90-170、95-160、100-200、100-150、100-175、110-200、110-175、110-150或105-140个核苷酸中的任一项。在一些实施方案中,所述arRNA长约60-200个核苷酸(例如约60-150、65-140、68-130或70-120中的任何一个)。在一些实施方案中,所述arRNA还包含ADAR招募结构域。
根据本发明所提供的arRNA,在一些实施方案中,所述arRNA包含一种或多种化学修饰。在一些实施方案中所述化学修饰包含甲基化和/或硫代磷酸化,例如2′-O-甲基化(2′-O-Me)和/或核苷酸间硫代磷酸酯键。在某些实施方案中,所述arRNA的首尾的3个或5个核苷酸包含2'-O-Me修饰,和/或其首尾的3个或5个核苷酸间的连接含有硫代磷酸酯键修饰。在某些实施方案中,所述arRNA中的一个或多个或所有尿苷包含2′-O-Me修饰。在某 些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含2'-O-Me修饰。在某些实施方案中,所述arRNA中靶向核苷和/或与靶向核苷的5'端和/或3'端相邻的核苷(例如5'端和/或3'端直接相邻的一个或两个核苷)包含3'-硫代磷酸酯键修饰。在某些实施方案中,所述arRNA不包含任何化学修饰。
本发明还提供一种病毒载体、质粒或线性核酸链,其包含本发明所提供的上述任一种arRNA,且所述arRNA不包含任何化学修饰。本发明还提供一种文库,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。本发明还提供一种组合物,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。本发明还提供一种宿主细胞,其包含本发明所提供的上述任一种arRNA或本发明所提供的上述任一种病毒载体、质粒或线性核酸链。在一些实施方案中,所述包含本发明所提供的上述任一种arRNA的宿主细胞是一种真核细胞。
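Examples
Following the LEAPER approach (WO2020074001A1), a short arRNA that is partially or fully complementary to a target RNA containing the target adenosine (A) is introduced exogenously, and this RNA is used to recruit endogenous ADAR to perform A-to-I editing of the target A. The arRNAs were synthesized in vitro and were 71 nt to 111 nt long. As shown in Figure 4, compared with the ADAR-recruiting RNAs used in existing editing technologies that rely on ADAR proteins or their functional domains, such as LEAPER, the three bases of the arRNA used in the invention that lie directly opposite the three-base motif of the target sequence are less complementary to that motif. That is, besides the mismatch with the target A, the three bases directly opposite the three-base motif also include a base mismatched with the upstream residue and/or the downstream residue. It is this change that overcomes the three-base motif preference and allows existing and future ADAR-based editing methods to edit, more freely and more efficiently, three-base motifs whose upstream residue is G and other three-base motifs besides UAG.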
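Example 1: Construction of the three-base motif reporter system and the corresponding arRNAs
We first constructed a reporter system covering 16 three-base motifs. Because the LEAPER literature had tested how editing efficiency varies when the three-base motif is UAG (Qu et al., 2019), in this example, to keep the controls consistent, the parts of the arRNA outside the editing site that can base-pair were designed with the same sequence as in the LEAPER literature, as shown in Figure 5. The original plasmid Reporter1 was a gift from Professor Wensheng Wei (School of Life Sciences, Peking University); its map is shown in Figure 14, and the plasmid contains the sequence shown in Table 4. Primers for the 16 three-base motifs listed in Table 1 were synthesized. Using the materials shown in Table 2 and methods well known to those skilled in the art, as described in J. Sambrook and M. R. Green, Molecular Cloning: A Laboratory Manual (4th edition), 2017, we carried out PCR amplification (Q5® High-Fidelity 2X Master Mix, NEB M0492L), restriction digestion (XbaI, NEB R0145L; AscI, NEB R0558L), agarose gel recovery (SeaKem LE agarose, Lonza 5502; GeneJET Gel Extraction and DNA Cleanup Micro Kit, Thermo Fisher K0832) and assembly (NEBuilder® HiFi DNA Assembly Master Mix, NEB E2621L). The fragments were assembled into Reporter1 in place of its original target RNA coding sequence and transformed into competent cells (Trans1-T1 Phage Resistant Chemically Competent Cell, TransGen CD501-02), and clones were picked for sequencing the next day. Plasmid was extracted from clones with correct sequencing results and packaged into lentivirus. The lentiviruses carrying the coding sequences for the different three-base motifs were each used to infect 293T cells. 48 hours after infection this gave 16 lines of 293T cells, each transcribing an mRNA (the target RNA) containing a different three-base motif. These are the final three-base motif reporter cells, and they are named after the three-base motifs shown in Table 2.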
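To test whether a mismatch between the arRNA and the upstream and/or downstream residue of the three-base motif can raise editing efficiency for a particular three-base motif, 16 arRNAs were chemically synthesized in this example. Each arRNA was designed as the single-stranded reverse complement of the mRNA segment running from 55 nt 3' (downstream) of the target A in the three-base motif to 25 nt 5' (upstream) of it, with C as the base opposite the target A. With all other arRNA bases unchanged, the bases opposite the upstream residue and the downstream residue were each chosen from A, C, G and U. Combining the 4 choices opposite the upstream residue with the 4 choices opposite the downstream residue gives the 4×4=16 arRNAs. Their sequences are given in Table 3.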
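The design rule above can be sketched in code. The following minimal Python sketch is an illustration only, not the applicants' design script: the function name `design_arrnas`, its parameters, and the toy transcript are assumptions introduced here, while the 25 nt and 55 nt flank lengths and the C opposite the target A follow the text.

```python
from itertools import product

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_arrnas(mrna, target_idx, upstream=25, downstream=55, targeting_base="C"):
    """Return {triplet complementary bases: arRNA} for all 16 flank choices.

    Hypothetical helper: `target_idx` is the 0-based index of the target A.
    """
    if upstream < 1 or downstream < 1:
        raise ValueError("both flanks must be at least 1 nt long")
    if mrna[target_idx] != "A":
        raise ValueError("target residue must be adenosine")
    start, end = target_idx - upstream, target_idx + downstream + 1
    if start < 0 or end > len(mrna):
        raise ValueError("window extends past the transcript")

    antisense = reverse_complement(mrna[start:end])
    # After reverse complementation the base opposite the target sits at
    # index `downstream`; its 5' neighbor faces the downstream residue and
    # its 3' neighbor faces the upstream residue.
    center = downstream
    designs = {}
    for opp_downstream, opp_upstream in product("ACGU", repeat=2):
        triplet = opp_downstream + targeting_base + opp_upstream
        designs[triplet] = antisense[:center - 1] + triplet + antisense[center + 2:]
    return designs


# Toy transcript (assumption): a GAU motif with the target A at index 31.
toy_mrna = "C" * 30 + "GAU" + "C" * 60
designs = design_arrnas(toy_mrna, 31)
```

In this sketch the key `ACC` corresponds to the conventional design for GAU (mismatch at the target only), while keys such as `ACG` carry an additional mismatch opposite the upstream G.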
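Example 2: Comparing the editing efficiency of different arRNAs on the UAG three-base motif by the proportion of GFP-positive cells
Figure 5 shows the 16 target RNAs described in Example 1, each of which carries a green fluorescent protein (GFP) coding sequence 3' of the target sequence. Normally this sequence is translated correctly and the cells fluoresce green. When the three-base motif is UAG, however, translation stops there because UAG is a stop codon, and GFP is not produced. In this example the A in the UAG three-base motif is edited with the LEAPER system. If editing succeeds, UAG becomes UIG, and UIG is read as UGG during translation, so translation no longer terminates and the downstream GFP is translated normally. The proportion of GFP-positive cells therefore gives a rough measure of the editing efficiency of the different arRNAs.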
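The readout logic can be illustrated with a short sketch. This is a toy model under stated assumptions, not part of the reporter assay: the helper names are hypothetical, and inosine is simply treated as guanosine when the codon is read, as the text describes.

```python
# Standard RNA stop codons.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_adenosine(codon, position):
    """Deaminate the adenosine at `position` (0-based) to inosine (I)."""
    if codon[position] != "A":
        raise ValueError("only adenosine can be deaminated to inosine")
    return codon[:position] + "I" + codon[position + 1:]


def as_read_in_translation(codon):
    """Inosine is recognized as guanosine during translation."""
    return codon.replace("I", "G")


def is_stop(codon):
    """True if the codon terminates translation once inosine is read as G."""
    return as_read_in_translation(codon) in STOP_CODONS


edited = edit_adenosine("UAG", 1)          # UAG -> UIG
readout = as_read_in_translation(edited)   # UIG is read as UGG
```

Before editing, `is_stop("UAG")` holds and GFP is not translated; after editing the codon is read as UGG, so translation continues into GFP.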
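In all tests in this study, the 16 arRNAs described in Example 1 were each transfected into cells with RNAiMAX reagent (Invitrogen 13778150), as follows:
I. Cells were cultured in DMEM (Hyclone SH30243.01) containing 10% FBS (Vistech SE100-011). Reporter cells were passaged into 12-well plates at 150,000 cells per well. This time point was recorded as hour 0.
II. 24 hours after passaging, 12.5 pmol of arRNA was transfected into each well with RNAiMAX reagent (Invitrogen 13778150), following the supplier's instructions.
III. 72 hours after passaging, the cells in each well were digested with trypsin (Invitrogen 25300054) and FITC channel intensity was analyzed on a flow cytometer.
The cells were 293T cells transcribing an mRNA containing the UAG three-base motif. After 72 hours of culture (48 hours after transfection), FITC channel intensity was analyzed by flow cytometry. The results are shown in Figure 7, where UT is the untransfected control and Vech is the control that received RNAiMAX transfection reagent but no arRNA (dRNA).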
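We then repeated the experiment.
The results are shown in Figure 8. In the first experiment the arRNA was dissolved directly from dry powder and then stored at -80°C. In the repeat, overall efficiency was somewhat lower because the arRNA had gone through one freeze-thaw cycle at -80°C, but the overall trend was unchanged from the first test. arRNA Ran is the control transfected with a random RNA sequence.
As for the prior art, Figure 9 shows the efficiency with which the LEAPER system edits the UAG three-base motif in a reporter system similar to the one used in this example (Qu et al., 2019). The labels on the horizontal axis of Figure 9 are arRNA sequence names and correspond to the subscript part of the horizontal-axis names in Figures 7 and 8. For ease of comparison, the arRNAs in Figures 7 and 8 are arranged in the same order as in Figure 9. Because this example used chemically synthesized arRNA for transfection, whereas Figure 9 used plasmid transfection, overall editing efficiency in this example is higher, but the overall trend matches Figure 9. That is, when the three-base motif is UAG, efficiency is highest when the arRNA base opposite the target A is C and the bases opposite the upstream residue U and the downstream residue G are A and C, which pair with U and G respectively; the corresponding arRNA is arRNA_CCA. When the three-base motif is UAG and the arRNA bases opposite its upstream residue U and downstream residue G are other, non-pairing bases, editing efficiency does not increase significantly and instead decreases.
For editing of the UAG three-base motif, the results of this example are broadly consistent with the literature: introducing more mismatches into the three arRNA bases opposite the three-base motif does not improve editing efficiency.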
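Example 3: Determination of RNA editing efficiency for GAN three-base motifs
In this example, reporter cells carrying the three-base motifs UAG, GAA, GAU, GAC and GAG were each transfected with the 16 arRNAs, following the transfection procedure of Example 2.
At 72 h (48 h after transfection), samples were collected in TRIzol and RNA was extracted (TRIzol Reagent, Ambion REF 15596026). 1 μg of RNA was reverse transcribed in a 20 μL reaction (TransScript® One-Step gDNA Removal and cDNA Synthesis SuperMix, TransGen AT311-02), and 1 μL of the reverse transcription product was PCR-amplified with the following primer pair: ggagtgagtacggtgtgcGACGAGCTGTACAAGCTGCAGGG (SEQ ID NO: 1) and gagttggatgctggatggTGGTGCAGATGAACTTCAGGGTCAG (SEQ ID NO: 2) (lowercase letters denote the primer adapters required by the Hi-Tom kit). Libraries were then built with the Hi-Tom kit (Novogene, REF PT045).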
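Next-generation sequencing was then performed as follows, and the A->G editing efficiency at the editing site was analyzed.
i. Illumina sequencing
The constructed sequencing libraries were sequenced at high throughput on the NovaSeq 6000 platform in PE150 mode.
ii. Sequencing data processing
Raw high-throughput sequencing data were quality-controlled with fastp (v0.19.6) to filter out reads that were of low quality, contained adapter sequences, or contained polyG, among others. The resulting high-quality data were split by sample according to the corresponding barcode sequences using an in-house demultiplexing script, aligned to the sequence of the amplified target region with BWA (v0.7.17-r1188), and processed with SAMtools (v1.9) to convert formats into BAM files, compute alignment statistics, re-sort and index.
iii. Editing efficiency analysis
All potential RNA editing sites were detected with JACUSA (v1.3.0) using the parameters: call-1 -aB,R,D,I,Y,M:4 -C ACGT -c 2 -p 1 -P UNSTRANDED -R -u DirMult-CE. After filtering out high-frequency point mutations present in both control and treated samples, three times the average frequency of mutations other than A->G was taken as the threshold, and the portion of the A->G mutation frequency at the editing site above this threshold was taken as the true frequency of target A to G conversion.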
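The thresholding step can be sketched as follows. This is one possible reading of the description, not the applicants' script: the function names and example numbers are hypothetical, and the sketch assumes that "the portion above the threshold" means the raw A->G frequency minus the threshold (an alternative reading would report the raw frequency whenever it exceeds the threshold).

```python
def a_to_g_fraction(base_counts):
    """Fraction of reads at the target site that carry G instead of A."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    return base_counts.get("G", 0) / total


def corrected_editing_rate(site_counts, background_freqs, factor=3.0):
    """Return (corrected A->G rate, threshold).

    `background_freqs` holds the frequencies of mutations other than A->G
    that remain after removing variants shared by control and treated
    samples. The threshold is `factor` times their mean.
    """
    if not background_freqs:
        raise ValueError("at least one background frequency is required")
    threshold = factor * sum(background_freqs) / len(background_freqs)
    raw = a_to_g_fraction(site_counts)
    # Assumption: only the part of the signal above the threshold counts.
    return max(0.0, raw - threshold), threshold


# Illustrative numbers only, not data from the application.
site_counts = {"A": 600, "G": 400, "C": 0, "U": 0}
background = [0.001, 0.002, 0.003]
rate, threshold = corrected_editing_rate(site_counts, background)
```

With these made-up counts the raw A->G fraction is 0.4 and the threshold is three times the mean background of 0.002.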
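For the UAG three-base motif, the results are shown in Figure 10. The figure clearly shows that the highest editing efficiency is obtained with the arRNA designed according to the principle reported in the prior art, namely arRNA_CCA, which is mismatched only at the target base. This agrees with the earlier GFP results.
By contrast, for the three-base motifs GAN (where N is any of the four ribonucleotides), which are edited with very low efficiency in the prior art, the arRNA design of the invention shows unexpected editing efficiency; the results are shown in Figure 11. The trend is particularly clear for GAU. When the three-base motif in the mRNA is GAU, arRNA_ACC, designed by the usual prior-art method, shows essentially no editing, consistent with the literature. However, when complementarity is weakened, that is, when the arRNA base opposite the 5' upstream residue G of the three-base motif is a non-pairing base (any base other than C), for example with arRNA_ACG, editing efficiency rises sharply. As shown in Figure 11A, the increase is more than 10-fold over the conventional prior-art design (arRNA_ACC). In addition, for GAU, other arRNA designs in this example that moderately weaken complementarity to the three-base motif, such as arRNA_ACA, arRNA_CCU and arRNA_UCC, also show clearly higher editing efficiency than the conventional design arRNA_ACC. Notably, for GAU, efficiency is highest when the arRNA base opposite the target A is C, the base opposite the downstream residue is A (the complement of the downstream residue U), and the base opposite the upstream residue G is the mismatched base G (that is, arRNA_ACG).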
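Likewise, for the three-base motif GAC, arRNAs with moderately weakened complementarity to the motif show unexpectedly high editing efficiency. As the bar chart in Figure 11C shows, arRNA_GCC, designed by the conventional prior-art principle, shows essentially no editing, whereas arRNA_GCG and arRNA_GCA, which introduce additional mismatches, show clearly higher editing efficiency. Moreover, for GAC, efficiency is highest when the arRNA base opposite the target A is C, the base opposite the downstream residue is G (the complement of the downstream residue C), and the base opposite the upstream residue G is the mismatched base G (that is, arRNA_GCG).
Similarly, in editing the three-base motif GAG (Figure 11B), arRNA_CCC, designed by the fixed prior-art pattern, is not the most efficient; arRNA_CCG and arRNA_CCA, with moderately weakened complementarity, are clearly more efficient. As with GAU and GAC, for GAG efficiency is highest when the arRNA base opposite the target A is C, the base opposite the downstream residue is C (the complement of the downstream residue G), and the base opposite the upstream residue G is the mismatched base G (that is, arRNA_CCG).
Likewise, for GAA, arRNA_UCC, designed by the fixed prior-art pattern, has low editing efficiency, but editing efficiency increases when the arRNA base opposite the target A is C, the base opposite the downstream residue A is the complementary base U, and the base opposite the upstream residue G is the mismatched base A, that is, with arRNA_UCA.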
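To confirm these results, we repeated the experiment for mRNAs with the three-base motifs GAA, GAU, GAC and GAG. For each motif, only three arRNA designs were repeated:
1. The arRNA designed by the conventional approach, in which the base opposite the target A is C and the other two bases follow Watson-Crick pairing, so that the base opposite the upstream residue G of the target A is C.
2. The design of the invention in which the base opposite the upstream residue G of the target A is A.
3. The design of the invention in which the base opposite the upstream residue G of the target A is G.
As shown in Figure 13, whether the three-base motif is GAA, GAU, GAC or GAG, editing efficiency increases to some degree when the base opposite the upstream residue G of the target residue A is A, and is usually highest when that base is G. Furthermore, when the base opposite the upstream residue G is changed from the conventional C to G, the fold increase in efficiency across motifs ranks GAU>GAC≈GAA>GAG; when it is changed from the conventional C to A, the fold increase ranks GAC>GAU≈GAG≈GAA.
According to the prior-art literature, GAU, GAC and GAA are the three motifs with the lowest editing efficiency, close to zero (Figure 3), and should therefore be avoided as far as possible in RNA editing. The invention overcomes this limitation by introducing additional mismatched bases against the three-base motif into the arRNA. The examples show that when, among the three arRNA bases opposite the three-base motif, the base opposite the target A is C and the base opposite the upstream and/or downstream residue is mismatched, editing efficiency can increase markedly. Moreover, when the upstream residue is G, editing efficiency is generally higher when the base opposite the downstream residue is complementary and the base opposite the upstream residue G is the mismatched base A, and highest when the base opposite the upstream residue G is the mismatched base G.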
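Example 4: Study of three-base motif preference in C-to-U RNA editing
i. Construction of the mutant ADAR2-r16-293T cell line
Following the RESCUE technology (WO2019071048A9), the ADAR2 catalytic domain was mutagenized at the same sites as r16 in that reference (dADAR2(E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T) r16, https://benchling.com/s/seq-19Ytwwh0i0vSIbyXYZ95). The sequence between the XmaI and AscI restriction sites of ADAR2 on the pLenti-ADAR2 plasmid vector (backbone a gift from Professor Wensheng Wei's laboratory), carrying the above mutations, was produced by conventional in vitro DNA synthesis. Using these two restriction enzymes, the corresponding fragment of the original pLenti-ADAR2 plasmid was replaced with the newly synthesized DNA fragment by digestion and ligation. The resulting plasmid was named pLenti-ADAR2-r16, and the ADAR2 gene it carries, with its catalytic domain mutated following the RESCUE technology (WO2019071048A9), was named ADAR2-r16. The full-length cDNA sequence of ADAR2-r16 is given in Table 6. pLenti-ADAR2-r16 was packaged into lentivirus with a second-generation lentiviral packaging system (pCAG-VSVG was a gift from Arthur Nienhuis & Patrick Salmon (Addgene plasmid #35616; http://n2t.net/addgene:35616; RRID:Addgene_35616); pCMVR8.74 was a gift from Didier Trono (Addgene plasmid #22036; http://n2t.net/addgene:22036; RRID:Addgene_22036)). The virus was used to infect 293T cells, which were selected 48 hours later with Blasticidin (Solarbio B9300) at a final concentration of 10 μg/mL. The cells that survived selection are referred to as ADAR2-r16-293T.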
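The r16 mutation list is written in the usual wild-type/position/mutant notation. As a minimal sketch (the helper names are hypothetical, and the toy peptide used to exercise `apply_mutations` is not ADAR2), the notation can be parsed and applied to a protein sequence like this:

```python
import re

# Mutation string copied from the text above (RESCUE r16 sites).
R16_MUTATIONS = (
    "E488Q/V351G/S486A/T375S/S370C/P462A/N597I/L332I/I398V/"
    "K350I/M383L/D619G/S582T/V440I/S495N/K418E/S661T"
)

# One substitution: wild-type residue, 1-based position, mutant residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec):
    """Split a '/'-separated mutation string into (wt, position, mutant) tuples."""
    parsed = []
    for token in spec.split("/"):
        match = MUTATION_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_mutations(protein, mutations):
    """Apply substitutions, checking the wild-type residue at each position."""
    residues = list(protein)
    for wild_type, position, mutant in mutations:
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)


r16 = parse_mutations(R16_MUTATIONS)
```

Checking the wild-type residue before substituting guards against numbering offsets between a full-length protein and an isolated catalytic domain.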
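ii. Construction of the BFP reporter system
The BFP reporter system was constructed with reference to Vu, L.T., Nguyen, T.T.K., Md Thoufic, A.A., Suzuki, H., & Tsukahara, T. (2016). Chemical RNA editing for genetic restoration: the relationship between the structure and deamination efficiency of carboxyvinyldeoxyuridine oligodeoxynucleotides. Chemical biology & drug design, 87(4), 583-593. The complete BFP cDNA sequence was synthesized in vitro, and the sequence is given in Table 7. The BFP cDNA was cloned into the pCDH-CMV plasmid vector through the multiple cloning site downstream of the CMV promoter (pCDH-CMV backbone a gift from Kazuhiro Oka, Addgene plasmid #72265; http://n2t.net/addgene:72265; RRID:Addgene_72265). The C-to-U editing site in the reporter is the base C at position 199 of the BFP sequence; positions 199, 200 and 201 are CAC, encoding histidine at position 66.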
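Bases 198, 199 and 200 of this sequence are CCA; the construct is named BFP-CCA, abbreviated C*. When the C at position 199 is edited to U at the RNA level by deamination, the amino acid at position 66 changes and the BFP fluorescent protein switches from blue to green fluorescence, so that a signal can be detected in the FITC (fluorescein isothiocyanate) channel of a flow cytometer. When nucleotide 198 is mutated from C to A, T or G, the codon for amino acid 65, of which it is part, becomes ACA, ACT or ACG instead of ACC; all of these encode threonine, so the mutation is synonymous. The reporter system can therefore be used to measure and compare C-to-U editing efficiency when the upstream residue of the target residue at position 199 of the mRNA is each of the different bases. A site-directed mutagenesis kit (Q5® Site-Directed Mutagenesis Kit, NEB E0554S) was used to introduce mutations at position 198, giving the following bases at positions 198, 199 and 200: GCA, named BFP-GCA, abbreviated G*; ACA, named BFP-ACA, abbreviated A*; and TCA, named BFP-TCA, abbreviated T*. With the C at position 199 mutated to T, giving CTA, the construct is named BFP-CUA, abbreviated CUA. The four plasmids BFP-GCA, BFP-ACA, BFP-TCA and BFP-CCA were packaged into lentivirus with the second-generation lentiviral packaging system (same conditions as for the ADAR2-r16 lentivirus) and used to infect 293T or ADAR2-r16-293T cells. After 48 h, cells were selected with 500 μg/mL Geneticin (Gibco, Catalog number: 10131035) or Blasticidin at a final concentration of 10 μg/mL (Solarbio B9300). The surviving cells were named 293T-GCA, 293T-ACA, 293T-TCA and 293T-CCA, and ADAR2-r16-GCA, ADAR2-r16-ACA, ADAR2-r16-TCA and ADAR2-r16-CCA, respectively.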
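The codon logic behind this reporter can be checked with a small sketch. Only the handful of standard-genetic-code entries needed here are included, and the helper names are hypothetical; the point is that every ACN codon encodes threonine, while C-to-U editing of the first base of CAC gives UAC.

```python
# Subset of the standard genetic code (RNA codons) needed for this check.
CODON_TABLE = {
    "ACA": "Thr", "ACC": "Thr", "ACG": "Thr", "ACU": "Thr",
    "CAC": "His", "UAC": "Tyr",
}


def to_rna(sequence):
    """Write a DNA-style codon in RNA letters."""
    return sequence.upper().replace("T", "U")


def amino_acid(codon):
    """Look up the amino acid for a codon given in DNA or RNA letters."""
    return CODON_TABLE[to_rna(codon)]


# Position 198 is the third base of the ACN codon: all four choices are synonymous.
upstream_variants = ["ACC", "ACA", "ACT", "ACG"]
encoded = {amino_acid(codon) for codon in upstream_variants}

# C-to-U editing at position 199 changes the first base of the CAC codon.
before_editing = amino_acid("CAC")
after_editing = amino_acid("UAC")
```

The His-to-Tyr change at this codon is the amino acid switch that the text links to the shift from blue to green fluorescence.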
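iii. Design and synthesis of arRNAs
In this example the term "arRNA" has the same meaning as the term "dRNA" wherever it appears herein, and the two are used interchangeably. The arRNA base opposite the target residue of the three-base motif lies in the middle of the arRNA, which extends by equal lengths 5' upstream and 3' downstream. Owing to limits on synthesis length, this example first used RNAs of 91 nt, synthesized in vitro. The four arRNAs differ at nucleotide 46 (the targeting base) and are abbreviated A*, U*, G* and C* according to whether nucleotide 46 is A, U, G or C. Their sequences are given in Table 5 below. Unlike the LEAPER design method (WO2020074001A1), the four arRNAs in this batch differ only in the targeting base opposite the target residue C, that is, A, U, G or C at position 46, while the base at arRNA position 47 (corresponding to position 198 of the reporter) is designed in all four arRNAs according to the BFP sequence before mutation, namely CCA. Subsequently, for three-base motifs in which the target residue is cytidine and the downstream residue is adenosine, arRNAs containing different triplet complementary bases were synthesized; the sequences are given in Table 8. Nucleotide 46 of these arRNAs is fixed as U, and nucleotides 45 and 47 are each A, U, G or C, giving 16 arRNAs in total. Each arRNA is named by the following rule: every name begins with "arRNA", followed by the triplet complementary bases of the arRNA as a subscript. The triplet complementary bases are written in the 5' to 3' direction, with U as the targeting base opposite the target residue C of the mRNA. For example, for the three-base motif CCA, the upstream residue of the target residue C is C, and the corresponding 3' nearest-neighbor residue of the arRNA targeting base is G; the targeting base opposite the target residue C is U; and the downstream residue of the target residue C is A, and the corresponding 5' nearest-neighbor residue of the targeting base is U. The triplet complementary bases of the arRNA are therefore UUG, and by the naming rule this antisense RNA is named arRNA_UUG. To unify the naming of the first four arRNAs (A*, U*, G*, C*) with the 16 RNAs of the second batch, the first four are named arRNA_UAG, arRNA_UUG, arRNA_UGG and arRNA_UCG respectively in subsequent experiments. Note that arRNA_UUG from the first synthesis and arRNA_UUG from the second synthesis have identical sequences but come from two different synthesis batches.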
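The naming rule can be expressed as a short sketch. The helper names are hypothetical, the underscore stands in for the subscript used in the figures, and the sketch treats only strict Watson-Crick pairs as matches (an assumption: wobble pairs are counted as mismatches here).

```python
# Watson-Crick partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(sequence):
    """Write a DNA-style motif such as TCA in RNA letters (UCA)."""
    return sequence.upper().replace("T", "U")


def matched_triplet(motif, targeting_base):
    """Triplet complementary bases (5'->3') with fully paired flanks.

    The 5' base faces the downstream residue of the motif and the 3' base
    faces the upstream residue, because the two strands are antiparallel.
    """
    upstream, _target, downstream = to_rna(motif)
    return PAIR[downstream] + targeting_base + PAIR[upstream]


def arrna_name(triplet):
    """Name an arRNA after its triplet complementary bases."""
    return f"arRNA_{triplet}"


def mismatch_map(motif, triplet):
    """Report which motif positions are mismatched with the given triplet."""
    upstream, target, downstream = to_rna(motif)
    opp_downstream, opp_target, opp_upstream = triplet
    return {
        "upstream": PAIR[upstream] != opp_upstream,
        "target": PAIR[target] != opp_target,
        "downstream": PAIR[downstream] != opp_downstream,
    }


example_name = arrna_name(matched_triplet("CCA", "U"))
gca_with_uug = mismatch_map("GCA", "UUG")
```

For the worked example in the text, the motif CCA with targeting base U gives the triplet complementary bases UUG; paired against GCA, the same UUG triplet places a G opposite the upstream G.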
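iv. Testing the antisense RNA base opposite the target C
ADAR2-r16-293T cells were plated in 6-well plates at 300,000 cells per well. 24 hours after plating they were transfected with Lipofectamine™ 3000 Transfection Reagent (Invitrogen, Catalog number: L3000015) following the manufacturer's instructions. In line with the instructions, two replicates were run with different amounts of Lipofectamine 3000: Repeat 1 used 3.75 μL and Repeat 2 used 7.5 μL of transfection reagent per well. Each well received 2.5 μg of BFP or a related plasmid, namely BFP-GCA (abbreviated G*), BFP-ACA (abbreviated A*), BFP-TCA (abbreviated T*) or BFP-CUA (abbreviated CUA), and 25 pmol of synthesized guide RNA. FITC channel signal intensity was measured by FACS 48 h after transfection. The mean fluorescence intensity (MFI) of positive cells is summarized in Figure 15.
In Figure 15, the mRNA row indicates the BFP reporter plasmid added to each well and the arRNA row indicates the arRNA added. In the BFP reporter, bases 198, 199 and 200 are CCA in the original sequence; when the C at position 198 is changed to A, T or G, amino acid 65 remains threonine, so the differences at position 198 among the four reporters BFP-GCA, BFP-CCA, BFP-ACA and BFP-TCA do not alter the function of the protein. As Figure 15 shows, without any arRNA the background GFP signal of the reporter has an MFI of about 5×10⁴ (reporter labeled U*, arRNA labeled /; and reporter labeled A*, arRNA labeled /). When the C at position 199 is mutated to T at the DNA level (three-base motif CUA in the mRNA), the GFP MFI is about 2.4×10⁶ to 3.1×10⁶, about 100-fold above background. This indicates that converting all C at position 199 to U at the RNA level would raise the GFP MFI about 100-fold.
When arRNA is added while position 199 remains C at the DNA level, the final GFP MFI rises, as shown in Figure 15, to at most a little over 5×10⁵, about 20% of the fluorescence intensity seen when position 199 is mutated to T. To confirm this editing capacity and the base preference, the experiment was redesigned and repeated; the results are shown in Figure 16. Conditions were essentially the same as for Figure 15, except that both Repeat 1 and Repeat 2 used 3.75 μL of transfection reagent. The results show that when the three-base motif is GCA or CCA, efficiency with U^ as the targeting base (arRNA_UUG) is greater than with C^ (arRNA_UCG). Compared with Figure 15, MFI under the same conditions in Figure 16 is lower by nearly half, because the arRNA for Figure 15 was used immediately after being dissolved from dry powder, whereas the Figure 16 experiment used the same arRNA solution after one freeze-thaw cycle at -80°C. Although the maxima are lower than in Figure 15, the Figure 16 experiment essentially reproduces the four highest-efficiency results of Figure 15, with the same ranking. Under the design of this example, in which the 5' base of the triplet complementary bases is fixed as U and the 3' base as G and only the base opposite the target C is varied, the conclusion is that editing efficiency with the middle residue U^ is greater than with C^. When the upstream residue of the three-base motif was varied, the applicant found that the highest editing efficiency for GCA exceeds the highest efficiency for CCA.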
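v. Three-base motif preference test
For better consistency in subsequent results, the applicant integrated the four plasmids BFP-GCA, BFP-ACA, BFP-TCA and BFP-CCA by lentiviral packaging into ordinary 293T cells lacking ADAR2-r16 and into 293T cells with stably integrated ADAR2-r16, as described in "ii. Construction of the BFP reporter system", where the steps and naming are given. Because the reporter is integrated into the cell genome, arRNA transfection in the three-base motif preference test used a different reagent from that in the test of the antisense RNA base opposite the target C. In the latter, plasmid had to be co-transfected, so Lipofectamine 3000 was used as described above; in the preference test only arRNA needed to be transfected, so Lipofectamine™ RNAiMAX Transfection Reagent (Invitrogen, Catalog number: 13778100) was used. 293T or ADAR2-r16-293T cells carrying the different reporters were plated in 12-well plates at 150,000 cells per well, transfected with 15 pmol of arRNA using RNAiMAX 24 hours after plating, and analyzed by FACS for FITC channel signal intensity 48 h after transfection; the percentage of GFP+ cells was recorded.
Figure 17 shows the results when the triplet complementary bases of the arRNA are mismatched only with the target C, the mismatched base opposite the target C being U, and are fully matched with the upstream and downstream residues of the target C (that is, the triplet complementary bases are UUC for the BFP-GCA reporter, UUU for BFP-ACA, UUA for BFP-TCA and UUG for BFP-CCA). In the figure, "untreated" denotes the control without arRNA, "random RNA sequence" denotes the control with a 91 nt random-sequence RNA (Ran-91 in Table 8), and "arRNA" denotes addition of the matching arRNA according to the rule above. Figure 17 shows that the system edits with relatively high efficiency when the three-base motif is TCA or ACA, and with efficiency close to zero when it is GCA or CCA.
The results of the three-base motif test were, for a time, deeply puzzling. The test of Figure 15 covered all four bases A/U/C/G opposite the target C, whereas in the experiment of Figure 17 the target C always faced U. We therefore extracted from the Figure 15 experiment the data in which the arRNA base opposite the target C is U and replotted them as Figure 18. Comparison with Figure 17 shows that the two experiments lead to clearly contradictory conclusions. Although the two figures use different statistics, the trends within each experiment clearly differ: in the plot redrawn from the Figure 15 data, GCA and CCA are clearly more efficient, whereas in Figure 17 TCA and ACA are clearly more efficient.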
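vi. Unexpected finding: a mismatch 5' upstream of the editing site
The contradiction between the two experiments was entirely unexpected. After repeated experiments and careful comparison of the arRNAs used, we found a subtle difference in arRNA design between the two experiments. Figure 19A shows the pairing between the mRNA three-base motifs and the arRNA triplet complementary bases used for Figure 18, and Figure 19B shows the pairing used for Figure 17. The difference is that in the former (Figure 19A) the arRNA base opposite the upstream residue of the target C is always G, so the upstream residue is mismatched with the arRNA except when it is C; in the latter (Figure 19B) the arRNA base opposite the upstream residue is always its strict complement. We therefore hypothesized that the contradiction arises because a mismatch between the upstream residue of the three-base motif and the arRNA may change the three-base motif preference.
To test this hypothesis, we tested the arRNAs synthesized for "iv. Testing the antisense RNA base opposite the target C" together with those synthesized for "v. Three-base motif preference test", and recorded both the percentage of GFP+ cells and MFI. Conditions were identical to those of "v. Three-base motif preference test". The upper panels of Figures 20 and 21 show the results for the arRNAs from test iv, and the lower panels those for the arRNAs from test v. arRNAs were added as in tests iv and v. In Figure 20 (%GFP) and Figure 21 (MFI), Repeat 1 and Repeat 2 are two independent experiments. Although the two figures use different statistics, they show similar trends. In the upper panels, GCA and CCA are edited with relatively high efficiency and TCA and ACA with lower efficiency, consistent with the conclusion of test iv. In the lower panels, TCA and ACA are edited with higher efficiency and GCA and CCA with efficiency close to zero, consistent with test v. This confirms our hypothesis: the two seemingly contradictory conclusions result from the different arRNA designs. We also found, unexpectedly, that for the three-base motif GCA an arRNA designed by the existing approach gives almost no editing, whereas adding a G-G mismatch markedly increases editing efficiency.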
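This finding prompted a further question: could introducing other additional mismatches against the three-base motif raise editing efficiency further? Keeping U as the arRNA base opposite the target C, we introduced further changes at the positions of the arRNA triplet complementary bases opposite the upstream and/or downstream residue of the target residue. Because the base opposite the upstream residue can be A, U, C or G, and so can the base opposite the downstream residue, there are 16 possible triplet complementary bases: AUA, AUU, AUC, AUG, UUA, UUU, UUC, UUG, CUA, CUU, CUC, CUG, GUA, GUU, GUC and GUG. We therefore synthesized arRNAs containing these 16 triplet complementary bases and named the 16 arRNAs after them; the sequences are given in Table 8. The 16 arRNAs were transfected with RNAiMAX into the 8 reporter cell lines constructed earlier, namely BFP-ACA-293T and BFP-ACA-293T-ADAR2-r16 (Figure 22B); BFP-TCA-293T and BFP-TCA-293T-ADAR2-r16 (Figure 22A); BFP-CCA-293T and BFP-CCA-293T-ADAR2-r16 (Figure 22D); and BFP-GCA-293T and BFP-GCA-293T-ADAR2-r16 (Figure 22C), under the same transfection conditions and time points as for the Figure 17 experiment. The controls in the four panels are the same samples: "91nt random sequence" is the control with a 91 nt random sequence; "vehicle only" is the control with RNAiMAX transfection reagent but no RNA; "Opti-DMEM medium" is the control with the same volume of Opti-DMEM but no RNAiMAX transfection reagent; and "untreated" is the untransfected control. arRNA_UAG, arRNA_UUG, arRNA_UCG and arRNA_UGG have exactly the same sequences as CCA-arRNA_UAG, CCA-arRNA_UUG, CCA-arRNA_UCG and CCA-arRNA_UGG, respectively, but come from two different synthesis batches.
Figure 22 shows the preferred options when multiple mismatches are introduced. When the three-base motif is ACA, editing efficiency is higher with the triplet complementary bases AUU or GUU, AUU being the higher of the two; when the motif is UCA (TCA in the plasmid), efficiency is higher with AUA, GUA or CUA, AUA being the highest; when the motif is GCA, efficiency is higher with UUG or UCG, UUG being the higher; and when the motif is CCA, efficiency is higher with AUG.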
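These preferences can be collected into a small lookup table. The sketch below is only a convenience illustration: the dictionary and function names are hypothetical, and the entries restate the C-to-U preferences reported for Figure 22, with the most efficient option listed first (the relative order of the remaining options for UCA is not stated in the text).

```python
# Preferred triplet complementary bases (5'->3') for C-to-U editing,
# keyed by three-base motif, best-performing option first.
PREFERRED_TRIPLETS = {
    "ACA": ["AUU", "GUU"],
    "UCA": ["AUA", "GUA", "CUA"],
    "GCA": ["UUG", "UCG"],
    "CCA": ["AUG"],
}


def preferred_triplets(motif):
    """Return all reported options for a motif given in DNA or RNA letters."""
    key = motif.upper().replace("T", "U")
    if key not in PREFERRED_TRIPLETS:
        raise KeyError(f"no reported preference for motif {key}")
    return PREFERRED_TRIPLETS[key]


def best_triplet(motif):
    """Return the option reported as most efficient for the motif."""
    return preferred_triplets(motif)[0]


gca_choice = best_triplet("GCA")
tca_choice = best_triplet("TCA")
```

Accepting DNA-style input such as TCA mirrors the way the plasmids are named in this example.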
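In addition, editing efficiency varies with the upstream residue of the target RNA. To better define the scope of the invention and the order of preference among three-base motifs, this example also compared editing efficiency with a mismatch directly opposite the upstream and/or downstream residue against editing efficiency with a mismatch at the target residue only. These results are also shown in Figure 22. When the upstream residue of the three-base motif is A or U, designs with a mismatch directly opposite the upstream and/or downstream residue reach editing efficiency comparable to designs mismatched at the target residue only. For example, for ACA, the triplet complementary bases UUU, mismatched at the target residue only, perform comparably to AUU and GUU, which carry a mismatch directly opposite the upstream and/or downstream residue; for UCA, UUA, mismatched at the target residue only, performs comparably to AUA. For GCA, by contrast, UUC, mismatched at the target residue only, has efficiency close to 0, whereas UUG and UCG, which carry a mismatch directly opposite the upstream and/or downstream residue, can be several-fold to more than 10-fold more efficient than UUC. For CCA, AUG, which introduces a mismatch directly opposite the upstream and/or downstream residue, has editing efficiency similar to UCG. Ranked by gain in editing efficiency, the order of preference for introducing a mismatch against the upstream residue of the three-base motif is therefore G>C>A≈U; in other words, when the upstream residue is G, introducing a G mismatched against it markedly increases editing efficiency.
Finally, because all the data in Figure 22 come from the same batch of experiments under identical conditions and detection methods, the efficiency of C-to-U RNA editing can be compared across the four three-base motifs ACA, UCA, CCA and GCA. As Figure 22 shows, the highest efficiency for ACA and UCA is about 10% GFP+. For GCA, editing efficiency is close to 0 with a mismatch at the target base only, and rises to 6% to 8% GFP+ when a mismatch directly opposite the upstream and/or downstream residue is added. For CCA, even with such a mismatch, the highest efficiency does not exceed 2.5% GFP+.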
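Industrial Applicability
This application overcomes the limitation that existing RNA editing technologies edit three-base motifs such as GAU and GAC with very low efficiency, so that three-base motifs beginning with G can be edited with appreciable efficiency. It thereby resolves the inability of existing RNA editing technologies to address sites such as GAU and GAC, and markedly increases the efficiency with which prior-art ADAR-based RNA editing systems, for example LEAPER (WO2020074001A1) and RESTORE (WO2020001793A1), edit three-base motifs other than UAG that do not fit the natural preference of ADAR. The technical solution of this application also overcomes the low efficiency of existing RNA editing technologies on three-base motifs such as GCA: compared with the low efficiency of the existing RESCUE technology (WO2019071048A9) on GCA, introducing additional base mismatches greatly improves editing of GCA. The application removes a long-standing limitation on the choice of editing sites in RNA editing. In the development of disease therapies, for example, the invention gives more inherited diseases caused by gene mutations the chance to be treated more safely and efficiently by RNA editing.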
Sequence Listing
Table 1: Primers used to construct the 16 three-base motif reporter systems
Primer name SEQ ID NO Primer sequence
Vector-F 3 Ctgttttgacctccatagaagacaccgactctagacgtggaacagtacgaacgcgc
GAT-R 4 Cactggcagagccctatcgcatcgcgagcaggcgct
GAT-F 5 Tgctcgcgatgcgatagggctctgccagtgagc
Vector-R 6 gggtttaaacccctgcagggtgtacaccggcgcgccttacttgtacagctcgtccatgc
GAA-R 7 Cactggcagagccctttcgcatcgcgagcaggcgct
GAA-F 8 Tgctcgcgatgcgaaagggctctgccagtgagc
GAG-R 9 Cactggcagagccctctcgcatcgcgagcaggcgct
GAG-F 10 Tgctcgcgatgcgagagggctctgccagtgagc
GAC-R 11 Cactggcagagccctgtcgcatcgcgagcaggcgct
GAC-F 12 Tgctcgcgatgcgacagggctctgccagtgagc
AAA-R 13 Cactggcagagcccttttgcatcgcgagcaggcgct
AAA-F 14 Tgctcgcgatgcaaaagggctctgccagtgagc
AAT-R 15 cactggcagagccctattgcatcgcgagcaggcgct
AAT-F 16 tgctcgcgatgcaatagggctctgccagtgagc
AAC-R 17 cactggcagagccctgttgcatcgcgagcaggcgct
AAC-F 18 tgctcgcgatgcaacagggctctgccagtgagc
AAG-R 19 cactggcagagccctcttgcatcgcgagcaggcgct
AAG-F 20 tgctcgcgatgcaagagggctctgccagtgagc
CAA-R 21 cactggcagagccctttggcatcgcgagcaggcgct
CAA-F 22 tgctcgcgatgccaaagggctctgccagtgagc
CAT-R 23 cactggcagagccctatggcatcgcgagcaggcgct
CAT-F 24 tgctcgcgatgccatagggctctgccagtgagc
CAC-R 25 cactggcagagccctgtggcatcgcgagcaggcgct
CAC-F 26 tgctcgcgatgccacagggctctgccagtgagc
CAG-R 27 cactggcagagccctctggcatcgcgagcaggcgct
CAG-F 28 tgctcgcgatgccagagggctctgccagtgagc
TAA-R 29 cactggcagagccctttagcatcgcgagcaggcgct
TAA-F 30 tgctcgcgatgctaaagggctctgccagtgagc
TAG-R 31 cactggcagagccctctagcatcgcgagcaggcgct
TAG-F 32 tgctcgcgatgctagagggctctgccagtgagc
TAC-R 33 cactggcagagccctgtagcatcgcgagcaggcgct
TAC-F 34 tgctcgcgatgctacagggctctgccagtgagc
TAT-R 35 cactggcagagccctatagcatcgcgagcaggcgct
TAT-F 36 tgctcgcgatgctatagggctctgccagtgagc
Table 2: Materials and assembly order used to construct the 16 three-base motif reporter systems
Figure PCTCN2021104801-appb-000005
Table 3: arRNA sequences used in Examples 1-3
Figure PCTCN2021104801-appb-000006
Figure PCTCN2021104801-appb-000007
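Note: m denotes 2'-O-methyl (2'-O-Me) modification of the base to its right; * denotes a phosphorothioate diester linkage between the two flanking nucleotides; underlined nucleotides are the 3 bases that lie directly opposite the three-base motif of the target RNA when the arRNA hybridizes with it.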
Figure PCTCN2021104801-appb-000008
Figure PCTCN2021104801-appb-000009
Figure PCTCN2021104801-appb-000010
Table 5
Figure PCTCN2021104801-appb-000011
Note: Upper and lower case letters are equivalent; upper case is used only to highlight differences between sequences.
Figure PCTCN2021104801-appb-000012
Figure PCTCN2021104801-appb-000013
Figure PCTCN2021104801-appb-000014
Table 8: arRNA sequences used in Example 4
Figure PCTCN2021104801-appb-000015
Note: Upper and lower case letters are equivalent; upper case is used only to highlight the triplet complementary bases.
References
1.Adikusuma,F.,Piltz,S.,Corbett,M.A.,Turvey,M.,McColl,S.R.,Helbig,K.J.,...&Thomas,P.Q.(2018).Large deletions induced by Cas9 cleavage.Nature,560(7717),E8-E9.
2.Cox,D.B.,Gootenberg,J.S.,Abudayyeh,O.O.,Franklin,B.,Kellner,M.J.,Joung,J.,&Zhang,F.(2017).RNA editing with CRISPR-Cas13.Science,358(6366),1019-1027.
3.Charlesworth,C.T.,Deshpande,P.S.,Dever,D.P.,Camarena,J.,Lemgart,V.T.,Cromer,M.K.,...&Behlke,M.A.(2019).Identification of preexisting adaptive immunity to Cas9 proteins in humans.Nature medicine,25(2),249-254.
4.Cullot,G.,Boutin,J.,Toutain,J.,Prat,F.,Pennamen,P.,Rooryck,C.,...&Bibeyran,A.(2019).CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations.Nature communications,10(1),1-14.
5.Enache,O.M.,Rendo,V.,Abdusamad,M.,Lam,D.,Davison,D.,Pal,S.,...&Thorner,A.R.(2020).Cas9 activates the p53 pathway and selects for p53-inactivating mutations.Nature Genetics,1-7.
6.Haapaniemi,E.,Botla,S.,Persson,J.,Schmierer,B.,&Taipale,J.(2018).CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response.Nature medicine,24(7),927-930.
7.Merkle,T.,Merz,S.,Reautschnig,P.,Blaha,A.,Li,Q.,Vogel,P.,...&Stafforst,T.(2019).Precise RNA editing by recruiting endogenous ADARs with antisense oligonucleotides.Nature biotechnology,37(2),133-138.
8.Qu,L.,Yi,Z.,Zhu,S.,Wang,C.,Cao,Z.,Zhou,Z.,...&Bao,Y.(2019).Programmable RNA editing by recruiting endogenous ADAR using engineered RNAs.Nature biotechnology,37(9),1059-1069.
9.Vogel,P.,Moschref,M.,Li,Q.,Merkle,T.,Selvasaravanan,K.D.,Li,J.B.,&Stafforst,T.(2018).Efficient and precise editing of endogenous transcripts with SNAP-tagged ADARs.Nature methods,15(7),535-538.
10.Nishikura K.(2010).Functions and regulation of RNA editing by ADAR deaminases.Annual review of biochemistry,79,321–349
11.Paul Vogel,Matin Moschref,Qin Li,Tobias Merkle,Karthika D.Selvasaravanan,Jin Billy Li&Thorsten Stafforst.(2018).Efficient and precise editing of endogenous transcripts with SNAP-tagged ADARs.Nat Methods 15,535–538(2018).
12.Abudayyeh,O.O.,Gootenberg,J.S.,Franklin,B.,Koob,J.,Kellner,M.J.,Ladha,A.,...&Zhang,F.(2019).A cytosine deaminase for programmable single-base RNA editing.Science,365(6451),382-386
13.Vu,L.T.,Nguyen,T.T.K.,Md Thoufic,A.A.,Suzuki,H.,&Tsukahara,T.(2016).Chemical RNA editing for genetic restoration:the relationship between the structure and deamination efficiency of carboxyvinyldeoxyuridine oligodeoxynucleotides.Chemical biology&drug design,87(4),583-593.
14.Keppler,A.,Gendreizig,S.,Gronemeyer,T.,Pick,H.,Vogel,H.,&Johnsson,K.(2003).A general method for the covalent labeling of fusion proteins with small molecules in vivo.Nature Biotechnology,21(1),86-89
15.Stafforst,T.,&Schneider,M.F.(2012).An RNA–Deaminase Conjugate Selectively Repairs Point Mutations.Angewandte Chemie,51(44),11166-11169.

Claims (57)

  1. 一种在宿主细胞的靶标残基位置编辑靶标RNA的方法,其包括将脱氨酶招募RNA(arRNA)或编码该arRNA的构建体引入宿主细胞,其中所述arRNA包含与靶标RNA杂交的互补RNA序列,其中所述靶标残基位于一个三碱基基序中,所述三碱基基序包含靶标RNA中靶标残基的5'最近邻残基(上游残基),靶标残基和靶标RNA中靶标残基的3'最近邻残基(下游残基),其中所述三碱基基序不是UAG,并且其中所述互补RNA序列包含与所述靶标RNA上的上游残基和/或下游残基直接相对的错配。
  2. 根据权利要求1中所述的方法,其中所述互补RNA序列包含与所述靶标RNA的上游残基直接相对的错配。
  3. 根据权利要求1或2中所述的方法,其中,所述互补RNA序列包含与所述靶标RNA的下游残基直接相对的错配。
  4. 根据权利要求1-3中任一项所述的方法,其中所述靶标残基是腺苷。
  5. 根据权利要求4中所述的方法,其中所述上游残基选自G、A、或C。
  6. 根据权利要求1-4中任一项所述的方法,其中所述三碱基基序选自GAG,GAC,GAA,GAU,AAG,AAC,AAA,AAU,CAG,CAC,CAA,CAU,UAA,UAC和UAU。
  7. 根据权利要求4-6中任一项所述的方法,其中所述互补RNA序列包含与所述靶标RNA中的所述靶标腺苷直接相对的胞苷,腺苷或尿苷。
  8. 根据权利要求4-7中任一项所述的方法,其中所述互补RNA序列还包含一个或多个错配,所述错配各自与所述靶标RNA中的非靶标腺苷相对。
  9. 根据权利要求4-8中任一项所述的方法,其中所述三碱基基序是GAU,并且其中所述互补RNA序列包含与所述三碱基基序相对的ACG、UCC、CCU或ACA。
  10. 根据权利要求9中所述的方法,其中所述三碱基基序是GAU,并且其中所述互补RNA序列包含与所述三碱基基序相对的ACG。
  11. 根据权利要求4-8中任一项所述的方法,其中所述三碱基基序是GAA,并且其中所述互补RNA序列包括与所述三碱基基序相对的UCA、CCG、CCC或UCG。
  12. 根据权利要求11中所述的方法,其中所述三碱基基序是GAA,所述互补RNA序列包括与所述三碱基基序相对的UCA或UCG。
  13. 根据权利要求4-8中任一项所述的方法,其中所述三碱基基序是GAC,并且其中所述互补RNA序列包含与所述三碱基基序相对的GCG或GCA。
  14. 根据权利要求13中所述的方法,其中所述三碱基基序是GAC,所述互补RNA序列包含与所述三碱基基序相对的GCG。
  15. 根据权利要求4-8中任一项所述的方法,其中所述三碱基基序是GAG,并且其中所述互补RNA序列包含与所述三碱基基序相对的CCG、CCA、CCC、UCC或UCG。
  16. 根据权利要求15中所述的方法,其中所述三碱基基序是GAG,所述互补RNA序列包含与所述三碱基基序相对的CCG。
  17. 根据权利要求4-8中任一项所述的方法,其中所述靶标RNA中的上游残基选自G,C,A和U的核苷酸,优选次序为G>C≈A>U。
  18. 根据权利要求4-8中任一项所述的方法,其中所述三碱基基序上游残基为G,且其中所述互补RNA中与所述上游残基相对的碱基为G或A。
  19. 根据权利要求4-18中任一项所述的方法,其中所述三碱基基序的下游残基与所述互补RNA中相对的碱基严格互补。
  20. 根据权利要求4-19中任一项所述的方法,其中所述arRNA招募作用于RNA的腺苷脱氨酶(ADAR)或包含ADAR催化结构域的融合蛋白以使所述靶标RNA中的靶标腺苷脱氨。
  21. 根据权利要求20中所述的方法,其中所述包含ADAR催化结构域的融合蛋白进一步包含靶向结构域。
  22. 根据权利要求20或21中所述的方法,其中所述ADAR蛋白或包含ADAR催化结构域的融合蛋白或编码所述ADAR蛋白或包含ADAR催化结构域的融合蛋白的构建体经外源引入所述宿主细胞中。
  23. 根据权利要求20中所述的方法,其中所述ADAR蛋白由所述宿主细胞内源表达。
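  24. The method according to any one of claims 1-3, wherein the target residue is cytidine, and the arRNA recruits a deaminase that acts on RNA and has cytidine deaminase activity to deaminate the target cytidine in the target RNA.
  25. The method according to claim 24, wherein the three-base motif in which the target cytidine in the target RNA is located is selected from any one of the following: GCG, GCC, GCA, GCU, ACG, ACC, ACA, ACU, CCG, CCC, CCA, CCU, UCA, UCC, UCU and UCG.
  26. The method according to claim 24 or 25, wherein the complementary RNA sequence comprises a cytidine, adenosine or uridine opposite the target cytidine in the target RNA.
  27. The method according to any one of claims 24 to 26, wherein the complementary RNA sequence further comprises one or more mismatches, each opposite a non-target cytidine in the target RNA.
  28. The method according to any one of claims 24-27, wherein the upstream residue of the three-base motif is G, and wherein the base in the complementary RNA opposite the upstream residue is G.
  29. The method according to any one of claims 24-27, wherein the three-base motif is GCA, and wherein the complementary RNA sequence comprises UUG or UCG opposite the three-base motif.
  30. The method according to claim 29, wherein the three-base motif is GCA, and wherein the complementary RNA sequence comprises UUG opposite the three-base motif.
  31. The method according to any one of claims 24-27, wherein the three-base motif is CCA, and wherein the complementary RNA sequence comprises AUG opposite the three-base motif.
  32. The method according to any one of claims 24-27, wherein the upstream residue in the target RNA is a nucleotide selected from G, C, A and U, in the order of preference G > C > A ≈ U.
  33. The method according to any one of claims 24-32, wherein the deaminase having cytidine deaminase activity is a deaminase with C-to-U catalytic activity obtained by genetically modifying an ADAR protein or a fusion protein comprising an ADAR catalytic domain.
  34. The method according to claim 33, wherein the deaminase having cytidine deaminase activity further comprises a targeting domain.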
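  35. The method according to claim 21 or 34, wherein the targeting domain is selected from any one of the following: a SNAP-tag, a λN peptide, or a catalytically inactive Cas13 protein.
  36. The method according to claim 22 or 24, wherein the construct is any one selected from the following: a viral vector, a plasmid, or a linear nucleic acid strand.
  37. The method according to any one of claims 1-36, wherein the arRNA is about 20-260 nucleotides in length.
  38. The method according to any one of claims 1-37, wherein the arRNA is a single-stranded RNA.
  39. The method according to any one of claims 1-38, wherein the complementary RNA sequence is single-stranded, and wherein the arRNA further comprises one or more double-stranded regions.
  40. The method according to any one of claims 1-39, wherein the arRNA further comprises an ADAR-recruiting domain.
  41. The method according to any one of claims 1-40, wherein the arRNA comprises one or more chemical modifications.
  42. The method according to any one of claims 1-40, wherein the arRNA does not comprise any chemical modification.
  43. The method according to any one of claims 1-42, wherein the target RNA is an RNA selected from pre-messenger RNA, messenger RNA, ribosomal RNA, transfer RNA, long non-coding RNA and small RNA.
  44. The method according to any one of claims 1-43, wherein editing at the target residue of the target RNA results in a missense mutation, a premature stop codon, aberrant splicing or alternative splicing of the target RNA, or reverses a missense mutation, a premature stop codon, aberrant splicing or alternative splicing in the target RNA.
  45. The method according to any one of claims 1-44, wherein editing the target residue in the target RNA results in a point mutation, truncation, elongation and/or misfolding of the protein encoded by the target RNA, or yields a functional, full-length, correctly folded and/or wild-type protein by reversing a missense mutation, a premature stop codon, aberrant splicing or alternative splicing of the target RNA.
  46. The method according to any one of claims 1-45, wherein the host cell is a eukaryotic cell.
  47. The method according to claim 46, wherein the host cell is a mammalian cell.
  48. The method according to claim 47, wherein the host cell is a human or mouse cell.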
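  49. An edited RNA, or a host cell comprising the edited RNA, produced by the method according to any one of claims 1-48.
  50. A library comprising a plurality of the RNAs according to claim 49 or a plurality of host cells comprising the edited RNA according to claim 49.
  51. A method for treating or preventing a disease or condition in an individual, comprising editing a target RNA associated with the disease or condition in a cell of the individual by the method according to any one of claims 1-48.
  52. The method according to claim 51, wherein the disease or condition is a hereditary genetic disease, or a disease or condition associated with one or more acquired genetic mutations.
  53. An arRNA, comprising the arRNA used in the method according to any one of claims 1-49.
  54. A viral vector, plasmid or linear nucleic acid strand comprising the arRNA according to claim 53, wherein the arRNA does not comprise any chemical modification.
  55. A library comprising a plurality of the arRNAs according to claim 53 or a plurality of the viral vectors, plasmids or linear nucleic acid strands according to claim 54.
  56. A composition comprising the arRNA according to claim 53 or the viral vector, plasmid or linear nucleic acid strand according to claim 54.
  57. A host cell comprising the arRNA according to claim 53 or the viral vector, plasmid or linear nucleic acid strand according to claim 54.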
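
As an informal illustration of the three-base motif language in claims 1-3 and 9-19, the pairing between a motif and the arRNA triplet facing it can be sketched in Python. This is only a reading aid, not a legal test and not the applicant's software. The function names are hypothetical, and two assumptions are made that the claims do not state: both triplets are written 5'→3' on their own strand (so the duplex is antiparallel and the last base of the arRNA triplet faces the upstream residue), and a G-U wobble pair is counted as a mismatch.

```python
# Watson-Crick partners for RNA bases. Treating G-U wobble as a mismatch is an
# assumption of this sketch, not something the claims specify.
WC_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def flank_pairing(motif: str, opposing: str) -> dict:
    """Report mismatches between a target motif and the arRNA triplet facing it.

    Both strings are read 5'->3' on their own strand, so the duplex is assumed
    antiparallel: the last base of `opposing` faces the upstream residue.
    """
    motif, opposing = motif.upper(), opposing.upper()
    if len(motif) != 3 or len(opposing) != 3:
        raise ValueError("both arguments must be exactly three bases long")
    if set(motif + opposing) - set("ACGU"):
        raise ValueError("only the RNA bases A, C, G and U are accepted")
    upstream, target, downstream = motif
    opp_downstream, opp_target, opp_upstream = opposing
    return {
        "upstream_mismatch": WC_PARTNER[upstream] != opp_upstream,
        "target_mismatch": WC_PARTNER[target] != opp_target,
        "downstream_mismatch": WC_PARTNER[downstream] != opp_downstream,
    }


def has_flanking_mismatch(motif: str, opposing: str) -> bool:
    """True if the motif is not UAG and at least one flanking residue is mismatched."""
    if motif.upper() == "UAG":
        return False
    pairing = flank_pairing(motif, opposing)
    return pairing["upstream_mismatch"] or pairing["downstream_mismatch"]


if __name__ == "__main__":
    print(flank_pairing("GAU", "ACG"))
    print(has_flanking_mismatch("GAU", "ACC"))
```

Under these assumptions, the GAU/ACG pairing of claim 10 places a G opposite the upstream G (a mismatch, as in claim 18), a C opposite the target adenosine (as in claim 7), and an A opposite the downstream U (a complementary pair, as in claim 19). A triplet such as ACC, which pairs with both flanking residues of GAU, would not show the flanking mismatch described in claim 1.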
PCT/CN2021/104801 2020-07-06 2021-07-06 Improved RNA editing method WO2022007803A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
EP21838580.5A EP4177345A1 (en) 2020-07-06 2021-07-06 Improved RNA editing method
CA3185231A CA3185231A1 (en) 2020-07-06 2021-07-06 Improved RNA editing method
JP2023501188A JP2023532375A (ja) 2020-07-06 2021-07-06 Improved RNA editing method
CN202180047686.8A CN116194582A (zh) 2020-07-06 2021-07-06 Improved RNA editing method
KR1020237004198A KR20230035362A (ko) 2020-07-06 2021-07-06 Improved RNA editing method
US18/015,054 US20230272379A1 (en) 2020-07-06 2021-07-06 Improved RNA editing method
AU2021305359A AU2021305359A1 (en) 2020-07-06 2021-07-06 Improved RNA editing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020100467 2020-07-06
CNPCT/CN2020/100467 2020-07-06

Publications (1)

Publication Number Publication Date
WO2022007803A1 (zh) 2022-01-13

Family

ID=79552779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104801 WO2022007803A1 (zh) 2020-07-06 2021-07-06 Improved RNA editing method

Country Status (9)

Country Link
US (1) US20230272379A1 (zh)
EP (1) EP4177345A1 (zh)
JP (1) JP2023532375A (zh)
KR (1) KR20230035362A (zh)
CN (1) CN116194582A (zh)
AU (1) AU2021305359A1 (zh)
CA (1) CA3185231A1 (zh)
TW (1) TW202214853A (zh)
WO (1) WO2022007803A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014018423A2 (en) 2012-07-25 2014-01-30 The Broad Institute, Inc. Inducible dna binding proteins and genome perturbation tools and applications thereof
WO2019005884A1 (en) 2017-06-26 2019-01-03 The Broad Institute, Inc. CRISPR / CAS-ADENINE DEAMINASE COMPOSITIONS, SYSTEMS AND METHODS FOR TARGETED NUCLEIC ACID EDITION
WO2019071048A9 (en) 2017-10-04 2019-06-06 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
WO2020001793A1 (en) 2018-06-29 2020-01-02 Eberhard-Karls-Universität Tübingen Artificial nucleic acids for rna editing
WO2020051555A1 (en) * 2018-09-06 2020-03-12 The Regents Of The University Of California Rna and dna base editing via engineered adar recruitment
WO2020074001A1 (en) 2018-10-12 2020-04-16 Peking University Methods and Compositions for Editing RNAs

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
ABUDAYYEH, O. O., GOOTENBERG, J. S., FRANKLIN, B., KOOB, J., KELLNER, M. J., LADHA, A., ZHANG, F.: "A cytosine deaminase for programmable single-base RNA editing", SCIENCE, vol. 365, no. 6451, 2019, pages 382 - 386, XP055768225, DOI: 10.1126/science.aax7063
ADIKUSUMA, F., PILTZ, S., CORBETT, M. A., TURVEY, M., MCCOLL, S. R., HELBIG, K. J., THOMAS, P. Q.: "Large deletions induced by Cas9 cleavage", NATURE, vol. 560, no. 7717, 2018, pages E8 - E9, XP036563461, DOI: 10.1038/s41586-018-0380-z
CHARLESWORTH, C. T., DESHPANDE, P. S., DEVER, D. P., CAMARENA, J., LEMGART, V. T., CROMER, M. K., BEHLKE, M. A.: "Identification of preexisting adaptive immunity to Cas9 proteins in humans", NATURE MEDICINE, vol. 25, no. 2, 2019, pages 249 - 254, XP036693195, DOI: 10.1038/s41591-018-0326-x
CHEN KEZHU, MA RUZE, WANG FANG: "Biological roles of adenosine deaminase acting on RNA and their relationship with human diseases", JOURNAL OF CENTRAL SOUTH UNIVERSITY (MEDICAL SCIENCE), vol. 43, no. 8, 15 August 2018 (2018-08-15), CN, pages 904 - 911, XP055886279, ISSN: 1672-7347, DOI: 10.11817/j.issn.1672-7347.2018.08.014 *
COX, D. B., GOOTENBERG, J. S., ABUDAYYEH, O. O., FRANKLIN, B., KELLNER, M. J., JOUNG, J., ZHANG, F.: "RNA editing with CRISPR-Cas13", SCIENCE, vol. 358, no. 6366, 2017, pages 1019 - 1027, XP055491658, DOI: 10.1126/science.aaq0180
CULLOT, G., BOUTIN, J., TOUTAIN, J., PRAT, F., PENNAMEN, P., ROORYCK, C., BIBEYRAN, A.: "CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations", NATURE COMMUNICATIONS, vol. 10, no. 1, 2019, pages 1 - 14
ENACHE, O. M., RENDO, V., ABDUSAMAD, M., LAM, D., DAVISON, D., PAL, S., THORNER, A. R.: "Cas9 activates the p53 pathway and selects for p53-inactivating mutations", NATURE GENETICS, 2020, pages 1 - 7
HAAPANIEMI, E., BOTLA, S., PERSSON, J., SCHMIERER, B., TAIPALE, J.: "CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response", NATURE MEDICINE, vol. 24, no. 7, 2018, pages 927 - 930, XP036542072, DOI: 10.1038/s41591-018-0049-z
KEPPLER, A., GENDREIZIG, S., GRONEMEYER, T., PICK, H., VOGEL, H., JOHNSSON, K.: "A general method for the covalent labeling of fusion proteins with small molecules in vivo", NATURE BIOTECHNOLOGY, vol. 21, no. 1, 2003, pages 86 - 89
MERKLE, T., MERZ, S., REAUTSCHNIG, P., BLAHA, A., LI, Q., VOGEL, P., STAFFORST, T.: "Precise RNA editing by recruiting endogenous ADARs with antisense oligonucleotides", NATURE BIOTECHNOLOGY, vol. 37, no. 2, 2019, pages 133 - 138, XP036900581, DOI: 10.1038/s41587-019-0013-6
NISHIKURA K.: "Functions and regulation of RNA editing by ADAR deaminases", ANNUAL REVIEW OF BIOCHEMISTRY, vol. 79, 2010, pages 321 - 349, XP055800809
PAUL VOGEL, MATIN MOSCHREF, QIN LI, TOBIAS MERKLE, KARTHIKA D. SELVASARAVANAN, JIN BILLY LI, THORSTEN STAFFORST: "Efficient and precise editing of endogenous transcripts with SNAP-tagged ADARs", NAT METHODS, vol. 15, 2018, pages 535 - 538
QU LIANG; YI ZONGYI; ZHU SHIYOU; WANG CHUNHUI; CAO ZHONGZHENG; ZHOU ZHUO; YUAN PENGFEI; YU YING; TIAN FENG; LIU ZHIHENG; BAO YING: "Programmable RNA editing by recruiting endogenous ADAR using engineered RNAs", NATURE BIOTECHNOLOGY, vol. 37, no. 9, 15 July 2019 (2019-07-15), New York, pages 1059 - 1069, XP036878168, ISSN: 1087-0156, DOI: 10.1038/s41587-019-0178-z *
QU, L., YI, Z., ZHU, S., WANG, C., CAO, Z., ZHOU, Z., BAO, Y.: "Programmable RNA editing by recruiting endogenous ADAR using engineered RNAs", NATURE BIOTECHNOLOGY, vol. 37, no. 9, 2019, pages 1059 - 1069, XP036888288, DOI: 10.1038/s41587-019-0178-z
STAFFORST, T., SCHNEIDER, M. F.: "An RNA Deaminase Conjugate Selectively Repairs Point Mutations", ANGEWANDTE CHEMIE, vol. 51, no. 44, 2012, pages 11166 - 11169
VOGEL, P., MOSCHREF, M., LI, Q., MERKLE, T., SELVASARAVANAN, K. D., LI, J. B., STAFFORST, T.: "Efficient and precise editing of endogenous transcripts with SNAP-tagged ADARs", NATURE METHODS, vol. 15, no. 7, 2018, pages 535 - 538, XP036542160, DOI: 10.1038/s41592-018-0017-z
VU, L. T., NGUYEN, T. T. K., MD THOUFIC, A. A., SUZUKI, H., TSUKAHARA, T.: "Chemical RNA editing for genetic restoration: the relationship between the structure and deamination efficiency of carboxyvinyldeoxyuridine oligodeoxynucleotides", CHEMICAL BIOLOGY & DRUG DESIGN, vol. 87, no. 4, 2016, pages 583 - 593

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023152371A1 (en) 2022-02-14 2023-08-17 Proqr Therapeutics Ii B.V. Guide oligonucleotides for nucleic acid editing in the treatment of hypercholesterolemia
WO2023185231A1 (en) * 2022-04-02 2023-10-05 Edigene Therapeutics (Beijing) Inc. Engineered adar-recruiting rnas and methods of use for usher syndrome
WO2024013361A1 (en) 2022-07-15 2024-01-18 Proqr Therapeutics Ii B.V. Oligonucleotides for adar-mediated rna editing and use thereof
WO2024013360A1 (en) 2022-07-15 2024-01-18 Proqr Therapeutics Ii B.V. Chemically modified oligonucleotides for adar-mediated rna editing
WO2024084048A1 (en) 2022-10-21 2024-04-25 Proqr Therapeutics Ii B.V. Heteroduplex rna editing oligonucleotide complexes
WO2024100247A1 (en) * 2022-11-11 2024-05-16 Eberhard Karls Universität Tübingen Artificial nucleic acids for site-directed editing of a target rna
WO2024099575A1 (en) * 2022-11-11 2024-05-16 Eberhard Karls Universität Tübingen Artificial nucleic acids for site-directed editing of a target rna

Also Published As

Publication number Publication date
CN116194582A (zh) 2023-05-30
KR20230035362A (ko) 2023-03-13
JP2023532375A (ja) 2023-07-27
US20230272379A1 (en) 2023-08-31
CA3185231A1 (en) 2022-01-13
EP4177345A1 (en) 2023-05-10
AU2021305359A1 (en) 2023-02-16
TW202214853A (zh) 2022-04-16

Similar Documents

Publication Publication Date Title
WO2022007803A1 (zh) Improved RNA editing method
US11649454B2 (en) Single-stranded RNA-editing oligonucleotides
CN115651927B (zh) Methods and compositions for editing RNA
TW202043249A (zh) Methods and compositions for editing RNA
EP3234134A1 (en) Targeted rna editing
US20230242916A1 (en) Method and drug for treating hurler syndrome
WO2021136520A1 (zh) A new method for targeted RNA editing
CN113528582B (zh) Method and drug for targeted RNA editing based on LEAPER technology
US7972816B2 (en) Efficient process for producing dumbbell DNA
CN113122524B (zh) A new method for targeted RNA editing
WO2018223843A1 (zh) System for DNA editing and application thereof
WO2008138066A1 (en) Suppression of viruses involved in respiratory infection or disease

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21838580; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2023501188; Country of ref document: JP; Kind code of ref document: A. Ref document number: 3185231; Country of ref document: CA
ENP Entry into the national phase. Ref document number: 20237004198; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2021838580; Country of ref document: EP; Effective date: 20230206
ENP Entry into the national phase. Ref document number: 2021305359; Country of ref document: AU; Date of ref document: 20210706; Kind code of ref document: A