WO2022156378A1 - Gene transcription framework, vector system, genome sequence editing method and application - Google Patents

Gene transcription framework, vector system, genome sequence editing method and application

Info

Publication number
WO2022156378A1
WO2022156378A1 PCT/CN2021/134710 CN2021134710W WO2022156378A1 WO 2022156378 A1 WO2022156378 A1 WO 2022156378A1 CN 2021134710 W CN2021134710 W CN 2021134710W WO 2022156378 A1 WO2022156378 A1 WO 2022156378A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
gene
vector
short interspersed
short
Prior art date
Application number
PCT/CN2021/134710
Other languages
English (en)
French (fr)
Inventor
彭双红
隋云鹏
Original Assignee
彭双红
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 彭双红
Priority to BR112023014382A (BR112023014382A2, pt)
Priority to CA3206081A (CA3206081A1, en)
Priority to KR1020237028513A (KR20230135630A, ko)
Priority to MX2023008635A (MX2023008635A, es)
Priority to EP21920753.7A (EP4282968A1, en)
Priority to AU2021422067A (AU2021422067A1, en)
Priority to JP2023541700A (JP2024504592A, ja)
Publication of WO2022156378A1 (zh)
Priority to ZA2023/08025A (ZA202308025B, en)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85: Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00: Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P25/00: Drugs for disorders of the nervous system
    • A61P25/14: Drugs for disorders of the nervous system for treating abnormal movements, e.g. chorea, dyskinesia
    • A61P25/16: Anti-Parkinson drugs
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P25/00: Drugs for disorders of the nervous system
    • A61P25/28: Drugs for disorders of the nervous system for treating neurodegenerative disorders of the central nervous system, e.g. nootropic agents, cognition enhancers, drugs for treating Alzheimer's disease or other forms of dementia
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00: Antineoplastic agents
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00: Antineoplastic agents
    • A61P35/02: Antineoplastic agents specific for leukemia
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P43/00: Drugs for specific purposes, not provided for in groups A61P1/00-A61P41/00
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/87: Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90: Stable introduction of foreign DNA into chromosome
    • C12N15/902: Stable introduction of foreign DNA into chromosome using homologous recombination
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/87: Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90: Stable introduction of foreign DNA into chromosome
    • C12N15/902: Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907: Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00: Nucleic acids vectors
    • C12N2800/10: Plasmid DNA
    • C12N2800/106: Plasmid DNA for vertebrates
    • C12N2800/107: Plasmid DNA for vertebrates for mammalian
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00: Nucleic acids vectors
    • C12N2800/90: Vectors containing a transposable element

Definitions

  • The invention belongs to the field of biotechnology and relates to a gene editing technology, in particular a gene editing technology that can be mediated by DNA, RNA or RNP pathways, and its application.
  • Gene editing technologies in the field of biotechnology mainly include ZFN, TALEN, CRISPR/Cas9 and Targetron.
  • ZFN technology appeared earliest, but because its DNA-binding domain can generally recognize only sequences about 9 bp long, its targeting accuracy in practice is greatly limited. Its design is also cumbersome, and it cannot be used to knock out unknown upstream and downstream sequences. In addition, its cytotoxicity and off-target rate are high.
  • TALEN technology is simpler in design and can recognize sequences of 17-18 bp, making it more specific.
  • However, its core technology is held by a few commercial companies and most laboratories cannot carry it out themselves, which restricts its dissemination and application.
  • CRISPR/Cas9 technology is the easiest of the three to use and the most operable. It was first discovered in archaea and bacteria and can specifically recognize sequences of about 20 bp. Under the action of the Cas9 endonuclease it causes a double-strand break at a specific site, and gene editing is then carried out through the cell's own DNA repair function.
  • The newer Targetron technology uses group II introns to insert sequences at specific sites in the genome and thereby mutate the corresponding genes.
  • However, this technology inevitably causes double-strand breaks in the genome and introduces exogenous group II introns into the genome, leaving "scars". Moreover, since the technology originates from prokaryotes, the RNA it produces for reverse transcription has no transmembrane transport function, which limits the application of its RNA alone.
  • As a result, the technology works well in bacterial gene editing but not in higher organisms. All four of these gene editing technologies must introduce proteins and nucleic acids foreign to the recipient system, which increases the uncertainty of their effects and greatly hinders their clinical application.
  • The purpose of the present invention is to provide a gene transcription framework which, after transcription, can be transferred into the nucleus or cytoplasm through DNA, RNA and/or RNP pathways, so that a target fragment is inserted at a specific site in the genome, or specific segments in the genome are deleted or replaced, with high targeting accuracy.
  • Another object of the present invention is to provide a vector system mediated by DNA, RNA and/or RNP pathways.
  • The third object of the present invention is to provide a gene editing method in which DNA, RNP or RNA (which can be produced in vitro) and related proteins are used as mediators and transferred into the nucleus or cytoplasm through DNA, RNA or RNP pathways, so that target fragments are inserted at designated sites in the genome, or segments of the genome are deleted or replaced, with high targeting accuracy.
  • The present invention provides a gene transcription framework which comprises, along the 5′→3′ direction, the target site upstream sequence, the sequence to be inserted, and the target site downstream sequence;
  • The gene transcription framework is a DNA sequence that can be transcribed by RNA polymerase I, RNA polymerase II or RNA polymerase III. In the transcription product of the gene transcription framework, or in a conversion product of that transcription product, the target site upstream sequence or its complementary sequence can hybridize with the corresponding target site upstream sequence in the cell genome or its complementary sequence, and the target site downstream sequence or its complementary sequence can hybridize with the corresponding target site downstream sequence in the cell genome or its complementary sequence.
  • The target site upstream sequence and the target site downstream sequence are directly connected in the corresponding target gene sequence in the genome, and the site between them in the target gene sequence on the genome is the target site for the sequence to be inserted.
  • the above-mentioned gene transcription framework is used to insert the sequence to be inserted into the genomic target site.
  • the cells are eukaryotic cells.
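The structure of the gene transcription framework described above can be illustrated with a minimal Python sketch. It is only an in-silico illustration under simplifying assumptions: sequences are plain A/C/G/T strings, hybridization is reduced to exact string matching, and the function names (`reverse_complement`, `build_framework`, `find_target_site`) and the toy sequences are hypothetical and do not come from the patent.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def build_framework(upstream: str, insert: str, downstream: str) -> str:
    """Assemble the framework 5'->3': upstream sequence, insert, downstream sequence."""
    return upstream + insert + downstream


def find_target_site(genome: str, upstream: str, downstream: str) -> int:
    """Return the 0-based index of the junction where `upstream` is directly
    followed by `downstream` in `genome`, or -1 if they are not adjacent."""
    junction = genome.find(upstream + downstream)
    return -1 if junction < 0 else junction + len(upstream)


# Toy example (hypothetical sequences): the two flanks are adjacent in the genome.
genome = "TTGACCGTAAGCTTGGCATCGA"
upstream = "GACCGTAAGC"
downstream = "TTGGCATCGA"

site = find_target_site(genome, upstream, downstream)
framework = build_framework(upstream, "AAA", downstream)
print(site, framework, reverse_complement(framework))
```

A return value of -1 from `find_target_site` would indicate that the two flanking sequences are not directly connected in the genome, which the description requires.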
  • the present invention also provides a vector system, the vector system includes one or more vectors, and the one or more vectors include:
  • Components 1), 2) and/or 3) are located on the same or different vectors of the vector system;
  • When there are multiple components 1), they are located on the same or different vectors of the vector system;
  • When there are multiple components 2), they are located on the same or different vectors of the vector system;
  • When there are multiple components 3), they are located on the same or different vectors of the vector system;
  • The vector carries one or more promoters, each an RNA polymerase I promoter, RNA polymerase II promoter or RNA polymerase III promoter, located upstream of components 1), 2) and/or 3);
  • the vector system is mediated through the DNA, RNA and/or RNP pathways.
  • The vectors are eukaryotic expression vectors, prokaryotic expression vectors, viral vectors, plasmid vectors, artificial chromosomes, phage vectors or cosmid vectors.
  • the vector is an expression vector, a cloning vector, a sequencing vector, a transformation vector, a shuttle vector or a multifunctional vector.
  • The short interspersed elements, and/or partial short interspersed elements, and/or short interspersed element-like elements are located downstream of the gene transcription framework, and the gene transcription framework is directly or indirectly connected to the short interspersed element, and/or partial short interspersed element, and/or short interspersed element-like element. When directly connected, the gene transcription framework shares a promoter with the short interspersed element, and/or partial short interspersed element, and/or short interspersed element-like element; when indirectly connected, the gene transcription framework either shares or does not share a promoter with the short interspersed element, and/or partial short interspersed element, and/or short interspersed element-like element.
  • The one or more long interspersed elements, and/or one or more ORF1p coding sequences, and/or one or more ORF2p coding sequences are located upstream and/or downstream of the gene transcription framework, and the gene transcription framework is directly or indirectly connected to the one or more long interspersed elements, and/or one or more ORF1p coding sequences, and/or one or more ORF2p coding sequences. When directly connected, the gene transcription framework shares a promoter with the one or more long interspersed elements, and/or one or more ORF1p coding sequences, and/or one or more ORF2p coding sequences; when indirectly connected, the gene transcription framework either shares or does not share a promoter with them.
  • The short interspersed element and/or partial short interspersed element and/or short interspersed element-like element is located downstream of the gene transcription framework, and/or downstream of the long interspersed element and/or the ORF1p coding sequence and/or the ORF2p coding sequence.
  • When the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element is located downstream of the gene transcription framework, the long interspersed element and/or the ORF1p coding sequence and/or the ORF2p coding sequence is located upstream of the gene transcription framework, and/or downstream of the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element; when the short interspersed element and/or partial short interspersed element and/or …
  • The long interspersed element and/or the ORF1p coding sequence and/or the ORF2p coding sequence is located upstream and/or downstream of the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element, and the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element is directly or indirectly connected to the long interspersed element and/or the ORF1p coding sequence and/or the ORF2p coding sequence;
  • When directly connected, the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element shares a promoter with the long interspersed element and/or the ORF1p coding sequence and/or the ORF2p coding sequence;
  • When indirectly connected, the short interspersed element and/or partial short interspersed element and/or short interspersed element-like element either shares or does not share a promoter with the long interspersed element …
  • The expression of the short interspersed elements and/or partial short interspersed elements and/or short interspersed element-like elements, and of the long interspersed elements and/or ORF1p coding sequences and/or ORF2p coding sequences, is increased in the vector system.
  • Short interspersed elements and long interspersed elements also exist naturally in different species, but because their native promoters have poor activity, transcribing them from an additional promoter increases expression.
  • the present invention also provides a genome sequence editing method, comprising the following steps:
  • Select the site to be inserted (target site) in the target gene to be edited in the genome, and determine the sequences on both sides of that site: the upstream sequence (target site upstream sequence) and the downstream sequence (target site downstream sequence);
  • the vector system is transformed or transfected into cells, tissues or organisms for expression to achieve gene editing.
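The design step of the editing method (choosing a target site and reading off its flanking sequences) can be sketched in Python as follows. This is a hedged illustration, not the patented procedure: the genome is a toy string, `design_framework` and `expected_edit` are hypothetical helper names, and the default flank length of 10 is simply the lower bound of the range the description gives for the flanking sequences.

```python
from typing import Tuple


def design_framework(genome: str, site: int, insert: str,
                     flank_len: int = 10) -> Tuple[str, str, str]:
    """Given a 0-based target site (the insert goes between genome[site-1] and
    genome[site]), return (upstream, insert, downstream) for the framework."""
    if site - flank_len < 0 or site + flank_len > len(genome):
        raise ValueError("target site is too close to the end of the sequence")
    upstream = genome[site - flank_len:site]
    downstream = genome[site:site + flank_len]
    return upstream, insert, downstream


def expected_edit(genome: str, site: int, insert: str) -> str:
    """In-silico result of a clean insertion at the target site."""
    return genome[:site] + insert + genome[site:]


# Toy example with hypothetical sequences.
genome = "AAAAACCCCCGGGGGTTTTT"
upstream, insert, downstream = design_framework(genome, 10, "TAG", flank_len=5)
print(upstream, insert, downstream)
print(expected_edit(genome, 10, "TAG"))
```

Comparing sequencing reads from edited cells against `expected_edit` would be one simple way to check that an insertion landed at the intended site; that comparison is outside the scope of this sketch.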
  • the present invention also provides the application of the above vector system in DNA sequence insertion, deletion and replacement in any region of the genome.
  • the DNA sequence is one or more CNV sequences, CNV terminal sequences, short interspersed elements or long interspersed elements.
  • The CNV terminal (that is, the region between the gene part in the CNV terminal and the partial SINE) can be edited: sequences that are not homologous to the genome or to the local genome can be inserted to hinder changes in gene copy number and expression, or the gene part sequence at the CNV terminal can be deleted to change the corresponding cell expression.
  • the present invention also provides the application of the above-mentioned vector system as a medicament for preventing and/or treating cancer, genetic diseases related to genes, and neurodegenerative diseases.
  • The cancer is glioma, breast cancer, cervical cancer, lung cancer, gastric cancer, colorectal cancer, duodenal cancer, leukemia, prostate cancer, endometrial cancer, thyroid cancer, lymphoma, pancreatic cancer, liver cancer, melanoma, skin cancer, pituitary tumor, germ cell tumor, meningioma, meningeal carcinoma, glioblastoma, various types of astrocytoma, various types of oligodendroglioma, astrocytoma, various types of ependymoma, choroid plexus papilloma, choroid plexus carcinoma, chordoma, various types of ganglionoma, olfactory neuroblastoma, sympathetic nervous system neuroblastoma, pineal cell tumor, pineoblastoma, medulloblastoma, trigeminal schwannoma, facial acoustic neuroma, glomus tumor
  • The gene-related genetic disease is Huntington disease, Fragile X syndrome, phenylketonuria, pseudohypertrophic progressive muscular dystrophy, mitochondrial encephalomyopathy, spinal muscular atrophy, Parkinson's plus syndrome, albinism, red-green color blindness, achondroplasia, alkaptonuria, congenital deafness, thalassemia, sickle cell anemia, hemophilia, epilepsy associated with genetic alterations, myoclonus, dystonia, stroke and schizophrenia, vitamin D-resistant rickets, familial polyposis of the colon, or hereditary nephritis.
  • The neurodegenerative disease is Parkinson's disease, Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, spinocerebellar ataxia, multiple system atrophy, primary lateral sclerosis, Pick's disease, frontotemporal dementia, dementia with Lewy bodies, or progressive supranuclear palsy.
  • The present invention can prevent the occurrence of the above-mentioned cancers and their metastases, inhibit their proliferation, and prevent their progression to higher grades or reverse their properties; prevent, delay or improve resistance to drugs such as insulin, levodopa, various tumor chemotherapy drugs and targeted drugs; delay or stop changes in the genes and states of cells and organisms; and support tissue and organ reconstruction and biological regeneration.
  • The technology of the present invention can also be applied to edit copy number variations and the terminal part of each gene, changing the terminal position or stabilizing the terminal; because the terminal determines gene expression, this stabilizes or changes various states of cells and organisms. It can therefore be applied to modify the genes and states of cells, tissues and organisms; to modify the genome of organisms such as humans to improve function or to treat gene-related genetic diseases such as Huntington disease and Fragile X syndrome; to delay or stop changes in the genes and states of cells and organisms; to change the genes and states of cells or organisms; to tissue and organ reconstruction and biological regeneration; to assisted reproduction by transforming somatic cells into germ cells through introduced transcription factors; and to research on preventing or delaying neurodegenerative diseases such as Parkinson's disease, Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple system atrophy, primary lateral sclerosis, spinocerebellar ataxia, Pick's disease, frontotemporal lobar dementia, dementia with Lewy bodies and
  • the inserted sequence in the gene transcription framework can be an exogenous sequence or an endogenous sequence, and the length of the one-time inserted sequence is 1bp-2000bp. Insertion of DNA sequences of any length can be achieved when inserted multiple times.
  • the length of the nucleotide sequence upstream of the target site can be 10bp-2000bp, and the length of the nucleotide sequence downstream of the target site can be 10bp-2000bp.
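The length limits stated above (1-2000 bp for a single insertion, 10-2000 bp for each flanking sequence) lend themselves to a simple design check. The following Python sketch is an assumption-laden illustration: the constant names and the `check_lengths` helper are hypothetical, and only the numeric ranges come from the text.

```python
from typing import List

# Ranges taken from the description; the names are illustrative only.
INSERT_RANGE = (1, 2000)   # bp, length of a single inserted sequence
FLANK_RANGE = (10, 2000)   # bp, length of each target site flanking sequence


def check_lengths(upstream: str, insert: str, downstream: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all lengths are in range."""
    problems = []
    parts = (
        ("upstream sequence", upstream, FLANK_RANGE),
        ("sequence to be inserted", insert, INSERT_RANGE),
        ("downstream sequence", downstream, FLANK_RANGE),
    )
    for name, seq, (low, high) in parts:
        if not low <= len(seq) <= high:
            problems.append(f"{name}: {len(seq)} bp is outside {low}-{high} bp")
    return problems


print(check_lengths("A" * 10, "C", "G" * 10))     # all lengths in range
print(check_lengths("A" * 9, "", "G" * 2001))     # three violations
```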
  • The sequence to be inserted at a selected site in the genome is positioned at that site by the target site upstream and downstream sequences flanking it on the vector, and is then inserted into the genome at the selected site with the aid of the short interspersed element, the long interspersed element and their expressed proteins.
  • ORF2p expressed in the cell or from the vector can slide smoothly from the 3′ end of the vector nucleic acid to the cleavage site and make a single-strand cut in the genome only when the upstream sequence of the insertion point is completely matched. This greatly improves the efficiency of single-strand cleavage and the accuracy of its targeting and avoids unexpected cleavage, so the targeting accuracy is theoretically higher than that of existing gene editing technologies.
  • By producing RNA and the corresponding endogenous proteins such as ORF1p and ORF2p in vitro, target sequences and genes can be modified through the RNA or RNP pathway without introducing DNA fragments or transfecting them into the nucleus.
  • With RNA- or RNP-mediated pathways, the impact on the recipient system can be minimized and non-specific effects reduced while targeting is improved.
  • RNA and proteins transfected into cells can be guided into the nucleus, which helps when editing cells that are hard to manipulate because vectors enter their nuclei with difficulty.
  • The present invention is highly targeted: it can identify and cut target sequences relatively accurately without generating double-strand breaks, insert related sequences directionally through homologous recombination, and in the same way delete and replace the corresponding fragments.
  • the existing gene editing technologies such as CRISPR produce double-strand breaks, and the probability of introducing the target sequence by homologous recombination is low, and it is easy to produce unpredictable random mutations. This technique does not generate double-strand breaks, and there is no need to worry about the danger of double-strand DNA breaks and the introduction of unintended random sequences.
  • further insertion can be carried out by continuously designing a vector according to the new site generated after the previous insertion, so as to introduce a long sequence into the genome in a progressive manner, which is also difficult to achieve with the currently known editing technology.
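The progressive strategy described above can be sketched as a planning routine in Python. This is only one possible reading: it assumes that each round inserts the next chunk immediately after the previous one, so the new upstream flank includes part of the sequence inserted in the previous round. The function name `plan_progressive_insertion`, the default flank length, and the toy sequences are hypothetical; only the 2000 bp per-round limit comes from the description.

```python
from typing import List, Tuple


def plan_progressive_insertion(genome: str, site: int, long_insert: str,
                               max_chunk: int = 2000,
                               flank_len: int = 10) -> Tuple[List[Tuple[str, str, str]], str]:
    """Split `long_insert` into chunks of at most `max_chunk` bp and, for each
    round, derive the flanks from the sequence as edited so far.

    Returns (rounds, final_sequence), where each round is
    (upstream, chunk, downstream)."""
    rounds = []
    current = genome
    pos = site
    for start in range(0, len(long_insert), max_chunk):
        chunk = long_insert[start:start + max_chunk]
        upstream = current[max(0, pos - flank_len):pos]
        downstream = current[pos:pos + flank_len]
        rounds.append((upstream, chunk, downstream))
        # Simulate the insertion, then move the site to the 3' end of the chunk.
        current = current[:pos] + chunk + current[pos:]
        pos += len(chunk)
    return rounds, current


# Toy example: a 6 bp insert delivered in three rounds of 2 bp each.
rounds, final = plan_progressive_insertion(
    "AAAAACCCCCGGGGGTTTTT", 10, "TAGCAT", max_chunk=2, flank_len=5)
for upstream, chunk, downstream in rounds:
    print(upstream, chunk, downstream)
print(final)
```

Each tuple in `rounds` corresponds to one vector design; in this reading the downstream flank stays the same across rounds while the upstream flank shifts into the newly inserted sequence.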
  • targeted and accurate deletions and sequence replacements on the genome are difficult to achieve by existing techniques but can also be achieved by the present invention.
  • the present invention provides a gene editing method mediated by DNA, RNA or RNP pathway based on eukaryotic retrotransposition mechanism.
  • DNA, RNP or RNA (which can be produced in vitro) and related proteins are used as mediators and transferred into the nucleus or cytoplasm through the DNA, RNA or RNP pathway, so that the target fragment is inserted at a selected site in the genome, or selected segments in the genome are deleted or replaced, with high targeting accuracy. Since foreign systems such as prokaryote-derived proteins are not introduced and double-strand breaks are not generated, the present invention can be applied clinically more easily than existing gene editing technologies.
  • Figure 1 is a schematic diagram of the basic principle of gene editing based on the retrotransposition mechanism that eukaryotes own.
  • Figure 2 is a schematic diagram of DNA-mediated genomic insertion or deletion.
  • Figure 3 is a schematic diagram of RNA- and RNP-mediated genomic insertions and deletions.
  • Figure 4 is a schematic diagram of preventing CNV end changes by inserting non-homologous sequences at the CNV ends.
  • Figure 5 is a schematic structural diagram of the gene transcription framework provided by the present invention.
  • Figure 6 is a schematic structural diagram of the gene transcription framework linked promoter provided by the present invention.
  • Figure 7 is a schematic structural diagram of the gene transcription framework provided by the present invention connecting a promoter and a short interspersed element, a partial short interspersed element or a short interspersed element-like element.
  • Figure 8 is a schematic diagram of the structure of the gene transcription framework provided by the present invention connected upstream to a promoter and downstream to a long interspersed element or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 9 is a schematic structural diagram of the gene transcription framework provided by the present invention connected upstream to a long interspersed element or ORF1p coding sequence or ORF2p coding sequence, with a promoter connected upstream of that long interspersed element or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 10 is a schematic diagram of the gene transcription framework provided by the present invention connected upstream to a promoter, connected downstream to a short interspersed element, partial short interspersed element or short interspersed element-like element, and then further downstream to a long interspersed element or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 11 shows the gene transcription framework provided by the present invention connected downstream to a short interspersed element, partial short interspersed element or short interspersed element-like element, with a long interspersed element or ORF1p coding sequence or ORF2p coding sequence connected upstream of the gene transcription framework, and long interspersed elements …
  • Figure 12 is a schematic structural diagram of the gene transcription framework provided by the present invention and short interspersed elements, partial short interspersed elements or short interspersed elements-like elements do not share a promoter.
  • Figure 13 is a schematic structural diagram of the gene transcription framework provided by the present invention and long interspersed elements or ORF1p coding sequence or ORF2p coding sequence not sharing a promoter.
  • Figure 14 is a schematic structural diagram in which the gene transcription framework provided by the present invention does not share a promoter with the short interspersed elements, partial short interspersed elements or short interspersed element-like elements and the long interspersed elements or ORF1p coding sequence or ORF2p coding sequence, while the short interspersed elements, partial short interspersed elements or short interspersed element-like elements share one promoter with the long interspersed elements or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 15 is a schematic structural diagram in which the gene transcription framework provided by the present invention does not share a promoter with the short interspersed elements, partial short interspersed elements or short interspersed element-like elements and the long interspersed elements or ORF1p coding sequence or ORF2p coding sequence, while the short interspersed elements, partial short interspersed elements or short interspersed element-like elements share one promoter with the long interspersed elements or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 16 is a plasmid map of the plasmid pSIL-eGFP-VEGFA1-Alu1, in which the gene transcription framework VEGFA1 constructed in Example 1 was inserted into the vector of Example 1.
  • Figure 17 is a plasmid map of the plasmid pBS-L1PA1-CH-mneo-IT15-1, in which the gene transcription framework IT15-1 constructed in Example 6 was inserted into the vector of Example 6.
  • The present invention is based on a genome remodeling mechanism, common in eukaryotes, in which transposons modify gene copy number and repetitive sequences in the genome. This mechanism may cause the deletion or addition of pathogenic trinucleotide repeats in some degenerative diseases of the central nervous system such as Huntington disease and Fragile X syndrome. It shares characteristics with homologous recombination, such as requiring high sequence homology and being inhibited by methylation, and it is related to the expression level.
  • The present invention relates to short interspersed elements (SINE, short interspersed nuclear elements), long interspersed elements (LINE, long interspersed nuclear elements) and related proteins such as open reading frame 1 protein (ORF1p), open reading frame 2 protein (ORF2p) and other open reading frame proteins (ORFp).
  • Short interspersed elements mainly include Alu elements and SVA elements in primates, and mammalian-wide interspersed repeat elements (MIRs) such as MIR and MIR3, which are common in mammals.
  • Long interspersed elements mainly include various types of LINE-1 (L1), LINE-2 (L2) and LINE-3 (L3), Ta elements, and the six LINE types R2, RandI, L1, RTE, I and Jockey, among others.
  • A SINE is characterized as a relatively short, promoter-containing transposon that ends with an A- or T-rich tail or a short simple repeat sequence and is reverse transcribed by means of a LINE; the right half of its transcript contains the reverse transcription functional structure. A LINE is characterized as a transposon, widely distributed in the genome, that contains a reverse transcriptase coding sequence. SINEs and their corresponding LINEs in a given species continuously remodel the genome through similar mechanisms.
  • The basic principle of this mechanism is to join the lariat structure that the cell produces when processing pre-mRNA to the right half of the SINE transcript, which retains the reverse transcription functional structure after the transcript is cleaved. This right half, left after cleavage at a site in the middle of the complete SINE transcript, is called the partial SINE sequence; the cleavage site differs between SINEs and between species.
  • The natural cleavage site of a SINE generally lies within the full-length SINE. For a SINE with a full length of about 100-400 nt, the natural cleavage site is usually located at nt 100-250, and the cleavage site can be observed in the range of 100-150 nt.
  • In fact, wherever the site is located, as long as the right part remaining after cleavage contains the complete reverse transcription functional structure, it is a partial SINE sequence. (The secondary structure of this functional structure is special, usually Ω-shaped; its primary structure is characterized by two sequence segments separated by an intermediate spacer sequence. The ORF2p encoded by the LINE can bind the 3′ segment of the two in the transcript, nick a single strand of the genome at the genomic site corresponding to the gap between the two segments, and initiate reverse transcription.)
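The definition of a partial SINE sequence can be illustrated with a minimal Python sketch. It assumes the cleavage position is already known from experiment; the function name `split_sine`, the 0-based coordinate convention and the toy sequence are hypothetical, and the code does not verify that the right part actually contains a reverse transcription functional structure.

```python
def split_sine(transcript: str, cleavage_pos: int) -> tuple[str, str]:
    """Split a SINE transcript at a cleavage position.

    cleavage_pos is the 0-based index of the first nucleotide of the right
    part. Returns (left_part, partial_sine).
    """
    if not 0 < cleavage_pos < len(transcript):
        raise ValueError("cleavage position must lie inside the transcript")
    return transcript[:cleavage_pos], transcript[cleavage_pos:]


# Toy 300-nt transcript cut at position 120 (inside the 100-250 nt window).
toy_transcript = "G" * 120 + "A" * 180
left_part, partial_sine = split_sine(toy_transcript, 120)
print(len(left_part), len(partial_sine))
```

In practice the position would be determined per SINE family and species, since the text notes that the site varies.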
  • The partial SINE produced from a specific SINE, such as an Alu element, is accordingly denoted a partial Alu, specifically the portion beginning at its middle adenine-rich sequence.
  • Sequences whose RNA contains a reverse transcription functional structure and can initiate reverse transcription, but which differ from conventional SINE sequences, are called short-interspersed-element-like sequences (SINE-like).
  • With the help of the proteins (i.e., ORF1p and ORF2p) expressed by the functionally corresponding LINE, such as LINE-1 for Alu elements and LINE-2 for the various MIR elements, the RNA is converted into double-stranded DNA and binds to the complementary sequence on the genome (the single-stranded DNA generated by reverse transcription of the transcribed RNA, and the double-stranded DNA generated from it using the genome sequence as the primer, are both conversion products of the transcript). By forming a specific Ω structure, it is inserted into the genome through a homologous recombination mechanism.
  • A LINE can also complete a similar RNA-to-double-stranded-DNA conversion and genomic insertion by transcribing its downstream sequence (i.e., 3′ transduction), binding the complementary sequence on the genome and forming an Ω structure.
  • The pre-mRNA generated by gene expression can be spliced to produce lariat structures that overlap one another in sequence. Lariats can arise from any region of the pre-mRNA; what differs is the splicing strength with which each is produced. Because the splicing strength (determined by sequence differences) of the lariats upstream and downstream of an exon is higher than that of the surrounding lariat structures, the exon tends to be cut out intact during pre-mRNA processing, and the production of other lariats is inhibited.
  • ORF1p produced by LINE-1 can protect the nucleic acid bound to it, and both it and the ORF2p produced by LINE-1 can target the bound nucleic acid to the nucleus and transport it in. In addition, ORF2p can bind the special Ω secondary structure of the Alu element and mediate the subsequent genomic single-strand cleavage, reverse transcription and assisted genomic integration.
  • The transcript of the Alu element can be cut at a specific site (the scAlu cleavage site, referred to below as the natural cleavage site, which generally lies in front of the middle poly A sequence of the Alu transcript but may in practice shift) (see "Multiple dispersed loci produce small cytoplasmic Alu RNA") to produce small cytoplasmic Alu (scAlu) and a remainder that includes the right monomer (with the reverse transcription functional structure that can bind ORF2p). This remainder is called partial Alu.
  • The resulting lariat structure can be linked through its 3′ end to the cleaved portion of the Alu transcript that contains the reverse transcription functional structure. ORF2p can be recruited via the A-rich sequence and bind the secondary structure of the partial Alu.
  • ORF2p sits on the 3′ foot of the two feet of the Ω structure that is formed, recognizes the sequence on the genome that matches the sequence on the two feet of the Ω (mainly UU/AAAA, where the break between U and A marks the gap), and nicks the genome there.
  • The nicked single strand at the genomic site facing the Ω gap, together with the complementary sequence on the genome, is used as the primer for reverse transcription, a process called target-primed reverse transcription (TPRT).
  • The single-stranded DNA produced by reverse transcription can bind the complementary sequence on the genome and form an Ω structure at the corresponding site to be inserted (because the sequence to be inserted is absent from that genomic site, whereas the sequences flanking it on the single-stranded DNA are present on both sides of the site). ORF2p can slide along the matching sequence in the 3′ to 5′ direction to the Ω structure, recognize the 6-nucleotide sequence on the genome (mainly 4 nucleotides on the 3′ side and 2 nucleotides on the 5′ side) complementary to the gap at the base of the Ω, and form double-stranded DNA through a process similar to the one above.
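As a rough illustration of the 6-nucleotide recognition step, the Python sketch below scans a DNA string for the DNA equivalent of the UU/AAAA sequence and reports where the presumed nick would fall. The function name `find_nick_sites`, the choice to scan only the given strand and the exact nick offset (between the 2-nucleotide and 4-nucleotide parts) are assumptions made for illustration.

```python
def find_nick_sites(genome: str, motif: str = "TTAAAA", left_len: int = 2) -> list[int]:
    """Return 0-based positions of the presumed nick for every motif hit.

    The nick is placed left_len nucleotides into the motif (between TT and
    AAAA for the default motif). Overlapping hits are reported as well.
    """
    genome = genome.upper()
    sites = []
    start = genome.find(motif)
    while start != -1:
        sites.append(start + left_len)
        start = genome.find(motif, start + 1)
    return sites


print(find_nick_sites("GGTTAAAACCTTAAAAG"))  # → [4, 12]
```

A fuller scan would also examine the reverse complement and tolerate the variant sites implied by "mainly" in the text.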
  • ORF1p, which is also encoded by the LINE, can play an auxiliary role: during the genome remodeling process described above, it helps stabilize the secondary structure formed by the nucleic acid and its association with the genome, and promotes separation of the nucleic acid from the genome after binding and acting.
  • ORF1p has high RNA affinity and a nuclear localization function. Because ORF2p nicks only one strand of the genomic duplex and cannot produce double-strand breaks, the approach is comparatively safe. Similar mechanisms apply to other SINE and LINE combinations as well.
  • The preference for short interspersed element sequences may be a manifestation of this mechanism in nature. It has been reported that transcribed mRNA sequences can be integrated into the genome with the assistance of ORF1p and ORF2p. However, because the transcription template is a purely foreign, non-homologous sequence, it cannot target specific sites in the genome, and because it is not linked to a fragment carrying the reverse transcription functional structure, the process is inefficient, random and difficult to control.
  • The present invention achieves relatively accurate and efficient gene editing by redesigning the transcribed sequence and linking it, through various active or passive means, to sequences that have a reverse transcription functional structure, such as various SINEs or partial SINEs.
  • The end of a copy number variation (CNV) is composed of an upstream gene part and a downstream partial SINE sequence. Short sequence fragments formed by linking a lariat structure to the partial SINE sequence are continuously inserted between these two parts, extending the CNV.
  • SINEs such as Alu sequences on the genome are markedly demethylated.
  • LINE-mediated 3′ transduction (based on deletion of the right monomer of the SINE upstream of the promoter and a complete SINE structure downstream) initiates the elongation of the associated gene copy number variations (CNVs).
  • Homologous recombination between the demethylated SINE sequences deletes (initializes) most of the previously extended CNVs.
  • The fully initialized embryonic cells then regain their hypermethylated state, and the partial SINE sequence at the end of each CNV mediates gradual extension of the CNV end, changing the expression and state of each cell. In turn, the gene expression of each cell alters the CNVs through the lariat structure, causing changes in the genome that gradually induce differentiation. This is consistent with the widespread changes in CNVs in embryos and the differences in CNVs among tissues.
  • Elongation of the CNVs of different genes is prevalent in various types of tumor cells and is positively correlated with clinical grade.
  • The expression levels of proto-oncogenes and tumor suppressor genes are also proportional to the length of their CNVs, so the formation and progression of tumors are presumably related to dysregulation of the CNVs of proto-oncogenes or tumor suppressor genes.
  • Some irreversible diseases related to external stimuli, such as diabetes, may also be related to dysregulation of CNVs. Because most drug resistance is related to changes in the expression of the corresponding proteins caused by long-term external stimulation, CNV changes involving the corresponding genes can also be promoted or hindered by this technology.
  • DNA-mediated genome sequence insertion technology: when DNA-mediated gene editing is used, 1-40 TTAAAA or TTTTAA sequences can be added to the plasmid to assist in converting RNA into DNA (as shown in Figure 2).
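The 1-40 copy range can be captured in a small Python helper. This is only a sketch: the name `helper_repeats` is hypothetical, and arranging the copies as one uninterrupted tandem array is an assumption, since the text does not say how the copies are placed on the plasmid.

```python
ALLOWED_MOTIFS = ("TTAAAA", "TTTTAA")


def helper_repeats(copies: int, motif: str = "TTAAAA") -> str:
    """Return `copies` tandem copies of an allowed motif (1 to 40 copies)."""
    if motif not in ALLOWED_MOTIFS:
        raise ValueError(f"motif must be one of {ALLOWED_MOTIFS}")
    if not 1 <= copies <= 40:
        raise ValueError("copies must be between 1 and 40")
    return motif * copies


print(helper_repeats(3))  # → TTAAAATTAAAATTAAAA
```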
  • Lariat (lasso) structure-mediated method: select the sequences upstream and downstream (each within 2000 bp) of the site to be inserted (i.e., the target site), and place the sequence to be inserted (within 2000 bp) at the insertion point between the upstream and downstream sequences. The designed sequence is synthesized and integrated into the vector, and its transcription is initiated by RNA polymerase II.
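The design step for the lariat-mediated method can be sketched in Python as follows. The sketch assumes the genome is available as a plain string and the target site is given as a 0-based index; `build_framework`, `flank_len` and its default of 500 are hypothetical, while the 2000 bp bounds come from the text.

```python
MAX_LEN = 2000  # upper bound stated in the text for each flank and for the insert


def build_framework(genome: str, insertion_index: int, insert_seq: str,
                    flank_len: int = 500) -> str:
    """Return upstream flank + sequence to be inserted + downstream flank.

    insertion_index is the 0-based genome position before which the insert
    should land; flanks are truncated at the ends of the genome string.
    """
    if not 0 < flank_len <= MAX_LEN:
        raise ValueError("flank_len must be within 2000 bp")
    if not 0 < len(insert_seq) <= MAX_LEN:
        raise ValueError("insert must be 1-2000 bp long")
    if not 0 <= insertion_index <= len(genome):
        raise ValueError("insertion_index lies outside the genome")
    upstream = genome[max(0, insertion_index - flank_len):insertion_index]
    downstream = genome[insertion_index:insertion_index + flank_len]
    return upstream + insert_seq + downstream


print(build_framework("AAAACCCCGGGGTTTT", 8, "TAT", flank_len=4))  # → CCCCTATGGGG
```

Because the text suggests lengthening or shortening the flanks when lariat formation is inefficient, `flank_len` is left as a parameter rather than fixed.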
  • A LINE sequence functionally corresponding to the SINE, or its encoded protein sequences (for example, the LINE-1 sequence and its encoded ORF1p and ORF2p sequences for the Alu element, or LINE-2 or its encoded ORF1p and ORF2p sequences for the MIR element), can be added to express the proteins required for gene editing, thereby enabling gene editing or increasing editing efficiency.
  • If the termination signal of the polymerase used is a poly A sequence, as for RNA polymerase II, its length can be appropriately extended (within 200 bp) to increase the recruitment of ORF2p. If not, a poly A sequence of appropriate length (within 200 bp) can optionally be added after the ORF1p and ORF2p sequences and before the termination signal, and the poly A sequence at the end of the SINE sequence can likewise be appropriately extended (within 200 bp).
  • The vector is transferred by conventional means, such as liposome or virus transfection, into cells and tissues cultured in vitro, or is given to an organism via blood, lymph, cerebrospinal fluid or similar routes. (Other vectors expressing the LINE sequence corresponding to the SINE sequence contained in the vector, or its encoded protein sequences, such as the LINE-1, ORF2p and/or ORF1p sequences corresponding to the Alu sequence, can be transferred at the same time to increase efficiency. If the recipient system does not express the proteins of the corresponding LINE type, such as ORF1p and ORF2p, additional expression is required. A SINE-expressing vector can also be transferred at the same time to improve efficiency.)
  • The constructed vector enters the nucleus for expression, and the sequence to be inserted is inserted into the genome at the intended site.
  • If the expected efficiency is not achieved because the lariat structure containing the upstream and downstream sequences of the insertion point and the sequence to be inserted between them forms inefficiently, the length of the upstream and downstream sequences or of the fragment to be inserted can be increased or decreased to promote lariat formation. Alternatively, a lariat structure containing the site to be inserted can be identified by the detection method described below; using the sequence of that lariat as the upstream and downstream sequences, extending both sides appropriately according to the genome sequence, placing the sequence to be inserted at the insertion point in the middle and constructing the result into a vector can also improve efficiency.
  • Optionally, the constructed vector can be incubated briefly in a physiological liquid containing ORF1p and/or ORF2p (at a suitable temperature, room temperature or 37°C, for up to 48 h) to improve its nuclear entry efficiency. If insertion is continued by the above method at the new site generated after each insertion, insertion can be sustained, completing the insertion of a long fragment with no obvious length limit.
  • SINE sequence direct connection method: this method does not require the SINE to be cleaved and then joined to the lariat. Instead, the upstream and downstream sequences of the site to be inserted, the sequence to be inserted between them, and the SINE-related sequence are joined directly during vector construction. The method is therefore suitable for systems that lack a eukaryotic pre-mRNA splicing mechanism and cannot produce a lariat structure, such as bacteria and other prokaryotes, and it also applies to eukaryotes that have a pre-mRNA splicing mechanism. The same holds for the LINE-mediated method described later.
  • A sequence transcribed by an RNA polymerase (such as RNA polymerase II) is designed that contains the upstream and downstream sequences of the insertion point (each within 2000 bp) with the sequence to be inserted (within 2000 bp) sandwiched between them, followed by a SINE sequence, a partial SINE sequence or a SINE-like sequence. (Optionally, a LINE sequence, or its protein coding sequence, that can assist the function of the corresponding SINE is added after the SINE, partial SINE or SINE-like sequence to increase gene editing efficiency. When the recipient system does not express the LINE or its encoded proteins, it must be added or additionally expressed.)
  • The termination signal of the corresponding type of RNA polymerase is then optionally attached (if the termination signal is a poly A sequence, it can be appropriately extended (within 200 bp) to increase ORF2p recruitment; if not, a poly A sequence of appropriate length (within 200 bp) can be added after the LINE, ORF1p and/or ORF2p sequence to increase ORF2p recruitment), and the resulting sequence is constructed into the vector.
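The element order of the direct connection method can be written out as a short Python sketch. The function name `build_direct_framework` and its parameter names are hypothetical; the promoter and the polymerase-specific termination signal are left out, and the poly A tract is modeled simply as a run of A nucleotides placed after the optional LINE coding sequence.

```python
def build_direct_framework(upstream: str, insert_seq: str, downstream: str,
                           sine: str, line_cds: str = "", polya_len: int = 0) -> str:
    """Concatenate the elements in the order described for direct connection:
    upstream flank, insert, downstream flank, SINE-type sequence,
    optional LINE/ORF coding sequence, optional poly A tract."""
    for name, seq in (("upstream", upstream), ("insert", insert_seq),
                      ("downstream", downstream)):
        if len(seq) > 2000:
            raise ValueError(f"{name} exceeds the 2000 bp bound")
    if not sine:
        raise ValueError("a SINE, partial SINE or SINE-like sequence is required")
    if not 0 <= polya_len <= 200:
        raise ValueError("poly A length must be within 200 bp")
    return "".join([upstream, insert_seq, downstream, sine, line_cds, "A" * polya_len])


print(build_direct_framework("AC", "G", "TT", "CCC", line_cds="GG", polya_len=3))
# → ACGTTCCCGGAAA
```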
  • The vector is transferred into cells or tissues cultured in vitro by conventional transfection means such as liposomes or viruses, or is given to an organism via blood, lymph, cerebrospinal fluid or local tissue administration. The constructed vector is expressed in the nucleus (optionally, it can first be incubated briefly in a physiological liquid containing ORF1p and/or ORF2p, at a suitable temperature, room temperature or 37°C, for up to 48 h, to improve its nuclear entry efficiency), and the sequence to be inserted is inserted into the corresponding site on the genome. If vectors continue to be constructed by the above method for the new site generated after each insertion, insertion can be sustained, completing the insertion of long fragments with no obvious length limit.
  • In the LINE-mediated method, RNA polymerase II initiates expression of the LINE, or of the ORF2p and/or ORF1p coding sequences within it, followed by the sequence designed by the same method as in the SINE sequence direct connection method (to minimize interference with the recipient system, SINE and LINE types present in the recipient system can be selected; to improve efficiency, a SINE type can be selected that functionally corresponds to the preceding LINE), and finally, optionally, the termination signal of the RNA polymerase II used.
  • The vector is transferred into cells or tissues cultured in vitro by conventional transfection means such as liposomes or viruses, or is given to an organism via blood, lymph, cerebrospinal fluid or local tissue administration. The constructed vector is expressed in the nucleus (optionally, it can first be incubated briefly in a physiological liquid containing ORF1p and/or ORF2p, at a suitable temperature, room temperature or 37°C, for up to 48 h, to improve its nuclear entry efficiency), and the sequence to be inserted is inserted into the corresponding site on the genome. If insertion is continued by the above method at the new site generated after each insertion, insertion can be sustained, completing the insertion of a long fragment with no obvious length limit.
  • Downstream-linked ORF2p binding sequence method: the reverse transcription functional structure of the SINE is replaced by the Ω structure formed between the genome and the combination, in the gene transcription framework on the vector, of the upstream sequence of the target site, the downstream sequence of the target site and the intermediate sequence to be inserted; this Ω structure initiates reverse transcription. An ORF2p binding sequence (such as a poly A sequence) is therefore connected downstream of the target-site downstream sequence in the gene transcription framework, and LINE, ORF1p and/or ORF2p coding sequences can optionally be added on the same vector as the gene transcription framework or on another vector to improve efficiency.
  • The vector containing the gene transcription framework with a downstream ORF2p binding sequence (such as a poly A sequence) is transferred into cells or tissues cultured in vitro by conventional transfection means such as liposomes or viruses, or is given to an organism via blood, lymph, cerebrospinal fluid or similar routes, or by local tissue administration. The constructed vector is expressed in the nucleus (optionally, it can first be incubated briefly in a physiological liquid containing ORF1p and/or ORF2p, at a suitable temperature, room temperature or 37°C, for up to 48 h, to improve its nuclear entry efficiency), and the sequence to be inserted is inserted into the specific site. If vectors continue to be constructed by the above method for the new site generated after each insertion, insertion can be sustained, completing the insertion of long fragments with no obvious length limit.
  • After a sequence has been inserted, the RNA-mediated insertion approach can remove the sequence lying between two identical sequences, with a certain efficiency, through homologous recombination. Sequences containing recombination sites (GCAGA[A/T]C, CCCA[C/G]GAC and/or CCAGC) can be selected for insertion to increase the efficiency of the subsequent homologous recombination.
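Candidate insert sequences can be screened for the listed recombination sites with regular expressions, as in the Python sketch below. It assumes the garbled motif list denotes three motifs (GCAGA[A/T]C, CCCA[C/G]GAC and CCAGC) and scans only the given strand; the function name `find_recombination_sites` is hypothetical.

```python
import re

# Zero-width lookaheads let overlapping occurrences be reported as well.
RECOMBINATION_MOTIFS = {
    "GCAGA[A/T]C": re.compile(r"(?=GCAGA[AT]C)"),
    "CCCA[C/G]GAC": re.compile(r"(?=CCCA[CG]GAC)"),
    "CCAGC": re.compile(r"(?=CCAGC)"),
}


def find_recombination_sites(seq: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, motif label) pairs found in seq."""
    seq = seq.upper()
    hits = []
    for label, pattern in RECOMBINATION_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), label))
    return sorted(hits)


print(find_recombination_sites("GCAGAACTTCCAGC"))
# → [(0, 'GCAGA[A/T]C'), (9, 'CCAGC')]
```

A candidate with more hits would, under the text's premise, be preferred when efficient subsequent recombination is wanted.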
  • Deletion from the CNV end: detect the CNV ends in the cell or tissue by sequencing and alignment (aligning to the junction between the gene sequence and the partial SINE sequence), and select the gene part (within 2000 bp) of the CNV end to be processed.
  • The 3′ partial sequence of a lariat that can form within a range (within 20000 bp) downstream of that end in the complete gene (lariats that can form downstream can be predicted or detected by the method described below), or alternatively the sequence within a range (within 20000 bp) downstream of the end in the complete gene, taken directly in place of that 3′ partial sequence, and the sequence immediately upstream (within 100000 bp) of the sequence to be deleted at the end are connected in turn. They are followed by a complete SINE sequence, partial SINE sequence or SINE-like sequence (depending on the insertion method used), which can in turn be followed by ORF1p and ORF2p coding sequences as described above. The construct is synthesized and inserted at the actual CNV end by one of the above gene insertion methods through the DNA, RNA or RNP pathway.
  • The SINE sequence used on the vector is preferably the same as, or close to, the SINE sequence around the insertion point to improve efficiency.
  • In this way, a sequence identical to the one immediately upstream of the sequence to be deleted is inserted at the end, and homologous recombination between the identical sequences is then used to delete the intervening sequence.
  • Sequences containing recombination sites (GCAGA[A/T]C, CCCA[C/G]GAC and/or CCAGC) can be selected for insertion to increase efficiency.
  • The sequence to be inserted in the vector designed for the above insertion technology is changed to the replacement sequence together with the sequence surrounding the sequence to be replaced on the genome (that is, the segment that will be deleted once homologous recombination occurs between the inserted replacement sequence and the sequence on the genome; it is placed 3′ or 5′ of the replacement sequence when the vector is constructed, depending on whether the insertion point is upstream or downstream of the sequence to be replaced on the genome). The replacement sequence should be homologous to the sequence to be replaced on the genome.
  • By the above gene editing insertion method, the replacement sequence and the surrounding sequence of the sequence to be replaced are inserted upstream or downstream of the sequence to be replaced on the genome.
  • After homologous recombination, the sequence to be replaced on the genome is replaced by the inserted replacement sequence that is homologous to it, and the surrounding sequence that was deleted through homologous recombination has already been reinserted together with the replacement sequence at the time of insertion.
  • RNA-mediated genome sequence editing technology: since no RNA-to-DNA conversion is required, additional TTAAAA or TTTTAA sites do not need to be added, as shown in Figure 3.
  • RNP: ribonucleoprotein.
  • The synthesized vector is transferred into an engineered cell line that expresses LINE-encoded proteins (a LINE functionally corresponding to the SINE sequence contained in the synthesized vector can be selected to improve efficiency; for example, if the vector contains an Alu sequence or a partial Alu sequence, the corresponding LINE is LINE-1 and its encoded proteins are ORF1p and ORF2p). (The engineered cells are obtained by transfection with a vector expressing the relevant proteins, or by screening after transfection for cells that permanently overexpress them; if the product is later incubated with the relevant proteins such as ORF1p and ORF2p, the engineered cells need not express them.) After a period of time, the nucleus and cytoplasm are extracted, and the single-stranded plasmid product (single-stranded RNA) or the biologically active ribonucleoprotein (RNP) complex containing the single-stranded plasmid product (single
  • RNase inhibitors should be added to the in vitro physiological fluid or cytoplasm. If the previously transfected cells do not express the relevant proteins, the product must be incubated with ORF1p and/or ORF2p. Thereafter, the ribonucleoprotein complex is encapsulated by conventional transfection means, such as lipid-soluble substances (for example liposomes) or viruses, and transferred into cells or tissues cultured in vitro, or into an organism via blood, lymph, cerebrospinal fluid or other routes or by local tissue administration (into the cytoplasm; entry into the nucleus is not required), to achieve the corresponding gene editing effect. If targeted delivery is required, the outer packaging of the vector can be modified. Take care to avoid RNA degradation throughout the process.
  • When the LINE-mediated vector is applied in the form of RNP, the extracted biologically active ribonucleoprotein (RNP) complexes must be screened for those containing a single-stranded plasmid product (single-stranded RNA) without the front-end LINE-1 sequence or the products of the ORF1p and ORF2p coding sequences (sequences may be added to facilitate cleavage), to prevent front-end sequences from disturbing the targeting of gene editing.
  • A sequence is synthesized that contains the upstream and downstream sequences (each within 2000 bp) of the insertion point (that is, the target site) with the sequence to be inserted (within 2000 bp) sandwiched between them at the position corresponding to the insertion point, followed by a SINE sequence, partial SINE sequence or SINE-like sequence, followed by the LINE sequence functionally corresponding to the SINE used or the protein coding sequences it contains (e.g., if a partial Alu sequence is used, LINE-1 and its ORF1p and ORF2p coding sequences; if a partial MIR sequence is used, LINE-2 and its corresponding protein coding sequences). The sequence is constructed into a vector and driven by an RNA polymerase II/III promoter (alternatively, a vector obtained in any of the DNA-mediated methods above can be used directly and transferred into the engineered cells, after which RNA products that can be used for gene editing are extracted by conventional means, such as methods based on sequence specificity).
  • The RNA is isolated and purified, then encapsulated by conventional transfection means, such as lipid-soluble substances (for example liposomes) or viruses, and transferred into cells or tissues cultured in vitro, or into an organism via blood, lymph or cerebrospinal fluid or by local tissue administration (into the cytoplasm; entry into the nucleus is not required), so that the sequence to be inserted is inserted into the corresponding site on the genome.
  • The fragment to be inserted can be inserted gradually as the cells divide, completing the insertion of a long fragment with no obvious length limit (the RNA must be continuously delivered into the cells).
  • (The specific sequence of that sequence and of the partial SINE sequence can be obtained by molecular biological means such as gene sequencing or gene chips.) (The vector is transferred into cells or tissues cultured in vitro by conventional transfection means, such as lipid-soluble substances or substances with cell transfection ability, for example liposomes or viruses, or into an organism via blood, lymph, cerebrospinal fluid or similar routes, or by local tissue administration.) (As shown in Figure 4.)
  • The upstream sequence of the insertion point in the above insertion method is set to the 3′ end (within 2000 bp) of the gene part of the CNV end, and the downstream sequence is set to the partial SINE sequence (so the SINE sequence, partial SINE sequence or SINE-like sequence connected after the downstream sequence in the above method can be omitted). The sequence to be inserted is any sequence (within 2000 bp) that is non-homologous to the genome, or to the gene part of the CNV end and its upstream and downstream sequences in the complete gene.
  • After the vector is constructed, it is transferred into the corresponding cell, living tissue or organism through the DNA, RNA or RNP pathway described above, so that the non-homologous sequence is inserted into the end of the corresponding CNV. Because the non-homologous sequence does not exist downstream of the corresponding CNV-end gene sequence in the complete gene, the CNV end cannot be extended further according to the complete gene sequence, which hinders further changes at the CNV end.
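One crude way to screen a candidate blocking sequence for non-homology is to check that it shares no exact k-mer with the reference region on either strand. The Python sketch below does this; the names `revcomp` and `shares_kmer` and the default `k=12` are hypothetical choices, and exact k-mer sharing is only a proxy for homology, so a real design would more likely rely on an alignment tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_kmer(candidate: str, reference: str, k: int = 12) -> bool:
    """True if candidate shares at least one exact k-mer with the reference
    or with the reverse complement of the reference."""
    candidate, reference = candidate.upper(), reference.upper()
    ref_kmers = set()
    for strand in (reference, revcomp(reference)):
        for i in range(len(strand) - k + 1):
            ref_kmers.add(strand[i:i + k])
    return any(candidate[i:i + k] in ref_kmers
               for i in range(len(candidate) - k + 1))


# Toy example with a short k so the behaviour is visible.
print(shares_kmer("GGGGACGT", "TTACGTTT", k=4))  # → True
print(shares_kmer("GGGGGG", "ACACAC", k=4))      # → False
```

A candidate for which `shares_kmer` returns False against the gene part of the CNV end and its flanks would pass this first-pass filter.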
  • Genome fragmentation sequence method: cells from the organism, tissue or cell line to be manipulated are cultured in vitro, or the genome is extracted directly; the genome is fragmented by ultrasound and enriched by random primers and PCR. Short random sequences (within 20 bp) are designed and synthesized, each with a partial SINE sequence ligated downstream.
  • The enriched genome fragments are ligated to the synthesized fragments, each consisting of a short random sequence and a partial SINE sequence, and amplified by PCR, yielding sequences in which different genome fragments are followed by a random sequence and then a partial SINE sequence.
  • The short random sequences, or the parts of them that are not homologous to the gene fragment, are non-homologous to the local gene sequence of the corresponding gene fragment. Because the non-homologous sequence is not present in the complete gene downstream of the corresponding CNV-end gene part, further changes at the CNV end are prevented.
  • Random sequence method: construct a plasmid expressing a random sequence of appropriate length (within 100 bp; covering all possible permutations, optionally excluding combinations similar to the SINE sequence), joined to any sequence non-homologous to the genome (within 2000 bp) and then to a partial SINE; or construct a plasmid in which a random sequence (within 100 bp) is joined at the natural cleavage site in the middle of the SINE (for the Alu transcript, the site at which it is cut to generate scAlu and partial Alu), followed by any sequence non-homologous to the genome (within 2000 bp), so that the plasmid carries a partial SINE sequence containing the non-homologous sequence.
  • Alternatively, a vector can be constructed in which a random sequence expressed by RNA polymerase II is joined to any sequence non-homologous to the genome (later expressed as a lariat). (The LINE sequence functionally corresponding to the SINE, or its protein coding sequence, can be connected downstream of the SINE sequence or additionally expressed.)
  • Lasso end sequence method: first detect all lasso species. Insert a small random sequence (within 100 bp) non-homologous to the genome into the SINE sequence such that the SINE transcript can still be cut normally into a partial SINE sequence (i.e. the non-homologous sequence is inserted downstream of the natural cleavage site of the SINE, not at the cleavage site); construct a plasmid that expresses the modified SINE sequence and transfer it into cells expanded from the organism to be manipulated, or into cell lines of the corresponding species. (The genome of the species to be tested can also be taken, the whole genome cut into longer fragments (more than 200 bp) with a certain overlap (more than 10 bp), and vectors constructed so that the fragments are overexpressed through RNA
  • polymerase II in in vitro cells of the corresponding species.) After a period of time, the corresponding nucleic acid is extracted and sequenced, or
  • the lasso sequences are predicted according to the sequence rules of lasso formation by pre-mRNA (for example, most of them end with AG), and all the lasso sequence information of the species or individual is obtained.
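  • The fragmentation step described above (fragments longer than 200 bp that overlap by more than 10 bp) can be sketched as follows. This is a minimal illustration only: the function names `tile_sequence` and `ends_with_ag`, the 300 bp fragment length and the 20 bp overlap are assumptions chosen to satisfy the stated bounds, and the AG check merely mirrors the remark that most lasso sequences end with AG; it is not a lasso predictor.

```python
def tile_sequence(seq, fragment_len=300, overlap=20):
    """Cut seq into consecutive fragments that overlap by `overlap` bases.

    The defaults are hypothetical values that satisfy the bounds in the text
    (fragment length > 200 bp, overlap > 10 bp). The final fragment may be
    shorter than fragment_len.
    """
    if overlap >= fragment_len:
        raise ValueError("overlap must be smaller than fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while start < len(seq):
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            break  # this fragment already reaches the end of the sequence
        start += step
    return fragments


def ends_with_ag(seq):
    """Crude filter for the 'most of them end with AG' rule of thumb."""
    return seq.upper().endswith("AG")


if __name__ == "__main__":
    toy_genome = "ACGT" * 250  # 1000 bp toy sequence, not real data
    pieces = tile_sequence(toy_genome)
    print(len(pieces), [len(p) for p in pieces])
```

  • In practice the trailing fragment would need to be extended or merged if it falls below the 200 bp bound; the sketch leaves that choice open.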
  • the LINE sequence corresponding to the SINE function can be connected or
  • the vector (SINE can also be expressed on another vector) with its protein coding sequence to increase efficiency) is expressed as a lasso, and is linked to the partial SINE sequence generated by cell cleavage of the SINE transcript; or all the resulting lasso's 3 'sequences are respectively connected to any non-homologous sequence (within 2000bp) of the genome followed by part of the SINE sequence (according to the above, the LINE sequence corresponding to the SINE function or its protein coding sequence can be connected to increase the efficiency) (the SINE sequence is preferably the same as its The SINE sequence of the gene where the ligated lasso 3' sequence is located is the same or similar) and constructed into a vector for expression
  • SINE sequence modification method: by additionally expressing a modified SINE sequence, a sequence that is non-homologous to the genome, or to the gene part at the CNV end and its upstream and downstream sequences in the complete gene, is inserted at the end of each CNV, preventing end extension.
  • the construction contains an additional short sequence before the natural cleavage site of SINE (inconsistent with the 3' sequence of the lasso produced conventionally, a short sequence spanning the natural cleavage site of SINE (within 100bp)), so that the SINE
  • the vector of the complete SINE sequence in which the transcript can be naturally cut in the newly added region (the LINE sequence or its protein coding sequence corresponding to the corresponding SINE function can be added later to increase the efficiency); or constructed in the SINE transcript
  • add any complete SINE sequence (within 200bp) of non-homologous sequences to the genome the LINE sequence or its protein coding sequence corresponding to the corresponding SINE function can be added afterwards to increase the efficiency
  • corresponding cells, living tissues or organisms. The SINE sequences used should cover, as far as possible, all SINE sequences of the species or individual (obtained by sequencing or array chip methods), so that all CNV ends across the whole genome are modified accurately.
  • the whole genome can also be cut into long fragments that overlap each other (the overlapping length is more than the length of a lasso structure), and the vector is constructed to be overexpressed in the in vitro cell line of the corresponding species to generate a lasso structure.
  • the vector expressing the modified SINE sequence (the LINE sequence corresponding to the function of the corresponding type of SINE or its protein coding sequence can be mediated by the RNA pathway) is transferred into the vector, and the part of the SINE sequence generated by the lasso (by the modified SINE) will be connected.
  • the resulting biologically active single-stranded RNA ribonucleoprotein complex (RNP) or RNA is isolated and purified by properties such as sequence specificity and conventional means, and then acts through the corresponding RNA or RNP pathway.
  • Modification of SINE and LINE elements on the genome: using the present invention, an arbitrary sequence (within 500 bp) is inserted into the promoter of a SINE on the genome, the natural splicing site of its transcription product or other sequences on the SINE, and/or into the promoter, protein coding sequence or other sequences of a LINE, so that the SINE sequence on the genome cannot be transcribed or cannot be cut after transcription, and/or the LINE sequence cannot be transcribed or cannot produce a protein with normal function.
  • The SINE or LINE sequences across the entire genome of the individual to be manipulated are obtained by sequencing, and the promoter, the natural splicing site of the transcription product, the protein coding sequence or other sequences are selected as the insertion point.
  • In this case, the upstream and downstream sequences in the invention are the sequences upstream and downstream of the site to be inserted on the SINE or LINE sequence, and the inserted sequence is any sequence.
  • Arbitrary sequences are inserted into the corresponding sites on the genome on SINE or LINE by the insertion method described above.
  • the SINE or LINE sequence on the genome may be replaced or deleted to inactivate it by the above-mentioned gene editing method.
  • After the vector is constructed, it is transferred into the corresponding cell, living tissue or organism through the above-mentioned DNA, RNA or RNP route, so that the sequence immediately upstream of the sequence to be deleted on the genome, followed by a non-homologous sequence, is inserted at the end of the corresponding CNV. After homologous recombination occurs and deletes the intermediate sequence, the non-homologous sequence will at the same time hinder further extension of the CNV.
  • Inhibition of the inherent mechanism: the inherent CNV extension mechanism of cells or organisms can also be inhibited directly, for example by using RNA interference to inhibit the transcription of SINEs and LINEs or the production of their RNA and encoded proteins such as ORF1p and ORF2p. Specific proteins can be bound to the functional structures of proteins or complexes involved in the CNV extension mechanism, such as ORF1p, ORF2p or spliceosomes, to hinder their function.
  • Alternatively, the various LINEs and their corresponding protein coding sequences are modified to inactivate them or reduce their activity, the function of related proteins in the homologous recombination or mismatch repair mechanisms is inhibited, or modified nucleosides are administered to hinder reverse transcription, thereby hindering genomic changes and stabilizing CNVs by inhibiting the intrinsic CNV elongation mechanism.
  • Because SINEs, LINEs and their expressed proteins are widely present in eukaryotes, gene editing operations can be performed on a wide range of eukaryotes with this technology. In addition, it can be applied to the treatment of diseases involving gene changes and to changing or stabilizing the state of cells or organisms related to gene changes.
  • When a certain sequence (such as the sequence to be inserted) is defined along the 5'→3' direction, upstream means before the 5' end of the determined sequence and downstream means after its 3' end; the upstream sequence is
  • the sequence before the 5' end, and the downstream sequence is the sequence after the 3' end of the determined sequence.
  • The gene transcription framework provided by the present invention includes, along the 5'→3' direction, the upstream sequence of the target site, the sequence to be inserted, and the downstream sequence of the target site.
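  • The 5'→3' layout of the gene transcription framework (upstream sequence of the target site, sequence to be inserted, downstream sequence of the target site) can be modelled as a small data structure. This is only an illustrative sketch: the class name `TranscriptionFramework`, its field names and the toy sequences are assumptions and do not come from the sequence listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionFramework:
    """Hypothetical container for the three parts of the framework (5'->3')."""
    upstream: str    # sequence upstream of the target site
    insert: str      # sequence to be inserted
    downstream: str  # sequence downstream of the target site

    def sequence(self) -> str:
        """Full framework read 5'->3'."""
        return self.upstream + self.insert + self.downstream

    def insert_span(self) -> tuple:
        """0-based, end-exclusive coordinates of the insert in the framework."""
        start = len(self.upstream)
        return start, start + len(self.insert)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    fw = TranscriptionFramework(upstream="AAAA", insert="GGG", downstream="TTTT")
    print(fw.sequence(), fw.insert_span())
```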
  • Figure 6 is a schematic diagram showing the structure of a promoter connected in front of the gene transcription frame.
  • the promoter may be an RNA polymerase I promoter, an RNA polymerase II promoter, an RNA polymerase III promoter.
  • the promoter can be located on the vector, and the gene transcription frame, short interspersed elements, long interspersed elements, etc.
  • FIG. 7 is a structure in which a promoter is connected upstream of the gene transcription framework, and a short interspersed element, a partial short interspersed element or a short interspersed element-like element is connected downstream.
  • Figure 8 is a schematic diagram of the structure of a gene transcription framework connected upstream to a promoter and downstream to a long interspersed element or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 9 is a schematic diagram of the structure of the upstream of the gene transcription framework connecting the long interspersed element or the ORF1p coding sequence or the ORF2p coding sequence, and connecting the promoter upstream of the long interspersed element or the ORF1p coding sequence or the ORF2p coding sequence.
  • Figure 10 is a schematic diagram of the structure in which a promoter is connected upstream of the gene transcription framework, a short interspersed element, partial short interspersed element or short interspersed element-like element is connected downstream, and a long interspersed element or ORF1p coding sequence or ORF2p coding sequence is connected further downstream.
  • Figure 11 is a schematic diagram of the structure in which a short interspersed element, partial short interspersed element or short interspersed element-like element is connected downstream of the gene transcription framework, a long interspersed element or ORF1p coding sequence or ORF2p coding sequence is connected upstream of the gene transcription framework, and a promoter is connected upstream of the long interspersed element or ORF1p coding sequence or ORF2p coding sequence.
  • Figure 12 is a schematic diagram of the structure of the gene transcription framework and short interspersed elements, partial short interspersed elements or short interspersed elements-like elements that do not share a promoter.
  • Figure 13 is a schematic diagram of the structure of the gene transcription framework and long interspersed elements or ORF1p coding sequence or ORF2p coding sequence not sharing a promoter.
  • Figure 14 shows the structure in which the gene transcription framework does not share a promoter with the short interspersed element, partial short interspersed element or short interspersed element-like element, or with the long interspersed element or ORF1p coding sequence or ORF2p coding sequence; the short interspersed element, partial short interspersed element or short interspersed element-like element is located downstream of the long interspersed element or ORF1p coding sequence or ORF2p coding sequence, and the two share a promoter.
  • Figure 15 shows the structure in which the gene transcription framework does not share a promoter with the short interspersed element, partial short interspersed element or short interspersed element-like element, or with the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence; the short interspersed element, partial short interspersed element or short interspersed element-like element is located upstream of the long interspersed element or ORF1p coding sequence or ORF2p coding sequence, and the two share a promoter.
  • In all the above cases, the gene transcription framework, the short interspersed element, partial short interspersed element or short interspersed element-like element, and the long interspersed element, ORF1p coding sequence and/or ORF2p coding sequence may be on the same vector, or on different vectors that are transfected into cells and expressed through different promoters.
  • the SINE used is the short interspersed element Alu element unique to primates.
  • the complete sequence of the Alu element is shown in Seq ID No.1, and the partial Alu sequence is shown in Seq ID No.2.
  • the short interspersed elements can be replaced with short interspersed elements of the corresponding species to facilitate expression.
  • the pSIL-eGFP plasmid vector was purchased from Addgene, Plasmid 52675, and the pBS-L1PA1-CH-mneo plasmid vector was purchased from Addgene, Plasmid 51288.
  • 10× digestion buffer (required for NheI digestion): 330 mM Tris-acetate, 100 mM magnesium acetate, 660 mM potassium acetate, 1 mg/mL BSA; 10× digestion buffer (required for SalI digestion): 500 mM Tris-HCl, 100 mM MgCl2, 1000 mM NaCl, 1 mg/mL BSA.
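  • As a worked example of the arithmetic behind a 10× stock, the working (1×) concentrations of the two digestion buffers and the buffer volume for a given reaction can be computed as below. The stock compositions are taken from the text; the function names and the 20 μL reaction volume are assumptions made for illustration, since the reaction tables are not reproduced here.

```python
# 10x stock compositions as listed in the text.
NHEI_BUFFER_10X = {
    "Tris-acetate (mM)": 330,
    "magnesium acetate (mM)": 100,
    "potassium acetate (mM)": 660,
    "BSA (mg/mL)": 1,
}
SALI_BUFFER_10X = {
    "Tris-HCl (mM)": 500,
    "MgCl2 (mM)": 100,
    "NaCl (mM)": 1000,
    "BSA (mg/mL)": 1,
}


def working_concentrations(stock, fold=10):
    """Concentration of each component after diluting the stock `fold` times."""
    return {name: value / fold for name, value in stock.items()}


def stock_volume_ul(reaction_volume_ul, fold=10):
    """Volume of stock buffer needed for a reaction of the given volume."""
    return reaction_volume_ul / fold


if __name__ == "__main__":
    print(working_concentrations(NHEI_BUFFER_10X))
    print(working_concentrations(SALI_BUFFER_10X))
    print(stock_volume_ul(20))  # hypothetical 20 uL reaction
```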
  • T4 DNA ligase and the 10× ligation buffer required for its application were purchased from Promega.
  • Entranster-H4000 transfection reagent was purchased from Beijing Enggen Biotechnology Co., Ltd.
  • Blood/cell/tissue genomic DNA extraction kit was purchased from Tiangen Biochemical Technology (Beijing) Co., Ltd., catalog number: DP304.
  • SuperReal PreMix Plus (SYBR Green) was purchased from Tiangen Biochemical Technology (Beijing) Co., Ltd., catalog number: FP205.
  • Magnetic beads method tissue/cell/blood total RNA extraction kit was purchased from Tiangen Biochemical Technology (Beijing) Co., Ltd., catalog number: DP761.
  • FastKing cDNA first-strand synthesis kit was purchased from Tiangen Biochemical Technology (Beijing) Co., Ltd., catalog number: KR116.
  • Lipofectamine™ MessengerMAX™ mRNA transfection reagent was purchased from ThermoFisher.
  • Trypsin was purchased from Sigma-Aldrich, catalog number: T1426.
  • Example 1 DNA-mediated insertion of exogenous sequences to be inserted into designated sites in the genome
  • VEGFA vascular endothelial growth factor A
  • the DNA-mediated genomic sequence insertion technique of the present invention was confirmed by inserting the exogenous sequence into the VEGFA gene.
  • a randomly designed non-homologous sequence is added at the insertion site as the sequence to be inserted, which becomes the gene transcription framework.
  • NheI restriction endonuclease sites and protective bases are added at both ends. The complete sequence is shown in Seq ID No.4:
  • The underlined part is the randomly designed exogenous non-homologous sequence (that is, the sequence to be inserted), 38 bp in length; the two ends of the sequence are the NheI restriction sites and protective bases (bold italics).
  • This sequence was obtained by chemical synthesis and named VEGFA1.
  • the purpose of designing the exogenous sequence to be inserted as a sequence non-homologous to the VEGFA gene sequence is to facilitate the specific detection of its insertion at the target site on the genome in subsequent experiments.
  • The underlined part is the randomly designed exogenous non-homologous sequence (that is, the sequence to be inserted), 38 bp in length; the two ends of the sequence are the NheI restriction sites and protective bases (bold italics).
  • This sequence was obtained by chemical synthesis and named VEGFA2.
  • restriction sites are only used to facilitate the construction of plasmids and can be replaced according to different vectors.
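  • The flanking step described above (a restriction site plus protective bases at each end of the synthesized fragment) can be sketched as follows. GCTAGC is the NheI recognition sequence; the function name `add_cloning_flanks` and the protective bases `CCG` are assumptions for illustration, since the actual protective bases are given only in the sequence listing. The check for an internal site reflects ordinary cloning practice and is not stated in the text.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence (palindromic)


def add_cloning_flanks(core, site=NHEI_SITE, protective="CCG"):
    """Return protective + site + core + site + protective.

    `protective` is a hypothetical placeholder for the protective bases.
    Raises ValueError if the core already contains the site, because the
    fragment would then be cut internally during digestion.
    """
    core = core.upper()
    if site in core:
        raise ValueError("core sequence contains an internal restriction site")
    return protective + site + core + site + protective


if __name__ == "__main__":
    toy_core = "ATGCATGC"  # toy sequence, not the VEGFA framework
    print(add_cloning_flanks(toy_core))
```

  • Swapping `site` for another enzyme's recognition sequence corresponds to the statement that the restriction sites can be replaced according to the vector used.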
  • VEGFA1 and VEGFA2 were inserted into the plasmid vector pSIL-eGFP respectively to construct plasmids pSIL-eGFP-VEGFA1 and pSIL-eGFP-VEGFA2.
  • the specific process is as follows:
  • VEGFA1, VEGFA2 and the plasmid vector pSIL-eGFP were each digested; the reaction system is shown in Table 1:
  • the reaction conditions were as follows: incubation at 37°C for 1 hour, followed by incubation at 65°C for 20 minutes to inactivate the endonuclease; the products were then separated by electrophoresis and the digestion products recovered.
  • the digested VEGFA1 or VEGFA2 was ligated to the plasmid vector pSIL-eGFP that had been linearized by the enzyme.
  • the reaction system is shown in Table 2:
  • the reaction conditions were as follows: incubation at 16°C for 16 h, followed by incubation at 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pSIL-eGFP-VEGFA1 and pSIL-eGFP-VEGFA2 were obtained.
  • the plasmid was verified to be correct by sequencing.
  • Since the pSIL-eGFP plasmid itself has a CMV promoter, transcription can be initiated by RNA polymerase II as long as the gene transcription framework is inserted downstream of the CMV promoter.
  • In this sequence, the italics at both ends are the SalI restriction sites and the corresponding protective bases;
  • the upstream SalI restriction site and protective bases are followed by the Alu sequence;
  • the underlined part is the non-homologous sequence (18 bp) added to the Alu.
  • It is used to label the Alu so that, in the subsequent experiments verifying the mechanism of action, the connection between its transcription product and the lasso produced by transcription of the gene transcription framework (including the sequence to be inserted) can be detected.
  • The wavy line marks the transcription terminator, and the double underline marks six repeats of TTTTAA, which is used in converting RNA into DNA. This is a common repeat sequence in the genome; it may be omitted, or a different number of repeats may be added. In this embodiment, six repeats are added.
  • the sequence was obtained by chemical synthesis and named Alu1.
  • Alu1, plasmid pSIL-eGFP-VEGFA1 and plasmid pSIL-eGFP-VEGFA2 were each digested with SalI; the digestion reaction system is shown in Table 3.
  • the reaction conditions were as follows: incubation at 37°C for 3 hours, followed by heating to 80°C for 10 minutes to inactivate the endonuclease; the products were then separated by electrophoresis and the digestion products recovered.
  • the digested Alu1 was ligated to plasmid pSIL-eGFP-VEGFA1 and to plasmid pSIL-eGFP-VEGFA2, each linearized by the enzyme.
  • the reaction system is shown in Table 4.
  • the reaction conditions were as follows: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and recovery, plasmids pSIL-eGFP-VEGFA1-Alu1 (as shown in Figure 16) and pSIL-eGFP-VEGFA2-Alu1 were obtained.
  • the plasmid was verified to be correct by sequencing.
  • Since the U6 promoter is an RNA polymerase III-dependent promoter, transcription can be initiated by RNA polymerase III as long as the Alu sequence is inserted downstream of the U6 promoter.
  • pSIL-eGFP-VEGFA1-Alu1 or pSIL-eGFP-VEGFA2-Alu1 was transfected into Hela cells to test the insertion efficiency of randomly designed exogenous sequences.
  • The ORF1p and ORF2p (LINE) plasmid pBS-L1PA1-CH-mneo was co-transfected, and corresponding control groups were designed.
  • the co-transfected plasmids of the experimental group and the control group are shown in Table 5.
  • control group 1 was co-transfected with the original pSIL-eGFP and pBS-L1PA1-CH-mneo, and thus contained neither the gene transcription framework sequence nor the Alu1 sequence;
  • experimental group 1 was co-transfected with pSIL-eGFP-VEGFA1-Alu1 and pBS-L1PA1-CH-mneo, and thus contained the gene transcription framework with the long sequences upstream and downstream of the target site on the VEGFA gene, the Alu1 sequence, and pBS-L1PA1-CH-mneo;
  • experimental group 2 was co-transfected with pSIL-eGFP-VEGFA1 and pBS-L1PA1-CH-mneo, and thus contained the gene transcription framework with the long sequences upstream and downstream of the target site on the VEGFA gene and pBS-L1PA1-CH-mneo, but not the Alu1 sequence;
  • Experiment 3 was co-transfected with pSIL-
  • the transfection steps were as follows: Hela cells were passaged and plated in 6-well plates. On the day after passage, transfection was carried out with Entranster-H4000 transfection reagent. For the transfection of each plate of cells, take 48 μg or 96 μg of the constructed plasmid (depending on the experimental group: 48 μg if only one plasmid is transfected; 48 μg of each plasmid, 96 μg in total, if two plasmids are co-transfected), dilute with 300 μL of serum-free DMEM and mix thoroughly; at the same time, take 120 μL of Entranster-H4000 reagent, dilute it with 300 μL of serum-free DMEM, mix thoroughly, and let stand at room temperature for 5 minutes.
  • The transfection complex was added to each well containing 2 mL of DMEM medium with 10% fetal bovine serum for transfection. The cells were passaged when they grew to about 90% confluence, the above operation was repeated after passaging, and material was collected for subsequent operations once the cells again grew to about 90% confluence.
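  • The plasmid amounts in the transfection step follow a simple rule (48 μg per plasmid, so 96 μg for a co-transfection), which the sketch below encodes. The amounts and volumes are those stated in the text; the function name `transfection_mix` and the dictionary layout are assumptions made for illustration.

```python
PLASMID_UG = 48    # ug of each plasmid per plate (from the text)
DMEM_UL = 300      # uL serum-free DMEM used for each dilution (from the text)
REAGENT_UL = 120   # uL Entranster-H4000 reagent per plate (from the text)


def transfection_mix(plasmids):
    """Per-plate amounts for transfecting one plasmid or co-transfecting two."""
    if not 1 <= len(plasmids) <= 2:
        raise ValueError("the protocol covers one or two plasmids")
    return {
        "dna_ug": {name: PLASMID_UG for name in plasmids},
        "total_dna_ug": PLASMID_UG * len(plasmids),
        "dna_diluent_ul": DMEM_UL,
        "reagent_ul": REAGENT_UL,
        "reagent_diluent_ul": DMEM_UL,
    }


if __name__ == "__main__":
    mix = transfection_mix(["pSIL-eGFP-VEGFA1-Alu1", "pBS-L1PA1-CH-mneo"])
    print(mix["total_dna_ug"], "ug DNA in", mix["dna_diluent_ul"], "uL DMEM")
```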
  • Extraction of DNA from transfected cells: after aspirating the cell culture medium, rinse the cells twice with PBS, add an appropriate amount of 0.25% trypsin and digest at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, the reaction was terminated by adding complete medium containing serum. The cell DNA was then extracted according to the product instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene does not contain an Alu sequence and its copy number is stable, the GAPDH gene is used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7: 5′-CACTGCCACCCAGAAGACTG-3′; the downstream primer sequence is shown in Seq ID No.8: 5′-CCTGCTTCACCACCTTCTTG-3′.
  • Primer pair 1 and primer pair 2 were designed. The upstream primer sequence of primer pair 1 is shown in Seq ID No.9: 5′-CCCAGGTTGTCCCATCT-3′; the downstream primer sequence is shown in Seq ID No.10: 5′-CCTCCTCTTATTCCGTAGC-3′.
  • The upstream primer of primer pair 1 lies in the complete VEGFA gene, further upstream than the upstream sequence of the insertion site (target site) carried on the plasmid, so it is present only in the genome and not in the plasmid; the downstream primer of primer pair 1 lies in the 19 bp sequence at the 5' end of the randomly designed non-homologous sequence to be inserted.
  • The upstream primer sequence of primer pair 2 is shown in Seq ID No.11: 5′-CACAACAGTCGTGGGTCG-3′; the downstream primer sequence is shown in Seq ID No.12: 5′-GAGGGAGAAGTGCTAAAGTCAG-3′.
  • The upstream primer of primer pair 2 lies in the 18 bp sequence at the 3' end of the randomly designed non-homologous sequence to be inserted; the downstream primer lies in the complete VEGFA gene, further downstream than the downstream sequence of the insertion site (target site) carried on the plasmid, so it is present only in the genome and not in the plasmid.
  • the above primers were all obtained by chemical synthesis.
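  • Basic properties of the primers listed above can be checked with a few lines of code. The primer sequences are copied from the text; the helper names are assumptions, and the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is a rough estimate for short oligonucleotides and not the method used to choose the annealing temperatures in this document.

```python
PRIMERS = {
    "GAPDH_F": "CACTGCCACCCAGAAGACTG",    # Seq ID No.7
    "GAPDH_R": "CCTGCTTCACCACCTTCTTG",    # Seq ID No.8
    "pair1_F": "CCCAGGTTGTCCCATCT",       # Seq ID No.9
    "pair1_R": "CCTCCTCTTATTCCGTAGC",     # Seq ID No.10
    "pair2_F": "CACAACAGTCGTGGGTCG",      # Seq ID No.11
    "pair2_R": "GAGGGAGAAGTGCTAAAGTCAG",  # Seq ID No.12
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_content(seq), 2), wallace_tm(seq))
```

  • The lengths of `pair1_R` and `pair2_F` correspond to the 19 bp and 18 bp regions of the sequence to be inserted on which those primers are said to lie.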
  • the qPCR reaction system is shown in Table 6.
  • Cell DNA templates were the DNAs extracted from control group 1, experiment group 1 to experiment group 4 after co-transfection.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
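  • With GAPDH as the internal reference and three replicates per sample, relative quantities are commonly computed with the 2^(-ΔΔCt) method. The text does not state which calculation was used, so the sketch below is an assumption about a typical analysis rather than a description of the method applied here; the function name and all Ct values are invented for illustration.

```python
from statistics import mean


def relative_quantity(target_ct, reference_ct, control_target_ct, control_reference_ct):
    """Relative quantity by the 2^(-ddCt) method (assumed analysis).

    Each argument is a list of replicate Ct values: the target amplicon and
    the reference gene (here GAPDH) in the sample of interest, then the same
    two in the control sample.
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_control = mean(control_target_ct) - mean(control_reference_ct)
    return 2 ** -(delta_ct_sample - delta_ct_control)


if __name__ == "__main__":
    # Invented Ct values, three replicates each; not experimental data.
    fold = relative_quantity(
        target_ct=[25.1, 24.9, 25.0],
        reference_ct=[20.0, 20.1, 19.9],
        control_target_ct=[27.0, 27.2, 26.8],
        control_reference_ct=[20.0, 20.0, 20.0],
    )
    print(round(fold, 2))
```

  • Note that if the insertion-specific amplicon gives no signal at all in a control sample, no Ct exists for it and this ratio is undefined; how such cases are handled is a separate analysis decision.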
  • Primer pair 1: pre-denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
  • Primer pair 2: pre-denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 54°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
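  • The two cycling programs differ only in annealing temperature, which is easy to see when they are written as data. The temperatures, times and cycle number below are taken from the text; the dictionary layout and the function name `hold_time_seconds` are assumptions, and the computed time counts programmed holds only, ignoring instrument ramping.

```python
def make_program(annealing_temp_c):
    """qPCR program as stated in the text, parameterized by annealing temperature."""
    return {
        "pre_denaturation": (95, 15 * 60),  # (temperature in C, seconds)
        "cycles": 40,
        "steps": [
            ("denaturation", 95, 10),
            ("annealing", annealing_temp_c, 20),
            ("extension", 72, 20),
        ],
    }


PROGRAMS = {
    "primer_pair_1": make_program(50),
    "primer_pair_2": make_program(54),
}


def hold_time_seconds(program):
    """Total programmed hold time, excluding temperature ramps."""
    per_cycle = sum(seconds for _, _, seconds in program["steps"])
    return program["pre_denaturation"][1] + program["cycles"] * per_cycle


if __name__ == "__main__":
    for name, program in PROGRAMS.items():
        print(name, hold_time_seconds(program) / 60, "min")
```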
  • sequences of sufficient length on both sides of the insertion point are required to facilitate the formation of a lasso.
  • additional expression of longer upstream and downstream sequences of the target site in the gene transcription frame, ORF1p, ORF2p protein (LINE) and/or Alu element (SINE) can improve the editing efficiency.
  • Example 2 Detection of the connection between the lasso structure formed by transcription of the gene transcription framework (containing the exogenous sequence to be inserted) and the RNA fragment containing a partial SINE sequence (taking the Alu element as an example; the fragment is produced when the transcription product of the Alu element is cut at its natural splicing site)
  • the specific process is: take the transfected cells, remove the cell culture medium, rinse the cells twice with PBS, add an appropriate amount of 0.25% trypsin, digest at 37°C for 20 minutes, and pipette 15 times every 5 minutes.
  • the reaction was terminated by the addition of complete medium containing serum.
  • the cell-containing solution was transferred to an RNase-Free centrifuge tube, and after centrifugation at 300 g for 5 min, the pellet was collected and all the supernatant was aspirated.
  • Total RNA was extracted according to the instructions of the magnetic bead method tissue/cell/blood total RNA extraction kit.
  • the genomic DNA in the extracted total RNA was removed according to the instructions of the FastKing cDNA first-strand synthesis kit, and then the cDNA was synthesized.
  • the upstream and downstream primer sequences for detecting GAPDH gene as shown in Seq ID No.7 and Seq ID No.8 were used as internal reference to participate in the detection.
  • Primer pair 3 was designed to detect the ligation of the transcribed lariat structure containing the exogenous sequence to be inserted with the RNA fragment containing a partial Alu sequence formed from the Alu element transcript. The upstream primer sequence is shown in Seq ID No.11: 5′-CACAACAGTCGTGGGTCG-3′ and is located on the exogenous sequence to be inserted; the downstream primer sequence is shown in Seq ID No.13: 5′-TACGGGCTCGCCTGATAG-3′ and is located in the non-homologous sequence (18 bp) that follows the Alu sequence constructed into the plasmid.
  • the above primers were all obtained by chemical synthesis.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Example 3 DNA-mediated insertion of exogenous sequences to be inserted into designated sites in the genome
  • The MMP2 gene is a member of the matrix metalloproteinase (MMP) gene family, which encodes zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction.
  • The protein encoded by this gene is gelatinase A (type IV collagenase), which contains three fibronectin type II repeats in its catalytic site that allow it to bind denatured type IV and type V collagen and elastin. Unlike most MMP family members, activation of this protein can occur at the cell membrane.
  • This enzyme can be activated extracellularly by proteases, or intracellularly by S-glutathiolation without the need for proteolytic removal of the prodomain.
  • This protein is thought to be involved in multiple pathways, including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene are associated with Winchester syndrome and nodulosis-arthropathy-osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms.
  • the DNA-mediated genome sequence insertion technology of the present invention was confirmed by inserting foreign sequences into the MMP2 gene.
  • a randomly designed non-homologous sequence is added at the insertion site as the sequence to be inserted, which becomes the gene transcription framework.
  • NheI restriction endonuclease sites and protective bases are added at both ends. The complete sequence is shown in Seq ID No.15:
  • The underlined part is the randomly designed exogenous non-homologous sequence (that is, the sequence to be inserted), 103 bp in length; the two ends of the sequence are the NheI restriction sites and protective bases (bold italics). This sequence was obtained by chemical synthesis and named MMP2-1.
  • a short sequence of the upstream and downstream sequences of the insertion site in the MMP2 gene was designed before and after the insertion site on the vector to verify the effect of the upstream and downstream sequences of the target site of different lengths on the insertion effect.
  • The underlined part is the randomly designed exogenous non-homologous sequence (that is, the sequence to be inserted), 103 bp in length; the two ends of the sequence are the NheI restriction sites and protective bases (bold italics). This sequence was obtained by chemical synthesis and named MMP2-2.
  • MMP2-1, MMP2-2 and the plasmid vector pSIL-eGFP were each digested; the reaction system is shown in Table 11:
  • the reaction conditions were as follows: incubation at 37°C for 1 hour, followed by incubation at 65°C for 20 minutes to inactivate the endonuclease; the products were then separated by electrophoresis and the digestion products recovered.
  • the digested MMP2-1 or MMP2-2 was ligated to the plasmid vector pSIL-eGFP that had been linearized by the enzyme.
  • the reaction system is shown in Table 12:
  • the reaction conditions were as follows: incubation at 16°C for 16 h, followed by incubation at 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pSIL-eGFP-MMP2-1 and pSIL-eGFP-MMP2-2 were obtained.
  • the plasmid was verified to be correct by sequencing.
  • In this sequence, the italics at both ends are the SalI restriction sites and the corresponding protective bases;
  • the upstream SalI restriction site and protective bases are followed by the Alu sequence;
  • the underlined part is the non-homologous sequence (18 bp) added to the Alu;
  • the wavy line marks the transcription terminator;
  • the double underline marks the six repeats of TTTTAA.
  • the sequence was obtained by chemical synthesis and named as Alu2.
  • Alu2, plasmid pSIL-eGFP-MMP2-1 and plasmid pSIL-eGFP-MMP2-2 were each digested with SalI; the digestion reaction system is shown in Table 13.
  • the reaction conditions were as follows: incubation at 37°C for 3 hours, followed by heating to 80°C for 10 minutes to inactivate the endonuclease; the products were then separated by electrophoresis and the digestion products recovered.
  • the digested Alu2 was ligated to plasmid pSIL-eGFP-MMP2-1 and to plasmid pSIL-eGFP-MMP2-2, each linearized by the enzyme.
  • the reaction system is shown in Table 14.
  • the reaction conditions were as follows: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and recovery, plasmids pSIL-eGFP-MMP2-1-Alu2 and pSIL-eGFP-MMP2-2-Alu2 were obtained.
  • the plasmid was verified to be correct by sequencing.
  • The ORF1p and ORF2p (LINE) plasmid pBS-L1PA1-CH-mneo was co-transfected in Hela cells, and a corresponding control group was designed.
  • the co-transfected plasmids of the experimental group and the control group are shown in Table 15.
  • control group 1 was co-transfected with the original pSIL-eGFP and pBS-L1PA1-CH-mneo, and thus contained neither the gene transcription framework sequence nor the Alu2 sequence;
  • experimental group 5 was co-transfected with pSIL-eGFP-MMP2-1-Alu2 and pBS-L1PA1-CH-mneo, and thus contained the gene transcription framework, the Alu2 sequence and pBS-L1PA1-CH-mneo;
  • experimental group 6 was co-transfected with pSIL-eGFP-MMP2-1 and pBS-L1PA1-CH-mneo, and thus contained the gene transcription framework with the long sequences upstream and downstream of the target site on the MMP2 gene and pBS-L1PA1-CH-mneo, but not the Alu2 sequence;
  • experimental group 7 is co-transfection of pSIL-eGFP-MMP2-2-Alu2 and pBS
  • the transfection steps were as follows: U251 (human glioma) cells were passaged and plated in 6-well plates. On the day after passage, transfection was carried out with Entranster-H4000 transfection reagent. For the transfection of each plate of cells, take 48 μg or 96 μg of the constructed plasmid (depending on the experimental group: 48 μg if only one plasmid is transfected; 48 μg of each plasmid, 96 μg in total, if two plasmids are co-transfected), dilute with 300 μL of serum-free DMEM and mix thoroughly; at the same time, take 120 μL of Entranster-H4000 reagent, dilute it with 300 μL of serum-free DMEM, mix thoroughly, and let stand at room temperature for 5 minutes.
  • The transfection complex was added to each well containing 2 mL of DMEM medium with 10% fetal bovine serum for transfection. The cells were passaged when they grew to about 90% confluence, the above operation was repeated after passaging, and material was collected for subsequent operations once the cells again grew to about 90% confluence.
  • Extraction of DNA from the transfected cells: after the cell culture medium was aspirated, the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, the reaction was terminated by adding complete medium containing serum. Cell DNA was then extracted according to the product instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
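The spectrophotometer reading mentioned above is usually converted to a concentration with the common approximation that an A260 of 1.0 corresponds to about 50 ng/µL of double-stranded DNA at a 1 cm path length. The sketch below illustrates that conversion only; the function names and the example readings are hypothetical and are not taken from the source.

```python
# Minimal sketch: convert a UV absorbance reading to a dsDNA concentration.
# Assumption (not from the source): 1 cm path length and the standard
# approximation A260 of 1.0 ~ 50 ng/uL for double-stranded DNA.

DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Return the estimated dsDNA concentration of the undiluted sample."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio, a rough indicator of protein contamination."""
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical reading of a 10-fold diluted sample.
    print(dsdna_concentration_ng_per_ul(0.5, dilution_factor=10))
```

The concentration is only used here to load comparable template amounts into each qPCR reaction; the relative quantification itself is normalized to GAPDH.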
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 4 was designed, whose upstream primer sequence is shown in Seq ID No.18: 5′-TTTCAGGGTCTAGGTGGC-3′; the downstream primer sequence is shown in Seq ID No.19: 5′-AAATGCTTTCTCCGCTCT-3′.
  • The upstream primer of primer pair 4 lies within the complete MMP2 gene, further upstream than the sequence upstream of the insertion site (target site) carried on the plasmid; it is therefore absent from the plasmid and present only in the genome. The downstream primer of primer pair 4 lies on the randomly designed non-homologous sequence to be inserted.
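As a quick sanity check on primer pair 4, basic properties of the two oligonucleotides can be computed directly from the sequences given in Seq ID No.18 and Seq ID No.19. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short primers and is an assumption of this example; the source does not state how the primers were evaluated.

```python
# Minimal sketch: basic properties of primer pair 4 (sequences from the text).
# The Wallace rule used here is a rough estimate, chosen for illustration only.

FORWARD_PRIMER_4 = "TTTCAGGGTCTAGGTGGC"   # Seq ID No.18
REVERSE_PRIMER_4 = "AAATGCTTTCTCCGCTCT"   # Seq ID No.19

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD_PRIMER_4), ("reverse", REVERSE_PRIMER_4)):
        print(name, len(primer), round(gc_content(primer), 2), wallace_tm(primer))
```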
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 16.
  • The cell DNA templates were the DNA extracted after co-transfection from control group 1 and from experimental groups 5 to 8, respectively.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 1: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
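The cycling program above can be written down as a small data structure, which makes it easy to compare the programs used in the different examples and to estimate run time. This is a sketch only: the class and function names are hypothetical, and the computed time counts hold times only, ignoring instrument ramping and fluorescence reads.

```python
# Minimal sketch: the qPCR cycling program described in the text as data.
# Names are illustrative; the total excludes ramp times between temperatures.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


PRE_DENATURATION = Step("pre-denaturation", 95.0, 15 * 60)
CYCLE = (
    Step("denaturation", 95.0, 10),
    Step("annealing", 50.0, 20),
    Step("extension", 72.0, 20),
)
N_CYCLES = 40


def total_hold_seconds(pre: Step, cycle: tuple, n_cycles: int) -> int:
    """Sum of all programmed hold times, in seconds."""
    return pre.seconds + n_cycles * sum(step.seconds for step in cycle)


if __name__ == "__main__":
    seconds = total_hold_seconds(PRE_DENATURATION, CYCLE, N_CYCLES)
    print(f"{seconds} s of programmed hold time ({seconds / 60:.1f} min)")
```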
  • The relative copy number of experimental group 5 was significantly higher than that of every other group (P < 0.05 in each case), indicating that gene editing efficiency is highest when the sequences upstream and downstream of the insertion site (target site) in the gene transcription framework are longer and the Alu element (SINE), ORF1p and ORF2p (LINE) are sufficiently expressed. The relative copy numbers of experimental groups 6, 7 and 8 were higher than that of control group 1 (N/A was counted as 40.00), all with statistical significance (P < 0.05), indicating that gene editing is still effective, though at low efficiency, when the sequences upstream and downstream of the insertion site (target site) are shorter, or when the cells themselves express little or no Alu element (SINE) or little or no ORF1p and ORF2p (LINE).
  • Example 4 Detection of the connection between the lasso structure formed by transcription of the gene transcription framework (containing the exogenous sequence to be inserted) and the RNA fragment containing part of the SINE sequence (taking the Alu element as an example; the transcription product of the Alu element is cut at its natural cleavage site)
  • The specific process was as follows: the transfected cells were taken, the cell culture medium was removed, and the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min.
  • the reaction was terminated by the addition of complete medium containing serum.
  • the cell-containing solution was transferred to an RNase-Free centrifuge tube, and after centrifugation at 300 g for 5 min, the pellet was collected and all the supernatant was aspirated.
  • Total RNA was extracted according to the instructions of the magnetic bead method tissue/cell/blood total RNA extraction kit.
  • The upstream and downstream primers for detecting the GAPDH gene, shown in Seq ID No.7 and Seq ID No.8, were used as the internal reference in the detection.
  • Primer pair 5 was designed to detect the connection between the lasso structure formed by transcription (containing the exogenous sequence to be inserted) and the RNA fragment containing part of the Alu sequence derived from the Alu element transcript. The upstream primer sequence is shown in Seq ID No.20: 5'-GGCATAATGATGTGGCTGTT-3'; the downstream primer sequence is shown in Seq ID No.21: 5'-TCTGTTGGCTCGCTCTCTTG-3'. The upstream primer lies on the exogenous sequence to be inserted, and the downstream primer lies on the non-homologous sequence (18 bp) that follows the Alu sequence in the constructed plasmid.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 18.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • MMP2-3 was prepared by the method in Example 3 to obtain plasmid pSIL-eGFP-MMP2-3-Alu2.
  • U251 (human glioma) cells co-transfected with pSIL-eGFP-MMP2-3-Alu2 and pBS-L1PA1-CH-mneo, the plasmid expressing ORF1p and ORF2p (LINE), served as the experimental group; the aforementioned cells co-transfected with pSIL-eGFP-MMP2-1-Alu2 and pBS-L1PA1-CH-mneo served as the control group.
  • the specific groups are shown in Table 20.
  • Each parallel was one 6-well plate of cultured U251 (human glioma) cells.
  • the method of transfection and extraction of cell DNA after transfection is the same as that of Example 3.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 4 of the inserted sequence was designed, the upstream primer sequence is shown in Seq ID No.18, and the downstream primer sequence is shown in Seq ID No.19.
  • The relative copy number of the experimental group was significantly lower than that of the control group (N/A was counted as 40.00), which was statistically significant (P < 0.05), meaning that when the insertion site (target site) on the vector
  • Example 6 Insertion of the exogenous sequence to be inserted by direct connection to a SINE sequence (taking the Alu sequence as an example)
  • The IT15 gene is the pathogenic gene of Huntington's disease.
  • An exogenous sequence was inserted into the IT15 gene to confirm DNA-mediated insertion of a genomic sequence by the technique of direct connection to a SINE sequence (taking the Alu sequence as an example).
  • a randomly designed non-homologous sequence is added at the insertion site as the sequence to be inserted, and a part of the Alu sequence is connected downstream of the downstream sequence of the target site to become the gene transcription framework.
  • NheI restriction sites and protective bases were added at both ends of the gene transcription framework, and the complete sequence obtained is shown in Seq ID No.24:
  • The underlined part is the randomly designed exogenous non-homologous sequence (that is, the sequence to be inserted), 60 bp in length; both ends of the sequence carry NheI restriction sites and protective bases (bold italics); a part of the Alu sequence lies between the sequence downstream of the insertion site in the IT15 gene (the downstream sequence of the target site) and the 3'-end NheI restriction site and protective bases.
  • The sequence was obtained by chemical synthesis and named IT15-1.
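The layout of the insertion construct described above (upstream sequence of the target site + sequence to be inserted + downstream sequence of the target site + partial Alu sequence, flanked by NheI sites and protective bases) can be sketched as a simple string assembly. The NheI recognition sequence GCTAGC is standard; everything else here, including the function name, the protective-base placeholder, and the check for internal NheI sites, is an assumption made for illustration and not part of Seq ID No.24.

```python
# Minimal sketch: assemble an insertion-type gene transcription framework.
# Only the NheI recognition sequence is a known fact; the protective bases
# and all example sequences are hypothetical placeholders.

NHEI_SITE = "GCTAGC"


def build_insertion_framework(upstream: str, insert: str, downstream: str,
                              partial_alu: str, protective: str = "CTA") -> str:
    """Return protective + NheI + (upstream, insert, downstream, Alu) + NheI + protective."""
    core = (upstream + insert + downstream + partial_alu).upper()
    if NHEI_SITE in core:
        # An internal site would be cut during digestion and split the construct.
        raise ValueError("core sequence contains an internal NheI site")
    return protective + NHEI_SITE + core + NHEI_SITE + protective


if __name__ == "__main__":
    # Toy sequences only; real flanks come from the target gene.
    print(build_insertion_framework("AAAA", "CCCC", "GGGG", "TTTT"))
```

Keeping the partial Alu sequence as a separate argument mirrors the text, where the same flanks are reused with and without the Alu portion in different groups.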
  • A part of the Alu sequence is connected downstream of the downstream sequence of the target site to simulate the SINE (Alu element) transcript in the organism after intracellular processing (cleavage at the natural cleavage site in the SINE transcript), which only retains reverse transcription
  • the specific process is as follows:
  • the reaction conditions were as follows: after incubation at 37°C for 1 hour, the temperature was raised to 65°C and incubated for 20 minutes to inactivate the endonuclease, electrophoresis, and the digestion product was recovered.
  • reaction conditions were as follows: incubate at 16°C for 16h, then heat up to 70°C and incubate for 10 min to inactivate the ligase, electrophoresis and purification to obtain the plasmid pBS-L1PA1-CH-mneo-IT15-1.
  • the plasmid was verified to be correct by sequencing.
  • the transcription can be initiated by RNA polymerase II as long as the expression frame is inserted into the CMV promoter.
  • Experimental grouping: the group transfected with the pBS-L1PA1-CH-mneo-IT15-1 plasmid was set as experimental group 10; the group transfected with the original (unmodified) pBS-L1PA1-CH-mneo plasmid was set as control group 3. Three parallels were set in each group, and each parallel was one 6-well plate of cultured Hela cells.
  • The transfection steps were as follows: Hela cells were passaged and plated in 6-well plates, and transfection was carried out the day after passaging with Entranster-H4000 transfection reagent. For each plate of cells, 48 µg of the constructed plasmid was diluted with 300 µL of serum-free DMEM and mixed thoroughly; at the same time, 120 µL of Entranster-H4000 reagent was diluted with 300 µL of serum-free DMEM, mixed thoroughly, and left to stand at room temperature for 5 min. The two prepared solutions were then combined, mixed thoroughly, and left to stand at room temperature for 15 min to form the transfection complex.
  • The transfection complex was added to each well containing 2 mL of DMEM medium with 10% fetal bovine serum. The cells were passaged when they reached about 90% confluence, the above operation was repeated after passaging, and material was collected for subsequent operations once the cells again reached about 90% confluence.
  • Extraction of DNA from the transfected cells: after the cell culture medium was aspirated, the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, the reaction was terminated by adding complete medium containing serum. Cell DNA was then extracted according to the product instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 6 was designed, the upstream primer sequence is shown in Seq ID No.25: 5'-GAAATTGGTTTGAGCAGGAG-3'; the downstream primer sequence is shown in Seq ID No.26: 5'-CGATTGGATGGCAGTAGC-3'.
  • The upstream primer of primer pair 6 lies within the complete IT15 gene, further upstream than the sequence upstream of the insertion site (target site) carried on the plasmid; it is therefore absent from the plasmid and present only in the genome. The downstream primer of primer pair 6 lies on the randomly designed non-homologous sequence to be inserted.
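The primer placement described above means that a product can form only when the inserted sequence sits in the genome next to the genome-only upstream region, so residual plasmid and unedited genomic DNA should give no amplicon. The sketch below checks this logic with exact string matching on toy sequences; all sequences and names are hypothetical, and real primer binding tolerates mismatches that this simplification ignores.

```python
# Minimal sketch: in-silico check that a junction-spanning primer pair
# amplifies only an edited locus. All sequences are toy placeholders.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the predicted product on the top strand, or None if either primer fails to bind."""
    t = template.upper()
    start = t.find(forward.upper())
    if start < 0:
        return None
    site = revcomp(reverse)  # where the reverse primer anneals, read on the top strand
    end = t.find(site, start + len(forward))
    if end < 0:
        return None
    return t[start:end + len(site)]


FAR_UPSTREAM = "ATGGCCATTGTAATGGGCCGC"   # genome only, not carried on the plasmid
UP_FLANK = "TGAGTCAGCTTAACG"             # upstream sequence of the target site
INSERT = "CCGGTTAACCGGATATCG"            # sequence to be inserted
DOWN_FLANK = "GATTACAGATTACA"            # downstream sequence of the target site

edited_genome = FAR_UPSTREAM + UP_FLANK + INSERT + DOWN_FLANK
unedited_genome = FAR_UPSTREAM + UP_FLANK + DOWN_FLANK
plasmid = UP_FLANK + INSERT + DOWN_FLANK

fwd = FAR_UPSTREAM[:12]          # upstream primer in the genome-only region
rev = revcomp(INSERT[-12:])      # downstream primer on the inserted sequence

if __name__ == "__main__":
    for name, template in (("edited", edited_genome),
                           ("unedited", unedited_genome),
                           ("plasmid", plasmid)):
        print(name, amplicon(template, fwd, rev))
```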
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 24.
  • Cell DNA templates were the DNA extracted from the transfected control group 3 and the experimental group 10, respectively.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 6: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
  • The exponential phases of the amplification curves for GAPDH and for detection of the inserted sequence were inspected; after confirming that they were approximately parallel, the data were analyzed by the 2^-ΔΔCt relative quantification method, and the results are shown in Table 25.
  • the PCR product was verified to be correct by sequencing.
  • The relative copy number of experimental group 10 was significantly higher than that of control group 3 (N/A was counted as 40.00), which was statistically significant (P < 0.05), indicating that the sequence to be inserted was effectively inserted into the target site on the genome.
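The relative quantification used above can be illustrated with a short calculation. The sketch assumes the conventional 2^-ΔΔCt form (target Ct normalized to GAPDH, then to the control group) and reads the "N/A counted as 40.00" rule as substituting a Ct of 40.00, the cycle limit, for wells with no detectable amplification. The Ct values in the usage example are invented for illustration and are not data from the tables.

```python
# Minimal sketch of 2^-ddCt relative quantification.
# Assumptions: undetected wells (None) are given Ct = 40.00, and replicate
# Ct values are averaged before normalization. Example Cts are hypothetical.
from statistics import mean

CT_NOT_DETECTED = 40.00


def clean_ct(ct):
    """Replace a missing (N/A) Ct with the cycle limit."""
    return CT_NOT_DETECTED if ct is None else float(ct)


def delta_ct(target_cts, reference_cts) -> float:
    """Mean target Ct minus mean reference (GAPDH) Ct."""
    return mean(clean_ct(c) for c in target_cts) - mean(clean_ct(c) for c in reference_cts)


def relative_copy_number(sample_target, sample_ref, control_target, control_ref) -> float:
    """Return 2^-(dCt_sample - dCt_control)."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    experimental = relative_copy_number(
        sample_target=[30.0, 30.0, 30.0], sample_ref=[20.0, 20.0, 20.0],
        control_target=[None, None, None], control_ref=[20.0, 20.0, 20.0],
    )
    print(experimental)
```

Because an undetected control is pinned at 40.00, the reported fold difference is a lower bound on the true difference rather than an exact ratio.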
  • In the sequence IT15-1 shown in Seq ID No.24, the 10th to 15th bp upstream of the randomly designed non-homologous sequence (i.e. the sequence to be inserted) were replaced with GGACAT, giving the sequence shown in Seq ID No.27:
  • the IT15-2 was inserted into the plasmid vector pBS-L1PA1-CH-mneo to construct the plasmid pBS-L1PA1-CH-mneo-IT15-2. Refer to Example 6 for the method.
  • the Hela cell group transfected with pBS-L1PA1-CH-mneo-IT15-2 plasmid was the experimental group 11; the Hela cell group transfected with the pBS-L1PA1-CH-mneo-IT15-1 plasmid was the control group 4. Three parallels were set in each group, and each parallel was a 6-well plate with Hela cells cultured.
  • The method of Example 6 was used for transfection, and the DNA of the transfected cells was extracted and detected by qPCR.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • The relative copy number of the experimental group was significantly lower than that of the control group (N/A was counted as 40.00), which was statistically significant (P < 0.05), meaning that when the sequence upstream of the insertion site (target site) on the vector is inconsistent with the sequence upstream of the insertion site (target site) on the genome, the sequence to be inserted is difficult to insert into the target site on the genome.
  • From Example 1 to Example 6 it can be seen that the method of DNA-mediated insertion of exogenous sequences at a specified site in the genome can perform efficient gene editing in eukaryotic cells (such as cell lines or primary cells), with high efficiency and accuracy of targeted insertion at the target site. The feasibility of editing cells from different tissues shows that the method can be applied to various cells, tissues and organisms (living bodies).
  • Example 8 DNA-mediated deletion of designated region sequences (sequences to be deleted) on the genome
  • The underlined part is the sequence to be deleted; the sequence before it is the upstream sequence immediately adjacent to its 5' end, the sequence after it is the downstream sequence immediately adjacent to its 3' end, and the shaded part is the 3'-end sequence of the sequence to be deleted.
  • The sequence is constructed in the order: 3' sequence of the sequence to be deleted + upstream sequence immediately adjacent to the 5' end of the sequence to be deleted + downstream sequence immediately adjacent to the 3' end of the sequence to be deleted, with NheI restriction sites and corresponding protective bases added at both ends;
  • the sequence is shown in Seq ID No.29:
  • The underlined part is the replacement sequence that is non-homologous to the MINK1 gene.
  • the sequence was obtained by chemical synthesis and named as MINK1-2.
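The construction order given above for the deletion-type framework (3' sequence of the sequence to be deleted + upstream flank + downstream flank, with NheI sites and protective bases at both ends) can be sketched as follows. The NheI recognition sequence GCTAGC is standard; the function name, the `tail_len` parameter, the protective bases and the toy sequences are assumptions made for illustration and do not reproduce Seq ID No.29.

```python
# Minimal sketch: assemble a deletion-type gene transcription framework.
# The protective bases and all example sequences are hypothetical.

NHEI_SITE = "GCTAGC"


def build_deletion_framework(to_delete: str, upstream: str, downstream: str,
                             tail_len: int, protective: str = "CTA") -> str:
    """Return protective + NheI + (3' tail of deleted region, upstream, downstream) + NheI + protective."""
    if not 0 < tail_len <= len(to_delete):
        raise ValueError("tail_len must be between 1 and the length of the deleted sequence")
    tail = to_delete[-tail_len:]          # 3'-end portion of the sequence to be deleted
    core = (tail + upstream + downstream).upper()
    return protective + NHEI_SITE + core + NHEI_SITE + protective


if __name__ == "__main__":
    # Toy sequences only; real flanks come from the target gene.
    print(build_deletion_framework("AAAACCCC", "GGGG", "TTTT", tail_len=4))
```

In the assembled core the upstream and downstream flanks are directly adjacent, which is the arrangement expected in the genome once the deletion has occurred.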
  • the specific process is as follows:
  • the MINK1-1, MINK1-2, and plasmid vector pSIL-eGFP were digested respectively, and the reaction system was shown in Table 27:
  • the reaction conditions were as follows: after incubation at 37°C for 1 hour, the temperature was raised to 65°C and incubated for 20 minutes to inactivate the endonuclease, electrophoresis, and the digestion product was recovered.
  • The digested MINK1-1 and MINK1-2 were each ligated with the enzyme-linearized plasmid vector pSIL-eGFP.
  • the reaction system is shown in Table 28:
  • reaction conditions were as follows: incubate at 16°C for 16h, then heat up to 70°C for 10 min to inactivate the ligase, electrophoresis and purification to obtain plasmid pSIL-eGFP-MINK1-1 and plasmid pSIL-eGFP-MINK1-2.
  • the plasmid was verified to be correct by sequencing.
  • The Alu1 prepared in Example 1, the plasmid pSIL-eGFP-MINK1-1 and the plasmid pSIL-eGFP-MINK1-2 were each digested with SalI and then ligated, using the same reaction system and conditions as in Example 1, to obtain pSIL-eGFP-MINK1-1-Alu1 and pSIL-eGFP-MINK1-2-Alu1.
  • The group transfected with pSIL-eGFP-MINK1-1-Alu1 + pBS-L1PA1-CH-mneo was set as experimental group 12, and the group transfected with pSIL-eGFP-MINK1-2-Alu1 + pBS-L1PA1-CH-mneo was set as control group 5.
  • Three parallels were set in each group, and each parallel was a 6-well plate with Hela cells cultured.
  • Plasmid transfection was carried out according to the method of Example 1 and the transfected cell DNA was extracted.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream and downstream primer sequences used to detect the copy number of GAPDH gene as shown in Seq ID No.7 and Seq ID No.8 were used as internal reference to participate in the detection.
  • Primer pair 7 was designed, the upstream primer sequence is shown in Seq ID No.31: 5′-ACAGGGTATGGAGTGGAAAG-3′; the downstream primer sequence is shown in Seq ID No.32: 5′-ATAGACGGGAAAGAAGGAAC-3′.
  • Both the upstream and the downstream primer of primer pair 7 lie on the sequence to be deleted in the MINK1 gene on the genome, and neither is present in the plasmid.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 29.
  • Cell DNA templates were the DNA extracted from the control group 5 and the experimental group 12 after co-transfection.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 7: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
  • Example 9 Deletion of a sequence by direct connection to a SINE sequence (taking the Alu sequence as an example)
  • The FMR1 gene is associated with fragile X syndrome, a hereditary mental retardation disorder; one sequence in the gene was selected, as shown in Seq ID No.33:
  • The underlined part is the sequence to be deleted; the sequence before it is the upstream sequence immediately adjacent to its 5' end, the sequence after it is the downstream sequence immediately adjacent to its 3' end, and the shaded part is the 3'-end sequence of the sequence to be deleted.
  • The bold italic parts at both ends are NheI restriction sites and the corresponding protective bases; the underlined part is the upstream sequence immediately adjacent to the 5' end of the sequence to be deleted; before this upstream sequence is the 3' sequence of the sequence to be deleted; after this upstream sequence is the downstream sequence immediately adjacent to the 3' end of the sequence to be deleted; the shaded part is part of the Alu sequence.
  • the sequence was obtained by chemical synthesis and named as FMR1-1.
  • The underlined part is the replacement sequence that is non-homologous to the FMR1 gene.
  • the sequence was obtained by chemical synthesis and named as FMR1-2.
  • the reaction conditions were as follows: after incubation at 37°C for 1 hour, the temperature was raised to 65°C and incubated for 20 minutes to inactivate the endonuclease, electrophoresis, and the digestion product was recovered.
  • The digested FMR1-1 and FMR1-2 were each ligated with the enzyme-linearized plasmid vector pBS-L1PA1-CH-mneo.
  • the reaction system is shown in Table 32:
  • The reaction conditions were as follows: incubation at 16°C for 16 h, then heating to 70°C for 10 min to inactivate the ligase, followed by electrophoresis and purification to obtain plasmid pBS-L1PA1-CH-mneo-FMR1-1 and plasmid pBS-L1PA1-CH-mneo-FMR1-2.
  • the plasmid was verified to be correct by sequencing.
  • The group transfected with the pBS-L1PA1-CH-mneo-FMR1-1 plasmid was set as experimental group 13; the group transfected with the pBS-L1PA1-CH-mneo-FMR1-2 plasmid was set as control group 6. Three parallels were set in each group, and each parallel was one 6-well plate of cultured Hela cells.
  • The transfection steps were as follows: Hela cells were passaged and plated in 6-well plates, and transfection was carried out the day after passaging with Entranster-H4000 transfection reagent. For each plate of cells, 48 µg of the constructed plasmid was diluted with 300 µL of serum-free DMEM and mixed thoroughly; at the same time, 120 µL of Entranster-H4000 reagent was diluted with 300 µL of serum-free DMEM, mixed thoroughly, and left to stand at room temperature for 5 min. The two prepared solutions were then combined, mixed thoroughly, and left to stand at room temperature for 15 min to form the transfection complex.
  • The transfection complex was added to each well containing 2 mL of DMEM medium with 10% fetal bovine serum. The cells were passaged when they reached about 90% confluence, the above operation was repeated after passaging, and material was collected for subsequent operations once the cells again reached about 90% confluence.
  • Extraction of DNA from the transfected cells: after the cell culture medium was aspirated, the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, the reaction was terminated by adding complete medium containing serum. Cell DNA was then extracted according to the product instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 8 was designed, and its upstream primer sequence is shown in Seq ID No.36: 5'-ACAGGGTTACAATTTGGT-3'; the downstream primer sequence is shown in Seq ID No.37: 5'-CATTTGCTCTGGAATACAC-3'.
  • Both the upstream and the downstream primer of primer pair 8 lie on the sequence to be deleted in the FMR1 gene on the genome, and neither is present in the plasmid.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 33.
  • Cell DNA templates were the DNA extracted from the control group 6 and the experiment group 13 after co-transfection.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 8: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 45°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
  • Example 10 RNA-mediated insertion of exogenous sequences to be inserted into the genome
  • The underlined part is the randomly designed sequence to be inserted, which is non-homologous to the IT15 gene and 60 bp in length; both ends of the sequence carry NheI restriction sites and protective bases (bold italics); a part of the Alu sequence (shaded) lies between the sequence downstream of the IT15 insertion site (target site) and the 3'-end NheI restriction site and protective bases.
  • The sequence was obtained by chemical synthesis and named IT15-3.
  • the specific process is:
  • the reaction conditions were as follows: after incubation at 37°C for 1 hour, the temperature was raised to 65°C and incubated for 20 minutes to inactivate the endonuclease, electrophoresis, and the digestion product was recovered.
  • reaction conditions were as follows: incubate at 16°C for 16h, then heat up to 70°C and incubate for 10 min to inactivate the ligase, electrophoresis and purification to obtain the plasmid pBS-L1PA1-CH-mneo-IT15-3.
  • the plasmid was verified to be correct by sequencing.
  • pBS-L1PA1-CH-mneo-IT15-3 and pBS-L1PA1-CH-mneo plasmids were transfected into Hela cells, respectively.
  • The transfection steps were as follows: Hela cells were passaged and plated in 6-well plates, and transfection was carried out the day after passaging with Entranster-H4000 transfection reagent. For each plate of cells, 48 µg of the constructed plasmid was diluted with 300 µL of serum-free DMEM and mixed thoroughly; at the same time, 120 µL of Entranster-H4000 reagent was diluted with 300 µL of serum-free DMEM, mixed thoroughly, and left to stand at room temperature for 5 min. The two prepared solutions were then combined, mixed thoroughly, and left to stand at room temperature for 15 min to form the transfection complex.
  • The transfection complex was added to each well containing 2 mL of DMEM medium with 10% fetal bovine serum. The cells were passaged when they reached about 90% confluence, the above operation was repeated after passaging, and material was collected for subsequent operations once the cells again reached about 90% confluence.
  • The specific process was as follows: the transfected cells were taken, the cell culture medium was removed, and the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min.
  • the reaction was terminated by the addition of complete medium containing serum.
  • the cell-containing solution was transferred to an RNase-Free centrifuge tube, and after centrifugation at 300 g for 5 min, the pellet was collected and all the supernatant was aspirated.
  • Total RNA was extracted according to the instructions of the magnetic bead method tissue/cell/blood total RNA extraction kit.
  • mRNA was isolated according to the instructions of the TIANSeq mRNA capture kit, and the mRNA content was measured by sampling (the above steps were repeated several times to obtain enough mRNA for transfection).
  • The group transfected with mRNA extracted from Hela cells that had been transfected with the pBS-L1PA1-CH-mneo-IT15-3 plasmid was set as experimental group 14; the group transfected with mRNA extracted from Hela cells that had been transfected with the pBS-L1PA1-CH-mneo plasmid was set as control group 7. Three parallels were set in each group, and each parallel was one 6-well plate of cultured Hela cells.
  • The obtained mRNA was transfected with Lipofectamine MessengerMAX transfection reagent into the Hela cells cultured on 6-well plates in experimental group 14 and control group 7.
  • the first transfection was performed when the cells had grown to 40% confluency.
  • 5 µg of the prepared mRNA was mixed with 125 µL of serum-free DMEM, combined with 125 µL of the previously diluted Lipofectamine MessengerMAX transfection reagent, and incubated at room temperature for 5 min.
  • the prepared mixed solution was added to the culture medium of the cultured cells in each well, and mixed gently. When the cells grow to 70% confluence, the above operation is repeated.
  • Extraction of DNA from the transfected cells: after the cell culture medium was aspirated, the cells were rinsed twice with PBS and digested with an appropriate amount of 0.25% trypsin at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, the reaction was terminated by adding complete medium containing serum. Cell DNA was then extracted according to the product instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • For primer pair 6, the upstream primer sequence is shown in Seq ID No.25 and the downstream primer sequence is shown in Seq ID No.26.
  • The upstream primer of primer pair 6 lies within the complete IT15 gene, further upstream than the sequence upstream of the insertion site (target site) carried on the plasmid; it is therefore absent from the plasmid and present only in the genome. The downstream primer of primer pair 6 lies on the randomly designed non-homologous sequence to be inserted.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 37.
  • Cell DNA templates were the DNA extracted from the transfected control group 7 and the experimental group 14, respectively.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 6: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were reacted under the same conditions.
  • The relative copy number of experimental group 14 was significantly higher than that of control group 7 (N/A was counted as 40.00), with statistical significance (P < 0.05), indicating that gene editing by the RNA pathway of the present invention is effective.
  • Because mediation entirely by RNA is feasible, it follows that the RNP pathway is also feasible: when the coding sequence of ORF2p and/or ORF1p is not included in the sequence introduced into the system to be edited, the RNA product containing the upstream sequence of the insertion site (target site) + the sequence to be inserted + the downstream sequence of the insertion site (target site) + a SINE sequence (such as an Alu sequence), partial SINE sequence or SINE-like sequence can be bound in vitro to ORF2p, or to both ORF1p and ORF2p, and then transferred into the cell (cytoplasm).
  • SINE sequence such as Alu sequence
  • The CNV of each gene should have an end, in which a certain sequence of the corresponding gene is joined downstream to a partial SINE (Alu) sequence. Sequences (double-stranded DNA) formed from different lariat/partial SINE (Alu) units are continuously inserted in front of the partial SINE sequence at the CNV end, gradually extending the CNV. Because exons must be excised from the pre-mRNA, a lariat must form at the end of each intron, and the probability that lariats overlapping an exon form is low. Therefore, for a gene with a relatively low expression level, there must be a relatively long period during which the CNV end lies at the end of an intron of the gene and is joined to a partial SINE (Alu) sequence.
  • This embodiment randomly selects the 3' sequence of an intron in the BRCA1 gene, which also serves as the sequence to be deleted in the next embodiment; the 3' sequence of this BRCA1 intron is shown in Seq ID No.39:
  • The construct was assembled in the order: 3'-end sequence of the sequence to be deleted + randomly designed sequence non-homologous to the BRCA1 gene + partial Alu sequence, with NheI restriction sites and protective bases added at both ends.
  • the Seq ID No.40 sequence is:
  • The underlined part is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; the partial Alu sequence Alu3 (shaded) lies between the non-homologous sequence and the 3'-end NheI restriction site and protective bases.
  • The sequence was obtained by chemical synthesis and named BRCA1-1-Alu3.
  • The underlined part is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; the partial Alu sequence Alu4 (shaded) lies between the non-homologous sequence and the 3'-end NheI restriction site and protective bases.
  • The sequence was obtained by chemical synthesis and named BRCA1-1-Alu4.
  • The underlined part is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; the partial Alu sequence Alu5 (shaded) lies between the non-homologous sequence and the 3'-end NheI restriction site and protective bases.
  • The sequence was obtained by chemical synthesis and named BRCA1-1-Alu5.
  • Sequences lacking the non-homologous sequence were also designed, as shown in Seq ID No.43, Seq ID No.44 and Seq ID No.45.
  • Seq ID No.43 is obtained by removing non-homologous sequences from BRCA1-1-Alu3. The sequence was obtained by chemical synthesis and named BRCA1-2-Alu3.
  • Seq ID No. 44 is obtained by removing the non-homologous sequence from BRCA1-1-Alu4, which was obtained by chemical synthesis and named BRCA1-2-Alu4.
  • the Seq ID No.45 sequence is:
  • Seq ID No. 45 is obtained by removing non-homologous sequences from BRCA1-1-Alu5, which was obtained by chemical synthesis and named BRCA1-2-Alu5.
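  • The layout of these constructs (3'-end sequence of the sequence to be deleted + optional non-homologous sequence + partial Alu, flanked by NheI sites and protective bases) can be sketched as simple string assembly. The sketch below is only an illustration: `GCTAGC` is the NheI recognition sequence, while the protective bases `CTA` and all example fragments are hypothetical placeholders, not the sequences of Seq ID No.40 to No.45.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence


def build_construct(deleted_3prime_end, partial_alu, non_homologous="", protective="CTA"):
    """Assemble protective bases + NheI + payload + NheI + protective bases.

    Leaving non_homologous empty mimics the BRCA1-2-type constructs,
    which omit the non-homologous sequence.
    """
    payload = deleted_3prime_end + non_homologous + partial_alu
    if NHEI_SITE in payload:
        # An internal site would be cut during the NheI digestion step.
        raise ValueError("payload contains an internal NheI site")
    return protective + NHEI_SITE + payload + NHEI_SITE + protective


# Hypothetical toy fragments, for illustration only.
with_insert = build_construct("ATGGCC", "AAAAAA", non_homologous="TTGACA")
without_insert = build_construct("ATGGCC", "AAAAAA")
print(with_insert)
print(without_insert)
```

  • The internal-site check only covers the payload; a real design would also inspect the junctions and both strands.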
  • The reaction conditions were as follows: incubation at 37°C for 1 hour, then 65°C for 20 minutes to inactivate the endonuclease; the digestion product was recovered after electrophoresis.
  • The reaction conditions were: incubation at 16°C for 16 h, then 70°C for 10 min to inactivate the ligase, followed by electrophoresis and purification to obtain plasmids pBS-L1PA1-CH-mneo-BRCA1-1-Alu3, pBS-L1PA1-CH-mneo-BRCA1-1-Alu4, pBS-L1PA1-CH-mneo-BRCA1-1-Alu5, pBS-L1PA1-CH-mneo-BRCA1-2-Alu3, pBS-L1PA1-CH-mneo-BRCA1-2-Alu4 and pBS-L1PA1-CH-mneo-BRCA1-2-Alu5.
  • the plasmid was verified to be correct by sequencing.
  • The transfection steps were as follows: HeLa cells were passaged and plated in 6-well plates. On the day after passage, transfection was carried out with Entranster-H4000 transfection reagent. For each plate of cells, 96 µg of the constructed plasmids (32 µg of each plasmid) was diluted with 300 µL of serum-free DMEM and mixed well; after mixing, the solution was left to stand at room temperature for 5 min. The two prepared solutions were then combined, mixed thoroughly and left to stand at room temperature for 15 min to form the transfection complex. The transfection complex was added to each well containing 2 ml of DMEM medium with 10% fetal bovine serum. The cells were passaged at about 90% confluence, the above procedure was repeated after passaging, and material was collected for subsequent steps once the cells had again reached about 90% confluence.
  • Extraction of DNA from the transfected cells: after aspirating the cell culture medium, the cells were rinsed twice with PBS, an appropriate amount of 0.25% trypsin was added, and the cells were digested at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, digestion was terminated by adding complete medium containing serum. Cellular DNA was then extracted according to the instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 9 was used, and its upstream primer sequence was shown in Seq ID No.46: 5'-CCCCTTTATCTCCTTCTG-3'; the downstream primer sequence was shown in Seq ID No.47: 5'-ATTTCTCCCATTCCACTT-3'.
  • The upstream primer of primer pair 9 lies, in the complete BRCA1 gene, downstream of the 3'-end sequence of the sequence to be deleted that is carried on the plasmid; it is absent from the plasmid and present only in the genome.
  • The downstream primer of primer pair 9 likewise lies, in the complete BRCA1 gene, downstream of the 3'-end sequence of the sequence to be deleted that is carried on the plasmid; it is absent from the plasmid and present only in the genome.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 41.
  • The DNA templates were the DNA extracted from control group 8 and experimental group 15 after co-transfection.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 9: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 46°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
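  • The text does not explain how the 46°C annealing temperature was chosen. As a plausibility check only, the sketch below applies the Wallace rule of thumb (Tm ≈ 2·(A+T) + 4·(G+C), intended for short oligonucleotides) to the two primers of primer pair 9 (Seq ID No.46 and No.47); the use of this rule here is an assumption, not the authors' stated method.

```python
def wallace_tm(primer):
    """Estimate primer melting temperature (°C) with the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


upstream = "CCCCTTTATCTCCTTCTG"    # Seq ID No.46
downstream = "ATTTCTCCCATTCCACTT"  # Seq ID No.47
print(wallace_tm(upstream), wallace_tm(downstream))  # 54 50
```

  • Under this estimate the lower of the two values is 50°C, so the reported 46°C annealing step sits a few degrees below it, which is a common way to set an annealing temperature.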
  • The relative copy number of experimental group 15 was lower than that of control group 8, with statistical significance (P < 0.05). This means that in experimental group 15 the gene portion at the CNV end carries fewer copies of the sequence that lies downstream of it in the corresponding complete gene, showing that inserting a non-homologous sequence at the CNV end hinders downstream extension of the gene portion at the CNV end.
  • The underlined part is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; between the non-homologous sequence and the shaded sequence lies the sequence immediately upstream of the 5' end of the sequence to be deleted; the shaded part is the partial Alu sequence Alu3.
  • The sequence was obtained by chemical synthesis and named BRCA1-3-Alu3.
  • The underlined part is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; between the non-homologous sequence and the shaded sequence lies the sequence immediately upstream of the 5' end of the sequence to be deleted; the shaded part is the partial Alu sequence Alu4.
  • the sequence was obtained by chemical synthesis and named BRCA1-3-Alu4.
  • the Seq ID No.50 sequence is:
  • The underlined part is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases at both ends of the sequence are in bold italics; between the non-homologous sequence and the shaded sequence lies the sequence immediately upstream of the 5' end of the sequence to be deleted; the shaded part is the partial Alu sequence Alu5.
  • the sequence was obtained by chemical synthesis and named BRCA1-3-Alu5.
  • The reaction conditions were as follows: incubation at 37°C for 1 hour, then 65°C for 20 minutes to inactivate the endonuclease; the digestion product was recovered after electrophoresis.
  • The reaction conditions were as follows: incubation at 16°C for 16 h, then 70°C for 10 min to inactivate the ligase, followed by electrophoresis and purification to obtain plasmids pBS-L1PA1-CH-mneo-BRCA1-3-Alu3, pBS-L1PA1-CH-mneo-BRCA1-3-Alu4 and pBS-L1PA1-CH-mneo-BRCA1-3-Alu5.
  • the plasmid was verified to be correct by sequencing.
  • The transfection steps were as follows: HeLa cells were passaged and plated in 6-well plates. On the day after passage, transfection was carried out with Entranster-H4000 transfection reagent. For each plate of cells, 96 µg of the constructed plasmids (32 µg of each plasmid) was diluted with 300 µL of serum-free DMEM and mixed well; after mixing, the solution was left to stand at room temperature for 5 min. The two prepared solutions were then combined, mixed thoroughly and left to stand at room temperature for 15 min to form the transfection complex. The transfection complex was added to each well containing 2 ml of DMEM medium with 10% fetal bovine serum. The cells were passaged at about 90% confluence, the above procedure was repeated after passaging, and material was collected for subsequent steps once the cells had again reached about 90% confluence.
  • Extraction of DNA from the transfected cells: after aspirating the cell culture medium, the cells were rinsed twice with PBS, an appropriate amount of 0.25% trypsin was added, and the cells were digested at 37°C for 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, digestion was terminated by adding complete medium containing serum. Cellular DNA was then extracted according to the instructions of the blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was determined with an ultraviolet spectrophotometer.
  • Because the GAPDH gene contains no Alu sequence and its copy number is stable, it was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 10 was used, and its upstream primer sequence was shown in Seq ID No. 51: 5'-GCTTTCTCAGGGCTCTTT-3'; the downstream primer sequence was shown in Seq ID No. 52: 5'-GCACCATCTCGGCTCACT-3'.
  • Both the upstream and the downstream primer of primer pair 10 lie within the sequence expected to be deleted (the sequence to be deleted); they are absent from the plasmid and present only in the genome.
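  • The requirement that a detection primer binds the genome but not the plasmid can be checked with a simple sequence search on both strands. The sketch below is a minimal illustration using the primer pair 10 sequences (Seq ID No.51 and No.52); the "genome" and "plasmid" strings are hypothetical toy sequences, and exact string matching is a simplification of real primer binding.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binds(template, primer):
    """True if the primer matches the template on either strand (exact match only)."""
    template, primer = template.upper(), primer.upper()
    return primer in template or reverse_complement(primer) in template


def genome_only(primer, genome, plasmid):
    """True if the primer can bind the genome but has no site on the plasmid."""
    return binds(genome, primer) and not binds(plasmid, primer)


forward = "GCTTTCTCAGGGCTCTTT"  # Seq ID No.51
reverse = "GCACCATCTCGGCTCACT"  # Seq ID No.52

# Hypothetical toy templates, for illustration only.
toy_genome = "AAAA" + forward + "CCGGTTAACC" + reverse_complement(reverse) + "TTTT"
toy_plasmid = "ACGT" * 10

print(genome_only(forward, toy_genome, toy_plasmid))
print(genome_only(reverse, toy_genome, toy_plasmid))
```

  • If either call returned False, plasmid DNA carried over into the extract could contribute signal and confound the copy-number readout.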
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 45.
  • The DNA templates were the DNA extracted from control group 9 and experimental group 16 after co-transfection.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 10: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 49°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
  • The relative copy number of experimental group 16 was lower than that of control group 9, with statistical significance (P < 0.05), indicating that the sequence to be deleted had been deleted and that the gene portion at the CNV end was reduced in experimental group 16.
  • Example 11 shows that a non-homologous sequence can be inserted at the CNV end to prevent its further extension, while Example 12 shows that the gene portion at the CNV end was significantly reduced in the experimental group relative to the control group, indicating that the CNV end had been trimmed and moved forward. Together these demonstrate that CNV ends can be modified by the methods of the present invention. It is therefore also possible to modify several or all CNVs by changing, in the editing method, the guide sequence upstream of the insertion point (i.e., the upstream sequence of the target site, which is identical to the gene portion at the CNV end when a CNV end is edited).
  • The specific process was as follows: the transfected cells were taken, the culture medium was removed, the cells were rinsed twice with PBS, an appropriate amount of 0.25% trypsin was added, and the cells were digested at 37°C for 20 minutes, pipetting 15 times every 5 minutes.
  • Digestion was terminated by adding complete medium containing serum.
  • The cell suspension was transferred to an RNase-free centrifuge tube and centrifuged at 300 g for 5 min; the pellet was collected and all of the supernatant was aspirated.
  • Total RNA was extracted according to the instructions of the magnetic-bead tissue/cell/blood total RNA extraction kit.
  • mRNA was isolated according to the instructions of the TIANSeq mRNA capture kit, and the mRNA content was measured on a sample (the above steps were repeated several times to obtain enough mRNA for transfection).
  • cDNA was synthesized according to the instructions of the FastKing cDNA first-strand synthesis kit, its concentration was measured with an ultraviolet spectrophotometer, and it was kept for subsequent detection.
  • Because GAPDH expression is relatively stable across tissues, the GAPDH gene was used as the internal reference gene.
  • the upstream primer sequence for detecting GAPDH gene expression is shown in Seq ID No.7; the downstream primer sequence is shown in Seq ID No.8.
  • Primer pair 11 was used, and its upstream primer sequence was shown in Seq ID No. 53: 5'-CAGAGGACAATGGCTTCCATG-3'; the downstream primer sequence was shown in Seq ID No. 54: 5'-CTACACTGTCCAACACCCACTCTC-3'.
  • Both the upstream and the downstream primer of primer pair 11 lie on the BRCA1 gene; they are absent from the plasmid and present only in the genome.
  • the above primers were all obtained by chemical synthesis.
  • the qPCR reaction system is shown in Table 47.
  • The templates were the cDNAs synthesized from the mRNAs extracted from control group 9 and experimental group 16, respectively, after the co-transfection described above.
  • the above reaction system was prepared on ice. After preparation, cover the reaction tube, mix gently and centrifuge briefly to ensure that all components are at the bottom of the tube. Three replicates were performed simultaneously for each 6-well plate cell sample.
  • Primer pair 11: pre-denaturation at 95°C for 15 min, followed by 40 cycles of (denaturation at 95°C for 10 s, annealing at 55°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
  • CNV ends can be fixed, extended and trimmed, and gene transcription and protein expression in the cells are affected at the same time.
  • CNVs also change with physiological processes such as embryonic and individual development and tumorigenesis, and differ between cells, tissues and individuals. Editing CNVs can therefore also change the state of the corresponding cells, tissues and organisms.
  • The present invention edits the genome using retrotransposons, which are widely present in eukaryotes, and their reverse transcription function; the SINE and LINE sequences and related proteins involved are widely present in normal organisms. Without producing double-strand breaks, it performs relatively accurate target sequence recognition and cleavage, integrates the target fragment into the genome, and can delete or replace the corresponding fragment. Since no double-strand breaks are generated, there is no need to worry about the risk of genomic double-strand DNA breaks or the introduction of unintended random sequences. Taking as an example the Alu sequence among SINEs and the functionally corresponding LINE-1 among LINEs, Alu and LINE-1 are widely distributed in primate genomes.
  • The sequence to be inserted is positioned at the site to be inserted (target site) on the genome by means of the sequences that flank it on the vector, and ORF2p can slide smoothly from the 3' end of the vector nucleic acid to the cleavage site and make the single-strand cut in the genome only when the upstream sequence of the target site matches completely. This greatly improves targeting accuracy and avoids unintended cleavage; in theory, the targeting accuracy is higher than that of currently existing gene editing technologies.
  • By producing the required RNA and the corresponding proteins such as ORF1p and ORF2p in vitro, the target sequence, gene and genome can be modified through the RNA or RNP pathway without introducing DNA fragments and without requiring transfection into the nucleus.
  • RNA and proteins transfected into cells can be directed into the nucleus, which is beneficial for editing cells that are difficult to manipulate because vectors have difficulty entering the nucleus.
  • The present invention can also edit the CNVs on the genome so that they increase, decrease or remain unchanged (unable to change further). Because CNVs directly affect protein expression and the like, manipulating CNVs can change or stabilize the expression and state of the corresponding cells.
  • The mechanisms used in the present invention all exist in normal organisms, and no external mechanisms or systems need to be introduced, which reduces the impact on the recipient system being edited. Since foreign systems such as prokaryote-derived proteins are not introduced and double-strand breaks are not generated, the present invention can be applied to the clinic more easily than existing gene editing technologies.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Public Health (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Plant Pathology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Neurology (AREA)
  • Neurosurgery (AREA)
  • Epidemiology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Mycology (AREA)
  • Cell Biology (AREA)
  • Hematology (AREA)
  • Oncology (AREA)
  • Psychology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

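Provided are a gene transcription framework, a vector system, a genome sequence editing method and applications thereof. The gene transcription framework is based on the eukaryotic retrotransposition mechanism, can be mediated through a DNA, RNA or RNP pathway, and comprises a transcribed sequence containing a target site upstream sequence, a sequence to be inserted and a target site downstream sequence, one or more SINEs (elements), LINEs (elements), and one or more ORF1p coding sequences and/or ORF2p coding sequences. In the gene editing method, while introducing as few foreign systems or substances as possible and generating no double-strand breaks, the construct is transferred into the nucleus or cytoplasm through a DNA, RNA or RNP pathway (RNA or RNP is transferred from the cytoplasm into the nucleus through mediation by ORF1p and/or ORF2p), and a target fragment is inserted at a designated site in the genome, or a designated fragment in the genome is deleted or replaced.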

Description

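Gene transcription framework, vector system, genome sequence editing method and application
Technical Field
The present invention belongs to the field of biotechnology and relates to a gene editing technology, in particular to a gene editing technology that can be mediated through a DNA, RNA or RNP pathway, and to applications thereof.
Background Art
At present, the main gene editing technologies in the field of biotechnology are ZFN, TALEN, CRISPR/Cas9 and Targetron. ZFN technology appeared earliest, but because its DNA-binding domain can generally recognize only a 9 bp sequence, its targeting accuracy in practical use is considerably limited; the technology is also cumbersome to design and cannot knock out sequences whose upstream and downstream regions are unknown. In addition, its cytotoxicity and off-target rate are both high. TALEN technology is simpler to design than ZFN and can recognize 17 to 18 bp sequences, which gives it higher specificity; however, because the core technology is held by a few commercial companies, most laboratories cannot carry it out independently, which has limited its spread and application. Its construction process is also still relatively complex. CRISPR/Cas9 is the simplest and most workable of the three. First discovered in archaea and bacteria, it specifically recognizes a sequence of about 20 bp, creates a double-strand break at a specific site through the Cas9 endonuclease, and relies on the system's own DNA repair function to repair the break, thereby performing the gene editing operation.
However, all three technologies suffer from the so-called off-target problem, and even CRISPR/Cas9, which has the longest recognition length, is no exception. Studies have shown that the specificity with which CRISPR/Cas9 recognizes a target site depends mainly on the pairing of the 10 to 12 bases of the gRNA nearest the PAM site, which makes non-specific cleavage likely. In addition, after these three technologies produce a double-strand break, repair will most probably proceed by non-homologous end joining rather than homologous recombination and will produce random sequences; this uncertainty greatly reduces their usefulness in practical applications, especially in the human body and in the clinic. Furthermore, these three technologies bring genetic material and proteins that do not belong to the recipient system into it, which may also have unexpected effects.
The more recent Targetron technology uses group II introns to insert a sequence at a specific site in the genome, mutating the corresponding gene. However, this technology inevitably causes genomic double-strand breaks and introduces exogenous group II introns into the genome, leaving a "scar"; and because it originates from prokaryotes, the RNA it produces for reverse transcription has no transmembrane transport function, which limits applications in which the RNA acts on its own. Most importantly, the technology performs well in bacterial gene editing but poorly in higher organisms. All four of these gene editing technologies must introduce proteins and nucleic acids that do not belong to the recipient system, which increases the uncertainty of their effects and greatly hinders their clinical application.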
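Summary of the Invention
To solve the above problems, an object of the present invention is to provide a gene transcription framework which, after transcription, can be transferred into the nucleus or cytoplasm through a DNA, RNA and/or RNP pathway to insert a target fragment at a specific site in the genome, or to delete or replace a specific fragment in the genome, while having high targeting accuracy.
Another object of the present invention is to provide a vector system that can be mediated through a DNA, RNA and/or RNP pathway.
A third object of the present invention is to provide a gene editing method which, while introducing as few foreign systems or substances as possible and generating no double-strand breaks, uses DNA, RNP or RNA (which can be prepared in vitro) and related proteins as the medium, is transferred into the nucleus or cytoplasm through a DNA, RNA or RNP pathway, and inserts a target fragment at a designated site in the genome or deletes or replaces a designated fragment in the genome, while having high targeting accuracy.
To achieve the above objects, the present invention provides a gene transcription framework which comprises, in the 5′→3′ direction, a target site upstream sequence, a sequence to be inserted, and a target site downstream sequence;
The gene transcription framework is a DNA sequence that can be transcribed by RNA polymerase I, RNA polymerase II or RNA polymerase III. In the transcription product of the framework, or in a conversion product of that transcription product, the target site upstream sequence or its complement can hybridize with the upstream sequence of the corresponding target site in the cell genome or its complement, and the target site downstream sequence or its complement can hybridize with the downstream sequence of the corresponding target site in the cell genome or its complement. The target site upstream sequence and the target site downstream sequence are directly contiguous in the corresponding target gene sequence in the genome, and the position between them in the target gene sequence on the genome is the target site for the sequence to be inserted.
The above gene transcription framework is used to insert the sequence to be inserted at the genomic target site.
Preferably, the cell is a eukaryotic cell.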
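The layout of the gene transcription framework described above (target site upstream sequence + sequence to be inserted + target site downstream sequence, with the two flanks directly contiguous in the genome) can be sketched as string slicing. The following is a minimal illustration only; the function names, the toy genome and the flank length are hypothetical, and the second function models only the intended outcome of editing, not the biological mechanism.

```python
def design_framework(genome, site, insert, flank=20):
    """Return upstream flank + insert + downstream flank for a target site.

    `site` is the index in `genome` at which the insert should land, so the
    two flanks are directly adjacent in the unedited genome.
    """
    if site - flank < 0 or site + flank > len(genome):
        raise ValueError("flanks would run past the ends of the genome")
    upstream = genome[site - flank:site]
    downstream = genome[site:site + flank]
    return upstream + insert + downstream


def expected_edit(genome, site, insert):
    """Return the genome sequence expected after a successful insertion."""
    return genome[:site] + insert + genome[site:]


# Hypothetical toy sequences, for illustration only.
toy_genome = "AACCGGTTACGTTGCA"
framework = design_framework(toy_genome, site=8, insert="GGG", flank=4)
print(framework)
print(expected_edit(toy_genome, 8, "GGG"))
```

In this toy case the flanks are `GGTT` and `ACGT`, which appear as the contiguous stretch `GGTTACGT` in the unedited genome, matching the contiguity condition stated in the text.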
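The present invention further provides a vector system comprising one or more vectors, the one or more vectors comprising:
one or more of 1) the gene transcription framework according to claim 1;
one or more of 2) a short interspersed element (SINE) and/or a partial SINE and/or a SINE-like element, and/or one or more of 3) a long interspersed element (LINE) and/or an ORF1p coding sequence and/or an ORF2p coding sequence;
wherein components 1), 2) and/or 3) are located on the same vector or on different vectors of the vector system;
when there is more than one component 1), they are located on the same vector or on different vectors of the vector system;
when there is more than one component 2), they are located on the same vector or on different vectors of the vector system;
when there is more than one component 3), they are located on the same vector or on different vectors of the vector system;
the vector carries one or more promoters, each being an RNA polymerase I promoter, an RNA polymerase II promoter or an RNA polymerase III promoter, located upstream of components 1), 2) and/or 3);
the vector system is mediated through a DNA, RNA and/or RNP pathway.
Further, the vector is a eukaryotic expression vector, a prokaryotic expression vector, a viral vector, a plasmid vector, an artificial chromosome, a phage vector or a cosmid vector.
Further, the vector is an expression vector, a cloning vector, a sequencing vector, a transformation vector, a shuttle vector or a multifunctional vector.
Further, when 1) and 2) are located on the same vector, the SINE and/or partial SINE and/or SINE-like element (hereinafter "the SINE component") is located downstream of the gene transcription framework, and the framework is linked to the SINE component directly or indirectly; when directly linked, the framework and the SINE component share one promoter; when indirectly linked, they either share one promoter or do not.
Further, when 1) and 3) are located on the same vector, the one or more LINEs and/or one or more ORF1p coding sequences and/or one or more ORF2p coding sequences (hereinafter "the LINE component") are located upstream and/or downstream of the gene transcription framework, and the framework is linked to the LINE component directly or indirectly; when directly linked, the framework and the LINE component share one promoter; when indirectly linked, they either share one promoter or do not.
Further, when 1), 2) and 3) are located on the same vector, the SINE component is located downstream of the gene transcription framework and/or downstream of the LINE component. When the SINE component is located downstream of the gene transcription framework, the LINE component is located upstream of the framework and/or downstream of the SINE component; when the SINE component is located downstream of the LINE component, the LINE component is located downstream of the framework. The framework, the SINE component and the LINE component are linked to one another directly or indirectly; when directly linked, the three share one promoter; when indirectly linked, they either share one promoter or do not.
Further, when 2) and 3) are located on the same vector, the LINE component is located upstream and/or downstream of the SINE component, and the SINE component is linked to the LINE component directly or indirectly; when directly linked, the SINE component and the LINE component share one promoter; when indirectly linked, they either share one promoter or do not.
To improve transcription efficiency, the numbers of gene transcription frameworks, SINE components and LINE components in the vector system may be increased. SINEs and LINEs also occur naturally in different species, but because their native promoters have low activity, additional promoters are used to drive transcription and raise expression.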
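The present invention further provides a genome sequence editing method, comprising the following steps:
1) selecting, in the genome, the site to be inserted (target site) in the target gene to be edited, and determining the sequences of the target gene that flank that site: the upstream sequence (target site upstream sequence) and the downstream sequence (target site downstream sequence);
2) preparing the vector system described above;
3) transforming or transfecting the vector system into a cell, tissue or organism for expression, to achieve gene editing.
The present invention further provides use of the above vector system for inserting, deleting or replacing a DNA sequence in any region of the genome.
Preferably, the DNA sequence is one or more CNV sequences, CNV end sequences, short interspersed elements or long interspersed elements.
When the DNA sequence is a CNV sequence or CNV end sequence, the CNV end (i.e., the position between the gene portion and the partial SINE in the CNV end) can be edited by inserting a sequence non-homologous to the genome or to the local genome, to hinder changes in gene copy number and expression; or the gene portion sequence at the CNV end can be deleted to change the expression of the corresponding cells.
The present invention further provides use of the above vector system as a medicament for preventing and/or treating cancer, gene-related hereditary diseases and neurodegenerative diseases.
Preferably, the cancer is glioma, breast cancer, cervical cancer, lung cancer, gastric cancer, colorectal cancer, duodenal cancer, leukemia, prostate cancer, endometrial cancer, thyroid cancer, lymphoma, pancreatic cancer, liver cancer, melanoma, skin cancer, pituitary tumor, germinoma, meningioma, meningeal carcinoma, glioblastoma, astrocytomas of all types, oligodendrogliomas of all types, oligoastrocytoma, ependymomas of all types, choroid plexus papilloma, choroid plexus carcinoma, chordoma, gangliocytomas of all types, olfactory neuroblastoma, neuroblastoma of the sympathetic nervous system, pineocytoma, pineoblastoma, medulloblastoma, trigeminal schwannoma, facial or acoustic neuroma, glomus jugulare tumor, hemangioblastoma, craniopharyngioma or granular cell tumor.
Preferably, the gene-related hereditary disease is Huntington's disease, fragile X syndrome, phenylketonuria, pseudohypertrophic progressive muscular dystrophy, mitochondrial encephalomyopathy, spinal muscular atrophy, Parkinson-plus syndrome, albinism, red-green color blindness, achondroplasia, alkaptonuria, congenital deaf-mutism, thalassemia, sickle cell anemia, hemophilia, epilepsy associated with genetic changes, myoclonus, dystonia, stroke and schizophrenia, vitamin D-resistant rickets, familial polyposis coli or hereditary nephritis.
Preferably, the neurodegenerative disease is Parkinson's disease, Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, spinocerebellar ataxia, multiple system atrophy, primary lateral sclerosis, Pick's disease, frontotemporal dementia, dementia with Lewy bodies or progressive supranuclear palsy.
The present invention can prevent the occurrence of the above cancers and their metastases, inhibit their proliferation, and stop their grade from rising and their progression, or reverse their nature; it can prevent, delay or ameliorate resistance to drugs such as insulin, levodopa, the various tumor chemotherapy drugs and targeted drugs; and it can delay or halt changes in the genes and state of cells and organisms, and be used for tissue and organ reconstruction and biological regeneration.
The technology of the present invention can also be used to edit the copy number variations in each gene and their end portions, changing the position of the end or stabilizing it. Because the CNV end determines gene expression, this achieves the purpose of stabilizing or changing the various states of cells and organisms. It can therefore be applied to: modifying the genes and state of cells, tissues and organisms; modifying the genome of an organism, such as the human genome, to improve function; modifying the genome of an organism, such as the human genome, to treat gene-related hereditary diseases such as Huntington's disease and fragile X syndrome; delaying or halting changes in the genes and state of cells and organisms; changing the genes and state of cells or organisms; tissue and organ reconstruction and biological regeneration; assisted reproduction by introducing transcription factors to convert somatic cells into germ cells; preventing or delaying neurodegenerative diseases such as Parkinson's disease, Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple system atrophy, primary lateral sclerosis, spinocerebellar ataxia, Pick's disease, frontotemporal dementia, dementia with Lewy bodies and progressive supranuclear palsy; inhibiting the metabolic activity, proliferation rate and emergence of tumor cells while delaying their deterioration and reducing their malignancy; the study and treatment of all other diseases related to changes in genes and CNVs, such as diabetes; and other physiological, pathological and pathophysiological research.
In the present invention, the sequence inserted in the gene transcription framework may be an exogenous or an endogenous sequence, and the length inserted in a single step is 1 bp to 2000 bp. With repeated insertions, a DNA sequence of any length can be inserted. The target site upstream sequence may be 10 bp to 2000 bp long, and the target site downstream sequence may be 10 bp to 2000 bp long.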
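The length limits just stated (1 to 2000 bp for a single insertion, 10 to 2000 bp for each flank) lend themselves to a simple pre-synthesis check. The sketch below is illustrative only; the function name and message wording are hypothetical.

```python
FLANK_RANGE = (10, 2000)   # bp, target site upstream/downstream sequence
INSERT_RANGE = (1, 2000)   # bp, sequence inserted in a single step


def check_lengths(upstream, insert, downstream):
    """Return a list of length problems; an empty list means the design fits the stated ranges."""
    problems = []
    for name, seq, (low, high) in (
        ("upstream", upstream, FLANK_RANGE),
        ("insert", insert, INSERT_RANGE),
        ("downstream", downstream, FLANK_RANGE),
    ):
        if not low <= len(seq) <= high:
            problems.append(f"{name}: {len(seq)} bp is outside {low}-{high} bp")
    return problems


print(check_lengths("A" * 10, "G", "T" * 10))     # []
print(check_lengths("A" * 9, "", "T" * 2001))
```

An insert longer than the single-step limit is not an error in itself, since the text allows repeated insertions; it simply has to be split across several rounds.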
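Features of the present invention:
The present invention positions the sequence to be inserted at the selected genomic site by means of the sequences upstream and downstream of the target site that flank it on the vector, and, with the assistance of short interspersed elements, long interspersed elements and the proteins they express, inserts it at the selected site in the genome. Moreover, ORF2p, whether endogenous or expressed from the vector, can slide smoothly from the 3′ end of its carrier nucleic acid to the cleavage site and make the single-strand cut in the genome only when the sequence upstream of the insertion point matches completely. This greatly improves targeting accuracy and avoids unintended cleavage; in theory, the targeting accuracy is higher than that of existing gene editing technologies.
In addition, by producing the required RNA and the corresponding endogenous proteins such as ORF1p and ORF2p in vitro, the target sequence and gene can be modified through the RNA or RNP pathway without introducing DNA fragments and without requiring that the transfected material be delivered into the nucleus. The RNA- or RNP-mediated pathway minimizes the impact on the recipient system and reduces non-specific effects while improving targeting. Thanks to the nuclear localization function of ORF1p and ORF2p and the protective effect of ORF1p on nucleic acids, RNA and protein transfected into the cell can be guided into the nucleus, which facilitates editing of cells that are hard to manipulate because vectors enter their nuclei with difficulty.
The present invention has high targeting specificity: without generating double-strand breaks, it performs relatively accurate target sequence recognition and cleavage, inserts the relevant sequence in a directed manner through homologous recombination, and can thereby delete or replace the corresponding fragment. Existing gene editing technologies such as CRISPR, by contrast, generate double-strand breaks, have a low probability of introducing the target sequence by homologous recombination, and readily produce unpredictable random mutations. Because the present technology generates no double-strand breaks, there is no need to worry about the danger of double-strand DNA breaks or the introduction of unintended random sequences.
The present invention can design a new vector for the new site created by each previous insertion and insert again, introducing long sequences into the genome stepwise, which currently known editing technologies struggle to achieve. Likewise, directed and accurate deletion and sequence replacement on the genome are difficult with existing technologies but can be achieved with the present invention. Changing or stabilizing the gene expression and state of cells or organisms by editing and stabilizing CNVs and their ends also cannot be achieved with existing gene editing technologies.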
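The stepwise strategy described above, in which each round targets the new site created by the previous insertion, can be sketched as a loop over chunks of the long sequence. This is a bookkeeping illustration only, not a model of the biology; the function name is hypothetical, and the small `max_len` in the example stands in for the 2000 bp single-step limit.

```python
def insert_progressively(genome, site, long_insert, max_len=2000):
    """Insert long_insert at site in rounds of at most max_len bases each.

    Returns the edited sequence and the number of rounds used.
    """
    edited, position, rounds = genome, site, 0
    for start in range(0, len(long_insert), max_len):
        chunk = long_insert[start:start + max_len]
        # Each round targets the new site at the end of the previously inserted chunk.
        edited = edited[:position] + chunk + edited[position:]
        position += len(chunk)
        rounds += 1
    return edited, rounds


# Hypothetical toy sequences; max_len=4 keeps the example readable.
edited, rounds = insert_progressively("AAAATTTT", 4, "GGGGGCCCCC", max_len=4)
print(edited, rounds)  # AAAAGGGGGCCCCCTTTT 3
```

From the second round on, the upstream flank of each new vector would include bases inserted in the previous round, which is why the vectors have to be designed one round at a time.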
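Beneficial effects of the present invention:
The present invention provides a gene editing method, based on the eukaryotic retrotransposition mechanism, that can be mediated through a DNA, RNA or RNP pathway. Using mechanisms inherent to eukaryotes, and while introducing as few exogenous systems or substances as possible and generating no double-strand breaks, the method uses DNA, RNP or RNA (which can be prepared in vitro) and related proteins as the medium, is transferred into the nucleus or cytoplasm through a DNA, RNA or RNP pathway, and inserts a target fragment at a selected site in the genome or deletes or replaces a selected fragment in the genome, while having high targeting accuracy. Since foreign systems such as prokaryote-derived proteins are not introduced and double-strand breaks are not generated, the present invention can be applied to the clinic more easily than existing gene editing technologies.
Brief Description of the Drawings
Figure 1 is a schematic of the basic principle of gene editing based on the retrotransposition mechanism inherent to eukaryotes.
Figure 2 is a schematic of DNA-mediated genomic insertion or deletion.
Figure 3 is a schematic of RNA- and RNP-mediated genomic insertion and deletion.
Figure 4 is a schematic of preventing changes of a CNV end by inserting a non-homologous sequence at the CNV end.
Figure 5 is a structural diagram of the gene transcription framework provided by the present invention.
Figure 6 is a structural diagram of the gene transcription framework linked to a promoter.
Figure 7 is a structural diagram of the gene transcription framework linked to a promoter and to a short interspersed element, partial short interspersed element or SINE-like element.
Figure 8 is a structural diagram of the gene transcription framework with a promoter linked upstream and a long interspersed element, ORF1p coding sequence or ORF2p coding sequence linked downstream.
Figure 9 is a structural diagram of the gene transcription framework with a long interspersed element, ORF1p coding sequence or ORF2p coding sequence linked upstream, and a promoter linked upstream of that element or coding sequence.
Figure 10 is a structural diagram of the gene transcription framework with a promoter linked upstream, a short interspersed element, partial short interspersed element or SINE-like element linked downstream, and a long interspersed element, ORF1p coding sequence or ORF2p coding sequence linked further downstream.
Figure 11 is a structural diagram of the gene transcription framework with a short interspersed element, partial short interspersed element or SINE-like element linked downstream, a long interspersed element, ORF1p coding sequence or ORF2p coding sequence linked upstream, and a promoter linked upstream of that long interspersed element or coding sequence.
Figure 12 is a structural diagram in which the gene transcription framework and the short interspersed element, partial short interspersed element or SINE-like element do not share one promoter.
Figure 13 is a structural diagram in which the gene transcription framework and the long interspersed element, ORF1p coding sequence or ORF2p coding sequence do not share one promoter.
Figure 14 is a structural diagram in which the gene transcription framework does not share a promoter with the short interspersed element, partial short interspersed element or SINE-like element and the long interspersed element, ORF1p coding sequence or ORF2p coding sequence, while the latter two share one promoter.
Figure 15 is a structural diagram in which the gene transcription framework does not share a promoter with the short interspersed element, partial short interspersed element or SINE-like element and the long interspersed element, ORF1p coding sequence or ORF2p coding sequence, while the latter two share one promoter.
Figure 16 is the plasmid map of plasmid pSIL-eGFP-VEGFA1-Alu1, obtained in Example 1 by inserting the gene transcription framework VEGFA1 constructed in Example 1 into the vector.
Figure 17 is the plasmid map of plasmid pBS-L1PA1-CH-mneo-IT15-1, obtained in Example 6 by inserting the gene transcription framework IT15-1 constructed in Example 6 into the vector.
Detailed Description of Embodiments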
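The present invention is based on a genome remodeling mechanism, ubiquitous in eukaryotes, in which transposons modify gene copy numbers, repeat sequences and the like on the genome. This mechanism may be responsible for the deletion or addition of pathogenic trinucleotide repeats in some degenerative diseases of the central nervous system, such as Huntington's disease and fragile X syndrome; it shows the hallmarks of homologous recombination, such as dependence on high sequence homology and inhibition by methylation, and it correlates with expression level. As shown in Figure 1, the present invention involves short interspersed elements (SINEs, short interspersed nuclear elements), long interspersed elements (LINEs, long interspersed nuclear elements) and the related proteins they produce, such as open reading frame 1 protein (ORF1p), open reading frame 2 protein (ORF2p) and other open reading frame proteins (ORFp). SINEs mainly include the Alu and SVA elements of primates; the mammalian-wide interspersed repeat elements (MIRs) common in mammals, such as MIR and MIR3; Mon-1 in monotremes; the B1 and B2 elements in rodents; the HE1 family in zebrafish and other species; Anolis SINE2 and Sauria SINE in reptiles; IdioSINE1, IdioSINE2, SepiaSINE, Sepioth-SINE1, Sepioth-SINE2A, Sepioth-SINE2B and OegopSINE in invertebrates such as cuttlefish; and p-SINE1 in plants such as rice. LINEs mainly include the various LINE-1 (L1), LINE-2 (L2) and LINE-3 (L3) elements of different organisms, Ta elements, and the other LINE types within the six LINE groups R2, RandI, L1, RTE, I and Jockey. These structures are widely present in animals and plants and are scattered throughout the genome; each organism has its own specific SINEs and the LINEs that functionally complement them. A SINE is a relatively short transposon that is distributed over the genome, contains an internal RNA polymerase III promoter, ends in an A- or T-rich tail or a short simple repeat, and relies on a LINE for reverse transcription; the right half of its transcript contains the functional structure that enables reverse transcription. A LINE is a transposon, widely distributed in the genome, that contains a reverse transcriptase coding sequence. SINEs and their corresponding LINEs in each species continuously remodel the genome through a similar mechanism.
The basic principle of the mechanism is as follows. The lariat structure that the organism produces when processing pre-mRNA is joined to the right half of a SINE transcript, i.e., the part that remains after cleavage and carries the reverse transcription functional structure. The proteins expressed by the corresponding type of LINE (for example LINE-1, which functionally corresponds to the Alu element, or LINE-2, which corresponds to the MIR elements), namely ORF1p and ORF2p, then convert the RNA into double-stranded DNA, which binds to its complementary sequence on the genome (the single-stranded DNA produced from the transcribed RNA by reverse transcription, and the double-stranded DNA produced from that single-stranded DNA using the genomic sequence as primer, are the conversion products of the transcription product). By forming a specific Ω structure, insertion into the genome is completed through the homologous recombination mechanism. The right half that remains after a complete SINE transcript is cleaved at an internal site, and that carries the reverse transcription functional structure, is called a partial SINE sequence; the cleavage site differs between SINEs of different species. The natural cleavage site of a SINE generally lies slightly before the middle of the full-length element: for a SINE with a typical full length of about 100 to 400 nt, it usually lies between nt 100 and nt 250. For example, for the Alu element, whose full length is about 300 bp, the cleavage site is at nt 118, and for the various MIRs, whose full length is about 260 nt, cleavage sites can be observed in the range nt 100 to 150. In fact, wherever the site lies, as long as the right-hand part remaining after cleavage contains a complete reverse transcription functional structure, it is a partial SINE sequence. (In secondary structure, this functional structure forms a special shape, usually an Ω; in primary structure, it contains two segments separated by an intervening spacer sequence, and these two segments can bind the complement of the corresponding genomic sequence, in which the spacer is absent and the two segments are directly joined. The ORF2p encoded by the LINE binds the 3′ segment of the two in the transcript, cuts one strand of the genome at the genomic position corresponding to the gap between the two segments, and initiates reverse transcription.) The partial SINE produced from a specific SINE such as the Alu element is called partial Alu; specifically, it consists of the middle adenylate repeat, optionally together with the 2 to 3 bases upstream of it, plus the Alu right monomer and the 3′ poly(A) repeat that follows the right monomer. A sequence that contains a reverse transcription functional structure and can initiate reverse transcription, but differs in sequence from the conventional SINEs, is called a SINE-like element. In addition, a LINE can also transcribe its downstream sequence (3′ transduction), bind the complementary sequence on the genome and form an Ω structure, thereby accomplishing a similar conversion of RNA into double-stranded DNA and insertion into the genome.
Take Alu and LINE-1, which assists its function, as an example. The pre-mRNA produced on gene expression can be spliced to yield lariats whose sequences overlap one another; this can occur in any region of the pre-mRNA, the difference being the strength of the splicing that produces each lariat. Because the splicing strength (which depends on sequence differences) that generates the lariats upstream and downstream of an exon is higher than that of the surrounding lariats, exons are readily excised intact during pre-mRNA processing, and the formation of other lariats is suppressed. Meanwhile, the ORF1p produced by LINE-1 protects the nucleic acid bound to it, and both it and ORF2p, also produced by the LINE, can target the bound nucleic acid to the nucleus and transport it in. ORF2p can also bind the special Ω secondary structure of the Alu element and mediate the subsequent single-strand cleavage of the genome and reverse transcription, and assist integration into the genome. The Alu transcript can be cleaved at a specific site (the scAlu cleavage site or natural cleavage site referred to below; it generally lies just before the middle poly(A) sequence of the Alu transcript, although the actual position may vary) (Multiple dispersed loci produce small cytoplasmic element RNA), yielding small cytoplasmic Alu (small cytoplasmic element RNA, scAlu) and a remainder containing the right monomer (which includes the reverse transcription functional structure that can bind ORF2p); this remainder is called partial Alu. The lariat produced can then be joined, at its 3′ end, to the cleaved remainder of the Alu transcript that contains the reverse transcription functional structure. ORF2p is recruited through the A-rich sequence, binds the 3′ foot of the two feet of the Ω structure formed by the partial Alu secondary structure, and recognizes the genomic sequence that matches the sequence on the two feet of the Ω (mainly UU/AAAA, discontinuous between U and A, which is where the gap lies). It cuts one strand of the genome at the position directly facing the Ω gap and unwinds the complementary genomic sequence to serve as the primer for reverse transcription; this process is called target-primed reverse transcription (TPRT). As reverse transcription proceeds, ORF2p moves to the 3′ end of the single-stranded DNA being formed. The resulting single-stranded DNA can bind its complementary sequence on the genome and form an Ω structure at the corresponding site to be inserted (because the sequence to be inserted is absent from that site on the genome, whereas the sequences flanking it on the single-stranded DNA are present on either side of the site). ORF2p can slide along the matched sequence in the 3′ to 5′ direction to the Ω structure, recognize the 6-nucleotide genomic sequence complementary to the gap at the base of the Ω (mainly 4 nucleotides on the 3′ side and 2 nucleotides on the 5′ side), and form double-stranded DNA through a process similar to that described above. Note that only a perfectly matched sequence allows ORF2p to slide to the cleavage site, which ensures targeting accuracy. The double-stranded DNA finally produced again binds in an Ω shape on both sides of the insertion point (held in place by the matching sequences at its two ends). When the 6 nucleotides recognized by ORF2p (mainly 4 nucleotides on the 3′ side and 2 on the 5′ side) are discontinuous in the middle (at the gap of the Ω), the endonuclease activity of ORF2p can make two single-strand nicks, one in the gene at the position corresponding to the gap and one in the other strand of the DNA itself, and the looped portion in the middle is inserted into the genome by the homologous recombination mechanism. By changing the inserted sequence, other outcomes such as deletion or replacement can be achieved through recombination of homologous sequences.
In this process, the annealing and unwinding functions of ORF1p, which is also encoded by the LINE, play an auxiliary role: they help stabilize the secondary structures formed by the nucleic acid during genome remodeling and its binding to the genome, and promote separation of the nucleic acid from the genome after it has bound and acted. ORF1p also has high RNA affinity and a nuclear localization function. Because ORF2p can cut only one of the two strands of the genome and cannot produce a double-strand break, the approach is relatively safe. A similar mechanism applies to other SINE and LINE combinations. The changes in local copy number variation during pathophysiological processes such as embryonic development and tumorigenesis, and the preference for short interspersed element sequences shown when deleted HIV-1 genomes insert into the human genome, may be manifestations of this mechanism in nature. It has been reported that transcribed mRNA sequences can be integrated into the genome with the assistance of ORF1p and ORF2p; however, because the transcription template was a purely exogenous, non-homologous sequence, it could not be targeted to a specific genomic site, and because no fragment with a reverse transcription functional structure was attached, integration was inefficient, random and difficult to control. The present invention redesigns the transcribed sequence and joins it, by various active or passive means, to a sequence with a reverse transcription functional structure, such as a SINE or partial SINE, to achieve relatively precise and efficient gene editing.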
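The six-nucleotide recognition sequence mentioned above (mainly UU/AAAA in RNA terms, i.e. TT/AAAA in DNA, with the cut falling between the two T's and the four A's) can be located in a sequence with a plain motif scan. The sketch below is a simplified illustration: it searches one strand for exact matches only, the example sequence is hypothetical, and real ORF2p site preference is more permissive than an exact six-base match.

```python
def find_nick_sites(dna, motif="TTAAAA", cut_offset=2):
    """Return the positions of candidate nick sites on the given strand.

    Each position is the index of the first base after the cut, i.e. the cut
    lies between dna[pos - 1] and dna[pos] (between TT and AAAA by default).
    """
    dna = dna.upper()
    sites = []
    start = dna.find(motif)
    while start != -1:
        sites.append(start + cut_offset)
        start = dna.find(motif, start + 1)  # step by one to catch overlapping hits
    return sites


# Hypothetical toy sequence, for illustration only.
print(find_nick_sites("GGTTAAAACCTTAAAAGG"))  # [4, 12]
```

Passing `motif="TTTTAA"` scans for the same site as it appears when read from the opposite strand, which corresponds to the second of the two six-base sequences the text mentions adding to plasmids.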
在生理状况下,拷贝数变异(copy number variation,CNV)类似于完整基因原本的一个副本,通过上述机制,可以依照完整的基因原本不断延伸作为副本的CNV,使得细胞、组织及生物体的蛋白表达及各类状态不断变化。CNV末端由上游的基因部分和下游的部分SINE序列部分组成,而由套索结构与部分SINE序列连接形成的短序列片段则会不断插入至这两部分之间以延伸CNV。在胚胎发育早期,LINE的转录明显增加,而基因组上的SINE如Alu序列则呈现明显的去甲基化。在LINE介导的3′转导(基于启动子上游SINE的右单体缺失及下游的完整的SINE结构)起始相关基因拷贝数变异(CNVs)延长的同时,去甲基化的SINE序列则互相发生同源重组将大部分此前延伸的CNVs删去(初始化)。此后,彻底初始化的胚胎细胞重新恢复高甲基化状态,并由CNV末端的部分SINE序列介导CNVs的末端逐渐延长,从而改变各细胞的表达情况及状态,而各细胞的基因表达情况又通过套索结构反过来影响CNVs改变,从而使基因组产生变化,逐渐诱导分化。这与胚胎中普遍出现的CNVs改变及各种不同组织中的CNVs差异相吻合。
不同基因CNVs的延长普遍存在于各类肿瘤细胞中,且与临床分级呈正相关。同时,原癌基因与抑癌基因的表达水平与CNVs的长度亦成正比关系,因此肿瘤的形成及进展应与原癌或抑癌的CNVs紊乱有关。此外,一些与外界刺激相关的不可逆性疾病如糖尿病等或亦与CNVs的紊乱相关。由于多数耐药性与外界的长期刺激导致相应蛋白的表达改变有关,因此可涉及其相应基因的CNV改变,亦可通过本技术得以改善或阻碍。
一、DNA介导的基因组序列插入技术(当通过DNA介导基因编辑时,可在质粒上添加1-40个TTAAAA或TTTTAA序列辅助将RNA转化为DNA)(如图2所示)
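1. Lariat-mediated mode: Select the sequences upstream and downstream of the site to be inserted (the target site), each up to 2000 bp, and place the sequence to be inserted (up to 2000 bp) at the insertion point between them. Synthesize the designed sequence, integrate it into a vector, and drive its transcription with an RNA polymerase II promoter. Elsewhere in the vector, insert SINE sequences (0-20 copies; depending on the species of the recipient system, SINEs of that species may be used to minimize effects on the recipient system, for example Alu sequences in primates and Mon-1 in monotremes; within a certain range the copy number is proportional to efficiency. A SINE from another species may also be used, in which case the LINE functionally corresponding to that SINE, or its protein-coding sequences, must also be introduced, as detailed below), driven separately by an RNA polymerase II or III promoter. A termination signal for the corresponding RNA polymerase is optionally placed after the SINE sequence. (Between the SINE sequence and the termination signal, the corresponding LINE sequence or the sequences of the proteins it encodes may optionally be added, such as LINE-1 and its encoded ORF1p and ORF2p for Alu elements, or LINE-2 or its encoded ORF1p and ORF2p for MIR elements, so that the proteins required for gene editing are expressed, thereby enabling editing or increasing its efficiency.) (When transcription is driven by RNA polymerase II, the termination signal is a poly-A sequence, which may be lengthened appropriately (up to 200 bp) to increase ORF2p recruitment; otherwise, a polyadenylate tract of suitable length (up to 200 bp) may be added after the ORF1p and ORF2p sequences and before the termination signal, and the poly-A tract at the end of the SINE sequence may likewise be lengthened (up to 200 bp).) The vector is then introduced by conventional means such as liposome or viral transfection into cells or tissues cultured in vitro, or administered to an organism via the blood, lymph, or cerebrospinal fluid or by local tissue delivery, so that the construct enters the nucleus and is expressed, inserting the sequence to be inserted at the corresponding site in the genome. (Other vectors expressing the LINE sequence corresponding to the SINE on this vector, or its encoded proteins, such as LINE-1 or ORF2p and/or ORF1p for Alu sequences, may be co-introduced to increase efficiency; when the recipient system does not express these proteins, such as the ORF1p and ORF2p of the corresponding LINE, they must be expressed additionally. A vector expressing the SINE may also be co-introduced to increase efficiency.) (If the expected efficiency is not reached, possibly because the lariat containing the upstream and downstream sequences and the intervening sequence to be inserted forms inefficiently, the length of the upstream and downstream sequences or of the fragment to be inserted may be increased or decreased to promote lariat formation; alternatively, a lariat containing the site to be inserted may be detected by the detection methods described below, and its sequence used as the upstream and downstream sequences, extended appropriately on both sides according to the genomic sequence, with the sequence to be inserted placed at the insertion point and the whole built into the vector, which can also improve efficiency.) (Optionally, the constructed vector may first be incubated briefly in a physiological fluid containing ORF1p and/or ORF2p, at a suitable temperature, room temperature or 37 °C, for up to 48 h, to improve nuclear entry of the vector.) If insertion is continued by the same method against the new site created by each insertion, insertion can be sustained to achieve long-fragment insertion with no obvious length limit.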
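2. Direct SINE-sequence linkage mode: This method does not rely on a cleaved SINE transcript joining a lariat. Instead, the sequences upstream and downstream of the site to be inserted, with the sequence to be inserted between them, are joined to the SINE-related sequence during vector construction. The method therefore suits systems that lack the eukaryotic pre-mRNA splicing machinery and cannot produce lariats, such as bacteria and other prokaryotes, and it also works in eukaryotes that do have pre-mRNA splicing; the same holds for the LINE-mediated mode below. Specifically, synthesize a sequence driven by an RNA polymerase II or III promoter that contains the sequences upstream and downstream of the insertion point (each up to 2000 bp) with the sequence to be inserted (up to 2000 bp) between them, followed by a SINE sequence, partial SINE sequence, or SINE-like sequence. (A LINE sequence that supports the function of the corresponding SINE, or its protein-coding sequences, may optionally be added after the SINE, partial SINE, or SINE-like sequence to increase editing efficiency; when the recipient system does not itself express the LINE or its encoded proteins, these must be added or expressed separately.) A termination signal for the corresponding RNA polymerase is then optionally attached. (If the termination signal is polyadenylate, it may be lengthened appropriately (up to 200 bp) to increase ORF2p recruitment; otherwise, a polyadenylate tract of suitable length (200 bp) may be added before the termination signal and after the LINE, ORF1p, and/or ORF2p sequences to increase ORF2p recruitment.) The sequence is then built into a vector. The vector is introduced by conventional transfection means such as liposomes or viruses into cells or tissues cultured in vitro, or administered to an organism via the blood, lymph, or cerebrospinal fluid or by local tissue delivery, so that the construct enters the nucleus and is expressed (optionally, the constructed vector may first be incubated briefly in a physiological fluid containing ORF1p and/or ORF2p, at a suitable temperature, room temperature or 37 °C, for up to 48 h, to improve nuclear entry of the vector), and the sequence to be inserted is inserted at the corresponding site in the genome. If further vectors are constructed by the same method against the new site created by each insertion, insertion can be sustained to achieve long-fragment insertion with no obvious length limit.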
3.LINE介导方式:由RNA聚合酶II启动LINE或其内的蛋白ORF2p和/或ORF1p编码序列表达,后接与SINE序列直接连接方式中相同方法设计的序列(若为最大程度上减少对接受体系的影响,可选择采用该接受体系中的SINE和LINE种类;为提高效率可选择使其内的SINE序列种类为与前面的LINE在功能上相对应者),最后选择性连接所用RNA聚合酶II的终止信号。此后将该载体通过常规转染手段如脂质体或病毒等转入体外培养的细胞、组织或经血液、淋巴液和脑脊液等通路或局部组织给予等方式给予生物体,将构建的载体转入核内进行表达(可选择将构建好的载体短暂置于含有ORF1p和/或ORF2p的生理液体中孵育(于适宜温度,常温或37℃均可,孵育48h以内)以提高载体的入核效率),将待插入序列插入至基因组上的相应待插入位点。若根据插入后产生的新位点继续按上述方法插入,则可持续性插入并完成无明显长度限制的长片段插入。
4.下游连接ORF2p结合序列法:以载体上基因转录框架中的靶位点上游序列、靶位点下游序列以及中间的待插入序列与基因组结合形成的Ω结构代替SINE中的逆转录功能结构起始逆转录,因此在基因转录框架中靶位点下游连接可结合ORF2p的ORF2p结合序列(例如多聚A序列),并可选择性在与基因转录框架同一载体上或另外的载体上添加LINE序列、ORF1p和/或ORF2p编码序列以提高效率。此后将构建好含有下游连接ORF2p结合序列(例如多聚A序列)的基因转录框架的载体通过常规转染手段如脂质体或病毒转入体外培养的细胞、组织或经血液、淋巴液和脑脊液等通路或局部组织给予等方式给予生物体,将构建的载体转入核内进行表达(可选择将构建好的载体短暂置于含有ORF1p和/或ORF2p的生理液体中孵育(于适宜温度,常温或37℃均可,孵育48h以内)以提高载体的入核效率),将待插入序列插入特定位点。若根据插入后产生的新位点继续按上述方法构建载体进行插入,则可持续性插入并完成无明显长度限制的长片段插入。
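The length limits stated across the four insertion modes above (flanks and insert each up to 2000 bp, 0-20 SINE copies, poly-A extension up to 200 bp) can be collected into a small design check. The following is only a sketch: the class name, field names, and function name are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the text above.
MAX_FLANK_BP = 2000     # each of the upstream / downstream target-site sequences
MAX_INSERT_BP = 2000    # sequence to be inserted
MAX_SINE_COPIES = 20    # SINE copies elsewhere on the vector (0-20)
MAX_POLY_A_BP = 200     # extended poly-A tract


@dataclass
class FrameworkDesign:
    """Hypothetical container for one gene transcription framework design."""
    upstream: str
    insert: str
    downstream: str
    sine_copies: int = 0
    poly_a_len: int = 0

    def template(self) -> str:
        # 5'->3' order: upstream flank, sequence to be inserted, downstream flank
        return self.upstream + self.insert + self.downstream


def check_design(design: FrameworkDesign) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(design.upstream) > MAX_FLANK_BP:
        problems.append("upstream flank exceeds 2000 bp")
    if len(design.downstream) > MAX_FLANK_BP:
        problems.append("downstream flank exceeds 2000 bp")
    if len(design.insert) > MAX_INSERT_BP:
        problems.append("insert exceeds 2000 bp")
    if not 0 <= design.sine_copies <= MAX_SINE_COPIES:
        problems.append("SINE copy number outside 0-20")
    if not 0 <= design.poly_a_len <= MAX_POLY_A_BP:
        problems.append("poly-A tract outside 0-200 bp")
    return problems
```

A design with 10 bp flanks and a 38 bp insert passes, while one requesting 21 SINE copies is flagged.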
二、DNA介导的基因组序列删除技术,如图2所示
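1. Deletion of an arbitrary genomic region: In the vector designed for the insertion technique above, replace the sequence to be inserted with a sequence (up to 2000 bp) located upstream or downstream of the insertion point (within 100000 bp). Once this sequence has been inserted through the DNA-, RNP-, or RNA-mediated insertion routes described in the present invention, the sequence lying between the two identical copies can be removed at a certain efficiency by homologous recombination. A sequence containing a recombination site (GCAGA[A/T]C, CCCA[C/G]GAC, and/or CCAGC) may be chosen for insertion to increase the efficiency of the subsequent homologous recombination.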
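2. Deletion from a CNV terminus: Detect the CNV termini in the cells or tissue by sequencing and alignment (aligning to the junction between the gene sequence and the partial SINE sequence). Select the gene part (up to 2000 bp) of the CNV terminus to be processed, together with the 3′ portion of a lariat that can form within a range (up to 20000 bp) downstream of that terminus in the intact gene (lariats that can form downstream may be predicted or detected by the methods described below); alternatively, a sequence within that downstream range (up to 20000 bp) of the intact gene may be cut out directly and used in place of the lariat 3′ portion. Join each of these to the sequence immediately upstream (within 100000 bp) of the terminal sequence to be deleted, then append a complete SINE sequence, partial SINE sequence, or SINE-like sequence (depending on the insertion mode used; as described above, ORF1p and ORF2p coding sequences may follow). Synthesize the construct and, using one of the insertion modes above via the DNA, RNA, or RNP route, insert the sequence immediately upstream of the terminal sequence to be deleted between the gene part and the partial SINE sequence of the actual CNV terminus (efficiency improves when the SINE sequence on the vector is identical or closer to the SINE sequences around the insertion point). The sequence to be deleted is then removed by homologous recombination between the identical sequences. A sequence containing a recombination site (GCAGA[A/T]C, CCCA[C/G]GAC, and/or CCAGC) may be chosen for insertion to increase efficiency.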
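When choosing which nearby sequence to insert for a deletion, one may want to see where the three recombination-site motifs named above occur in a candidate. The sketch below scans only the given strand with regular expressions; the function name and the toy input are assumptions for illustration, and scanning the reverse strand is left out.

```python
import re

# Motifs as written in the text, with [A/T] and [C/G] expressed as character classes.
RECOMBINATION_MOTIFS = {
    "GCAGA[A/T]C": "GCAGA[AT]C",
    "CCCA[C/G]GAC": "CCCA[CG]GAC",
    "CCAGC": "CCAGC",
}


def find_recombination_sites(seq):
    """Return sorted (position, motif label, matched bases) tuples, 0-based positions."""
    seq = seq.upper()
    hits = []
    for label, pattern in RECOMBINATION_MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        for match in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((match.start(), label, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_recombination_sites("TTGCAGATCAACCAGCTT"):  # toy sequence
        print(hit)
```

A candidate with no hits can still be used; the text presents the motifs only as an option for improving recombination efficiency.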
三、DNA介导的基因组序列替换技术
将上述插入技术中设计的载体中的待插入序列改为替换用序列及基因组上待替换序列的周围序列(即待插入的替换用序列和基因组上序列发生同源重组后将被删除的那段序列,其在构建载体时位于替换用序列的3′还是5′取决于插入点在基因组上待替换序列的上游还是下游)(替换用序列应与基因组上待替换序列同源),通过上述基因编辑插入方式将替换用序列及基因组上待替换序列的周围序列插入于基因组上待替换序列的上游或下游,当插入的替换用序列与基因组上的待替换序列发生同源重组后,则基因组上的待替换序列被替换为插入的与其同源的替换用序列,同时因同源重组被删去的待替换序列的周围序列部分则在插入时已与替换用序列一块被重新插入。
四、RNA介导的基因组序列编辑技术(由于不需要进行RNA到DNA的转化,因此不需要额外添加TTAAAA位点或TTTTAA位点),如图3所示
1.核糖核蛋白(RNP)介导途径:
将前述合成的载体扩增后转化进入额外高表达LINE所编码蛋白(可选择与所合成载体中所含SINE序列功能对应的LINE以提高效率,例如所合成载体中包含Alu序列或部分Alu序列,则其对应LINE-1和其编码蛋白ORF1p及ORF2p)(通 过表达相关蛋白的载体转染或转染后筛选获得永久过表达相关蛋白的工程细胞,若此后经相关蛋白如ORF1p和ORF2p的孵育则该被转入的工程细胞可不表达相关蛋白)的工程用细胞系,一段时间后提取细胞核及细胞质,根据序列特异性等原理及应用相应常规方法提取单链质粒产物(单链RNA)或含有单链质粒产物(单链RNA)的具有生物活性的核糖核蛋白(RNP)复合体(可选择再次短暂于所得的含有ORF1p和/或ORF2p的细胞质中或含有ORF1p和/或ORF2p的生理液体中孵育(于适宜温度,常温或37℃均可,孵育48h以内)后再次提取纯化,注意体外生理液体或细胞质中需添加RNA酶抑制剂,若此前转入的细胞不表达相关蛋白则必须与ORF1p和/或ORF2p孵育);此后通过常规转染手段如应用脂溶性物质如脂质体或病毒等包裹核糖核蛋白复合体后转入体外培养的细胞、组织或经血液、淋巴液和脑脊液等通路或局部组织给予等方式转入生物体(转入细胞质即可,无需入核),完成相应的基因编辑作用。若需定向转入可在载体外面的包裹上进行修饰。注意整个过程避免RNA降解。
另外,LINE介导方式所合成载体的以RNP形式应用需从所提取的含有单链质粒产物(单链RNA)的具有生物活性的核糖核蛋白(RNP)复合体中筛选不含前端LINE-1序列或ORF1p和ORF2p编码序列的产物(通过序列特异性)(可额外添加体外核酸内切酶等处理以促进剪切),以防止前端序列扰乱基因编辑的靶向进行。
2.单纯RNA介导途径:
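As described above, synthesize a sequence containing the sequences upstream and downstream of the insertion point (the target site), each up to 2000 bp, with the sequence to be inserted (up to 2000 bp) placed between them at the position corresponding to the insertion point, followed by a SINE sequence, partial SINE sequence, or SINE-like sequence, and then by the LINE sequence functionally corresponding to the SINE used, or the protein-coding sequences it contains (for example, a partial Alu sequence corresponds to LINE-1 and its ORF1p and ORF2p coding sequences; a partial MIR sequence corresponds to LINE-2 and its corresponding protein-coding sequences). Build it into a vector under an RNA polymerase II/III promoter. (Alternatively, a vector obtained by any of the DNA-mediated methods above may be used directly and introduced into engineered cells, after which the RNA product usable for gene editing is extracted by conventional means, for example by sequence specificity.) The expressed RNA is isolated and purified, packaged by conventional transfection means, for example in lipid-soluble materials such as liposomes, or in viruses, and delivered into cells or tissues cultured in vitro, or into an organism via the blood, lymph, or cerebrospinal fluid or by local tissue delivery (delivery to the cytoplasm suffices; nuclear entry is not required), thereby inserting the sequence to be inserted at the corresponding site in the genome. If insertion is continued by the same method against the new site created by each insertion, the fragment can be inserted stepwise as the cells divide, achieving long-fragment insertion with no obvious length limit (this requires continued delivery of RNA into the cells).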
五、阻碍转座子造成的基因组变化,稳定基因组及其上的CNVs(即通过该基因编辑技术在CNV末端的基因部分和部分SINE序列之间或其他区域插入与基因组或与该CNV末端中基因部分及其在完整基因中的上下游序列非同源的序列,阻碍 CNV的进一步延伸;CNV末端定义为基因序列直接连接部分SINE序列处,在该处基因可被延伸,每个特定CNV末端的基因序列和部分SINE序列的具体序列可通过基因测序或基因芯片等分子生物学手段获取)(通过常规转染手段如应用脂溶性物质或具有细胞转染能力的物质如脂质体或病毒等包裹相应载体后转入体外培养的细胞、组织或经血液、淋巴液和脑脊液等通路或局部组织给予等方式转入生物体)(如图4所示)
1.对特定的CNV进行干预(其中插入所用的上游序列为特定基因的CNV末端的基因部分):选定需要操作的CNV,其基因部分的3′端与部分SINE序列的交界处设为插入点(靶位点),将上述插入方法中的插入点上游序列设为CNV末端基因部分的3′端(2000bp以内),下游序列即为部分SINE序列(因此上述方法中下游序列后面连接的SINE序列、部分SINE序列或类SINE序列可省略),待插入序列为任意与基因组或与该CNV末端中基因部分及其在完整基因中的上下游序列非同源的序列(2000bp以内)。载体构建完成后通过上述的DNA、RNA或RNP途径转入相应细胞,活体组织或生物体,使相应CNV末端插入非同源序列。由于非同源序列不存在于完整基因中相应CNV末端基因序列的下游,因此无法依据完整的基因序列对CNV末端进行进一步的延伸,从而阻碍CNV末端的进一步变化。
2.对基因组上广泛的CNV进行干预(插入所用的靶位点上游序列需包含所有可能存在的CNV末端的基因部分):
(1)基因组破碎序列法:取需要操作的生物体、组织或细胞系中的细胞进行体外培养,或直接提取基因组,超声破碎后通过随机引物和PCR进行富集;设计合成短随机序列(20bp以内),在下游连接部分SINE序列。将富集所得的基因组碎片与合成的短随机序列连接部分SINE序列片段通过PCR进行连接和扩增,获得不同的基因组碎片序列连接随机序列后连接部分SINE的序列,将所得的片段构建入载体后,通过上述的DNA、RNA或RNP途径转入相应细胞,活体组织或生物体,经由基因组碎片序列靶向基因组上所有的CNV末端,使CNV末端在其基因部分与部分SINE之间被插入非同源序列(即短随机序列或部分短随机序列,其不与基因碎片同源的部分,对于相应基因碎片的局部基因序列而言是非同源的),由于非同源序列不存在于完整基因中相应CNV末端基因部分序列的下游,从而阻碍CNV末端的进一步变化。
(2)随机序列法:构建表达适当长度(100bp以内)的随机序列(包含所有排列可能,可排除与SINE序列近似的组合)连接任意与基因组非同源序列(2000bp以内)后连接部分SINE的质粒;或构建随机序列(100bp以内)连接在SINE的中间自然剪切位点(如对于Alu的转录产物则为其中间可剪切产生scAlu和部分Alu的剪切位点)后添加任意与基因组非同源序列(2000bp以内)的部分SINE序列的质粒;亦可构建经由RNA聚合酶II表达的随机序列后连接任意与基因组非同源序列 (此后表达为套索)的载体(该载体中需含有SINE序列或另外转入其他含有SINE序列的载体,SINE序列下游可接或另外表达与SINE功能对应的LINE序列或其蛋白编码序列)。并通过上述DNA、RNP或RNA途径转入相应细胞,活体组织或生物体,经由随机序列靶向基因组上所有的CNV末端,使相应CNV末端插入非同源序列,由于非同源序列不存在于完整基因中相应CNV末端基因序列的下游,从而阻碍CNV末端的进一步变化。
(3)据套索末端序列法:检测所有的套索种类(将一小段与基因组非同源的随机序列(100bp以内)插入SINE序列且SINE序列仍可被正常剪切为部分SINE序列(即该非同源序列的插入位置在SINE自然剪切位点的下游,并不位于剪切位点),并构建可表达该改造SINE序列的质粒,转入从相应待操作生物体中取出扩增的细胞或相应物种的细胞系(亦可取相应待测物种的基因组,将全基因组截为长度较长(200bp以上)且互相一定重叠(重叠超过10bp以上)的片段,并通过构建入载体通过RNA聚合酶II在相应物种的体外细胞过表达),一段时间后通过插入至SINE序列中的非同源序列的序列特异性提取相应核酸并进行测序,获取与整合有非同源序列的部分SINE序列相连的各种所产生套索的序列信息。)或/同时根据pre-mRNA形成套索的序列规律预测套索序列(如多以AG结束),获得该物种或个体的所有套索序列信息。取所有套索的3′序列(2000bp以内),分别连接任意与基因组非同源的序列(2000bp以内)后整合入上述表达SINE序列(据上所述其后可接SINE功能对应的LINE序列或其蛋白编码序列以增加效率)的载体(SINE也可在另一个载体上表达)表达为套索,并与SINE转录产物经细胞剪切所产生部分SINE序列相连;或将所有所得套索的3′序列分别连接任意与基因组非同源序列(2000bp以内)后接部分SINE序列(据上所述后可接SINE功能对应的LINE序列或其蛋白编码序列以增加效率)(SINE序列最好与与其连接的套索3′序列所在基因的SINE序列相同或相似)并构建入载体表达。通过上述DNA、RNP或RNA途径转入相应细胞,组织或生物体,并对全基因组范围内的CNV末端进行编辑。
(4)SINE序列改造法:即通过额外给予改造过的SINE序列表达,使与基因组或与该CNV末端中基因部分及其在完整基因中的上下游序列非同源的序列插入至各CNV末端,阻碍末端延伸。构建含有在SINE自然剪切位点前额外增加一段短序列(与常规产生的套索3′序列不一致,一小段横跨SINE自然剪切位点的序列即可(100bp以内)),使SINE的转录产物在该新增区域亦可被自然剪切的完整SINE序列的载体(可在其后添加与相应种类SINE功能对应的LINE序列或其蛋白编码序列以增加效率);或构建在SINE转录产物的自然剪切位点后添加任意与基因组非同源序列(200bp以内)的完整SINE序列(可在其后添加与相应种类SINE功能对应的LINE序列或其蛋白编码序列以增加效率),并给予相应细胞,活体组织或 生物体。所用SINE序列尽量涵盖该物种或个体的所有SINE序列(可通过测序或阵列芯片等方法获得)以对全基因组上的所有CNV末端进行精确修改。
也可将全基因组截为互相一定重叠的长片段(重叠长度在一个套索结构的长度以上),并构建入载体在相应物种的体外细胞系过表达并产生套索结构,此后将上述制作的表达改造SINE序列(下游添加与相应种类SINE功能对应的LINE序列或其蛋白编码序列后可经RNA途径介导)的载体转入,将连接有所产生套索的部分SINE序列(由改造的SINE所产生)的具有生物活性的单链RNA核糖核蛋白复合体(RNP)或RNA通过序列特异性等性质及常规手段进行分离提纯,此后通过相应RNA或RNP途径发挥作用。
3.对基因组上的SINE元件和LINE进行改造:将基因组上SINE的启动子、转录产物的自然剪切位点或SINE上的其他序列或/和LINE的启动子、蛋白编码序列或其他序列通过本发明插入任意序列(500bp以内),使基因组上的SINE序列无法转录或转录后无法剪切或/和LINE序列无法转录或产生具有正常功能的蛋白。首先对待进行操作个体全基因组的SINE或LINE序列通过测序获取序列,选取其上的启动子、转录产物的自然剪切位点、蛋白编码序列或其他序列作为待插入点,发明中的上下游序列为SINE或LINE序列上相对于待插入位点的上下游序列,插入序列为任意序列。通过上述插入方法将任意序列插入至基因组上SINE或LINE上的相应位点。此外,通过上述基因编辑方法对基因组上SINE或LINE序列进行替换或删除使其失活亦可。
4.对CNV末端进行删除同时固定:选定需要操作的CNV末端,将其基因部分的3′端与部分SINE序列的交界处设为插入点,将上述插入方法中的上游序列设为CNV末端基因部分的3′端(2000bp以内),下游序列即为部分SINE序列(因此上述基因编辑方法中下游序列后面连接的部分SINE序列可省略),待插入序列为基因组上待删除序列(100000bp以内)上游紧邻的序列(2000bp以内)后接与基因组序列不同源的任意序列(2000bp以内)。载体构建完成后通过上述的DNA、RNA或RNP途径转入相应细胞,活体组织或生物体,使相应CNV末端插入基因组上待删除序列上游紧邻的序列后接非同源序列,当两段相同序列发生同源重组导致中间序列被删去后,非同源序列将同时阻碍CNV的进一步延伸。
5.抑制固有机制法:亦可直接抑制细胞或生物体固有的CNV延伸机制如通过RNA干扰等方式抑制SINE和LINE等的转录或其RNA及所编码的蛋白质如ORF1p和ORF2p蛋白的产生,通过特异性蛋白与该CNV延伸机制中相关的蛋白如ORF1p、ORF2p或剪切体等或复合体的功能结构结合以阻碍其功能,通过上述的基因编辑技术等对基因组上SINE如Alu和各类MIR等、各类LINE及其中的相应蛋白编码序列等进行改造使其失活或降低活性、对同源重组或错配修复机制上的相关 蛋白功能进行抑制或给予经改造的核苷类物质以阻碍逆转录的进行,从而通过抑制内在的CNV延伸机制实现阻碍基因组变化及稳定CNVs的作用。
由于SINE、LINE及其所表达的蛋白广泛存在于真核生物当中,因此可通过该技术对广泛的真核生物进行基因编辑操作。此外,尚可应用于具有基因改变的疾病治疗及改变或稳定与基因变化相关的细胞或生物体状态等。
在本发明中,定义某确定序列(如待插入序列)沿5′→3′方向,上游为确定序列的5′端之前,下游为确定序列的3′端之后,上游序列为位于确定序列的5′端之前的序列,下游序列为确定序列的3′端之后的序列。
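The gene transcription framework provided by the present invention, shown in Figure 5, comprises, in the 5′→3′ direction, a target-site upstream sequence, a sequence to be inserted, and a target-site downstream sequence. To clarify the positional relationships among the gene transcription framework, short interspersed elements (SINEs), long interspersed elements (LINEs), and promoters, several alternative arrangements are listed here. Figure 6 is a schematic of a promoter joined upstream of the gene transcription framework. The promoter may be an RNA polymerase I, RNA polymerase II, or RNA polymerase III promoter. The promoter may reside on the vector, with the gene transcription framework, SINE, LINE, and so on inserted downstream of it through the vector's restriction sites and expressed after transfection into cells. Alternatively, the promoter may be synthesized directly together with the gene transcription framework, SINE, LINE, and so on, and then inserted into the vector. Figure 7 shows a structure with a promoter upstream of the gene transcription framework and a SINE, partial SINE, or SINE-like element downstream. Figure 8 is a schematic of a promoter upstream of the framework and a LINE, ORF1p coding sequence, or ORF2p coding sequence downstream. Figure 9 is a schematic of a LINE, ORF1p coding sequence, or ORF2p coding sequence upstream of the framework, with a promoter upstream of that LINE or coding sequence. Figure 10 is a schematic of a promoter upstream of the framework, a SINE, partial SINE, or SINE-like element downstream, and a LINE, ORF1p coding sequence, or ORF2p coding sequence further downstream. Figure 11 is a schematic of a SINE, partial SINE, or SINE-like element downstream of the framework, a LINE, ORF1p coding sequence, or ORF2p coding sequence upstream of the framework, and a promoter upstream of that LINE or coding sequence. Figure 12 is a schematic in which the framework and the SINE, partial SINE, or SINE-like element do not share a promoter. Figure 13 is a schematic in which the framework and the LINE, ORF1p coding sequence, or ORF2p coding sequence do not share a promoter. Figure 14 shows a structure in which the framework does not share a promoter with the SINE, partial SINE, or SINE-like element and the LINE, ORF1p coding sequence, or ORF2p coding sequence; the SINE, partial SINE, or SINE-like element lies downstream of the LINE or coding sequence, and these two share one promoter. Figure 15 shows a structure in which the framework does not share a promoter with the SINE, partial SINE, or SINE-like element and the LINE and/or ORF1p coding sequence and/or ORF2p coding sequence; the SINE, partial SINE, or SINE-like element lies upstream of the LINE or coding sequence, and these two share one promoter. In all of the above, the framework, the SINE, partial SINE, or SINE-like element, and the LINE, ORF1p coding sequence, and/or ORF2p coding sequence are on the same vector; they may also be placed on different vectors, co-transfected into cells, and expressed from different promoters.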
在下述实施例中,由于所用材料为人源细胞,因此使用的SINE为灵长类特有的短散在元件Alu元件。Alu元件的完整序列如Seq ID No.1所示,部分Alu序列如Seq ID No.2所示。当使用材料为其他物种时,可以将短散在元件更换为对应物种的短散在元件,以利于表达。
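Materials
1. The pSIL-eGFP plasmid vector was purchased from Addgene (Plasmid 52675); the pBS-L1PA1-CH-mneo plasmid vector was purchased from Addgene (Plasmid 51288).
2. 10× restriction buffer (for NheI digestion): 330 mM Tris-acetate, 100 mM magnesium acetate, 660 mM potassium acetate, 1 mg/mL BSA; 10× restriction buffer (for SalI digestion): 500 mM Tris-HCl, 100 mM MgCl2, 1000 mM NaCl, 1 mg/mL BSA.
3. Restriction endonucleases NheI and SalI were purchased from ThermoFisher.
4. T4 DNA ligase and the 10× ligation buffer required for its use were purchased from Promega.
5. Entranster-H4000 transfection reagent was purchased from Beijing Engreen Biosystem Co., Ltd. (北京英格恩生物科技有限公司).
6. Blood/Cell/Tissue Genomic DNA Extraction Kit was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number DP304.
7. SuperReal PreMix Plus (SYBR Green) was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number FP205.
8. Magnetic-bead Tissue/Cell/Blood Total RNA Extraction Kit was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number DP761.
9. FastKing cDNA First-Strand Synthesis Kit was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number KR116.
10. TIANSeq mRNA Capture Kit was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number NR105.
11. Lipofectamine™ MessengerMAX™ mRNA transfection reagent was purchased from ThermoFisher.
12. Trypsin was purchased from Sigma-Aldrich, catalog number T1426.
13. DNase I was purchased from Tiangen Biotech (Beijing) Co., Ltd., catalog number RT411.
14. Chemical synthesis of primers and sequences was performed by Biosune Biotechnology (Shanghai) Co., Ltd. (铂尚生物技术(上海)有限公司).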
实施例1 DNA介导的外源性待插入序列插入基因组指定位点
VEGFA(血管内皮生长因子A)为PDGF/VEGF生长因子家族的成员。它编码一种肝素结合蛋白,以二硫键连接的同型二聚体的形式存在。这种生长因子在血管生成、血管发生和内皮细胞生长中有作用,可以诱导内皮细胞增殖、促进细胞迁移、 抑制凋亡并诱导血管通透性,对于生理和病理性血管生成都是必需的。
本实施例中以将外源序列插入到VEGFA基因中以证实本发明中DNA介导的基因组序列插入技术。
选取人类基因组中基因VEGFA中的一段459bp的序列,序列如Seq ID No.3所示:
ATTATGCGGATCAAACCTCACCAAGGCCAGCACATAGGAGAGATGAGCTTCCTACAGCACAACAAATGTGAATGCAGGTGAGGATGTAGTCACGGATTCATTATCAGCAAGTGGCTGCAGGGTGCCTGATCTGTGCCAGGGTTAAGCATGCTGTACTTTTTGGCCCCCGTCCAGCTTCCCGCTATGTGACCTTTGGCATTTTACTTCAATGTGCCTCAGTTTCTACATCTGTAAAATGGGCA C*AATAGTAGTATACTTCATAGCATTGTTATAATGATTAAACAAGTTATATATGAAAAGATTAAAACAGTGTTGCTCCATAATAAATGCTGTTTTTACTGTGATTATTATTGTTGTTATCCCTATCATTATCATCACCATCTTAACCCTTCCCTGTTTTGCTCTTTTCTCTCTCCCTACCCATTGCAGACCAAAGAAAGATAGAGCAAGACAAGAAAA,其中*为选定的插入位点(靶位点),插入位点前为VEGFA基因中插入位点上游序列(靶位点上游序列),插入位点后为VEGFA基因中插入位点下游序列(靶位点下游序列)。在插入位点处加入随机设计的非同源序列作为待插入序列,成为基因转录框架,为了使该基因转录框架能够插入到表达载体中,在两端添加限制性内切酶NheI酶切位点及保护碱基。完整序列如Seq ID No.4所示:
Figure PCTCN2021134710-appb-000004
Figure PCTCN2021134710-appb-000005
其中,下划线表示的为随机设计的外源非同源序列(即待插入序列),长度为38bp,在序列两端为NheI酶切位点及保护碱基(斜体加粗),采用化学合成方法获得该序列,命名为VEGFA1。将外源待插入序列设计为与VEGFA基因序列非同源的序列的目的为方便在后面的实验中特异性检测其的基因组上靶位点处的插入。
同时设计在载体上插入位点前后添加VEGFA基因中插入位点上游序列和下游序列的短序列来验证不同长度的靶位点上游序列和下游序列对于插入效果的影响。
选择上述VEGFA序列插入位点前后各10bp,与非同源序列及NheI酶切位点及保护碱基设计成短序列,如Seq ID No.5所示:
Figure PCTCN2021134710-appb-000006
Figure PCTCN2021134710-appb-000007
其中,下划线表示的为随机设计的外源非同源序列(即待插入序列),长度为38bp,在序列两端为NheI酶切位点及保护碱基(斜体加粗),采用化学合成方法获得该序列,命名为VEGFA2。
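The construction of VEGFA1 (full flanks) and VEGFA2 (10 bp on each side of the site) follows one pattern: split the genomic sequence at the `*` marker, optionally trim the flanks, place the insert at the site, and add restriction sites with protective bases at both ends. A minimal sketch is given below. The actual 38 bp insert and the protective bases appear only as images in the source, so the `N` placeholder insert and the `CTA` protective bases used here are assumptions; `GCTAGC` is the NheI recognition site.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence


def build_framework(marked_seq, insert, flank=None, site=NHEI_SITE, protect=""):
    """Assemble protect + site + upstream + insert + downstream + site + protect.

    marked_seq : genomic sequence with '*' at the insertion site
                 (whitespace left over from text extraction is ignored)
    flank      : if given, keep only this many bases on each side of the site
    """
    cleaned = "".join(marked_seq.split()).upper()
    if cleaned.count("*") != 1:
        raise ValueError("sequence must contain exactly one '*' marker")
    upstream, downstream = cleaned.split("*")
    if flank is not None:
        upstream = upstream[-flank:]
        downstream = downstream[:flank]
    core = upstream + insert + downstream
    return protect + site + core + site + protect


if __name__ == "__main__":
    # Short excerpt around the insertion site of Seq ID No.3 (not the full 459 bp).
    excerpt = "AAAATGGGCA C*AATAGTAGTATAC"
    placeholder_insert = "N" * 38  # stands in for the 38 bp non-homologous insert
    print(build_framework(excerpt, placeholder_insert, flank=10, protect="CTA"))
```

Passing `flank=None` keeps the full flanks, which corresponds to the VEGFA1-style design; `flank=10` corresponds to the VEGFA2-style design.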
酶切位点的选择仅用于方便构建质粒,可根据不同载体进行更换。
将VEGFA1和VEGFA2分别插入到质粒载体pSIL-eGFP中,构建质粒pSIL-eGFP-VEGFA1和pSIL-eGFP-VEGFA2,具体过程为:
将VEGFA1、VEGFA2、质粒载体pSIL-eGFP分别进行酶切,反应体系如表1所示:
表1酶切反应体系
Figure PCTCN2021134710-appb-000008
反应条件为:37℃下孵育1h后,然后升温到65℃温育20min使内切酶失活,电泳,回收酶切产物。
将酶切后的VEGFA1或VEGFA2分别与酶切成线性的质粒载体pSIL-eGFP进行连接,反应体系如表2所示:
表2连接反应体系
Figure PCTCN2021134710-appb-000009
反应条件为:16℃孵育16h,然后升温至70℃温育10min灭活连接酶,电泳、纯化得到质粒pSIL-eGFP-VEGFA1和质粒pSIL-eGFP-VEGFA2。质粒经测序验证正确。
由于pSIL-eGFP质粒自身带有CMV启动子,因此只要将基因转录框架插入到CMV启动子后,即可通过RNA聚合酶II启动转录。
设计Alu表达序列,将Alu序列、非同源序列(18bp)、TTTTT、TTTTAA*n连接到一起,这里n=6,并在序列两端添加SalI酶切位点及相应保护碱基,得到序列如Seq ID No.6所示:
Figure PCTCN2021134710-appb-000010
Figure PCTCN2021134710-appb-000011
其中,两端斜体加粗为SalI酶切位点及相应保护碱基,上游SalI酶切位点及相应保护碱基后为Alu序列,下划线为加至Alu后的非同源序列(18bp),用于标记Alu以便于在后面的实验中检测其转录产物与基因转录框架(包含待插入序列)转录产生的套索的连接,以验证实验的作用机制,波浪线为转录的终止子,双下划线为6个TTTTAA的重复序列,用于将RNA转化为DNA,属于基因组中常见重复序列,可以不添加或添加多个,本实施例中选择添加6个。采用化学合成方法获得该序列,命名为Alu1。
将Alu1、质粒pSIL-eGFP-VEGFA1和质粒pSIL-eGFP-VEGFA2分别用SalI进行酶切,酶切反应体系如表3所示。
表3酶切反应体系
Figure PCTCN2021134710-appb-000012
反应条件为:37℃孵育3h,然后加热到80℃温育10min灭活内切酶,电泳、回收酶切产物。
将酶切后的Alu1分别与酶切成线性的质粒pSIL-eGFP-VEGFA1和质粒pSIL-eGFP-VEGFA2进行连接,反应体系如表4所示。
表4连接反应体系
Figure PCTCN2021134710-appb-000013
反应条件为:16℃下孵育16h,此后升温至70℃温育10min灭活连接酶,电泳,回收,获得质粒pSIL-eGFP-VEGFA1-Alu1(如图16所示)及pSIL-eGFP-VEGFA2-Alu1。质粒经测序验证正确。
由于pSIL-eGFP质粒自身带有U6启动子(属于RNA聚合酶III依赖的启动子),因此只要将Alu序列插入到U6启动子后,即可通过RNA聚合酶III启动转录。
将pSIL-eGFP-VEGFA1-Alu1或pSIL-eGFP-VEGFA2-Alu1转染到Hela细胞中检验随机设计的外源序列的插入效率,为了提高插入效率,在Hela细胞中共转染表达ORF1p和ORF2p(LINE)的质粒pBS-L1PA1-CH-mneo,并设计相应对照组。实验组和对照组中共转染质粒如表5所示。
表5实验分组
组别 共转染质粒
对照1组 pSIL-eGFP+pBS-L1PA1-CH-mneo
实验1组 pSIL-eGFP-VEGFA1-Alu1+pBS-L1PA1-CH-mneo
实验2组 pSIL-eGFP-VEGFA1+pBS-L1PA1-CH-mneo
实验3组 pSIL-eGFP-VEGFA2-Alu1+pBS-L1PA1-CH-mneo
实验4组 pSIL-eGFP-VEGFA1-Alu1
从分组可以看出,对照1组为共转染原始pSIL-eGFP和pBS-L1PA1-CH-mneo,其中不包含基因转录框架序列和Alu1序列,实验1组为共转染pSIL-eGFP-VEGFA1-Alu1和pBS-L1PA1-CH-mneo,其中包含含有VEGFA基因上靶位点上下游长序列的基因转录框架和Alu1序列及pBS-L1PA1-CH-mneo;实验2组为共转染pSIL-eGFP-VEGFA1和pBS-L1PA1-CH-mneo,其中包含含有VEGFA基因上靶位点上下游长序列的基因转录框架及pBS-L1PA1-CH-mneo,不包含Alu1序列;实验3组为共转染pSIL-eGFP-VEGFA2-Alu1和pBS-L1PA1-CH-mneo,其中包含含有VEGFA基因上靶位点上下游短序列的基因转录框架和Alu1序列及pBS-L1PA1-CH-mneo;实验4组为转染pSIL-eGFP-VEGFA1-Alu1而不转染pBS-L1PA1-CH-mneo,其中包含含有VEGFA基因上靶位点上下游长序列的基因转录框架和Alu1序列,不包含pBS-L1PA1-CH-mneo。每组设3个平行,每个平行均为一个培养有Hela细胞的6孔板。
转染步骤为:将Hela细胞传代并铺于6孔板。传代次日,应用Entranster-H4000转染试剂进行转染。对于每板细胞的转染,取48μg或96μg(依照实验分组,若仅转染一种质粒则为48μg;若共转染两种质粒则每种质粒取48μg,总共96μg)构建好的质粒用300μL的无血清DMEM稀释,充分混匀;同时取120μL的Entranster-H4000试剂用300μL的无血清DMEM稀释,充分混匀后,室温静置5min。之后将制备好的两种液体混合并充分混匀并室温静置15min,制成转染复合物。将转染复合物加入到每孔含2ml含10%胎牛血清的DMEM培养液进行转染。待细胞长至90%左右融合时传代,传代后重复上述操作,细胞长至90%左右融合后取材进 行后续操作。
提取转染后细胞DNA:吸去细胞培养基后,用PBS冲洗两遍细胞后,加入适量0.25%胰蛋白酶进行消化,在37℃下共消化20min,每5min进行15次吹打。当细胞悬浮后,加入含有血清的完全培养基终止反应。此后按照血液/细胞/组织基因组DNA提取试剂盒的产品说明书进行细胞DNA的提取,紫外分光光度计测定DNA浓度。
qPCR检测:
由于GAPDH基因不含有Alu序列,拷贝数稳定,因此将GAPDH基因作为内参基因。
检测GAPDH基因的上游引物序列如Seq ID No.7所示:5′–CACTGCCACCCAGAAGACTG-3′;下游引物序列如Seq ID No.8所示:5′-CCTGCTTCACCACCTTCTTG-3′。
设计引物对1和引物对2,其中,引物对1的上游引物序列如Seq ID No.9所示:5′-CCCAGGGTTGTCCCATCT-3′;下游引物序列如Seq ID No.10所示:5′-CCTCCTCTTATTCCGTAGC-3′。引物对1的上游引物序列位于完整VEGFA基因中,质粒上所用插入位点(靶位点)上游序列的更上游,不存在于质粒中,仅存于基因组中,引物对1的下游引物序列位于待插入的随机设计的非同源序列(待插入序列)的5′端的19bp序列。引物对2的上游序列如Seq ID No.11所示:5′-CACAACAGTCGTGGGTCG-3′;下游引物序列如Seq ID No.12所示:5′-GAGGGAGAAGTGCTAAAGTCAG-3′。引物对2的上游引物序列位于待插入的随机设计的非同源序列(待插入序列)的3′端的18bp序列,下游引物序列位于完整VEGFA基因中,质粒上所用插入位点(靶位点)下游序列的更下游,不存在于质粒中,仅存于基因组中。
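For a quick sanity check of the primers listed above, one can compute GC content, a reverse complement, and a rough melting temperature. The sketch below uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate for short oligonucleotides and is not stated in the source as the method used to choose the 50 °C and 54 °C annealing temperatures; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Primer sequences as given in the text (Seq ID No.7 to No.12).
PRIMERS = {
    "GAPDH forward (Seq ID No.7)": "CACTGCCACCCAGAAGACTG",
    "GAPDH reverse (Seq ID No.8)": "CCTGCTTCACCACCTTCTTG",
    "pair 1 forward (Seq ID No.9)": "CCCAGGGTTGTCCCATCT",
    "pair 1 reverse (Seq ID No.10)": "CCTCCTCTTATTCCGTAGC",
    "pair 2 forward (Seq ID No.11)": "CACAACAGTCGTGGGTCG",
    "pair 2 reverse (Seq ID No.12)": "GAGGGAGAAGTGCTAAAGTCAG",
}


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb Tm in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt", round(gc_fraction(seq), 2), wallace_tm(seq))
```

The reverse complement is useful for locating a downstream primer on the top strand of a reference sequence, for example to confirm that it falls within the inserted sequence.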
上述引物均通过化学合成获得。
qPCR反应体系如表6所示。
表6 qPCR反应体系
Figure PCTCN2021134710-appb-000014
细胞DNA模板分别为前述共转染后的对照1组、实验1组至实验4组中提取的DNA。
上述反应体系在冰上配制,配制好后盖上反应管,轻柔混匀后短暂离心,以确保所有组分均位于管底。每个6孔板细胞样本同时进行3次重复。
qPCR反应循环:
引物对1:95℃预变性15min;(95℃变性10s,50℃退火20s,72℃延伸20s)40个循环。GAPDH引物按照相同条件进行反应。
引物对2:95℃预变性15min;(95℃变性10s,54℃退火20s,72℃延伸20s)40个循环。GAPDH引物按照相同条件进行反应。
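The exponential growth phases of the amplification curves for GAPDH and for detection of the inserted sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Tables 7 and 8, where Table 7 gives the results for primer pair 1 and Table 8 those for primer pair 2. The PCR products were verified by sequencing.
Table 7. Results for primer pair 1 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000015 in the source)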
Figure PCTCN2021134710-appb-000016
实验1组与其他各组两两比较(N/A按照40.00计算),其拷贝数相对量明显高于其他组,均具有统计学意义(P<0.05),说明在基因转录框架中插入位点(靶位点)上下游存在更长的序列以及在Alu元件(SINE)、ORF1p和ORF2p(LINE)的充足表达下,基因编辑的效率最高;实验2组、3组及4组的拷贝数相对量均高于对照1组(N/A按照40.00计算),均具有统计学意义(P<0.05),说明在基因转录 框架中插入位点(靶位点)上下游更短的序列、细胞本身Alu元件(SINE)低或不表达或ORF1p及ORF2p(LINE)低或不表达下,基因编辑仍有效但效率较低。为了保证插入的高效性需要插入点两侧足够长度的序列以利于套索的形成。而基因转录框架中更长的靶位点上下游序列、ORF1p、ORF2p蛋白(LINE)和/或Alu元件(SINE)的额外表达可以提高编辑效率。
Table 8. Results for primer pair 2 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000017 in the source)
Figure PCTCN2021134710-appb-000018
实验1组与其他各组两两比较(N/A按照40.00计算),其拷贝数相对量明显高于其他组,均具有统计学意义(P<0.05),说明在基因转录框架中插入位点(靶位点)上下游存在更长的序列以及在Alu元件(SINE)、ORF1p和ORF2p(LINE)的充足表达下,基因编辑的效率最高;实验2组、3组及4组的拷贝数相对量均高于对照1组(N/A按照40.00计算),均具有统计学意义(P<0.05),说明在基因转录框架中插入位点(靶位点)上下游更短的序列、细胞本身Alu元件(SINE)低或不表达或ORF1p及ORF2p(LINE)低或不表达下,基因编辑仍有效但效率较低。为了保证插入的高效性需要插入点两侧足够长度的序列以利于套索的形成。而基因转录 框架中更长的靶位点上下游序列、ORF1p、ORF2p蛋白(LINE)和/或Alu元件(SINE)的额外表达可以提高编辑效率。
综合表7和表8的实验结果可知待插入序列的两端均被插入待插入位点(靶位点),意味着待插入序列被完整的插入待插入位点。实验1、2、3及4组均可一定程度上将非同源序列完整的插入基因组,以实验1组效率最高。
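The relative copy-number values discussed above come from the 2^(-ΔΔCt) method with GAPDH as the reference and undetected ("N/A") reactions counted as Ct = 40.00. The sketch below shows that calculation; the function names and the example Ct values are assumptions for illustration and are not data from the tables.

```python
UNDETECTED_CT = 40.00  # value substituted for "N/A" reactions, as stated in the text


def ct_or_cap(ct):
    """Return the Ct as a float, substituting 40.00 when the reaction gave no Ct."""
    return UNDETECTED_CT if ct is None else float(ct)


def relative_quantity(target_ct, reference_ct, control_target_ct, control_reference_ct):
    """Relative quantity of the target in a sample versus a control group.

    delta Ct       = Ct(target) - Ct(reference gene, here GAPDH)
    delta-delta Ct = delta Ct(sample) - delta Ct(control)
    result         = 2 ** (-delta-delta Ct)
    """
    delta_sample = ct_or_cap(target_ct) - ct_or_cap(reference_ct)
    delta_control = ct_or_cap(control_target_ct) - ct_or_cap(control_reference_ct)
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: target detected at 25.0 in the sample, undetected in the control.
    print(relative_quantity(25.0, 20.0, None, 20.0))
```

Because an undetected control is capped at 40.00, the reported fold change in such cases is a lower bound set by the cap, not a measured ratio.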
实施例2检测基因转录框架(含有外源性待插入序列)转录形成的套索结构与含有部分SINE序列(以Alu元件为例)的RNA片段(为Alu元件的转录产物经于自然剪切位点剪切所得)的连接
1.对实施例1中实验1组及对照1组提取细胞总RNA:
具体过程为:取其中经转染后的细胞吸去细胞培养基后,用PBS冲洗两遍细胞后,加入适量0.25%胰蛋白酶进行消化,在37℃下共消化20min,每5min进行15次吹打。当细胞悬浮后,加入含有血清的完全培养基终止反应。将含有细胞的溶液转移至RNase-Free的离心管中,300g离心5min后,收集沉淀并吸去所有上清液。按照磁珠法组织/细胞/血液总RNA提取试剂盒的说明书进行总RNA提取。
2.逆转录合成cDNA模板:
按照FastKing cDNA第一链合成试剂盒的说明书去除提取的总RNA中的基因组DNA,然后进行cDNA的合成,紫外分光光度计测定所合成cDNA的浓度,待后续检测。
3.qPCR检测:
以如Seq ID No.7和Seq ID No.8所示检测GAPDH基因的上下游引物序列作为内参参与检测。
设计用于检测含有外源性待插入序列的经转录形成的套索结构与Alu元件转录产物形成的含有部分Alu序列的RNA片段连接的引物对3,其中上游引物序列如Seq ID No.11所示:5′-CACAACAGTCGTGGGTCG-3′,上游引物位于外源性待插入序列上;下游引物序列如Seq ID No.13所示:5′-TACGGGCTCGCCTGATAG-3′,下游引物位于构建入质粒中的Alu序列后的非同源序列(18bp)位置。
上述引物均通过化学合成获得。
qPCR反应体系如表9所示。
表9 qPCR反应体系
Figure PCTCN2021134710-appb-000019
上述反应体系在冰上配制,配制好后盖上反应管,轻柔混匀后短暂离心,以确 保所有组分均位于管底。每个6孔板细胞样本同时进行3次重复。
qPCR反应循环:
95℃预变性15min;(95℃变性10s,54℃退火20s,72℃延伸20s)40个循环。GAPDH引物按照相同条件进行反应。
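The exponential growth phases of the amplification curves for GAPDH and for the ligation product between the transcribed lariat containing the sequence to be inserted and the RNA fragment containing the partial Alu sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 10. The PCR products were verified by sequencing.
Table 10. Results for primer pair 3 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000020 in the source)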
Figure PCTCN2021134710-appb-000021
由实验数据可知,实验1组的相对表达量明显高于对照1组,P<0.05具有统计学意义(N/A按照40.00计算),说明含有外源性待插入序列的套索结构确与Alu序列的转录产物(即含有部分Alu序列的RNA片段)连接。
从表10可以看出Alu序列的转录产物(即含有部分Alu序列的RNA片段)确与表达的待插入序列有所连接。
实施例3DNA介导的外源性待插入序列插入基因组指定位点
MMP2基因为基质金属蛋白酶(MMP)基因家族的成员,是锌依赖性酶,能够切割细胞外基质的成分和参与信号转导的分子。该基因编码的蛋白是一种胶原酶A,IV型胶原酶,在其催化位点包含三个纤维连接蛋白II型重复序列,允许变性的IV型和V型胶原和弹性蛋白结合。与大多数MMP家族成员不同,这种蛋白的活化可以发生在细胞膜上。这种酶可以通过蛋白酶在细胞外激活,也可以通过S-谷胱甘肽在细胞内激活,而不需要蛋白质水解去除原结构域。这种蛋白被认为参与多种途径,包括在神经系统、子宫内膜月经破裂、血管化调节和转移中的作用。该基因突变与温彻斯特综合征和结节性关节病骨溶解(NAO)综合征有关。选择性剪接导致编码不同亚型的多个转录变体。
本实施例中以将外源序列插入到MMP2基因中以证实本发明中DNA介导的基 因组序列插入技术。
选取人类基因组中基因MMP2中的一段479bp的序列,序列如Seq ID No.14所示:
AGCATGGCGATGGATACCCCTTTGACGGTAAGGACGGACTCCTGGCTCATGCCTTCGCCCCAGGCACTGGTGTTGGGGGAGACTCCCATTTTGATGACGATGAGCTATGGACCTTGGGAGAAGGCCAAGGTGAGAAAGGGGCCCTCTGCATGCCCCAGACCTTCTCTCCTGTCCTCTCTCCACTCCATTTGCTTGGACCAGAGA G*GTGGGAGGGGAGGAAAGTCACACATCTGGGTGAGTCAGAATCTTGGTCTCCAAAGAAGGCCTGGAGAAGTCCAACCTCCCCCTTCCATGTCACTCTTTAGTGGTCCGTGTGAAGTATGGGAACGCCGATGGGGAGTACTGCAAGTTCCCCTTCTTGTTCAATGGCAAGGAGTACAACAGCTGCACTGATACCGGCCGCAGCGATGGCTTCCTCTGGTGCTCCACCACCTACAACTTTGAGAAGGATGGCAAGTACGGCTTCTGTCCCCATGAAG,其中*为选定的插入位点(靶位点),插入位点前为MMP2基因中插入位点上游序列(靶位点上游序列),插入位点后为MMP2基因中插入位点下游序列(靶位点下游序列)。在插入位点处加入随机设计的非同源序列作为待插入序列,成为基因转录框架,为了使该基因转录框架能够插入到表达载体中,在两端添加限制性内切酶NheI酶切位点及保护碱基。完整序列如Seq ID No.15所示:
Figure PCTCN2021134710-appb-000022
Figure PCTCN2021134710-appb-000023
其中,下划线表示的为随机设计的外源非同源序列(即待插入序列),长度为103bp,在序列两端为NheI酶切位点及保护碱基(斜体加粗),采用化学合成方法获得该序列,命名为MMP2-1。
同时设计在载体上插入位点前后添加MMP2基因中插入位点上游序列和下游序列的短序列来验证不同长度的靶位点上游序列和下游序列对于插入效果的影响。
选择上述MMP2序列插入位点前后各10bp,与非同源序列及NheI酶切位点及保护碱基设计成短序列,如Seq ID No.16所示:
Figure PCTCN2021134710-appb-000024
Figure PCTCN2021134710-appb-000025
其中,下划线表示的为随机设计的外源非同源序列(即待插入序列),长度为103bp,在序列两端为NheI酶切位点及保护碱基(斜体加粗),采用化学合成方法获得该序列,命名为MMP2-2。
将MMP2-1和MMP2-2分别插入到质粒载体pSIL-eGFP中,构建质粒pSIL-eGFP-MMP2-1和pSIL-eGFP-MMP2-2,具体过程为:
将MMP2-1、MMP2-2、质粒载体pSIL-eGFP分别进行酶切,反应体系如表11所示:
表11酶切反应体系
Figure PCTCN2021134710-appb-000026
反应条件为:37℃下孵育1h后,然后升温到65℃温育20min使内切酶失活,电泳,回收酶切产物。
将酶切后的MMP2-1或MMP2-2分别与酶切成线性的质粒载体pSIL-eGFP进行连接,反应体系如表12所示:
表12连接反应体系
Figure PCTCN2021134710-appb-000027
反应条件为:16℃孵育16h,然后升温至70℃温育10min灭活连接酶,电泳、纯化得到质粒pSIL-eGFP-MMP2-1和质粒pSIL-eGFP-MMP2-2。质粒经测序验证正确。
设计Alu表达序列,将Alu序列、非同源序列(18bp)、TTTTT、TTTTAA*n连接到一起,这里n=6,并在序列两端添加SalI酶切位点及相应保护碱基,得到序列如Seq ID No.17所示:
Figure PCTCN2021134710-appb-000028
Figure PCTCN2021134710-appb-000029
Figure PCTCN2021134710-appb-000030
其中,两端斜体加粗为SalI酶切位点及相应保护碱基,上游SalI酶切位点及相应保护碱基后为Alu序列,下划线为加至Alu后的非同源序列(18bp),波浪线为转录的终止子,双下划线为6个TTTTAA的重复序列。采用化学合成方法获得该序列,命名为Alu2。
将Alu2、质粒pSIL-eGFP-MMP2-1和质粒pSIL-eGFP-MMP2-2分别用SalI进行酶切,酶切反应体系如表13所示。
表13酶切反应体系
Figure PCTCN2021134710-appb-000031
反应条件为:37℃孵育3h,然后加热到80℃温育10min灭活内切酶,电泳、回收、收集酶切产物。
The digested Alu2 was ligated separately to the linearized plasmids pSIL-eGFP-MMP2-1 and pSIL-eGFP-MMP2-2; the reaction system is shown in Table 14.
表14连接反应体系
Figure PCTCN2021134710-appb-000032
反应条件为:16℃下孵育16h,此后升温至70℃温育10min灭活连接酶,电泳,回收,获得质粒pSIL-eGFP-MMP2-1-Alu2及pSIL-eGFP-MMP2-2-Alu2。质粒经测序验证正确。
将pSIL-eGFP-MMP2-1-Alu2或pSIL-eGFP-MMP2-2-Alu2转染到Hela细胞中检验随机设计的外源序列的插入效率,为了提高插入效率,在Hela细胞中共转染表达ORF1p和ORF2p(LINE)的质粒pBS-L1PA1-CH-mneo,并设计相应对照组。实验组和对照组中共转染质粒如表15所示。
表15实验分组
组别 共转染质粒
对照1组 pSIL-eGFP+pBS-L1PA1-CH-mneo
实验5组 pSIL-eGFP-MMP2-1-Alu2+pBS-L1PA1-CH-mneo
实验6组 pSIL-eGFP-MMP2-1+pBS-L1PA1-CH-mneo
实验7组 pSIL-eGFP-MMP2-2-Alu2+pBS-L1PA1-CH-mneo
实验8组 pSIL-eGFP-MMP2-1-Alu2
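As the grouping shows, control group 1 was co-transfected with the original pSIL-eGFP and pBS-L1PA1-CH-mneo and contains neither the gene transcription framework sequence nor the Alu2 sequence. Experimental group 5 was co-transfected with pSIL-eGFP-MMP2-1-Alu2 and pBS-L1PA1-CH-mneo, and thus contains the gene transcription framework with the long upstream and downstream target-site sequences of the MMP2 gene, the Alu2 sequence, and pBS-L1PA1-CH-mneo. Experimental group 6 was co-transfected with pSIL-eGFP-MMP2-1 and pBS-L1PA1-CH-mneo, and thus contains the framework with the long upstream and downstream sequences and pBS-L1PA1-CH-mneo, but no Alu2 sequence. Experimental group 7 was co-transfected with pSIL-eGFP-MMP2-2-Alu2 and pBS-L1PA1-CH-mneo, and thus contains the framework with the short upstream and downstream sequences, the Alu2 sequence, and pBS-L1PA1-CH-mneo. Experimental group 8 was transfected with pSIL-eGFP-MMP2-1-Alu2 without pBS-L1PA1-CH-mneo, and thus contains the framework with the long upstream and downstream sequences and the Alu2 sequence, but no pBS-L1PA1-CH-mneo. Each group had 3 replicates, each replicate being one 6-well plate of cultured U251 (human glioma) cells.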
转染步骤为:将U251(人胶质瘤)细胞传代并铺于6孔板。传代次日,应用Entranster-H4000转染试剂进行转染。对于每板细胞的转染,取48μg或96μg(依照实验分组,若仅转染一种质粒则为48μg;若共转染两种质粒则每种质粒取48μg,总共96μg)构建好的质粒用300μL的无血清DMEM稀释,充分混匀;同时取120μL的Entranster-H4000试剂用300μL的无血清DMEM稀释,充分混匀后,室温静置5min。之后将制备好的两种液体混合并充分混匀并室温静置15min,制成转染复合物。将转染复合物加入到每孔含2ml含10%胎牛血清的DMEM培养液进行转染。待细胞长至90%左右融合时传代,传代后重复上述操作,细胞长至90%左右融合后取材进行后续操作。
提取转染后细胞DNA:吸去细胞培养基后,用PBS冲洗两遍细胞后,加入适量0.25%胰蛋白酶进行消化,在37℃下共消化20min,每5min进行15次吹打。当细胞悬浮后,加入含有血清的完全培养基终止反应。此后按照血液/细胞/组织基因组DNA提取试剂盒的产品说明书进行细胞DNA的提取,紫外分光光度计测定DNA浓度。
qPCR检测:
由于GAPDH基因不含有Alu序列,拷贝数稳定,因此将GAPDH基因作为内参基因。
检测GAPDH基因的上游引物序列如Seq ID No.7所示;下游引物序列如Seq ID No.8所示。
设计引物对4,其上游引物序列如Seq ID No.18所示:5′-TTTCAGGGTCTAGGTGGC-3′;下游引物序列如Seq ID No.19所示:5′-AAATGCTTTCTCCGCTCT-3′。引物对4的上游引物序列位于完整MMP2基因中,质粒上所用插入位点(靶位点)上游序列的更上游,不存在于质粒中,仅存于基因组中;引物对4的下游引物序列位于待插入的随机设计的非同源序列(待插入序列)上。
上述引物均通过化学合成获得。
qPCR反应体系如表16所示。
表16 qPCR反应体系
Figure PCTCN2021134710-appb-000034
细胞DNA模板分别为前述共转染后的对照1组、实验5组至实验8组中提取的DNA。
上述反应体系在冰上配制,配制好后盖上反应管,轻柔混匀后短暂离心,以确保所有组分均位于管底。每个6孔板细胞样本同时进行3次重复。
qPCR反应循环:
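Primer pair 4: pre-denaturation at 95 °C for 15 min; 40 cycles of (denaturation at 95 °C for 10 s, annealing at 50 °C for 20 s, extension at 72 °C for 20 s). The GAPDH primers were run under the same conditions.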
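The exponential growth phases of the amplification curves for GAPDH and for detection of the inserted sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 17. The PCR products were verified by sequencing.
Table 17. Results for primer pair 4 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000035 in the source)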
Figure PCTCN2021134710-appb-000036
Figure PCTCN2021134710-appb-000037
实验5组与其他各组两两比较(N/A按照40.00计算),其拷贝数相对量明显高于其他组,均具有统计学意义(P<0.05),说明在基因转录框架中插入位点(靶位点)上下游存在更长的序列以及在Alu元件(SINE)、ORF1p和ORF2p(LINE)的充足表达下,基因编辑的效率最高;实验6组、7组及8组的拷贝数相对量均高于对照1组(N/A按照40.00计算),均具有统计学意义(P<0.05),说明在基因转录框架中插入位点(靶位点)上下游更短的序列、细胞本身Alu元件(SINE)低或不表达或ORF1p及ORF2p(LINE)低或不表达下,基因编辑仍有效但效率较低。综合实验结果说明经本发明技术待插入序列被有效插入至基因组上的靶位点,而基因转录框架中更长的靶位点上下游序列、ORF1p、ORF2p蛋白(LINE)和/或Alu元件(SINE)的额外表达可以提高编辑效率。
实施例4检测基因转录框架(含有外源性待插入序列)转录形成的套索结构与含有部分SINE序列(以Alu元件为例)的RNA片段(为Alu元件的转录产物经于自然剪切位点剪切所得)的连接
1.对实施例3中实验5组及对照1组提取细胞总RNA:
具体过程为:取其中经转染后的细胞吸去细胞培养基后,用PBS冲洗两遍细胞后,加入适量0.25%胰蛋白酶进行消化,在37℃下共消化20min,每5min进行15次吹打。当细胞悬浮后,加入含有血清的完全培养基终止反应。将含有细胞的溶液转移至RNase-Free的离心管中,300g离心5min后,收集沉淀并吸去所有上清液。按照磁珠法组织/细胞/血液总RNA提取试剂盒的说明书进行总RNA提取。
2.逆转录合成cDNA模板:
按照FastKing cDNA第一链合成试剂盒的说明书去除提取的总RNA中的基因 组DNA,然后进行cDNA的合成,紫外分光光度计测定所合成cDNA的浓度,待后续检测。
3.qPCR检测:
以如Seq ID No.7和Seq ID No.8所示检测GAPDH基因的上下游引物序列作为内参参与检测。
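Primer pair 5 was designed to detect the joining of the lariat formed by transcription of the framework containing the exogenous sequence to be inserted with the RNA fragment containing the partial Alu sequence derived from the Alu element transcript. The upstream primer sequence is shown in Seq ID No.20: 5′-GGCATAATGATGTGGCTGTT-3′; the downstream primer sequence is shown in Seq ID No.21: 5′-TCTGTTGGCTCGCTCTTG-3′. The upstream primer lies on the exogenous sequence to be inserted, and the downstream primer lies on the non-homologous sequence (18 bp) that follows the Alu sequence built into the plasmid.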
上述引物均通过化学合成获得。
qPCR反应体系如表18所示。
表18 qPCR反应体系
Figure PCTCN2021134710-appb-000038
上述反应体系在冰上配制,配制好后盖上反应管,轻柔混匀后短暂离心,以确保所有组分均位于管底。每个6孔板细胞样本同时进行3次重复。
qPCR反应循环:
95℃预变性15min;(95℃变性10s,52℃退火20s,72℃延伸20s)40个循环。GAPDH引物按照相同条件进行反应。
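The exponential growth phases of the amplification curves for GAPDH and for the product in which the transcribed lariat containing the sequence to be inserted is joined to the RNA fragment containing the partial Alu sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 19. The PCR products were verified by sequencing.
Table 19. Results for primer pair 5 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000039 in the source)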
Figure PCTCN2021134710-appb-000040
Figure PCTCN2021134710-appb-000041
从表19可以看出实验5组的相对表达量明显高于对照1组,P<0.05具有统计学意义(N/A按照40.00计算)。因此,Alu序列的转录产物确与表达的待插入序列所形成的套索结构有所连接。
实施例5检验本发明中基因编辑技术的靶向准确性
将Seq ID No.15所示序列MMP2-1在随机设计的非同源序列(即待插入序列)上游的第5bp到第10bp的6bp替换为CGATGA,得到如Seq ID No.22所示序列:
Figure PCTCN2021134710-appb-000042
Figure PCTCN2021134710-appb-000043
其中波浪线部分原为GACCAG,替换为CGATGA,其余序列与Seq ID No.15相同,采用化学合成方法获得该序列,命名为MMP2-3。
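The mismatch control above changes six bases of the upstream flank while leaving everything else intact. The sketch below expresses that substitution, counting positions backwards from the insertion site (position 1 is the base adjacent to the site). This counting convention is an inference: it is the reading under which positions 5-10 of the MMP2 upstream flank in Seq ID No.14 spell GACCAG, as the text states. The function name is illustrative, and only the last 17 bases of the flank are used in the example.

```python
def replace_upstream_window(upstream, first, last, replacement):
    """Replace bases first..last of the upstream flank, counted back from the site.

    Position 1 is the last base of `upstream` (adjacent to the insertion site).
    The replacement is written 5'->3' and must match the window length.
    """
    if not 1 <= first <= last <= len(upstream):
        raise ValueError("window lies outside the upstream flank")
    start = len(upstream) - last
    end = len(upstream) - first + 1
    if len(replacement) != end - start:
        raise ValueError("replacement length must equal window length")
    return upstream[:start] + replacement + upstream[end:]


if __name__ == "__main__":
    # Last 17 bases of the MMP2 upstream flank from Seq ID No.14.
    mmp2_tail = "TTGCTTGGACCAGAGAG"
    print(mmp2_tail[-10:-4])                                    # original window
    print(replace_upstream_window(mmp2_tail, 5, 10, "CGATGA"))  # mutated flank
```

The same helper applies to the IT15 control in Example 7, where positions 10-15 (originally TGTGTG) are replaced with GGACAT.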
将MMP2-3采用实施例3中的方法制备得到质粒pSIL-eGFP-MMP2-3-Alu2。
将质粒pSIL-eGFP-MMP2-3-Alu2转染到U251(人胶质瘤)细胞中,并在U251(人胶质瘤)细胞中共转染表达ORF1p和ORF2p(LINE)的质粒pBS-L1PA1-CH-mneo作为实验组,前述共转染pSIL-eGFP-MMP2-1-Alu2和pBS-L1PA1-CH-mneo的细胞作为对照组,具体分组如表20所示。
表20实验分组
组别 共转染质粒
对照2组 pSIL-eGFP-MMP2-1-Alu2+pBS-L1PA1-CH-mneo
实验9组 pSIL-eGFP-MMP2-3-Alu2+pBS-L1PA1-CH-mneo
每组设3个平行,每个平行均为一个培养有U251(人胶质瘤)细胞的6孔板。转染及转染后提取细胞DNA的方法同实施例3。
qPCR检测:
由于GAPDH基因不含有Alu序列,拷贝数稳定,因此将GAPDH基因作为内参基因。
检测GAPDH基因的上游引物序列如Seq ID No.7所示;下游引物序列如Seq ID No.8所示。
设计插入序列的引物对4,其上游引物序列如Seq ID No.18所示,下游引物序列如Seq ID No.19所示。
采用实施例3的qPCR反应体系和反应循环进行qPCR。
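The exponential growth phases of the amplification curves for GAPDH and for detection of the inserted sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 21. The PCR products were verified by sequencing.
Table 21. Results for primer pair 4 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000044 in the source)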
Figure PCTCN2021134710-appb-000045
从表21可以看出,实验组的拷贝数相对量明显低于对照组(N/A按照40.00计算),具有统计学意义(P<0.05),意味着当载体上插入位点(靶位点)的上游序列与基因组上插入位点(靶位点)的上游序列不一致时,待插入序列难以插入基因组。说明经本发明实施的基因编辑具有较高的靶向准确性。
实施例6SINE序列(以Alu序列为例)直接连接方式插入外源性待插入序列
IT15基因为亨廷顿舞蹈病的致病基因,本实施例中以将外源性序列插入到IT15基因中以证实DNA介导的以SINE序列(以Alu序列为例)直接连接方式进行的基因组序列插入技术。
选取人类基因组中基因IT15中的一段160bp的序列,序列如Seq ID No.23所示:
ATGCTATTCATAATCACATTCGTTTGTTTGAACCTCTTGTTATAAAAGCTTTAAAACAGTACACGACTACAACATGTGTGCAGTTACA G*AAGCAGGTTTTAGATTTGCTGGCGCAGCTGGTTCAGTTACGGGTTAATTACTGTCTTCTGGATTCAGATCA,其中*为选定的插入位点(靶位点),插入位点前为IT15基因中插入位点上游 序列(靶位点上游序列),插入位点后为IT15基因中插入位点下游序列(靶位点下游序列)。
在插入位点处加入随机设计的非同源序列作为待插入序列,在靶位点下游序列的下游连接部分Alu序列,成为基因转录框架,为了使该基因转录框架能够插入到表达载体中,在两端添加限制性内切酶NheI酶切位点及保护碱基,得到的完整序列如Seq ID No.24所示:
Figure PCTCN2021134710-appb-000046
Figure PCTCN2021134710-appb-000047
其中,下划线表示的为随机设计的外源非同源序列(即待插入序列),长度为60bp,在序列两端为NheI酶切位点及保护碱基(斜体加粗),在IT15基因插入位点下游序列(靶位点下游序列)与3′端NheI酶切位点及保护碱基序列之间为部分Alu序列(阴影部分)。采用化学合成方法获得该序列,命名为IT15-1。
这里选择部分Alu序列连接于靶位点下游序列的下游是模拟生物体内SINE(Alu元件)转录产物经细胞内作用(于SINE转录产物中的自然剪切位点处剪切)后仅保留逆转录功能结构并与pre-mRNA经剪切产生的套索结构连接的状态。
将IT15-1插入到质粒载体pBS-L1PA1-CH-mneo中,构建质粒pBS-L1PA1-CH-mneo-IT15-1,如图17所示,具体过程为:
将IT15-1和pBS-L1PA1-CH-mneo分别进行酶切,反应体系如表22所示:
表22酶切反应体系
Figure PCTCN2021134710-appb-000048
反应条件为:37℃下孵育1h后,然后升温到65℃温育20min使内切酶失活,电泳,回收酶切产物。
将酶切后的IT15-1与酶切成线性的质粒载体pBS-L1PA1-CH-mneo进行连接,反应体系如表23所示:
表23连接反应体系
Figure PCTCN2021134710-appb-000049
反应条件为:16℃孵育16h,然后升温至70℃温育10min灭活连接酶,电泳、纯化得到质粒pBS-L1PA1-CH-mneo-IT15-1。质粒经测序验证正确。
由于pBS-L1PA1-CH-mneo质粒自身带有CMV启动子,因此只要将表达框架插入到CMV启动子后,即可通过RNA聚合酶II启动转录。
实验分组:将转染pBS-L1PA1-CH-mneo-IT15-1质粒的组设为实验10组;将转染未改造的pBS-L1PA1-CH-mneo质粒的组设为对照3组。每组设3个平行,每个平行均为一个培养有Hela细胞的6孔板。
转染步骤为:将Hela细胞传代并铺于6孔板。传代次日,应用Entranster-H4000转染试剂进行转染。对于每板细胞的转染,取48μg构建好的质粒用300μL的无血清DMEM稀释,充分混匀;同时取120μL的Entranster-H4000试剂用300μL的无血清DMEM稀释,充分混匀后,室温静置5min。之后将制备好的两种液体混合并充分混匀并室温静置15min,制成转染复合物。将转染复合物加入到每孔含2ml含10%胎牛血清的DMEM培养液进行转染。待细胞长至90%左右融合时传代,传代后重复上述操作,细胞长至90%左右融合后取材进行后续操作。
提取转染后细胞DNA:吸去细胞培养基后,用PBS冲洗两遍细胞后,加入适量0.25%胰蛋白酶进行消化,在37℃下共消化20min,每5min进行15次吹打。当细胞悬浮后,加入含有血清的完全培养基终止反应。此后按照血液/细胞/组织基因组DNA提取试剂盒的产品说明书进行细胞DNA的提取,紫外分光光度计测定DNA浓度。
qPCR检测:
由于GAPDH基因不含有Alu序列,拷贝数稳定,因此将GAPDH基因作为内参基因。
检测GAPDH基因的上游引物序列如Seq ID No.7所示;下游引物序列如Seq ID No.8所示。
设计引物对6,其上游引物序列如Seq ID No.25所示:5′-GAAATTGGTTTGAGCAGGAG-3′;下游引物序列如Seq ID No.26所示:5′-CGATTGGATGGCAGTAGC-3′。引物对6的上游引物序列位于完整IT15基因中,质粒上所用插入位点(靶位点)上游序列的更上游,不存在于质粒中,仅存于基因组中,引物对6的下游引物序列位于待插入的随机设计的非同源序列(待插入序列)上。
上述引物均通过化学合成获得。
qPCR反应体系如表24所示。
表24 qPCR反应体系
Figure PCTCN2021134710-appb-000050
细胞DNA模板分别为转染后的对照3组、实验10组中提取的DNA。
上述反应体系在冰上配制,配制好后盖上反应管,轻柔混匀后短暂离心,以确保所有组分均位于管底。每个6孔板细胞样本同时进行3次重复。
qPCR反应循环:
引物对6:95℃预变性15min;(95℃变性10s,50℃退火20s,72℃延伸20s)40个循环。GAPDH引物按照相同条件进行反应。
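The exponential growth phases of the amplification curves for GAPDH and for detection of the inserted sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 25. The PCR products were verified by sequencing.
Table 25. Results for primer pair 6 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000051 in the source)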
Figure PCTCN2021134710-appb-000052
实验10组的拷贝数相对量明显高于对照3组(N/A按照40.00计算),具有统计学意义(P<0.05),说明待插入序列被有效插入至基因组上的靶位点。
实施例7检验本发明中基因编辑技术的靶向准确性
将Seq ID No.24所示序列IT15-1在随机设计的非同源序列(即待插入序列)上游的第10bp到第15bp的6bp替换为GGACAT,得到如Seq ID No.27所示序列:
Figure PCTCN2021134710-appb-000053
Figure PCTCN2021134710-appb-000054
Figure PCTCN2021134710-appb-000055
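Here, the wavy-underlined portion, originally TGTGTG, was replaced with GGACAT; the rest of the sequence is identical to Seq ID No.24. The sequence was obtained by chemical synthesis and named IT15-2.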
将IT15-2插入到质粒载体pBS-L1PA1-CH-mneo中,构建质粒pBS-L1PA1-CH-mneo-IT15-2,方法参考实施例6。
将转染pBS-L1PA1-CH-mneo-IT15-2质粒的Hela细胞组为实验11组;将转染pBS-L1PA1-CH-mneo-IT15-1质粒的Hela细胞组为对照4组。每组设3个平行,每个平行均为一个培养有Hela细胞的6孔板。
采用实施例6方法进行转染并提取转染后细胞DNA,进行qPCR检测。
由于GAPDH基因不含有Alu序列,拷贝数稳定,因此将GAPDH基因作为内参基因。
检测GAPDH基因的上游引物序列如Seq ID No.7所示;下游引物序列如Seq ID No.8所示。
使用引物对6进行qPCR,反应体系和反应循环如实施例6。
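The exponential growth phases of the amplification curves for GAPDH and for detection of the inserted sequence were examined; after confirming that they were approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. The results are shown in Table 26. The PCR products were verified by sequencing.
Table 26. Results for primer pair 6 (n = 3; the accompanying symbol is rendered as image PCTCN2021134710-appb-000056 in the source)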
Figure PCTCN2021134710-appb-000057
Figure PCTCN2021134710-appb-000058
实验组的拷贝数相对量明显低于对照组(N/A按照40.00计算),具有统计学意义(P<0.05),意味着当载体上插入位点(靶位点)上游序列与基因组上插入位点(靶位点)上游序列不一致时,待插入序列难以插入基因组上的靶位点。
结论:说明经本发明实施的基因组序列插入具有较高的靶向准确性。
从实施例1至实施例6可以看出,DNA介导的外源性序列插入基因组指定位点的方法可对真核细胞(如细胞系或原代细胞)进行有效的基因编辑,将序列以较高的效率和准确性靶向插入至目标位点。由不同组织细胞编辑的可行性,可知该方法可应用于各种细胞、组织及生物体(活体)内等。
实施例8 DNA介导的基因组上指定区域序列(待删除序列)的删除
随机选择人类基因组中基因MINK1中的一段序列,如Seq ID No.28所示:
Figure PCTCN2021134710-appb-000059
Figure PCTCN2021134710-appb-000060
其中,下划线部分为待删除序列,待删除序列前为待删除序列5′端紧邻的上游序列,待删除序列后为待删除序列3′端紧邻的下游序列,阴影部分为待删除序列的3′序列。
按照待删除序列的3′序列+待删除序列5′端紧邻的上游序列+待删除序列3′端紧邻的下游序列的顺序构建序列并在两端添加NheI酶切位点及相应保护碱基,序列如Seq ID No.29所示:
Figure PCTCN2021134710-appb-000061
Figure PCTCN2021134710-appb-000062
其中两端斜体加粗部分为NheI酶切位点及相应保护碱基。采用化学合成方法获得该序列,命名为MINK1-1。
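The deletion construct above reuses the insertion framework: the 3′ part of the sequence to be deleted serves as the target-site upstream sequence, a copy of the sequence immediately upstream of the deleted region serves as the insert, and the immediately downstream sequence serves as the target-site downstream sequence. The toy model below illustrates the intended outcome under that reading: insertion creates a second copy of the upstream flank, and recombination between the two identical copies removes what lies between them. All sequences are made-up placeholders and the string operations are a simplification, not a model of the cellular mechanism.

```python
def deletion_construct(deleted_3prime, upstream_flank, downstream_flank):
    """Order used in the text: 3' part of deleted region + upstream flank + downstream flank."""
    return deleted_3prime + upstream_flank + downstream_flank


def simulate_deletion(genome, upstream_flank, to_delete, downstream_flank):
    """Return (genome after insertion, genome after recombination) for a toy locus."""
    locus = upstream_flank + to_delete + downstream_flank
    if genome.count(locus) != 1:
        raise ValueError("toy locus must occur exactly once in the genome")
    # Step 1: a copy of the upstream flank is inserted at the junction between
    # the region to delete and its downstream flank.
    inserted_locus = upstream_flank + to_delete + upstream_flank + downstream_flank
    after_insertion = genome.replace(locus, inserted_locus, 1)
    # Step 2: recombination between the two identical upstream-flank copies
    # removes one copy together with the region between them.
    after_recombination = after_insertion.replace(
        upstream_flank + to_delete + upstream_flank, upstream_flank, 1
    )
    return after_insertion, after_recombination


if __name__ == "__main__":
    up, target, down = "ACGTAC", "GGGGCCCCTTTT", "TGCATG"  # placeholder sequences
    genome = "AAAA" + up + target + down + "CCCC"
    print(deletion_construct(target[-4:], up, down))
    print(simulate_deletion(genome, up, target, down))
```

In the control construct (MINK1-2), the upstream-flank copy is replaced by a non-homologous sequence, so in this toy picture there would be no second identical copy to recombine with.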
同时,用一段与MINK1基因非同源的序列来取代Seq ID No.29中待删除序列5′端紧邻的上游序列得到序列如Seq ID No.30所示:
Figure PCTCN2021134710-appb-000063
Figure PCTCN2021134710-appb-000064
其中,下划线部位为替换的与MINK1基因非同源的序列。采用化学合成方法获得该序列,命名为MINK1-2。
将MINK1-1和MINK1-2分别插入到质粒载体pSIL-eGFP中,构建质粒pSIL-eGFP-MINK1-1和pSIL-eGFP-MINK1-2,具体过程为:
将MINK1-1、MINK1-2、质粒载体pSIL-eGFP分别进行酶切,反应体系如表27所示:
表27酶切反应体系
Figure PCTCN2021134710-appb-000065
Figure PCTCN2021134710-appb-000066
反应条件为:37℃下孵育1h后,然后升温到65℃温育20min使内切酶失活,电泳,回收酶切产物。
将酶切后的MINK1-1或MINK1-2分别与酶切成线性的质粒载体pSIL-eGFP进行连接,反应体系如表28所示:
表28连接反应体系
Figure PCTCN2021134710-appb-000067
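Reaction conditions: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pSIL-eGFP-MINK1-1 and pSIL-eGFP-MINK1-2 were obtained. The plasmids were verified as correct by sequencing.
Alu1 prepared in Example 1, plasmid pSIL-eGFP-MINK1-1, and plasmid pSIL-eGFP-MINK1-2 were each digested with SalI and ligated, using the reaction system and conditions of Example 1, to give pSIL-eGFP-MINK1-1-Alu1 and pSIL-eGFP-MINK1-2-Alu1.
pSIL-eGFP-MINK1-1-Alu1 or pSIL-eGFP-MINK1-2-Alu1 was transfected into HeLa cells to test deletion of the sequence to be deleted. To increase deletion efficiency, the plasmid pBS-L1PA1-CH-mneo, which expresses ORF1p and ORF2p (LINE), was co-transfected into the HeLa cells, and a corresponding control group was designed. The group transfected with pSIL-eGFP-MINK1-1-Alu1 + pBS-L1PA1-CH-mneo was experimental group 12; the group transfected with pSIL-eGFP-MINK1-2-Alu1 + pBS-L1PA1-CH-mneo was control group 5. Each group had 3 replicates, each replicate being one 6-well plate of cultured HeLa cells.
Plasmid transfection and extraction of post-transfection cellular DNA were carried out as in Example 1.
qPCR detection:
Because the GAPDH gene contains no Alu sequence and has a stable copy number, GAPDH was used as the internal reference gene.
The upstream and downstream primers shown in Seq ID No.7 and Seq ID No.8, which detect GAPDH copy number, were used as the internal reference in the assay.
Primer pair 7 was designed. Its upstream primer is shown in Seq ID No.31: 5′-ACAGGGTATGGAGTGGAAAG-3′; its downstream primer is shown in Seq ID No.32: 5′-ATAGACGGGAAAGAAGGAAC-3′. Both the upstream and the downstream primer of primer pair 7 lie within the sequence to be deleted in the genomic MINK1 gene, and neither is present in the plasmid.
All of the above primers were obtained by chemical synthesis.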
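As a quick sanity check on primers such as primer pair 7, GC content and a rough melting temperature can be estimated. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is a coarse approximation for short oligonucleotides. The source does not state how the annealing temperatures were chosen, so this is an illustrative assumption and not the authors' method; the function names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


# Primer pair 7 as given in Seq ID No.31 and Seq ID No.32
forward_7 = "ACAGGGTATGGAGTGGAAAG"
reverse_7 = "ATAGACGGGAAAGAAGGAAC"

tm_forward = wallace_tm(forward_7)
tm_reverse = wallace_tm(reverse_7)
```

An annealing temperature is commonly set a few degrees below the lower of the two estimates, although salt-adjusted or nearest-neighbor models are usually preferred for real primer design.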
The qPCR reaction system is shown in Table 29.
Table 29. qPCR reaction system
Figure PCTCN2021134710-appb-000068
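The cellular DNA templates were the DNA extracted from control group 5 and experimental group 12 after the co-transfection described above.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 7: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and for the sequence to be deleted were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 30. The PCR products were verified as correct by sequencing.
Table 30. Results for primer pair 7 (n=3, Figure PCTCN2021134710-appb-000069)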
Figure PCTCN2021134710-appb-000070
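The relative copy number in experimental group 12 (N/A counted as 40.00) was significantly lower than in control group 5, and the difference was statistically significant (P<0.05), indicating that the sequence to be deleted was reduced in the genome.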
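The 2^(-ΔΔCt) relative method used throughout these examples can be sketched as below. This is a minimal sketch: it assumes that "N/A counted as 40.00" means a reaction with no detected amplification is assigned a Ct of 40.00 (the number of cycles run), and it averages replicate Ct values before taking differences. The function names and the numbers in the usage lines are hypothetical and are not data from the tables.

```python
CT_CAP = 40.0  # assumed Ct assigned to N/A (no amplification within 40 cycles)


def to_ct(value):
    """Map a missing Ct (None) to the cap; otherwise return it as float."""
    return CT_CAP if value is None else float(value)


def mean_ct(values):
    """Mean Ct over replicates, with N/A replaced by the cap."""
    cts = [to_ct(v) for v in values]
    return sum(cts) / len(cts)


def relative_quantity(target_exp, ref_exp, target_ctrl, ref_ctrl):
    """2^(-ddCt): target relative to reference gene, experiment relative to control."""
    d_exp = mean_ct(target_exp) - mean_ct(ref_exp)
    d_ctrl = mean_ct(target_ctrl) - mean_ct(ref_ctrl)
    return 2 ** (-(d_exp - d_ctrl))


# Hypothetical Ct values for three replicates per group
rq = relative_quantity(
    target_exp=[30, 30, 30], ref_exp=[20, 20, 20],
    target_ctrl=[28, 28, 28], ref_ctrl=[20, 20, 20],
)
```

The method assumes comparable amplification efficiency for target and reference, which is why the text checks that the exponential phases of the curves are approximately parallel before applying it.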
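Example 9. Sequence deletion with a directly linked SINE sequence (Alu sequence as an example)
The FMR1 gene is associated with fragile X syndrome, a hereditary form of intellectual disability. A segment of the gene was selected, as shown in Seq ID No.33: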
Figure PCTCN2021134710-appb-000071
Figure PCTCN2021134710-appb-000072
Figure PCTCN2021134710-appb-000073
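Here, the underlined portion is the sequence to be deleted; the sequence before it is the upstream sequence immediately adjacent to its 5′ end, the sequence after it is the downstream sequence immediately adjacent to its 3′ end, and the shaded portion is the 3′ sequence of the sequence to be deleted.
A sequence was constructed in the order: 3′ sequence of the sequence to be deleted + upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted + downstream sequence immediately adjacent to the 3′ end of the sequence to be deleted + partial Alu sequence, with NheI restriction sites and the corresponding protective bases added at both ends, as shown in Seq ID No.34: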
Figure PCTCN2021134710-appb-000074
Figure PCTCN2021134710-appb-000075
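Here, the bold italic portions at both ends are the NheI restriction sites and the corresponding protective bases; the underlined portion is the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted; the sequence before the underlined portion is the 3′ sequence of the sequence to be deleted; the sequence after the underlined portion is the downstream sequence immediately adjacent to the 3′ end of the sequence to be deleted; and the shaded portion is the partial Alu sequence. The sequence was obtained by chemical synthesis and named FMR1-1.
In parallel, the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted in Seq ID No.34 was replaced with a sequence non-homologous to the FMR1 gene, giving the sequence shown in Seq ID No.35: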
Figure PCTCN2021134710-appb-000076
Figure PCTCN2021134710-appb-000077
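Here, the underlined portion is the substituted sequence non-homologous to the FMR1 gene. The sequence was obtained by chemical synthesis and named FMR1-2.
FMR1-1 and FMR1-2 were each inserted into the plasmid vector pBS-L1PA1-CH-mneo to construct plasmids pBS-L1PA1-CH-mneo-FMR1-1 and pBS-L1PA1-CH-mneo-FMR1-2, as follows:
FMR1-1, FMR1-2, and pBS-L1PA1-CH-mneo were each digested; the reaction system is shown in Table 31:
Table 31. Restriction digestion reaction system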
Figure PCTCN2021134710-appb-000078
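Reaction conditions: incubation at 37°C for 1 h, followed by heating to 65°C for 20 min to inactivate the endonuclease; the digestion products were then separated by electrophoresis and recovered.
Digested FMR1-1 or FMR1-2 was ligated to the linearized plasmid vector pBS-L1PA1-CH-mneo; the reaction system is shown in Table 32:
Table 32. Ligation reaction system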
Figure PCTCN2021134710-appb-000079
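Reaction conditions: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pBS-L1PA1-CH-mneo-FMR1-1 and pBS-L1PA1-CH-mneo-FMR1-2 were obtained. The plasmids were verified as correct by sequencing.
Experimental groups: the plasmids obtained were transfected into HeLa cells. The group transfected with plasmid pBS-L1PA1-CH-mneo-FMR1-1 was experimental group 13; the group transfected with plasmid pBS-L1PA1-CH-mneo-FMR1-2 was control group 6. Each group had 3 replicates, each replicate being one 6-well plate of cultured HeLa cells.
Transfection procedure: HeLa cells were passaged and seeded in 6-well plates. The day after passaging, transfection was performed with Entranster-H4000 transfection reagent. For each plate of cells, 48 μg of the constructed plasmid was diluted in 300 μL of serum-free DMEM and mixed thoroughly; in parallel, 120 μL of Entranster-H4000 reagent was diluted in 300 μL of serum-free DMEM, mixed thoroughly, and left at room temperature for 5 min. The two solutions were then combined, mixed thoroughly, and left at room temperature for 15 min to form the transfection complex. The transfection complex was added to wells each containing 2 mL of DMEM with 10% fetal bovine serum. When the cells reached about 90% confluence they were passaged and the procedure was repeated; once the cells again reached about 90% confluence they were harvested for subsequent steps.
Extraction of DNA from transfected cells: the culture medium was aspirated, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. Cellular DNA was then extracted according to the product instructions of a blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was measured with a UV spectrophotometer.
qPCR detection:
Because the GAPDH gene contains no Alu sequence and has a stable copy number, GAPDH was used as the internal reference gene.
The upstream primer for detecting GAPDH is shown in Seq ID No.7 and the downstream primer in Seq ID No.8.
Primer pair 8 was designed. Its upstream primer is shown in Seq ID No.36: 5′-ACAGGGTTACAATTTGGT-3′; its downstream primer is shown in Seq ID No.37: 5′-CATTTGCTCTGGAATACAC-3′. Both the upstream and the downstream primer of primer pair 8 lie within the sequence to be deleted in the genomic FMR1 gene, and neither is present in the plasmid.
All of the above primers were obtained by chemical synthesis.
The qPCR reaction system is shown in Table 33.
Table 33. qPCR reaction system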
Figure PCTCN2021134710-appb-000080
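The cellular DNA templates were the DNA extracted from control group 6 and experimental group 13 after the transfection described above.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 8: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 45°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and for the sequence to be deleted were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 34. The PCR products were verified as correct by sequencing.
Table 34. Results for primer pair 8 (n=3, Figure PCTCN2021134710-appb-000081)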
Figure PCTCN2021134710-appb-000082
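The relative copy number in experimental group 13 (N/A counted as 40.00) was significantly lower than in control group 6, and the difference was statistically significant (P<0.05), indicating that the sequence to be deleted was reduced in the genome.
The above examples show that deleting any sequence in any region of the genome by means of the present invention is feasible. In addition, once the ends of the CNVs of a gene have been determined by sequencing (that is, the point where the gene sequence is joined to a partial SINE sequence), these ends can also be edited: insertions or deletions can be made to edit the CNVs, and the resulting changes in expression can be used to modify the state of cells, tissues, or living organisms.
Example 10. RNA-mediated insertion of an exogenous sequence to be inserted into the genome
A segment of the IT15 gene in the human genome was selected, and a sequence was constructed in the order: NheI recognition site and protective bases + sequence upstream of the insertion site (target site) + sequence to be inserted + sequence downstream of the insertion site (target site) + partial Alu sequence + NheI recognition site and protective bases, as shown in Seq ID No.38: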
Figure PCTCN2021134710-appb-000083
Figure PCTCN2021134710-appb-000084
Figure PCTCN2021134710-appb-000085
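Here, the underlined portion is the randomly designed sequence to be inserted, which is non-homologous to the IT15 gene and 60 bp long; the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); and the partial Alu sequence (shaded) lies between the sequence downstream of the IT15 insertion site (target site) and the 3′ NheI restriction site and protective bases. The sequence was obtained by chemical synthesis and named IT15-3.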
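The layout of an insertion construct such as IT15-3, and the edit it is intended to produce, can be sketched as a toy string model. All sequences, the protective bases `CTA`, and the function names below are hypothetical placeholders and are not taken from Seq ID No.38; only the NheI recognition sequence GCTAGC is a known fact. The model makes no claim about the biochemical mechanism.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence


def build_insertion_template(upstream, insert, downstream, partial_alu, protect="CTA"):
    """NheI + target-site upstream + insert + target-site downstream + partial Alu + NheI."""
    core = upstream + insert + downstream + partial_alu
    return protect + NHEI_SITE + core + NHEI_SITE + protect


def expected_genome_after_insertion(genome, upstream, insert, downstream):
    """Place insert at the junction where upstream and downstream are directly joined."""
    idx = genome.find(upstream + downstream)
    if idx < 0:
        raise ValueError("target site (upstream directly joined to downstream) not found")
    cut = idx + len(upstream)
    return genome[:cut] + insert + genome[cut:]


# Toy sequences (assumptions, for illustration only)
genome = "TTTT" + "ACGTAC" + "GGATCC" + "AAAA"
edited = expected_genome_after_insertion(genome, "ACGTAC", "CATCAT", "GGATCC")
template = build_insertion_template("ACGTAC", "CATCAT", "GGATCC", "GGCCGGGCGC")
```

In this model a vector whose upstream sequence does not match the genome finds no target site and raises an error, which loosely mirrors the Example 7 observation that a mismatched upstream sequence hinders insertion.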
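IT15-3 was inserted into the plasmid vector pBS-L1PA1-CH-mneo to construct plasmid pBS-L1PA1-CH-mneo-IT15-3, as follows:
IT15-3 and pBS-L1PA1-CH-mneo were each digested; the reaction system is shown in Table 35:
Table 35. Restriction digestion reaction system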
Figure PCTCN2021134710-appb-000086
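Reaction conditions: incubation at 37°C for 1 h, followed by heating to 65°C for 20 min to inactivate the endonuclease; the digestion products were then separated by electrophoresis and recovered.
Digested IT15-3 was ligated to the linearized plasmid vector pBS-L1PA1-CH-mneo; the reaction system is shown in Table 36:
Table 36. Ligation reaction system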
Figure PCTCN2021134710-appb-000087
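Reaction conditions: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmid pBS-L1PA1-CH-mneo-IT15-3 was obtained. The plasmid was verified as correct by sequencing.
Plasmids pBS-L1PA1-CH-mneo-IT15-3 and pBS-L1PA1-CH-mneo were then separately transfected into HeLa cells.
Transfection procedure: HeLa cells were passaged and seeded in 6-well plates. The day after passaging, transfection was performed with Entranster-H4000 transfection reagent. For each plate of cells, 48 μg of the constructed plasmid was diluted in 300 μL of serum-free DMEM and mixed thoroughly; in parallel, 120 μL of Entranster-H4000 reagent was diluted in 300 μL of serum-free DMEM, mixed thoroughly, and left at room temperature for 5 min. The two solutions were then combined, mixed thoroughly, and left at room temperature for 15 min to form the transfection complex. The transfection complex was added to wells each containing 2 mL of DMEM with 10% fetal bovine serum. When the cells reached about 90% confluence they were passaged and the procedure was repeated; once the cells again reached about 90% confluence they were harvested for subsequent steps.
Extraction of total cellular RNA:
Procedure: the culture medium was aspirated from the transfected cells, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. The cell suspension was transferred to an RNase-free centrifuge tube and centrifuged at 300 g for 5 min; the pellet was collected and all supernatant was aspirated. Total RNA was extracted according to the instructions of a magnetic-bead tissue/cell/blood total RNA extraction kit.
Isolation of mRNA from total RNA:
The concentration of the extracted total RNA was measured with a UV spectrophotometer, and 1000 ng of total RNA was diluted to 50 μL with nuclease-free ddH2O. mRNA was isolated according to the instructions of the TIANSeq mRNA capture kit, and a sample was taken to determine the mRNA content (these steps were repeated as needed to obtain enough mRNA for transfection).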
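The dilution step above (1000 ng of total RNA brought to 50 μL) reduces to simple arithmetic once the stock concentration is known. The sketch below is illustrative only; the function name and the example concentration of 250 ng/μL are assumptions and do not come from the source.

```python
def dilution_volumes(stock_ng_per_ul, mass_ng=1000.0, final_ul=50.0):
    """Return (RNA volume, water volume) in uL that give mass_ng of RNA in final_ul."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    rna_ul = mass_ng / stock_ng_per_ul
    if rna_ul > final_ul:
        raise ValueError("stock is too dilute to reach the target mass in the final volume")
    return rna_ul, final_ul - rna_ul


# Hypothetical stock measured at 250 ng/uL
rna_ul, water_ul = dilution_volumes(250.0)
```

A stock below 20 ng/μL cannot supply 1000 ng within 50 μL, so the function raises an error in that case instead of returning a negative water volume.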
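Experimental groups: the group transfected with mRNA extracted from HeLa cells that had been transfected with plasmid pBS-L1PA1-CH-mneo-IT15-3 was experimental group 14; the group transfected with mRNA extracted from HeLa cells that had been transfected with the unmodified plasmid pBS-L1PA1-CH-mneo was control group 7. Each group had 3 replicates, each replicate being one 6-well plate of cultured HeLa cells.
The mRNA obtained was delivered with Lipofectamine MessengerMAX transfection reagent into the HeLa cells of experimental group 14 and control group 7 cultured in 6-well plates. The first transfection was performed when the cells reached 40% confluence. For each well, 7.5 μL of Lipofectamine MessengerMAX reagent was diluted in 125 μL of serum-free DMEM and incubated at room temperature for 10 min. 5 μg of the prepared mRNA was mixed with 125 μL of serum-free DMEM, combined with the 125 μL of diluted Lipofectamine MessengerMAX reagent, and incubated at room temperature for 5 min. The resulting mixture was added to the culture medium in each well and mixed gently. The procedure was repeated when the cells reached 70% confluence.
Extraction of DNA from transfected cells: the culture medium was aspirated, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. Cellular DNA was then extracted according to the product instructions of a blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was measured with a UV spectrophotometer.
qPCR detection:
Because the GAPDH gene contains no Alu sequence and has a stable copy number, GAPDH was used as the internal reference gene.
The upstream primer for detecting GAPDH is shown in Seq ID No.7 and the downstream primer in Seq ID No.8.
Primer pair 6 was used; its upstream primer is shown in Seq ID No.25 and its downstream primer in Seq ID No.26. The upstream primer of primer pair 6 lies in the intact IT15 gene, further upstream than the sequence upstream of the insertion site (target site) carried on the plasmid; it is absent from the plasmid and present only in the genome. The downstream primer of primer pair 6 lies on the randomly designed non-homologous sequence to be inserted.
All of the above primers were obtained by chemical synthesis.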
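The logic of this primer design is that one primer binds only the genome and the other only the inserted sequence, so a product is expected only from an edited genome. That logic can be illustrated with a toy in-silico PCR. The sequences and function names below are hypothetical; the check uses exact string matching only and ignores mismatches, the opposite template orientation, and all reaction chemistry.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def amplicon(template, fwd, rev):
    """Return the product bounded by fwd and the reverse complement of rev, or None."""
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    site = revcomp(rev)
    end = t.find(site, start + len(fwd))
    if end < 0:
        return None
    return t[start:end + len(site)]


# Toy model: the forward primer sits in genome-only sequence, the reverse primer in the insert
genome_only = "GGGTTT"
upstream, insert, downstream = "ACGTAC", "CATCATTGCA", "GGATCCAAAA"
unedited = genome_only + upstream + downstream
edited = genome_only + upstream + insert + downstream
plasmid = upstream + insert + downstream  # lacks the genome-only region

fwd = genome_only
rev = revcomp("TCATTG")  # primer annealing within the inserted sequence

product_edited = amplicon(edited, fwd, rev)
product_unedited = amplicon(unedited, fwd, rev)
product_plasmid = amplicon(plasmid, fwd, rev)
```

Only the edited genome contains both primer sites, so in this model neither the unedited genome nor residual plasmid yields a product.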
The qPCR reaction system is shown in Table 37.
Table 37. qPCR reaction system
Figure PCTCN2021134710-appb-000088
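The cellular DNA templates were the DNA extracted from control group 7 and experimental group 14 after transfection.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 6: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 50°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and for insertion of the sequence to be inserted were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 38. The PCR products were verified as correct by sequencing.
Table 38. Results for primer pair 6 (n=3, Figure PCTCN2021134710-appb-000089)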
Figure PCTCN2021134710-appb-000090
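The relative copy number in experimental group 14 was significantly higher than in control group 7 (N/A counted as 40.00), and the difference was statistically significant (P<0.05), indicating that gene editing via the RNA route of the present invention is effective.
Given the feasibility of fully RNA-mediated editing, it follows that an RNP route is also feasible. In that route, no coding sequence for ORF2p and/or ORF1p is added to the sequence introduced into the system to be edited; instead, an RNA product containing the sequence upstream of the insertion site (target site) + the sequence to be inserted + the sequence downstream of the insertion site (target site) + a SINE sequence (such as an Alu sequence), partial SINE sequence, or SINE-like sequence is bound in vitro to ORF2p, or to both ORF1p and ORF2p, and then delivered into the cell (cytoplasm).
Example 11. Fixing and editing CNV ends
Based on the CNV extension phenomenon that is widespread in organisms, it can be inferred that the CNVs of each gene should have an end, specifically a segment of the corresponding gene joined downstream to a partial SINE (Alu) sequence. Different lariat-partial SINE (Alu) sequences (double-stranded DNA) are continually inserted in front of the partial SINE sequence at the CNV end, gradually extending the CNV. Because exons must be cut out of the pre-mRNA, the end of an intron necessarily forms a lariat, and exon-containing lariats that overlap it arise with low probability. Therefore, for genes with relatively low expression, there must be a relatively long period during which the CNV end lies at the end of an intron within the gene and is joined to a partial SINE (Alu) sequence.
In this example, the 3′ sequence of an intron of the BRCA1 gene was randomly selected and designated as the sequence to be deleted in the next example. The 3′ sequence of this BRCA1 intron is shown in Seq ID No.39: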
CCCAGCTACTTGAGAGGCTGAGGCAGGGAGAATTGCTTGAACCAGGTAG GCGGAGGTTGCAGTGAGCCAAGATCGCACCACTGCACTCCAGCCTGGGGCAA CAGAGCAAGACTGTCTCAAAAAAAATAAATAAATAAAATAAATTCTTAAGAAGGATATTTTGGAAAACTCCTTACATACCTAAATTCTTTGTTTATCAAATACTTGGACTTAGCACACTCTTCTTTGAAATGGACCAATAAACAACAGGAGCCCATAAGCAAAAAGAACTCATTATTTTAAAAACAGTAACTATCCTTACAGGCTTTCTCAGGGCTCTTTCTGTTGGATCCTTCCCTCTCACAGGTCCTTGCTAATGATCTCTAGGTGGACACATTCTAGATGAGATGTCCCTGTCTAGAATGGCAGCACCATGAGGGCTATATCCTCAGTACTAGGACAGCGCCTGGTGCTTAATAGATAGTAAATAGTTGTCTAATTAACTGAGCAAACAGATAGATTCATGAATTAGCTTTTTGCTTTTTCTGTTAGAAACTAAAGGTTCAGGTCAGGCACAATGGCGCATGTCTCTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCTGATCACTTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACATAGTAAAACCCTGTTTCTACAAAAATTACCAAAATTAGCCGGGCGTCTTGGCAAGCACCTGTAATGCCAGCTACTTGAGAGGCTGAGGTGGGAGAATCGCTTGAACCTGGGAGGAAGAGGTTGCAGTGAGCCGAGATGGTGCCAACCTGGGTGACAGAGGGAGACTTAAAAAAAAAAAGAAAGAAAGAAAGAAAAGAAACTAAAGGTTCAAAGAATCCCAGAAAAGGAAGAGTCCTCACAAGCCAGTAATCTAGGCAGGATTACTGATAGTATTTTTATATTTGTTGTATTTTTATAAAATGCCATAGATAGAGGGCTTTTTTCAACATTACATCAGTCTAAAAATCACACATTTTTATATGAACTAACCTAAATGTCTGATGAATCTCACAACACCAAGTCTTTGAAATGTGCCCA TATAAATAAAATGTTAACAGATTCATGCTAATTTTAAATATCGATAGTGTTTAAATGCCTTAATTATTTTTTCACTCCCTAGCTTTAAAAGAAAATAACCAACTTCAAAAGGACATCACAATAACATCAAGTCTATTTGGGGGAATTTGAGGATTTTTTCCCTC
Figure PCTCN2021134710-appb-000091
Figure PCTCN2021134710-appb-000092
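Here, the underlined portion is the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted later; the sequence after the underlined portion is the sequence to be deleted later (the 3′ sequence of an intron of the BRCA1 gene); and the wavy-underlined portion is the 3′-end sequence of the sequence to be deleted (so named because the deletion is performed in the next example).
A sequence was constructed in the order: 3′-end sequence of the sequence to be deleted + a randomly designed sequence non-homologous to the BRCA1 gene + partial Alu sequence, with NheI restriction sites and protective bases added at both ends.
The natural cleavage site that yields an RNA containing only a partial Alu sequence from the Alu element transcript is reported differently in different publications. To avoid a failure to insert caused by a mismatch with the partial Alu sequence at the CNV end, three candidate partial Alu sequences (differing in their 5′-end sequences) were all synthesized and introduced. The constructed sequences are shown in Seq ID No.40, Seq ID No.41, and Seq ID No.42.
The sequence of Seq ID No.40 is: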
Figure PCTCN2021134710-appb-000093
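Here, the underlined portion is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); and the partial Alu sequence Alu3 (shaded) lies between the non-homologous sequence and the 3′ NheI restriction site and protective bases. The sequence was obtained by chemical synthesis and named BRCA1-1-Alu3.
The sequence of Seq ID No.41 is: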
Figure PCTCN2021134710-appb-000094
Figure PCTCN2021134710-appb-000095
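Here, the underlined portion is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); and the partial Alu sequence Alu4 (shaded) lies between the non-homologous sequence and the 3′ NheI restriction site and protective bases. The sequence was obtained by chemical synthesis and named BRCA1-1-Alu4.
The sequence of Seq ID No.42 is: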
Figure PCTCN2021134710-appb-000096
Figure PCTCN2021134710-appb-000097
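Here, the underlined portion is the randomly designed non-homologous sequence (non-homologous to the BRCA1 gene); the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); and the partial Alu sequence Alu5 (shaded) lies between the non-homologous sequence and the 3′ NheI restriction site and protective bases. The sequence was obtained by chemical synthesis and named BRCA1-1-Alu5.
Sequences lacking the non-homologous sequence (the sequence non-homologous to the BRCA1 gene) were also designed, as shown in Seq ID No.43, Seq ID No.44, and Seq ID No.45.
The sequence of Seq ID No.43 is: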
Figure PCTCN2021134710-appb-000098
Figure PCTCN2021134710-appb-000099
Seq ID No.43 is BRCA1-1-Alu3 with the non-homologous sequence removed. The sequence was obtained by chemical synthesis and named BRCA1-2-Alu3.
The sequence of Seq ID No.44 is:
Figure PCTCN2021134710-appb-000100
Figure PCTCN2021134710-appb-000101
Figure PCTCN2021134710-appb-000102
Seq ID No.44 is BRCA1-1-Alu4 with the non-homologous sequence removed. The sequence was obtained by chemical synthesis and named BRCA1-2-Alu4.
The sequence of Seq ID No.45 is:
Figure PCTCN2021134710-appb-000103
Figure PCTCN2021134710-appb-000104
Seq ID No.45 is BRCA1-1-Alu5 with the non-homologous sequence removed. The sequence was obtained by chemical synthesis and named BRCA1-2-Alu5.
BRCA1-1-Alu3, BRCA1-1-Alu4, BRCA1-1-Alu5, BRCA1-2-Alu3, BRCA1-2-Alu4, and BRCA1-2-Alu5 were each inserted into pBS-L1PA1-CH-mneo to construct pBS-L1PA1-CH-mneo-BRCA1-1-Alu3, pBS-L1PA1-CH-mneo-BRCA1-1-Alu4, pBS-L1PA1-CH-mneo-BRCA1-1-Alu5, pBS-L1PA1-CH-mneo-BRCA1-2-Alu3, pBS-L1PA1-CH-mneo-BRCA1-2-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-2-Alu5, as follows:
BRCA1-1-Alu3, BRCA1-1-Alu4, BRCA1-1-Alu5, BRCA1-2-Alu3, BRCA1-2-Alu4, BRCA1-2-Alu5, and pBS-L1PA1-CH-mneo were each digested; the reaction system is shown in Table 39:
Table 39. Restriction digestion reaction system
Figure PCTCN2021134710-appb-000105
Figure PCTCN2021134710-appb-000106
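Reaction conditions: incubation at 37°C for 1 h, followed by heating to 65°C for 20 min to inactivate the endonuclease; the digestion products were then separated by electrophoresis and recovered.
Digested BRCA1-1-Alu3, BRCA1-1-Alu4, BRCA1-1-Alu5, BRCA1-2-Alu3, BRCA1-2-Alu4, and BRCA1-2-Alu5 were each ligated to the linearized plasmid vector pBS-L1PA1-CH-mneo; the reaction system is shown in Table 40:
Table 40. Ligation reaction system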
Figure PCTCN2021134710-appb-000107
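Reaction conditions: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pBS-L1PA1-CH-mneo-BRCA1-1-Alu3, pBS-L1PA1-CH-mneo-BRCA1-1-Alu4, pBS-L1PA1-CH-mneo-BRCA1-1-Alu5, pBS-L1PA1-CH-mneo-BRCA1-2-Alu3, pBS-L1PA1-CH-mneo-BRCA1-2-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-2-Alu5 were obtained. The plasmids were verified as correct by sequencing.
Experimental groups: the group co-transfected with plasmids pBS-L1PA1-CH-mneo-BRCA1-1-Alu3, pBS-L1PA1-CH-mneo-BRCA1-1-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-1-Alu5 was experimental group 15; the group co-transfected with plasmids pBS-L1PA1-CH-mneo-BRCA1-2-Alu3, pBS-L1PA1-CH-mneo-BRCA1-2-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-2-Alu5 was control group 8. Each group had 3 replicates, each replicate being one 6-well plate of cultured HeLa cells.
Transfection procedure: HeLa cells were passaged and seeded in 6-well plates. The day after passaging, transfection was performed with Entranster-H4000 transfection reagent. For each plate of cells, 96 μg of the constructed plasmids (32 μg of each plasmid) was diluted in 300 μL of serum-free DMEM and mixed thoroughly; in parallel, 120 μL of Entranster-H4000 reagent was diluted in 300 μL of serum-free DMEM, mixed thoroughly, and left at room temperature for 5 min. The two solutions were then combined, mixed thoroughly, and left at room temperature for 15 min to form the transfection complex. The transfection complex was added to wells each containing 2 mL of DMEM with 10% fetal bovine serum. When the cells reached about 90% confluence they were passaged and the procedure was repeated; once the cells again reached about 90% confluence they were harvested for subsequent steps.
Extraction of DNA from transfected cells: the culture medium was aspirated, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. Cellular DNA was then extracted according to the product instructions of a blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was measured with a UV spectrophotometer.
qPCR detection:
Because the GAPDH gene contains no Alu sequence and has a stable copy number, GAPDH was used as the internal reference gene.
The upstream primer for detecting GAPDH is shown in Seq ID No.7 and the downstream primer in Seq ID No.8.
Primer pair 9 was used. Its upstream primer is shown in Seq ID No.46: 5′-CCCCTTTATCTCCTTCTG-3′; its downstream primer is shown in Seq ID No.47: 5′-ATTTCTCCCATTCCACTT-3′. Both primers of primer pair 9 lie, in the intact BRCA1 gene, downstream of the 3′-end sequence of the sequence to be deleted that is carried on the plasmid; both are absent from the plasmid and present only in the genome. All of the above primers were obtained by chemical synthesis.
The qPCR reaction system is shown in Table 41.
Table 41. qPCR reaction system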
Figure PCTCN2021134710-appb-000108
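The cellular DNA templates were the DNA extracted from control group 8 and experimental group 15 after the co-transfection described above.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 9: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 46°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and for the sequence downstream of the 3′-end sequence of the sequence to be deleted were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 42. The PCR products were verified as correct by sequencing.
Table 42. Results for primer pair 9 (n=3, Figure PCTCN2021134710-appb-000109)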
Figure PCTCN2021134710-appb-000110
Figure PCTCN2021134710-appb-000111
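The relative copy number in experimental group 15 was lower than in control group 8, and the difference was statistically significant (P<0.05). This means that in experimental group 15 there were fewer copies of the sequence that lies, in the intact gene, downstream of the gene portion at the CNV end, indicating that insertion of the non-homologous sequence at the CNV end hindered the gene portion at the CNV end from extending downstream.
Conclusion: insertion of a non-homologous sequence at the CNV end hinders extension of the corresponding CNV.
Example 12. Trimming the CNV end
The 3′ sequence of an intron of the BRCA1 gene was selected (located within Seq ID No.39 of Example 11). Sequences were synthesized in the order: 3′-end sequence of the sequence to be deleted + non-homologous sequence + upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted + partial Alu sequence, with NheI restriction sites and protective bases added at both ends, as shown in Seq ID No.48, Seq ID No.49, and Seq ID No.50.
The sequence of Seq ID No.48 is: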
Figure PCTCN2021134710-appb-000112
Figure PCTCN2021134710-appb-000113
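Here, the underlined portion is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted lies between the non-homologous sequence and the shaded sequence; and the shaded portion is the partial Alu sequence Alu3. The sequence was obtained by chemical synthesis and named BRCA1-3-Alu3.
The sequence of Seq ID No.49 is: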
Figure PCTCN2021134710-appb-000114
Figure PCTCN2021134710-appb-000115
Figure PCTCN2021134710-appb-000116
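Here, the underlined portion is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted lies between the non-homologous sequence and the shaded sequence; and the shaded portion is the partial Alu sequence Alu4. The sequence was obtained by chemical synthesis and named BRCA1-3-Alu4.
The sequence of Seq ID No.50 is: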
Figure PCTCN2021134710-appb-000117
Figure PCTCN2021134710-appb-000118
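Here, the underlined portion is the randomly designed non-homologous sequence; the NheI restriction sites and protective bases are at both ends of the sequence (bold italic); the upstream sequence immediately adjacent to the 5′ end of the sequence to be deleted lies between the non-homologous sequence and the shaded sequence; and the shaded portion is the partial Alu sequence Alu5. The sequence was obtained by chemical synthesis and named BRCA1-3-Alu5.
BRCA1-3-Alu3, BRCA1-3-Alu4, and BRCA1-3-Alu5 were each inserted into pBS-L1PA1-CH-mneo to construct pBS-L1PA1-CH-mneo-BRCA1-3-Alu3, pBS-L1PA1-CH-mneo-BRCA1-3-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-3-Alu5, as follows:
BRCA1-3-Alu3, BRCA1-3-Alu4, BRCA1-3-Alu5, and pBS-L1PA1-CH-mneo were each digested; the reaction system is shown in Table 43:
Table 43. Restriction digestion reaction system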
Figure PCTCN2021134710-appb-000119
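Reaction conditions: incubation at 37°C for 1 h, followed by heating to 65°C for 20 min to inactivate the endonuclease; the digestion products were then separated by electrophoresis and recovered.
Digested BRCA1-3-Alu3, BRCA1-3-Alu4, and BRCA1-3-Alu5 were each ligated to the linearized plasmid vector pBS-L1PA1-CH-mneo; the reaction system is shown in Table 44:
Table 44. Ligation reaction system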
Figure PCTCN2021134710-appb-000120
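Reaction conditions: incubation at 16°C for 16 h, followed by heating to 70°C for 10 min to inactivate the ligase; after electrophoresis and purification, plasmids pBS-L1PA1-CH-mneo-BRCA1-3-Alu3, pBS-L1PA1-CH-mneo-BRCA1-3-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-3-Alu5 were obtained. The plasmids were verified as correct by sequencing.
Experimental groups: the group co-transfected with plasmids pBS-L1PA1-CH-mneo-BRCA1-3-Alu3, pBS-L1PA1-CH-mneo-BRCA1-3-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-3-Alu5 was experimental group 16; the group co-transfected with plasmids pBS-L1PA1-CH-mneo-BRCA1-1-Alu3, pBS-L1PA1-CH-mneo-BRCA1-1-Alu4, and pBS-L1PA1-CH-mneo-BRCA1-1-Alu5 was control group 9. Each group had 3 replicates, each replicate being one 6-well plate of cultured HeLa cells.
Transfection procedure: HeLa cells were passaged and seeded in 6-well plates. The day after passaging, transfection was performed with Entranster-H4000 transfection reagent. For each plate of cells, 96 μg of the constructed plasmids (32 μg of each plasmid) was diluted in 300 μL of serum-free DMEM and mixed thoroughly; in parallel, 120 μL of Entranster-H4000 reagent was diluted in 300 μL of serum-free DMEM, mixed thoroughly, and left at room temperature for 5 min. The two solutions were then combined, mixed thoroughly, and left at room temperature for 15 min to form the transfection complex. The transfection complex was added to wells each containing 2 mL of DMEM with 10% fetal bovine serum. When the cells reached about 90% confluence they were passaged and the procedure was repeated; once the cells again reached about 90% confluence they were harvested for subsequent steps.
Extraction of DNA from transfected cells: the culture medium was aspirated, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. Cellular DNA was then extracted according to the product instructions of a blood/cell/tissue genomic DNA extraction kit, and the DNA concentration was measured with a UV spectrophotometer.
qPCR detection:
Because the GAPDH gene contains no Alu sequence and has a stable copy number, GAPDH was used as the internal reference gene.
The upstream primer for detecting GAPDH is shown in Seq ID No.7 and the downstream primer in Seq ID No.8.
Primer pair 10 was used. Its upstream primer is shown in Seq ID No.51: 5′-GCTTTCTCAGGGCTCTTT-3′; its downstream primer is shown in Seq ID No.52: 5′-GCACCATCTCGGCTCACT-3′. Both primers of primer pair 10 lie within the sequence expected to be deleted (the sequence to be deleted); both are absent from the plasmid and present only in the genome. All of the above primers were obtained by chemical synthesis.
The qPCR reaction system is shown in Table 45.
Table 45. qPCR reaction system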
Figure PCTCN2021134710-appb-000121
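The cellular DNA templates were the DNA extracted from control group 9 and experimental group 16 after the co-transfection described above.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 10: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 49°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and for the sequence to be deleted were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 46. The PCR products were verified as correct by sequencing.
Table 46. Results for primer pair 10 (n=3, Figure PCTCN2021134710-appb-000122)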
Figure PCTCN2021134710-appb-000123
Figure PCTCN2021134710-appb-000124
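The relative copy number in experimental group 16 was lower than in control group 9, and the difference was statistically significant (P<0.05), indicating that the sequence to be deleted was deleted and that the gene-portion sequence at the CNV end was reduced in experimental group 16.
Example 11 shows that inserting a non-homologous sequence at the CNV end can hinder its further extension. Example 12 shows that the gene-portion sequence at the CNV end was markedly reduced in the experimental group compared with the control group, indicating that the CNV end was trimmed and moved upstream. Together these demonstrate that CNV ends can be modified by the methods of the present invention. Accordingly, many or all CNVs can be modified by changing the guide sequence upstream of the insertion point in the editing method (that is, the target-site upstream sequence, which is identical to the gene-portion sequence at the CNV end when a CNV end is being edited).
Example 13. Change in expression of the corresponding gene after trimming the CNV end
Total cellular RNA was extracted from experimental group 16 and control group 9 of Example 12:
Procedure: the culture medium was aspirated from the transfected cells, the cells were washed twice with PBS, and an appropriate amount of 0.25% trypsin was added for digestion at 37°C for a total of 20 min, pipetting 15 times every 5 min. Once the cells were in suspension, complete medium containing serum was added to stop the reaction. The cell suspension was transferred to an RNase-free centrifuge tube and centrifuged at 300 g for 5 min; the pellet was collected and all supernatant was aspirated. Total RNA was extracted according to the instructions of a magnetic-bead tissue/cell/blood total RNA extraction kit.
Isolation of mRNA from total RNA:
The concentration of the extracted total RNA was measured with a UV spectrophotometer, and 1000 ng of total RNA was diluted to 50 μL with nuclease-free ddH2O. mRNA was isolated according to the instructions of the TIANSeq mRNA capture kit, and a sample was taken to determine the mRNA content (these steps were repeated as needed to obtain enough mRNA for transfection).
Reverse transcription to synthesize the cDNA template:
cDNA was synthesized according to the instructions of the FastKing cDNA first-strand synthesis kit, and the concentration of the synthesized cDNA was measured with a UV spectrophotometer for subsequent detection.
qPCR detection:
Because expression of the GAPDH gene is relatively stable across tissues, GAPDH was used as the internal reference gene.
The upstream primer for detecting GAPDH expression is shown in Seq ID No.7 and the downstream primer in Seq ID No.8.
Primer pair 11 was used. Its upstream primer is shown in Seq ID No.53: 5′-CAGAGGACAATGGCTTCCATG-3′; its downstream primer is shown in Seq ID No.54: 5′-CTACACTGTCCAACACCCACTCTC-3′. Both primers of primer pair 11 lie on the BRCA1 gene; both are absent from the plasmid and present only in the genome. All of the above primers were obtained by chemical synthesis.
The qPCR reaction system is shown in Table 47.
Table 47. qPCR reaction system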
Figure PCTCN2021134710-appb-000125
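The templates were the cDNA synthesized from mRNA extracted from control group 9 and experimental group 16 after the co-transfection described above.
The reaction mixtures were prepared on ice; the tubes were then capped, mixed gently, and briefly centrifuged to ensure that all components were at the bottom of the tube. Each 6-well-plate cell sample was run in triplicate.
qPCR cycling program:
Primer pair 11: initial denaturation at 95°C for 15 min; 40 cycles of (denaturation at 95°C for 10 s, annealing at 55°C for 20 s, extension at 72°C for 20 s). The GAPDH primers were run under the same conditions.
The exponential phases of the amplification curves for GAPDH and BRCA1 expression were inspected and, once confirmed to be approximately parallel, the data were analyzed by the 2^(-ΔΔCt) relative method. Results are shown in Table 48. The PCR products were verified as correct by sequencing.
Table 48. Results for primer pair 11 (n=3, Figure PCTCN2021134710-appb-000126)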
Figure PCTCN2021134710-appb-000127
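The relative expression of the BRCA1 gene in experimental group 16 was lower than in control group 9, and the difference was statistically significant (P<0.05), indicating that trimming the CNV end of the BRCA1 gene reduced its expression.
Conclusion: BRCA1 expression was reduced, indicating that editing its CNV did affect transcription of the corresponding gene and can in turn affect protein expression and the state of cells, tissues, or living organisms.
Examples 11 to 13 show that, based on the present invention, CNV ends can be fixed, extended, and trimmed, with a concomitant effect on gene transcription and protein expression in the cell. In addition, CNVs change during physiological processes such as embryonic and individual development and tumorigenesis, and differ between cells, tissues, and individuals; editing CNVs can therefore also change the state of the corresponding cells, tissues, and living organisms.
The above examples show that the present invention edits the genome using retrotransposons, which are widespread in eukaryotes, and their reverse transcription function. The SINE and LINE sequences and the associated proteins involved are all widely present in normal organisms. Without generating double-strand breaks, the method performs relatively accurate target sequence recognition and cleavage and integrates the fragment of interest into the genome, which in turn allows the corresponding fragments to be deleted or replaced. Because no double-strand break is generated, there is no need to worry about the danger of genomic double-stranded DNA breaks or the introduction of unintended random sequences. Taking as an example the Alu sequence among SINEs and LINE-1, the LINE that corresponds to it functionally: Alu and LINE-1 are widely distributed in primate genomes. In practice, the sequence to be inserted is positioned at the genomic insertion site (target site) by the sequences flanking it on the vector (the target-site upstream sequence and the target-site downstream sequence), and ORF2p can slide smoothly from the 3′ end of its carrier nucleic acid to the cleavage site and nick a single strand of the genome only when the target-site upstream sequence matches completely. This greatly improves targeting accuracy and avoids unintended cleavage, so the targeting accuracy is in theory higher than that of existing gene editing technologies. Furthermore, the required RNA and the corresponding proteins such as ORF1p and ORF2p can be produced in vitro, so that target sequences, genes, and genomes can be modified via the RNA or RNP route without introducing DNA fragments and without the transfected material having to enter the nucleus. With the aid of the nuclear localization function of ORF1p (and ORF2p), RNA and protein transfected into the cell can be guided into the nucleus, which facilitates editing of cells that are difficult to manipulate because vectors enter their nuclei poorly. The present invention can also be used to edit CNVs in the genome, increasing them, decreasing them, or keeping them stable (unable to change further); because CNVs can directly affect protein expression, manipulating CNVs can change or stabilize the expression and state of the corresponding cells. Since various SINEs and LINEs that are homologous and functionally similar to Alu and LINE-1, such as the MIRs and LINE-2, are widely distributed in eukaryotes, the present invention can also be applied to other suitable eukaryotic systems.
Unlike other gene editing technologies, the mechanisms used in the present invention all exist in normal organisms; no foreign mechanism or system needs to be introduced, which reduces the impact on the recipient system being edited. Because no foreign system such as a protein of prokaryotic origin is introduced, and no double-strand break is generated, the present invention is easier to apply clinically than existing gene editing technologies.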

Claims (17)

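  1. A gene transcription framework, characterized in that the gene transcription framework comprises, in the 5′→3′ direction, a target-site upstream sequence, a sequence to be inserted, and a target-site downstream sequence;
    the gene transcription framework is a DNA sequence that can be transcribed by RNA polymerase I, RNA polymerase II, or RNA polymerase III; in the transcription product of the gene transcription framework, or in a conversion product of that transcription product, the target-site upstream sequence or its complementary sequence can hybridize with the sequence upstream of the corresponding target site in the cell genome or its complementary sequence, and the target-site downstream sequence or its complementary sequence can hybridize with the sequence downstream of the corresponding target site in the cell genome or its complementary sequence; the target-site upstream sequence and the target-site downstream sequence are directly joined in the corresponding target gene sequence in the genome, and the site between the target-site upstream sequence and the target-site downstream sequence in the target gene sequence on the genome is the target site for the sequence to be inserted.
  2. The gene transcription framework of claim 1, characterized in that the cell is a eukaryotic cell.
  3. A vector system, characterized in that the vector system comprises one or more vectors, the one or more vectors comprising:
    one or more 1) gene transcription frameworks of claim 1;
    one or more 2) short interspersed elements and/or partial short interspersed elements and/or short-interspersed-like elements, and/or one or more 3) long interspersed elements and/or ORF1p coding sequences and/or ORF2p coding sequences;
    wherein components 1), 2) and/or 3) are located on the same or different vectors of the vector system;
    the vector carries one or more promoters, each being an RNA polymerase I promoter, an RNA polymerase II promoter, or an RNA polymerase III promoter, located upstream of components 1), 2) and/or 3);
    the vector system is mediated via a DNA, RNA and/or RNP route.
  4. The vector system of claim 3, characterized in that when there are multiple components 1), they are located on the same or different vectors of the vector system; when there are multiple components 2), they are located on the same or different vectors of the vector system; and when there are multiple components 3), they are located on the same or different vectors of the vector system.
  5. The vector system of claim 3, characterized in that the vector is a eukaryotic expression vector, a prokaryotic expression vector, a viral vector, a plasmid vector, an artificial chromosome, a phage vector, or a cosmid vector.
  6. The vector system of claim 3, characterized in that the vector is an expression vector, a cloning vector, a sequencing vector, a transformation vector, a shuttle vector, or a multifunctional vector.
  7. The vector system of claim 3, characterized in that when 1) and 2) are located on the same vector, the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element is located downstream of the gene transcription framework, and the gene transcription framework is directly or indirectly linked to the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element; when directly linked, the gene transcription framework and the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element share one promoter; when indirectly linked, they share one promoter or do not share one promoter.
  8. The vector system of claim 3, characterized in that when 1) and 3) are located on the same vector, the one or more long interspersed elements and/or one or more ORF1p coding sequences and/or one or more ORF2p coding sequences are located upstream and/or downstream of the gene transcription framework, and the gene transcription framework is directly or indirectly linked to the one or more long interspersed elements and/or one or more ORF1p coding sequences and/or one or more ORF2p coding sequences; when directly linked, the gene transcription framework and the one or more long interspersed elements and/or one or more ORF1p coding sequences and/or one or more ORF2p coding sequences share one promoter; when indirectly linked, they share one promoter or do not share one promoter.
  9. The vector system of claim 3, characterized in that when 1), 2) and 3) are located on the same vector, the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element is located downstream of the gene transcription framework and/or downstream of the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence; when the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element is located downstream of the gene transcription framework, the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence is located upstream of the gene transcription framework, and/or the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence is located downstream of the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element; when the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element is located downstream of the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence, the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence is located downstream of the gene transcription framework; and the gene transcription framework, the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element, and the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence are directly or indirectly linked to one another; when directly linked, the gene transcription framework, the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element, and the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence share one promoter; when indirectly linked, they share one promoter or do not share one promoter.
  10. The vector system of claim 3, characterized in that when 2) and 3) are located on the same vector, the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence is located upstream and/or downstream of the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element, and the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element is directly or indirectly linked to the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence; when directly linked, the short interspersed element and/or partial short interspersed element and/or short-interspersed-like element and the long interspersed element and/or ORF1p coding sequence and/or ORF2p coding sequence share one promoter; when indirectly linked, they share one promoter or do not share one promoter.
  11. A genome sequence editing method, characterized by comprising the following steps:
    1) selecting, in the genome, the insertion site of the target gene to be edited, and determining the sequences of the target gene upstream and downstream of the insertion site on either side of it;
    2) preparing the vector system of any one of claims 3 to 10;
    3) transforming or transfecting the vector system into a cell, tissue, or organism for transcription.
  12. Use of the vector system of any one of claims 3 to 10, which can be mediated via a DNA, RNA and/or RNP route, for the insertion, deletion, or replacement of a DNA sequence in any region of a genome.
  13. The use of claim 12, characterized in that the DNA sequence is one or more CNV sequences, CNV end sequences, short interspersed elements, or long interspersed elements.
  14. Use of the vector system of any one of claims 3 to 10 as a medicament for the prevention and/or treatment of cancer, gene-related hereditary diseases, or neurodegenerative diseases.
  15. The use of claim 14, characterized in that the cancer is glioma, breast cancer, cervical cancer, lung cancer, gastric cancer, colorectal cancer, duodenal cancer, leukemia, prostate cancer, endometrial cancer, thyroid cancer, lymphoma, pancreatic cancer, liver cancer, melanoma, skin cancer, pituitary tumor, germinoma, meningioma, meningeal carcinoma, glioblastoma, astrocytomas of all types, oligodendrogliomas of all types, oligoastrocytoma, ependymomas of all types, choroid plexus papilloma, choroid plexus carcinoma, chordoma, gangliocytomas of all types, olfactory neuroblastoma, sympathetic nervous system neuroblastoma, pineocytoma, pineoblastoma, medulloblastoma, trigeminal schwannoma, facial-acoustic neuroma, glomus jugulare tumor, hemangioblastoma, craniopharyngioma, or granular cell tumor.
  16. The use of claim 14, characterized in that the gene-related hereditary disease is Huntington's disease, fragile X syndrome, phenylketonuria, pseudohypertrophic progressive muscular dystrophy, mitochondrial encephalomyopathy, spinal muscular atrophy, Parkinson-plus syndrome, albinism, red-green color blindness, achondroplasia, alkaptonuria, congenital deaf-mutism, thalassemia, sickle cell anemia, hemophilia, epilepsy, myoclonus, dystonia, stroke, or schizophrenia associated with genetic alterations, vitamin D-resistant rickets, familial colonic polyposis, or hereditary nephritis.
  17. The use of claim 14, characterized in that the neurodegenerative disease is Parkinson's disease, Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, spinocerebellar ataxia, multiple system atrophy, primary lateral sclerosis, Pick's disease, frontotemporal dementia, dementia with Lewy bodies, or progressive supranuclear palsy.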
PCT/CN2021/134710 2021-01-22 2021-12-01 Gene transcription framework, vector system, genome sequence editing method and application WO2022156378A1 (zh)

Priority Applications (8)

Application Number Priority Date Filing Date Title
BR112023014382A BR112023014382A2 (pt) 2021-01-22 2021-12-01 Estrutura de transcrição gênica, sistema vetorial, método de edição da sequência do genoma e aplicação
CA3206081A CA3206081A1 (en) 2021-01-22 2021-12-01 Gene transcription framework, vector system, genome sequence editing method and application
KR1020237028513A KR20230135630A (ko) 2021-01-22 2021-12-01 유전자 전사 프레임워크, 벡터 시스템, 게놈 서열 편집 방법 및 응용
MX2023008635A MX2023008635A (es) 2021-01-22 2021-12-01 Marco para la transcripción de genes, sistema de vectores, método para editar la secuencia genómica y aplicación.
EP21920753.7A EP4282968A1 (en) 2021-01-22 2021-12-01 Gene transcription framework, vector system, genome sequence editing method and application
AU2021422067A AU2021422067A1 (en) 2021-01-22 2021-12-01 Gene transcription framework, vector system, genome sequence editing method and application
JP2023541700A JP2024504592A (ja) 2021-01-22 2021-12-01 遺伝子転写フレームワーク、ベクターシステム、ゲノム配列編集方法および応用
ZA2023/08025A ZA202308025B (en) 2021-01-22 2023-08-18 Gene transcription framework, vector system, genome sequence editing method and application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110089068.1A CN112708636A (zh) 2021-01-22 2021-01-22 基因转录框架、载体系统、基因组序列编辑方法及应用
CN202110089068.1 2021-01-22

Publications (1)

Publication Number Publication Date
WO2022156378A1 true WO2022156378A1 (zh) 2022-07-28

Family

ID=75550451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134710 WO2022156378A1 (zh) 2021-01-22 2021-12-01 基因转录框架、载体系统、基因组序列编辑方法及应用

Country Status (10)

Country Link
EP (1) EP4282968A1 (zh)
JP (1) JP2024504592A (zh)
KR (1) KR20230135630A (zh)
CN (1) CN112708636A (zh)
AU (1) AU2021422067A1 (zh)
BR (1) BR112023014382A2 (zh)
CA (1) CA3206081A1 (zh)
MX (1) MX2023008635A (zh)
WO (1) WO2022156378A1 (zh)
ZA (1) ZA202308025B (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179132A1 (zh) * 2022-03-21 2023-09-28 隋云鹏 用于基因编辑的rna框架和基因编辑方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112481259B (zh) * 2020-11-24 2022-09-16 南昌大学 两种甘薯U6基因启动子IbU6的克隆与应用
CN115312121B (zh) * 2022-09-29 2023-03-24 北京齐碳科技有限公司 靶基因位点检测方法、装置、设备及计算机存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102080101A (zh) * 2010-11-25 2011-06-01 北京济福霖生物技术有限公司 一种定点整合外源基因的方法
WO2020047124A1 (en) * 2018-08-28 2020-03-05 Flagship Pioneering, Inc. Methods and compositions for modulating a genome
WO2020082076A1 (en) * 2018-10-19 2020-04-23 Board Of Regents, The University Of Texas System Engineered long interspersed element (line) transposons and methods of use thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016030501A1 (en) * 2014-08-28 2016-03-03 Centre National De La Recherche Scientifique - Cnrs - Synthetic alu-retrotransposon vectors for gene therapy
US11339427B2 (en) * 2016-02-12 2022-05-24 Jumpcode Genomics, Inc. Method for target specific RNA transcription of DNA sequences
CN112210573B (zh) * 2020-10-14 2024-02-06 浙江大学 一种用基因编辑改造原代细胞的dna模板及定点插入方法

Also Published As

Publication number Publication date
ZA202308025B (en) 2024-03-27
KR20230135630A (ko) 2023-09-25
CN112708636A (zh) 2021-04-27
MX2023008635A (es) 2023-10-17
AU2021422067A1 (en) 2023-09-07
BR112023014382A2 (pt) 2023-10-31
EP4282968A1 (en) 2023-11-29
JP2024504592A (ja) 2024-02-01
CA3206081A1 (en) 2022-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920753

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023541700

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 3206081

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: MX/A/2023/008635

Country of ref document: MX

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023014382

Country of ref document: BR

WWE Wipo information: entry into national phase

Ref document number: 2021422067

Country of ref document: AU

ENP Entry into the national phase

Ref document number: 20237028513

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020237028513

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021920753

Country of ref document: EP

Effective date: 20230822

ENP Entry into the national phase

Ref document number: 2021422067

Country of ref document: AU

Date of ref document: 20211201

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01E

Ref document number: 112023014382

Country of ref document: BR

Free format text: FAVOR EFETUAR, EM ATE 60 (SESSENTA) DIAS, O PAGAMENTO DE GRU CODIGO DE SERVICO 260 PARA A REGULARIZACAO DO PEDIDO, CONFORME ART 2O 1O DA RESOLUCAO 189/2017 E NOTA DE ESCLARECIMENTO PUBLICADA NA RPI 2421 DE 30/05/2017, UMA VEZ QUE A PETICAO NO 870230078675 DE 05/09/2023 APRESENTA DOCUMENTOS REFERENTES A DOIS SERVICOS DIVERSOS (COMPLEMENTACAO E MODIFICACAO NO PEDIDO) TENDO SIDO PAGA SOMENTE UMA RETRIBUICAO. DEVERA SER PAGA MAIS 1 (UMA) GRU CODIGO DE SERVICO 260 E A GRU CODIGO DE SERVICO 207 REFERENTE A RESPOSTA DESTA EXIGENCIA.

ENP Entry into the national phase

Ref document number: 112023014382

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20230718