CN111690724B - Method for detecting activity of reagent generated by double-strand break - Google Patents

Publication number
CN111690724B
Authority
CN
China
Prior art keywords
lys
leu
glu
asp
ile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910199103.8A
Other languages
Chinese (zh)
Other versions
CN111690724A (en)
Inventor
胡家志
尹健行
刘孟竺
刘阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201910199103.8A
Priority to PCT/CN2020/098360 (WO2020228844A2)
Publication of CN111690724A
Application granted
Publication of CN111690724B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12Q1/34 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
    • C12Q1/44 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase involving esterase
    • C12Q1/66 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving luciferase
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/90 Enzymes; Proenzymes
    • G01N2333/914 Hydrolases (3)
    • G01N2333/916 Hydrolases (3) acting on ester bonds (3.1), e.g. phosphatases (3.1.3), phospholipases C or phospholipases D (3.1.4)
    • G01N2333/922 Ribonucleases (RNAses); Deoxyribonucleases (DNAses)

Abstract

The invention relates to a method for detecting the activity of a double-strand break generating reagent, in which the editing efficiency and the specificity of the reagent are detected together through the design of an extension primer and a molecular barcode carrying a random sequence.

Description

Method for detecting activity of reagent generated by double-strand break
Technical Field
The invention provides a method for simultaneously detecting the editing efficiency and specificity of a double-strand break generating reagent, in particular an engineered nuclease.
Background
In genetic engineering, suitable genome editing tools are often required to manipulate a target gene or genome fragment, and the ability to cause a double-strand break in a target genome sequence is an essential function of the various types of genome editing tools. One widely used class of double-strand break reagents is the engineered nucleases, which include meganucleases, zinc finger nucleases, transcription activator-like effector nucleases (TALENs), CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins), and the like. These can implement targeted editing at almost any genome position, thereby achieving precise modification of a genome. Among them, CRISPR/Cas, derived from the immune system of bacteria and archaea, uses a target-site-specific RNA to guide the Cas protein to modify the target site sequence. Because of its high efficiency and simplicity, it has attracted extensive attention in recent years and has become a novel and efficient genome editing tool.
However, engineered nucleases, including CRISPR/Cas, all have off-target activity: they cleave other genomic positions whose sequences differ to some degree from the target site, sometimes even causing chromosomal rearrangements. This greatly increases the risk of gene editing and thus severely limits its application in clinical treatment. Several methods have been developed in the prior art to detect the editing efficiency and specificity of nucleases. For example, targeted high-throughput sequencing is widely used to evaluate indels within PCR-amplified genomic fragments (Mali et al., 2013); these data can be used to roughly estimate the cleavage efficiency of a nuclease, but cannot estimate its specificity. LAM-HTGTS, by contrast, can detect editing specificity: it uses the target locus as bait and identifies off-target sites through genome-wide translocations (Frock et al., 2015). However, this method first performs 80 cycles of linear amplification to generate multiple copies of the original DNA fragment, which makes it difficult to distinguish amplification products from the original template. Furthermore, restriction enzyme blocking during library preparation leads to underestimation of uncut or perfectly repaired target fragments and small insertions, so DSB repair products around the target site cannot be quantified (Hu et al., 2016), and the method cannot effectively quantify cleavage efficiency. There is a strong need in the art for a method that more accurately assesses both the cleavage ability of engineered nucleases and their cleavage specificity or off-target activity.
Disclosure of Invention
After long-term study, the inventors provide a method that quantitatively analyzes both the editing efficiency (i.e., the efficiency of cleavage at a target site of interest to generate a double-strand break) and the off-target sites of a genome editing tool (e.g., an engineered nuclease). It is the first method known in the art to achieve both objects simultaneously. The method, referred to in the present invention as the PEM-seq method, can be widely used for evaluating genome editing. Specifically,
in a first aspect of the invention, there is disclosed a method of detecting the activity of a Double Strand Break (DSB) generating agent capable of generating a double strand break at a target location in a genome, comprising:
(1) contacting the reagent with the sample to cause a Double Strand Break (DSB) event to occur in its genome;
(2) taking the genome nucleic acid treated in the step (1) as a template, and carrying out an extension reaction by using an extension primer which is complementary to a flanking sequence of the target position to obtain an extension product of which the sequence extends beyond the target position;
(3) ligating a bridging linker to the extended end of the extension product, wherein the bridging linker comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) random nucleic acid segments of n nucleotides in length, wherein n is 1-30 or more, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30; ligation of the bridging linker gives each extension product a unique identifier conferred by the random nucleic acid segment; and
(4) performing high-throughput sequencing on the ligation products of the extension products and bridging linkers obtained in step (3), and identifying whether a double-strand break event has occurred at the target position and/or at a non-target position.
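As a rough illustration of why the random nucleic acid segment in step (3) matters for the sequencing analysis in step (4), the sketch below collapses reads that share the same random segment and the same junction, so that each original extension product is counted once. The `Read` record, its fields, the function name `collapse_by_rmb`, and the toy coordinates are all hypothetical and are not taken from the patent; a real pipeline would also need to tolerate sequencing errors in the random segment.

```python
from collections import namedtuple

# Hypothetical record for one sequenced ligation product (not from the patent).
Read = namedtuple("Read", ["rmb", "chrom", "pos", "strand"])

def collapse_by_rmb(reads):
    """Keep the first read seen for each (random segment, junction) key."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.rmb, read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

# Toy data with made-up coordinates.
reads = [
    Read("ACGTTGCA", "chr1", 1000, "+"),
    Read("ACGTTGCA", "chr1", 1000, "+"),  # amplification duplicate of the first read
    Read("TTGACCGA", "chr1", 1000, "+"),  # same junction, different original product
    Read("ACGTTGCA", "chr7", 5200, "-"),  # same random segment, different junction
]
unique_reads = collapse_by_rmb(reads)
print(len(reads), "reads ->", len(unique_reads), "unique products")
```

Keying on both the random segment and the junction is one possible design choice; it keeps two distinct products apart even if they happen to receive the same random segment.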
In one embodiment, the agent comprises a nuclease; for example, the agent is an engineered nuclease such as a zinc finger nuclease (ZFN), TALEN, or CRISPR-Cas.
In another embodiment, the sample is a eukaryotic cell, such as an animal cell (preferably a mammalian cell) or a plant cell, or a prokaryotic cell.
In another embodiment, in step (2), the extension primer undergoes one or more denaturation-annealing cycles with the sample genome before the extension reaction occurs, preferably 1 to 20 cycles, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 cycles.
In another embodiment, the extension primer carries an affinity tag, for example biotin attached to the 5' end of the extension primer.
In another embodiment, step (1) is followed by a step of fragmenting the genome, for example by sonication or endonuclease digestion.
In another embodiment, in step (2), the binding site of the extension primer on the genomic nucleic acid is located within 2 kb upstream or downstream of the double-strand break at the target position, such as within 1 kb, 900 bp, 800 bp, 700 bp, 600 bp, 500 bp, 400 bp, 300 bp, 200 bp or 100 bp; it is further preferred that the extension primer is annealed to the genomic sequence repeatedly.
In another embodiment, after the extension product is obtained in step (2), the method further comprises a step of isolating the extension product; preferably, the isolation is performed by affinity purification using the affinity tag of the extension primer; more preferably, the affinity purification is based on binding between biotin and avidin or streptavidin; further preferably, the avidin or streptavidin is attached to a solid substrate, such as a bead.
In another embodiment, the activity refers to the cleavage efficiency and/or specificity of the agent; alternatively, the method further comprises the step of analyzing the cleavage efficiency and/or specificity of the reagent according to the sequencing result after the sequencing.
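To make the two readouts concrete, the following minimal sketch computes an editing efficiency and per-site off-target frequencies from counts of unique products. It assumes that products have already been classified as germline (uncut or perfectly rejoined), indel, or translocation, that editing events are indels plus translocations, and that efficiency is the edited fraction of all unique products. The function name, the dictionary layout, and the numbers are hypothetical illustrations, not the patent's own analysis procedure.

```python
def editing_summary(counts, offtarget_junctions):
    """Return (efficiency, off-target frequency per 100,000 editing events).

    counts: unique-product counts keyed by 'germline', 'indel', 'translocation'.
    offtarget_junctions: junction counts keyed by off-target site name.
    """
    edited = counts["indel"] + counts["translocation"]
    total = edited + counts["germline"]
    efficiency = edited / total if total else 0.0
    per_100k = {}
    if edited:
        for site, n in offtarget_junctions.items():
            per_100k[site] = n * 100_000 / edited
    return efficiency, per_100k

# Made-up counts for illustration only.
counts = {"germline": 600, "indel": 350, "translocation": 50}
efficiency, per_100k = editing_summary(counts, {"OT1": 4})
print(efficiency, per_100k)
```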
In a second aspect of the invention, a method of screening for a site of a genomic double-strand break is disclosed, comprising:
(1) analyzing the target genome sequence to obtain candidate target sites for a double-strand break;
(2) making double strand breaks at the candidate target sites in step (1) using a double strand break generating reagent;
(3) performing the steps of the method of the first aspect;
(4) analyzing the efficiency and/or specificity of cleavage of said agent at different candidate target sites.
Preferably, the double-strand break generating agent comprises an engineered nuclease, e.g., a zinc finger nuclease (ZFN), TALEN, or CRISPR-Cas9.
In a third aspect of the invention, a method of screening for an engineered nuclease is disclosed comprising:
(1) providing an engineered nuclease candidate;
(2) detecting the activity of the engineered nuclease candidate, including cleavage efficiency and/or specificity, using the method of the first aspect;
preferably the engineered nuclease is a Cas nuclease, more preferably the Cas nuclease is selected from Cas9, Cas12a, Cas12b, Cas13a, Cas14 and variants thereof.
In a fourth aspect of the present invention, there is disclosed a method for identifying an inhibitor of a double-strand break producing agent, which comprises:
(1) contacting a double-strand-break-generating agent with a candidate compound;
(2) assessing the activity of the double-strand break generating agent using the method described in the first aspect;
(3) selecting a compound capable of reducing the activity of the double strand break generating agent.
Preferably, the double-strand break generating agent comprises an engineered nuclease, e.g., a zinc finger nuclease (ZFN), TALEN, or CRISPR-Cas9.
In a fifth aspect of the present invention, there is disclosed a method of identifying an enhancer for a double-strand break-producing agent, comprising:
(1) contacting a double-strand-break-generating agent with a candidate compound;
(2) assessing the activity of the double-strand break generating agent using the method described in the first aspect;
(3) selecting a compound capable of enhancing the activity of the double strand break generating agent.
Preferably, the double-strand break generating agent comprises an engineered nuclease, such as a zinc finger nuclease (ZFN), a TALEN, or CRISPR-Cas.
In a sixth aspect of the invention, a kit for use in the method of any one of the preceding aspects is disclosed.
In one embodiment, the kit comprises extension primers complementary to the flanking sequences of the target position, and a pool of bridging linkers comprising the random nucleic acid segments, wherein the random nucleic acid segment of each bridging linker in the pool has a unique sequence, such that ligation of a bridging linker gives each extension product a unique identifier conferred by the random nucleic acid segment;
preferably, the kit further comprises an agent capable of generating a double-strand break (DSB) at the target site.
In one embodiment, the agent is an engineered nuclease, such as a zinc finger nuclease (ZFN), TALEN, or CRISPR-Cas.
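The length n of the random nucleic acid segment determines how many distinct identifiers a pool of bridging linkers can offer. The sketch below counts the 4^n possible segments and uses the standard birthday-problem approximation to estimate the chance that at least two of a given number of products receive the same segment. The function names and the example values (n = 10, 1,000 products) are hypothetical and chosen only for illustration; in practice such pools are typically synthesized with degenerate bases rather than enumerated.

```python
import math
import random

def identifier_space(n):
    """Number of distinct random segments of length n over A, C, G, T."""
    return 4 ** n

def collision_probability(num_products, n):
    """Birthday-problem approximation of P(at least two products share a segment)."""
    space = identifier_space(n)
    return 1.0 - math.exp(-num_products * (num_products - 1) / (2.0 * space))

def random_segment(n, rng):
    """Draw one random segment of length n using the supplied random generator."""
    return "".join(rng.choice("ACGT") for _ in range(n))

rng = random.Random(0)  # fixed seed so the example is repeatable
segment = random_segment(10, rng)
print(segment, identifier_space(10), round(collision_probability(1000, 10), 3))
```

Longer segments shrink the collision probability quickly, which is one reason to choose n toward the upper part of the stated range when many extension products are expected.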
In a seventh aspect of the invention, a Cas9 protein is disclosed that has activity to generate a double-strand break at a genomic target location, has an amino acid sequence with at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 1, and has a mutation at one or more positions selected from the group consisting of lysine (K) at position 848, lysine (K) at position 1003, arginine (R) at position 1060, and aspartic acid (D) at position 1135.
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ IDNO:1)
The nucleic acid sequence encoding the protein of SEQ ID NO: 1 is as follows:
GACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAA
CGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGA
GGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGAC (SEQ ID NO: 2).
In one embodiment, the mutations are specifically K848A, K1003A, R1060A and D1135E; preferably, the sequence of the Cas9 protein is as set forth in SEQ ID NO: 3.
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLADDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPALESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKAPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFESPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ IDNO:3)
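Substitutions written in the form K848A can be applied to a sequence programmatically, which also gives a quick check that the wild-type residue at each numbered position matches. The helper below is a hypothetical sketch, not part of the patent. Its `first_residue_number` parameter covers the case where a listed sequence does not start at residue 1; the listed sequences begin with DKKYS, so if the position numbers count an initiator methionine as residue 1 (an assumption, not stated in the text), the value 2 would apply. The example uses a short toy sequence rather than the full SEQ ID NO: 1.

```python
import re

def apply_point_mutations(seq, mutations, first_residue_number=1):
    """Apply substitutions such as 'K848A' to a protein sequence string."""
    residues = list(seq)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_residue_number
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[index]}")
        residues[index] = replacement
    return "".join(residues)

# Toy example: the five-residue string is treated as residues 2 to 6.
mutant = apply_point_mutations("DKKYS", ["K3A"], first_residue_number=2)
print(mutant)
```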
In an eighth aspect of the invention, there are disclosed a nucleic acid sequence encoding the Cas9 protein of the seventh aspect, an expression vector expressing the nucleic acid sequence, and a cell comprising the nucleic acid sequence or the expression vector, e.g., a eukaryotic cell, such as a plant cell or an animal cell (preferably a mammalian cell), or a prokaryotic cell.
The nucleic acid sequence encoding the Cas9 protein of the seventh aspect is preferably as shown in SEQ ID NO: 4:
GACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAA
CGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGGCCGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTGCCCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGGCCCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGAAAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGA
GGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGAC (SEQ ID NO: 4)
In a ninth aspect of the invention, there is provided a use of the aforementioned Cas9 protein, nucleic acid sequence, expression vector or cell in targeted modification of a genome.
The method provided by the invention can be used to evaluate the editing capacity of engineered nucleases, including CRISPR/Cas9, and is markedly superior to existing methods in the prior art. The main reasons are as follows. First, the use of primer extension and random molecular barcodes (RMBs) eliminates the PCR amplification bias that affects other methods (e.g., T7EI, RFLP, and targeted sequencing). Second, PEM-seq can comprehensively detect the large genomic deletions generated after gene editing as well as genome-wide translocations, both of which can occur in CRISPR/Cas9 editing, whereas other methods can only detect small insertions (such as T7EI, RFLP and targeted sequencing) or can only be used for off-target site analysis (such as LAM-HTGTS). The method of the invention is therefore of great value for the evaluation, detection, analysis, and screening of genome editing tools.
Drawings
Fig. 1 shows the results of detecting off-target hotspots of CRISPR/Cas9 using PEM-seq. (A) PEM-seq overview. To prepare a PEM-seq library, a primer extension reaction is performed using biotinylated primers to obtain a single-copy template, followed by ligation of a bridging linker and DNA amplification. The solid lines represent bait regions and the dashed lines represent captured genomic regions. "N" denotes a random nucleic acid sequence in the bridging linker, which may also be referred to as a random molecular barcode (RMB). The arrows show the positions of the primers and their orientation. (B) Circos plot of the SpCas9:RAG1A libraries. Three biological replicates are shown from outside to inside, with 19,494, 16,005 and 18,078 translocation junctions, respectively. Genome-wide translocations binned into 5-Mb regions are plotted on a logarithmic scale. Chromosomes are shown in a clockwise direction from centromere to telomere. The black arrow indicates the SpCas9:RAG1A cleavage site. The lines in the inner ring connect the target site to off-target hotspots. (C) Magnified view of SpCas9:RAG1A translocation junctions on chromosome 7 (2-Mb bins). The black arrows indicate the identified off-target hotspots. Cen represents the centromere, and p/q represent the chromosome arms. The Pearson correlation coefficient was 0.99 between replicates 1 and 2, 0.99 between replicates 1 and 3, and 0.98 between replicates 2 and 3. (D) Scatter plot of SpCas9:RAG1A off-target hotspots in 293T, HCT116, K562 and U2OS cells. The y-axis shows the frequency of each off-target hotspot per 100,000 editing events (including indels and translocations). Asterisks indicate off-target hotspots detected by PEM-seq but not by LAM-HTGTS. (E) Venn diagram showing the overlap among SpCas9:RAG1A off-target hotspots in 293T, HCT116, K562 and U2OS cells. The legend is the same as for panel D. (F) In vitro SpCas9 digestion of the RAG1A off-target hotspots. The indicated amplified fragments were incubated with purified SpCas9 for 20 hours. "On" denotes the RAG1A target site. "NC" denotes a fragment that lacks a SpCas9:RAG1A target site but can be targeted by SpCas9:MYC1, used as a negative control. Downward-pointing arrowheads indicate uncut fragments, while horizontal arrowheads indicate the larger cleavage fragments.
Fig. 2 shows the results of detecting the editing activity of CRISPR/Cas9 using PEM-seq. (A) Outcomes of a Cas9-induced double-strand break (DSB). Germline represents uncut sequence or perfect rejoining; indels arise from erroneous rejoining; a translocation involves a second DSB. (B) Percentages of germline, indels and translocations for SpCas9:RAG1A. Mean ± SD. (C) Frequency of indels at SpCas9:RAG1A as detected by PEM-seq, RFLP, the T7EI assay and single-cell RFLP. The mean is represented by a black line. DNA from different replicates is labeled with different colors. (D) Composition of the SpCas9:RAG1A translocations, joined either to off-target DSBs or to genome-wide low-level DSBs. Mean ± SD. (E) Indels within ±20 bp around the cleavage site in the SpCas9:RAG1A libraries. Some deletions also involve an insertion and are shown as "Deletion + insertion" in the figure. Mean ± SD. (F) Junction frequency (bin size 50 bp) within ±5 kb (excluding ±20 bp) around the SpCas9:RAG1A cleavage site. The upper panel shows the different DNA repair products detected by PEM-seq, including inversions and deletions. The small box represents the Cas9 target. The arrow without a tail indicates the direction of translocation. The arrows with dotted tails indicate the position and orientation of the primers used for PEM-seq. In the lower panel, the dark dashed lines indicate Cas9 cleavage sites and the light dashed lines indicate primer positions. The number of junctions in each region is also shown. (G) Junction frequency 5-50 kb downstream of the SpCas9:RAG1A cleavage site (bin size 1 kb). The dotted line indicates the boundary of the 5-50 kb region. The arrow with a dotted tail indicates the primer used for PEM-seq. The number of junctions in each region is also shown.
FIG. 3 shows the difference in editing ability between different target sites in the region around RAG1A. The upper panel is a schematic of the target sites RAG1A, RAG1B, RAG1C and RAG1G. The arrow with a dotted tail represents the biotinylated primer for PEM-seq. Light-colored boxes indicate gRNA target sites, and the dark-colored boxes within them indicate Cas9 cleavage sites. The lower panel shows the composition of germline, indels and translocations resulting from Cas9 treatment in HEK293T cells. Mean ± SD. For detailed information, see Tables S1 and S3.
Fig. 4 (A) is a schematic of the SpCas9 domains and the corresponding point mutations of the SpCas9 variants. (B) Editing efficiency of each SpCas9 variant targeting RAG1A in HEK293T cells, as measured by PEM-seq. The target site sequence is listed in the upper part, and the underlined bases indicate the PAM sequence. Error bars, mean ± SD; two-tailed t-test, *p < 0.05. (C) Frequency of total translocation junctions in the 1-kb regions around the off-target hotspots for the indicated SpCas9 variants targeting RAG1A in HEK293T cells. The total number of off-target hotspots identified for each Cas9 variant is shown above the bar. Error bars, mean ± SD; two-tailed t-test, *p < 0.05; **p < 0.01. (D) Scatter plot of the RAG1A off-target hotspots for the variants shown. The y-axis shows the frequency of each hotspot per 100,000 editing events (indels + translocations). Asterisks indicate off-target hotspots detected in the xCas9 library but not in the WT library. The arrows mark the top-ranked hotspots with a non-NGG PAM in the xCas9 library. (E and F) Editing efficiency and off-target hotspots for different SpCas9 variants targeting the EMX1 site in HEK293T cells; legend as for panels B and C. (G) Off-target hotspots for each SpCas9 variant at the EMX1 target site. The y-axis shows the frequency of each hotspot per 100,000 editing events (indels + translocations). (H-J) Editing efficiency and off-target hotspots for each SpCas9 variant directed against MYC target site 1 (locus 1) in HEK293T cells. (K-M) Statistical comparison of the off-target hotspots of eSpCas9 and FeSpCas9 at three gene targets.
Figure 5 shows the use of PEM-seq in assessing the editing efficiency and specificity of SpCas9, SaCas9 and AsCas12a. (A) Target DNA and PAM sequences for SpCas9, SaCas9 and AsCas12a. The letters indicate the PAM sequences of the different Cas enzymes, with blue representing SpCas9, green plus blue representing SaCas9, and red representing AsCas12a. (B) Editing efficiency of SpCas9 (Sp), SaCas9 (Sa) and AsCas12a (As) at the selected targets in HEK293T cells. Error bars, mean ± SD; two-tailed t-test, *p < 0.05; **p < 0.01. (C) Frequency of translocation junctions within the 1-kb regions around the off-target hotspots generated when SpCas9 (Sp), SaCas9 (Sa) and AsCas12a (As) were targeted to the indicated sites in HEK293T cells. The total number of off-target hotspots identified for each CRISPR-Cas nuclease is shown above the bar. Error bars, mean ± SD; two-tailed t-test, **p < 0.01.
Fig. 6 (A) shows the editing efficiency of SpCas9:RAG1A in HEK293T cells, detected by PEM-seq, at the indicated mass ratios of Cas9 to AcrIIA4. Error bars, mean ± SD; two-tailed t-test, **p < 0.01. (B) Frequency of translocation junctions in the 1-kb regions around the SpCas9:RAG1A off-target hotspots at the indicated mass ratios of Cas9 to AcrIIA4; the total number of off-target hotspots is shown above the bars. Error bars, mean ± SD; two-tailed t-test, *p < 0.05; **p < 0.01. (C) Proportions of indels and off-target junctions in the SpCas9:RAG1A sequencing libraries. The total number of junctions from the three replicate sequencing libraries is shown above the bar. (D-G) Editing efficiency and off-target hotspots for SpCas9 at other target sites, where the mass ratio of Cas9 to AcrIIA4 is 1:1; legend as for (A-C).
Detailed Description
The invention may be further understood by the examples, however, it is to be understood that these examples are not limiting of the invention. Variations of the invention, now known or further developed, are considered to fall within the scope of the invention as described herein and claimed below.
Definitions
The term "target location" refers to a location in the genome that is selected or targeted for the occurrence of a double-strand break, which may refer to either the actual specific nucleotide site of the double-strand break or a segment of the nucleic acid sequence selected as the target sequence that includes 1 or more nucleotides.
The term "double strand break event" refers to the occurrence of a DNA Double Strand Break (DSB) at a specific location (or target location) in the genome of a sample. A DNA double strand break event is an event that occurs in vivo (intracellularly), and the cell has DNA repair mechanisms, so after a DNA double strand break occurs, the cell repairs the double strand break and may produce multiple repair results: if perfect repair occurs, the target position sequence after repair is the same as before the occurrence of double-strand break; if the repair is imperfect, the target sequence after repair may be subjected to partial nucleotide insertions or deletions (indels) compared to the original sequence, or may be exchanged with nucleic acid fragments at other positions in the genome to cause translocation (translocation). Thus, in a broader sense, a "double strand break event" also includes a DNA repair process that occurs after a DNA double strand break occurs in the genome.
The term "double-strand break generating agent" refers to an agent that cleaves a specific location in the genome to generate a DNA double-strand break, and includes a protein (e.g., an enzyme) or a nucleic acid, or a combination of more than one protein or nucleic acid. The cleavage is preferably a cleavage at a specified position of the genome, i.e., a targeted cleavage. When targeted cleavage is performed, the agent preferably comprises a targeting agent, e.g., a DNA fragment capable of binding at the genomic target site, and a cleavage agent, e.g., a domain in a protein (e.g., an enzyme, such as a nuclease in particular) that binds to DNA, or a targeting moiety, e.g., a domain that is capable of directing the cleavage agent or cleavage moiety to create a double strand break at the target site.
In one embodiment, the double-strand break generating agent is a genome editing tool.
In one embodiment, the double-strand break generating reagent is an engineered nuclease. The nuclease is engineered, for example, by improving or optimizing its amino acid sequence, or by optimizing an auxiliary sequence thereof (e.g., a DNA or RNA sequence that helps direct cleavage by the enzyme), such that the nuclease achieves specific or targeted cleavage as desired. In some embodiments, the engineered nuclease includes not only the nuclease itself but also other auxiliary sequences. Examples of such engineered nucleases are zinc finger nucleases (ZFNs), TALENs and CRISPR-Cas.
The term "zinc finger nuclease" (ZFN) refers to a chimeric protein molecule capable of promoting a double-strand break (DSB) at a target gene site within a host cell. A ZFN may comprise a DNA binding domain and a DNA cleavage domain, wherein the DNA binding domain comprises at least one zinc finger and is linked to the DNA cleavage domain. In one embodiment, the zinc finger DNA binding domain is at the N-terminus of the chimeric protein molecule and the DNA cleavage domain is located at the C-terminus of the molecule. The ZFN DNA cleavage domain may be derived from a non-specific DNA cleavage domain, such as the DNA cleavage domain of a type II restriction enzyme. In one embodiment, the DNA cleavage domain is from the FokI nuclease.
The term "TALEN" refers to a transcription activator-like effector nuclease, formed by joining a transcription activator-like effector (TALE) to a nuclease. It binds to DNA and cleaves the DNA strands at a specific site, allowing a gene to be knocked out or new genetic material to be introduced.
TALEs are composed of 12 or more tandem "protein modules" that specifically recognize DNA, together with flanking N-terminal and C-terminal sequences. Most of the amino acid sequence of these modules is repeated, with the differences lying essentially at positions 12 and 13 of each repeat; these two residues are the key sites for targeted recognition and are called repeat variable di-residues (RVDs). Unlike zinc finger proteins, each of which recognizes a specific base triplet, each RVD on a TALE recognizes only one base, with no obvious dependence on the neighboring bases, so TALEs can be designed to recognize and bind any DNA sequence of interest.
The term "CRISPR-Cas" refers to an adaptive immune defense system that bacteria and archaea have developed during long-term evolution to combat invading viruses and foreign DNA. The CRISPR-Cas9 system provides immunity by integrating fragments of invading phage and plasmid DNA into the CRISPR locus and using the corresponding CRISPR RNAs (crRNAs) to direct degradation of homologous sequences. The system works as follows: the crRNA (CRISPR-derived RNA) binds to the tracrRNA (trans-activating RNA) by base pairing to form a tracrRNA/crRNA complex, which directs the nuclease Cas protein (CRISPR-associated protein, e.g., Cas9) to cleave double-stranded DNA at the target site whose sequence pairs with the crRNA. By artificially designing these two RNAs, a single guide RNA (sgRNA) with a guiding function can be engineered, which is sufficient to guide site-specific cleavage of DNA by Cas9.
The term "extension primer" refers to a primer complementary to a sequence flanking the target site (or the site where the DSB event occurs), either upstream or downstream of it. The extension primer is a single primer that binds, by annealing, upstream or downstream of the DSB site caused by the double-strand break generating reagent. The extension reaction yields an extension product that is a single copy of the template, thereby retaining information on the original amount of template DNA. The extension product must extend at least beyond the target site (or the site where the DSB event occurs).
In a preferred embodiment, after the extension primer is added to the genomic DNA of the sample, the mixture is subjected to 1 or more (preferably 1 to 20, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20) cycles of denaturation-annealing. One cycle consists of denaturing the genomic DNA once and annealing the primer to its target binding site; each further round of denaturation and primer annealing counts as the next cycle. The inventors found that repeated annealing greatly improves the binding efficiency of the primer to the template and achieves complete coverage of the template.
The term "bridging linker" refers to a DNA sequence comprising one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) random nucleic acid segments of n nucleotides in length (sometimes referred to herein as random molecular beacons (RMBs) or molecular beacons, which have the same meaning); different bridging linkers are distinguished from one another by the differing random sequences in their random nucleic acid segments. n is 1 to 30 or more, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30. Preferably, each bridging linker comprises, in addition to the random nucleic acid segment, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) common nucleic acid segments of length m, where m is 1 to 30 or more, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30. The common nucleic acid segments and the random nucleic acid segments may be arranged in any order relative to one another, with common nucleic acid segments preferably located in both end regions of the bridging linker, so that when the bridging linker is ligated to the extension product, the common nucleic acid segment at the outer (distal) end can be bound by a primer, facilitating subsequent sequencing. The common nucleic acid segments have the same nucleotide sequence in every bridging linker molecule. In one embodiment, the bridging linker comprises a random nucleic acid segment flanked by two common nucleic acid segments.
A bridging linker is a double-stranded DNA fragment that can be blunt-ended on both sides, blunt-ended on one side and sticky-ended on the other side, or sticky-ended on both sides, preferably formed by annealing two single-stranded DNAs.
After each extension product is ligated to a bridging linker, it acquires a unique identifier conferred by the random nucleic acid segment; that is, the end of each extension product is joined to a bridging linker with a unique sequence (the uniqueness being conferred by the random nucleic acid segment).
The term "editing efficiency", also referred to as "cleavage efficiency" in the present invention, refers to the efficiency of the double-strand-break-generating reagent to cleave at the target genomic site to generate a double-strand break, i.e., the ratio of the amount of the template cleaved at the target site to the amount of the total DNA template after cleavage. The editing efficiency reflects the ability of the reagent or gene editing tool to act on the target site.
The term "specificity" relates to the fact that a double-strand break-generating agent sometimes unavoidably cleaves at positions in the genome other than the target site of interest (i.e., off-target sites). Specificity measures the proportion of all cleavage events that occur at the target site; a higher ratio indicates better specificity. It should be noted that in gene editing, specificity depends on the chosen target site as well as on the editing tool used. For example, selecting different targets when editing the same gene may also result in different specificities.
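The two definitions above can be read as simple ratios. The following Python sketch is illustrative only: the function names and the example counts are hypothetical and do not come from the patent, and the sketch leaves out the control-library and transfection-efficiency corrections used in the Examples.

```python
def editing_efficiency(cleaved_templates: int, total_templates: int) -> float:
    """Fraction of all DNA templates that were cleaved at the target site."""
    if total_templates <= 0:
        raise ValueError("total_templates must be positive")
    return cleaved_templates / total_templates


def specificity(on_target_events: int, off_target_events: int) -> float:
    """Fraction of all cleavage events that occurred at the target site."""
    total = on_target_events + off_target_events
    if total <= 0:
        raise ValueError("at least one cleavage event is required")
    return on_target_events / total


# Hypothetical counts, for illustration only.
print(editing_efficiency(cleaved_templates=384, total_templates=1000))
print(specificity(on_target_events=90, off_target_events=10))
```

A higher value of `specificity` corresponds to a larger share of cleavage occurring at the intended site.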
Examples
Materials and methods
PEM-seq program and data analysis
1. Primer extension
All biotin-labeled primers (for sequences see Table 2) (Sangon, Shanghai) were placed within 100 bp of the cleavage site. The primers were repeatedly denatured and annealed with 20 μg of sonicated genomic DNA (0.3-2 kb) under the following conditions: 95 ℃ for 3 minutes; 5 cycles of 95 ℃ for 2 minutes and the annealing temperature Ta for 3 minutes (see Table 1 for details); and a final annealing at Ta for 3 minutes. Bst polymerase 3.0 (NEB) was then added to perform primer extension: 65 ℃ for 10 minutes and 80 ℃ for 5 minutes. Then 1.2× AxyPrep Mag PCR Clean-Up beads (Axygen, USA) were added to remove excess biotinylated primer. The purified product was heated to 95 ℃ for 5 minutes and then rapidly cooled on ice for 5 minutes to denature the DNA. Finally, the biotin-labeled extension products were enriched using Dynabeads™ MyOne™ Streptavidin C1 (Thermo Fisher).
TABLE 1
Figure BDA0001996779900000201
TABLE 2
Figure BDA0001996779900000202
Figure BDA0001996779900000211
2. Ligation of the bridging linker (with random molecular beacon, RMB)
The extension products on the Streptavidin C1 beads were washed twice with 400 μL of 1× B&W buffer (1 M NaCl, 5 mM Tris-HCl (pH 7.4) and 1 mM EDTA (pH 8.0)), followed by a wash with 400 μL dH2O. The DNA-bead complex was then resuspended in 42.4 μL dH2O. Ligation was performed at room temperature in 15% PEG8000 (Sigma) using T4 DNA ligase (Thermo Fisher Scientific) with a bridging linker formed by annealing two single-stranded DNAs having the following sequences:
Bridge adapter-up:/5phos/CCACGC GTGCTC TAC ANN NNT NNN ANN NTN NNN AGATCG GAAGAG CAC ACG TCT GAA CTC CAG T-NH2(C7)(SEQ ID NO:25);
Bridge adapter-lower:TGT AGA GCA CGC GTG GNN NNN N-NH2(C7)(SEQ ID NO:26)。
wherein N is any one of A, T, C or G.
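The random positions in the upper adapter strand can be counted and read out programmatically. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, the read is assumed to contain the adapter in the same orientation as SEQ ID NO:25, and no sequencing errors in the constant flanks are tolerated. It shows that the adapter carries 14 random bases, matching the 14-bp RMB length mentioned in Example 1.

```python
import re

# Upper strand of the bridge adapter (SEQ ID NO:25), with the spaces removed.
BRIDGE_ADAPTER_UP = (
    "CCACGC GTGCTC TAC ANN NNT NNN ANN NTN NNN "
    "AGATCG GAAGAG CAC ACG TCT GAA CTC CAG T"
).replace(" ", "")


def count_random_positions(adapter: str = BRIDGE_ADAPTER_UP) -> int:
    """Number of random (N) positions in the adapter."""
    return adapter.count("N")


def adapter_regex(adapter: str = BRIDGE_ADAPTER_UP) -> re.Pattern:
    """Regex matching the adapter, capturing each random base separately."""
    pattern = "".join("([ACGT])" if base == "N" else base for base in adapter)
    return re.compile(pattern)


def extract_rmb(read: str, adapter: str = BRIDGE_ADAPTER_UP):
    """Return the random bases found in the read, or None if no adapter match."""
    match = adapter_regex(adapter).search(read)
    return "".join(match.groups()) if match else None


n_random = count_random_positions()
print(n_random, 4 ** n_random)  # random positions and number of possible RMBs
```

With 14 random positions there are 4^14 possible RMB sequences, which is why two independent extension products are unlikely to receive the same beacon.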
The ligation reaction system is as follows:
TABLE 3
Figure BDA0001996779900000221
3. PCR amplification for Illumina sequencing.
The ligation product was washed twice, sequentially with 400 μL of 1× B&W buffer and 400 μL dH2O, and then resuspended in 80 μL dH2O. Nested PCR (Taq, Transgen Biotech, China) was performed directly on the DNA-bead complex with primers I5 and I7 for 16 cycles. The PCR product was then recovered with size-selection beads (Axygen, USA) and subjected to a further PCR (Fastpfu, Transgen Biotech, China) to add the Illumina P5 and P7 sequences. All PEM-seq libraries constructed as described above were sequenced on a HiSeq instrument with 2 × 150 bp reads.
Plasmid construction
All gRNA target sequences are listed in table 4:
TABLE 4
Figure BDA0001996779900000222
Figure BDA0001996779900000231
Targeting gRNAs for SpCas9 used the pX330 backbone (Addgene 42230), and the SpCas9 nickase used the pX335 backbone (Addgene 42335). The SpCas9 variants, SaCas9 and AsCpf1 were each inserted into the pX330 backbone, as follows:
cDNAs of the SpCas9 variants (D1135E, eSpCas9(1.1), FeSpCas9 and xCas9) were generated by mutagenic overlap PCR and ligated into the pX330 plasmid using AgeI/EcoRI digestion. The SaCas9 cDNA was purified after digestion of pX601 (Addgene 61591) with AgeI/EcoRI and then ligated into the AgeI/EcoRI-digested pX330 plasmid. The U6 promoter-SaCas9 gRNA scaffold DNA from pX601 was inserted between the AflIII and XbaI sites of the pX330-SaCas9 plasmid. The AsCas12a (Cpf1) cDNA was amplified from SQT1659 (Addgene 78743) and inserted between the AgeI and EcoRI sites of pX330; the Cas12a gRNA scaffold DNA was inserted into the BbsI/XbaI double-digested pX330-AsCpf1 backbone. The AcrIIA4 plasmid PJH376 was purchased from Addgene (Addgene 86842).
Cell lines and cell transfections
293T, U2OS and HCT116 cells were cultured in DMEM (Corning) supplemented with glutamine (Corning), 10% FBS and penicillin/streptomycin (Corning) at 37 ℃ and 5% carbon dioxide. K562 cells were cultured in RPMI 1640 (Corning) containing glutamine, 15% FBS and penicillin/streptomycin (Corning) at 37 ℃ and 5% carbon dioxide. HEK293T cell libraries were prepared in 6 cm dishes by Ca-PO4 co-transfection with 7.2 μg nuclease plasmid and 1.8 μg pMAX-GFP. The U2OS libraries were co-transfected with 20 μg Cas9 plasmid and 5 μg GFP plasmid using PEI (Sigma). Also in 10 cm dishes, 20 μg of Cas9 plasmid and 5 μg of GFP plasmid were transfected into HCT116 cells using Lipofectamine 2000 (Invitrogen). […] μg of pX330 and 5 μg of GFP plasmid were co-transfected into K562 cells in SF buffer (Lonza) using the Nucleofector 4D with the FF120 program. To prepare the Cas9 inhibitor AcrIIA4 libraries, the target cells in 6-well plates were co-transfected by Ca-PO4 with 2 μg SpCas9:RAG1A plasmid, 6 μg of AcrIIA4 and blank plasmid combined at the specified ratio, and 1 μg GFP plasmid. For the SaCas9:MYC1 library, 2 μg SaCas9:MYC locus 1 plasmid, 2 μg SpCas9:RAG1A plasmid, 2 μg of AcrIIA4 or blank plasmid, and 1 μg of GFP plasmid were co-transfected into cells cultured in 6-well plates. For the other 1:1 libraries, 2 μg of pX330 plasmid, 2 μg of AcrIIA4 or blank plasmid and 1 μg of GFP plasmid were co-transfected into HEK293T cells. At 48 hours post-transfection, the GFP co-transfection efficiency of all libraries was analyzed by FACS.
"SuperQ" protocol for PEM-seq analysis
After sequencing, the HiSeq reads were processed as follows. For initial pre-processing, the Illumina adapter sequences and low-quality terminal sequences (QC < 30) were removed using the cutadapt program (http://cutadapt.readthedocs.io/en/stable/), and reads whose remaining sequence was shorter than 25 bp were discarded. The reads were then demultiplexed by index using the fastq-multx program (https://github.com/brwnj/fastq-multx). For alignment and clustering of reads, we adapted the corresponding programs used in LAM-HTGTS (Frock et al, 2015) to map the reads and detect translocation breakpoints. The hg38 genome was used as the reference. The molecular beacon clustering algorithm (Peng et al, 2015) was adapted to remove PCR duplicates at an edit distance of 2. The cleavage site and the adjacent ±5 bp region were used for the analysis of indels; reads containing large deletions resulting from excision and religation within ±250 kb of the cleavage site were also classified as indels (I). Reads in which no mutation was detected around the breakpoint were identified as germline (G). Genome-wide translocations (T) were identified as described previously. The editing efficiency was calculated as follows, taking into account the non-cutting control and the transfection efficiency (TE):
Figure BDA0001996779900000251
wherein S represents a nuclease-treated library and C represents a control library.
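The removal of PCR duplicates by RMB at an edit distance of 2 can be illustrated with a minimal Python sketch. This is not the clustering algorithm of Peng et al, 2015; it is a simple greedy stand-in, and the function names, the representation of a junction as a tuple, and the reading of "at an edit distance of 2" as "2 or fewer" are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution or match
        prev = cur
    return prev[-1]


def deduplicate(reads, max_dist: int = 2):
    """Keep one read per (junction, RMB cluster).

    reads: iterable of (junction, rmb) pairs, where junction is any hashable
    description of the breakpoint, e.g. (chromosome, position, strand).
    A read is dropped as a PCR duplicate if an already kept read with the
    same junction has an RMB within max_dist edits.
    """
    kept_rmbs = {}
    unique = []
    for junction, rmb in reads:
        seen = kept_rmbs.setdefault(junction, [])
        if any(edit_distance(rmb, other) <= max_dist for other in seen):
            continue
        seen.append(rmb)
        unique.append((junction, rmb))
    return unique
```

Reads that share a junction but carry clearly different RMBs are kept as independent templates, which is what allows the germline (G), indel (I) and translocation (T) counts to reflect original template numbers.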
Off-target hotspot identification
Reads within ±250 kb of the break site were excluded, and MACS2 in callpeak mode was used to identify translocation-enriched regions with the parameters: --extsize 50 -q 0.05 --llocal 10000000. The MACS2 results were further filtered to remove sites lacking sequence similarity to the target site or containing fewer than 3 junctions. Off-target hotspots were defined as before (Frock et al, 2015). In short, off-target hotspots recur in specific regions that contain a cryptic potential target sequence and show a balanced orientation distribution centered on the possible cleavage position. In contrast, translocations unrelated to off-target cleavage are generally of low frequency, show a biased orientation distribution and have no nearby sequence resembling the target. The total number of junctions within ±500 bp of the possible cleavage site was counted and normalized against the untreated control library to obtain the off-target intensity:
Figure BDA0001996779900000252
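The window-based counting described above can be sketched in Python as follows. The sketch covers only the exclusion of junctions near the break site and the counting of junctions around a candidate site; the normalization against the untreated control library is given by the formula in the figure and is not reproduced here. The function names and the junction representation are hypothetical, and the per-100,000 scaling follows the y-axis definition in the legends of FIGS. 1D and 4D rather than the formula itself.

```python
def far_from_break_site(junction, break_site, exclusion: int = 250_000) -> bool:
    """True if the junction lies outside +/- exclusion bp of the bait break site."""
    chrom, pos = junction
    break_chrom, break_pos = break_site
    return chrom != break_chrom or abs(pos - break_pos) > exclusion


def junctions_in_window(junctions, site, half_window: int = 500) -> int:
    """Count junctions within +/- half_window bp of a possible cleavage site."""
    chrom, pos = site
    return sum(1 for c, p in junctions if c == chrom and abs(p - pos) <= half_window)


def per_100k_editing_events(count: int, editing_events: int) -> float:
    """Scale a junction count to a frequency per 100,000 editing events."""
    if editing_events <= 0:
        raise ValueError("editing_events must be positive")
    return 100_000 * count / editing_events
```

A candidate site would additionally have to pass the filters stated in the text (sequence similarity to the target site and at least 3 junctions) before being reported as a hotspot.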
Sequence homology analysis at translocation junction break sites
Translocation junctions within ±250 kb of the cleavage site were first excluded from the subsequent analysis. An overlap between the bait sequence (i.e., the sequence near the double-strand break at the target site of interest) and the captured sequence (i.e., the sequence near the double-strand break at the off-target site) was considered a microhomology (deletion); a gap between the bait and the captured sequence was considered an insertion. The percentages of insertions and deletions of 0 to 10 nucleotides were then calculated.
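The overlap/gap rule above can be expressed as a short Python sketch. It assumes, for illustration, that each junction is described by two read coordinates: the position where the bait alignment ends and the position where the captured-sequence alignment begins. These names, and the treatment of a zero-length difference as a direct join, are assumptions and are not taken from the patent.

```python
from collections import Counter


def classify_junction(bait_end: int, capture_start: int):
    """Classify a junction from read coordinates.

    bait_end: read position just after the last base aligned to the bait.
    capture_start: read position of the first base aligned to the captured sequence.
    Overlapping alignments indicate microhomology; a gap indicates an insertion.
    """
    diff = bait_end - capture_start
    if diff > 0:
        return ("microhomology", diff)
    if diff < 0:
        return ("insertion", -diff)
    return ("direct", 0)


def length_percentages(junctions, max_len: int = 10):
    """Percentage of junctions in each (type, length) class up to max_len nucleotides."""
    counts = Counter(classify_junction(b, c) for b, c in junctions)
    total = sum(counts.values())
    return {
        key: 100.0 * n / total
        for key, n in counts.items()
        if key[1] <= max_len
    }
```

Percentages are computed relative to all junctions supplied, so classes longer than `max_len` are simply not reported rather than renormalized away.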
RFLP cleavage assay
The RAG1A site was PCR amplified using Fastpfu (Transgen Biotech, China). The amplification product was recovered by 1.2 × AxyPrep Mag PCR Clean-Up beads (Axygen, US), followed by 1 hour of cleavage with StyI (NEB), and then subjected to agarose gel electrophoresis. Band intensities were quantified by ImageJ (version 1.51J8) and indels were measured using the following formula:
Figure BDA0001996779900000261
where I_C is the sum of the intensities of the two cleaved bands and I_U is the intensity of the uncut band.
T7EI cleavage assay
The fraction cleaved (FC) was calculated using the method described previously (Frock et al, 2015). The final indel frequency was then obtained from the following formula (assuming that annealing before T7EI cleavage was completely random):
Figure BDA0001996779900000262
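For orientation, the relation commonly used for T7EI assays under the same random-annealing assumption is indel = 1 - sqrt(1 - FC): if a fraction p of strands is mutated and strands reanneal at random, only wild-type/wild-type duplexes, a fraction (1 - p)^2, escape cleavage, so FC = 1 - (1 - p)^2. Whether the formula in the figure above is identical to this relation is an assumption; the Python sketch below simply evaluates the commonly used form, and the function name is hypothetical.

```python
import math


def t7ei_indel_fraction(fraction_cleaved: float) -> float:
    """Estimate the indel fraction from the T7EI fraction cleaved (FC).

    Assumes completely random reannealing and that every duplex containing
    at least one mutated strand is cleaved, so that FC = 1 - (1 - p)**2.
    """
    if not 0.0 <= fraction_cleaved <= 1.0:
        raise ValueError("fraction_cleaved must be between 0 and 1")
    return 1.0 - math.sqrt(1.0 - fraction_cleaved)


print(t7ei_indel_fraction(0.51))
```

Because of the square root, the estimated indel fraction is always less than or equal to the fraction cleaved.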
Single-cell RFLP cleavage assay
Single clones of SpCas9:RAG1A-transfected cells were picked into 96-well plates for culture, and genomic DNA was extracted 7 days later. The RFLP cleavage assay was performed as described in the "RFLP cleavage assay" section above. Cleavage products were classified into 3 types: complete digestion (I_I), partial digestion (I_H) and undigested (I_G). Since the HEK293T cells used were triploid, the score for complete digestion was set to 3 and the score for undigested to 0. For partial digestion, Cas9 theoretically has the same chance of editing one or two alleles, so the score for this class was set to 1.5. The percentage of indels was then calculated by the following formula:
Figure BDA0001996779900000263
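The formula itself is available only as a figure and is not reproduced here. One natural reading of the scoring scheme described above is a score-weighted average over clones, normalized by the maximum score of 3 per clone; the Python sketch below implements that reading. It is an assumption, not the patent's stated formula, and the function and parameter names are hypothetical.

```python
def single_cell_indel_percent(n_complete: int, n_partial: int, n_undigested: int,
                              full_score: float = 3.0,
                              partial_score: float = 1.5) -> float:
    """Score-weighted indel percentage over single-cell clones.

    Scores follow the text: complete digestion = 3, partial digestion = 1.5,
    undigested = 0. The result is the total score divided by the maximum
    possible score (3 per clone), expressed as a percentage.
    """
    n_clones = n_complete + n_partial + n_undigested
    if n_clones <= 0:
        raise ValueError("at least one clone is required")
    total_score = full_score * n_complete + partial_score * n_partial
    return 100.0 * total_score / (full_score * n_clones)


# Hypothetical clone counts, for illustration only.
print(single_cell_indel_percent(n_complete=10, n_partial=20, n_undigested=10))
```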
TIDE assay
The overall procedure followed the method described previously (Brinkman et al, 2014). Primers were designed against the RAG1A target site. Genomic DNA was extracted from SpCas9:RAG1A-transfected cells, and 50 ng of genomic DNA was used as template for 30 cycles of amplification with a conventional PCR program. Gel-purified PCR products were prepared for Sanger sequencing. The result files (.ab1) were analyzed with the tool provided at https:// tide.
Statistical analysis
Data are presented as mean ± SD, with significant differences considered at p < 0.05.
Example 1 method design
Chromatin translocations and indels caused by non-specific cleavage by CRISPR/Cas9 may threaten the genome stability of the treated cells, so it is necessary to evaluate the specificity and efficiency of CRISPR/Cas9 effectively.
In primer-extension-mediated sequencing (PEM-seq), extension is carried out from the primer binding site, a bridging linker containing a random molecular beacon (RMB) is ligated to the end of the extension product, a library is constructed and sequenced, and bioinformatic analysis is then performed to identify the DSBs. For simplicity, the method of the present invention is also referred to as the PEM-seq method.
To analyze DSB repair products quantitatively, the method of the invention uses primer extension to generate single-copy products of the original templates. After these fragments are isolated, each is ligated to a linker containing a random nucleic acid sequence of unique sequence and defined length (also called a random molecular beacon (RMB); in the subsequent examples its length is 14 bp), so that each fragment is specifically labeled (FIG. 1A). This avoids amplification bias, preserves the original proportion of perfectly matched and mutated fragments, and retains the original quantity information.
A chromatin translocation is the joining of two independent double-strand breaks (DSBs), so placing a site-specific primer at a known induced DSB site helps identify the other, unknown DSB. The PEM-seq method can therefore accurately detect the position at which a chromatin translocation occurs and quantify it. In addition, because PEM-seq retains the original quantitative information and introduces no other treatment (such as enzyme digestion) of the structures near the DSB, it can also quantitatively detect other DSB outcomes such as indels.
Example 2 PEM-seq enables highly sensitive detection of CRISPR/Cas9 off-target hotspots
This example targets the RAG1A site with SpCas9 and compares PEM-seq with the LAM-HTGTS approach, which identified 33 off-target sites in HEK293T cells (Frock et al, 2015). Each PEM-seq library was constructed from approximately 20 μg of CRISPR/Cas9-treated genomic DNA, with the extension primer located within 200 bp of the target cleavage site. The experiment was repeated three times for translocation junction hotspot analysis (FIGS. 1B and 1C); hotspots that had a sequence highly similar to the target site and/or a defined PAM and that occurred in at least two replicates were considered off-target sites (Frock et al, 2015).
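The replicate criterion above (a hotspot must occur in at least two of the three replicates) can be sketched in Python as a simple set operation. The function name and the hotspot labels are hypothetical, and the sketch does not implement the sequence-similarity or PAM criterion, which would be applied separately.

```python
from collections import Counter


def reproducible_hotspots(replicates, min_replicates: int = 2):
    """Return the hotspots found in at least min_replicates replicate libraries.

    replicates: iterable of collections of hotspot identifiers, one per replicate.
    """
    counts = Counter(h for rep in replicates for h in set(rep))
    return {h for h, n in counts.items() if n >= min_replicates}


# Hypothetical hotspot labels, for illustration only.
replicate_calls = [{"OT1", "OT2", "OT3"}, {"OT1", "OT3"}, {"OT3", "OT4"}]
print(sorted(reproducible_hotspots(replicate_calls)))
```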
PEM-seq identified a total of 53 off-target sites, including 24 new sites not identified by LAM-HTGTS, while 4 weak sites identified by LAM-HTGTS were missed (FIGS. 1B-D and Table S2). To verify the authenticity of these off-target sites, 8 of them (4 new sites and 4 shared sites), together with the on-target site in the RAG1A gene, were amplified and treated with CRISPR/Cas9 in vitro. After 20 hours of incubation, the uncut electrophoretic band for the on-target site in the RAG1A gene had almost completely disappeared, whereas no cleavage was detectable for the DNA fragment lacking the RAG1A target site (FIG. 1F). For the 8 selected off-target sites, the Cas9-induced cleavage efficiency was between 18% and 60%, much lower than at the RAG1A target site (FIG. 1F). In addition, PEM-seq analysis of the two weakest off-target sites (OT6 and OT8 in FIG. 1F) revealed translocation junctions between OT6/8 and several other off-target sites, indicating that off-target sites have a relatively high probability of forming reciprocal translocations. In conclusion, these results show that the off-target sites detected by PEM-seq are indeed cleaved in vivo, and thus that PEM-seq is more sensitive than LAM-HTGTS in detecting off-target sites.
Furthermore, due to the limitations of the assay method, LAM-HTGTS can only be used to assess off-target sites and cannot detect other events resulting from cleavage, such as indels, whereas PEM-seq can quantitatively detect all editing events, as described below, and thus provide a better assessment of nucleic acid editing events.
Example 3PEM-seq enables quantitative analysis of the editing Capacity of CRISPR/Cas9
This example tested the ability of the PEM-seq to perform analysis, particularly quantitative analysis, on all gene editing events. Events generated after gene editing can be classified as: chromosomal translocations, indels, and germlines. Chromatin ectopy is the ligation of two independent Double Strand Breaks (DSBs), such as occurs between the DSB of interest and a DSB that is untargeted cleaved by Cas9, or a basal level DSB that naturally occurs in the genome; indels the cut ends cleaved by Cas9 are religated via in vivo DNA repair mechanisms, but the repaired sequence differs from the original sequence due to incomplete repair, resulting in sequence insertions or deletions (fig. 2A). The "germline" is the state in which the target fragment is not cleaved or has been completely repaired after cleavage, and in the case of complete repair, the completely repaired fragment is indistinguishable from the uncut state, and therefore, the uncut fragment and the completely repaired fragment are generally considered together as no cleavage event.
Chromosomal translocation and indel levels are commonly used in the art to measure the cleavage capacity of nucleases and are directly proportional to it (Alt et al., 2013). This example therefore next uses PEM-seq to analyze the number of translocations and indels generated during gene editing, in order to quantitatively assess the editing efficiency of CRISPR/Cas9 and compare it with other existing methods.
HEK293T cells were transfected with SpCas9:RAG1A and incubated for 48 hours; the detected levels of translocations and indels were 2.7% and 35.7%, respectively, giving a total editing efficiency of approximately 38.4% (FIG. 2B and Table S1). In addition, methods commonly used in the art to assess indel efficiency (RFLP, T7EI, and single-cell RFLP) were used to analyze CRISPR/Cas9 cleavage at the RAG1A site. The indel proportion detected by these methods was 27-40% (FIG. 2C), showing that PEM-seq detects indels about as well as existing methods. This indicates that PEM-seq can reliably quantify indels occurring at the target site and thus evaluate the editing efficiency of CRISPR/Cas9. At the same time, PEM-seq also detects off-target sites by detecting translocations, as demonstrated in Example 2, which is not possible with the above three methods (RFLP, T7EI, and single-cell RFLP).
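The relationship between the three event classes and the reported percentages can be illustrated with a minimal Python sketch. The event counts below are hypothetical, chosen only so that they reproduce the reported 2.7% and 35.7%; the class and function names are likewise illustrative and are not part of the PEM-seq pipeline.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Counts of sequenced products per event class (hypothetical values)."""
    translocation: int
    indel: int
    germline: int  # uncut or perfectly repaired products


def editing_summary(counts: EventCounts) -> dict:
    """Return translocation, indel, and total editing percentages."""
    total = counts.translocation + counts.indel + counts.germline
    if total == 0:
        raise ValueError("no events counted")
    edited = counts.translocation + counts.indel
    return {
        "translocation_pct": 100 * counts.translocation / total,
        "indel_pct": 100 * counts.indel / total,
        # Editing efficiency counts every product that is not germline.
        "editing_efficiency_pct": 100 * edited / total,
    }


# Hypothetical counts per 1000 products, matching the reported percentages.
summary = editing_summary(EventCounts(translocation=27, indel=357, germline=616))
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under this reading, the total editing efficiency is simply the share of products that are not germline, that is, the sum of the translocation and indel shares.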
The composition of the identified translocations and indels was then analyzed further. Approximately 1.1% of the chromosomal translocation events occurred between the target site and off-target sites, while the other 1.6% occurred between the target site and low-level DSBs across the genome (FIG. 2D). As for indels, most occurred within 20 bp of the target site, including 11.3% small insertions and 24.0% microdeletions (FIG. 2E). Larger deletions often occurred within about 3 kb downstream of the primer; in agreement with previous reports, inverted junctions were also distributed over the ±3 kb region (Frock et al., 2015) (FIG. 2F). Notably, there was also a low level of enrichment of junctions from the target site to the region up to 50 kb downstream of it, which is a typical pattern induced by end resection (FIG. 2G). These results indicate that CRISPR/Cas9 leads to the accumulation of various DSB repair products around the target site, each of which PEM-seq can detect.
Example 4 SpCas9:RAG1C shows balanced editing capacity in the region around RAG1A
A region of the genome may contain multiple potential target sequences, and the best strategy is generally to select the optimal target sequence so as to avoid additional off-target damage to the genome. Therefore, the editing efficiency of SpCas9 was further tested at two additional sites, RAG1B and RAG1C (Frock et al., 2015), within the 196 bp region surrounding RAG1A. The results show that SpCas9:RAG1B and SpCas9:RAG1C both had lower editing efficiency (20.0% and 28.0%, FIG. 3), but fewer off-target sites (2 and 0, FIG. 3) and a lower level of genome-wide translocation (both 0.4%) (FIG. 3). Subsequently, SpCas9 nickase editing of the RAG1A/G region was analyzed (Frock et al., 2015); it showed an editing efficiency of 19.3% (FIG. 3) and 5 off-target sites (FIG. 3), none of which had been detected by LAM-HTGTS (Frock et al., 2015). All 5 identified off-target hotspots ranked among the highest on the RAG1A off-target list, whereas none was associated with the RAG1G site (Table S3). Compared with SpCas9:RAG1A, SpCas9:RAG1C, with its more balanced performance, may be the preferred choice for targeting RAG1.
This example shows that PEM-seq can be used to rapidly and accurately select target sites for gene editing.
Example 5 Development of a novel Cas9 variant with the assistance of PEM-seq
In this example, a novel SpCas9 variant, FeSpCas9, which has high editing efficiency and few off-target sites, was developed with the help of PEM-seq. Several variants of SpCas9 have been described in the prior art in an attempt to improve the specificity of the enzyme and/or to extend its scope of use (FIG. 4A). For example, eSpCas9 (Slaymaker et al., 2016) improves specificity through designed mutations that reduce nonspecific contact of Cas9 with DNA.
First, eSpCas9 was placed behind the same promoter as wild-type SpCas9 and targeted to the RAG1A site in HEK293T cells. As expected, eSpCas9 produced only 7 off-target sites, with no significant loss of editing efficiency (FIGS. 4B-D, Table S4).
However, eSpCas9 is still not sufficiently safe and leaves considerable room for improvement, so the applicant further tried introducing other mutations to improve its performance. The D1135E mutation (Kleinstiver et al., 2015) alone did not improve the specificity of SpCas9, and the variant carrying only this mutation still retained most of the off-target sites (FIGS. 4C and D). Surprisingly, however, after D1135E was introduced into eSpCas9, the resulting variant, FeSpCas9, showed higher specificity than eSpCas9, with only 3 detected off-target sites (FIG. 4C) and substantially unchanged or even improved editing efficiency (FIG. 4A) at the RAG1A site (FIG. 4B); this difference was statistically significant (FIG. 4K). Furthermore, as can be seen from FIG. 4D, the off-target hotspot at abscissa position 3, for example, is present for both D1135E and eSpCas9 but disappears completely for FeSpCas9, and the off-target hotspots at positions 4, 6, and 7 behave similarly. This indicates that the improved specificity of FeSpCas9 is not simply the sum of the individual specificity improvements of D1135E and eSpCas9, but reflects a significant synergistic effect.
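A substitution such as D1135E can be located by comparing two protein sequences written in the three-letter code used in the sequence listing. The following Python sketch is illustrative only: the two nine-residue excerpts are copied from residues 1129-1137 (listing numbering) of SEQ ID NO: 1 and SEQ ID NO: 3, the function name is hypothetical, and the offset of 1 is an assumption that the listed sequences omit the initiator methionine, so that listing position 1134 corresponds to position 1135 in the conventional numbering.

```python
# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def list_substitutions(reference: str, variant: str, start: int, offset: int = 0) -> list:
    """Compare two equal-length three-letter sequences and name each substitution.

    `start` is the listing position of the first residue in the excerpt;
    `offset` shifts listing positions to the conventional numbering.
    """
    ref_residues = reference.split()
    var_residues = variant.split()
    if len(ref_residues) != len(var_residues):
        raise ValueError("sequences must have the same length")
    found = []
    for index, (ref, var) in enumerate(zip(ref_residues, var_residues)):
        if ref != var:
            position = start + index + offset
            found.append(f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[var]}")
    return found


# Excerpts of listing residues 1129-1137 from SEQ ID NO: 1 and SEQ ID NO: 3.
seq1_excerpt = "Lys Tyr Gly Gly Phe Asp Ser Pro Thr"
seq3_excerpt = "Lys Tyr Gly Gly Phe Glu Ser Pro Thr"

# offset=1 assumes the listing lacks the initiator methionine (an assumption).
print(list_substitutions(seq1_excerpt, seq3_excerpt, start=1129, offset=1))
```

The same function can be run over the full sequences with `start=1` to enumerate every difference between two listed variants.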
The specificity of FeSpCas9 was next tested at other sites, including one in the EMX1 gene and one near the c-MYC gene. PEM-seq analysis found that WT SpCas9 targeting the EMX1 site produced 18 off-target sites, whereas FeSpCas9 consistently showed very low off-target activity with no significant change in editing efficiency, and the specificity of FeSpCas9 was superior to that of eSpCas9 at both sites (FIGS. 4E-J, L and M).
The above results show that PEM-seq can simultaneously evaluate the cleavage efficiency and specificity of nucleic acid tool enzymes such as Cas9, and can therefore be used to develop novel nucleic acid tool enzymes. In this example, a novel Cas9 enzyme, FeSpCas9, was developed in this way.
Example 5 PEM-seq for assessing the editing efficiency and specificity of different CRISPR-Cas nucleases at the same target site
The previous examples demonstrated that the PEM-seq method accurately assesses the cleavage efficiency and specificity of gene editing tools such as nucleic acid tool enzymes. This example therefore selected three Cas nucleases known in the art (SpCas9, SaCas9, and AsCas12a) and 4 target sites near two genes in the genome (2 near the c-MYC gene and 2 near the DNMT1 gene), and evaluated these Cas nucleases using the PEM-seq method (FIG. 5A). To ensure that the nucleases had similar expression levels in the cells, the same plasmid backbone was used and all three Cas genes were placed behind the chicken β-actin promoter.
The results show that SpCas9 had strong cleavage activity at all 4 sites tested, with editing efficiency between 25% and 50% (FIG. 5B). The editing efficiency of SaCas9 and AsCas12a was mostly lower than that of SpCas9, but SaCas9 exhibited the highest editing efficiency at the second target site of DNMT1 (locus 2), while AsCas12a had the highest editing activity at the second target site of MYC (locus 2) (FIG. 5B). Notably, except for SaCas9 at the first DNMT1 site (locus 1), no off-target hotspots were detected for SaCas9 or AsCas12a at any target site, whereas several to several tens of off-target hotspots were detected for SpCas9 at these target sites, indicating that the editing specificity of SaCas9 and AsCas12a was better (FIG. 5C).
Example 6 PEM-seq for evaluating Cas9 inhibitor activity
This example uses the PEM-seq method to examine the ability of the widely used SpCas9 inhibitor AcrIIA4 to block SpCas9:RAG1A in HEK293T cells. Plasmids expressing SpCas9 and AcrIIA4 were co-transfected at mass ratios of 3:1, 1:1, and 1:3. The results show that the editing efficiency of SpCas9 was significantly reduced when it was co-transfected with AcrIIA4. At a transfection ratio of 3:1, the editing efficiency of SpCas9 was reduced 11-fold, and when the proportion of AcrIIA4 was increased to 1:1 and 1:3, SpCas9 activity was inhibited further (FIG. 6A). Also as expected, with the editing efficiency of SpCas9 suppressed, fewer translocation junctions were generated from off-target sites (FIG. 6B). However, off-target activity was reduced only 1.7- to 4.6-fold, a less pronounced reduction than that in editing efficiency (FIG. 6C). This suggests that AcrIIA4 inhibits the on-target activity of SpCas9 more effectively than its off-target activity.
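The fold reductions quoted above can be computed as the ratio of the activity without inhibitor to the activity with inhibitor. The Python sketch below is illustrative only: the two editing-efficiency percentages are hypothetical values chosen to give an 11-fold reduction, the function names are not from the source, and the final comparison assumes that the reported 11-fold and 1.7- to 4.6-fold figures can be compared directly, which the text does not state explicitly.

```python
def fold_reduction(without_inhibitor: float, with_inhibitor: float) -> float:
    """Return how many times lower the activity is in the presence of the inhibitor."""
    if with_inhibitor <= 0:
        raise ValueError("activity with inhibitor must be positive")
    return without_inhibitor / with_inhibitor


def on_target_bias(on_target_fold: float, off_target_fold: float) -> float:
    """Ratio above 1 means on-target activity is inhibited more than off-target activity."""
    return on_target_fold / off_target_fold


# Hypothetical editing efficiencies (percent) giving an 11-fold reduction.
on_fold = fold_reduction(without_inhibitor=33.0, with_inhibitor=3.0)
print(f"on-target fold reduction: {on_fold:.1f}")

# Reported range of off-target fold reductions, used here as given in the text.
for off_fold in (1.7, 4.6):
    print(f"off-target {off_fold}-fold -> bias {on_target_bias(on_fold, off_fold):.2f}")
```

A bias greater than 1 at both ends of the reported range is consistent with the conclusion that the inhibitor suppresses on-target activity more strongly than off-target activity.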
Next, 5 different target sites were tested with AcrIIA4 at a ratio of 1:1. The results show that AcrIIA4 significantly inhibited the on-target cleavage activity of SpCas9 at all sites tested (FIG. 6D). Apart from RAG1B, which itself has few off-target hotspots, the results for the other 4 sites show that AcrIIA4 inhibits the on-target activity of SpCas9 more strongly than its off-target activity (FIGS. 6D-F). Furthermore, AcrIIA4 had no effect on the editing activity of SaCas9:RAG1A (FIG. 6G).
References
Alt, F.W., Zhang, Y., Meng, F.L., Guo, C., and Schwer, B. (2013). Mechanisms of programmed DNA lesions and genomic instability in the immune system. Cell 152, 417-429.
Brinkman, E.K., Chen, T., Amendola, M., and van Steensel, B. (2014). Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res 42, e168.
Frock, R.L., Hu, J., Meyers, R.M., Ho, Y.J., Kii, E., and Alt, F.W. (2015). Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat Biotechnol 33, 179-186.
Hu, J., Meyers, R.M., Dong, J., Panchakshari, R.A., Alt, F.W., and Frock, R.L. (2016). Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat Protoc 11, 853-871.
Kleinstiver, B.P., Prew, M.S., Tsai, S.Q., Topkar, V.V., Nguyen, N.T., Zheng, Z., Gonzales, A.P., Li, Z., Peterson, R.T., Yeh, J.R., et al. (2015). Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485.
Peng, Q., Vijaya Satya, R., Lewis, M., Randad, P., and Wang, Y. (2015). Reducing amplification artifacts in high multiplex amplicon sequencing by using molecular barcodes. BMC Genomics 16, 589.
Slaymaker, I.M., Gao, L., Zetsche, B., Scott, D.A., Yan, W.X., and Zhang, F. (2016). Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84-88.
<110> Peking University
<120> a method for detecting activity of a reagent for generating a double-strand break
<130> MP1833381
<160> 26
<170> PatentIn version 3.2
<210> 1
<211> 1367
<212> PRT
<213> Artificial Synthesis
<400> 1
Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr
1025 1030 1035
Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn
1040 1045 1050
Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr
1055 1060 1065
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg
1070 1075 1080
Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu
1085 1090 1095
Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1100 1105 1110
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys
1115 1120 1125
Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu
1130 1135 1140
Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1145 1150 1155
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
1160 1165 1170
Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu
1175 1180 1185
Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe
1190 1195 1200
Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu
1205 1210 1215
Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn
1220 1225 1230
Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg
1265 1270 1275
Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr
1280 1285 1290
Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1295 1300 1305
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe
1310 1315 1320
Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
1325 1330 1335
Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1340 1345 1350
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210> 2
<211> 4101
<212> DNA
<213> Artificial Synthesis
<400> 2
gacaagaagt acagcatcgg cctggacatc ggcaccaact ctgtgggctg ggccgtgatc 60
accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120
agcatcaaga agaacctgat cggagccctg ctgttcgaca gcggcgaaac agccgaggcc 180
acccggctga agagaaccgc cagaagaaga tacaccagac ggaagaaccg gatctgctat 240
ctgcaagaga tcttcagcaa cgagatggcc aaggtggacg acagcttctt ccacagactg 300
gaagagtcct tcctggtgga agaggataag aagcacgagc ggcaccccat cttcggcaac 360
atcgtggacg aggtggccta ccacgagaag taccccacca tctaccacct gagaaagaaa 420
ctggtggaca gcaccgacaa ggccgacctg cggctgatct atctggccct ggcccacatg 480
atcaagttcc ggggccactt cctgatcgag ggcgacctga accccgacaa cagcgacgtg 540
gacaagctgt tcatccagct ggtgcagacc tacaaccagc tgttcgagga aaaccccatc 600
aacgccagcg gcgtggacgc caaggccatc ctgtctgcca gactgagcaa gagcagacgg 660
ctggaaaatc tgatcgccca gctgcccggc gagaagaaga atggcctgtt cggaaacctg 720
attgccctga gcctgggcct gacccccaac ttcaagagca acttcgacct ggccgaggat 780
gccaaactgc agctgagcaa ggacacctac gacgacgacc tggacaacct gctggcccag 840
atcggcgacc agtacgccga cctgtttctg gccgccaaga acctgtccga cgccatcctg 900
ctgagcgaca tcctgagagt gaacaccgag atcaccaagg cccccctgag cgcctctatg 960
atcaagagat acgacgagca ccaccaggac ctgaccctgc tgaaagctct cgtgcggcag 1020
cagctgcctg agaagtacaa agagattttc ttcgaccaga gcaagaacgg ctacgccggc 1080
tacattgacg gcggagccag ccaggaagag ttctacaagt tcatcaagcc catcctggaa 1140
aagatggacg gcaccgagga actgctcgtg aagctgaaca gagaggacct gctgcggaag 1200
cagcggacct tcgacaacgg cagcatcccc caccagatcc acctgggaga gctgcacgcc 1260
attctgcggc ggcaggaaga tttttaccca ttcctgaagg acaaccggga aaagatcgag 1320
aagatcctga ccttccgcat cccctactac gtgggccctc tggccagggg aaacagcaga 1380
ttcgcctgga tgaccagaaa gagcgaggaa accatcaccc cctggaactt cgaggaagtg 1440
gtggacaagg gcgcttccgc ccagagcttc atcgagcgga tgaccaactt cgataagaac 1500
ctgcccaacg agaaggtgct gcccaagcac agcctgctgt acgagtactt caccgtgtat 1560
aacgagctga ccaaagtgaa atacgtgacc gagggaatga gaaagcccgc cttcctgagc 1620
ggcgagcaga aaaaggccat cgtggacctg ctgttcaaga ccaaccggaa agtgaccgtg 1680
aagcagctga aagaggacta cttcaagaaa atcgagtgct tcgactccgt ggaaatctcc 1740
ggcgtggaag atcggttcaa cgcctccctg ggcacatacc acgatctgct gaaaattatc 1800
aaggacaagg acttcctgga caatgaggaa aacgaggaca ttctggaaga tatcgtgctg 1860
accctgacac tgtttgagga cagagagatg atcgaggaac ggctgaaaac ctatgcccac 1920
ctgttcgacg acaaagtgat gaagcagctg aagcggcgga gatacaccgg ctggggcagg 1980
ctgagccgga agctgatcaa cggcatccgg gacaagcagt ccggcaagac aatcctggat 2040
ttcctgaagt ccgacggctt cgccaacaga aacttcatgc agctgatcca cgacgacagc 2100
ctgaccttta aagaggacat ccagaaagcc caggtgtccg gccagggcga tagcctgcac 2160
gagcacattg ccaatctggc cggcagcccc gccattaaga agggcatcct gcagacagtg 2220
aaggtggtgg acgagctcgt gaaagtgatg ggccggcaca agcccgagaa catcgtgatc 2280
gaaatggcca gagagaacca gaccacccag aagggacaga agaacagccg cgagagaatg 2340
aagcggatcg aagagggcat caaagagctg ggcagccaga tcctgaaaga acaccccgtg 2400
gaaaacaccc agctgcagaa cgagaagctg tacctgtact acctgcagaa tgggcgggat 2460
atgtacgtgg accaggaact ggacatcaac cggctgtccg actacgatgt ggaccatatc 2520
gtgcctcaga gctttctgaa ggacgactcc atcgacaaca aggtgctgac cagaagcgac 2580
aagaaccggg gcaagagcga caacgtgccc tccgaagagg tcgtgaagaa gatgaagaac 2640
tactggcggc agctgctgaa cgccaagctg attacccaga gaaagttcga caatctgacc 2700
aaggccgaga gaggcggcct gagcgaactg gataaggccg gcttcatcaa gagacagctg 2760
gtggaaaccc ggcagatcac aaagcacgtg gcacagatcc tggactcccg gatgaacact 2820
aagtacgacg agaatgacaa gctgatccgg gaagtgaaag tgatcaccct gaagtccaag 2880
ctggtgtccg atttccggaa ggatttccag ttttacaaag tgcgcgagat caacaactac 2940
caccacgccc acgacgccta cctgaacgcc gtcgtgggaa ccgccctgat caaaaagtac 3000
cctaagctgg aaagcgagtt cgtgtacggc gactacaagg tgtacgacgt gcggaagatg 3060
atcgccaaga gcgagcagga aatcggcaag gctaccgcca agtacttctt ctacagcaac 3120
atcatgaact ttttcaagac cgagattacc ctggccaacg gcgagatccg gaagcggcct 3180
ctgatcgaga caaacggcga aaccggggag atcgtgtggg ataagggccg ggattttgcc 3240
accgtgcgga aagtgctgag catgccccaa gtgaatatcg tgaaaaagac cgaggtgcag 3300
acaggcggct tcagcaaaga gtctatcctg cccaagagga acagcgataa gctgatcgcc 3360
agaaagaagg actgggaccc taagaagtac ggcggcttcg acagccccac cgtggcctat 3420
tctgtgctgg tggtggccaa agtggaaaag ggcaagtcca agaaactgaa gagtgtgaaa 3480
gagctgctgg ggatcaccat catggaaaga agcagcttcg agaagaatcc catcgacttt 3540
ctggaagcca agggctacaa agaagtgaaa aaggacctga tcatcaagct gcctaagtac 3600
tccctgttcg agctggaaaa cggccggaag agaatgctgg cctctgccgg cgaactgcag 3660
aagggaaacg aactggccct gccctccaaa tatgtgaact tcctgtacct ggccagccac 3720
tatgagaagc tgaagggctc ccccgaggat aatgagcaga aacagctgtt tgtggaacag 3780
cacaagcact acctggacga gatcatcgag cagatcagcg agttctccaa gagagtgatc 3840
ctggccgacg ctaatctgga caaagtgctg tccgcctaca acaagcaccg ggataagccc 3900
atcagagagc aggccgagaa tatcatccac ctgtttaccc tgaccaatct gggagcccct 3960
gccgccttca agtactttga caccaccatc gaccggaaga ggtacaccag caccaaagag 4020
gtgctggacg ccaccctgat ccaccagagc atcaccggcc tgtacgagac acggatcgac 4080
ctgtctcagc tgggaggcga c 4101
<210> 3
<211> 1367
<212> PRT
<213> Artificial Synthesis
<400> 3
Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Ala Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Ala Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr
1025 1030 1035
Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn
1040 1045 1050
Gly Glu Ile Arg Lys Ala Pro Leu Ile Glu Thr Asn Gly Glu Thr
1055 1060 1065
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg
1070 1075 1080
Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu
1085 1090 1095
Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1100 1105 1110
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys
1115 1120 1125
Lys Tyr Gly Gly Phe Glu Ser Pro Thr Val Ala Tyr Ser Val Leu
1130 1135 1140
Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1145 1150 1155
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
1160 1165 1170
Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu
1175 1180 1185
Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe
1190 1195 1200
Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu
1205 1210 1215
Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn
1220 1225 1230
Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg
1265 1270 1275
Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr
1280 1285 1290
Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1295 1300 1305
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe
1310 1315 1320
Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
1325 1330 1335
Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1340 1345 1350
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210> 4
<211> 4101
<212> DNA
<213> Artificial Synthesis
<400> 4
gacaagaagt acagcatcgg cctggacatc ggcaccaact ctgtgggctg ggccgtgatc 60
accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120
agcatcaaga agaacctgat cggagccctg ctgttcgaca gcggcgaaac agccgaggcc 180
acccggctga agagaaccgc cagaagaaga tacaccagac ggaagaaccg gatctgctat 240
ctgcaagaga tcttcagcaa cgagatggcc aaggtggacg acagcttctt ccacagactg 300
gaagagtcct tcctggtgga agaggataag aagcacgagc ggcaccccat cttcggcaac 360
atcgtggacg aggtggccta ccacgagaag taccccacca tctaccacct gagaaagaaa 420
ctggtggaca gcaccgacaa ggccgacctg cggctgatct atctggccct ggcccacatg 480
atcaagttcc ggggccactt cctgatcgag ggcgacctga accccgacaa cagcgacgtg 540
gacaagctgt tcatccagct ggtgcagacc tacaaccagc tgttcgagga aaaccccatc 600
aacgccagcg gcgtggacgc caaggccatc ctgtctgcca gactgagcaa gagcagacgg 660
ctggaaaatc tgatcgccca gctgcccggc gagaagaaga atggcctgtt cggaaacctg 720
attgccctga gcctgggcct gacccccaac ttcaagagca acttcgacct ggccgaggat 780
gccaaactgc agctgagcaa ggacacctac gacgacgacc tggacaacct gctggcccag 840
atcggcgacc agtacgccga cctgtttctg gccgccaaga acctgtccga cgccatcctg 900
ctgagcgaca tcctgagagt gaacaccgag atcaccaagg cccccctgag cgcctctatg 960
atcaagagat acgacgagca ccaccaggac ctgaccctgc tgaaagctct cgtgcggcag 1020
cagctgcctg agaagtacaa agagattttc ttcgaccaga gcaagaacgg ctacgccggc 1080
tacattgacg gcggagccag ccaggaagag ttctacaagt tcatcaagcc catcctggaa 1140
aagatggacg gcaccgagga actgctcgtg aagctgaaca gagaggacct gctgcggaag 1200
cagcggacct tcgacaacgg cagcatcccc caccagatcc acctgggaga gctgcacgcc 1260
attctgcggc ggcaggaaga tttttaccca ttcctgaagg acaaccggga aaagatcgag 1320
aagatcctga ccttccgcat cccctactac gtgggccctc tggccagggg aaacagcaga 1380
ttcgcctgga tgaccagaaa gagcgaggaa accatcaccc cctggaactt cgaggaagtg 1440
gtggacaagg gcgcttccgc ccagagcttc atcgagcgga tgaccaactt cgataagaac 1500
ctgcccaacg agaaggtgct gcccaagcac agcctgctgt acgagtactt caccgtgtat 1560
aacgagctga ccaaagtgaa atacgtgacc gagggaatga gaaagcccgc cttcctgagc 1620
ggcgagcaga aaaaggccat cgtggacctg ctgttcaaga ccaaccggaa agtgaccgtg 1680
aagcagctga aagaggacta cttcaagaaa atcgagtgct tcgactccgt ggaaatctcc 1740
ggcgtggaag atcggttcaa cgcctccctg ggcacatacc acgatctgct gaaaattatc 1800
aaggacaagg acttcctgga caatgaggaa aacgaggaca ttctggaaga tatcgtgctg 1860
accctgacac tgtttgagga cagagagatg atcgaggaac ggctgaaaac ctatgcccac 1920
ctgttcgacg acaaagtgat gaagcagctg aagcggcgga gatacaccgg ctggggcagg 1980
ctgagccgga agctgatcaa cggcatccgg gacaagcagt ccggcaagac aatcctggat 2040
ttcctgaagt ccgacggctt cgccaacaga aacttcatgc agctgatcca cgacgacagc 2100
ctgaccttta aagaggacat ccagaaagcc caggtgtccg gccagggcga tagcctgcac 2160
gagcacattg ccaatctggc cggcagcccc gccattaaga agggcatcct gcagacagtg 2220
aaggtggtgg acgagctcgt gaaagtgatg ggccggcaca agcccgagaa catcgtgatc 2280
gaaatggcca gagagaacca gaccacccag aagggacaga agaacagccg cgagagaatg 2340
aagcggatcg aagagggcat caaagagctg ggcagccaga tcctgaaaga acaccccgtg 2400
gaaaacaccc agctgcagaa cgagaagctg tacctgtact acctgcagaa tgggcgggat 2460
atgtacgtgg accaggaact ggacatcaac cggctgtccg actacgatgt ggaccatatc 2520
gtgcctcaga gctttctggc cgacgactcc atcgacaaca aggtgctgac cagaagcgac 2580
aagaaccggg gcaagagcga caacgtgccc tccgaagagg tcgtgaagaa gatgaagaac 2640
tactggcggc agctgctgaa cgccaagctg attacccaga gaaagttcga caatctgacc 2700
aaggccgaga gaggcggcct gagcgaactg gataaggccg gcttcatcaa gagacagctg 2760
gtggaaaccc ggcagatcac aaagcacgtg gcacagatcc tggactcccg gatgaacact 2820
aagtacgacg agaatgacaa gctgatccgg gaagtgaaag tgatcaccct gaagtccaag 2880
ctggtgtccg atttccggaa ggatttccag ttttacaaag tgcgcgagat caacaactac 2940
caccacgccc acgacgccta cctgaacgcc gtcgtgggaa ccgccctgat caaaaagtac 3000
cctgccctgg aaagcgagtt cgtgtacggc gactacaagg tgtacgacgt gcggaagatg 3060
atcgccaaga gcgagcagga aatcggcaag gctaccgcca agtacttctt ctacagcaac 3120
atcatgaact ttttcaagac cgagattacc ctggccaacg gcgagatccg gaaggcccct 3180
ctgatcgaga caaacggcga aaccggggag atcgtgtggg ataagggccg ggattttgcc 3240
accgtgcgga aagtgctgag catgccccaa gtgaatatcg tgaaaaagac cgaggtgcag 3300
acaggcggct tcagcaaaga gtctatcctg cccaagagga acagcgataa gctgatcgcc 3360
agaaagaagg actgggaccc taagaagtac ggcggcttcg aaagccccac cgtggcctat 3420
tctgtgctgg tggtggccaa agtggaaaag ggcaagtcca agaaactgaa gagtgtgaaa 3480
gagctgctgg ggatcaccat catggaaaga agcagcttcg agaagaatcc catcgacttt 3540
ctggaagcca agggctacaa agaagtgaaa aaggacctga tcatcaagct gcctaagtac 3600
tccctgttcg agctggaaaa cggccggaag agaatgctgg cctctgccgg cgaactgcag 3660
aagggaaacg aactggccct gccctccaaa tatgtgaact tcctgtacct ggccagccac 3720
tatgagaagc tgaagggctc ccccgaggat aatgagcaga aacagctgtt tgtggaacag 3780
cacaagcact acctggacga gatcatcgag cagatcagcg agttctccaa gagagtgatc 3840
ctggccgacg ctaatctgga caaagtgctg tccgcctaca acaagcaccg ggataagccc 3900
atcagagagc aggccgagaa tatcatccac ctgtttaccc tgaccaatct gggagcccct 3960
gccgccttca agtactttga caccaccatc gaccggaaga ggtacaccag caccaaagag 4020
gtgctggacg ccaccctgat ccaccagagc atcaccggcc tgtacgagac acggatcgac 4080
ctgtctcagc tgggaggcga c 4101
<210> 5
<211> 50
<212> DNA
<213> Artificial Synthesis
<400> 5
ggactgctgg agattgctcc agagagggtt tcccctcaaa ggaatccttc 50
<210> 6
<211> 23
<212> DNA
<213> Artificial Synthesis
<400> 6
cctgagaaca atgaaaacaa gtc 23
<210> 7
<211> 26
<212> DNA
<213> Artificial Synthesis
<400> 7
cgggaaggaa gttggcatct gtcctg 26
<210> 8
<211> 28
<212> DNA
<213> Artificial Synthesis
<400> 8
ttgcgactct cagctgaatc cactgctg 28
<210> 9
<211> 28
<212> DNA
<213> Artificial Synthesis
<400> 9
gcccgcactg aatgcacttg ggagggtg 28
<210> 10
<211> 28
<212> DNA
<213> Artificial Synthesis
<400> 10
gagaggcctc gttaggagct ctcctttg 28
<210> 11
<211> 34
<212> DNA
<213> Artificial Synthesis
<400> 11
cccatcaggc tctcagctca gcctgagtgt tgag 34
<210> 12
<211> 32
<212> DNA
<213> Artificial Synthesis
<400> 12
ggcggggtcc caggtgctga cgtaggtagt gc 32
<210> 13
<211> 30
<212> DNA
<213> Artificial Synthesis
<400> 13
gccgccctct tgcctccact ggttgtgcag 30
<210> 14
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 14
gcctctttcc cacccacctt 20
<210> 15
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 15
gacttgtttt cattgttctc 20
<210> 16
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 16
gcacctaaca tgatatatta 20
<210> 17
<211> 23
<212> DNA
<213> Artificial Synthesis
<400> 17
gaaagaggct gccatgctgg ctg 23
<210> 18
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 18
gagtccgagc agaagaagaa 20
<210> 19
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 19
gactctctgc gtactgattg 20
<210> 20
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 20
gcggtctcaa gcactaccta 20
<210> 21
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 21
gggatgtgga gcttggctat 20
<210> 22
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 22
gtacatgcag ttctgcatct 20
<210> 23
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 23
ttcccggcag atgtttacct 20
<210> 24
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 24
ccctgcagtt ccctaactga 20
<210> 25
<211> 64
<212> DNA
<213> Artificial Synthesis
<220>
<221> misc_feature
<222> (17)..(20)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (22)..(24)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (26)..(28)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (30)..(33)
<223> n is a, c, g, or t
<400> 25
ccacgcgtgc tctacannnn tnnnannntn nnnagatcgg aagagcacac gtctgaactc 60
cagt 64
<210> 26
<211> 22
<212> DNA
<213> Artificial Synthesis
<220>
<221> misc_feature
<222> (17)..(22)
<223> n is a, c, g, or t
<400> 26
tgtagagcac gcgtggnnnn nn 22
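
The degenerate positions annotated for SEQ ID NO: 25 and SEQ ID NO: 26 can be checked directly against the listed sequences. The following is a minimal sketch, assuming (the listing itself does not say so) that these two oligonucleotides are the two strands of the bridging linker with random segments described in claim 1; the helper names `n_runs` and `reverse_complement` are illustrative only. It recovers the `misc_feature` ranges, counts the random positions, and tests whether the first 16 nt of the two sequences are reverse complements of each other.

```python
# SEQ ID NO: 25 and 26 copied from the listing, with spaces and
# position counters removed. Treating them as the two strands of the
# bridging linker is an assumption, not something the listing states.
SEQ25 = "ccacgcgtgctctacannnntnnnannntnnnnagatcggaagagcacacgtctgaactccagt"
SEQ26 = "tgtagagcacgcgtggnnnnnn"


def n_runs(seq):
    """Return 1-based inclusive (start, end) ranges of consecutive 'n'."""
    runs, start = [], None
    for i, base in enumerate(seq, start=1):
        if base == "n" and start is None:
            start = i
        elif base != "n" and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string ('n' maps to 'n')."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}
    return "".join(comp[b] for b in reversed(seq))


runs25 = n_runs(SEQ25)                      # degenerate ranges in SEQ ID NO: 25
runs26 = n_runs(SEQ26)                      # degenerate range in SEQ ID NO: 26
random_positions = sum(e - s + 1 for s, e in runs25)
barcode_space = 4 ** random_positions       # distinct barcodes possible
stems_pair = reverse_complement(SEQ25[:16]) == SEQ26[:16]

print(runs25, runs26, random_positions, barcode_space, stems_pair)
```

Under these assumptions, the four `n` ranges of SEQ ID NO: 25 together give 14 random positions, and the fixed 16-nt 5' portions of the two oligonucleotides are mutually complementary, which would leave the 6-nt random tail of SEQ ID NO: 26 single-stranded.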

Claims (24)

1. A method of detecting the activity of a Double Strand Break (DSB) generating agent capable of generating a double strand break at a target location in a genome, comprising:
(1) contacting the agent with a sample to cause a double-strand break (DSB) event to occur in the genome of the sample;
(2) using the genomic nucleic acid treated in step (1) as a template, adding an extension primer complementary to a flanking sequence of the target site, the extension primer being a single primer that binds, by annealing, upstream or downstream of the DSB site generated by the double-strand-break-generating agent; subjecting the extension primer and the sample genome to one or more denaturation-annealing cycles; and then performing one extension reaction to obtain an extension product whose sequence extends beyond the target site;
(3) ligating a bridging linker to the extension end of the extension product, wherein the bridging linker comprises one or more random nucleic acid segments of n nucleotides in length, wherein n is 1-30 or more, so that ligation of the bridging linker gives each extension product a unique identifier conferred by the random nucleic acid segment; and
(4) performing high-throughput sequencing on the ligation products of the extension products and the bridging linkers obtained in step (3), and identifying whether a double-strand break event has occurred at the target location and/or at a non-target location.
2. The method of claim 1, wherein the agent is an engineered nuclease.
3. The method of claim 2, wherein the engineered nuclease is selected from a zinc finger nuclease (ZFN), TALEN, or CRISPR-Cas.
4. The method of claim 1, wherein the sample is a eukaryotic cell.
5. The method of claim 1, wherein in step (2), the extension primer is subjected to 1-20 cycles of denaturation-annealing with the sample genome before the extension reaction occurs.
6. The method of claim 1, wherein the extension primer bears an affinity tag.
7. The method of claim 6, wherein the extension primer is linked to biotin at its 5' end.
8. The method of claim 1, further comprising, after step (1), a step of fragmenting the genome.
9. The method of claim 8, wherein the genome is fragmented by sonication or endonuclease digestion.
10. The method of claim 1, wherein in step (2), the binding site of the extension primer to the genomic nucleic acid is within 2kb upstream or downstream of the double strand break at the target location.
11. The method of claim 1, further comprising, after obtaining the extension product in step (2), a step of isolating the extension product.
12. The method of claim 11, wherein the extension primer bears an affinity tag and the isolating is by affinity purification using the affinity tag of the extension primer.
13. The method of claim 1, wherein the activity refers to the cleavage efficiency and/or specificity of the agent; alternatively, the method further comprises, after the sequencing, a step of analyzing the cleavage efficiency and/or specificity of the agent according to the sequencing result.
14. A method of screening for a site of a genomic double strand break, comprising:
(1) analyzing the target genome sequence to obtain double-strand break candidate target sites;
(2) making double strand breaks at the candidate target sites in step (1) using a double strand break generating reagent;
(3) performing the steps of any one of claims 1 to 13; and
(4) analyzing the efficiency and/or specificity of cleavage of said agent at different candidate target sites.
15. A method of screening for engineered nucleases, comprising:
(1) providing an engineered nuclease candidate; and
(2) detecting activity of the engineered nuclease candidate, comprising cleavage efficiency and/or specificity, using the method of any one of claims 1-13.
16. The method of claim 15, wherein the engineered nuclease is a Cas nuclease.
17. The method of claim 16, wherein the engineered nuclease is selected from Cas9, Cas12a, Cas12b, Cas13a, Cas14, and variants thereof.
18. A method of identifying an inhibitor of a double-strand-break-generating agent, comprising:
(1) contacting a double-strand-break-generating agent with a candidate compound;
(2) assessing the activity of the double strand break generating agent using the method of any one of claims 1-13; and
(3) selecting a compound capable of reducing the activity of the double strand break generating agent.
19. A method of identifying an enhancer of a double-strand-break-generating agent, comprising:
(1) contacting a double-strand-break-generating agent with a candidate compound;
(2) assessing the activity of the double strand break generating agent using the method of any one of claims 1-13; and
(3) selecting a compound or protein capable of enhancing the activity of the double-strand break generating agent.
20. A Cas9 protein that has activity to generate a double strand break at a genomic target location and has the amino acid sequence set forth in SEQ ID NO: 3.
21. A nucleic acid sequence encoding the Cas9 protein of claim 20, the nucleic acid sequence being set forth in SEQ ID NO: 4.
22. An expression vector comprising the nucleic acid sequence of claim 21.
23. A cell comprising a Cas9 protein of claim 20 or a nucleic acid sequence of claim 21 or an expression vector of claim 22.
24. Use of a Cas9 protein of claim 20 or a nucleic acid sequence of claim 21 or an expression vector of claim 22 or a cell of claim 23 for in vitro targeted genome modification.
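
The role of the random nucleic acid segment in claim 1, steps (3) and (4), and the efficiency and specificity analysis of claim 13 can be illustrated with a small counting sketch. This is not the analysis pipeline of the patent. It assumes that each sequenced ligation product has already been reduced to a hypothetical record `(chromosome, break_position, barcode)`, and that records sharing both the break position and the barcode are amplification copies of one original extension product. The function name, record layout, coordinates, barcodes and the `window` parameter are all illustrative.

```python
from collections import Counter


def count_unique_events(reads, target, window=10):
    """Collapse duplicate reads by barcode, then tally break events.

    reads  : iterable of (chrom, break_pos, barcode) tuples (hypothetical)
    target : (chrom, pos) of the intended cleavage site
    window : +/- distance in bp still counted as on-target (assumed value)
    """
    # Identical position + identical barcode is treated as one molecule.
    unique_molecules = set(reads)
    counts = Counter()
    for chrom, pos, _barcode in unique_molecules:
        on_target = chrom == target[0] and abs(pos - target[1]) <= window
        counts["on_target" if on_target else "off_target"] += 1
    return counts


# Made-up example records with 14-nt barcodes.
reads = [
    ("chr1", 1000, "ACGTTGCAAGCTAG"),
    ("chr1", 1000, "ACGTTGCAAGCTAG"),   # amplification duplicate of the first
    ("chr1", 1000, "TTGACCGATACGGA"),   # same site, different molecule
    ("chr5", 52000, "GGATCCATGCAATC"),  # break away from the target
]
counts = count_unique_events(reads, target=("chr1", 1000))
print(counts["on_target"], counts["off_target"])
```

Counting molecules rather than raw reads keeps amplification bias out of any efficiency or specificity estimate derived from these tallies.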
CN201910199103.8A 2019-03-15 2019-03-15 Method for detecting activity of reagent generated by double-strand break Active CN111690724B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910199103.8A CN111690724B (en) 2019-03-15 2019-03-15 Method for detecting activity of reagent generated by double-strand break
PCT/CN2020/098360 WO2020228844A2 (en) 2019-03-15 2020-06-28 Method of testing activity of double strand break-generating reagent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910199103.8A CN111690724B (en) 2019-03-15 2019-03-15 Method for detecting activity of reagent generated by double-strand break

Publications (2)

Publication Number Publication Date
CN111690724A CN111690724A (en) 2020-09-22
CN111690724B true CN111690724B (en) 2022-04-26

Family

ID=72475322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910199103.8A Active CN111690724B (en) 2019-03-15 2019-03-15 Method for detecting activity of reagent generated by double-strand break

Country Status (2)

Country Link
CN (1) CN111690724B (en)
WO (1) WO2020228844A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102628082A (en) * 2012-04-10 2012-08-08 凯晶生物科技(苏州)有限公司 Method for qualitatively and quantitatively detecting nucleic acid based on high-flux sequencing technology
CN107012250A (en) * 2017-05-16 2017-08-04 上海交通大学 A kind of analysis method of genomic DNA fragment editor's precision suitable for CRISPR/Cas9 systems and application
WO2018129368A2 (en) * 2017-01-06 2018-07-12 Editas Medicine, Inc. Methods of assessing nuclease cleavage

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016205728A1 (en) * 2015-06-17 2016-12-22 Massachusetts Institute Of Technology Crispr mediated recording of cellular events
CN106636154B (en) * 2015-10-30 2020-09-22 中国科学院上海营养与健康研究所 sgRNA screening system and method
CN105647968B (en) * 2016-02-02 2019-07-23 浙江大学 A kind of CRISPR/Cas9 working efficiency fast testing system and its application
EP3414327B1 (en) * 2016-02-10 2020-09-30 The Regents of The University of Michigan Detection of nucleic acids

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Benjamin P. Kleinstiver et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015-07-23, vol. 523, 481-485. *
Slaymaker I M et al. Rationally engineered Cas9 nucleases with improved specificity. Science. 2016-01-01, vol. 351 (no. 6268), 84-88. *

Also Published As

Publication number Publication date
WO2020228844A2 (en) 2020-11-19
WO2020228844A3 (en) 2020-12-30
CN111690724A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
JP7126588B2 (en) Increased specificity of RNA-guided genome editing using RNA-guided FokI nuclease (RFN)
KR102425438B1 (en) Genomewide unbiased identification of dsbs evaluated by sequencing (guide-seq)
KR101828933B1 (en) Method for detecting genome-wide off-target sites of programmable nucleases
US10011850B2 (en) Using RNA-guided FokI Nucleases (RFNs) to increase specificity for RNA-Guided Genome Editing
CA3111432A1 (en) Novel crispr enzymes and systems
JP5977234B2 (en) Target 3-D genome region sequencing strategy
KR20180053748A (en) Comprehensive in vitro reporting of cleavage by sequencing (CIRCLE-SEQ)
CN110804628B (en) High-specificity off-target-free single-base gene editing tool
US20240002834A1 (en) Adenine base editor lacking cytosine editing activity and use thereof
CN111690724B (en) Method for detecting activity of reagent generated by double-strand break
US11352666B2 (en) Method for detecting off-target sites of programmable nucleases in a genome
KR102067810B1 (en) Method for Genome Sequencing and Method for Testing Genome Editing Using Chromatin DNA
WO2023012193A1 (en) Method for targeted sequencing
Dobbs Determining the mechanism of off-target mutagenesis caused by CRISPR-Cas9 genome editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant