CN110835630A

CN110835630A - Efficient sgRNA and application thereof in gene editing

Info

Publication number: CN110835630A
Application number: CN201911200779.0A
Authority: CN
Inventors: 张成伟; 徐雯; 刘亚; 赵思; 杨进孝
Original assignee: Beijing Academy of Agriculture and Forestry Sciences
Current assignee: Beijing Academy of Agriculture and Forestry Sciences
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2020-02-25
Anticipated expiration: 2039-11-29
Also published as: CN110835630B

Abstract

The invention provides an efficient sgRNA and its application in gene editing. The sgRNA has the structure shown in formula I: RNA transcribed from the target sequence - modified sgRNA backbone (formula I). The modified sgRNA backbone is an RNA molecule obtained by inserting an RNA fragment A between positions 14 and 15 of the sgRNA backbone and inserting an RNA fragment B between positions 18 and 19 of the sgRNA backbone; RNA fragment A and RNA fragment B are reverse complementary to each other and are each 3 nt long; the sgRNA backbone is the RNA molecule shown in sequence 9. Experiments show that the modified sgRNA significantly improves the C·T base substitution efficiency of a cytosine base editor (CBE), with the highest efficiency reaching 86.4%.

Description

Efficient sgRNA and application thereof in gene editing

Technical Field

The invention belongs to the technical field of biology, and particularly relates to an efficient sgRNA and application thereof in gene editing.

Background

CRISPR-Cas9 technology has become a powerful means of genome editing and is widely applied in many tissues and cells. The CRISPR/Cas9 protein-RNA complex is directed to the target site by a guide RNA and cleaves it to generate a DNA double-strand break (DSB); the organism then initiates a DNA repair mechanism to repair the DSB. There are generally two repair mechanisms: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ usually dominates, so repair produces random indels (insertions or deletions) far more often than precise repair. For precise base substitution, the use of HDR is greatly limited by its low efficiency and its need for a DNA template.

In 2016, the laboratories of David Liu and Akihiko Kondo independently reported two different types of cytosine base editors (CBEs), which use two different cytidine deaminases, rAPOBEC1 (rat APOBEC1) and PmCDA1 (an activation-induced cytidine deaminase (AID) ortholog), respectively. Both rest on the same principle: a single cytosine (C) is edited directly by the cytidine deaminase rather than by generating a DSB and initiating HDR repair, which greatly improves the efficiency of replacing C with thymine (T). Specifically, dead Cas9 (dCas9) or Cas9 nickase (Cas9n), together with rAPOBEC1 or PmCDA1, is directed to the target site by the sgRNA; rAPOBEC1 or PmCDA1 catalyzes the deamination of C on the unpaired single-stranded DNA to uracil (U); through DNA repair the U becomes paired with adenine (A), and through DNA replication it is finally replaced by T paired with A, thereby achieving the C-to-T conversion. Among the editors tested, the SpCas9n (D10A) & rAPOBEC1/PmCDA1 & UGI base editing system (which contains the uracil DNA glycosylase inhibitor, UGI) had a higher mean mutation rate, for two reasons: first, UGI inhibits uracil DNA glycosylase (UDG) from catalyzing the removal of U from DNA; second, SpCas9n (D10A) nicks the non-edited strand, inducing the eukaryotic mismatch repair mechanism or the long-patch base excision repair (BER) mechanism so that the U:G mismatch is preferentially repaired to U:A. To improve working efficiency and reduce cost, improving the efficiency of C·T base substitution has long been a research focus for base editing systems in animal and plant genomes.

Cas9 from Staphylococcus aureus (SaCas9) is a homolog of SpCas9 and recognizes the NNGRRT PAM; the SaCas9 variant SaKKH recognizes the broader NNNRRT PAM. Both have been developed into effective CBEs, greatly expanding the range of editable C in animal and plant genomes. To date, there has been no report of improving the C·T base substitution efficiency of SaKKH-based CBEs by modifying the structure of the sgRNA corresponding to SaCas9 (SaCas9 sgRNA).

Disclosure of Invention

The purpose of the present invention is to improve the C·T base substitution efficiency of a cytosine base editor (CBE).

To achieve the above object, the present invention provides a kit comprising a sgRNA or a biological material related to the sgRNA, a Cas9 nuclease or a biological material related to the Cas9 nuclease, a cytosine deaminase or a biological material related to the cytosine deaminase;

the sgRNA targets a target sequence;

the sgRNA has the structure shown in formula I: RNA transcribed from the target sequence - modified sgRNA backbone (formula I);

the modified sgRNA backbone is an RNA molecule obtained by inserting an RNA fragment A between positions 14 and 15 of the sgRNA backbone and inserting an RNA fragment B between positions 18 and 19 of the sgRNA backbone;

RNA fragment A and RNA fragment B are reverse complementary to each other;

RNA fragment A and RNA fragment B are each 3 nt long;

the sgRNA backbone is m1) or m2) or m3):

m1) the RNA molecule shown in sequence 9;

m2) an RNA molecule obtained by substitution and/or deletion and/or addition of one or more nucleotides in the RNA molecule of m1) and having the same function;

m3) an RNA molecule having 75% or more identity with the nucleotide sequence defined in m1) or m2) and having the same function.

In the above kit, the modified sgRNA backbone is n1) or n2) or n3):

n1) the RNA molecule shown in sequence 10;

n2) an RNA molecule obtained by substitution and/or deletion and/or addition of one or more nucleotides in the RNA molecule of n1) and having the same function;

n3) an RNA molecule having 75% or more identity with the nucleotide sequence defined in n1) or n2) and having the same function.

In the kit, the Cas9 nuclease can be a protein such as SaKKHn, SaCas9, SaKKH-HF or SaCas9-HF. In a specific embodiment of the invention, the Cas9 nuclease is the SaKKHn protein.

The SaKKHn protein is E1) or E2) or E3):

E1) a protein whose amino acid sequence is shown in sequence 2;

E2) a protein obtained by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in sequence 2 of the sequence listing and having the same function;

E3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of E1) or E2);

the biological material related to the SaKKHn protein is any one of F1) to F5):

F1) a nucleic acid molecule encoding said SaKKHn protein;

F2) an expression cassette comprising the nucleic acid molecule of F1);

F3) a recombinant vector comprising the nucleic acid molecule of F1) or a recombinant vector comprising the expression cassette of F2);

F4) a recombinant microorganism containing the nucleic acid molecule of F1), or a recombinant microorganism containing the expression cassette of F2), or a recombinant microorganism containing the recombinant vector of F3);

F5) a transgenic cell line comprising the nucleic acid molecule of F1) or a transgenic cell line comprising the expression cassette of F2).

In the kit, the cytosine deaminase can be a protein such as human APOBEC3A, human AID, PmCDA1 or rAPOBEC1. In a specific embodiment of the invention, the cytosine deaminase is the PmCDA1 protein.

The PmCDA1 protein is G1) or G2) or G3):

G1) a protein whose amino acid sequence is shown in sequence 3;

G2) a protein obtained by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in sequence 3 of the sequence listing and having the same function;

G3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of G1) or G2);

the biological material related to the PmCDA1 protein is any one of H1) to H5):

H1) a nucleic acid molecule encoding the PmCDA1 protein;

H2) an expression cassette comprising the nucleic acid molecule of H1);

H3) a recombinant vector containing the nucleic acid molecule of H1) or a recombinant vector containing the expression cassette of H2);

H4) a recombinant microorganism containing the nucleic acid molecule of H1), or a recombinant microorganism containing the expression cassette of H2), or a recombinant microorganism containing the recombinant vector of H3);

H5) a transgenic cell line containing the nucleic acid molecule of H1) or a transgenic cell line containing the expression cassette of H2).

In the kit, the sgRNA can be tRNA-sgRNA;

the tRNA-sgRNA has the structure shown in formula I: tRNA - RNA transcribed from the target sequence - modified sgRNA backbone (formula I);

the tRNA is 1) or 2) or 3):

1) an RNA molecule obtained by replacing T with U at positions 474-550 of sequence 1;

2) an RNA molecule obtained by substitution and/or deletion and/or addition of one or more nucleotides in the RNA molecule of 1) and having the same function;

3) an RNA molecule having 75% or more identity with the nucleotide sequence defined in 1) or 2) and having the same function.

The kit may further include a UGI protein or a biological material associated with the UGI protein;

the UGI protein is I1) or I2) or I3):

I1) a protein whose amino acid sequence is shown in sequence 4;

I2) a protein obtained by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in sequence 4 of the sequence listing and having the same function;

I3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of I1) or I2);

the biological material related to the UGI protein is any one of J1) to J5):

J1) a nucleic acid molecule encoding the UGI protein;

J2) an expression cassette comprising the nucleic acid molecule of J1);

J3) a recombinant vector containing the nucleic acid molecule of J1), or a recombinant vector containing the expression cassette of J2);

J4) a recombinant microorganism containing the nucleic acid molecule of J1), or a recombinant microorganism containing the expression cassette of J2), or a recombinant microorganism containing the recombinant vector of J3);

J5) a transgenic cell line containing the nucleic acid molecule of J1) or a transgenic cell line containing the expression cassette of J2).

To facilitate purification of the proteins in E1), G1) and I1), a tag shown in the following table can be linked to the amino terminus or the carboxyl terminus of the protein consisting of the amino acid sequence shown in sequence 2, sequence 3 or sequence 4 of the sequence listing.

Table. Tag sequences

Tag	Number of residues	Sequence
Poly-Arg	5-6 (typically 5)	RRRRR
Poly-His	2-10 (typically 6)	HHHHHH
FLAG	8	DYKDDDDK
Strep-tag II	8	WSHPQFEK
c-myc	10	EQKLISEEDL

The proteins in E2), G2) and I2) are proteins that have 75% or more identity to the amino acid sequence of the protein shown in SEQ ID NO. 2, SEQ ID NO. 3 or SEQ ID NO. 4 and have the same function. The identity of 75% or more may be 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99%.

The protein in E2), G2) and I2) can be artificially synthesized, or can be obtained by synthesizing the coding gene and then performing biological expression.

The genes encoding the proteins in E2), G2) and I2) can be obtained from the DNA sequences at positions 3013-6225 of sequence 1 (encoding the protein shown in sequence 2), positions 6511-7134 of sequence 1 (encoding the protein shown in sequence 3) and positions 7156-7452 of sequence 1 (encoding the protein shown in sequence 4) by deleting the codons of one or more amino acid residues, and/or introducing missense mutations of one or more base pairs, and/or attaching the coding sequences of the tags shown in the table above to the 5' end and/or the 3' end.

Further, the nucleic acid molecule of F1) is f1) or f2) or f3):

f1) a cDNA molecule or DNA molecule shown at positions 3013-6225 of sequence 1 in the sequence listing;

f2) a cDNA molecule or DNA molecule having 75% or more identity to the nucleotide sequence defined in f1) and encoding the SaKKHn;

f3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in f1) or f2) and encodes the SaKKHn;

the nucleic acid molecule of H1) is h1) or h2) or h3):

h1) a cDNA molecule or DNA molecule shown at positions 6511-7134 of sequence 1 in the sequence listing;

h2) a cDNA molecule or DNA molecule having 75% or more identity to the nucleotide sequence defined in h1) and encoding the PmCDA1;

h3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in h1) or h2) and encodes the PmCDA1;

the nucleic acid molecule of J1) is j1) or j2) or j3):

j1) a cDNA molecule or DNA molecule shown at positions 7156-7452 of sequence 1 in the sequence listing;

j2) a cDNA molecule or DNA molecule having 75% or more identity to the nucleotide sequence defined in j1) and encoding the UGI;

j3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in j1) or j2) and encodes the UGI.

Wherein the nucleic acid molecule may be DNA, such as cDNA, genomic DNA or recombinant DNA; the nucleic acid molecule may also be RNA, such as mRNA or hnRNA, etc.

The nucleotide sequences of the present invention encoding the SaKKHn, the PmCDA1 or the UGI can readily be mutated by a person of ordinary skill in the art using known methods such as directed evolution and point mutation. Artificially modified nucleotide sequences having 75% or more identity to the nucleotide sequence encoding the SaKKHn, the PmCDA1 or the UGI of the present invention are derived from, and equivalent to, the nucleotide sequences of the present invention, provided that they encode the SaKKHn, the PmCDA1 or the UGI and have the same function.

The term "identity" as used herein refers to sequence similarity to a native nucleic acid sequence. "Identity" includes nucleotide sequences that are 75% or more, 85% or more, 90% or more, or 95% or more identical to the nucleotide sequence encoding a protein consisting of the amino acid sequence shown in sequence 2, 3 or 4 of the present invention. Identity can be assessed by eye or with computer software. With computer software, the identity between two or more sequences can be expressed as a percentage (%), which can be used to assess the identity between related sequences.
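The percentage mentioned above can be illustrated with a minimal Python sketch. The text does not name any particular software or alignment method, so everything here is an assumption for illustration: the function names `percent_identity` and `meets_threshold` are hypothetical, the two sequences are assumed to be already aligned to equal length with `-` marking gaps, and identity is taken as matching columns divided by compared columns.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: columns where both sequences have a gap are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # nothing to compare in this column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the identity is at or above the threshold (75% in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACGU", "ACGA")` compares four columns and finds three matches, so the pair sits exactly at the 75% threshold. Real comparisons would first need an alignment step, which this sketch leaves out.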

The stringent conditions are: hybridization and washing of the membrane twice, 5 min each, at 68 °C in a solution of 2×SSC and 0.1% SDS, followed by hybridization and washing of the membrane twice, 15 min each, at 68 °C in a solution of 0.5×SSC and 0.1% SDS; alternatively, hybridization and washing of the membrane at 65 °C in a solution of 0.1×SSPE (or 0.1×SSC) and 0.1% SDS.

The above-mentioned identity of 75% or more may be 80%, 85%, 90% or 95% or more.

F2) The expression cassette containing a nucleic acid molecule encoding a SaKKHn protein (SaKKHn gene expression cassette) refers to a DNA capable of expressing the SaKKHn protein in a host cell, and the DNA may include not only a promoter which initiates transcription of the SaKKHn gene but also a terminator which terminates transcription of the SaKKHn gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the SaKKHn gene expression cassette can be constructed by using the existing expression vector.

H2) The expression cassette containing the nucleic acid molecule encoding the PmCDA1 protein (PmCDA1 gene expression cassette) refers to a DNA capable of expressing the PmCDA1 protein in a host cell, and the DNA may include not only a promoter for initiating transcription of the PmCDA1 gene, but also a terminator for terminating transcription of the PmCDA1 gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the PmCDA1 gene expression cassette can be constructed by using the existing expression vector.

J2) The expression cassette containing a nucleic acid molecule encoding the UGI protein (UGI gene expression cassette) refers to a DNA capable of expressing the UGI protein in a host cell, and the DNA may include not only a promoter for initiating transcription of the UGI gene but also a terminator for terminating transcription of the UGI gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the UGI gene expression cassette can be constructed using an existing expression vector.

The vector may be a plasmid, cosmid, phage or viral vector. In a specific embodiment of the invention, the recombinant vector is specifically a SaKKHn-pBE +3bp-1 recombinant expression vector, a SaKKHn-pBE +3bp-2 recombinant expression vector, a SaKKHn-pBE +3bp-3 recombinant expression vector, a SaKKHn-pBE +3bp-4 recombinant expression vector, a SaKKHn-pBE +3bp-5 recombinant expression vector, a SaKKHn-pBE +3bp-6 recombinant expression vector or a SaKKHn-pBE +3bp-7 recombinant expression vector.

For each n from 1 to 7, the nucleotide sequence of the SaKKHn-pBE +3bp-n recombinant expression vector is obtained by replacing the DNA sequence of each Original sgRNA backbone in the sequence of the SaKKHn-pBE-n recombinant expression vector with the DNA sequence of the +3bp sgRNA backbone shown in sequence 6, leaving the other sequences unchanged.

The microorganism may be a yeast, bacterium, algae or fungus. Wherein the bacterium can be an Agrobacterium, such as Agrobacterium EHA 105. In a specific embodiment of the present invention, the recombinant microorganism is agrobacterium EHA105 which contains the SaKKHn-pBE +3bp-1 recombinant expression vector, the SaKKHn-pBE +3bp-2 recombinant expression vector, the SaKKHn-pBE +3bp-3 recombinant expression vector, the SaKKHn-pBE +3bp-4 recombinant expression vector, the SaKKHn-pBE +3bp-5 recombinant expression vector, the SaKKHn-pBE +3bp-6 recombinant expression vector, or the SaKKHn-pBE +3bp-7 recombinant expression vector.

The transgenic cell line does not include propagation material.

The kit has the following uses:

X1) editing a target sequence in the genome of an organism or a biological cell;

X2) preparing a product for editing a target sequence in the genome of an organism or a biological cell;

X3) improving the efficiency of editing a target sequence in the genome of an organism or a biological cell;

X4) preparing a product for improving the efficiency of editing a target sequence in the genome of an organism or a biological cell.

The sgRNA and the modified sgRNA backbone in the kit also fall within the scope of protection of the present invention.

To achieve the above object, the present invention also provides a new use of the above kit, sgRNA or modified sgRNA backbone.

The invention provides use of the kit, the sgRNA or the modified sgRNA backbone in any one of X1) to X4) above.

To achieve the above object, the present invention finally provides the method of Y1) or Y2):

Y1) a method for editing a genomic target sequence of an organism or a biological cell, or a method for improving the efficiency of editing a genomic target sequence of an organism or a biological cell, comprising expressing the sgRNA, the Cas9 nuclease and the cytosine deaminase in the organism or biological cell so as to edit the genomic target sequence; the sgRNA targets the target sequence;

Y2) a method for preparing a biological mutant, comprising the following step: editing the genome of an organism according to the method of Y1) to obtain the biological mutant.

In Y1) of the above method, the sgRNA is the tRNA-sgRNA. The tRNA-sgRNA transcribed from its DNA template is an immature RNA precursor, from which the tRNA is excised by two enzymes (RNase P and RNase Z) to yield mature RNA. As many independent mature RNAs are obtained as there are targets in the recombinant expression vector; each mature RNA consists, in order, of the RNA transcribed from the target sequence and the sgRNA backbone, or of the RNA transcribed from the target sequence, the sgRNA backbone and a few residual bases of the tRNA.
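The processing described above can be pictured with a small Python sketch. It is a string-level illustration only, under stated assumptions: the tRNA and the two target-derived RNAs are lower-case placeholders of 77 nt and 20 nt (hypothetical, not real sequences), the backbone is the sequence 9 RNA given in Example 1, cleavage is idealized so that no residual tRNA bases remain, and the function names `build_precursor` and `release_mature_rnas` are invented for this example.

```python
# Backbone: the sgRNA backbone RNA of sequence 9 (quoted in Example 1).
BACKBONE = ("GUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAAUGCCGUG"
            "UUUAUCUCGUCAACUUGUUGGCGAGAU")

# Placeholders (assumptions): lower case so they cannot be confused
# with real sequence; lengths chosen only for illustration.
TRNA = "t" * 77
TARGET_RNAS = ["a" * 20, "c" * 20]


def build_precursor(target_rnas, backbone, trna):
    """Concatenate one tRNA-target-backbone unit per target."""
    return "".join(trna + target + backbone for target in target_rnas)


def release_mature_rnas(precursor, trna):
    """Idealized RNase P / RNase Z processing: remove every tRNA copy
    and return the pieces left between them."""
    return [piece for piece in precursor.split(trna) if piece]


precursor = build_precursor(TARGET_RNAS, BACKBONE, TRNA)
mature = release_mature_rnas(precursor, TRNA)
```

With two targets the sketch yields two mature RNAs, each a target-derived RNA followed by the backbone, which mirrors the statement that the number of mature RNAs follows the number of targets.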

In the above method, Y1) further comprises the step of expressing UGI in the organism or biological cell; the number of UGIs may be 1, or 2 or more. In a specific embodiment of the present invention, the number of UGIs is 1.

Further, the sgRNA, the Cas9 nuclease, the cytosine deaminase, and the UGI are expressed in an organism or an organism cell by introducing a gene encoding the Cas9 nuclease, a DNA molecule that transcribes the sgRNA, a gene encoding the cytosine deaminase, and a gene encoding the UGI into the organism or the organism cell.

Further, the gene encoding Cas9 nuclease, the DNA molecule transcribing the sgRNA, the gene encoding cytosine deaminase, and the gene encoding the UGI are introduced into an organism or an organism cell via a recombinant expression vector.

The encoding gene of the Cas9 nuclease, the DNA molecule for transcribing the sgRNA, the encoding gene of the cytosine deaminase and the encoding gene of the UGI can be introduced into an organism or an organism cell through the same recombinant expression vector, or can be introduced into the organism or the organism cell through two or more recombinant expression vectors.

In a specific embodiment of the invention, the gene encoding Cas9 nuclease, the DNA molecule transcribing the sgRNA, the gene encoding cytosine deaminase and the gene encoding the UGI are introduced into the organism or the biological cell through the same recombinant expression vector. The recombinant expression vector contains an expression cassette A and an expression cassette B; the expression cassette A expresses the sgRNA, and the expression cassette B expresses a fusion protein consisting of the Cas9 nuclease, the cytosine deaminase and the UGI.

The recombinant expression vector is specifically the SaKKHn-pBE +3bp-1 recombinant expression vector, the SaKKHn-pBE +3bp-2 recombinant expression vector, the SaKKHn-pBE +3bp-3 recombinant expression vector, the SaKKHn-pBE +3bp-4 recombinant expression vector, the SaKKHn-pBE +3bp-5 recombinant expression vector, the SaKKHn-pBE +3bp-6 recombinant expression vector or the SaKKHn-pBE +3bp-7 recombinant expression vector.

In the above kit, use or method, the number of target sequences may be 1, or 2 or more. The PAM sequence of the target sequence is NNNRRT.
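As an illustration of what the NNNRRT requirement means, the following Python sketch checks a 6-nt DNA site against a degenerate PAM pattern. It assumes the standard IUPAC meanings N = A/C/G/T and R = A/G; the helper name `matches_pam` and the example sites are hypothetical, and only the codes needed for NNNRRT and NNGRRT are included.

```python
# Subset of IUPAC nucleotide codes needed for the PAMs named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "N": "ACGT",  # any base
}


def matches_pam(site: str, pattern: str = "NNNRRT") -> bool:
    """Return True if the DNA site satisfies the degenerate PAM pattern."""
    site = site.upper()
    if len(site) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, pattern))
```

Under these assumptions, the hypothetical site `ACTAGT` satisfies NNNRRT but not the narrower NNGRRT pattern, because its third base is T rather than G.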

In the above kit, use or method, editing the genomic target sequence means mutating C in the target sequence to T. The C may be a C at any position in the target sequence.

In the above kit, use or method, the organism is S1) or S2) or S3) or S4):

S1) a plant or an animal;

S2) a monocotyledonous or dicotyledonous plant;

S3) a gramineous plant;

S4) rice;

the biological cell is T1) or T2) or T3) or T4):

T1) a plant cell or an animal cell;

T2) a monocotyledonous or dicotyledonous plant cell;

T3) a gramineous plant cell;

T4) a rice cell.

The invention provides a modified sgRNA having the structure shown in formula I: RNA transcribed from the target sequence - modified sgRNA backbone (formula I). The modified sgRNA backbone is an RNA molecule obtained by inserting an RNA fragment A between positions 14 and 15 of the sgRNA backbone and inserting an RNA fragment B between positions 18 and 19 of the sgRNA backbone; RNA fragment A and RNA fragment B are reverse complementary to each other and are each 3 nt long; the sgRNA backbone is the RNA molecule shown in sequence 9. Experiments show that the modified sgRNA significantly improves the C·T base substitution efficiency of a cytosine base editor (CBE), with the highest efficiency reaching 86.4%.

Drawings

FIG. 1 shows the unmodified SaCas9 sgRNA structure and the modified SaCas9 sgRNA structures.

FIG. 2 is a schematic structural diagram of a recombinant expression vector.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents, instruments and the like used in the following examples are commercially available unless otherwise specified. In the following examples, unless otherwise specified, the 1 st position of each nucleotide sequence in the sequence listing is the 5 'terminal nucleotide of the corresponding DNA/RNA, and the last position is the 3' terminal nucleotide of the corresponding DNA/RNA.

Primer pair T1 consists of primer T1-F (5'-ttcaaattctaatccccaatcc-3') and primer T1-R (5'-tcgtacctgtctgcaaccttg-3') and is used to amplify target T1.

Primer pair T2 consists of primer T2-F (5'-gctttagatgatttgttacatttcgc-3') and primer T2-R (5'-tgagttggtatggcaagaacaag-3') and is used to amplify target T2.

Primer pair T3 consists of primer T3-F (5'-aacacggtcaccaacttcatc-3') and primer T3-R (5'-acaacctggcttgctatatatgc-3') and is used to amplify target T3.

Primer pair T4 consists of primer T4-F (5'-tggatcggatatggacttctc-3') and primer T4-R (5'-gaaatgaacaatcacctgagatctttg-3') and is used to amplify targets T4 and T7.

Primer pair T5 consists of primer T5-F (5'-cgagctacctgaagaacaactacc-3') and primer T5-R (5'-cctcgattgcctgaaatttg-3') and is used to amplify target T5.

Primer pair T6 consists of primer T6-F (5'-tgcgagctcgacaacatcatg-3') and primer T6-R (5'-gacggcccatgtggaaacc-3') and is used to amplify target T6.

Primer pair T8 consists of primer T8-F (5'-gacgcccatagtcgaggtc-3') and primer T8-R (5'-ctctgctggatcaatgtcaatg-3') and is used to amplify target T8.

Primer pair T9 consists of primer T9-F (5'-cctcatccaatcgactgacac-3') and primer T9-R (5'-gtaattgtgcttggtgatggag-3') and is used to amplify target T9.

In the following examples, a C·T base substitution refers to a mutation from C to T at any position in the target sequence.

C·T base substitution efficiency = (number of positive T0 seedlings with a C·T base substitution / total number of positive T0 seedlings analyzed) × 100%.
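The efficiency definition above amounts to a simple ratio, sketched here in Python. The function name `substitution_efficiency` and the seedling counts in the usage note are hypothetical and are not data from the examples.

```python
def substitution_efficiency(edited_seedlings: int, positive_seedlings: int) -> float:
    """C-to-T base substitution efficiency in percent.

    edited_seedlings: positive T0 seedlings carrying a C-to-T substitution
    positive_seedlings: all positive T0 seedlings analyzed
    """
    if positive_seedlings <= 0:
        raise ValueError("at least one positive T0 seedling must be analyzed")
    if not 0 <= edited_seedlings <= positive_seedlings:
        raise ValueError("edited count must lie between 0 and the total")
    return 100.0 * edited_seedlings / positive_seedlings
```

For instance, with made-up counts of 3 edited seedlings among 4 positive seedlings analyzed, `substitution_efficiency(3, 4)` gives 75.0.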

Nipponbare rice: described in the reference: Effects of sodium nitroprusside and its photolysis products on the growth of Nipponbare rice seedlings and the expression of 5 hormone marker genes [J]. Journal of Henan Normal University (Natural Science Edition), 2017(2): 48-52. It is available to the public from the Beijing Academy of Agriculture and Forestry Sciences.

Recovery medium: N6 solid medium containing 200 mg/L timentin.

Selection medium: N6 solid medium containing 50 mg/L hygromycin.

Differentiation medium: N6 solid medium containing 2 mg/L KT, 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.

Rooting medium: N6 solid medium containing 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.

Example 1. Modification of the sgRNA backbone structure in SaCas9 sgRNA

The structure of SaCas9 sgRNA is as follows: RNA transcribed from the target sequence - sgRNA backbone.

The sgRNA backbone in the SaCas9 sgRNA structure was modified in two ways, both shown in FIG. 1.

Original represents the unmodified SaCas9 sgRNA structure. The unmodified SaCas9 sgRNA is designated Original sgRNA, and the sgRNA backbone in Original sgRNA is designated the Original sgRNA backbone. The RNA sequence of the Original sgRNA backbone is: GUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAAUGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGAU (SEQ ID NO: 9); the DNA sequence of the Original sgRNA backbone is: GTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAT.
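The relation between the DNA and RNA forms of the backbone quoted above (the same sequence with T replaced by U) can be checked with a few lines of Python. This is only an illustrative sketch; the names `dna_to_rna`, `ORIGINAL_BACKBONE_DNA` and `ORIGINAL_BACKBONE_RNA` are chosen here and do not come from the sequence listing.

```python
# Sequences copied from the paragraph above (unmodified backbone).
ORIGINAL_BACKBONE_DNA = ("GTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTG"
                         "TTTATCTCGTCAACTTGTTGGCGAGAT")
ORIGINAL_BACKBONE_RNA = ("GUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAAUGCCGUG"
                         "UUUAUCUCGUCAACUUGUUGGCGAGAU")


def dna_to_rna(dna: str) -> str:
    """Write a DNA sequence in RNA form by replacing T with U."""
    return dna.upper().replace("T", "U")


# The converted DNA sequence equals the RNA sequence of SEQ ID NO: 9.
same = dna_to_rna(ORIGINAL_BACKBONE_DNA) == ORIGINAL_BACKBONE_RNA
backbone_length = len(ORIGINAL_BACKBONE_RNA)
```

Both strings are 77 nt long, which agrees with the 77-position backbone ranges given for the vector sequence in Example 2.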

O +3bp represents the SaCas9 sgRNA structure obtained by adding 3 base pairs to the unmodified SaCas9 sgRNA structure. This modified SaCas9 sgRNA is designated +3bp sgRNA, and its sgRNA backbone is designated the +3bp sgRNA backbone. The RNA sequence of the +3bp sgRNA backbone is: GUUUUAGUACUCUGCUGGAAACAGCAGAAUCUACUAAAACAAGGCAAAAUGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGAU (SEQ ID NO: 10; the 3 added base pairs are formed by CUG inserted after position 14 and CAG inserted after position 18 of the Original sgRNA backbone); the DNA sequence of the +3bp sgRNA backbone is shown in sequence 6.

O +8bp represents the SaCas9 sgRNA structure obtained by adding 8 base pairs to the unmodified SaCas9 sgRNA structure. This modified SaCas9 sgRNA is designated +8bp sgRNA, and its sgRNA backbone is designated the +8bp sgRNA backbone. The RNA sequence of the +8bp sgRNA backbone is: GUUUUAGUACUCUGUAAUUUUAGAAAUAAAAUUACAGAAUCUACUAAAACAAGGCAAAAUGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGAU (the 8 added base pairs are formed by UAAUUUUA inserted after position 14 and UAAAAUUA inserted after position 18 of the Original sgRNA backbone); the DNA sequence of the +8bp sgRNA backbone is shown in sequence 7.
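The construction of the two modified backbones can be reproduced with a short Python sketch: fragment A is inserted between positions 14 and 15 of the unmodified backbone, and its reverse complement (fragment B) between positions 18 and 19. The fragment sequences CUG and UAAUUUUA used below are read off by comparing the modified sequences with the unmodified one; the function names `reverse_complement` and `extend_stem` are hypothetical and introduced only for this illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Unmodified backbone RNA (SEQ ID NO: 9), copied from this example.
ORIGINAL = ("GUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAAUGCCGUG"
            "UUUAUCUCGUCAACUUGUUGGCGAGAU")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def extend_stem(backbone: str, fragment_a: str) -> str:
    """Insert fragment A between positions 14 and 15 and its reverse
    complement (fragment B) between positions 18 and 19 (1-based)."""
    fragment_b = reverse_complement(fragment_a)
    return (backbone[:14] + fragment_a      # positions 1-14, then A
            + backbone[14:18] + fragment_b  # positions 15-18, then B
            + backbone[18:])                # positions 19 onward


plus_3bp = extend_stem(ORIGINAL, "CUG")        # 83 nt
plus_8bp = extend_stem(ORIGINAL, "UAAUUUUA")   # 93 nt
```

Run as written, `plus_3bp` reproduces the +3bp sgRNA backbone sequence (SEQ ID NO: 10) and `plus_8bp` reproduces the +8bp sgRNA backbone sequence quoted above. Positions 15-18 (GAAA) stay between the two inserted fragments, so the inserts lengthen the stem on both sides of that segment.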

Example 2. Application of the modified SaCas9 sgRNA in improving the C·T base substitution efficiency of the SaKKHn & PmCDA1 & UGI base editing system

Construction of recombinant expression vector

Artificially synthesizing the following recombinant expression vectors, wherein each expression vector is a circular plasmid:

two recombinant expression vectors containing Original sgRNA: SaKKHn-pBE-1 and SaKKHn-pBE-2;

two recombinant expression vectors containing +3bp sgRNA: SaKKHn-pBE +3bp-1 and SaKKHn-pBE +3bp-2;

two recombinant expression vectors containing +8bp sgRNA: SaKKHn-pBE +8bp-1 and SaKKHn-pBE +8bp-2.

The nucleotide sequence of the SaKKHn-pBE-1 recombinant expression vector is sequence 1 in the sequence listing. In sequence 1, positions 131-467 are the nucleotide sequence of the OsU3 promoter; positions 474-550 and positions 648-724 are nucleotide sequences of the tRNA; positions 551-647 and positions 725-821 are the nucleotide sequences of two sgRNAs targeting the OsWaxy gene; the DNA sequence of the backbone shared by the two sgRNAs (the Original sgRNA backbone) is at positions 571-647 or positions 745-821; and positions 996-1286 are the nucleotide sequence of the OsU3 terminator. Positions 1293-3006 are the nucleotide sequence of the OsUbq3 promoter; positions 3013-6225 are the coding sequence of the SaKKHn protein (without a stop codon), encoding the SaKKHn protein shown in sequence 2; positions 6511-7134 are the coding sequence of the PmCDA1 protein (without a stop codon), encoding the PmCDA1 protein shown in sequence 3; positions 7156-7452 are the coding sequence of the UGI protein, encoding the UGI protein shown in sequence 4; and positions 7459-7653 are the nucleotide sequence of the 35S terminator. Positions 7728-9720 are the nucleotide sequence of the ZmUbi1 promoter, positions 9727-10749 are the coding sequence of hygromycin phosphotransferase, and positions 10779-10994 are the nucleotide sequence of the CaMV35S polyA. The two targets in the SaKKHn-pBE-1 recombinant expression vector are T1 and T2; their sequences are shown in Table 1.
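The coordinates above are 1-based and inclusive, so the length of a feature is end minus start plus one. The following Python sketch records a subset of the stated ranges and derives a few consistency checks from them. The dictionary name `FEATURES`, the feature labels and the helper `span_length` are illustrative assumptions; only the coordinates come from the description.

```python
# 1-based inclusive coordinates in sequence 1, as stated in the description.
FEATURES = {
    "OsU3 promoter": (131, 467),
    "tRNA (first copy)": (474, 550),
    "sgRNA 1": (551, 647),
    "backbone in sgRNA 1": (571, 647),
    "tRNA (second copy)": (648, 724),
    "sgRNA 2": (725, 821),
    "backbone in sgRNA 2": (745, 821),
    "SaKKHn coding sequence": (3013, 6225),
    "PmCDA1 coding sequence": (6511, 7134),
    "UGI coding sequence": (7156, 7452),
}


def span_length(name: str) -> int:
    """Length in nucleotides of a feature given by inclusive coordinates."""
    start, end = FEATURES[name]
    return end - start + 1


# Portion of each sgRNA unit that precedes the backbone.
target_part_1 = span_length("sgRNA 1") - span_length("backbone in sgRNA 1")
target_part_2 = span_length("sgRNA 2") - span_length("backbone in sgRNA 2")

# Coding sequences should span a whole number of codons.
cds_lengths = {name: span_length(name) for name in FEATURES if "coding" in name}
```

From these ranges, each backbone and each tRNA copy spans 77 nt, each sgRNA unit spans 97 nt (leaving 20 nt ahead of the backbone), and the three coding sequences span 3213, 624 and 297 nt, all multiples of three.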

The nucleotide sequence of the SaKKHn-pBE-2 recombinant expression vector is obtained by replacing positions 474-995 of sequence 1 with sequence 5 and keeping the other sequences unchanged. In sequence 5, positions 1-77 and 175-251 are both nucleotide sequences of tRNA, and positions 78-174 and 252-348 are the nucleotide sequences of two sgRNAs targeting the OsNRT1.1B gene and the OsGRF4 gene, respectively. The DNA sequences of the Original sgRNA framework are at positions 98-174 and 272-348. The two targets in the SaKKHn-pBE-2 recombinant expression vector are T3 and T4, and their sequences are shown in Table 1.

The nucleotide sequence of the SaKKHn-pBE +3bp-1 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-1 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +3bp-2 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-2 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +8bp-1 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-1 recombinant expression vector with the DNA sequence of the +8bp sgRNA framework shown in sequence 7 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +8bp-2 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-2 recombinant expression vector with the DNA sequence of the +8bp sgRNA framework shown in sequence 7 and keeping the other sequences unchanged.
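The derived vectors are all defined by substitution, either of a numbered stretch of sequence 1 or of every copy of the sgRNA framework. As a rough sketch of those two operations on plain strings (the function names `replace_region` and `swap_framework` and the toy inputs are assumptions for illustration; the real sequences are those in the sequence listing):

```python
def replace_region(sequence, start, end, replacement):
    """Replace 1-based inclusive positions start..end with `replacement`."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    # Keep everything before `start` and everything after `end`.
    return sequence[:start - 1] + replacement + sequence[end:]


def swap_framework(sequence, old_framework, new_framework, expected_count=None):
    """Replace every copy of `old_framework` with `new_framework`.

    `expected_count`, if given, guards against silently editing the wrong
    number of copies.
    """
    found = sequence.count(old_framework)
    if found == 0:
        raise ValueError("framework sequence not found")
    if expected_count is not None and found != expected_count:
        raise ValueError(f"expected {expected_count} copies, found {found}")
    return sequence.replace(old_framework, new_framework)


# Toy demonstration only; these strings are not taken from the sequence listing.
toy_vector = "aaacccggg"
toy_swapped = replace_region(toy_vector, 4, 6, "tt")
toy_reframed = swap_framework("aagtttccgtttaa", "gttt", "gctt", expected_count=2)
```

Raising an error when the framework is absent or occurs an unexpected number of times is a design choice of this sketch; it makes a mis-specified replacement visible instead of returning an unchanged sequence.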

The target nucleotide sequence and the corresponding PAM sequence of each vector are shown in Table 1.

TABLE 1

Second, obtaining rice positive T0 seedlings

Each of the SaKKHn-pBE-1, SaKKHn-pBE-2, SaKKHn-pBE +3bp-1, SaKKHn-pBE +3bp-2, SaKKHn-pBE +8bp-1 and SaKKHn-pBE +8bp-2 vectors obtained in step one is processed separately according to the following steps 1-9:

1. The vector is introduced into Agrobacterium EHA105 (product of Shanghai Diego Biotechnology Ltd., CAT#: AC1010) to obtain recombinant Agrobacterium.

2. The recombinant Agrobacterium is cultured in medium (YEP medium containing 50 μg/ml kanamycin and 25 μg/ml rifampicin) with shaking at 28 ℃ and 150 rpm to OD₆₀₀ = [value missing], then centrifuged at room temperature at 10000 rpm for 1 min; the cells are resuspended in infection solution (N6 liquid medium with glucose and sucrose as the sugars; the concentrations of glucose and sucrose in the infection solution are 10 g/L and 20 g/L, respectively) and diluted to OD₆₀₀ = 0.2 to obtain the Agrobacterium infection solution.

3. Mature seeds of the rice variety Nipponbare are dehulled, placed in a 100 mL Erlenmeyer flask, soaked in 70% (v/v) aqueous ethanol for 30 sec, then transferred to 25% (v/v) aqueous sodium hypochlorite and sterilized with shaking at 120 rpm for 30 min, washed 3 times with sterile water, and blotted dry with filter paper; the seeds are then placed embryo-side down on N6 solid medium and cultured in the dark at 28 ℃ for 4-6 weeks to obtain rice callus.

4. After step 3 is completed, the rice callus is soaked for 10 min in Agrobacterium infection solution A (a liquid obtained by adding acetosyringone to the Agrobacterium infection solution at a volume ratio of acetosyringone to Agrobacterium infection solution of 25 μl : 50 ml), then placed on a culture dish lined with two layers of sterile filter paper (containing about 200 ml of infection solution without Agrobacterium), and cultured in the dark at 21 ℃ for 1 day.

5. The rice callus obtained in step 4 is placed on recovery medium and cultured in the dark at 25-28 ℃ for 3 days.

6. The rice callus obtained in step 5 is placed on screening medium and cultured in the dark at 28 ℃ for 2 weeks.

7. The rice callus obtained in step 6 is placed on screening medium again and cultured in the dark at 28 ℃ for 2 weeks to obtain resistant rice callus.

8. The resistant rice callus obtained in step 7 is placed on differentiation medium and cultured under light at 25 ℃ for about 1 month; the differentiated plantlets are transplanted to rooting medium and cultured under light at 25 ℃ for 2 weeks to obtain rice T0 seedlings.

9. Genomic DNA is extracted from the rice T0 seedlings and used as a template for PCR amplification with a primer pair consisting of primer F (5'-attatgtagcttgtgcgtttcg-3') and primer R (5'-ctccacctcattgacattatgc-3') to obtain a PCR amplification product. The PCR amplification product is subjected to agarose gel electrophoresis and judged as follows: if the PCR amplification product contains a DNA fragment of about 898 bp, the corresponding rice T0 seedling is a rice positive T0 seedling; if the PCR amplification product does not contain a DNA fragment of about 898 bp, the corresponding rice T0 seedling is not a rice positive T0 seedling.
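The expected product size in step 9 can be checked in silico. In the listing of sequence 1, primer F appears to match positions 2538-2559 and the reverse complement of primer R positions 3414-3435, which is consistent with a product of 898 bp spanning the junction between the OsUbq3 promoter and the SaKKHn coding sequence. The sketch below shows the calculation on a synthetic template; the helper names and the template (primers separated by an arbitrary filler) are assumptions for illustration, and only exact primer matches are considered.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (a, c, g, t, n only)."""
    pairs = {"a": "t", "t": "a", "g": "c", "c": "g", "n": "n"}
    return "".join(pairs[base] for base in reversed(seq.lower()))


def amplicon_length(template, forward, reverse):
    """Length of the product for exact primer matches, or None if no product.

    The forward primer is searched on the given strand; the reverse primer
    must match the opposite strand downstream of the forward primer.
    """
    template = template.lower()
    start = template.find(forward.lower())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


PRIMER_F = "attatgtagcttgtgcgtttcg"
PRIMER_R = "ctccacctcattgacattatgc"

# Synthetic template: flanking filler plus an 854-nt spacer between the
# two primer sites (chosen for illustration; not the real vector sequence).
toy_template = "n" * 10 + PRIMER_F + "n" * 854 + reverse_complement(PRIMER_R) + "n" * 5
toy_product = amplicon_length(toy_template, PRIMER_F, PRIMER_R)
```

Running `amplicon_length` on the full sequence 1 instead of the toy template would give the same kind of check against the real vector; a template lacking either primer site returns `None`, corresponding to a seedling judged as not positive.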

Third, result analysis

1. For each vector, the genomic DNA of the rice positive T0 seedlings obtained in step two is used as a template. For the T1 target, PCR amplification is performed with primer pair T1 to obtain a PCR amplification product; for the T2 target, PCR amplification is performed with primer pair T2 to obtain a PCR amplification product; for the T3 target, PCR amplification is performed with primer pair T3 to obtain a PCR amplification product; for the T4 target, PCR amplification is performed with primer pair T4 to obtain a PCR amplification product.

2. The PCR amplification products obtained in step 1 are subjected to Sanger sequencing and analysis. Only the target region is analyzed in each sequencing result. The number of rice positive T0 seedlings with C·T base substitution at T1, T2, T3 and T4 is counted, and the C·T base substitution efficiency is calculated; the results are shown in Table 2.
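The text does not spell out the formula for the C·T base substitution efficiency. A plausible reading, assumed here, is the percentage of analyzed positive T0 seedlings that carry a C·T substitution at the target. The sketch below uses that assumption; the function names and the seedling counts are hypothetical and are not values from Table 2.

```python
def substitution_efficiency(edited_seedlings, positive_seedlings):
    """Percentage of positive T0 seedlings carrying a C-to-T substitution.

    Assumed definition: edited / analyzed * 100.
    """
    if positive_seedlings <= 0:
        raise ValueError("at least one positive seedling is required")
    if not 0 <= edited_seedlings <= positive_seedlings:
        raise ValueError("edited count must lie between 0 and the total")
    return 100.0 * edited_seedlings / positive_seedlings


def fold_change(efficiency_modified, efficiency_original):
    """Ratio of two efficiencies; undefined when the baseline is zero."""
    if efficiency_original == 0:
        raise ValueError("baseline efficiency is zero; fold change undefined")
    return efficiency_modified / efficiency_original


# Hypothetical counts, for illustration only.
original = substitution_efficiency(4, 40)
modified = substitution_efficiency(12, 40)
improvement = fold_change(modified, original)
```

A zero baseline is treated as an error because a fold change cannot be computed when the unmodified sgRNA yields no edited seedlings; in that case the comparison is better reported as absolute efficiencies.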

The results show that, for all four targets, the SaKKHn & PmCDA1 & UGI base editing system using the +3bp sgRNA improves the C·T base substitution efficiency compared with the system using the Original sgRNA; for the T2 target alone, the C·T base substitution efficiency is increased 3-fold. The SaKKHn & PmCDA1 & UGI base editing system using the +8bp sgRNA is less consistent: it improves the C·T base substitution efficiency at the T2, T3 and T4 targets to different degrees, but reduces it to some extent at the T1 target. Overall, except at the T4 target, the system using the +3bp sgRNA achieves C·T base substitution more efficiently than the system using the +8bp sgRNA.

TABLE 2

Example 3 Application of the +3bp sgRNA to increase the C·T base substitution efficiency of the SaKKHn & PmCDA1 & UGI base editing system

First, construction of recombinant expression vectors

five recombinant expression vectors containing Original sgRNA: SaKKHn-pBE-3, SaKKHn-pBE-4, SaKKHn-pBE-5, SaKKHn-pBE-6 and SaKKHn-pBE-7;

five recombinant expression vectors containing +3bp sgRNA: SaKKHn-pBE +3bp-3, SaKKHn-pBE +3bp-4, SaKKHn-pBE +3bp-5, SaKKHn-pBE +3bp-6 and SaKKHn-pBE +3bp-7.

The nucleotide sequence of the SaKKHn-pBE-3 recombinant expression vector is obtained by replacing positions 474-995 of sequence 1 with sequence 8 and keeping the other sequences unchanged. In sequence 8, positions 1-77 are the nucleotide sequence of tRNA, positions 78-174 are the nucleotide sequence of the sgRNA targeting the OsWaxy gene, and positions 98-174 are the DNA sequence of the Original sgRNA framework. The target in the SaKKHn-pBE-3 recombinant expression vector is T5, and its sequence is shown in Table 3.

The nucleotide sequence of the SaKKHn-pBE-4 recombinant expression vector is obtained by replacing the T5 target sequence in the sequence of the SaKKHn-pBE-3 recombinant expression vector with the T6 target sequence and keeping the other sequences unchanged. The T6 target sequence is shown in Table 3.

The nucleotide sequence of the SaKKHn-pBE-5 recombinant expression vector is obtained by replacing the T5 target sequence in the sequence of the SaKKHn-pBE-3 recombinant expression vector with the T7 target sequence and keeping the other sequences unchanged. The T7 target sequence is shown in Table 3.

The nucleotide sequence of the SaKKHn-pBE-6 recombinant expression vector is obtained by replacing the T5 target sequence in the sequence of the SaKKHn-pBE-3 recombinant expression vector with the T8 target sequence and keeping the other sequences unchanged. The T8 target sequence is shown in Table 3.

The nucleotide sequence of the SaKKHn-pBE-7 recombinant expression vector is obtained by replacing the T5 target sequence in the sequence of the SaKKHn-pBE-3 recombinant expression vector with the T9 target sequence and keeping the other sequences unchanged. The T9 target sequence is shown in Table 3.

The nucleotide sequence of the SaKKHn-pBE +3bp-3 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-3 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +3bp-4 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-4 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +3bp-5 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-5 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +3bp-6 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-6 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The nucleotide sequence of the SaKKHn-pBE +3bp-7 recombinant expression vector is obtained by replacing the DNA sequence of the Original sgRNA framework in the sequence of the SaKKHn-pBE-7 recombinant expression vector with the DNA sequence of the +3bp sgRNA framework shown in sequence 6 and keeping the other sequences unchanged.

The target nucleotide sequence and the corresponding PAM sequence for each vector are shown in Table 3.

TABLE 3

Target name	Target gene	Target sequence (5'-3')	PAM	Recombinant expression vectors
T5	OsWaxy	tcctcggcgtagtacgggct	CACGGT	SaKKHn-pBE-3; SaKKHn-pBE +3bp-3
T6	OsWaxy	tatccgggcaaggtgagggc	CGTGGT	SaKKHn-pBE-4; SaKKHn-pBE +3bp-4
T7	OsGRF4	acgccggcaccgccctggct	CTGGGT	SaKKHn-pBE-5; SaKKHn-pBE +3bp-5
T8	OsALS	cccaagcatgcgcagggaca	ACGGGT	SaKKHn-pBE-6; SaKKHn-pBE +3bp-6
T9	OsALS	cacgtccttcccgctcgagg	CCGGGT	SaKKHn-pBE-7; SaKKHn-pBE +3bp-7
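The entries of Table 3 can be sanity-checked programmatically. The sketch below assumes the NNNRRT PAM (R = A or G) that is generally reported for the SaCas9 KKH variant, and a 20-nt target sequence; both the pattern and the helper names are assumptions of this example and are not stated in the table itself.

```python
import re

# Target and PAM sequences copied from Table 3.
TABLE_3 = {
    "T5": ("tcctcggcgtagtacgggct", "CACGGT"),
    "T6": ("tatccgggcaaggtgagggc", "CGTGGT"),
    "T7": ("acgccggcaccgccctggct", "CTGGGT"),
    "T8": ("cccaagcatgcgcagggaca", "ACGGGT"),
    "T9": ("cacgtccttcccgctcgagg", "CCGGGT"),
}

# NNNRRT: any three bases, two purines (A or G), then T. Assumed PAM pattern.
PAM_PATTERN = re.compile(r"[ACGT]{3}[AG]{2}T", re.IGNORECASE)


def is_valid_entry(target, pam, target_length=20):
    """True if the target has the expected length and the PAM fits NNNRRT."""
    has_only_bases = re.fullmatch(r"[acgt]+", target, re.IGNORECASE) is not None
    return (
        has_only_bases
        and len(target) == target_length
        and PAM_PATTERN.fullmatch(pam) is not None
    )


check_results = {name: is_valid_entry(t, p) for name, (t, p) in TABLE_3.items()}
```

Under these assumptions all five entries pass; a PAM such as CACGCT, with a pyrimidine at the fifth position, would be rejected.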

Second, obtaining rice positive T0 seedlings

Each of the SaKKHn-pBE-3, SaKKHn-pBE-4, SaKKHn-pBE-5, SaKKHn-pBE-6, SaKKHn-pBE-7, SaKKHn-pBE +3bp-3, SaKKHn-pBE +3bp-4, SaKKHn-pBE +3bp-5, SaKKHn-pBE +3bp-6 and SaKKHn-pBE +3bp-7 vectors constructed in step one is processed separately according to steps 1-9 of step two in Example 2 to obtain rice positive T0 seedlings.

Third, result analysis

1. For each vector, the genomic DNA of the rice positive T0 seedlings obtained in step two is used as a template. For the T5 target, PCR amplification is performed with primer pair T5 to obtain a PCR amplification product; for the T6 target, PCR amplification is performed with primer pair T6 to obtain a PCR amplification product; for the T7 target, PCR amplification is performed with primer pair T4 to obtain a PCR amplification product; for the T8 target, PCR amplification is performed with primer pair T8 to obtain a PCR amplification product; for the T9 target, PCR amplification is performed with primer pair T9 to obtain a PCR amplification product.

2. The PCR amplification products obtained in step 1 are subjected to Sanger sequencing and analysis. Only the target region is analyzed in each sequencing result. The number of rice positive T0 seedlings with C·T base substitution at T5, T6, T7, T8 and T9 is counted, and the C·T base substitution efficiency is calculated; the results are shown in Table 4.

The results show that, for all five targets, the SaKKHn & PmCDA1 & UGI base editing system using the +3bp sgRNA improves the C·T base substitution efficiency compared with the system using the Original sgRNA. Only at the T9 target did the SaKKHn & PmCDA1 & UGI base editing system using the Original sgRNA fail to achieve C·T base substitution, whereas the system using the +3bp sgRNA achieved it successfully.

TABLE 4

The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that the invention can be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The use of some of the essential features is possible within the scope of the claims attached below.

Sequence listing

<110> Beijing Academy of Agriculture and Forestry Sciences

<120> Efficient sgRNA and application thereof in gene editing

<160>10

<170>PatentIn version 3.5

<210>1

<211>17400

<212>DNA

<213>Artificial Sequence

<400>1

ggtggcagga tatattgtgg tgtaaacatg gcactagcct caccgtcttc gcagacgagg 60

ccgctaagtc gcagctacgc tctcaacggc actgactagg tagtttaaac gtgcacttaa 120

ttaaggtacc gaagcaactt aaagttatca ggcatgcatg gatcttggag gaatcagatg 180

tgcagtcagg gaccatagca caagacaggc gtcttctact ggtgctacca gcaaatgctg 240

gaagccggga acactgggta cgttggaaac cacgtgatgt gaagaagtaa gataaactgt 300

aggagaaaag catttcgtag tgggccatga agcctttcag gacatgtatt gcagtatggg 360

ccggcccatt acgcaattgg acgacaacaa agactagtat tagtaccacc tcggctatcc 420

acatagatca aagctgattt aaaagagttg tgcagatgat ccgtggcgga tccaacaaag 480

caccagtggt ctagtggtag aatagtaccc tgccacggta cagacccggg ttcgattccc 540

ggctggtgca catccaatgc gatgatcaag gttttagtac tctggaaaca gaatctacta 600

aaacaaggca aaatgccgtg tttatctcgt caacttgttg gcgagataac aaagcaccag 660

tggtctagtg gtagaatagt accctgccac ggtacagacc cgggttcgat tcccggctgg 720

tgcaaatcac cagtggaagc taaggtttta gtactctgga aacagaatct actaaaacaa 780

ggcaaaatgc cgtgtttatc tcgtcaactt gttggcgaga taacaaagca ccagtggtct 840

agtggtagaa tagtaccctg ccacggtaca gacccgggtt cgattcccgg ctggtgcaac 900

cggatttgaa cgatggacgt tttagtactc tggaaacaga atctactaaa acaaggcaaa 960

atgccgtgtt tatctcgtca acttgttggc gagatttttt tttttcgttt tgcattgagt 1020

tttctccgtc gcatgtttgc agttttattt tccgttttgc attgaaattt ctccgtctca 1080

tgtttgcagc gtgttcaaaa agtacgcagc tgtatttcac ttatttacgg cgccacattt 1140

tcatgccgtt tgtgccaact atcccgagct agtgaataca gcttggcttc acacaacact 1200

ggtgacccgc tgacctgctc gtacctcgta ccgtcgtacg gcacagcatt tggaattaaa 1260

gggtgtgatc gatactgctt gctgctaagc ttacaaattc gggtcaaggc ggaagccagc 1320

gcgccacccc acgtcagcaa atacggaggc gcggggttga cggcgtcacc cggtcctaac 1380

ggcgaccaac aaaccagcca gaagaaatta cagtaaaaaa aaagtaaatt gcactttgat 1440

ccacctttta ttacctaagt ctcaatttgg atcaccctta aacctatctt ttcaatttgg 1500

gccgggttgt ggtttggact accatgaaca acttttcgtc atgtctaact tccctttcag 1560

caaacatatg aaccatatat agaggagatc ggccgtatac tagagctgat gtgtttaagg 1620

tcgttgattg cacgagaaaa aaaaatccaa atcgcaacaa tagcaaattt atctggttca 1680

aagtgaaaag atatgtttaa aggtagtcca aagtaaaact tatagataat aaaatgtggt 1740

ccaaagcgta attcactcaa aaaaaatcaa cgagacgtgt accaaacgga gacaaacggc 1800

atcttctcga aatttcccaa ccgctcgctc gcccgcctcg tcttcccgga aaccgcggtg 1860

gtttcagcgt ggcggattct ccaagcagac ggagacgtca cggcacggga ctcctcccac 1920

cacccaaccg ccataaatac cagccccctc atctcctctc ctcgcatcag ctccaccccc 1980

gaaaaatttc tccccaatct cgcgaggctc tcgtcgtcga atcgaatcct ctcgcgtcct 2040

caaggtacgc tgcttctcct ctcctcgctt cgtttcgatt cgatttcgga cgggtgaggt 2100

tgttttgttg ctagatccga ttggtggtta gggttgtcga tgtgattatc gtgagatgtt 2160

taggggttgt agatctgatg gttgtgattt gggcacggtt ggttcgatag gtggaatcgt 2220

ggttaggttt tgggattgga tgttggttct gatgattggg gggaattttt acggttagat 2280

gaattgttgg atgattcgat tggggaaatc ggtgtagatc tgttggggaa ttgtggaact 2340

agtcatgcct gagtgattgg tgcgatttgt agcgtgttcc atcttgtagg ccttgttgcg 2400

agcatgttca gatctactgt tccgctcttg attgagttat tggtgccatg ggttggtgca 2460

aacacaggct ttaatatgtt atatctgttt tgtgtttgat gtagatctgt agggtagttc 2520

ttcttagaca tggttcaatt atgtagcttg tgcgtttcga tttgatttca tatgttcaca 2580

gattagataa tgatgaactc ttttaattaa ttgtcaatgg taaataggaa gtcttgtcgc 2640

tatatctgtc ataatgatct catgttacta tctgccagta atttatgcta agaactatat 2700

tagaatatca tgttacaatc tgtagtaata tcatgttaca atctgtagtt catctatata 2760

atctattgtg gtaatttctt tttactatct gtgtgaagat tattgccact agttcattct 2820

acttatttct gaagttcagg atacgtgtgc tgttactacc tatctgaata catgtgtgat 2880

gtgcctgtta ctatcttttt gaatacatgt atgttctgtt ggaatatgtt tgctgtttga 2940

tccgttgttg tgtccttaat cttgtgctag ttcttaccct atctgtttgg tgattatttc 3000

ttgcagtacg taatggctcc taagaagaag cggaaggttg gcatccacgg tgtcccggcg 3060

gcaaagagaa actacatcct gggtctggcc atcggtatta catcggtggg ctacggcatc 3120

atcgactacg agacaaggga tgtcatcgat gccggcgtcc ggctcttcaa ggaggccaac 3180

gtggagaata acgagggcag gcgctccaag cgcggcgcgc ggaggctgaa gcgcaggcgg 3240

aggcatcgca tccagcgggt gaagaagctc ctcttcgact acaatctgct cacggatcat 3300

tccgagctgt ctggcatcaa cccatacgag gcgcgggtga agggcctgtc ccagaagctc 3360

tcggaggagg agttctcggc ggccctgctg catctcgcga agaggcgcgg cgtgcataat 3420

gtcaatgagg tggaggagga taccggcaat gagctgtcaa ccaaggagca gatcagcagg 3480

aactccaagg cgctggagga gaagtatgtg gcggagctcc agctcgagag gctgaagaag 3540

gatggcgagg tccggggctc catcaatagg ttcaagacat cggactacgt gaaggaggcc 3600

aagcagctcc tgaaggtgca gaaggcgtac caccagctgg accagagctt catcgacacc 3660

tacatcgatc tgctcgagac acgccggacg tactacgagg gcccgggcga gggctcaccg 3720

ttcggctgga aggacatcaa ggagtggtac gagatgctga tgggccactg cacctacttc 3780

cctgaggagc tgaggagcgt gaagtacgcg tacaatgcgg acctctacaa cgccctgaac 3840

gacctcaata acctcgtgat cacgcgcgac gagaatgaga agctcgagta ctacgagaag 3900

ttccagatca tcgagaacgt gttcaagcag aagaagaagc cgaccctcaa gcagatcgcc 3960

aaggagatcc tcgtcaatga ggaggacatc aagggctaca gggtgacctc gaccggcaag 4020

ccagagttca ccaacctgaa ggtctaccac gacatcaagg atatcaccgc ccgcaaggag 4080

atcatcgaga atgcggagct cctggatcag atcgcgaaga tcctcaccat ctaccagtcc 4140

agcgaggaca tccaggagga gctcacgaac ctgaatagcg agctgaccca ggaggagatc 4200

gagcagatct ccaacctcaa gggctacacc ggcacgcaca atctgagcct caaggcgatc 4260

aatctcatcc tcgatgagct ctggcataca aatgataacc agatcgccat cttcaatcgc 4320

ctcaagctgg tcccaaagaa ggtcgatctg tcgcagcaga aggagatccc aacgacactg 4380

gtcgatgact tcatcctctc acctgtcgtg aagaggtcgt tcatccagtc gatcaaggtc 4440

atcaatgcga tcatcaagaa gtacggcctc cctaatgata tcatcatcga gctggcccgc 4500

gagaagaatt caaaggacgc gcagaagatg atcaacgaga tgcagaagag gaatcggcag 4560

acaaacgagc gcatcgagga gatcatccgc acaaccggca aggagaatgc caagtacctg 4620

atcgagaaga tcaagctgca tgacatgcag gagggcaagt gcctctactc actggaggcc 4680

atcccactcg aggacctgct gaataaccca ttcaattacg aggtcgacca tatcatcccg 4740

cgctccgtgt cgttcgacaa ttccttcaat aacaaggtcc tcgtcaagca ggaggagaac 4800

tccaagaagg gcaatcgcac cccgttccag tacctgtcct cttcggacag caagatctct 4860

tacgagacat tcaagaagca catcctcaac ctggccaagg gcaagggccg gatctccaag 4920

accaagaagg agtacctcct ggaggagagg gatatcaacc ggttcagcgt gcagaaggac 4980

ttcatcaatc gcaacctggt cgatacccgg tacgccacca ggggcctcat gaacctgctc 5040

cggtcctact tccgggtgaa caatctcgac gtgaaggtca agagcatcaa cggcggcttc 5100

acctcgttcc tcaggcggaa gtggaagttc aagaaggagc ggaacaaggg ctacaagcac 5160

catgccgagg acgccctcat catcgcgaac gcggacttca tcttcaagga gtggaagaag 5220

ctcgataagg cgaagaaggt catggagaac cagatgttcg aggagaagca ggccgagtcg 5280

atgccagaga tcgagacaga gcaggagtac aaggagatct tcatcacccc gcaccagatc 5340

aagcacatca aggacttcaa ggactacaag tactcccatc gggtcgataa gaagccaaat 5400

cggaagctca tcaatgatac cctctactcg acacgcaagg atgacaaggg caacaccctg 5460

atcgtcaata acctcaatgg cctctacgac aaggataacg acaagctgaa gaagctcatc 5520

aacaagagcc cagagaagct cctcatgtac caccacgatc cgcagacata ccagaagctc 5580

aagctgatca tggagcagta cggcgacgag aagaacccac tctacaagta ctacgaggag 5640

acaggcaact acctgaccaa gtactccaag aaggacaatg gcccagtgat caagaagatc 5700

aagtactacg gcaataagct gaacgcccac ctcgatatca cggacgatta ccctaacagc 5760

cggaataagg tggtcaagct gtccctcaag ccgtaccgct tcgacgtcta cctggataac 5820

ggcgtctaca agttcgtgac agtcaagaat ctcgacgtca tcaagaagga gaactactac 5880

gaggtcaatt ctaagtgcta cgaggaggcc aagaagctca agaagatcag caaccaggcc 5940

gagttcatcg ccagcttcta caagaacgat ctgatcaaga tcaacggcga gctctacagg 6000

gtcatcggcg tgaacaatga cctgctcaat aggatcgagg tgaacatgat cgacatcacc 6060

taccgcgagt acctcgagaa catgaacgat aagcggcctc cacacatcat caagacaatc 6120

gcctctaaga cccagtccat caagaagtac tccacggata tcctcggcaa cctctacgag 6180

gtgaagtcaa agaagcaccc gcagatcatc aagaagggct cggctggagg aggaggcacg 6240

ggaggaggag gctccgccga gtatgtgcgc gcgctcttcg acttcaacgg caatgacgag 6300

gaggatctcc ctttcaagaa gggcgacatc ctccgcatcc gcgataagcc ggaggagcag 6360

tggtggaacg cagaggactc cgagggcaag cggggcatga tcctggtgcc atacgtcgag 6420

aagtacagcg gcgattacaa ggaccacgat ggcgactaca aggatcatga catcgattac 6480

aaggacgatg acgataagtc cggcgtcgac atgacggacg cggagtatgt gcgcatccac 6540

gagaagctcg atatctacac cttcaagaag cagttcttca acaataagaa gtcggtgtcc 6600

catcggtgct acgtcctctt cgagctgaag cgcaggggag agcgccgcgc ctgcttctgg 6660

ggctacgcgg tgaataagcc gcagtcaggc acagagcgcg gcatccacgc cgagatcttc 6720

tcgatccgga aggtcgagga gtacctccgc gacaacccag gccagttcac gatcaattgg 6780

tactccagct ggtccccttg cgcagattgc gcagagaaga tcctcgagtg gtacaaccag 6840

gagctgaggg gcaatggcca taccctcaag atctgggcct gcaagctgta ctacgagaag 6900

aacgcgagga atcagatcgg cctctggaac ctgcgggata atggcgtggg cctcaacgtg 6960

atggtgtccg agcactacca gtgctgccgc aagatcttca tccagtcctc ccacaatcag 7020

ctgaacgaga ataggtggct cgaaaagacc ctgaagcgcg ccgagaagtg gaggagcgag 7080

ctgtctatca tgatccaggt caagatcctg cacaccacaa agtcaccggc ggtgggcggc 7140

ggcggcagcg aattctccgg cggcagcacg aacctcagcg acatcatcga gaaggagaca 7200

ggcaagcagc tcgtgatcca ggagtctatc ctcatgctgc ctgaggaggt ggaggaggtc 7260

atcggcaaca agccggagtc cgatatcctc gtgcacaccg cctacgacga gtcgacagat 7320

gagaatgtca tgctcctgac ctccgacgca ccagagtaca agccatgggc gctcgtgatc 7380

caggattcca acggcgagaa taagatcaag atgctgtctg gcggctcccc gaagaagaag 7440

cgcaaggtct agactagtct gaaatcacca gtctctctct acaaatctat ctctctctat 7500

aataatgtgt gagtagttcc cagataaggg aattagggtt cttatagggt ttcgctcatg 7560

tgttgagcat ataagaaacc cttagtatgt atttgtattt gtaaaatact tctatcaata 7620

aaatttctaa ttcctaaaac caaaatccag tggggcgccc gacctgtact cgcgaaggtt 7680

aacttacaga gagtgtccgg gcgcgcctgg tggatcgtcc gcctaggctg cagtgcagcg 7740

tgacccggtc gtgcccctct ctagagataa tgagcattgc atgtctaagt tataaaaaat 7800

taccacatat tttttttgtc acacttgttt gaagtgcagt ttatctatct ttatacatat 7860

atttaaactt tactctacga ataatataat ctatagtact acaataatat cagtgtttta 7920

gagaatcata taaatgaaca gttagacatg gtctaaagga caattgagta ttttgacaac 7980

aggactctac agttttatct ttttagtgtg catgtgttct cctttttttt tgcaaatagc 8040

ttcacctata taatacttca tccattttat tagtacatcc atttagggtt tagggttaat 8100

ggtttttata gactaatttt tttagtacat ctattttatt ctattttagc ctctaaatta 8160

agaaaactaa aactctattt tagttttttt atttaataat ttagatataa aatagaataa 8220

aataaagtga ctaaaaatta aacaaatacc ctttaagaaa ttaaaaaaac taaggaaaca 8280

tttttcttgt ttcgagtaga taatgccagc ctgttaaacg ccgtcgacga gtctaacgga 8340

caccaaccag cgaaccagca gcgtcgcgtc gggccaagcg aagcagacgg cacggcatct 8400

ctgtcgctgc ctctggaccc ctctcgagag ttccgctcca ccgttggact tgctccgctg 8460

tcggcatcca gaaattgcgt ggcggagcgg cagacgtgag ccggcacggc aggcggcctc 8520

ctcctcctct cacggcaccg gcagctacgg gggattcctt tcccaccgct ccttcgcttt 8580

cccttcctcg cccgccgtaa taaatagaca ccccctccac accctctttc cccaacctcg 8640

tgttgttcgg agcgcacaca cacacaacca gatctccccc aaatccaccc gtcggcacct 8700

ccgcttcaag gtacgccgct cgtcctcccc ccccccccct ctctaccttc tctagatcgg 8760

cgttccggtc catggttagg gcccggtagt tctacttctg ttcatgtttg tgttagatcc 8820

gtgtttgtgt tagatccgtg ctgctagcgt tcgtacacgg atgcgacctg tacgtcagac 8880

acgttctgat tgctaacttg ccagtgtttc tctttgggga atcctgggat ggctctagcc 8940

gttccgcaga cgggatcgat ttcatgattt tttttgtttc gttgcatagg gtttggtttg 9000

cccttttcct ttatttcaat atatgccgtg cacttgtttg tcgggtcatc ttttcatgct 9060

tttttttgtc ttggttgtga tgatgtggtc tggttgggcg gtcgttctag atcggagtag 9120

aattctgttt caaactacct ggtggattta ttaattttgg atctgtatgt gtgtgccata 9180

catattcata gttacgaatt gaagatgatg gatggaaata tcgatctagg ataggtatac 9240

atgttgatgc gggttttact gatgcatata cagagatgct ttttgttcgc ttggttgtga 9300

tgatgtggtg tggttgggcg gtcgttcatt cgttctagat cggagtagaa tactgtttca 9360

aactacctgg tgtatttatt aattttggaa ctgtatgtgt gtgtcataca tcttcatagt 9420

tacgagttta agatggatgg aaatatcgat ctaggatagg tatacatgtt gatgtgggtt 9480

ttactgatgc atatacatga tggcatatgc agcatctatt catatgctct aaccttgagt 9540

acctatctat tataataaac aagtatgttt tataattatt ttgatcttga tatacttgga 9600

tgatggcata tgcagcagct atatgtggat ttttttagcc ctgccttcat acgctattta 9660

tttgcttggt actgtttctt ttgtcgatgc tcaccctgtt gtttggtgtt acttctgcag 9720

gagctcatga aaaagcctga actcaccgcg acgtctgtcg agaagtttct gatcgaaaag 9780

ttcgacagcg tctccgacct gatgcagctc tcggagggcg aagaatctcg tgctttcagc 9840

ttcgatgtag gagggcgtgg atatgtcctg cgggtaaata gctgcgccga tggtttctac 9900

aaagatcgtt atgtttatcg gcactttgca tcggccgcgc tcccgattcc ggaagtgctt 9960

gacattgggg agtttagcga gagcctgacc tattgcatct cccgccgttc acagggtgtc 10020

acgttgcaag acctgcctga aaccgaactg cccgctgttc tacaaccggt cgcggaggct 10080

atggatgcga tcgctgcggc cgatcttagc cagacgagcg ggttcggccc attcggaccg 10140

caaggaatcg gtcaatacac tacatggcgt gatttcatat gcgcgattgc tgatccccat 10200

gtgtatcact ggcaaactgt gatggacgac accgtcagtg cgtccgtcgc gcaggctctc 10260

gatgagctga tgctttgggc cgaggactgc cccgaagtcc ggcacctcgt gcacgcggat 10320

ttcggctcca acaatgtcct gacggacaat ggccgcataa cagcggtcat tgactggagc 10380

gaggcgatgt tcggggattc ccaatacgag gtcgccaaca tcttcttctg gaggccgtgg 10440

ttggcttgta tggagcagca gacgcgctac ttcgagcgga ggcatccgga gcttgcagga 10500

tcgccacgac tccgggcgta tatgctccgc attggtcttg accaactcta tcagagcttg 10560

gttgacggca atttcgatga tgcagcttgg gcgcagggtc gatgcgacgc aatcgtccga 10620

tccggagccg ggactgtcgg gcgtacacaa atcgcccgca gaagcgcggc cgtctggacc 10680

gatggctgtg tagaagtact cgccgatagt ggaaaccgac gccccagcac tcgtccgagg 10740

gcaaagaaat agagtagatg ccgaccggga tctgtcgatc gacaagctcg agtttctcca 10800

taataatgtg tgagtagttc ccagataagg gaattagggt tcctataggg tttcgctcat 10860

gtgttgagca tataagaaac ccttagtatg tatttgtatt tgtaaaatac ttctatcaat 10920

aaaatttcta attcctaaaa ccaaaatcca gtactaaaat ccagatcccc cgaattaatt 10980

cggcgttaat tcagcctgca ggacgcgttt aattaagtgc acgcggccgc ctacttagtc 11040

aagagcctcg cacgcgactg tcacgcggcc aggatcgcct cgtgagcctc gcaatctgta 11100

cctagtgttt aaactatcag tgtttgacag gatatattgg cgggtaaacc taagagaaaa 11160

gagcgtttat tagaataacg gatatttaaa agggcgtgaa aaggtttatc cgttcgtcca 11220

tttgtatgtg catgccaacc acagggttcc cctcgggatc aaagtacttt gatccaaccc 11280

ctccgctgct atagtgcagt cggcttctga cgttcagtgc agccgtcttc tgaaaacgac 11340

atgtcgcaca agtcctaagt tacgcgacag gctgccgccc tgcccttttc ctggcgtttt 11400

cttgtcgcgt gttttagtcg cataaagtag aatacttgcg actagaaccg gagacattac 11460

gccatgaaca agagcgccgc cgctggcctg ctgggctatg cccgcgtcag caccgacgac 11520

caggacttga ccaaccaacg ggccgaactg cacgcggccg gctgcaccaa gctgttttcc 11580

gagaagatca ccggcaccag gcgcgaccgc ccggagctgg ccaggatgct tgaccaccta 11640

cgccctggcg acgttgtgac agtgaccagg ctagaccgcc tggcccgcag cacccgcgac 11700

ctactggaca ttgccgagcg catccaggag gccggcgcgg gcctgcgtag cctggcagag 11760

ccgtgggccg acaccaccac gccggccggc cgcatggtgt tgaccgtgtt cgccggcatt 11820

gccgagttcg agcgttccct aatcatcgac cgcacccgga gcgggcgcga ggccgccaag 11880

gcccgaggcg tgaagtttgg cccccgccct accctcaccc cggcacagat cgcgcacgcc 11940

cgcgagctga tcgaccagga aggccgcacc gtgaaagagg cggctgcact gcttggcgtg 12000

catcgctcga ccctgtaccg cgcacttgag cgcagcgagg aagtgacgcc caccgaggcc 12060

aggcggcgcg gtgccttccg tgaggacgca ttgaccgagg ccgacgccct ggcggccgcc 12120

gagaatgaac gccaagagga acaagcatga aaccgcacca ggacggccag gacgaaccgt 12180

ttttcattac cgaagagatc gaggcggaga tgatcgcggc cgggtacgtg ttcgagccgc 12240

ccgcgcacgt ctcaaccgtg cggctgcatg aaatcctggc cggtttgtct gatgccaagc 12300

tggcggcctg gccggccagc ttggccgctg aagaaaccga gcgccgccgt ctaaaaaggt 12360

gatgtgtatt tgagtaaaac agcttgcgtc atgcggtcgc tgcgtatatg atgcgatgag 12420

taaataaaca aatacgcaag gggaacgcat gaaggttatc gctgtactta accagaaagg 12480

cgggtcaggc aagacgacca tcgcaaccca tctagcccgc gccctgcaac tcgccggggc 12540

cgatgttctg ttagtcgatt ccgatcccca gggcagtgcc cgcgattggg cggccgtgcg 12600

ggaagatcaa ccgctaaccg ttgtcggcat cgaccgcccg acgattgacc gcgacgtgaa 12660

ggccatcggc cggcgcgact tcgtagtgat cgacggagcg ccccaggcgg cggacttggc 12720

tgtgtccgcg atcaaggcag ccgacttcgt gctgattccg gtgcagccaa gcccttacga 12780

catatgggcc accgccgacc tggtggagct ggttaagcag cgcattgagg tcacggatgg 12840

aaggctacaa gcggcctttg tcgtgtcgcg ggcgatcaaa ggcacgcgca tcggcggtga 12900

ggttgccgag gcgctggccg ggtacgagct gcccattctt gagtcccgta tcacgcagcg 12960

cgtgagctac ccaggcactg ccgccgccgg cacaaccgtt cttgaatcag aacccgaggg 13020

cgacgctgcc cgcgaggtcc aggcgctggc cgctgaaatt aaatcaaaac tcatttgagt 13080

taatgaggta aagagaaaat gagcaaaagc acaaacacgc taagtgccgg ccgtccgagc 13140

gcacgcagca gcaaggctgc aacgttggcc agcctggcag acacgccagc catgaagcgg 13200

gtcaactttc agttgccggc ggaggatcac accaagctga agatgtacgc ggtacgccaa 13260

ggcaagacca ttaccgagct gctatctgaa tacatcgcgc agctaccaga gtaaatgagc 13320

aaatgaataa atgagtagat gaattttagc ggctaaagga ggcggcatgg aaaatcaaga 13380

acaaccaggc accgacgccg tggaatgccc catgtgtgga ggaacgggcg gttggccagg 13440

cgtaagcggc tgggttgtct gccggccctg caatggcact ggaaccccca agcccgagga 13500

atcggcgtga cggtcgcaaa ccatccggcc cggtacaaat cggcgcggcg ctgggtgatg 13560

acctggtgga gaagttgaag gccgcgcagg ccgcccagcg gcaacgcatc gaggcagaag 13620

cacgccccgg tgaatcgtgg caagcggccg ctgatcgaat ccgcaaagaa tcccggcaac 13680

cgccggcagc cggtgcgccg tcgattagga agccgcccaa gggcgacgag caaccagatt 13740

ttttcgttcc gatgctctat gacgtgggca cccgcgatag tcgcagcatc atggacgtgg 13800

ccgttttccg tctgtcgaag cgtgaccgac gagctggcga ggtgatccgc tacgagcttc 13860

cagacgggca cgtagaggtt tccgcagggc cggccggcat ggccagtgtg tgggattacg 13920

acctggtact gatggcggtt tcccatctaa ccgaatccat gaaccgatac cgggaaggga 13980

agggagacaa gcccggccgc gtgttccgtc cacacgttgc ggacgtactc aagttctgcc 14040

ggcgagccga tggcggaaag cagaaagacg acctggtaga aacctgcatt cggttaaaca 14100

ccacgcacgt tgccatgcag cgtacgaaga aggccaagaa cggccgcctg gtgacggtat 14160

ccgagggtga agccttgatt agccgctaca agatcgtaaa gagcgaaacc gggcggccgg 14220

agtacatcga gatcgagcta gctgattgga tgtaccgcga gatcacagaa ggcaagaacc 14280

cggacgtgct gacggttcac cccgattact ttttgatcga tcccggcatc ggccgttttc 14340

tctaccgcct ggcacgccgc gccgcaggca aggcagaagc cagatggttg ttcaagacga 14400

tctacgaacg cagtggcagc gccggagagt tcaagaagtt ctgtttcacc gtgcgcaagc 14460

tgatcgggtc aaatgacctg ccggagtacg atttgaagga ggaggcgggg caggctggcc 14520

cgatcctagt catgcgctac cgcaacctga tcgagggcga agcatccgcc ggttcctaat 14580

gtacggagca gatgctaggg caaattgccc tagcagggga aaaaggtcga aaaggtctct 14640

ttcctgtgga tagcacgtac attgggaacc caaagccgta cattgggaac cggaacccgt 14700

acattgggaa cccaaagccg tacattggga accggtcaca catgtaagtg actgatataa 14760

aagagaaaaa aggcgatttt tccgcctaaa actctttaaa acttattaaa actcttaaaa 14820

cccgcctggc ctgtgcataa ctgtctggcc agcgcacagc cgaagagctg caaaaagcgc 14880

ctacccttcg gtcgctgcgc tccctacgcc ccgccgcttc gcgtcggcct atcgcggccg 14940

ctggccgctc aaaaatggct ggcctacggc caggcaatct accagggcgc ggacaagccg 15000

cgccgtcgcc actcgaccgc cggcgcccac atcaaggcac cctgcctcgc gcgtttcggt 15060

gatgacggtg aaaacctctg acacatgcag ctcccggaga cggtcacagc ttgtctgtaa 15120

gcggatgccg ggagcagaca agcccgtcag ggcgcgtcag cgggtgttgg cgggtgtcgg 15180

ggcgcagcca tgacccagtc acgtagcgat agcggagtgt atactggctt aactatgcgg 15240

catcagagca gattgtactg agagtgcacc atatgcggtg tgaaataccg cacagatgcg 15300

taaggagaaa ataccgcatc aggcgctctt ccgcttcctc gctcactgac tcgctgcgct 15360

cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 15420

cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 15480

accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 15540

acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 15600

cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 15660

acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 15720

atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 15780

agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 15840

acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 15900

gtgctacaga gttcttgaag tggtggccta actacggcta cactagaagg acagtatttg 15960

gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 16020

gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 16080

gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac gctcagtgga 16140

acgaaaactc acgttaaggg attttggtca tgcattctag gtactaaaac aattcatcca 16200

gtaaaatata atattttatt ttctcccaat caggcttgat ccccagtaag tcaaaaaata 16260

gctcgacata ctgttcttcc ccgatatcct ccctgatcga ccggacgcag aaggcaatgt 16320

cataccactt gtccgccctg ccgcttctcc caagatcaat aaagccactt actttgccat 16380

ctttcacaaa gatgttgctg tctcccaggt cgccgtggga aaagacaagt tcctcttcgg 16440

gcttttccgt ctttaaaaaa tcatacagct cgcgcggatc tttaaatgga gtgtcttctt 16500

cccagttttc gcaatccaca tcggccagat cgttattcag taagtaatcc aattcggcta 16560

agcggctgtc taagctattc gtatagggac aatccgatat gtcgatggag tgaaagagcc 16620

tgatgcactc cgcatacagc tcgataatct tttcagggct ttgttcatct tcatactctt 16680

ccgagcaaag gacgccatcg gcctcactca tgagcagatt gctccagcca tcatgccgtt 16740

caaagtgcag gacctttgga acaggcagct ttccttccag ccatagcatc atgtcctttt 16800

cccgttccac atcataggtg gtccctttat accggctgtc cgtcattttt aaatataggt 16860

tttcattttc tcccaccagc ttatatacct tagcaggaga cattccttcc gtatctttta 16920

cgcagcggta tttttcgatc agttttttca attccggtga tattctcatt ttagccattt 16980

attatttcct tcctcttttc tacagtattt aaagataccc caagaagcta attataacaa 17040

gacgaactcc aattcactgt tccttgcatt ctaaaacctt aaataccaga aaacagcttt 17100

ttcaaagttg ttttcaaagt tggcgtataa catagtatcg acggagccga ttttgaaacc 17160

gcggtgatca caggcagcaa cgctctgtca tcgttacaat caacatgcta ccctccgcga 17220

gatcatccgt gtttcaaacc cggcagctta gttgccgttc ttccgaatag catcggtaac 17280

atgagcaaag tctgccgcct tacaacggct ctcccgctga cgccgtcccg gactgatggg 17340

ctgcctgtat cgagtggtga ttttgtgccg agctgccggt cggggagctg ttggctggct 17400

<210>2

<211>1071

<212>PRT

<213>Artificial Sequence

<400>2

Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala

1 5 10 15

Ala Lys Arg Asn Tyr Ile Leu Gly Leu Ala Ile Gly Ile Thr Ser Val

20 25 30

Gly Tyr Gly Ile Ile Asp Tyr Glu Thr Arg Asp Val Ile Asp Ala Gly

35 40 45

Val Arg Leu Phe Lys Glu Ala Asn Val Glu Asn Asn Glu Gly Arg Arg

50 55 60

Ser Lys Arg Gly Ala Arg Arg Leu Lys Arg Arg Arg Arg His Arg Ile

65 70 75 80

Gln Arg Val Lys Lys Leu Leu Phe Asp Tyr Asn Leu Leu Thr Asp His

85 90 95

Ser Glu Leu Ser Gly Ile Asn Pro Tyr Glu Ala Arg Val Lys Gly Leu

100 105 110

Ser Gln Lys Leu Ser Glu Glu Glu Phe Ser Ala Ala Leu Leu His Leu

115 120 125

Ala Lys Arg Arg Gly Val His Asn Val Asn Glu Val Glu Glu Asp Thr

130 135 140

Gly Asn Glu Leu Ser Thr Lys Glu Gln Ile Ser Arg Asn Ser Lys Ala

145 150 155 160

Leu Glu Glu Lys Tyr Val Ala Glu Leu Gln Leu Glu Arg Leu Lys Lys

165 170 175

Asp Gly Glu Val Arg Gly Ser Ile Asn Arg Phe Lys Thr Ser Asp Tyr

180 185 190

Val Lys Glu Ala Lys Gln Leu Leu Lys Val Gln Lys Ala Tyr His Gln

195 200 205

Leu Asp Gln Ser Phe Ile Asp Thr Tyr Ile Asp Leu Leu Glu Thr Arg

210 215 220

Arg Thr Tyr Tyr Glu Gly Pro Gly Glu Gly Ser Pro Phe Gly Trp Lys

225 230 235 240

Asp Ile Lys Glu Trp Tyr Glu Met Leu Met Gly His Cys Thr Tyr Phe

245 250 255

Pro Glu Glu Leu Arg Ser Val Lys Tyr Ala Tyr Asn Ala Asp Leu Tyr

260 265 270

Asn Ala Leu Asn Asp Leu Asn Asn Leu Val Ile Thr Arg Asp Glu Asn

275 280 285

Glu Lys Leu Glu Tyr Tyr Glu Lys Phe Gln Ile Ile Glu Asn Val Phe

290 295 300

Lys Gln Lys Lys Lys Pro Thr Leu Lys Gln Ile Ala Lys Glu Ile Leu

305 310 315 320

Val Asn Glu Glu Asp Ile Lys Gly Tyr Arg Val Thr Ser Thr Gly Lys

325 330 335

Pro Glu Phe Thr Asn Leu Lys Val Tyr His Asp Ile Lys Asp Ile Thr

340 345 350

Ala Arg Lys Glu Ile Ile Glu Asn Ala Glu Leu Leu Asp Gln Ile Ala

355 360 365

Lys Ile Leu Thr Ile Tyr Gln Ser Ser Glu Asp Ile Gln Glu Glu Leu

370 375 380

Thr Asn Leu Asn Ser Glu Leu Thr Gln Glu Glu Ile Glu Gln Ile Ser

385 390 395 400

Asn Leu Lys Gly Tyr Thr Gly Thr His Asn Leu Ser Leu Lys Ala Ile

405 410 415

Asn Leu Ile Leu Asp Glu Leu Trp His Thr Asn Asp Asn Gln Ile Ala

420 425 430

Ile Phe Asn Arg Leu Lys Leu Val Pro Lys Lys Val Asp Leu Ser Gln

435 440 445

Gln Lys Glu Ile Pro Thr Thr Leu Val Asp Asp Phe Ile Leu Ser Pro

450 455 460

Val Val Lys Arg Ser Phe Ile Gln Ser Ile Lys Val Ile Asn Ala Ile

465 470 475 480

Ile Lys Lys Tyr Gly Leu Pro Asn Asp Ile Ile Ile Glu Leu Ala Arg

485 490 495

Glu Lys Asn Ser Lys Asp Ala Gln Lys Met Ile Asn Glu Met Gln Lys

500 505 510

Arg Asn Arg Gln Thr Asn Glu Arg Ile Glu Glu Ile Ile Arg Thr Thr

515 520 525

Gly Lys Glu Asn Ala Lys Tyr Leu Ile Glu Lys Ile Lys Leu His Asp

530 535 540

Met Gln Glu Gly Lys Cys Leu Tyr Ser Leu Glu Ala Ile Pro Leu Glu

545 550 555 560

Asp Leu Leu Asn Asn Pro Phe Asn Tyr Glu Val Asp His Ile Ile Pro

565 570 575

Arg Ser Val Ser Phe Asp Asn Ser Phe Asn Asn Lys Val Leu Val Lys

580 585 590

Gln Glu Glu Asn Ser Lys Lys Gly Asn Arg Thr Pro Phe Gln Tyr Leu

595 600 605

Ser Ser Ser Asp Ser Lys Ile Ser Tyr Glu Thr Phe Lys Lys His Ile

610 615 620

Leu Asn Leu Ala Lys Gly Lys Gly Arg Ile Ser Lys Thr Lys Lys Glu

625 630 635 640

Tyr Leu Leu Glu Glu Arg Asp Ile Asn Arg Phe Ser Val Gln Lys Asp

645 650 655

Phe Ile Asn Arg Asn Leu Val Asp Thr Arg Tyr Ala Thr Arg Gly Leu

660 665 670

Met Asn Leu Leu Arg Ser Tyr Phe Arg Val Asn Asn Leu Asp Val Lys

675 680 685

Val Lys Ser Ile Asn Gly Gly Phe Thr Ser Phe Leu Arg Arg Lys Trp

690 695 700

Lys Phe Lys Lys Glu Arg Asn Lys Gly Tyr Lys His His Ala Glu Asp

705 710 715 720

Ala Leu Ile Ile Ala Asn Ala Asp Phe Ile Phe Lys Glu Trp Lys Lys

725 730 735

Leu Asp Lys Ala Lys Lys Val Met Glu Asn Gln Met Phe Glu Glu Lys

740 745 750

Gln Ala Glu Ser Met Pro Glu Ile Glu Thr Glu Gln Glu Tyr Lys Glu

755 760 765

Ile Phe Ile Thr Pro His Gln Ile Lys His Ile Lys Asp Phe Lys Asp

770 775 780

Tyr Lys Tyr Ser His Arg Val Asp Lys Lys Pro Asn Arg Lys Leu Ile

785 790 795 800

Asn Asp Thr Leu Tyr Ser Thr Arg Lys Asp Asp Lys Gly Asn Thr Leu

805 810 815

Ile Val Asn Asn Leu Asn Gly Leu Tyr Asp Lys Asp Asn Asp Lys Leu

820 825 830

Lys Lys Leu Ile Asn Lys Ser Pro Glu Lys Leu Leu Met Tyr His His

835 840 845

Asp Pro Gln Thr Tyr Gln Lys Leu Lys Leu Ile Met Glu Gln Tyr Gly

850 855 860

Asp Glu Lys Asn Pro Leu Tyr Lys Tyr Tyr Glu Glu Thr Gly Asn Tyr

865 870 875 880

Leu Thr Lys Tyr Ser Lys Lys Asp Asn Gly Pro Val Ile Lys Lys Ile

885 890 895

Lys Tyr Tyr Gly Asn Lys Leu Asn Ala His Leu Asp Ile Thr Asp Asp

900 905 910

Tyr Pro Asn Ser Arg Asn Lys Val Val Lys Leu Ser Leu Lys Pro Tyr

915 920 925

Arg Phe Asp Val Tyr Leu Asp Asn Gly Val Tyr Lys Phe Val Thr Val

930 935 940

Lys Asn Leu Asp Val Ile Lys Lys Glu Asn Tyr Tyr Glu Val Asn Ser

945 950 955 960

Lys Cys Tyr Glu Glu Ala Lys Lys Leu Lys Lys Ile Ser Asn Gln Ala

965 970 975

Glu Phe Ile Ala Ser Phe Tyr Lys Asn Asp Leu Ile Lys Ile Asn Gly

980 985 990

Glu Leu Tyr Arg Val Ile Gly Val Asn Asn Asp Leu Leu Asn Arg Ile

995 1000 1005

Glu Val Asn Met Ile Asp Ile Thr Tyr Arg Glu Tyr Leu Glu Asn

1010 1015 1020

Met Asn Asp Lys Arg Pro Pro His Ile Ile Lys Thr Ile Ala Ser

1025 1030 1035

Lys Thr Gln Ser Ile Lys Lys Tyr Ser Thr Asp Ile Leu Gly Asn

1040 1045 1050

Leu Tyr Glu Val Lys Ser Lys Lys His Pro Gln Ile Ile Lys Lys

1055 1060 1065

Gly Ser Ala

1070

<210>3

<211>208

<212>PRT

<213>Artificial Sequence

<400>3

Met Thr Asp Ala Glu Tyr Val Arg Ile His Glu Lys Leu Asp Ile Tyr

1 5 10 15

Thr Phe Lys Lys Gln Phe Phe Asn Asn Lys Lys Ser Val Ser His Arg

20 25 30

Cys Tyr Val Leu Phe Glu Leu Lys Arg Arg Gly Glu Arg Arg Ala Cys

35 40 45

Phe Trp Gly Tyr Ala Val Asn Lys Pro Gln Ser Gly Thr Glu Arg Gly

50 55 60

Ile His Ala Glu Ile Phe Ser Ile Arg Lys Val Glu Glu Tyr Leu Arg

65 70 75 80

Asp Asn Pro Gly Gln Phe Thr Ile Asn Trp Tyr Ser Ser Trp Ser Pro

85 90 95

Cys Ala Asp Cys Ala Glu Lys Ile Leu Glu Trp Tyr Asn Gln Glu Leu

100 105 110

Arg Gly Asn Gly His Thr Leu Lys Ile Trp Ala Cys Lys Leu Tyr Tyr

115 120 125

Glu Lys Asn Ala Arg Asn Gln Ile Gly Leu Trp Asn Leu Arg Asp Asn

130 135 140

Gly Val Gly Leu Asn Val Met Val Ser Glu His Tyr Gln Cys Cys Arg

145 150 155 160

Lys Ile Phe Ile Gln Ser Ser His Asn Gln Leu Asn Glu Asn Arg Trp

165 170 175

Leu Glu Lys Thr Leu Lys Arg Ala Glu Lys Trp Arg Ser Glu Leu Ser

180 185 190

Ile Met Ile Gln Val Lys Ile Leu His Thr Thr Lys Ser Pro Ala Val

195 200 205

<210>4

<211>98

<212>PRT

<213>Artificial Sequence

<400>4

Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly

1 5 10 15

Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val

20 25 30

Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr

35 40 45

Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp

50 55 60

Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly

65 70 75 80

Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg

85 90 95

Lys Val

<210>5

<211>522

<212>DNA

<213>Artificial Sequence

<400>5

aacaaagcac cagtggtcta gtggtagaat agtaccctgc cacggtacag acccgggttc 60

gattcccggc tggtgcagat gccacacagc aaggagtgtt ttagtactct ggaaacagaa 120

tctactaaaa caaggcaaaa tgccgtgttt atctcgtcaa cttgttggcg agataacaaa 180

gcaccagtgg tctagtggta gaatagtacc ctgccacggt acagacccgg gttcgattcc 240

cggctggtgc acagaaccga caacagatga ggttttagta ctctggaaac agaatctact 300

aaaacaaggc aaaatgccgt gtttatctcg tcaacttgtt ggcgagataa caaagcacca 360

gtggtctagt ggtagaatag taccctgcca cggtacagac ccgggttcga ttcccggctg 420

gtgcaccagc tcatttggct cggcggtttt agtactctgg aaacagaatc tactaaaaca 480

aggcaaaatg ccgtgtttat ctcgtcaact tgttggcgag at 522

<210>6

<211>83

<212>DNA

<213>Artificial Sequence

<400>6

gttttagtac tctgctggaa acagcagaat ctactaaaac aaggcaaaat gccgtgttta 60

tctcgtcaac ttgttggcga gat 83

<210>7

<211>93

<212>DNA

<213>Artificial Sequence

<400>7

gttttagtac tctgtaattt tagaaataaa attacagaat ctactaaaac aaggcaaaat 60

gccgtgttta tctcgtcaac ttgttggcga gat 93
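Sequences 6 and 7 can be read as DNA forms of the sequence 9 backbone carrying a pair of inserts. The following Python sketch is illustrative only and is not part of the filed listing. It assumes the insertion points given in the abstract (between positions 14 and 15, and between positions 18 and 19 of the backbone) also apply to sequence 7, and it infers the two inserted fragments by comparison. The names `infer_inserts`, `reverse_complement_dna` and `BASE_DNA` are hypothetical.

```python
# Sequence 9 written as DNA (u -> t), and sequences 6 and 7, copied from the listing.
BASE_DNA = ("guuuuaguac ucuggaaaca gaaucuacua aaacaaggca aaaugccgug uuuaucucgu "
            "caacuuguug gcgagau").replace(" ", "").replace("u", "t")
SEQ6 = ("gttttagtac tctgctggaa acagcagaat ctactaaaac aaggcaaaat gccgtgttta "
        "tctcgtcaac ttgttggcga gat").replace(" ", "")
SEQ7 = ("gttttagtac tctgtaattt tagaaataaa attacagaat ctactaaaac aaggcaaaat "
        "gccgtgttta tctcgtcaac ttgttggcga gat").replace(" ", "")


def reverse_complement_dna(seq):
    """Return the reverse complement of a lowercase DNA string."""
    pairs = {"a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(pairs[base] for base in reversed(seq))


def infer_inserts(base, modified, pos_a=14, pos_b=18):
    """Infer two equal-length inserts placed after base positions pos_a and pos_b.

    pos_a and pos_b are assumptions taken from the abstract (1-based positions
    14|15 and 18|19). Raises ValueError if `modified` cannot be rebuilt from
    `base` plus the two inferred inserts.
    """
    extra = len(modified) - len(base)
    if extra <= 0 or extra % 2:
        raise ValueError("expected two inserts of equal, non-zero length")
    k = extra // 2
    frag_a = modified[pos_a:pos_a + k]
    frag_b = modified[pos_b + k:pos_b + 2 * k]
    rebuilt = base[:pos_a] + frag_a + base[pos_a:pos_b] + frag_b + base[pos_b:]
    if rebuilt != modified:
        raise ValueError("modified sequence is not base plus two inserts")
    return frag_a, frag_b


for name, seq in (("sequence 6", SEQ6), ("sequence 7", SEQ7)):
    frag_a, frag_b = infer_inserts(BASE_DNA, seq)
    is_revcomp = reverse_complement_dna(frag_a) == frag_b
    print(name, len(seq), frag_a, frag_b, is_revcomp)
```

Under these assumptions the comparison yields a 3-nt insert pair for sequence 6 and an 8-nt insert pair for sequence 7, and in both cases the second fragment is the reverse complement of the first.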

<210>8

<211>174

<212>DNA

<213>Artificial Sequence

<400>8

aacaaagcac cagtggtcta gtggtagaat agtaccctgc cacggtacag acccgggttc 60

gattcccggc tggtgcatcc tcggcgtagt acgggctgtt ttagtactct ggaaacagaa 120

tctactaaaa caaggcaaaa tgccgtgttt atctcgtcaa cttgttggcg agat 174
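Sequence 8 appears to be a single tRNA-sgRNA unit of the kind referred to in claim 5. The Python sketch below is an illustration under stated assumptions, not the applicants' method: it assumes that the unit ends with the DNA form of the sequence 9 backbone and that the tRNA occupies the first 77 nt (the block that repeats in sequence 5). The constant `TRNA_LEN` and the variable names are hypothetical.

```python
# Sequence 8 and sequence 9, copied from the listing; sequence 9 is converted to DNA.
SEQ8 = ("aacaaagcac cagtggtcta gtggtagaat agtaccctgc cacggtacag acccgggttc "
        "gattcccggc tggtgcatcc tcggcgtagt acgggctgtt ttagtactct ggaaacagaa "
        "tctactaaaa caaggcaaaa tgccgtgttt atctcgtcaa cttgttggcg agat").replace(" ", "")
BACKBONE_DNA = ("guuuuaguac ucuggaaaca gaaucuacua aaacaaggca aaaugccgug uuuaucucgu "
                "caacuuguug gcgagau").replace(" ", "").replace("u", "t")

TRNA_LEN = 77  # assumption: length of the repeated tRNA block seen in sequence 5

# The unit is expected to end with the unmodified sgRNA backbone.
if not SEQ8.endswith(BACKBONE_DNA):
    raise ValueError("sequence 8 does not end with the sequence 9 backbone")

upstream = SEQ8[:-len(BACKBONE_DNA)]   # everything before the backbone
trna = upstream[:TRNA_LEN]             # assumed tRNA portion
target = upstream[TRNA_LEN:]           # remaining nucleotides: the target (spacer) portion

print("tRNA length:", len(trna))
print("target portion:", target, len(target))
print("backbone length:", len(BACKBONE_DNA))
```

With a different tRNA boundary the target portion would shift accordingly, so the split should be checked against the description before it is relied on.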

<210>9

<211>77

<212>RNA

<213>Artificial Sequence

<400>9

guuuuaguac ucuggaaaca gaaucuacua aaacaaggca aaaugccgug uuuaucucgu 60

caacuuguug gcgagau 77

<210>10

<211>83

<212>RNA

<213>Artificial Sequence

<400>10

guuuuaguac ucugcuggaa acagcagaau cuacuaaaac aaggcaaaau gccguguuua 60

ucucgucaac uuguuggcga gau 83
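The relationship between sequences 9 and 10 can be checked directly. The abstract states that the engineered backbone is obtained by inserting RNA fragment A between positions 14 and 15 and RNA fragment B between positions 18 and 19 of the sgRNA backbone, with A and B reverse complementary and 3 nt each. The Python sketch below applies that rule to sequence 9; the value `cug` for fragment A is inferred here by comparing the two sequences and is not quoted from the text, and the function names are hypothetical.

```python
# Sequence 9 (sgRNA backbone) and sequence 10 (engineered backbone), from the listing.
SEQ9 = ("guuuuaguac ucuggaaaca gaaucuacua aaacaaggca aaaugccgug uuuaucucgu "
        "caacuuguug gcgagau").replace(" ", "")
SEQ10 = ("guuuuaguac ucugcuggaa acagcagaau cuacuaaaac aaggcaaaau gccguguuua "
         "ucucgucaac uuguuggcga gau").replace(" ", "")


def reverse_complement_rna(seq):
    """Return the reverse complement of a lowercase RNA string."""
    pairs = {"a": "u", "u": "a", "g": "c", "c": "g"}
    return "".join(pairs[base] for base in reversed(seq))


def engineer_backbone(backbone, fragment_a):
    """Insert fragment A after position 14 and its reverse complement after position 18."""
    fragment_b = reverse_complement_rna(fragment_a)
    return backbone[:14] + fragment_a + backbone[14:18] + fragment_b + backbone[18:]


FRAGMENT_A = "cug"  # assumption: inferred by comparing sequences 9 and 10
engineered = engineer_backbone(SEQ9, FRAGMENT_A)

print("fragment B:", reverse_complement_rna(FRAGMENT_A))
print("lengths:", len(SEQ9), len(engineered))
print("matches sequence 10:", engineered == SEQ10)
```

The two inserts flank the four nucleotides at positions 15 to 18 of the backbone, which lengthens the 77-nt backbone by 6 nt and is consistent with the 83-nt length declared for sequence 10.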

Claims

1. A kit comprising a sgRNA or a biological material associated with the sgRNA, a Cas9 nuclease or a biological material associated with the Cas9 nuclease, a cytosine deaminase or a biological material associated with the cytosine deaminase;

the sgRNA targets a target sequence;

RNA fragment A and RNA fragment B are reverse complementary to each other;

RNA fragment A and RNA fragment B are each 3 nt in length;

the sgRNA backbone is m1), m2) or m3):

m1) the RNA molecule shown in sequence 9;

m3) and m1) or m2) and has the same function.

2. The kit of claim 1, wherein: the engineered sgRNA backbone is n1), n2) or n3):

n1) the RNA molecule shown in sequence 10;

n3) and n1) or n2) and has the same function.

3. The kit of claim 1 or 2, wherein: the Cas9 nuclease is a SaKKHn protein;

the SaKKHn protein is E1), E2) or E3):

E1) a protein whose amino acid sequence is shown in sequence 2;

the biological material related to the SaKKHn protein is any one of F1) to F5):

F1) a nucleic acid molecule encoding said SaKKHn protein;

F2) an expression cassette comprising the nucleic acid molecule of F1);

4. The kit of claim 1 or 2, wherein: the cytosine deaminase is PmCDA1 protein;

the PmCDA1 protein is G1), G2) or G3):

G1) a protein whose amino acid sequence is shown in sequence 3;

the biological material related to the PmCDA1 protein is any one of H1) to H5):

H1) a nucleic acid molecule encoding the PmCDA1 protein;

H2) an expression cassette comprising the nucleic acid molecule of H1);

5. The kit of any one of claims 1 to 4, wherein: the sgRNA is tRNA-sgRNA;

the tRNA is 1) or 2) or 3):

6. The kit of any one of claims 1 to 5, wherein: the kit further comprises a UGI protein or a biological material associated with the UGI protein;

the UGI protein is I1), I2) or I3):

I1) a protein whose amino acid sequence is shown in sequence 4;

the biological material related to the UGI protein is any one of J1) to J5):

J1) a nucleic acid molecule encoding the UGI protein;

J2) an expression cassette comprising the nucleic acid molecule of J1);

7. The sgRNA of any one of claims 1-6 or the engineered sgRNA backbone of any one of claims 1-6.

8. Use of the kit of any one of claims 1-6, the sgRNA of claim 7, or the engineered sgRNA backbone of claim 7 in any one of X1) to X4):

9. The method of Y1) or Y2):

Y1) a method of increasing the efficiency of editing a genomic target sequence of an organism or a cell of an organism, comprising expressing the sgRNA of any one of claims 1 to 6, the Cas9 nuclease of any one of claims 1 to 6, and the cytosine deaminase of any one of claims 1 to 6 in the organism or the cell of the organism to edit the genomic target sequence; the sgRNA targets the target sequence;

10. The kit of any one of claims 1 to 6 or the use of claim 8 or the method of claim 9, wherein:

the editing of the genomic target sequence is mutation of C in the target sequence to T;

and/or, the organism is S1), S2), S3) or S4):

S1) a plant or an animal;

S2) a monocot or a dicot;

S3) a gramineous plant;

S4) rice;

and/or, the cell of the organism is T1), T2), T3) or T4):

T1) a plant cell or an animal cell;

T2) a monocot cell or a dicot cell;

T3) a gramineous plant cell;

T4) a rice cell.