CN109411022A

CN109411022A - Method for screening gRNA target sequences containing a PAM structure based on character slicing, and application thereof

Info

Publication number: CN109411022A
Application number: CN201811317917.9A
Authority: CN
Inventors: 陈晓军; 樊云芳; 马斯霜; 白海波; 惠建; 李树华
Original assignee: Agricultural Biotechnology Research Center of Ningxia Academy of Agriculture and Forestry Sciences (Ningxia Key Laboratory of Agricultural Biotechnology)
Current assignee: Agricultural Biotechnology Research Center of Ningxia Academy of Agriculture and Forestry Sciences (Ningxia Key Laboratory of Agricultural Biotechnology)
Priority date: 2018-11-07
Filing date: 2018-11-07
Publication date: 2019-03-01

Abstract

The invention belongs to the field of genetic engineering, and in particular relates to a method and application system, based on character slicing, for screening gRNA target sequences containing a PAM (Protospacer Adjacent Motif) structure. The method comprises the following steps: (1) reading in the deoxyribonucleic acid (DNA) sequence file of the target gene; (2) entering, through an interactive interface, the PAM sequence to be analysed and screened; (3) an interpret-PAM module converts the specified PAM motif into a character list; (4) a compare-substring module compares the elements of that character list one by one with the corresponding positions of the moving-window sequence and evaluates the logical result; (5) a global search module searches the given DNA sequence and its reverse complement for sequences that meet the condition and stores them in an initially empty list; (6) a file output module presents the results as text or as a spreadsheet. The invention solves the problem that the prior art cannot screen a given DNA sequence for gRNA target sequences containing an arbitrary PAM recognition motif, and lays a technical foundation for the subsequent assessment and selection of gRNA target sequences.

Description

Method for screening gRNA target sequences containing a PAM structure based on character slicing, and application thereof

Technical field

The invention belongs to the field of genetic engineering, and in particular relates to a method and application system, based on character slicing, for screening gRNA target sequences containing a PAM (Protospacer Adjacent Motif) structure.

Background art

CRISPR stands for clustered regularly interspaced short palindromic repeats; Cas9 (CRISPR-associated nuclease) is a CRISPR-associated nuclease. CRISPR/Cas9 is a recently emerged technology in which an RNA guides the Cas9 nuclease to edit a target gene.

The CRISPR/Cas9 system is widespread in prokaryotic genomes and is an acquired immune defence mechanism that bacteria and archaea evolved in response to continual attack by viruses and plasmids. In these organisms, foreign genetic material from bacteriophages is acquired and integrated into the CRISPR locus. These sequence-specific fragments are transcribed into short CRISPR-derived RNAs (crRNAs); a crRNA pairs with the tracrRNA (trans-activating RNA) by base pairing to form a double-stranded RNA, and the tracrRNA/crRNA complex then guides the Cas9 protein to cleave double-stranded DNA.

Once the crRNA binds to Cas9, the conformation of the Cas9 nuclease changes and a channel is formed that allows DNA to bind more easily. The Cas9/crRNA complex recognizes the PAM site (5'-NGG), which causes the DNA to unwind and lets the crRNA find the complementary DNA strand adjacent to the PAM site. When Cas9 binds to the DNA sequence that is adjacent to the PAM site and complementary to the crRNA, the bridge helix within the REC lobe forms an RNA-DNA heteroduplex with the target DNA. Recognition of the PAM site involves activation of the HNH and RuvC nuclease domains, which produce a double-strand break (DSB) in the target DNA and lead to DNA degradation. If the crRNA is not complementary to the target DNA, Cas9 is released and searches for a new PAM site. After the break, the target genome can be repaired by non-homologous end joining (NHEJ) or by homology-directed repair (HDR); NHEJ can introduce insertion or deletion errors, thereby achieving site-specific knockout of a gene.

When the CRISPR/Cas9 system is used for genome editing, the tracrRNA and crRNA can be fused and expressed as a single RNA (sgRNA), which likewise directs targeted cleavage. The CRISPR/Cas9 tool as used today therefore has only two parts: the Cas9 nuclease and the gRNA. With this tool, any gene can be edited conveniently and efficiently, for example by gene knockout, knock-in or site-directed mutagenesis. Compared with ZFN and TALEN technologies, CRISPR/Cas9 offers simple vector construction, high editing efficiency and low cost, and is now widely used in gene function research and in precision molecular breeding of animals and plants [1-2].

The PAM sequence is essential for target binding, and its specific sequence depends on the type of Cas9. At present, the PAM recognized by the Streptococcus pyogenes SpCas9 widely used in plants is mainly NGG [3]. To extend the editing range of CRISPR/Cas9 in the genome, homologous proteins that recognize different PAMs have been identified from different microorganisms: Streptococcus thermophilus CRISPR3 Cas9 recognizes the NGGNG PAM [4]; Streptococcus thermophilus CRISPR1 Cas9 recognizes the NNAGAAW PAM [5]; Neisseria meningitidis NmeCas9 recognizes the NNNNGATT PAM [6]; Staphylococcus aureus SaCas9 recognizes the GGAGT PAM [7]; and Campylobacter jejuni CjCas9 recognizes the NNNNACAC or NNNNRYAC PAM [8]. The SpCas9 variants VQR (D1135V/R1335Q/T1337R) and VRER (D1135V/G1218R/R1335E/T1337R) recognize the NGA PAM and the NGCG PAM, respectively [9], and the variant xCas9 recognizes three PAMs: NG, GAA and GAT [10].

As research has deepened, scientists have discovered a variety of CRISPR/Cas systems. They can be divided into two classes (Class I and Class II) according to the number of Cas proteins, into six types (Type I to VI) according to the structure and function of Cas, and further into multiple subtypes. Unlike Class I, Class II requires only a single Cas protein, so the systems commonly used in gene editing today belong to Class II, for example Cas9, Cpf1 (Cas12a), which does not require a tracrRNA, and Cas13, which has RNA cleavage activity [11]. Whether Class I or Class II, cleavage of DNA or RNA requires recognition of a target sequence containing a PAM structure.

In the biomedical field, scientists can use CRISPR/Cas9 technology to manipulate target sequences at precisely defined positions, providing new therapeutic approaches for genetic diseases, cancer and viral infectious diseases. Genetic diseases have long been intractable threats to human health. From the discovery of disease-related mutant genes and the construction of mouse models to CRISPR-Cas9 treatment of genetic diseases, the technology has bright prospects in basic research on human genetic diseases and in clinical treatment [12-13].

In crop breeding, CRISPR/Cas9 gene editing enables site-specific editing of genes related to resistance, yield, quality and fertility, and is therefore increasingly widely used in the targeted genetic improvement of crops, including knockout of unfavourable genes, editing of the regulatory regions of beneficial genes, and regulation of epigenetic genes [11,14-18].

However, gene editing in animals, plants and microorganisms always requires screening and assessment of target sequences. Complex and variable PAM recognition motifs make screening for gRNA target sequences difficult, especially when the PAM motif is degenerate. At present there is no algorithm or application system dedicated to gRNA target sequences containing an arbitrary PAM recognition motif.

Based on character slicing and a simple scripting language (Python), the present invention quickly and accurately screens a DNA sequence for gRNA target sequences containing any specified PAM motif, calculates the GC content of each sequence together with its position in the DNA sequence and its strand, and presents the results in spreadsheet or text format, laying a technical foundation for the subsequent assessment and selection of gRNA target sequences.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and system, based on character slicing, for quickly and accurately screening a DNA sequence for gRNA target sequences containing any specified PAM motif, with the aim of solving the problem that the prior art cannot screen a given DNA sequence for gRNA target sequences containing an arbitrary PAM recognition motif.

The invention is realized as follows: a method, based on character slicing, for quickly and accurately screening a DNA sequence for gRNA target sequences containing any specified PAM motif, comprising:

File input module: reads DNA or cDNA in text form from a specified path.

GC calculation module: calculates the GC content of a gRNA target sequence.

DNA complement module: converts the given DNA sequence into its complementary sequence; combined with the DNA reverse module, it yields the reverse complement.

DNA reverse module: converts the given DNA sequence into its reverse sequence.

Interpret-PAM module: converts the PAM motif specified by the user into a character list.

Substring position module: returns, as strings, the start and end positions of a substring within the full string.

Compare-substring module: compares the elements of the PAM character list one by one with the corresponding positions of the moving-window sequence and evaluates the logical result.

Global search module: searches the given DNA sequence and its reverse complement for sequences that meet the condition and stores them in an initially empty list.

File output module: presents the results as text or as a spreadsheet.

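Further, the GC calculation module is specifically configured to:

calculate the GC content of the input sequence.

def gc(s):
    # Floor division keeps the truncated whole-number percentages
    # reported in the embodiments (for example 19/30 -> 63.0%).
    gcc = 100 * (s.count('G') + s.count('C')) // len(s)
    return str(float('%.3f' % gcc)) + '%'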
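As a quick check of the GC calculation, the following minimal sketch computes the GC percentage of the 30-nt window P1 used in Embodiment 1. The function name `gc_percent` and its `truncate` parameter are assumptions introduced for illustration only; truncation to a whole percent is shown because the result tables in the embodiments report values such as 63.0%.

```python
def gc_percent(seq, truncate=True):
    """Return the GC content of seq as a percentage.

    With truncate=True the value is floored to a whole percent;
    otherwise the exact fraction is returned.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = seq.count("G") + seq.count("C")
    if truncate:
        return float(100 * gc_count // len(seq))
    return 100.0 * gc_count / len(seq)


p1 = "CCCTCACGTCGAGAGAGCTCGAGTGCACAG"
print(gc_percent(p1))                   # 19 of 30 bases are G or C -> 63.0
print(round(gc_percent(p1, False), 2))  # 63.33
```

Whether a truncated or an exact percentage is preferable depends on how the values are used in the later assessment of target sequences.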

Further, the DNA complement module is specifically configured to:

convert the given sequence into its complementary sequence by dictionary (hash) substitution on the string: "A" in the original sequence is replaced by "T", "T" by "A", "C" by "G" and "G" by "C", and the resulting characters are finally joined into one string. Combined with the DNA reverse module, this yields the reverse complement.

def DNA_complement(sequence):
    sequence = sequence.upper()
    # Complement of each base; any character other than A, G, T or C raises KeyError
    basecomplement = {"A": "T", "G": "C", "T": "A", "C": "G"}
    letters = [basecomplement[base] for base in sequence]
    return ''.join(letters)

Further, the DNA reverse module is specifically configured to:

obtain the reverse sequence by outputting the string in reverse order.

def DNA_reverse(sequence):
    sequence = sequence.upper()
    # A slice with step -1 returns the string in reverse order
    return sequence[::-1]
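The complement and reverse steps can also be combined into one call. The sketch below is an alternative formulation, not the listing of the invention: it assumes Python 3 and uses `str.maketrans` with `str.translate` instead of a dictionary lookup per base. The name `reverse_complement` and the pass-through handling of "N" are assumptions made for this example.

```python
# Translation table for the four bases; "N" (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, upper-cased."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Window P1 from Embodiment 1 and its reverse complement P1RC
print(reverse_complement("CCCTCACGTCGAGAGAGCTCGAGTGCACAG"))
# CTGTGCACTCGAGCTCTCTCGACGTGAGGG
```

Unlike the dictionary version, `translate` leaves characters that are not in the table unchanged instead of raising an error, so input validation would have to be done separately.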

Further, the interpret-PAM module is specifically configured to:

first exclude illegal characters, then expand each single or degenerate base into the one or more bases it stands for by means of a hash dictionary, and put the results into the PAM list as strings.

def pamstring(string):
    string = string.upper()
    print("The PAM structure that you input is: " + string)
    convert = []
    # Degenerate base codes mapped to the bases they stand for
    degenerate = {'N': 'AGTC', 'M': 'AC', 'K': 'GT', 'S': 'CG', 'Y': 'TC',
                  'W': 'AT', 'R': 'AG', 'A': 'A', 'T': 'T', 'C': 'C', 'G': 'G',
                  'V': 'ACG', 'H': 'ACT', 'D': 'AGT', 'B': 'CGT'}
    for i in string:
        if i not in degenerate:
            # Any illegal character invalidates the whole motif
            print("PAM contains an illegal character")
            return []
        convert.append(degenerate[i])
    return convert
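The same expansion of degenerate bases can be expressed as a regular expression. The following sketch swaps in that technique for illustration; it is not the character-list approach described above. The names `IUPAC` and `pam_to_regex` are hypothetical, and raising `ValueError` on illegal input is a choice made for this example.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_regex(pam):
    """Compile a regex that matches the PAM at the 3' end of a string."""
    pam = pam.upper()
    illegal = [ch for ch in pam if ch not in IUPAC]
    if illegal:
        raise ValueError("illegal PAM character(s): " + "".join(illegal))
    # One character class per PAM position, anchored at the end
    pattern = "".join("[" + IUPAC[ch] + "]" for ch in pam) + "$"
    return re.compile(pattern)


rx = pam_to_regex("NGG")
print(rx.pattern)                    # [ACGT][G][G]$
print(bool(rx.search("GTGCACAGG")))  # True, the string ends in AGG
print(bool(rx.search("GTGCACAG")))   # False, the string ends in CAG
```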

Further, the substring position module is specifically configured to:

obtain the corresponding position information by character indexing.

def substr(fragment, strtotal):
    # str.index returns the 0-based offset of the first occurrence;
    # positions are reported 1-based
    start_position = strtotal.index(fragment) + 1
    end_position = start_position + len(fragment) - 1
    return str(start_position), str(end_position)
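One property of `str.index` is worth keeping in mind when positions are looked up this way: it always returns the first occurrence, so a fragment that appears more than once in the input is reported at its first position each time. The sketch below illustrates this on a toy sequence and shows a position derived from the window offset instead; `window_position` is a hypothetical helper, not part of the modules described above.

```python
def window_position(offset, window):
    """1-based (start, end) of the window that begins at a 0-based offset."""
    return offset + 1, offset + window


seq = "ACGTACGTAC"  # toy input: the fragment "ACGT" occurs at offsets 0 and 4
window = 4
for offset in (0, 4):
    fragment = seq[offset:offset + window]
    by_index = seq.index(fragment) + 1  # always the first occurrence
    by_offset = window_position(offset, window)[0]
    print(fragment, by_index, by_offset)
# ACGT 1 1
# ACGT 1 5
```

For 30-nt windows in a gene-sized sequence exact repeats are uncommon, but repetitive regions can still produce them.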

Further, the compare-substring module is specifically configured to:

take each element of the PAM list in a loop, compare it with the base at the corresponding position of the moving-window sequence, and return the comparison result, as shown in Fig. 1.

def compare(l1, str2):
    flag = []
    # Element j of the PAM list is checked against the j-th of the
    # last len(l1) bases of the window
    for j, allowed in enumerate(l1):
        flag.append(str2[-len(l1) + j] in allowed)
    # Logical AND over all positions; an empty PAM list never matches
    return bool(flag) and all(flag)
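To make the position-by-position comparison concrete, the sketch below prints the intermediate list of logical values for the two windows of Embodiment 1. The helper name `pam_flags` is an assumption for this example, and the sketch assumes a non-empty PAM list.

```python
def pam_flags(pam_list, window):
    """Per-position match flags for the last len(pam_list) bases of window."""
    tail = window[-len(pam_list):]
    return [base in allowed for allowed, base in zip(pam_list, tail)]


pam_list = ["AGTC", "G", "G"]          # NGG after expansion
p1 = "CCCTCACGTCGAGAGAGCTCGAGTGCACAG"  # ends in CAG
p2 = "CCTCACGTCGAGAGAGCTCGAGTGCACAGG"  # ends in AGG

print(pam_flags(pam_list, p1), all(pam_flags(pam_list, p1)))
# [True, False, True] False
print(pam_flags(pam_list, p2), all(pam_flags(pam_list, p2)))
# [True, True, True] True
```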

Further, the global search module is specifically configured to:

move a window continuously along the given DNA sequence, and along the reverse complement obtained with the DNA reverse and DNA complement modules, to extract sequences of the specified length; compare them using the compare-substring module; and, for each sequence that meets the condition, store the sequence, its position, its GC content and its strand in a list.

def findpam(s, pamlist):
    s = s.strip().upper()
    if len(s) < 30:
        return 'sequence must be at least 30 bases'
    i = 0
    j = 30
    l = []
    while len(s) - j >= 0:
        fragment = s[i:j]  # current 30-nt window
        p1, p2 = substr(fragment, s)
        # Plus strand: PAM at the 3' end of the window
        if compare(pamlist, fragment):
            l.extend([fragment, p1, p2, gc(fragment), '+'])
        # Minus strand: PAM at the 3' end of the window's reverse complement
        if compare(pamlist, DNA_reverse(DNA_complement(fragment))):
            l.extend([fragment, p1, p2, gc(fragment), '-'])
        i += 1
        j += 1
    return l
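The search as a whole can be sketched in a self-contained form as follows. This is an illustrative Python 3 restatement under stated assumptions, not the listing of the invention: the names `Hit`, `IUPAC` and `scan` are hypothetical, results are returned as named tuples instead of a flat list, and positions are derived from the window offset.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "sequence start end strand")

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def scan(seq, pam, window=30):
    """Yield a Hit for every window whose 3' end matches pam on either strand."""
    seq = seq.strip().upper()
    allowed = [IUPAC[ch] for ch in pam.upper()]  # KeyError on illegal codes
    n = len(allowed)
    if n == 0:
        return

    def ends_with_pam(s):
        return all(base in a for a, base in zip(allowed, s[-n:]))

    for i in range(len(seq) - window + 1):
        fragment = seq[i:i + window]
        if ends_with_pam(fragment):
            yield Hit(fragment, i + 1, i + window, "+")
        # The plus-strand window is recorded for minus-strand hits too
        if ends_with_pam(fragment.translate(_COMPLEMENT)[::-1]):
            yield Hit(fragment, i + 1, i + window, "-")


x_seq = "CCCTCACGTCGAGAGAGCTCGAGTGCACAGG"
for hit in scan(x_seq, "NGG"):
    print(hit.start, hit.end, hit.strand)
# 1 30 -
# 2 31 +
# 2 31 -
```

The three hits correspond to the three rows reported for X_seq in Embodiment 1.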

Further, the file output module is specifically configured to:

control the output of the results with special characters such as "+" and "-", presenting the sequences that meet the condition one by one.
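The output step can be sketched as follows, assuming the search result is a flat list with five entries per hit (sequence, start, end, GC content, strand), as built by the global search module. The function names `group_records` and `write_hits`, the tab-separated layout and the use of the standard `csv` module are assumptions for this example; the source only states that results are presented as text or as a spreadsheet.

```python
import csv
import io


def group_records(flat, size=5):
    """Split a flat result list into rows of `size` fields each."""
    return [flat[k:k + size] for k in range(0, len(flat), size)]


def write_hits(rows, handle):
    """Write result rows as tab-separated text with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["30nt sequence", "Start position", "End position", "GC%", "Strand"])
    writer.writerows(rows)


# Two of the rows reported for X_seq in Embodiment 1
flat = [
    "CCCTCACGTCGAGAGAGCTCGAGTGCACAG", "1", "30", "63.0%", "-",
    "CCTCACGTCGAGAGAGCTCGAGTGCACAGG", "2", "31", "63.0%", "+",
]
buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
write_hits(group_records(flat), buffer)
print(buffer.getvalue())
```

A tab-separated file of this kind opens directly in common spreadsheet programs.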

Compared with the prior art, the present invention can quickly and exhaustively find, in a given DNA sequence, all gRNA target sequences that satisfy any specified PAM motif. All screening can be completed on a single machine (a desktop computer), without remote computation over the Internet or networked computation on large integrated computers. The invention provides gene editing designers with accurate information on target sites, which in turn plays an important role in evaluating the efficiency and predicting the function of those sites.

Brief description of the drawings

Fig. 1 is a schematic diagram of the moving window.

Specific embodiment 1

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

The PAM motif recognized by the Streptococcus pyogenes SpCas9 widely used in plants is "NGG". Assume a gene sequence X_seq: "CCCTCACGTCGAGAGAGCTCGAGTGCACAGG".

First, the X_seq sequence is saved as a text file, and the path of that file is specified in the system.

Second, NGG is entered on the interactive interface (in principle any character can be typed, but the input is restricted to the letters of the base degeneracy table).

In the interpret-PAM module, the PAM motif is converted into the list ['AGTC', 'G', 'G'].

In the global search module, windows of 30 bases (the number of bases preceding the PAM motif can be specified; it is usually 30 nt) taken from X_seq "CCCTCACGTCGAGAGAGCTCGAGTGCACAGG" give the window sequences P1 and P2, whose reverse complements are P1RC and P2RC:

X_seq:CCCTCACGTCGAGAGAGCTCGAGTGCACAGG

P1:CCCTCACGTCGAGAGAGCTCGAGTGCACAG-

P1RC:CTGTGCACTCGAGCTCTCTCGACGTGAGGG

P2:-CCTCACGTCGAGAGAGCTCGAGTGCACAGG

P2RC:CCTGTGCACTCGAGCTCTCTCGACGTGAGG.
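The windows listed above can be reproduced with a few lines of Python 3. This is a minimal sketch; the helper names `windows` and `reverse_complement` are assumptions for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def windows(seq, size=30):
    """Yield (1-based start, window) for every full-length window of seq."""
    for i in range(len(seq) - size + 1):
        yield i + 1, seq[i:i + size]


x_seq = "CCCTCACGTCGAGAGAGCTCGAGTGCACAGG"
for start, window in windows(x_seq):
    print("P%d:" % start, window)
    print("P%dRC:" % start, reverse_complement(window))
```

A 31-nt input with a 30-nt window yields exactly two windows, which is why only P1 and P2 appear in this embodiment.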

The detailed process is as follows:

The compare function of the compare-substring module compares the PAM motif list with P1 element by element: the third element of the PAM motif list is compared logically with position 30 of P1 and the result is placed in a logic list L; the second element is then compared with position 29 of P1 and the first element with position 28 of P1, and each result is likewise placed in L. The number of comparisons equals the number of elements in the PAM motif list. Finally, a logical AND is applied to all elements of L, and its result decides whether the window is recorded in the final result list T: if the condition is met (the value is True), the sequence of the window, its GC content, its position and its strand are recorded. In the same way, the reverse complement of the window sequence is obtained with the DNA complement and DNA reverse modules and is evaluated identically.

Therefore, among the window sequences above, three satisfy the condition: P2 on the plus strand, and P2RC and P1RC on the minus strand. The file output module writes this information as text, as follows:

PAM: AGTC G G

30nt sequence Start position End position GC% Strand

CCCTCACGTCGAGAGAGCTCGAGTGCACAG 1 30 63.0% -

CCTCACGTCGAGAGAGCTCGAGTGCACAGG 2 31 63.0% +

CCTCACGTCGAGAGAGCTCGAGTGCACAGG 2 31 63.0% -

The target sequences selected above meet the NGG requirement (the underlined bases), and their position, GC content and strand are indicated.

Specific embodiment 2

Taking the cDNA of the rice phytoene dehydrogenase gene OsPDS (LOC_Os03g08570) as the input sequence and selecting the NRYAC PAM motif recognized by Campylobacter jejuni CjCas9, the following is obtained:

PAM: AGTC AG TC A C

30nt sequence Start position End position GC% Strand

GGTGCTTCGCAAGTAGCAGCATCCAAGCAC 83 112 56.0% +

GTGCTTCGCAAGTAGCAGCATCCAAGCACT 84 113 53.0% -

GTGCTCTACAGGTTGTTTGCCAGGACTTTC 194 223 50.0% -

GGACTTTCCAAGACCTCCACTAGAAAACAC 216 245 46.0% +

GTATGAAACTGGGCTTCATATCTTTTTTGG 459 488 36.0% -

ATATCTTTTTTGGAGCTTATCCCAACATAC 476 505 33.0% +

GTATTAATGATCGGTTGCAATGGAAGGAAC 527 556 40.0% -

ATTAATGATCGGTTGCAATGGAAGGAACAC 529 558 40.0% +

GGTTTGATTTTCCTGAAACATTGCCTGCAC 602 631 43.0% +

CTGCACCCTTAAATGGAATATGGGCCATAC 626 655 46.0% +

GTGTTCCTGATCGAGTGAACGATGAGGTTT 791 820 46.0% -

ATGAGGTTTTCATTGCAATGTCAAAGGCAC 812 841 40.0% +

GTGCATTCTGATTGCTTTAAACCGATTTCT 876 905 36.0% -

GTATTCAGAAAATAGAACTTAATCCTGATG 1022 1051 30.0% -

GAACTTAATCCTGATGGAACAGTGAAACAC 1036 1065 40.0% +

ATCCTGATGGAACAGTGAAACACTTTGCAC 1043 1072 43.0% +

TAACTGGAGATGCTTATGTTTTTGCAACAC 1091 1120 36.0% +

CACCAGTTGATATCTTGAAGCTTCTTGTAC 1118 1147 40.0% +

GTACCTCAAGAGTGGAAAGAAATATCTTAT 1144 1173 33.0% -

TATATGGTTTGATAGAAAACTGAAGAACAC 1221 1250 30.0% +

GTGTTTATGCGGACATGTCAGTAACTTGCA 1289 1318 43.0% -

GCGGACATGTCAGTAACTTGCAAGGAATAC 1297 1326 46.0% +

TGCAGAGGAATGGGTTGGACGGAGTGACAC 1368 1397 56.0% +

AGATTCTGAAGTATCATGTTGTGAAGACAC 1475 1504 36.0% +

GTATCATGTTGTGAAGACACCAAGATCTGT 1485 1514 40.0% -

TGAAGGGTTCTATCTAGCTGGTGACTACAC 1569 1598 46.0% +

GTGCAGTTCTATCTGGGAAGCTTTGTGCTC 1628 1657 50.0% -

GTGCTCAGTCTGTAGTGGAGGATTATAAAA 1652 1681 40.0% -

Specific embodiment 3

Taking the genomic DNA of the rice betaine-aldehyde dehydrogenase gene OsBADH2 (LOC_Os08g32870) as the input sequence and selecting the NAGAAW PAM motif recognized by Streptococcus thermophilus CRISPR1 Cas9, the following is obtained:

PAM: AGTC A G A A AT

30nt sequence start position end position GC% strand

ATTCTGCTTCTGTTTGGAATAAGTTGGAAG 823 852 36.0% -

TTTCTAGTGCCAAATGCATGCTAGATTTCT 1172 1201 36.0% -

TTTCTCACAGTTTTTCTCTTCAGGTTATAT 1197 1226 30.0% -

TTTCTCTTCAGGTTATATTTCTCGTATTTC 1209 1238 30.0% -

TTTCTCGTATTTCCTTTTCCTAAAGGATTG 1226 1255 33.0% -

TTTCTGGCATATATAGGTTATTATTATTAT 1269 1298 20.0% -

ATTCTCCAGAACAAGATTACCCATATTATG 1300 1329 33.0% -

TTTCTAGCAAAGCAGGGGATGCTAGCCTTC 1425 1454 50.0% -

TTTCTATCATAAAAATTTTCATGGCATATG 1512 1541 23.0% -

TTTCTTCAGATAATCGAGAGGAAATCTGAG 1570 1599 36.0% -

GCTTTGAGTACTTTGCAGATCTTGCAGAAT 1771 1800 40.0% +

TTTCTCTCATCCTGCGCTTATATTTATTTA 1916 1945 30.0% -

ATGTGTTAAGTTTGACCAAGTTTATAGAAA 2158 2187 26.0% +

TTTCTTCTTAATATAATGATACACAGCTCT 2271 2300 26.0% -

TTTCTCATATGTTGTCAGCATGATTCACTT 2398 2427 33.0% -

ATTCTAATTTGTTGTTTCTTTGTTATGTTC 2510 2539 23.0% -

TTTCTTTGTTATGTTCTTATCGACAATTAC 2524 2553 26.0% -

ATCGACAATTACAAATTTGATTCTGAGAAT 2542 2571 26.0% +

ATTCTGAGAATCATGTTCGGGATGTGTATT 2561 2590 36.0% -

TTTCTACTGCAGGAACTATCCTCTCCTGAT 2589 2618 43.0% -

TTTCTGTTAGGTTGCATTTACTGGGAGTTA 3023 3052 36.0% -

TTTCTGTGGATATTTTTTGTTCTCTTTCTA 3112 3141 26.0% -

TTTCTACTAACTCTCTATTATCAATTCTCA 3136 3165 26.0% -

ATTCTCAATGTTGTCCTTTTCTTTTAACTC 3159 3188 30.0% -

TTTTCTTTTAACTCCTTTACTTTTTAGAAT 3175 3204 20.0% +

TTTCTTTTAACTCCTTTACTTTTTAGAATT 3176 3205 20.0% -

ATTCTAGTAGCCAGTTCTATCCTGTTTCTT 3229 3258 36.0% -

TTTCTTACCTTTTTATGGTTCGTCTTTTCT 3253 3282 30.0% -

TTTCTTGACAGCCTGTTTCACTGGAACTTG 3278 3307 43.0% -

ATTCTGAAGTGCGGGACTTTGTAAAGCACT 3383 3412 43.0% -

CTTTTTGGTGTCTTGGGCTTGTTGCAGAAA 3445 3474 43.0% +

ACTGGTCCCAGACGAGCAGGATGCAAGAAA 3476 3505 53.0% +

TTTCTTAGAAGTTACACCTCAAGGATTAGC 3533 3562 36.0% -

TTTCTTAAAATGTGCTATTGATTAAAAAGA 3568 3597 20.0% -

TTATAATGCCATGCCAACTGAGTAAAGAAA 3664 3693 33.0% +

TTTCTTTTCGTGGCAAGGAAGGCAGTTAGG 3749 3778 46.0% -

ATTCTTAGTTCTGGAAAACTGTGTTCTTTA 3807 3836 30.0% -

ATTCTAGCTGATTATGAATTCTGTTTATAT 3886 3915 23.0% -

ATTCTGTTTATATTTCACTAATTTTGAATC 3903 3932 20.0% -

ATTCTTCATGTAAGCATTGAATATATCCGT 4116 4145 30.0% -

TTTCTGATCAACTCCTGAGTTCAGATTATT 4177 4206 33.0% -

TTGCTCCTGACCATGAAAGTTTTGCAGAAA 4453 4482 40.0% +

AAGTTTTGCAGAAAAAAATCGCTAAAGAAT 4469 4498 26.0% +

AGAAAAAAATCGCTAAAGAATTTCAAGAAA 4478 4507 23.0% +

TGTAAACTTTTTCTAAATTCAAAAAAGAAA 4602 4631 16.0% +

TTTCTAAATTCAAAAAAGAAATGCCACTGA 4611 4640 26.0% -

CTTTTGTATATATTTTCAAAGCACCAGAAT 4915 4944 26.0% +

ATTCTGACTGGTGGGGTTAGACCCAAGGTA 5045 5074 50.0% -

AGGTACCCACATATCATTATGAAGTAGAAA 5101 5130 33.0% +

CTTGTATGTTTTTGTCAGCATCTGGAGAAA 5135 5164 36.0% +

TTTCTATATTGAACCCACAATCATTACTGA 5167 5196 30.0% -

TTTTTGGTCCAGTGCTCTGTGTGAAAGAAT 5232 5261 40.0% +

ATTCTGCTACTACTACTTTTGATAGTTATG 5383 5412 30.0% -

CATGGTTGCATCAAGCTGATATTCAAGAAT 5512 5541 36.0% +

ATTCTATGCATCTCCAGTTCTTCCCTGGAC 5558 5587 46.0% -

ATATTTGACCCCTTTTTTTTGCAAAAGAAA 5678 5707 26.0% +

Specific embodiment 4

In gene editing design, the editing site is usually chosen on an exon of the gene. If screening is done with the cDNA sequence alone, however, sequences at splice sites are often selected by mistake: the editing site appears to lie on an exon but is in fact located in a sequence at an exon-intron junction. In this case, the genomic DNA and the cDNA of the target gene can each be used as the input file, and the part shared by the two results is taken.
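Taking the shared part of the two result sets can be sketched as a set intersection. The sketch below assumes each hit is reduced to a (sequence, strand) pair, because start and end positions differ between genomic DNA and cDNA; the function name `shared_targets` and the short toy sequences are hypothetical placeholders, not results from the embodiments.

```python
def shared_targets(genomic_hits, cdna_hits):
    """Return the hits present in both result sets, in genomic order.

    Each hit is a (sequence, strand) tuple. Positions are left out on
    purpose, since they are numbered differently in the two inputs.
    """
    cdna_keys = set(cdna_hits)
    return [hit for hit in genomic_hits if hit in cdna_keys]


# Toy placeholder data, shortened for readability
genomic = [("AAAC", "+"), ("GGTT", "-"), ("CCGA", "+")]
cdna = [("CCGA", "+"), ("AAAC", "+"), ("TTTT", "-")]

print(shared_targets(genomic, cdna))
# [('AAAC', '+'), ('CCGA', '+')]
```

A window that spans a splice junction exists only in the cDNA result, and a window that reaches into an intron exists only in the genomic result, so neither survives the intersection.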

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Bibliography

[1] Voytas DF. Plant Genome Engineering with Sequence-specific Nucleases [J]. Annual Review of Plant Biology, 2013, 64(64): 327-350.

[2] Wang Y,Cheng X,Shan Q, et al. Simultaneous Editing of Three Homoeoalleles in Hexaploid Bread Wheat Confers Heritable Resistance to Powdery Mildew[J]. Nature Biotechnology, 2014, 32(9): 947-951.

[3] Hsu PD,Lander ES,Zhang F. Development and Applications of Crispr-cas9 for Genome Engineering[J]. Cell, 2014, 157(6): 1262-1278.

[4] Horvath P,Romero DA,Coute-Monvoisin AC, et al. Diversity, Activity, and Evolution of CRISPR Loci in Streptococcus thermophilus[J]. Journal of Bacteriology, 2008, 190(4): 1401-1412.

[5] Deveau H,Barrangou R,Garneau JE, et al. Phage Response to CRISPR-encoded Resistance in Streptococcus thermophilus[J]. Journal of Bacteriology, 2008, 190(4): 1390-1400.

[6] Zhang Y,Heidrich N,Ampattu BJ, et al. Processing-independent CRISPR RNAs Limit Natural Transformation in Neisseria meningitidis[J]. Molecular Cell, 2013, 50(4): 488-503.

[7] Ran FA,Cong L,Yan WX, et al. In Vivo Genome Editing Using Staphylococcus Aureus Cas9.[J]. Nature, 2015, 520(7546): 186.

[8] Kim E,Koo T,Park SW, et al. In Vivo Genome Editing with a Small Cas9 Orthologue Derived From Campylobacter Jejuni[J]. Nature Communications, 2017, 8: 14500.

[9] Kleinstiver BP,Prew MS,Tsai SQ, et al. Engineered Crispr-cas9 Nucleases with Altered Pam Specificities[J]. Nature, 2015, 523(7561): 481.

[10] Hu JH,Miller SM,Geurts MH, et al. Evolved Cas9 Variants with Broad Pam Compatibility and High Dna Specificity[J]. Nature, 2018, 556(7699) (page range missing).

[11] Yang Yizhou, Li Wei, Yi Tuyong, et al. Research on plant-virus interaction and progress in the application of gene editing technology in disease-resistance breeding[J]. Biotechnology Bulletin, 2018, 34(8): 8-16.

[12] Sun Wei, Liu Shanshan. Research progress of the CRISPR_Cas9 gene editing system[J]. Progress in Animal Medicine, 2018, 39(8): 97-100.

[13] Mei Wen, Sun Meitao, Wang Weisi, et al. Research progress of CRISPR_Cas9 technology in gene therapy of genetic diseases[J]. Biotechnology Communications, 2018, 29(4): 551-557.

[14] Sheng Xiabing, Tan Yanning, Sun Zhizhong, et al. Targeted reduction of rice grain shattering using CRISPR/Cas9 genome editing technology[J]. Scientia Agricultura Sinica, 2018, 51(14): 2631-2641.

[15] Xin Gaowei, Hu Xixun, Wang Kejian, et al. Cas9 protein variant VQR efficiently recognizes the NGAC protospacer adjacent motif in rice[J]. Hereditas (year, volume and issue missing): 1-10.

[16] Wang Huiyuan, Fan Yuelei, Chu Xin, et al. Analysis of the development status of CRISPR gene editing[J]. Life Science (year, volume and issue missing): 1-11.

[17] Zhou Xin, Deng Li, Wang Qing, et al. Breeding glutinous rice using gene editing technology[J]. Molecular Plant Breeding, 2018, 16(17): 5608-5615.

[18] Xue Mande, Long Yan, Pei Xinwu. Gene editing technology and its application and safety management in crop breeding[J]. Chinese Agricultural Science and Technology Herald, 2018, 20(9): 12-22.

Sequence table

<110> Agricultural Biotechnology Research Center of Ningxia Academy of Agriculture and Forestry Sciences (Ningxia Key Laboratory of Agricultural Biotechnology)

<120> Method for screening gRNA target sequences containing a PAM structure based on character slicing, and application thereof

<141> 2018-11-07

<160> 89

<170> SIPOSequenceListing 1.0

<210> 1

<211> 31

<212> DNA

<213>artificial sequence (Oryza sativa)

<400> 1

ccctcacgtc gagagagctc gagtgcacag g 31

<210> 2

<211> 30

<212> DNA

<213>artificial sequence (Oryza sativa)

<400> 2

ccctcacgtc gagagagctc gagtgcacag 30

<210> 3

<211> 30

<212> DNA

<213>artificial sequence (Oryza sativa)

<400> 3

ctgtgcactc gagctctctc gacgtgaggg 30

<210> 4

<211> 30

<212> DNA

<213>artificial sequence (Oryza sativa)

<400> 4

cctcacgtcg agagagctcg agtgcacagg 30

<210> 5

<211> 30

<212> DNA

<213>artificial sequence (Oryza sativa)

<400> 5

cctgtgcact cgagctctct cgacgtgagg 30

<210> 6

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 6

ggtgcttcgc aagtagcagc atccaagcac 30

<210> 7

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 7

gtgcttcgca agtagcagca tccaagcact 30

<210> 8

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 8

gtgctctaca ggttgtttgc caggactttc 30

<210> 9

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 9

ggactttcca agacctccac tagaaaacac 30

<210> 10

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 10

gtatgaaact gggcttcata tcttttttgg 30

<210> 11

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 11

atatcttttt tggagcttat cccaacatac 30

<210> 12

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 12

gtattaatga tcggttgcaa tggaaggaac 30

<210> 13

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 13

attaatgatc ggttgcaatg gaaggaacac 30

<210> 14

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 14

ggtttgattt tcctgaaaca ttgcctgcac 30

<210> 15

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 15

ctgcaccctt aaatggaata tgggccatac 30

<210> 16

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 16

gtgttcctga tcgagtgaac gatgaggttt 30

<210> 17

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 17

atgaggtttt cattgcaatg tcaaaggcac 30

<210> 18

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 18

gtgcattctg attgctttaa accgatttct 30

<210> 19

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 19

gtattcagaa aatagaactt aatcctgatg 30

<210> 20

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 20

gaacttaatc ctgatggaac agtgaaacac 30

<210> 21

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 21

atcctgatgg aacagtgaaa cactttgcac 30

<210> 22

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 22

taactggaga tgcttatgtt tttgcaacac 30

<210> 23

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 23

caccagttga tatcttgaag cttcttgtac 30

<210> 24

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 24

gtacctcaag agtggaaaga aatatcttat 30

<210> 25

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 25

tatatggttt gatagaaaac tgaagaacac 30

<210> 26

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 26

gtgtttatgc ggacatgtca gtaacttgca 30

<210> 27

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 27

gcggacatgt cagtaacttg caaggaatac 30

<210> 28

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 28

tgcagaggaa tgggttggac ggagtgacac 30

<210> 29

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 29

agattctgaa gtatcatgtt gtgaagacac 30

<210> 30

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 30

gtatcatgtt gtgaagacac caagatctgt 30

<210> 31

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 31

tgaagggttc tatctagctg gtgactacac 30

<210> 32

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 32

gtgcagttct atctgggaag ctttgtgctc 30

<210> 33

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 33

gtgctcagtc tgtagtggag gattataaaa 30

<210> 34

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 34

attctgcttc tgtttggaat aagttggaag 30

<210> 35

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 35

tttctagtgc caaatgcatg ctagatttct 30

<210> 36

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 36

tttctcacag tttttctctt caggttatat 30

<210> 37

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 37

tttctcttca ggttatattt ctcgtatttc 30

<210> 38

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 38

tttctcgtat ttccttttcc taaaggattg 30

<210> 39

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 39

tttctggcat atataggtta ttattattat 30

<210> 40

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 40

attctccaga acaagattac ccatattatg 30

<210> 41

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 41

tttctagcaa agcaggggat gctagccttc 30

<210> 42

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 42

tttctatcat aaaaattttc atggcatatg 30

<210> 43

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 43

tttcttcaga taatcgagag gaaatctgag 30

<210> 44

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 44

gctttgagta ctttgcagat cttgcagaat 30

<210> 45

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 45

tttctctcat cctgcgctta tatttattta 30

<210> 46

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 46

atgtgttaag tttgaccaag tttatagaaa 30

<210> 47

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 47

tttcttctta atataatgat acacagctct 30

<210> 48

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 48

tttctcatat gttgtcagca tgattcactt 30

<210> 49

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 49

attctaattt gttgtttctt tgttatgttc 30

<210> 50

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 50

tttctttgtt atgttcttat cgacaattac 30

<210> 51

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 51

atcgacaatt acaaatttga ttctgagaat 30

<210> 52

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 52

attctgagaa tcatgttcgg gatgtgtatt 30

<210> 53

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 53

tttctactgc aggaactatc ctctcctgat 30

<210> 54

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 54

tttctgttag gttgcattta ctgggagtta 30

<210> 55

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 55

tttctgtgga tattttttgt tctctttcta 30

<210> 56

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 56

tttctactaa ctctctatta tcaattctca 30

<210> 57

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 57

attctcaatg ttgtcctttt cttttaactc 30

<210> 58

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 58

ttttctttta actcctttac tttttagaat 30

<210> 59

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 59

tttcttttaa ctcctttact ttttagaatt 30

<210> 60

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 60

attctagtag ccagttctat cctgtttctt 30

<210> 61

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 61

tttcttacct ttttatggtt cgtcttttct 30

<210> 62

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 62

tttcttgaca gcctgtttca ctggaacttg 30

<210> 63

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 63

attctgaagt gcgggacttt gtaaagcact 30

<210> 64

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 64

ctttttggtg tcttgggctt gttgcagaaa 30

<210> 65

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 65

actggtccca gacgagcagg atgcaagaaa 30

<210> 66

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 66

tttcttagaa gttacacctc aaggattagc 30

<210> 67

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 67

tttcttaaaa tgtgctattg attaaaaaga 30

<210> 68

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 68

ttataatgcc atgccaactg agtaaagaaa 30

<210> 69

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 69

tttcttttcg tggcaaggaa ggcagttagg 30

<210> 70

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 70

attcttagtt ctggaaaact gtgttcttta 30

<210> 71

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 71

attctagctg attatgaatt ctgtttatat 30

<210> 72

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 72

attctgttta tatttcacta attttgaatc 30

<210> 73

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 73

attcttcatg taagcattga atatatccgt 30

<210> 74

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 74

tttctgatca actcctgagt tcagattatt 30

<210> 75

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 75

ttgctcctga ccatgaaagt tttgcagaaa 30

<210> 76

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 76

aagttttgca gaaaaaaatc gctaaagaat 30

<210> 77

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 77

agaaaaaaat cgctaaagaa tttcaagaaa 30

<210> 78

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 78

tgtaaacttt ttctaaattc aaaaaagaaa 30

<210> 79

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 79

tttctaaatt caaaaaagaa atgccactga 30

<210> 80

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 80

cttttgtata tattttcaaa gcaccagaat 30

<210> 81

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 81

attctgactg gtggggttag acccaaggta 30

<210> 82

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 82

aggtacccac atatcattat gaagtagaaa 30

<210> 83

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 83

cttgtatgtt tttgtcagca tctggagaaa 30

<210> 84

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 84

tttctatatt gaacccacaa tcattactga 30

<210> 85

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 85

tttttggtcc agtgctctgt gtgaaagaat 30

<210> 86

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 86

attctgctac tactactttt gatagttatg 30

<210> 87

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 87

catggttgca tcaagctgat attcaagaat 30

<210> 88

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 88

attctatgca tctccagttc ttccctggac 30

<210> 89

<211> 30

<212> DNA

<213>rice (Oryza sativa)

<400> 89

atatttgacc cctttttttt gcaaaagaaa 30

Claims

1. A method, based on character slicing, for screening gRNA target sequences containing a Protospacer Adjacent Motif (PAM) structure, and its application, characterized by comprising the following steps: (1) reading in the deoxynucleotide (DNA) sequence file data of the target gene; (2) entering, through an interactive interface, the PAM sequence to be analyzed and screened; (3) a PAM interpretation module converts the specified PAM motif into a character list; (4) a substring comparison module compares the character list derived from the PAM motif, element by element, with the given positions of the sliding-window sequence and evaluates the logical relationship; (5) a global search module searches the given DNA sequence and its reverse complementary sequence for sequences that satisfy the condition and stores them in an initially empty list; (6) a file output module presents the results in text or spreadsheet format.

2. The method, based on character slicing, for screening gRNA target sequences containing a PAM structure, and its application, according to claim 1, characterized in that the method as a whole comprises a PAM interpretation module, a substring comparison module, a global search module, and a file output module.

3. The PAM interpretation module according to claim 2, characterized in that it first excludes illegal characters, then uses a hash algorithm to expand the degenerate sequence, and returns the result as a PAM character list.

4. The substring comparison module according to claim 2, characterized in that loop control is used to take out each character of the PAM list, each character is then compared with the sequence at the specified position of the sliding window, and the comparison result is returned, wherein the sliding window is extracted by character slicing.

5. The global search module according to claim 2, characterized in that, in the given DNA sequence and in the reverse complementary sequence obtained with the DNA reverse and complement modules, a sliding window successively takes out sequences of the specified length, which are compared using the substring comparison module, while the sequence information, G/C content, strand, and position information of each sequence that satisfies the condition are stored in a list.

6. The global search module according to claim 2, characterized in that special characters are used to control the output of results, and the sequences that satisfy the condition are presented one by one.
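
The modules recited in claims 2 to 6 can be illustrated with the minimal Python sketch below. It is not the code of the invention. The function names (`interpret_pam`, `matches_pam`, `global_search`, `format_hits`), the IUPAC degenerate-base table, the default 20-nt spacer length, the `pam_side` parameter, and the tab-separated output are all assumptions introduced for illustration. The "hash algorithm" of claim 3 is assumed here to be a dictionary lookup.

```python
# Illustrative sketch only: all names and defaults are assumptions,
# not the implementation described in the patent.

# IUPAC degenerate nucleotide codes, held in a dict (a hash table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def interpret_pam(pam):
    """Reject illegal characters, then expand each PAM symbol to its allowed bases."""
    pam = pam.upper()
    illegal = [ch for ch in pam if ch not in IUPAC]
    if illegal:
        raise ValueError(f"illegal PAM characters: {illegal}")
    return [IUPAC[ch] for ch in pam]


def matches_pam(window, pam_list, offset):
    """Loop over the PAM list and compare each entry with the window base at offset + i."""
    for i, allowed in enumerate(pam_list):
        if window[offset + i] not in allowed:
            return False
    return True


def reverse_complement(seq):
    """Complement every base, then reverse the string by slicing."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def global_search(dna, pam, spacer_len=20, pam_side="3prime"):
    """Slide a window over both strands and collect windows whose PAM positions match."""
    pam_list = interpret_pam(pam)
    dna = dna.upper()
    width = spacer_len + len(pam_list)
    # PAM sits after the spacer ("3prime") or before it ("5prime").
    offset = spacer_len if pam_side == "3prime" else 0
    hits = []  # starts as an empty list, as in step (5)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]  # character slicing
            if matches_pam(window, pam_list, offset):
                hits.append({
                    "strand": strand,
                    "start": start,  # 0-based, counted on the strand searched
                    "sequence": window,
                    "gc": round(gc_content(window), 3),
                })
    return hits


def format_hits(hits):
    """Render one hit per line, using tab and newline as separators."""
    rows = ["strand\tstart\tsequence\tgc"]
    for h in hits:
        rows.append(f"{h['strand']}\t{h['start']}\t{h['sequence']}\t{h['gc']}")
    return "\n".join(rows)


# Toy input, not taken from the sequence listing.
print(format_hits(global_search("ACGTAGGTT", "NGG", spacer_len=4)))
```

Storing each PAM position as a string of allowed bases reduces the comparison in `matches_pam` to a membership test, so any degenerate motif can be screened without enumerating every concrete PAM sequence.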