CN111508558B

CN111508558B - Method and system for designing point mutation model based on CRISPR-Cas9 technology

Info

Publication number: CN111508558B
Application number: CN202010205993.1A
Authority: CN
Inventors: 高翠; 黎肖容
Original assignee: Guangzhou Saiye Baimu Biotechnology Co ltd
Current assignee: Guangzhou Saiye Baimu Biotechnology Co ltd
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2021-12-14
Anticipated expiration: 2040-03-23
Also published as: CN111508558A

Abstract

The invention discloses a method and system for designing a point mutation model based on CRISPR-Cas9 technology. The method comprises the following steps: 1) determining the mutation site of the point mutation model animal; 2) calculating, from the mutation information, the coding range of the target exon containing the mutation point, or of the exon nearest the mutation site; 3) gRNA design: designing gRNAs within 50 bp upstream and downstream of the mutation point, computing a total score for each candidate gRNA sequence from its off-target score, sequence score, the score for the relative position of its cleavage site and the mutation point, and its back-cut score, and selecting candidate gRNA sequences according to the total scores; 4) outputting the design scheme. Some examples of the invention can automatically design a usable point mutation model from existing information, greatly improving the quality of the scheme, raising the experimental success rate and reducing dependence on skilled experimenters. They can also screen out more efficient point mutation model designs, improving the success rate or efficiency of point mutation model construction.

Description

Method and system for designing point mutation model based on CRISPR-Cas9 technology

Technical Field

The invention relates to a method and system for designing a point mutation model, and in particular to a method and system for designing a point mutation model based on CRISPR-Cas9 technology.

Background

Introducing point mutations into gene sequences is an important means of studying gene structure, function and their relationships, and an important route for studying monogenic genetic diseases. CRISPR/Cas9 technology makes it convenient and fast to modify gene sequences at defined sites and thereby construct point mutation animal models. CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats/Cas9) is a third-generation gene editing technology, following zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs).

The CRISPR/Cas9 system is a convenient and fast method of gene modification: the target site is located and recognized through the N20 sequence of the sgRNA, and multiple sgRNAs can be designed when several sites need to be modified. Recent studies have shown that cleavage can still occur when the sgRNA has 5 mismatched bases, which gives the CRISPR/Cas9 system a relatively high off-target effect. Current approaches to reducing the off-target and back-cut effects of CRISPR/Cas9 fall into two areas. The first is off-target analysis, which selects gRNAs with few off-target sites and many mismatches against other sites in the genome. The second is introducing synonymous mutations: synonymous codon changes, which leave the final protein sequence unchanged, increase the number of base mismatches between the oligo sequence and the gRNA, reducing or even avoiding back-cutting by the gRNA.

In the prior art, animal point mutation models are still designed by CRISPR-Cas9 mutation strategy experts or experienced experimenters through purely manual analysis, relying on long-accumulated experience. This approach is very inefficient and its success rate cannot be guaranteed: the resulting designs are usually feasible on paper, but gene editing schemes built from them often have a low success rate in actual experiments.

Disclosure of Invention

The invention aims to overcome at least one defect of the prior art and provides a method and a system for designing a point mutation model based on a CRISPR-Cas9 technology.

The technical scheme adopted by the invention is as follows:

A method for designing a point mutation model based on CRISPR-Cas9 technology comprises the following steps:

1) determining the mutation site of the point mutation model animal;

2) calculating, from the mutation information, the coding range of the target exon containing the mutation point, or of the exon nearest the mutation site;

3) gRNA design: designing gRNAs within 50 bp upstream and downstream of the mutation point, computing a total score for each candidate gRNA sequence from its off-target score, sequence score, the score for the relative position of its cleavage site and the mutation point, and its back-cut score, and selecting candidate gRNA sequences according to the total scores;

4) outputting the design scheme.
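As an illustration of step 3, the sketch below enumerates candidate gRNA sites within 50 bp of the mutation point. It assumes SpCas9 with an NGG PAM, a 20-nt spacer, 0-based indices on the forward strand, and that the whole 23-nt site (spacer plus PAM) must fall inside the window; the names find_candidates and Candidate are hypothetical. The cut position follows the usual SpCas9 convention of a blunt cut between the third and fourth bases upstream of the PAM.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Candidate:
    site: str    # 23-nt site written 5'->3' on its own strand: 20-nt spacer + NGG
    strand: str  # "+" = forward gRNA, "-" = reverse gRNA
    start: int   # 0-based start of the 23-nt window on the given (forward) strand
    cut: int     # 0-based index of the first forward-strand base 3' of the cut


def find_candidates(seq: str, mut: int, window: int = 50) -> list:
    """List NGG sites whose whole 23-nt window lies within `window` bp of the
    0-based mutation index `mut` (hypothetical helper, SpCas9 assumed)."""
    seq = seq.upper()
    lo, hi = max(0, mut - window), min(len(seq), mut + window + 1)
    found = []
    for i in range(lo, hi - 23 + 1):
        win = seq[i:i + 23]
        if win[21:] == "GG":  # forward: N20 + NGG; cut 3 bp 5' of the PAM
            found.append(Candidate(win, "+", i, i + 17))
        if win[:2] == "CC":   # reverse: CCN + N20 as read on the forward strand
            found.append(Candidate(revcomp(win), "-", i, i + 6))
    return found
```

For example, placing gRNA-A1 of Example 3 (GCCTCACTGAGCCGTGACTCAGG) between two poly-T flanks returns it as a forward candidate; the off-target, position and back-cut scores described below would then be applied to each candidate.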

In some examples, the off-target score is calculated as follows: no off-target sequence may differ from the gRNA by 1 base or fewer; otherwise, the gRNA is discarded directly. Off-target sequences are similar sequences in non-target regions, and to avoid off-target cutting (mis-cutting) by the gRNA, no off-target sequence may have 1 or fewer mismatched bases relative to the gRNA. The off-target score can be set to 0 or 1: it is 0 when an off-target sequence has 1 or fewer mismatched bases relative to the gRNA, and 1 otherwise.
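A minimal sketch of this off-target rule, assuming the candidate off-target sites have already been retrieved by a genome-wide search (not shown) and are compared with the gRNA over equal-length sequences; mismatches and off_target_score are illustrative names, not part of the original disclosure.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def off_target_score(grna: str, off_targets: list, max_mm: int = 1) -> int:
    """0 if any off-target site differs from the gRNA by <= max_mm bases
    (the gRNA is then discarded), otherwise 1."""
    return 0 if any(mismatches(grna, s) <= max_mm for s in off_targets) else 1
```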

In some examples, the relative position score is determined by the following grades:

grade 1: the gRNA is forward, and its right-end base lies 7 to 10 bases to the left of the mutation point; the farther the gRNA is from the point mutation, the better;

grade 2: the gRNA is reverse, and its left-end base lies 7 to 10 bases to the left of the mutation point; the farther the gRNA is from the point mutation, the better;

grade 3: the gRNA is forward, and its right-end base lies 11 to 30 bases to the right of the mutation point; the closer the gRNA is to the point mutation, the better;

grade 4: the gRNA is reverse, and its left-end base lies 8 to 30 bases to the left of the mutation point; the closer the gRNA is to the point mutation, the better;

grade 5: the gRNA is reverse, and its left-end base lies 11 to 35 bases to the right of the mutation point; the closer the gRNA is to the point mutation, the better.

All else being equal, the gRNA with the higher-priority (lower-numbered) grade is selected; within the same grade, the gRNA whose distance is better according to that grade's rule is preferred.
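The grading rules can be expressed as an ordered rule table. In the sketch below, which is one possible reading rather than the disclosed implementation, offset is the signed distance in bases from the mutation point to the gRNA's reference end base (the right-end base for a forward gRNA, the left-end base for a reverse gRNA), with negative values meaning left of the mutation. Because grades 2 and 4 overlap for reverse gRNAs 8 to 10 bases to the left, the first matching grade is used, which is an assumption. The second element of the returned key makes "farther is better" (grades 1 and 2) and "closer is better" (grades 3 to 5) sort the same way.

```python
from typing import Dict, List, Optional, Tuple

# (grade, strand, lowest offset, highest offset, farther_is_better); offsets
# are signed base counts from the mutation point, negative = left (assumption)
GRADE_RULES = [
    (1, "+", -10, -7, True),
    (2, "-", -10, -7, True),
    (3, "+", 11, 30, False),
    (4, "-", -30, -8, False),
    (5, "-", 11, 35, False),
]


def position_grade(strand: str, offset: int) -> Optional[Tuple[int, int]]:
    """Return a sortable (grade, distance_key) pair, or None if no grade applies.

    The first matching rule wins, which resolves the grade 2 / grade 4
    overlap in favour of grade 2.
    """
    for grade, rule_strand, lo, hi, farther_is_better in GRADE_RULES:
        if strand == rule_strand and lo <= offset <= hi:
            distance_key = -abs(offset) if farther_is_better else abs(offset)
            return grade, distance_key
    return None


def rank_by_position(candidates: Dict[str, Tuple[str, int]]) -> List[str]:
    """Order named candidates {name: (strand, offset)}; ungraded ones are dropped."""
    keys = {name: position_grade(s, o) for name, (s, o) in candidates.items()}
    return sorted((n for n, k in keys.items() if k is not None), key=lambda n: keys[n])
```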

In some examples, the gRNA back-cut score is calculated as follows:

it is determined by the number and positions of the bases that are mismatched between the target sequence (the corresponding region of the mutant sequence) and the gRNA sequence:

a. each mismatched base in the core region of the PAM sequence (namely the last two bases of the gRNA sequence, provided the changed dinucleotide is not GA or AG) scores 110 points;

b. each mismatched base at positions 1 to 7 of the gRNA scores 20 points;

c. each mismatched base at positions 8 to 13 of the gRNA scores 50 points;

d. each mismatched base at positions 14 to 20 of the gRNA scores 100 points.
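The back-cut rules translate into a per-position weighting. The sketch assumes gRNA positions are numbered 1 to 20 from the 5' end of the spacer (so positions 14 to 20 are PAM-proximal), that the mutant window is already aligned base-for-base with the 23-nt gRNA site, and that a mutated PAM core earns no credit when it becomes AG or GA; backcut_score is a hypothetical name.

```python
def backcut_score(grna_site: str, mutant_site: str) -> int:
    """Back-cut score of a 23-nt gRNA site (20-nt spacer + NGG PAM) against the
    aligned 23-nt window of the mutant sequence; higher = less back-cutting."""
    g, m = grna_site.upper(), mutant_site.upper()
    if len(g) != 23 or len(m) != 23:
        raise ValueError("expected 23-nt sites: 20-nt spacer + 3-nt PAM")
    score = 0
    for pos in range(1, 21):          # spacer positions, 1 = 5' (PAM-distal) end
        if g[pos - 1] != m[pos - 1]:
            if pos <= 7:
                score += 20           # rule b: positions 1-7
            elif pos <= 13:
                score += 50           # rule c: positions 8-13
            else:
                score += 100          # rule d: positions 14-20
    core = m[21:23]                   # PAM core, i.e. the "GG" of NGG
    if core not in ("AG", "GA"):      # rule a: AG/GA cores earn no credit (assumption)
        score += 110 * sum(x != y for x, y in zip(g[21:23], core))
    return score
```

Under this reading, gRNA-A1 of Example 3 illustrates the PAM rule: deleting GGA turns its AGG PAM into ATC in the mutant, two mismatches in the PAM core, giving 220 points, above the 100-point threshold.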

In some examples, the total score of a gRNA sequence is determined by the following criteria:

a. the off-target screening is passed;

b. the back-cut disruption requirement is met, i.e. the cumulative back-cut score is at least 100 points;

c. the number of amino acids carrying synonymous mutations, preferred in the order:

i. 0 amino acids with synonymous mutations;

ii. 1-2 amino acids with synonymous mutations;

iii. 3-4 amino acids with synonymous mutations;

d. the grade of the relative position of the gRNA cleavage site and the mutation point.
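Taken together, criteria a to d can be applied as a filter followed by a lexicographic sort. The sketch below is one interpretation: a and b act as pass/fail filters, and surviving designs are ordered by synonymous-mutation tier, then by raw synonymous count, then by position grade and within-grade distance. Ordering by raw count inside a tier is an assumption, suggested by the factor rankings reported in Example 2; the Design fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    off_target_ok: bool  # (a) passes off-target screening (off-target score 1)
    backcut: int         # (b) cumulative back-cut score for this gRNA
    n_synonymous: int    # (c) amino acids carrying synonymous mutations
    grade: int           # (d) relative position grade, 1 = best
    tiebreak: int = 0    # within-grade distance key, smaller = better


def synonymous_tier(n: int) -> int:
    """Tiers i-iii of criterion (c); more than 4 is ranked last (assumption)."""
    if n == 0:
        return 0
    if n <= 2:
        return 1
    if n <= 4:
        return 2
    return 3


def rank_designs(designs, min_backcut: int = 100):
    """Drop designs failing (a) or (b), then sort by (c) and (d)."""
    passing = [d for d in designs if d.off_target_ok and d.backcut >= min_backcut]
    return sorted(passing, key=lambda d: (synonymous_tier(d.n_synonymous),
                                          d.n_synonymous, d.grade, d.tiebreak))
```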

In some examples, the mutation site of the point mutation model animal is determined from known homologous mutation sites in other species, the method comprising:

aligning, with BLAST, the protein sequence carrying the known mutation site in another species against the homologous protein sequence of the model animal, and processing the amino acid substitution, base insertion and base deletion data to obtain the corresponding mRNA sequence of the model animal;

and converting the mRNA coordinates of the model animal into coordinates on its gene sequence to obtain the mutation site of the point mutation model animal.
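The coordinate conversion can be sketched in two small functions: one maps a residue number through a gapped pairwise alignment (for instance, one parsed from BLAST output; parsing is not shown), and one converts a residue number into the 1-based CDS coordinates of its codon. Both function names are hypothetical.

```python
def map_residue(aln_src: str, aln_dst: str, pos_src: int):
    """Map a 1-based residue number in the source protein to the model animal's
    protein through a gapped pairwise alignment ('-' = gap). Returns None when
    the source residue is aligned to a gap."""
    if len(aln_src) != len(aln_dst):
        raise ValueError("aligned strings must have equal length")
    i = j = 0  # residues consumed so far in source and destination
    for a, b in zip(aln_src, aln_dst):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and i == pos_src:
            return j if b != "-" else None
    return None


def codon_to_cds(aa_pos: int):
    """1-based CDS coordinates (first, last) of the codon for residue aa_pos."""
    first = 3 * (aa_pos - 1) + 1
    return first, first + 2
```

For instance, amino acid 230 in Example 1 corresponds to CDS bases 688-690, and the deletion of CDS bases 766-768 in Example 3 removes exactly the codon for amino acid 256.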

In some examples, during gRNA design, the probability of gRNA back-cutting is reduced by introducing synonymous mutations into the region of the mutant sequence targeted by the gRNA.

In some examples, the model animal includes, but is not limited to, common animal models such as mice, rabbits, pigs, dogs, primates and fruit flies.

In a second aspect of the present invention, there is provided:

A system for designing a point mutation model based on CRISPR-Cas9 technology comprises:

a mutation site determination module, for identifying or designing the mutation site of the point mutation model animal;

a gRNA design module, for designing gRNAs within 50 bp upstream and downstream of the mutation point, computing a total score for each candidate gRNA sequence from its off-target score, sequence score, the score for the relative position of its cleavage site and the mutation point, and its back-cut score, and selecting candidate gRNA sequences according to the total scores; and

a scheme output module, for outputting the designed point mutation construction scheme.

In some examples, the off-target score is calculated as follows: no off-target sequence may differ from the gRNA by 1 base or fewer; otherwise, the gRNA is discarded directly.

In some examples, the relative position score is calculated as described above for the method.

In some examples, the gRNA back-cut score is calculated as described above for the method.

In some examples, the total score of a gRNA sequence is calculated as described above for the method, with criteria including:

a. the off-target screening is passed;

c. the number of amino acids carrying synonymous mutations:

i. 0 amino acids with synonymous mutations;

ii. 1-2 amino acids with synonymous mutations;

iii. 3-4 amino acids with synonymous mutations.

If the mutation site in the model animal is known, it can be used directly; if the corresponding mutation site is not known in the model animal, it is deduced from the homologous sequences of other species in which the mutation is known. One example of mutation site conversion is as follows:

Transcripts from other species: the correspondence is found by aligning the other species' protein sequence against the model animal's protein sequence with BLAST; the amino acid substitution, base insertion and base deletion data are then processed one by one, and the mRNA coordinates are converted into coordinates on the model animal's gene sequence. For the two-sequence alignment, the widely used NCBI BLAST software was used. The conversion rule is: where base positions correspond, the corresponding bases are changed; where only amino acid positions correspond, the corresponding amino acid is changed.

Model animal transcripts: the amino acid substitution, base insertion and base deletion data are processed directly, and the mRNA coordinates are converted into coordinates on the gene sequence.
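Converting mRNA (CDS) coordinates into gene coordinates amounts to walking the coding segments of the exons in transcript order, which also identifies the coding exon that contains the mutation. The sketch assumes those segments are supplied as inclusive genomic (start, end) pairs, listed 5' to 3' along the mRNA, with start <= end even for minus-strand genes; cds_to_genomic is a hypothetical helper, and locating the nearest exon for non-coding positions is not covered.

```python
def cds_to_genomic(cds_pos: int, cds_segments: list, strand: str = "+"):
    """Return (coding_segment_number, genomic_coordinate) for a 1-based CDS position.

    cds_segments: inclusive (start, end) genomic coordinates of the coding
    part of each exon, in transcript (5'->3') order, with start <= end.
    The segment number counts only the supplied coding segments.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    remaining = cds_pos
    for seg_no, (start, end) in enumerate(cds_segments, start=1):
        length = end - start + 1
        if remaining <= length:
            # plus strand counts up from start; minus strand counts down from end
            coord = start + remaining - 1 if strand == "+" else end - remaining + 1
            return seg_no, coord
        remaining -= length
    raise ValueError("CDS position lies beyond the end of the coding sequence")
```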

Endonuclease scheme: the design can be carried out using an existing method or according to the following principles:

using data on commonly used endonucleases, an endonuclease scheme capable of distinguishing wild-type from mutant sequences is obtained by synonymously mutating zero or more amino acids around the mutation point.

Endonuclease schemes are of two types: introducing a cleavage site into the mutant sequence, or destroying in the mutant a cleavage site present in the wild type, so that only one of the wild-type and mutant sequences is cut.

The recognition sequence of the chosen endonuclease must not occur more than once within 500 bp upstream and downstream of the mutation point; otherwise, the PCR bands cannot show whether the experiment succeeded.

The data fall into two cases: either the original mutation already forms a suitable digestion scheme, so no additional synonymous mutation is needed, or synonymous mutations are introduced to create a new endonuclease recognition sequence or to destroy an existing one.
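Checking the uniqueness rule requires matching degenerate (IUPAC) recognition sequences such as CYCGRG on both strands, including overlapping occurrences. The sketch below is an illustration under the assumption that the check is run on whichever allele (wild type or mutant) is expected to be cut; count_sites is a hypothetical name, and the regular-expression approach is one simple choice among several.

```python
import re

_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
          "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
          "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
_IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def site_pattern(recognition: str) -> str:
    """Regular expression for a (possibly degenerate) recognition sequence."""
    return "".join(_IUPAC[b] for b in recognition.upper())


def count_sites(seq: str, recognition: str, mut: int, flank: int = 500) -> int:
    """Count recognition sites (both strands, overlaps included) within
    `flank` bp of the 0-based mutation index `mut`."""
    region = seq[max(0, mut - flank): mut + flank + 1].upper()
    rc = recognition.upper().translate(_IUPAC_COMPLEMENT)[::-1]
    starts = set()
    for motif in {recognition.upper(), rc}:
        # zero-width lookahead so overlapping sites (e.g. GCGCGC) are all found
        for m in re.finditer("(?=" + site_pattern(motif) + ")", region):
            starts.add(m.start())
    return len(starts)
```

A scheme satisfies the rule when count_sites returns exactly 1 on the allele that should be cut; for example, CCCGAG from Example 2 matches the AvaI pattern CYCGRG once.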

Designing synonymous mutations to disrupt gRNA back-cutting:

the mutation design must account for gRNA back-cutting: if a sequence highly similar to the gRNA target still exists in the new mutant sequence, the gRNA can still cut the mutant. To prevent the gRNA from cutting this new target, the mutant sequence at the position corresponding to the gRNA must be altered by introducing synonymous mutations that add mismatches.
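Candidate synonymous changes can be enumerated from the standard genetic code. The sketch below builds the standard codon table (NCBI translation table 1) and lists synonymous codons ordered by the number of changed bases; choosing among them (for example, to raise the back-cut score without disturbing a restriction site) is left to the scoring steps, and the names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, ... order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(codon: str) -> list:
    """Other codons encoding the same amino acid, fewest changed bases first."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    alternatives = [c for c, a in CODON_TABLE.items() if a == aa and c != codon]
    return sorted(alternatives,
                  key=lambda c: (sum(x != y for x, y in zip(c, codon)), c))
```

For example, ATT (isoleucine) can be replaced by ATA or ATC, and Example 1 uses ATT->ATC; TGG (tryptophan) has no synonymous codon, so it offers no option.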

PCR primer design: primers can be designed using an existing method or based on the following principles.

A set of PCR primers is designed within 500 bp upstream and downstream of the mutation point; when an endonuclease scheme is used, the two fragments produced by cleavage should differ in length by more than 100 bp where possible.

PCR primer pair design: a set of PCR primers is designed with the open-source tool Primer3, and primer specificity must be analyzed.

Sequencing primer pair: primers giving a product close to 600 bp, with endonuclease fragments differing in length by more than 100 bp, are preferred.
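The fragment-length preference can be checked with simple arithmetic on the PCR product. The sketch treats each cut as a single position and ignores sticky-end overhangs, which is an approximation; the function names are hypothetical.

```python
def digest_fragments(product_len: int, cut_after: list) -> list:
    """Fragment lengths of a linear PCR product cut after each listed base
    (1-based); sticky-end overhangs are ignored."""
    bounds = [0] + sorted(cut_after) + [product_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def bands_distinguishable(fragments: list, min_gap: int = 100) -> bool:
    """True if every pair of fragments differs in length by more than min_gap bp."""
    return all(abs(x - y) > min_gap
               for k, x in enumerate(fragments) for y in fragments[k + 1:])
```

Applied to the examples, the HhaI-digested mutant product of Example 1 (369 + 240 bp from 609 bp) differs by 129 bp and passes, whereas the AvaI fragments of Example 2 (499 + 428 bp from 927 bp) differ by only 71 bp.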

Mutation model report: an existing report format may be used, or the report may include the following information:

once the result for a single mutation point model has been calculated, it is stored together with the version number of the calculation; when the same mutation data are submitted again under the same version, the page retrieves the stored result directly from the database. Every scheme with a suitable gRNA design must include a model with a suitable strategy for preventing back-cutting. The model report contains a gene basic information and mutation information module, a wild-type and mutant sequence alignment display module, an oligo sequence display module, a PCR primer display module, an endonuclease scheme display module (shown only if such a scheme exists), a gRNA off-target information display module, and a two-species protein sequence alignment display module (shown only if such data exist).

1) Gene basic information and mutation information module: because the mutation data and gene information differ for most mutation schemes, this module dynamically generates report content that meets the description specification according to those differences.

2) Wild-type and mutant sequence alignment display module: displays the target mutation and the introduced synonymous mutations in the wild-type and mutant sequences in different font colors, distinguishes deletions, insertions and substitutions with different display styles, and also displays the gRNA sequence information.

3) Oligo sequence display module: displays the single-stranded nucleotide sequence finally used for the model.

4) PCR primer display module: each scheme contains a set of PCR primers, consisting of a forward and reverse primer pair.

5) Endonuclease scheme display module: this module is displayed only when an endonuclease scheme exists. It shows the enzyme name and the sizes of the fragments obtained by cleaving the PCR products of the wild-type and mutant sequences.

6) gRNA off-target information display module: shows the gRNA sequences of the gRNA scheme and their off-target information.

7) Two-species protein sequence alignment display module: when the transcript corresponding to the mutation information is not from the model animal, this module displays the amino acid sequence alignment between the transcripts of the source species and the model animal. This module is likewise displayed only when such data exist.

The information in the report may be added to or reduced according to actual needs.

The beneficial effects of the invention are as follows:

some examples of the invention can automatically design a usable point mutation model from existing information, greatly improving the quality of the scheme, raising the experimental success rate and reducing dependence on skilled experimenters.

Some examples of the invention can screen out more efficient point mutation model designs, improving the success rate or efficiency of point mutation model construction.

In some embodiments of the invention, a user can obtain a scheme report containing the gRNA scheme, PCR primer pair, digestion verification scheme and other information simply by selecting parameters and specifying the point mutation.
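The storage rule for computed results described above amounts to memoizing each result under a (version, mutation) key, so a new calculation version triggers a fresh computation. The sketch below uses an in-memory dict as a stand-in for the database; DesignCache and the compute callback are hypothetical names, and a real system would persist the store.

```python
from typing import Callable, Dict, Hashable, Tuple


class DesignCache:
    """Memoize design results per (calculation version, mutation data)."""

    def __init__(self, compute: Callable[[Hashable], dict]):
        self._compute = compute                     # runs the full design pipeline
        self._store: Dict[Tuple[str, Hashable], dict] = {}

    def get(self, mutation: Hashable, version: str) -> dict:
        key = (version, mutation)
        if key not in self._store:                  # first request for this version
            self._store[key] = self._compute(mutation)
        return self._store[key]                     # later requests reuse the result
```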

Detailed Description

The technical scheme of the invention is further explained below with reference to examples.

In the following examples, a gRNA sequence labeled reverse lies on the reverse complement of the gene sequence; during model design analysis, a reverse gRNA is labeled and analyzed as its reverse complement.

Example 1

Species: mouse

A target gene: elf4

Target transcript: NM_019680.2

Mutation target: amino acid 230 encoded by the mRNA is mutated from W to R; the specific codon change is TGG->CGC

Finally designing a model:

wild-type sequence:

In the above sequence, the DNA bases of the amino acid to be mutated are shown in bold, and the DNA bases of the amino acid carrying the synonymous mutation are shown in bold italics. The underlined sequences indicate the positions of the two gRNAs, and "/" indicates where each gRNA cleaves (the last three bases of each gRNA are the PAM sequence, and cleavage occurs between the third and fourth bases upstream of the PAM).

Mutant sequence:

The mutant sequence is the final DNA sequence of the model and also the oligo sequence used in the experiment; it consists of a 60 bp homologous sequence, the sequence containing all mutated bases, and another 60 bp homologous sequence. The sequence marked with a wavy line above it is the endonuclease recognition sequence.

The gRNAs designed by the scheme are the following two sequences:

gRNA-A1 (reverse): CTGCGTCCACTTAATGTACTTGG (SEQ ID NO.: 3)

gRNA-B1 (forward): CCTGCCCCAAGTACATTAAGTGG (SEQ ID NO.: 4)

The enzyme digestion scheme is as follows:

Endonuclease: HhaI, recognition sequence GCGC.

Model interpretation: because the target mutation is a single amino acid codon, only one gRNA needs to act in a single experiment, and the candidate gRNA scheme is designed as two independent gRNAs. In addition, a synonymous mutation is introduced at the amino acid two positions before the mutation point, converting the ATT codon into ATC (both encode isoleucine), so that the sequence at the gRNA cleavage site in the mutant carries enough mismatched bases relative to the target gRNA sequence. The HhaI recognition sequence GCGC is introduced at the target mutation position, which forms the experimental digestion scheme.

The design process comprises the following steps:

1. Designing suitable gRNAs

By alignment, gRNA No. 2 has an off-target sequence.

The available gRNAs near the mutation position are analyzed and their off-target sites across the whole genome are obtained; gRNAs that have off-target sites identical to the gRNA or differing by only one base are eliminated.

gRNAs are selected within the region 50 bp on either side of the point mutation site. This project has 12 candidate gRNAs, 6 of which are screened out on the basis of off-target data and relative position distance (gRNA No. 2 has a highly similar off-target site, and the others lie outside the positions considered). The remaining gRNAs can be combined pairwise into gRNA groups, except gRNAs No. 4 and No. 5, which target the same position in opposite orientations and cannot serve as a two-gRNA cutting scheme.

2. Design of the enzyme digestion scheme

The target mutation changes TGG to CGC, which directly introduces the HhaI recognition sequence GCGC at the mutation position (no other HhaI recognition site exists within 500 bp on either side). No new synonymous mutation is therefore needed to design the digestion verification scheme, and synonymous mutations introduced later must not change the bases of the recognition site.

3. gRNA back-cut analysis

Every gRNA must have a back-cut score, computed between the wild-type and mutant sequences, of at least 100. Taking the two gRNAs selected by the scheme model as an example, an additional synonymous mutation is needed to avoid gRNA back-cutting; the synonymous mutation introduced by the scheme is ATT>ATC, with both codons encoding isoleucine.

In the table, the sequences are SEQ ID NOs.: 16-28 from top to bottom.

The bold sequence CGC is the mutation desired by the user, the additional synonymous mutation introduced to prevent gRNA back-cutting is in bold italics, and the endonuclease recognition sequence is underlined. A design in which all gRNA back-cut scores reach 100 and the digestion scheme is satisfied is a qualified model design. In this project, introducing one synonymous mutation meets all requirements of the model design, so models introducing multiple synonymous mutations were not analyzed further.

Other gRNA combinations were analyzed in the same way; the introduced synonymous mutations had to disrupt gRNA back-cutting without destroying the existing cleavage site or the introduced digestion scheme. The scheme was finally confirmed by considering the number of synonymous mutations, the gRNA grade, the digestion scheme, the back-cut disruption score, the distance from the mutation point and other factors. This project finally selected gRNAs No. 3 and No. 4 as the gRNA scheme for the model; each gRNA is used alone, with one serving as an alternative to the other.

The final model oligo sequence was:

Donor oligo

4. Design of a set of PCR primers

Primers are designed to amplify the target region as conventional PCR primers, and the PCR products are sequenced to verify the experimental result. Because this project has a suitable digestion scheme, the two fragments after digestion should differ in length by 100 bp where possible (so that two distinct bands appear on the electrophoresis gel when the experiment succeeds).

PCR primer pair:

mElf4-F: GTGGTATAGATACTTCTTGGCTG (SEQ ID NO.: 30)

mElf4-R: ATGTTAGAGCCATTTCCTAGAG (SEQ ID NO.: 31)

Fragment lengths after HhaI digestion:

Wild type: 609 bp

Mutant: 369 bp, 240 bp

5. Experimental verification data

For this project, models were designed with a single-gRNA scheme (Table 1). The table shows only the partial region containing the sequence changes; the experiments used an oligo sequence extending 60 bp on each side of the mutated bases.

gRNA No. 2, which has a highly similar off-target site, and the gRNAs outside the defined range (Nos. 6 to 10) were not modeled and have no experimental data. A model was designed for each remaining gRNA (proto denotes the original wild-type sequence and mod denotes the model sequence designed for each gRNA; the table shows only the regions containing the gRNA cleavage recognition region and the base changes). The statistical positive rate (number of positives/number of births) is broadly consistent with the position grade of the scheme design, and grade 1 gave the highest success rate, 24.24%.

Example 2

Species: mouse

A target gene: agpat5

Target transcript: NM_026792.3

Mutation target: amino acid 310 encoded by the mRNA is mutated from G to E; the specific codon change is GGG->GAG

Final design model

Wild-type sequence:

Mutant sequence:

In the wild-type sequence, the underlined sequences are the gRNAs selected for the two model designs, and the double-underlined bases are the corresponding PAM sequences. The mutant sequence is the final design model, comprising the mutated region and 60 bp of homologous sequence on each side (identical to the DNA sequence of the wild-type model animal and used for homologous pairing in the experiment). Bold italic bases represent synonymous mutations and bold bases represent the target mutation. The sequence marked with a wavy line above it is the recognition sequence of the restriction enzyme.

The gRNAs designed by the scheme are the following two sequences:

gRNA-A1 (reverse): CTGGAATGAACACTTTTCCCAGG (SEQ ID NO.: 34)

gRNA-B1 (forward): AAAGAAGAAACAAATTTCCTGGG (SEQ ID NO.: 35)

The enzyme digestion scheme is as follows:

The endonuclease is AvaI, with recognition sequence CYCGRG, where Y and R are degenerate codes: Y denotes C or T, and R denotes A or G.

Model interpretation:

Because the target mutation is a single amino acid codon, only one gRNA needs to be used in a single experiment, and the candidate gRNA scheme is designed as two separate gRNAs ("/" marks the cleavage sites of the two gRNAs). The digestion scheme introduces a synonymous mutation in the amino acid to the left of the target amino acid, forming the AvaI recognition sequence CCCGAG, which matches the recognition pattern CYCGRG. The target mutation together with the synonymous mutation of the digestion scheme already satisfies the back-cut screening condition for both gRNAs, so no further synonymous mutation is needed.

The design process comprises the following steps:

1. Designing suitable gRNAs

In the table, gRNAs No. 1 and Nos. 3 to 5 correspond to SEQ ID NOs.: 36-39.

The available gRNAs near the mutation position are analyzed and their off-target sites across the whole genome are obtained; gRNAs that have off-target sites identical to the gRNA or differing by only one base must be eliminated.

gRNAs are selected within 50 bp on either side of the point mutation site. This project has 6 candidate gRNAs, 2 of which are screened out because of their relative position distance; the remaining gRNAs are combined pairwise into alternative gRNA design schemes.

2. Design of enzyme digestion scheme

Synonymous mutations of one or more amino acids are introduced within 20 bp on either side of the target mutation, and the sequence is analyzed for restriction sites that either create a new recognition site or destroy a recognition site present on the wild-type sequence, subject to the requirement that no recognition site of the same enzyme exists within 500 bp on either side of the target mutation. The table shows the digestion schemes near the mutation point; underlined characters mark altered bases, and each three bases correspond to one amino acid.

The number of amino acids with synonymous mutations affects later model selection, and for amino acids with more than one synonymous codon, the number of changed bases affects the gRNA back-cut score, so one amino acid mutation can have one or more base-change options. For base changes at the same position, the optimal option has already been selected, particularly where the base lies exactly at the PAM position of a gRNA sequence.

The digestion schemes closest to the target mutation at amino acid 310 of the transcript, namely AvaI/BsoBI (amino acids 309-310) and NciI (amino acids 308-310), are the most likely selection strategies for the model.

3. gRNA back-cut analysis

Beyond the target mutation, every combination of the different digestion schemes with the different gRNAs was analyzed next (Table 2). The group of gRNAs finally selected by the scheme is used as an example of the analysis; other data were processed in the same way and are not described again here.

In Table 2, the bold GAG is the mutation desired by the user, and the overline marks the recognition sequence of the corresponding endonuclease. The gRNA sequences are first scored for back-cut avoidance using the synonymous mutations introduced by the endonuclease scheme. Some schemes meet the design requirement directly; for schemes that do not (a single gRNA scoring below 100), further synonymous mutations are introduced until the score is 100 or more. Table 2 shows that, in this project, introducing one synonymous mutation (CCT->CCC, bold italics) meets all requirements of the model design and introduces an AvaI/BsoBI recognition site; AvaI was selected.

Other gRNA combinations were analyzed in the same way; the introduced synonymous mutations had to disrupt gRNA back-cutting without destroying the existing cleavage site or the introduced digestion scheme. The scheme was finally confirmed by considering the number of synonymous mutations, the gRNA grade, the digestion scheme, the back-cut disruption score, the distance from the mutation point and other factors. This project finally selected gRNAs No. 2 and No. 6 as the gRNA scheme for the model; each gRNA is used alone, with one serving as an alternative to the other.

The final model oligo sequence was:

Donor oligo

4. Design of a set of PCR primers

PCR primer pair:

mAgpat5-F: GAGTTCAGGTTCATTTCTCAGT (SEQ ID NO.: 41)

mAgpat5-R: AGCATAAACTCCACTTAGCTTC (SEQ ID NO.: 42)

Fragment lengths after AvaI digestion:

Wild type: 927 bp

Mutant: 499 bp, 428 bp

The two digestion fragments differ by 71 bp, so this primer pair does not reach the preferred length difference of 100 bp.

5. Experimental verification data

Project a model was designed using a single gRNA protocol (Table 3), and the data presented in the table contain only a partial region of sequence alterations, and the actual experiment used data was an oligo sequence extended 60bp in length before and after the mutated base.

Four scheme models were used in the validation experiments (in the table, the first row is the wild-type DNA sequence and each model row is part of the oligo sequence used in the experiment). gRNA2 and gRNA6 correspond to the same model sequence, which introduces one synonymous mutation; only the gRNA sequences differ. Besides the enzyme digestion scheme, the models for the other two gRNAs each also introduce one synonymous mutation to reduce the likelihood of gRNA back-cutting.

Ranking by individual factors:

Ranking by gRNA position: gRNA4 > gRNA6 > gRNA2 > gRNA1

Ranking by number of synonymous mutations: gRNA6 = gRNA2 > gRNA4 = gRNA1

Overall ranking: gRNA6 > gRNA2 > gRNA4 > gRNA1

According to the positive rates in Table 3, gRNA4 and gRNA6 cut at very similar positions, so the difference in their final positive rates is attributed to the different synonymous mutations they introduce. gRNA6 uses the same model as gRNA2 but cuts at a different position, and its positive rate is also higher than that of gRNA2, broadly consistent with the position ranking. The positive rates of model 2 and model 4 differ little, but given the research objectives of the experiment, introducing fewer synonymous mutations was also an important goal. Overall, the experimental results agree with the gRNA screening and model design of the scheme.

Example 3

Species: mouse

Target gene: G6pc3

Target transcript: NM_175935.3

Mutation target: deletion of bases 766 to 768 of the CDS sequence of mRNA, specifically deletion of bases GGA

Other requirements: no enzyme digestion scheme is required

Final design model

Wild-type sequence:

Mutant sequence:

GAACGGCCCGAGTGGGTGCACATGGACAGTCGGCCTTTTGCCTCACTGAGCCGTGACTCA---TCTGCCCTGGGTCTGGGCATTGCCCTCCACACTCCCTGCTATGCCCAGATACGGCGGGCG (SEQ ID NO. 44)

The scheme designed the following two gRNAs:

gRNA-A1 (forward): GCCTCACTGAGCCGTGACTCAGG (SEQ ID NO. 45)

gRNA-B1 (forward): CCGTGACTCAGGATCTGCCCTGG (SEQ ID NO. 46)

Enzyme digestion scheme: none; this project does not require one.

Model interpretation:

The target mutation deletes one amino acid codon, so a single gRNA is sufficient in one experiment; the candidate gRNA scheme consists of two independent gRNAs (/ marks the cleavage sites of the two gRNAs). The experiment does not require an enzyme digestion scheme, so none was designed for the model. The base deletion shifts the sequence, leaving each gRNA with enough mismatched bases against the new sequence, so the final model needs no synonymous mutation.

Analysis of the design process:

1. Design of suitable gRNAs

None of the gRNA sequences has an off-target site with 1 mismatch (1 mm), and none has a perfectly matched off-target sequence. In the table, the nucleic acid sequences numbered 1 and 4 to 14 are SEQ ID NOs. 47 to 58, respectively.
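The off-target rule of claim 1 (score 0 if any off-target sequence differs from the gRNA by at most one base, otherwise 1) can be sketched as below. The genome-wide search that produces the off-target sites is not shown and is assumed to have been done already. Comparing full 23-nt sequences by Hamming distance (no bulges or gaps) is an assumption, and the two off-target sites in the example are hypothetical.

```python
from typing import List

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))

def off_target_score(grna: str, off_target_sites: List[str]) -> int:
    """0 if any off-target site differs from the gRNA by <= 1 base, else 1."""
    if any(hamming(grna, site) <= 1 for site in off_target_sites):
        return 0
    return 1

grna_a1 = "GCCTCACTGAGCCGTGACTCAGG"  # gRNA-A1, SEQ ID NO. 45
print(off_target_score(grna_a1, []))                          # 1
print(off_target_score(grna_a1, ["GCCTCACTGAGCCGTGACTCAGA"]))  # 0 (1 mismatch)
print(off_target_score(grna_a1, ["ACCTCACTGAGCCGTGACTCAGA"]))  # 1 (2 mismatches)
```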

Candidate gRNA sequences were searched in the regions to the left and right of the point mutation site (50 bp upstream and downstream). This project has 14 candidate gRNAs; 1 was excluded because of its distance from the mutation point, and the remaining gRNAs were combined pairwise into gRNA design schemes. Entries 2 and 8 are reverse complements at the same position and therefore cannot be combined.
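A candidate search of this kind can be sketched as a scan for SpCas9 protospacers on both strands. The sketch assumes an NGG PAM, a blunt cut 3 bp upstream of the PAM, SEQ ID NO. 43 as the wild-type region (removing its bases 60-62, 0-based, gives SEQ ID NO. 44), and a filter that keeps candidates whose cut site lies within 50 bp of the deleted bases. `find_candidates` is a hypothetical helper; reverse-strand hits are written as forward-strand 23-mers starting with CC, as in SEQ ID NOs. 47 to 58.

```python
import re
from typing import List, Tuple

def find_candidates(seq: str, mut_start: int, mut_end: int,
                    flank: int = 50) -> List[Tuple[str, str, int]]:
    """NGG candidates whose cut site lies within `flank` bp of the mutation.

    Returns (strand, 23-mer as written on the forward strand, cut index);
    cut index i means Cas9 cuts between seq[i - 1] and seq[i].
    """
    seq = seq.upper()
    hits = []
    # Forward strand: 20-nt protospacer + NGG; cut 3 bp upstream of the PAM.
    for m in re.finditer(r"(?=([ACGT]{21}GG))", seq):
        hits.append(("+", m.group(1), m.start() + 17))
    # Reverse strand: CCN + 20 nt on the forward strand (NGG on the other strand).
    for m in re.finditer(r"(?=(CC[ACGT]{21}))", seq):
        hits.append(("-", m.group(1), m.start() + 6))
    return [h for h in hits if mut_start - flank <= h[2] <= mut_end + flank]

# Assumed wild-type region: SEQ ID NO. 43; deleted GGA at 0-based 60..62.
WT = ("gaacggcccg" "agtgggtgca" "catggacagt" "cggccttttg" "cctcactgag"
      "ccgtgactca" "ggatctgccc" "tgggtctggg" "cattgccctc" "cacactccct"
      "gctatgccca" "gatacggcgg" "gcg")

for strand, site, cut in find_candidates(WT, 60, 63):
    print(strand, site, cut)
```

The 23-mer CCGTGACTCAGGATCTGCCCTGG (gRNA-B1) is reported twice, once as a forward candidate with a TGG PAM and once as a reverse-strand candidate with a CCG (NGG on the other strand), which illustrates how two entries can share the same position.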

2. Design of the enzyme digestion scheme

This project requires no enzyme digestion scheme, so no enzyme digestion analysis is performed.

3. gRNA back-cutting analysis

Enzyme digestion need not be considered, and this is a base-deletion project, so for now only gRNAs whose sequence contains the deletion position are analyzed. gRNAs outside the deletion region would necessarily require additional synonymous mutations to avoid gRNA back-cutting.

In the table, the 14 nucleic acid sequences of entries 1 to 7 are SEQ ID NOs. 59 to 72 from top to bottom; the odd-numbered SEQ IDs are the original sequences and the even-numbered SEQ IDs are the model sequences.

The table gives the back-cut score of each gRNA when no synonymous mutation is introduced (number of synonymous mutations = 0); the requirement is a score of 100 or more. All 7 gRNAs score 100 points or more without synonymous mutations, so every gRNA in the table can be used directly. Ranked by our position grade and distance, the preferred gRNAs are the 3rd and the 11th.
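Only two of the back-cut scoring rules of claim 1 are stated in this text: rule a (110 points per mismatched base in the last two bases of the gRNA sequence, unless those bases become AG or GA) and rule d (100 points per mismatched base at gRNA positions 14 to 20). The sketch below implements just these two rules, so it gives a lower bound; mismatches elsewhere contribute nothing here, although the omitted rules b and c may score them. It assumes "the last two bases" means the GG of an NGG PAM in a 23-nt protospacer-plus-PAM sequence, and that original and model sequences are already aligned in the gRNA's own 5'->3' orientation. `backcut_score_partial` is a hypothetical name.

```python
def backcut_score_partial(grna: str, model: str) -> int:
    """Back-cut score from rules a and d only (rules b and c are not given).

    grna, model: aligned 23-nt sequences (20-nt protospacer + 3-nt PAM) in
    the gRNA's own 5'->3' orientation. The result is a lower bound.
    """
    grna, model = grna.upper(), model.upper()
    if len(grna) != 23 or len(model) != 23:
        raise ValueError("expected 23-nt protospacer + PAM sequences")
    score = 0
    # Rule a: last two bases (the GG of NGG), 110 points per mismatched base,
    # unless the model's last two bases become AG or GA.
    if model[21:23] not in ("AG", "GA"):
        score += 110 * sum(grna[k] != model[k] for k in (21, 22))
    # Rule d: positions 14-20 (1-based), 100 points per mismatched base.
    score += 100 * sum(grna[k] != model[k] for k in range(13, 20))
    return score

# SEQ ID NO. 61 (original) vs SEQ ID NO. 62 (model): the deletion hits the PAM.
print(backcut_score_partial("GCCTCACTGAGCCGTGACTCAGG",
                            "GCCTCACTGAGCCGTGACTCATC"))  # 220
```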

The final model oligo sequence was:

Donor oligo

TGATTGTTCAGCGGGTGACACATCTTTCTTTTCCTTTCTCCTCCAGAAACCCATGTATGAA(---)CAATGGAAGGTTGTCGAGGAGATAAATGGAAACAATTATGTTTACATAGACCCGACGCAA (SEQ ID NO. 73)

4. Design of a set of PCR primers

Primers were designed in the conventional way to amplify the target region, and the PCR products are sequenced to verify the experimental results. Because this scheme has no enzyme digestion step, it only needs to meet the requirement that the final PCR product be about 600 bp long.

PCR primer pair:

mKit-F: AGAGGGAGAGATGATGTATTTG (SEQ ID NO. 74)

mKit-R: GACTTAATCAAGCCATATGCAG (SEQ ID NO. 75)

5. Experimental verification data

Deletion project: a model was designed for this project using a single-gRNA protocol (Table 3). The table shows only the part of each sequence that was altered; the oligo actually used in the experiment extends 60 bp on each side of the mutated base. Because this is a deletion project, no enzyme digestion scheme needs to be considered.

Experiments were completed for 9 gRNA models in total; models that would require a synonymous mutation at a distant position were not tested. Entries with and without synonymous mutations were not compared with each other, because for the purpose of this experiment a protocol that needs no synonymous mutation is always preferred over one that does.

Comparing entries 5 and 7 with the other data shows that the gRNA positions agree with our statistical rules. The models without synonymous mutations, ordered by position, are:

Position ranking: gRNA3 > gRNA2 > gRNA11 > gRNA11 > gRNA10 > gRNA8 > gRNA6

The statistical results of the experiment are also consistent with this ranking.

SEQUENCE LISTING

<110> Guangzhou Seiki Baimu Biotech Co., Ltd

<120> method and system for designing point mutation model based on CRISPR-Cas9 technology

<130>

<160> 75

<170> PatentIn version 3.5

<210> 1

<211> 127

<212> DNA

<213> Artificial sequence

<400> 1

ttatctgtgg gagttcctcc tggctcttct gcaagacaga aacacctgcc ccaagtacat 60

taagtggacg cagagagaga agggcatctt caagttggtg gactccaagg ctgtgtccaa 120

gctgtgg 127

<210> 2

<211> 127

<212> DNA

<213> Artificial sequence

<400> 2

ttatctgtgg gagttcctcc tggctcttct gcaagacaga aacacctgcc ccaagtacat 60

caagcgcacg cagagagaga agggcatctt caagttggtg gactccaagg ctgtgtccaa 120

gctgtgg 127

<210> 3

<211> 23

<212> DNA

<213> Artificial sequence

<400> 3

ccaagtacat taagtggacg cag 23

<210> 4

<211> 23

<212> DNA

<213> Artificial sequence

<400> 4

cctgccccaa gtacattaag tgg 23

<210> 5

<211> 23

<212> DNA

<213> Artificial sequence

<400> 5

ccccaagtac attaagtgga cgc 23

<210> 6

<211> 23

<212> DNA

<213> Artificial sequence

<400> 6

cccaagtaca ttaagtggac gca 23

<210> 7

<211> 23

<212> DNA

<213> Artificial sequence

<400> 7

cctgccccaa gtacattaag tgg 23

<210> 8

<211> 23

<212> DNA

<213> Artificial sequence

<400> 8

cttcaagttg gtggactcca agg 23

<210> 9

<211> 23

<212> DNA

<213> Artificial sequence

<400> 9

cctgccccaa gtacattaag tgg 23

<210> 10

<211> 23

<212> DNA

<213> Artificial sequence

<400> 10

cttcaagttg gtggactcca agg 23

<210> 11

<211> 23

<212> DNA

<213> Artificial sequence

<400> 11

agagaagggc atcttcaagt tgg 23

<210> 12

<211> 23

<212> DNA

<213> Artificial sequence

<400> 12

gaagggcatc ttcaagttgg tgg 23

<210> 13

<211> 23

<212> DNA

<213> Artificial sequence

<400> 13

cctcctggct cttctgcaag aca 23

<210> 14

<211> 23

<212> DNA

<213> Artificial sequence

<400> 14

cctggctctt ctgcaagaca gaa 23

<210> 15

<211> 23

<212> DNA

<213> Artificial sequence

<400> 15

taagtggacg cagagagaga agg 23

<210> 16

<211> 23

<212> DNA

<213> Artificial sequence

<400> 16

aagtggacgc agagagagaa ggg 23

<210> 17

<211> 23

<212> DNA

<213> Artificial sequence

<400> 17

tgcgtccact tgatgtactt agg 23

<210> 18

<211> 29

<212> DNA

<213> Artificial sequence

<400> 18

cctgccccaa gtacattaag tggacgcag 29

<210> 19

<211> 29

<212> DNA

<213> Artificial sequence

<400> 19

cctgccccaa gtacattaaa cgcacgcag 29

<210> 20

<211> 29

<212> DNA

<213> Artificial sequence

<400> 20

cctgccccaa gtacattaag cgcacgcag 29

<210> 21

<211> 29

<212> DNA

<213> Artificial sequence

<400> 21

cctgccccaa gtacatcaag cgcacgcag 29

<210> 22

<211> 29

<212> DNA

<213> Artificial sequence

<400> 22

cctgccccaa gtatattaag cgcacgcag 29

<210> 23

<211> 29

<212> DNA

<213> Artificial sequence

<400> 23

cctgccccaa atacattaag cgcacgcag 29

<210> 24

<211> 29

<212> DNA

<213> Artificial sequence

<400> 24

cctgcccgaa gtacattaag cgcacgcag 29

<210> 25

<211> 29

<212> DNA

<213> Artificial sequence

<400> 25

cctgtcccaa gtacattaag cgcacgcag 29

<210> 26

<211> 29

<212> DNA

<213> Artificial sequence

<400> 26

cctgccccaa gtacattaag cgcactcag 29

<210> 27

<211> 29

<212> DNA

<213> Artificial sequence

<400> 27

cctgccccaa gtacattaag cgcacgcaa 29

<210> 28

<211> 29

<212> DNA

<213> Artificial sequence

<400> 28

cctgccccaa gtacattaag cgcactcaa 29

<210> 29

<211> 127

<212> DNA

<213> Artificial sequence

<400> 29

ttatctgtgg gagttcctcc tggctcttct gcaagacaga aacacctgcc ccaagtacat 60

caagcgcacg cagagagaga agggcatctt caagttggtg gactccaagg ctgtgtccaa 120

gctgtgg 127

<210> 30

<211> 23

<212> DNA

<213> Artificial sequence

<400> 30

gtggtataga tacttcttgg ctg 23

<210> 31

<211> 22

<212> DNA

<213> Artificial sequence

<400> 31

atgttagagc catttcctag ag 22

<210> 32

<211> 122

<212> DNA

<213> Artificial sequence

<400> 32

cataggttgc tcatagagtt ctatgattca ccagatccag aaagaagaaa caaatttcct 60

gggaaaagtg ttcattccag actaagtgtg aagaagactt taccttcagt gttgatcttg 120

gg 122

<210> 33

<211> 122

<212> DNA

<213> Artificial sequence

<400> 33

cataggttgc tcatagagtt ctatgattca ccagatccag aaagaagaaa caaatttccc 60

gagaaaagtg ttcattccag actaagtgtg aagaagactt taccttcagt gttgatcttg 120

gg 122

<210> 34

<211> 23

<212> DNA

<213> Artificial sequence

<400> 34

cctgggaaaa gtgttcattc cag 23

<210> 35

<211> 23

<212> DNA

<213> Artificial sequence

<400> 35

aaagaagaaa caaatttcct ggg 23

<210> 36

<211> 23

<212> DNA

<213> Artificial sequence

<400> 36

ccagactaag tgtgaagaag act 23

<210> 37

<211> 23

<212> DNA

<213> Artificial sequence

<400> 37

ccagaaagaa gaaacaaatt tcc 23

<210> 38

<211> 23

<212> DNA

<213> Artificial sequence

<400> 38

gaaagaagaa acaaatttcc tgg 23

<210> 39

<211> 23

<212> DNA

<213> Artificial sequence

<400> 39

ccagatccag aaagaagaaa caa 23

<210> 40

<211> 127

<212> DNA

<213> Artificial sequence

<400> 40

tggccatagg ttgctcatag agttctatga ttcaccagat ccagaaagaa gaaacaaatt 60

tcccgagaaa agtgttcatt ccagactaag tgtgaagaag actttacctt cagtgttgat 120

cttgggg 127

<210> 41

<211> 22

<212> DNA

<213> Artificial sequence

<400> 41

gagttcaggt tcatttctca gt 22

<210> 42

<211> 22

<212> DNA

<213> Artificial sequence

<400> 42

agcataaact ccacttagct tc 22

<210> 43

<211> 123

<212> DNA

<213> Artificial sequence

<400> 43

gaacggcccg agtgggtgca catggacagt cggccttttg cctcactgag ccgtgactca 60

ggatctgccc tgggtctggg cattgccctc cacactccct gctatgccca gatacggcgg 120

gcg 123

<210> 44

<211> 120

<212> DNA

<213> Artificial sequence

<400> 44

gaacggcccg agtgggtgca catggacagt cggccttttg cctcactgag ccgtgactca 60

tctgccctgg gtctgggcat tgccctccac actccctgct atgcccagat acggcgggcg 120

<210> 45

<211> 23

<212> DNA

<213> Artificial sequence

<400> 45

gcctcactga gccgtgactc agg 23

<210> 46

<211> 23

<212> DNA

<213> Artificial sequence

<400> 46

ccgtgactca ggatctgccc tgg 23

<210> 47

<211> 23

<212> DNA

<213> Artificial sequence

<400> 47

ccttttgcct cactgagccg tga 23

<210> 48

<211> 23

<212> DNA

<213> Artificial sequence

<400> 48

agtgggtgca catggacagt cgg 23

<210> 49

<211> 23

<212> DNA

<213> Artificial sequence

<400> 49

ccctgggtct gggcattgcc ctc 23

<210> 50

<211> 23

<212> DNA

<213> Artificial sequence

<400> 50

cctcactgag ccgtgactca gga 23

<210> 51

<211> 23

<212> DNA

<213> Artificial sequence

<400> 51

cctgggtctg ggcattgccc tcc 23

<210> 52

<211> 23

<212> DNA

<213> Artificial sequence

<400> 52

ccgtgactca ggatctgccc tgg 23

<210> 53

<211> 23

<212> DNA

<213> Artificial sequence

<400> 53

tcaggatctg ccctgggtct ggg 23

<210> 54

<211> 23

<212> DNA

<213> Artificial sequence

<400> 54

ctcaggatct gccctgggtc tgg 23

<210> 55

<211> 23

<212> DNA

<213> Artificial sequence

<400> 55

cgtgactcag gatctgccct ggg 23

<210> 56

<211> 23

<212> DNA

<213> Artificial sequence

<400> 56

ccacactccc tgctatgccc aga 23

<210> 57

<211> 23

<212> DNA

<213> Artificial sequence

<400> 57

cctccacact ccctgctatg ccc 23

<210> 58

<211> 23

<212> DNA

<213> Artificial sequence

<400> 58

ccctccacac tccctgctat gcc 23

<210> 59

<211> 23

<212> DNA

<213> Artificial sequence

<400> 59

ccgtgactca ggatctgccc tgg 23

<210> 60

<211> 23

<212> DNA

<213> Artificial sequence

<400> 60

gagccgtgac tcatctgccc tgg 23

<210> 61

<211> 23

<212> DNA

<213> Artificial sequence

<400> 61

gcctcactga gccgtgactc agg 23

<210> 62

<211> 23

<212> DNA

<213> Artificial sequence

<400> 62

gcctcactga gccgtgactc atc 23

<210> 63

<211> 23

<212> DNA

<213> Artificial sequence

<400> 63

cctcactgag ccgtgactca gga 23

<210> 64

<211> 23

<212> DNA

<213> Artificial sequence

<400> 64

cctcactgag ccgtgactca tct 23

<210> 65

<211> 23

<212> DNA

<213> Artificial sequence

<400> 65

ccgtgactca ggatctgccc tgg 23

<210> 66

<211> 23

<212> DNA

<213> Artificial sequence

<400> 66

gagccgtgac tcatctgccc tgg 23

<210> 67

<211> 23

<212> DNA

<213> Artificial sequence

<400> 67

tcaggatctg ccctgggtct ggg 23

<210> 68

<211> 23

<212> DNA

<213> Artificial sequence

<400> 68

gactcatctg ccctgggtct ggg 23

<210> 69

<211> 23

<212> DNA

<213> Artificial sequence

<400> 69

ctcaggatct gccctgggtc tgg 23

<210> 70

<211> 23

<212> DNA

<213> Artificial sequence

<400> 70

tgactcatct gccctgggtc tgg 23

<210> 71

<211> 23

<212> DNA

<213> Artificial sequence

<400> 71

cgtgactcag gatctgccct ggg 23

<210> 72

<211> 23

<212> DNA

<213> Artificial sequence

<400> 72

agccgtgact catctgccct ggg 23

<210> 73

<211> 121

<212> DNA

<213> Artificial sequence

<400> 73

tgattgttca gcgggtgaca catctttctt ttcctttctc ctccagaaac ccatgtatga 60

acaatggaag gttgtcgagg agataaatgg aaacaattat gtttacatag acccgacgca 120

a 121

<210> 74

<211> 22

<212> DNA

<213> Artificial sequence

<400> 74

agagggagag atgatgtatt tg 22

<210> 75

<211> 22

<212> DNA

<213> Artificial sequence

<400> 75

gacttaatca agccatatgc ag 22

Claims

1. A method for designing a point mutation model based on CRISPR-Cas9 technology, comprising the following steps:

determining the mutation site of the point mutation model animal;

calculating the coding range of a target exon where a mutation point is located or the nearest exon of the mutation site according to the mutation information;

gRNA design: designing gRNAs within 50 bp upstream and downstream of the mutation point, calculating the total score of different gRNA sequence combinations from the off-target score of the gRNAs, the score for the relative position between the gRNA cutting position and the mutation point, and the gRNA back-cutting score, and determining candidate gRNA sequence combinations based on the total score;

outputting a design scheme;

the off-target score is calculated as follows: the off-target score is 0 or 1; when an off-target sequence differs from the gRNA by no more than 1 base, the score is 0; otherwise, the off-target score is 1;

the relative position score is calculated as follows:

under otherwise equal conditions, gRNAs of a higher position grade are selected, and among gRNAs of the same grade, the one at the better distance is selected;

the gRNA back-cutting score is calculated as follows:

a. a mismatched base located in the last two bases of the gRNA sequence scores 110 points per base, provided the last two bases do not become AG or GA; otherwise, it scores no points;

d. a mismatched base located at positions 14 to 20 of the gRNA scores 100 points per base;

the total score of a gRNA sequence combination is calculated as follows:

a. the off-target screen is passed, i.e., the off-target score = 1;

b. the back-cut disruption setting is met, i.e., the cumulative gRNA back-cutting score is 100 points or higher;

c. the number of amino acids carrying synonymous mutations:

i. 0 amino acids with synonymous mutations;

ii. 1-2 amino acids with synonymous mutations;

iii. 3-4 amino acids with synonymous mutations;

2. The method of claim 1, wherein the mutation site of the point mutation model animal is determined from known homologous mutation sites in other species, comprising the following step:

aligning, with BLAST, the protein sequence containing the known mutation site in another species against the protein sequence of the homologous gene of the model animal, and processing the amino acid substitution, base insertion and base deletion data to obtain the mRNA sequence of the model animal;

3. The method of claim 1, wherein, in the gRNA design, synonymous mutations are introduced into the gRNA target sequence to increase the number of bases mismatched between the gRNA and the target sequence, thereby reducing the probability of gRNA back-cutting.

4. The method of claim 1, wherein the model animal is a mouse, rabbit, pig, dog, primate or fruit fly.

5. A system for designing a point mutation model based on CRISPR-Cas9 technology, comprising:

a gRNA design module, configured to design gRNAs within 50 bp upstream and downstream of the mutation point, calculate the total score of different gRNA sequence combinations from the off-target score of the gRNAs, the score for the relative position between the gRNA cutting position and the mutation point, and the gRNA back-cutting score, and determine candidate gRNA sequence combinations based on the total score; and

a scheme output module: outputting the designed point mutation construction scheme;

under otherwise equal conditions, gRNAs of a higher position grade are selected, and among gRNAs of the same grade, the one at the better distance is selected;

a. a mismatched base located in the last two bases of the gRNA sequence scores 110 points per base, provided the last two bases do not become AG or GA; otherwise, it scores no points;

a. the off-target screen is passed, i.e., the off-target score = 1;

c. the number of amino acids carrying synonymous mutations:

i. 0 amino acids with synonymous mutations;

ii. 1-2 amino acids with synonymous mutations;

iii. 3-4 amino acids with synonymous mutations;