CN115148281A - Automatic design method and system for gene editing point mutation scheme - Google Patents

Automatic design method and system for gene editing point mutation scheme

Info

Publication number
CN115148281A
Authority
CN
China
Prior art keywords
mutation
sequence
grna
gene
scheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210753169.9A
Other languages
Chinese (zh)
Other versions
CN115148281B (en
Inventor
万翠荣
林剑锋
郑虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yuanjing Biotechnology Co ltd
Original Assignee
Guangzhou Yuanjing Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yuanjing Biotechnology Co ltd filed Critical Guangzhou Yuanjing Biotechnology Co ltd
Priority to CN202210753169.9A priority Critical patent/CN115148281B/en
Publication of CN115148281A publication Critical patent/CN115148281A/en
Application granted granted Critical
Publication of CN115148281B publication Critical patent/CN115148281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of gene editing, and in particular to a method and a system for automatically designing a gene editing point mutation scheme. The method comprises the following steps: step A: determining the mutation type and mutation site according to a positioning method; step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if so, executing step C: designing a Base Editing scheme according to the mutation type and screening gRNA sequences; if not, executing step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, and carrying out gRNA sequence screening, Oligo sequence selection and mutation site introduction; step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis result; step F: generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result. The method can design the gene editing scheme automatically, realize point mutation gene editing on a gene, and improve efficiency and accuracy.

Description

Automatic design method and system for gene editing point mutation scheme
Technical Field
The invention relates to the technical field of gene editing, in particular to a method and a system for automatically designing a gene editing point mutation scheme.
Background
In the prior art, CRISPR/Cas9 uses a specifically designed guide RNA to recognize a target sequence and guide the Cas9 endonuclease to cut upstream of the PAM of the target sequence, producing a double-strand break in the DNA at the target site; after the double-strand break, the cell repairs the cleavage site by non-homologous end joining (NHEJ) or homology-directed repair (HDR), which realizes gene knockout, knock-in or point mutation at the DNA level. Because it is simple to operate, low in cost and high in editing efficiency, CRISPR/Cas9 has become one of the most popular techniques in the life sciences and has been widely used in gene-editing-related functional studies of many model organisms, such as mammals (rats, mice, pigs, rabbits, monkeys, etc.), zebrafish, stem cells, tumor cell lines, bacteria and fungi.
Designing a CRISPR/Cas9-based point mutation gene editing scheme seems easy, but a beginner in gene editing may need several days, and the result is not necessarily ideal; even researchers experienced in gene editing may fail to obtain an optimal scheme because their information sources are incomplete and individual knowledge is limited. Manual design is therefore cumbersome and time-consuming, and developing a computer-based method that designs gene editing schemes automatically has become an urgent problem to be solved.
Disclosure of Invention
In view of the above-mentioned drawbacks, the present invention aims to provide a method and a system for automatically designing a gene editing point mutation scheme, which can comprehensively analyze the sequences and gRNAs near a target mutation site, automatically design a gene editing scheme, realize point mutation gene editing on a gene, and improve the efficiency and accuracy of scheme selection.
In order to achieve the purpose, the invention adopts the following technical scheme:
an automatic design method of a gene editing point mutation scheme comprises the following steps:
step A: determining the mutation type and mutation site according to a positioning method, wherein the positioning method comprises an Amino Acid method, an SNP method, a Chromosomal Location method and a Nucleotide Sequence method;
step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if so, executing step C, and if not, executing step D;
step C: designing a Base Editing scheme according to the mutation type, and screening gRNA sequences, wherein the gRNA sequence needs to contain the mutation site;
step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, and carrying out gRNA sequence screening, Oligo sequence selection and mutation site introduction, wherein the gRNA sequence needs to contain the mutation site;
step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis result;
step F: generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result.
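Step E can be illustrated with a minimal Python sketch. The source does not say how sequence complexity is measured, so the Shannon entropy of the base composition is used here purely as a stand-in; the function names and the 0-based `site` coordinate are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def flanking_window(gene_seq: str, site: int, flank: int = 500) -> str:
    """Return the site plus up to `flank` bases on each side (0-based site)."""
    start = max(0, site - flank)
    end = min(len(gene_seq), site + flank + 1)
    return gene_seq[start:end].upper()


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq`; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits (assumed complexity measure)."""
    if not seq:
        return 0.0
    total = len(seq)
    counts = Counter(seq.upper())
    return -sum((n / total) * log2(n / total) for n in counts.values())


window = flanking_window("ACGT" * 500, site=1000)
print(len(window), gc_content(window), shannon_entropy(window))
```

A low entropy or an extreme GC fraction in the window would flag a region that may be hard to amplify or sequence; the thresholds for such a flag are not given in the source.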
Preferably, in step A, the Amino Acid method comprises the steps of: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to acquire the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts and one of the mutation types, and inputting the related parameters to obtain the final mutation site;
the SNP method comprises the following steps: inputting the number of the SNP locus, inquiring an NCBI database to obtain the data of the gene of the locus, the position of the locus on a chromosome and the allelic base of the SNP locus, and calculating the position of the SNP locus on the gene according to the positions of the gene and the SNP locus on the chromosome; selecting one of the allelic bases, and automatically calculating the mutation type according to the base and the allelic base of the original sequence;
the Chromosomal Location method comprises the following steps: selecting species and genome version, selecting chromosome number, and inquiring a custom database to obtain the gene sequence of the chromosome; selecting one of the mutation types, inputting parameters of a starting position and an ending position, querying an Ensembl database to obtain a base corresponding to the position segment and a gene comprising the position segment, and calculating the position of the mutation position segment on the gene;
the Nucleotide Sequence method comprises the following steps: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to acquire the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts, inputting the parameters 'wild type sequence' and 'mutated sequence', and automatically calculating the mutation type and mutation site from the difference between the two sequences and the position of the wild type sequence on the gene.
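The Nucleotide Sequence method can be sketched as a comparison of the two input sequences. The version below is one possible reading, not the patented implementation: it trims the shared prefix and suffix, classifies what remains, and locates the wild type sequence on the gene with a plain substring search. The function name, the returned tuple and the 0-based coordinates are assumptions.

```python
def classify_mutation(gene_seq: str, wild_type: str, mutated: str):
    """Return (mutation_type, gene_position, removed_bases, added_bases).

    gene_position is the 0-based index on gene_seq where the change starts.
    """
    wt, mt = wild_type.upper(), mutated.upper()
    anchor = gene_seq.upper().find(wt)
    if anchor < 0:
        raise ValueError("wild type sequence not found on the gene")

    # Length of the common prefix.
    p = 0
    while p < len(wt) and p < len(mt) and wt[p] == mt[p]:
        p += 1
    # Length of the common suffix, never overlapping the prefix.
    s = 0
    while (s < len(wt) - p and s < len(mt) - p
           and wt[len(wt) - 1 - s] == mt[len(mt) - 1 - s]):
        s += 1

    removed = wt[p:len(wt) - s]
    added = mt[p:len(mt) - s]
    if not removed and not added:
        kind = "none"
    elif not removed:
        kind = "insertion"
    elif not added:
        kind = "deletion"
    elif len(removed) == len(added):
        kind = "mutation"
    else:
        kind = "deletion + insertion"
    return kind, anchor + p, removed, added


print(classify_mutation("TTTACGTACGG", "ACGTAC", "ACGGAC"))
```

Trimming the prefix first means that changes inside repeats are reported at the rightmost equivalent position; a production tool would need an explicit normalization rule, which the source does not specify.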
Preferably, in the Amino Acid method, one of the mutation types is selected and the relevant parameters are input to obtain the final mutation site, with the following steps for each mutation type:
when the mutation type is selected as "mutation": inputting the parameters 'amino acid sequence number' and 'mutated amino acid', and back-calculating the exact position of the mutation site on the gene from the amino acid sequence number;
when the mutation type is selected as "deletion": inputting parameters of a starting position and an ending position, acquiring a sequence to be deleted from the coding sequence according to the two positions, and calculating the accurate position of a deleted sequence segment on the gene according to the distribution position of a coding region on the gene;
when the mutation type is selected as "insertion": inputting parameters of 'initial position' and 'inserted sequence', and calculating the position of the inserted sequence on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "deletion + insertion": inputting parameters of a starting position, an ending position and an inserted sequence, acquiring a sequence segment to be deleted from the coding sequence according to the starting position and the ending position, and calculating the accurate position of the deleted sequence segment on the gene.
Preferably, in step B, judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method uses the following judgment conditions: a single base is mutated, and the base changes from C to T or from G to A.
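The judgment condition above reduces to a small predicate. The following Python sketch assumes the reference and mutated bases are passed as strings; the function name is hypothetical.

```python
def suits_base_editing(ref: str, alt: str) -> bool:
    """True only for a single-base change of C to T or G to A."""
    return (ref.upper(), alt.upper()) in {("C", "T"), ("G", "A")}


# Route the design to step C (Base Editing) or step D (RNP scheme).
step = "C" if suits_base_editing("C", "T") else "D"
print(step)
```

Because the comparison is made on whole strings, any multi-base input such as "CC" to "TT" fails the check and falls through to the RNP scheme.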
Preferably, the step D: designing an RNP scheme based on CRISPR and Cas9 according to mutation types, and carrying out gRNA screening, oligo sequence selection and mutation site introduction, wherein the method specifically comprises the following steps:
step D1: screening gRNA sequences according to the mutation type, and grading the gRNAs obtained by screening;
step D2: selecting gRNA1 and gRNA2 with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2 respectively;
step D4: introducing a mutation site into the Oligo sequence.
Preferably, in step D1, gRNA sequences are screened according to the mutation type and the gRNAs obtained by screening are graded; the mutation types comprise four types, namely mutation, insertion, deletion, and deletion + insertion, and different mutation types have corresponding operation steps;
when the mutation type is "mutation", the following steps are specifically included:
step a1: screening all gRNA sequences within 50 bp upstream and downstream of the mutation site, and acquiring information such as the specificity score, off-target condition and cutting score of each gRNA sequence through the CRISPER online software;
step a2: according to the specificity score, cutting score, off-target condition and GC content of the gRNAs, screening out gRNAs with a specificity score above 80, a cutting efficiency above 0.5, no off-target sites and a GC content of 40-60%;
step a3: grading the gRNAs according to the distance N between the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23 (A is the highest grade);
step a4: going through the grades from high to low, selecting the gRNA with the smallest relative position N as gRNA1; if two or more gRNA sequences exist in the same grade, selecting the one with the highest specificity score as gRNA2, and otherwise selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively: taking the cleavage site and the end of the mutation site farthest from the cleavage site as starting points, extending outwards by 60 bp from each, wherein the sequence within this range is the Oligo sequence;
step a6: introducing a mutation site in Oligo;
step a7: analyzing the complexity and GC content of sequences within 500bp ranges before and after the mutation site;
when the mutation type is "insertion", "deletion" or "deletion + insertion", the method specifically comprises the following steps:
step b1: screening all gRNA sequences within 35bp range upstream and downstream of the deletion/insertion site, and acquiring information such as specificity score, off-target condition, cutting score and the like of the gRNA sequences through online software;
step b2: according to the specificity score, cutting score, off-target condition and GC content of the gRNAs, screening out gRNAs with a specificity score above 80, a cutting efficiency above 0.5, no off-target sites and a GC content of 40-60%;
step b3: grading the gRNAs according to the relative position N of the cleavage site and the editing region:
A: N = 0, with the cleavage site inside the editing region; B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N ≤ 23;
step b4: if two or more gRNAs exist in grade A, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing an Oligo: if the cleavage site is inside the editing region, extending 60 bp on each side with the editing region as the center to serve as the Oligo sequence; if the cleavage site is not inside the editing region, taking the cleavage site and the farthest end of the editing region as starting points and extending outwards by 60 bp from each;
step b6: introducing a mutation site in an Oligo;
step b7: analyzing the complexity and GC content of the sequences within 500 bp before and after the mutation site.
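The filtering and grading in steps a2 to a4 and b3 can be sketched in Python as follows. This is one possible reading under stated assumptions: the `Guide` record and its field names are hypothetical, the scores are assumed to have been fetched already from the online tool, "above" is read as a strict inequality, and the 40-60% GC range is treated as inclusive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Guide:
    name: str
    specificity: float   # specificity score
    efficiency: float    # cutting score
    off_targets: int     # number of predicted off-target sites
    gc_fraction: float   # GC content as a fraction, 0.0 to 1.0
    distance: int        # N: distance from cleavage site to mutation site


def passes_filter(g: Guide) -> bool:
    """Thresholds from steps a2 and b2."""
    return (g.specificity > 80 and g.efficiency > 0.5
            and g.off_targets == 0 and 0.40 <= g.gc_fraction <= 0.60)


def grade_point_mutation(n: int) -> str:
    """Step a3: A is the highest grade."""
    return "A" if n <= 5 else "B" if n <= 10 else "C" if n <= 23 else "D"


def grade_indel(n: int) -> Optional[str]:
    """Step b3: N = 0 means the cleavage site lies inside the editing region."""
    if n == 0:
        return "A"
    return "B" if n <= 5 else "C" if n <= 10 else "D" if n <= 23 else None


def pick_pair(guides: List[Guide]) -> Tuple[Optional[Guide], Optional[Guide]]:
    """Step a4 for the "mutation" type: choose gRNA1 and gRNA2."""
    kept = sorted((g for g in guides if passes_filter(g)),
                  key=lambda g: (grade_point_mutation(g.distance), g.distance))
    if not kept:
        return None, None
    first, rest = kept[0], kept[1:]
    same_grade = [g for g in rest
                  if grade_point_mutation(g.distance) == grade_point_mutation(first.distance)]
    if same_grade:
        second = max(same_grade, key=lambda g: g.specificity)
    else:
        second = rest[0] if rest else None  # smallest N in the next populated grade
    return first, second


guides = [Guide("g1", 90, 0.6, 0, 0.50, 3), Guide("g2", 95, 0.7, 0, 0.45, 5),
          Guide("g3", 99, 0.8, 0, 0.50, 8), Guide("g4", 70, 0.9, 0, 0.50, 1)]
print([g.name if g else None for g in pick_pair(guides)])
```

Sorting on the pair (grade, N) keeps the grade order and the distance order in a single pass, so the first element is always the closest guide in the best populated grade.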
Further, in step D4, introducing the mutation site into the Oligo sequence specifically comprises the following steps:
step D41: inputting an Oligo sequence, judging whether the PAM structure of the gRNA on the Oligo sequence is changed or not, if so, outputting the Oligo sequence, and if not, executing the step D42;
step D42: judging whether the gRNA is in the gene editing area, if so, executing a step D43; if not, random mutations are introduced in the gRNA: NGG → NGC, then execute step D46;
step D43: sequentially introducing synonymous mutations into the gRNA from the 3 'end to the 5' end, judging whether the gRNA is on the PAM structure NGG, if so, executing a step D44, and if not, executing a step D45;
step D44: judging whether the PAM structure is changed or not, if not, returning to the step D43; if yes, go to step D46;
step D45: judging whether the mutation points overlap with the target mutation points; if so, executing step D46, and otherwise returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
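The PAM check in step D41 and the NGG → NGC change in step D42 can be sketched as below. This is a simplified illustration only: it assumes the PAM lies on the same strand as the Oligo string and that `pam_start` is the 0-based index of the N in NGG, both of which are hypothetical conventions. It does not implement the synonymous-mutation search of steps D43 to D45.

```python
def pam_intact(oligo: str, pam_start: int) -> bool:
    """Step D41 check: does the 3-base window at pam_start still read NGG?"""
    return oligo[pam_start + 1:pam_start + 3].upper() == "GG"


def disrupt_pam(oligo: str, pam_start: int) -> str:
    """Step D42 change: rewrite NGG as NGC when the PAM is still intact."""
    if not pam_intact(oligo, pam_start):
        return oligo  # PAM already changed by the target mutation
    last = pam_start + 2
    return oligo[:last] + "C" + oligo[last + 1:]


oligo = "ACGTTGGAC"          # PAM "TGG" starts at index 4
edited = disrupt_pam(oligo, 4)
print(edited, pam_intact(edited, 4))
```

Changing the PAM on the repair template keeps Cas9 from cutting the allele again after it has been edited, which is why the Oligo is output unchanged when the target mutation has already altered the PAM.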
A gene editing point mutation scheme automatic design system adopts the steps of the gene editing point mutation scheme automatic design method, and comprises a mutation site reading module, a judgment module, a first scheme design module, a second scheme design module, a calculation module and a result generation module;
the mutation site reading module is used for determining mutation types and mutation sites according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for Base Editing to carry out point mutation gene Editing;
the first scheme design module is used for designing a Base Editing scheme and evaluating and screening gRNAs;
the second scheme design module is used for designing an RNP scheme based on CRISPR and Cas9, and carrying out gRNA screening, oligo sequence selection and mutation site introduction;
the calculation module is used for performing sequence complexity analysis and GC content calculation on sequences in the range of 500bp before and after the mutation site and storing the analysis result;
and the result generation module is used for generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method for automated design of a gene editing point mutation scheme as described above when executing the program.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of automatically designing a gene editing point mutation scheme as described above.
One of the above technical solutions has the following beneficial effects: to address the problems that manual scheme design is cumbersome and time-consuming and depends heavily on the experience of experimental technicians, a method is provided that comprehensively analyzes the sequences and gRNAs near the target mutation site and designs the gene editing scheme automatically, realizing point mutation gene editing on a gene. One of four positioning methods for the mutation site is selected: Amino Acid, single nucleotide polymorphism (SNP), Chromosomal Location, or Nucleotide Sequence. The mutation type and mutation site are determined according to the selected positioning method, the sequences before and after the mutation site are comprehensively analyzed, and two types of point mutation schemes are designed, namely a Base Editing scheme and an RNP point mutation scheme. The Base Editing scheme comprises gRNA evaluation and screening, and the RNP point mutation scheme comprises gRNA screening, Oligo selection and mutation site introduction. The online point mutation scheme replaces the cumbersome and time-consuming manual scheme: the user only needs to select the parameters of the site information, and an optimal scheme can be obtained through gRNA screening and the introduction of additional mutations on the Oligo, saving scientific research users a large amount of time.
Drawings
FIG. 1 is a method flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic diagram of the scheme for introducing a mutation site in Oligo according to the present invention;
FIG. 3 is a graph of output results for one embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The following describes a method for automatically designing a gene editing point mutation scheme according to an embodiment of the present invention with reference to fig. 1 to 2, including the following steps:
step A: determining mutation types and mutation sites according to a positioning method, wherein the positioning method comprises an Amino Acid method, an SNP method, a Chromosomal Location method and a Nucleotide Sequence method;
step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if so, executing step C, and if not, executing step D;
step C: designing a Base Editing scheme according to mutation types, and screening gRNA sequences, wherein the gRNA sequences need to contain mutation sites;
step D: designing an RNP scheme based on CRISPR and Cas9 according to mutation types, and carrying out gRNA sequence screening, oligo sequence selection and mutation site introduction, wherein the gRNA sequence needs to contain mutation sites;
step E: performing sequence complexity analysis and GC content calculation on the gene sequences in the range of 500bp before and after the mutation site, and storing the analysis result;
step F: and generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result.
In the prior art, CRISPR/Cas9 uses a specifically designed guide RNA to recognize a target sequence and guide the Cas9 endonuclease to cut upstream of the PAM of the target sequence, producing a double-strand break in the DNA at the target site; after the double-strand break, the cell repairs the cleavage site by non-homologous end joining (NHEJ) or homology-directed repair (HDR), which realizes gene knockout, knock-in or point mutation at the DNA level. Because it is simple to operate, low in cost and high in editing efficiency, CRISPR/Cas9 has become one of the most popular techniques in the life sciences and has been widely used in gene-editing-related functional studies of many model organisms, such as mammals (rats, mice, pigs, rabbits, monkeys, etc.), zebrafish, stem cells, tumor cell lines, bacteria and fungi.
Designing a CRISPR/Cas9-based point mutation gene editing scheme does not seem difficult, but a beginner in gene editing may need several days, and the result is not necessarily ideal; even researchers experienced in gene editing may have incomplete information sources and fail to select an optimal scheme.
In the embodiment of the invention, to address the problems that manual scheme design is cumbersome and time-consuming and depends heavily on the experience of experimental technicians, a method is provided that comprehensively analyzes the sequences and gRNAs near the target mutation site and designs the gene editing scheme automatically, realizing point mutation gene editing on a gene. One of four positioning methods for the mutation site is selected: Amino Acid, single nucleotide polymorphism (SNP), Chromosomal Location, or Nucleotide Sequence. The mutation type and mutation site are determined according to the selected positioning method, the sequences before and after the mutation site are comprehensively analyzed, and two types of point mutation schemes are designed, namely a Base Editing scheme and an RNP point mutation scheme. The Base Editing scheme comprises gRNA evaluation and screening, and the RNP point mutation scheme comprises gRNA screening, Oligo selection and mutation site introduction. The online point mutation scheme replaces the cumbersome and time-consuming manual scheme: the user only needs to select the parameters of the site information, and an optimal scheme can be obtained through gRNA screening and the introduction of additional mutations on the Oligo, saving scientific research users a large amount of time.
Further, in step A, the Amino Acid method comprises the following steps: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to acquire the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts and one of the mutation types, and inputting the related parameters to obtain the final mutation site;
the SNP method comprises the following steps: inputting the number of the SNP locus, inquiring an NCBI database to obtain the data of the gene of the locus, the position of the locus on a chromosome and the allelic base of the SNP locus, and calculating the position of the SNP locus on the gene according to the positions of the gene and the SNP locus on the chromosome; selecting one of the allelic bases, and automatically calculating the mutation type according to the base of the original sequence and the allelic base;
the Chromosomal Location method comprises the following steps: selecting a species and genome version, selecting a chromosome number, and querying a custom database to obtain the gene sequence of the chromosome; selecting one of the mutation types, inputting the parameters 'starting position' and 'ending position', querying the Ensembl database to obtain the bases corresponding to the position segment and the gene comprising the position segment, and calculating the position of the mutation position segment on the gene. It should be noted that the gene sequence of the chromosome is derived from the NCBI database; since the sequence of one chromosome can run to hundreds of millions of characters, the sequence is processed, simplified and stored in the custom database on the server, which improves the query speed.
The Nucleotide Sequence method comprises the following steps: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to acquire the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts, inputting the parameters 'wild type sequence' and 'mutated sequence', and automatically calculating the mutation type and mutation site from the difference between the two sequences and the position of the wild type sequence on the gene.
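For the SNP method, converting a chromosomal position into a position on the gene is simple coordinate arithmetic. The Python sketch below assumes 1-based inclusive chromosome coordinates and adds explicit strand handling, which the source does not describe; the function name and the example coordinates are hypothetical.

```python
def snp_position_on_gene(gene_start: int, gene_end: int,
                         snp_pos: int, strand: str = "+") -> int:
    """Return the 1-based position of the SNP within the gene.

    All inputs are 1-based chromosome coordinates; for a gene on the
    reverse strand the position is counted from gene_end.
    """
    if not gene_start <= snp_pos <= gene_end:
        raise ValueError("SNP lies outside the gene")
    if strand == "+":
        return snp_pos - gene_start + 1
    return gene_end - snp_pos + 1


print(snp_position_on_gene(1000, 2000, 1010))        # forward-strand gene
print(snp_position_on_gene(1000, 2000, 1010, "-"))   # reverse-strand gene
```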
Further, in the Amino Acid method, one of the mutation types is selected and the relevant parameters are input to obtain the final mutation site, with the following steps for each mutation type:
when the mutation type is selected as "mutation": inputting the parameters 'amino acid sequence number' and 'mutated amino acid', and back-calculating the exact position of the mutation site on the gene from the amino acid sequence number; for example, if the amino acid sequence number is 50 and the original amino acid Ser (codon AGC) is mutated into Gly (codon GGC), i.e. A → G, the exact position of the A → G mutation on the gene can be obtained from the position of the 50th amino acid on the gene;
when the mutation type is selected as "deletion": inputting parameters of a starting position and an ending position, acquiring a sequence to be deleted from the coding sequence according to the two positions, and calculating the accurate position of a deleted sequence segment on the gene according to the distribution position of a coding region on the gene;
when the mutation type is selected as "insertion": inputting parameters of 'initial position' and 'inserted sequence', and calculating the position of the inserted sequence on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "deletion + insertion": inputting parameters of a starting position, an ending position and an inserted sequence, acquiring a sequence segment to be deleted from the coding sequence according to the starting position and the ending position, and calculating the accurate position of the deleted sequence segment on the gene.
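The back-calculation from an amino acid number to a gene position can be sketched in Python as follows, using the Ser (AGC) to Gly (GGC) example at position 50. The coding-segment list, its coordinates and the function names are hypothetical; the sketch assumes 1-based inclusive coordinates and coding segments listed in transcript order on the forward strand.

```python
from typing import List, Tuple


def codon_span(aa_number: int) -> Tuple[int, int]:
    """1-based inclusive coding-sequence coordinates of a codon."""
    start = 3 * (aa_number - 1) + 1
    return start, start + 2


def changed_bases(ref_codon: str, alt_codon: str,
                  aa_number: int) -> List[Tuple[int, str, str]]:
    """List (cds_position, ref_base, alt_base) for each differing base."""
    start, _ = codon_span(aa_number)
    pairs = zip(ref_codon.upper(), alt_codon.upper())
    return [(start + i, r, a) for i, (r, a) in enumerate(pairs) if r != a]


def cds_to_gene(cds_pos: int, coding_segments: List[Tuple[int, int]]) -> int:
    """Map a coding-sequence position to a gene position via the coding segments."""
    remaining = cds_pos
    for seg_start, seg_end in coding_segments:
        length = seg_end - seg_start + 1
        if remaining <= length:
            return seg_start + remaining - 1
        remaining -= length
    raise ValueError("position lies beyond the coding sequence")


segments = [(101, 200), (301, 400)]            # hypothetical coding segments
change = changed_bases("AGC", "GGC", 50)       # Ser50 -> Gly
print(change, cds_to_gene(change[0][0], segments))
```

Amino acid 50 covers coding positions 148 to 150, so the A → G change sits at coding position 148; with the made-up segments above, that falls 48 bases into the second segment.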
Further, in step B, judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method uses the following judgment conditions: a single base is mutated, and the base changes from C to T or from G to A.
Further, step D, designing a CRISPR/Cas9-based RNP scheme according to the mutation type and carrying out gRNA screening, Oligo sequence selection and mutation site introduction, specifically comprises the following steps:
step D1: screening gRNA sequences according to the mutation type, and grading the screened gRNAs;
step D2: selecting gRNA1 and gRNA2 with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2 respectively;
step D4: introducing a mutation site into the Oligo sequence.
Further, in step D1, gRNA sequences are screened according to the mutation type and the screened gRNAs are graded; the mutation types comprise four types, namely mutation, insertion, deletion, and deletion + insertion, and corresponding operation steps are carried out for the different mutation types;
when the mutation type is "mutation", the following steps are specifically included:
step a1: screening all gRNA sequences in 50bp ranges of upstream and downstream of the mutation site, and acquiring information such as specificity scores, off-target conditions, cutting scores and the like of the gRNA sequences through CRISPER online software; it should be noted that the crisp online software is an existing biological database and online tool;
step a2: screening out gRNAs with specificity fractions of more than 80, cutting efficiencies of more than 0.5, no off-target and GC content of 40-60% according to conditions such as specificity fractions, cutting fractions, off-target conditions, GC content and the like of the gRNAs;
step a3: ranking the grnas according to the distance of the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23 (A is the highest grade);
step a4: proceeding from the highest grade to the lowest, selecting the gRNA with the smallest relative position N as gRNA1; if two or more gRNA sequences exist in the same grade, selecting the gRNA with the highest specificity score as gRNA2; otherwise, selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively: taking the cleavage site and the end of the mutation site farthest from the cleavage site as starting points, extending outwards by 60 bp from each, the sequence within this range being the Oligo sequence;
step a6: introducing the mutation site into the Oligo;
step a7: analyzing the sequence complexity and GC content within 500 bp upstream and downstream of the mutation site;
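Steps a2 to a4 can be sketched as a filter, a grading function and a pair selection. The `Guide` record and its field names are hypothetical, the scores are assumed to have been fetched already from the online tool, and the choice of gRNA2 among the remaining guides of the same grade is one possible reading of step a4.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Guide:
    name: str
    specificity: float  # specificity score reported by the scoring tool
    efficiency: float   # predicted cleavage efficiency
    off_targets: int    # number of predicted off-target sites
    gc_percent: float   # GC content of the guide, in percent
    distance: int       # N: distance from mutation site to cleavage site, in bp


def passes_filter(g: Guide) -> bool:
    """Step a2: thresholds stated in the description."""
    return (g.specificity > 80 and g.efficiency > 0.5
            and g.off_targets == 0 and 40 <= g.gc_percent <= 60)


def grade(n: int) -> str:
    """Step a3: grade by distance N (A is the highest grade)."""
    if n <= 5:
        return "A"
    if n <= 10:
        return "B"
    if n <= 23:
        return "C"
    return "D"


def select_pair(guides: List[Guide]) -> Tuple[Optional[Guide], Optional[Guide]]:
    """Step a4: pick gRNA1 and gRNA2 from the best non-empty grades."""
    kept = [g for g in guides if passes_filter(g)]
    by_grade = {k: [g for g in kept if grade(g.distance) == k] for k in "ABCD"}
    levels = [k for k in "ABCD" if by_grade[k]]
    if not levels:
        return None, None
    top = by_grade[levels[0]]
    g1 = min(top, key=lambda g: g.distance)
    rest = [g for g in top if g is not g1]
    if rest:  # same grade holds another guide: take the most specific one
        g2 = max(rest, key=lambda g: g.specificity)
    elif len(levels) > 1:  # otherwise fall back to the next non-empty grade
        g2 = min(by_grade[levels[1]], key=lambda g: g.distance)
    else:
        g2 = None
    return g1, g2
```

Returning `None` for a missing guide leaves it to the caller to decide whether a single-gRNA design is acceptable; the description does not cover that case.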
when the mutation type is "deletion", "insertion" or "deletion + insertion", the following steps are specifically included:
step b1: screening all gRNA sequences within 35 bp upstream and downstream of the deletion/insertion site, and obtaining information such as the specificity score, off-target status and cleavage score of each gRNA sequence through the online software;
step b2: according to the specificity score, cleavage score, off-target status and GC content of each gRNA, screening out the gRNAs with a specificity score greater than 80, a cleavage efficiency greater than 0.5, no off-target sites and a GC content of 40-60%;
step b3: grading the gRNAs according to the relative position N of the cleavage site and the editing region:
A: N = 0, with the cleavage site inside the editing region; B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N ≤ 23;
step b4: if two or more gRNAs exist in grade A, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing the Oligo: if the cleavage site is inside the editing region, extending 60 bp outwards on each side of the editing region, with the editing region as the center, to obtain the Oligo sequence; if the cleavage site is not inside the editing region, extending 60 bp outwards from the cleavage site and from the farthest end of the editing region respectively;
step b6: introducing the mutation site into the Oligo;
step b7: analyzing the sequence complexity and GC content within 500 bp upstream and downstream of the mutation site.
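The Oligo range described in steps a5 and b5 can be computed with one formula, because extending from the cleavage site and from the far end of the edited region covers both the "inside" and the "outside" case. The sketch below assumes 0-based coordinates, a half-open edited region [edit_start, edit_end), and a cleavage index that marks the base immediately after the cut; all names are hypothetical.

```python
from typing import Tuple


def oligo_window(cut: int, edit_start: int, edit_end: int,
                 seq_len: int, arm: int = 60) -> Tuple[int, int]:
    """Return the half-open [start, end) range of the Oligo on the gene sequence."""
    lo = min(cut, edit_start)  # leftmost of cleavage site and edited region
    hi = max(cut, edit_end)    # rightmost of cleavage site and edited region
    # Extend each side by `arm` bases, clamped to the sequence boundaries.
    return max(0, lo - arm), min(seq_len, hi + arm)
```

For a cleavage site at index 100 and a single-base edit at index 110 on a 1000 bp sequence, `oligo_window(100, 110, 111, 1000)` gives the range 40 to 171.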
Further, step D4, introducing a mutation site into the Oligo sequence, specifically comprises the following steps:
step D41: inputting an Oligo sequence and judging whether the PAM structure of the gRNA on the Oligo sequence has been changed; if so, outputting the Oligo sequence; if not, executing step D42;
step D42: judging whether the gRNA is in the gene editing area; if so, executing step D43; if not, introducing a random mutation in the gRNA (NGG → NGC) and then executing step D46;
step D43: introducing synonymous mutations into the gRNA one position at a time from the 3' end to the 5' end, and judging whether the mutated position lies on the PAM structure NGG; if so, executing step D44; if not, executing step D45;
step D44: judging whether the PAM structure has been changed; if not, returning to step D43; if so, executing step D46;
step D45: judging whether the mutation point overlaps the target mutation point; if not, executing step D46; otherwise, returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
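The flow of steps D41 to D46 can be sketched as follows. The sketch makes several assumptions that are not stated in the description: the gRNA targets the forward strand of the Oligo, the PAM starts at index `pam_start`, the protospacer is the 20 bases 5' of the PAM, the "gene editing area" of step D42 is read as the coding region (synonymous mutations are only defined there), and `frame` is the index of the first base of any codon in the Oligo. It accepts a synonymous mutation outside the PAM only when it does not overlap a target mutation point, which is the reading given in claim 7. All names are hypothetical.

```python
from itertools import product
from typing import Optional, Set

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def pam_intact(oligo: str, pam_start: int) -> bool:
    """True if the three bases at pam_start still read NGG."""
    return oligo[pam_start + 1:pam_start + 3] == "GG"


def synonymous_swap(oligo: str, pos: int, frame: int) -> Optional[str]:
    """Oligo with a synonymous substitution at pos, or None if none exists."""
    codon_start = pos - (pos - frame) % 3
    codon = oligo[codon_start:codon_start + 3]
    if codon_start < 0 or codon not in CODON_TABLE:
        return None
    offset = pos - codon_start
    for base in BASES:
        if base == codon[offset]:
            continue
        new_codon = codon[:offset] + base + codon[offset + 1:]
        if CODON_TABLE[new_codon] == CODON_TABLE[codon]:
            return oligo[:codon_start] + new_codon + oligo[codon_start + 3:]
    return None


def block_recutting(oligo: str, pam_start: int, in_coding: bool,
                    frame: int, target_positions: Set[int]) -> Optional[str]:
    if not pam_intact(oligo, pam_start):           # D41: PAM already changed
        return oligo
    if not in_coding:                              # D42: NGG -> NGC
        return oligo[:pam_start + 2] + "C" + oligo[pam_start + 3:]
    # D43: walk from the 3' end (last PAM base) towards the 5' end.
    for pos in range(pam_start + 2, max(pam_start - 21, -1), -1):
        candidate = synonymous_swap(oligo, pos, frame)
        if candidate is None:
            continue
        if pos >= pam_start:                       # D44: position is on the PAM
            if not pam_intact(candidate, pam_start):
                return candidate                   # D46
        elif pos not in target_positions:          # D45: no overlap with target
            return candidate                       # D46
    return None  # no suitable synonymous mutation found
```

The `None` return for an exhausted scan is an addition of this sketch; the description does not say what happens when no synonymous mutation qualifies.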
This embodiment also discloses an automatic design system for a gene editing point mutation scheme, which applies the steps of the automatic design method described above and comprises a mutation site reading module, a judgment module, a first scheme design module, a second scheme design module, a calculation module and a result generation module;
the mutation site reading module is used for determining the mutation type and the mutation site according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for point mutation gene editing by Base Editing;
the first scheme design module is used for designing a Base Editing scheme and evaluating and screening gRNAs;
the second scheme design module is used for designing an RNP scheme based on CRISPR and Cas9, and performing gRNA screening, Oligo sequence selection and mutation site introduction;
the calculation module is used for performing sequence complexity analysis and GC content calculation on the sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis result;
and the result generation module is used for generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis result.
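The work of the calculation module can be illustrated with a short sketch. The description does not define how sequence complexity is measured, so the Shannon entropy of the k-mer distribution is used here purely as an assumed stand-in; the function names and the 0-based site index are likewise hypothetical.

```python
import math
from collections import Counter


def flank_window(seq: str, site: int, flank: int = 500) -> str:
    """Sequence from `flank` bases before the site to `flank` bases after it."""
    return seq[max(0, site - flank): site + flank + 1]


def gc_percent(seq: str) -> float:
    """GC content in percent; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (bits) of the k-mer distribution; lower means more repetitive."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

A homopolymer run yields an entropy of zero, while a window in which every k-mer is distinct yields the maximum value for its length.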
This embodiment also discloses an electronic device, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the automatic design method of a gene editing point mutation scheme.
This embodiment also discloses a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the automatic design method of a gene editing point mutation scheme described above.
Other configurations and operations of the method and system for automatically designing a gene editing point mutation scheme according to the embodiments of the present invention are known to those skilled in the art and will not be described in detail herein.
All or part of the modules in the automatic design system for a gene editing point mutation scheme can be realized by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the electronic device in hardware form, or can be stored in a memory of the electronic device in software form, so that the processor can call and execute the operations corresponding to the modules.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division into functional units and modules is used only for illustration; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules so as to perform all or part of the functions described above.
The above description of the embodiments is provided to illustrate the technical concepts and features of the present invention and to enable those skilled in the art to understand and implement it, but the present invention is not limited to the specific embodiments described above. All changes and modifications that fall within the scope of this invention are intended to be covered by the appended claims.

Claims (10)

1. An automatic design method of a gene editing point mutation scheme, characterized in that the method comprises the following steps:
step A: determining the mutation type and the mutation site according to a positioning method, wherein the positioning method comprises an Amino Acid method, an SNP method, a Chromosomal Location method and a Nucleotide Sequence method;
step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if so, executing step C; if not, executing step D;
step C: designing a Base Editing scheme according to the mutation type, and screening gRNA sequences, wherein the gRNA sequence must contain the mutation site;
step D: designing an RNP scheme based on CRISPR and Cas9 according to the mutation type, and performing gRNA sequence screening, Oligo sequence selection and mutation site introduction, wherein the gRNA sequence must contain the mutation site;
step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis result;
step F: generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis result.
2. The automatic design method of a gene editing point mutation scheme according to claim 1, characterized in that: in step A, the Amino Acid method comprises the following steps: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to obtain the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts and one of the mutation types, and inputting the relevant parameters to obtain the final mutation site;
the SNP method comprises the following steps: inputting the number of the SNP site, and querying the NCBI database to obtain the gene in which the site is located, the position of the site on the chromosome and the allelic bases of the SNP site; calculating the position of the SNP site on the gene according to the positions of the gene and of the SNP site on the chromosome; selecting one of the allelic bases, and automatically calculating the mutation type according to the base of the original sequence and the selected allelic base;
the Chromosomal Location method comprises the following steps: selecting a species and a genome version, selecting a chromosome number, and querying a custom database to obtain the gene sequence of the chromosome; selecting one of the mutation types, inputting the parameters "starting position" and "ending position", querying the Ensembl database to obtain the bases corresponding to the position segment and the gene containing the position segment, and calculating the position of the mutated segment on the gene;
the Nucleotide Sequence method comprises the following steps: selecting a species, inputting a gene, and querying the NCBI database and the Ensembl database to obtain the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts, inputting the parameters "wild-type sequence" and "mutated sequence", and automatically calculating the mutation type and the mutation site according to the difference between the two sequences and the position of the wild-type sequence on the gene.
3. The automatic design method of a gene editing point mutation scheme according to claim 2, characterized in that: in the Amino Acid method, selecting one of the mutation types and inputting the relevant parameters to obtain the final mutation site corresponds to the following cases:
when the mutation type is selected as "mutation": inputting the parameters "amino acid number" and "mutated amino acid", and back-calculating the exact position of the mutation site on the gene from the amino acid number;
when the mutation type is selected as "deletion": inputting the parameters "starting position" and "ending position", obtaining the sequence to be deleted from the coding sequence according to the two positions, and calculating the exact position of the deleted sequence segment on the gene according to the distribution of the coding region on the gene;
when the mutation type is selected as "insertion": inputting the parameters "starting position" and "inserted sequence", and calculating the position of the inserted sequence on the gene according to the distribution of the coding region on the gene;
when the mutation type is selected as "deletion + insertion": inputting the parameters "starting position", "ending position" and "inserted sequence", obtaining the sequence segment to be deleted from the coding sequence according to the starting position and the ending position, and calculating the exact position of the deleted sequence segment on the gene.
4. The automatic design method of a gene editing point mutation scheme according to claim 1, characterized in that: in step C, it is judged whether the mutation site is suitable for point mutation gene editing by the Base Editing method, wherein the judgment conditions comprise: a single base is mutated, and the base changes from C to T or from G to A.
5. The automatic design method of a gene editing point mutation scheme according to claim 1, characterized in that: step D, designing an RNP scheme based on CRISPR and Cas9 according to the mutation type and performing gRNA screening, Oligo sequence selection and mutation site introduction, specifically comprises the following steps:
step D1: screening gRNA sequences according to the mutation type, and grading the screened gRNAs;
step D2: selecting gRNA1 and gRNA2, the gRNAs with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2 respectively;
step D4: introducing a mutation site into the Oligo sequence.
6. The automatic design method of a gene editing point mutation scheme according to claim 5, characterized in that: step D1, screening gRNA sequences according to the mutation type and grading the screened gRNAs, is carried out as follows: the mutation types comprise four types, namely mutation, deletion, insertion, and deletion + insertion, and the corresponding operation steps are carried out for each mutation type;
when the mutation type is "mutation", the following steps are specifically included:
step a1: screening all gRNA sequences within 50 bp upstream and downstream of the mutation site, and obtaining information such as the specificity score, off-target status and cleavage score of each gRNA sequence through the CRISPER online software;
step a2: according to the specificity score, cleavage score, off-target status and GC content of each gRNA, screening out the gRNAs with a specificity score greater than 80, a cleavage efficiency greater than 0.5, no off-target sites and a GC content of 40-60%;
step a3: grading the gRNAs according to the distance N between the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23 (A is the highest grade);
step a4: proceeding from the highest grade to the lowest, selecting the gRNA with the smallest relative position N as gRNA1; if two or more gRNA sequences exist in the same grade, selecting the gRNA with the highest specificity score as gRNA2; otherwise, selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively: taking the cleavage site and the end of the mutation site farthest from the cleavage site as starting points, extending outwards by 60 bp from each, the sequence within this range being the Oligo sequence;
step a6: introducing the mutation site into the Oligo;
step a7: analyzing the sequence complexity and GC content within 500 bp upstream and downstream of the mutation site;
when the mutation type is "deletion", "insertion" or "deletion + insertion", the following steps are specifically included:
step b1: screening all gRNA sequences within 35 bp upstream and downstream of the deletion/insertion site, and obtaining information such as the specificity score, off-target status and cleavage score of each gRNA sequence through the online software;
step b2: according to the specificity score, cleavage score, off-target status and GC content of each gRNA, screening out the gRNAs with a specificity score greater than 80, a cleavage efficiency greater than 0.5, no off-target sites and a GC content of 40-60%;
step b3: grading the gRNAs according to the relative position N of the cleavage site and the editing region:
A: N = 0, with the cleavage site inside the editing region; B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N ≤ 23;
step b4: if two or more gRNAs exist in grade A, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing the Oligo: if the cleavage site is inside the editing region, extending 60 bp outwards on each side of the editing region, with the editing region as the center, to obtain the Oligo sequence; if the cleavage site is not inside the editing region, extending 60 bp outwards from the cleavage site and from the farthest end of the editing region respectively;
step b6: introducing the mutation site into the Oligo;
step b7: analyzing the sequence complexity and GC content within 500 bp upstream and downstream of the mutation site.
7. The automatic design method of a gene editing point mutation scheme according to claim 5, characterized in that: step D4, introducing a mutation site into the Oligo sequence, specifically comprises the following steps:
step D41: inputting an Oligo sequence and judging whether the PAM structure of the gRNA on the Oligo sequence has been changed; if so, outputting the Oligo sequence; if not, executing step D42;
step D42: judging whether the gRNA is in the gene editing area; if so, executing step D43; if not, introducing a random mutation in the gRNA (NGG → NGC) and then executing step D46;
step D43: introducing synonymous mutations into the gRNA one position at a time from the 3' end to the 5' end, and judging whether the mutated position lies on the PAM structure NGG; if so, executing step D44; if not, executing step D45;
step D44: judging whether the PAM structure has been changed; if not, returning to step D43; if so, executing step D46;
step D45: judging whether the mutation point overlaps the target mutation point; if not, executing step D46; otherwise, returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
8. An automatic design system for a gene editing point mutation scheme, characterized in that: the system applies the steps of the automatic design method of a gene editing point mutation scheme according to any one of claims 1 to 7, and comprises a mutation site reading module, a judgment module, a first scheme design module, a second scheme design module, a calculation module and a result generation module;
the mutation site reading module is used for determining the mutation type and the mutation site according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for point mutation gene editing by Base Editing;
the first scheme design module is used for designing a Base Editing scheme and evaluating and screening gRNAs;
the second scheme design module is used for designing an RNP scheme based on CRISPR and Cas9, and performing gRNA screening, Oligo sequence selection and mutation site introduction;
the calculation module is used for performing sequence complexity analysis and GC content calculation on the sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis result;
and the result generation module is used for generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method of automatically designing a gene editing point mutation scheme according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of a method for automatically designing a gene editing point mutation scheme according to any one of claims 1 to 7.
CN202210753169.9A 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme Active CN115148281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210753169.9A CN115148281B (en) 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210753169.9A CN115148281B (en) 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme

Publications (2)

Publication Number Publication Date
CN115148281A true CN115148281A (en) 2022-10-04
CN115148281B CN115148281B (en) 2023-07-14

Family

ID=83410965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210753169.9A Active CN115148281B (en) 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme

Country Status (1)

Country Link
CN (1) CN115148281B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109266652A (en) * 2018-10-15 2019-01-25 广州鼓润医疗科技有限公司 SgRNA, carrier and application based on the mutational site CRISPR/Cas9 technical editor HBB-28
CN109652422A (en) * 2019-01-31 2019-04-19 安徽省农业科学院水稻研究所 Efficient single base editing system OsSpCas9-eCDA and its application
WO2021042047A1 (en) * 2019-08-30 2021-03-04 The General Hospital Corporation C-to-g transversion dna base editors
CN112469446A (en) * 2018-05-11 2021-03-09 比姆医疗股份有限公司 Method for editing single nucleotide polymorphisms using a programmable base editor system
CN113891936A (en) * 2019-03-19 2022-01-04 布罗德研究所股份有限公司 Methods and compositions for editing nucleotide sequences
CN114121153A (en) * 2021-11-23 2022-03-01 广州金域医学检验中心有限公司 Gene mutation site detection method, device, electronic equipment and storage medium
US20220177877A1 (en) * 2019-03-04 2022-06-09 President And Fellows Of Harvard College Highly multiplexed base editing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MATTHEW COELHO ET AL.: "DNA Base Editing Strategies for Genome Editing", GENOME EDITING IN DRUG DISCOVERY *

Also Published As

Publication number Publication date
CN115148281B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant