CN115148281B - Automatic design method and system for gene editing point mutation scheme - Google Patents

Publication number
CN115148281B
Authority
CN
China
Prior art keywords
mutation
grna
sequence
gene
site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210753169.9A
Other languages
Chinese (zh)
Other versions
CN115148281A (en
Inventor
万翠荣
林剑锋
郑虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yuanjing Biotechnology Co ltd
Original Assignee
Guangzhou Yuanjing Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yuanjing Biotechnology Co ltd
Priority to CN202210753169.9A
Publication of CN115148281A
Application granted
Publication of CN115148281B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of gene editing, and in particular to an automatic design method and system for a gene editing point mutation scheme. The method comprises the following steps. Step A: determining the mutation type and mutation site according to a positioning method. Step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if so, executing step C: designing a Base Editing scheme according to the mutation type and screening gRNA sequences; if not, executing step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNA sequences, selecting an Oligo sequence, and introducing the mutation site. Step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis results. Step F: generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis results. The method automatically designs the gene editing scheme for point mutation editing of a gene and improves efficiency and accuracy.

Description

Automatic design method and system for gene editing point mutation scheme
Technical Field
The invention relates to the technical field of gene editing, in particular to an automatic design method and system for a point mutation scheme of gene editing.
Background
In the prior art, CRISPR/Cas9 uses a specifically designed guide RNA (gRNA) to recognize a target sequence and direct the Cas9 endonuclease to cleave upstream of the PAM, causing a DNA double-strand break at the target site; after the break, the cell repairs the cleavage site by non-homologous end joining (NHEJ) or homology-directed repair (HDR), thereby achieving gene knockout, knock-in, or point mutation at the DNA level. Owing to its simple operation, low cost, and high editing efficiency, CRISPR/Cas9 is one of the most popular techniques in the life sciences and has been widely applied to gene-editing functional research in many kinds of organisms and cells, such as mammals (rats, mice, pigs, rabbits, monkeys, and the like), zebrafish, stem cells, tumor cell lines, bacteria, and fungi.
Designing a CRISPR/Cas9-based point mutation gene editing scheme may not seem difficult, but a beginner in gene editing needs several days to do so, and the result is not necessarily ideal; even researchers with gene editing experience may fail to obtain an optimal scheme because their information sources are incomplete and human knowledge is limited. Manual design is therefore tedious and time-consuming, and a computer-based method for automatically designing gene editing schemes is a problem to be solved.
Disclosure of Invention
In view of the above defects, the invention aims to provide an automatic design method and system for a gene editing point mutation scheme, which comprehensively analyze the sequences and gRNAs near a target mutation site and automatically design the gene editing scheme, thereby realizing point mutation gene editing of a gene and improving the efficiency and accuracy of scheme selection.
To achieve the purpose, the invention adopts the following technical scheme:
An automatic design method for a gene editing point mutation scheme comprises the following steps:
step A: determining the mutation type and mutation site according to a positioning method, wherein the positioning method comprises an Amino Acid method, an SNP method, a Chromosomal Location method, and a Nucleotide Sequence method;
step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if yes, executing step C, and if no, executing step D;
step C: designing a Base Editing scheme according to the mutation type and screening gRNA sequences, wherein the gRNA sequence must contain the mutation site;
step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNA sequences, selecting an Oligo sequence, and introducing the mutation site, wherein the gRNA sequence must contain the mutation site;
step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis results;
step F: generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis results.
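Step E can be illustrated with a minimal sketch. The patent does not say how sequence complexity is measured, so the k-mer Shannon entropy used here is an assumed stand-in, and the function names `flanking_window`, `gc_content`, and `kmer_entropy` are hypothetical.

```python
import math
from collections import Counter


def flanking_window(seq: str, site: int, flank: int = 500) -> str:
    """Return the site (0-based index) plus up to `flank` bases on each side."""
    start = max(0, site - flank)
    end = min(len(seq), site + flank + 1)
    return seq[start:end]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (bits) of the k-mer distribution.

    Assumption: used here as a simple complexity measure; low values
    indicate repetitive, low-complexity sequence.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A caller would take `flanking_window(gene_seq, site)` and store `gc_content` and `kmer_entropy` of that window alongside the scheme report.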
Preferably, in step a, the Amino Acid method includes the following steps: selecting a species, inputting a gene, querying an NCBI database and an Ensembl database and acquiring transcripts, gene sequences and coding sequence data of the gene; selecting one transcript and one mutation type, and inputting related parameters to obtain a final mutation site;
the SNP method comprises the following steps: inputting an SNP site number, querying the NCBI database to obtain the gene in which the site is located, the position of the site on the chromosome, and the alleles of the SNP site, and calculating the position of the SNP site on the gene from the chromosomal positions of the gene and the SNP site; selecting one of the allelic bases, and automatically calculating the mutation type from the base in the original sequence and the selected allelic base;
the Chromosomal Location method comprises the following steps: selecting the species and genome version, selecting the chromosome number, and querying a custom database to obtain the sequence of the chromosome; selecting one mutation type, inputting the "start position" and "end position" parameters, querying the Ensembl database to obtain the bases corresponding to that segment and the gene containing it, and calculating the position of the mutated segment on the gene;
the Nucleotide Sequence method comprises the following steps: selecting a species, inputting a gene, querying the NCBI database and the Ensembl database, and acquiring the transcripts, gene sequence, and coding sequence data of the gene; selecting one of the transcripts, inputting the "wild-type sequence" and "mutated sequence" parameters, and automatically calculating the mutation type and mutation site from the difference between the two sequences and the position of the wild-type sequence on the gene.
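The comparison in the Nucleotide Sequence method can be sketched as follows. The patent does not describe the comparison algorithm; trimming the common prefix and suffix is one simple assumption, and `classify_mutation` is a hypothetical name. The labels match the four mutation types named in the text.

```python
def classify_mutation(wild: str, mutant: str):
    """Compare a wild-type and a mutated sequence.

    Returns (mutation_type, start, deleted, inserted), where `start` is the
    0-based offset of the change within the wild-type sequence.
    Assumption: a single contiguous change, found by trimming the shared
    prefix and suffix of the two sequences.
    """
    wild, mutant = wild.upper(), mutant.upper()

    # Length of the common prefix.
    p = 0
    while p < len(wild) and p < len(mutant) and wild[p] == mutant[p]:
        p += 1

    # Length of the common suffix, never overlapping the prefix.
    s = 0
    while (s < len(wild) - p and s < len(mutant) - p
           and wild[len(wild) - 1 - s] == mutant[len(mutant) - 1 - s]):
        s += 1

    deleted = wild[p:len(wild) - s]
    inserted = mutant[p:len(mutant) - s]

    if not deleted and not inserted:
        kind = "none"
    elif not deleted:
        kind = "insert"
    elif not inserted:
        kind = "delete"
    elif len(deleted) == len(inserted):
        kind = "mutation"          # same-length substitution
    else:
        kind = "delete+insert"
    return kind, p, deleted, inserted
```

Adding the returned offset to the position of the wild-type sequence on the gene gives the mutation site in gene coordinates.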
Preferably, in the Amino Acid method, one mutation type is selected, and relevant parameters are input to obtain final mutation sites, which correspond to the following steps:
when the mutation type is selected as "mutation": inputting parameters of 'amino acid sequence number' and 'mutated amino acid', and reversely calculating according to the amino acid sequence number to obtain the accurate position of the mutation site on the gene;
when the mutation type is selected as "delete": inputting parameters of a start position and an end position, acquiring a sequence to be deleted from a coding sequence according to the two positions, and calculating the accurate position of the deleted sequence fragment on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "insert": inputting parameters of a start position and an inserted sequence, and calculating the position of the inserted sequence on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "delete+insert": the parameters "start position", "end position" and "inserted sequence" are input, the sequence fragment to be deleted is obtained from the coding sequence according to the start position and the end position, and the accurate position of the deleted sequence fragment on the gene is calculated.
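The back-calculation from an amino acid number to a nucleotide position follows from the triplet code: residue n occupies coding-sequence positions 3(n-1)+1 through 3n. A minimal sketch, with hypothetical function names and 1-based coordinates assumed:

```python
def codon_cds_range(aa_number: int) -> tuple[int, int]:
    """1-based, inclusive CDS positions of the codon for a 1-based residue number."""
    if aa_number < 1:
        raise ValueError("amino acid number must be >= 1")
    start = 3 * (aa_number - 1) + 1
    return start, start + 2


def codon_differences(ref_codon: str, alt_codon: str, aa_number: int):
    """List (cds_position, ref_base, alt_base) for each base that differs."""
    start, _ = codon_cds_range(aa_number)
    pairs = zip(ref_codon.upper(), alt_codon.upper())
    return [(start + i, r, a) for i, (r, a) in enumerate(pairs) if r != a]
```

For instance, changing codon AGC (Ser) to GGC (Gly) at residue 50 yields a single A-to-G change at CDS position 148; that position still has to be mapped onto the gene through the layout of the coding region.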
Preferably, in step B, judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method, the judgment conditions comprise: a single base is mutated, and the base changes from C to T or from G to A.
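The judgment condition reduces to a small predicate. This is only a sketch of the stated rule (single base, C to T or G to A); the function name is hypothetical.

```python
def suits_base_editing(ref: str, alt: str) -> bool:
    """True if the change is a single-base C->T or G->A substitution."""
    ref, alt = ref.upper(), alt.upper()
    return (len(ref) == 1 and len(alt) == 1
            and (ref, alt) in {("C", "T"), ("G", "A")})
```

Any change that fails this test, including multi-base changes, insertions, and deletions, would be routed to the RNP scheme of step D.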
Preferably, the step D, designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNAs, selecting an Oligo sequence, and introducing the mutation site, specifically comprises the following steps:
step D1: screening the gRNA sequences according to the mutation type, and grading the screened gRNAs;
step D2: selecting gRNA1 and gRNA2, the gRNAs with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2, respectively;
step D4: introducing a mutation site into the Oligo sequence.
Preferably, in the step D1, screening the gRNA sequences according to the mutation type and grading the screened gRNAs, the mutation types comprise four types: mutation, insertion, deletion, and deletion + insertion, and the corresponding operation steps are carried out for the different mutation types;
when the mutation type is "mutation", the method specifically comprises the following steps:
step a1: screening all gRNA sequences within 50 bp upstream and downstream of the mutation site, and acquiring information such as the specificity score, off-target status, and cutting efficiency score of each gRNA sequence through the CRISPOR online software;
step a2: according to the specificity score, cutting efficiency score, off-target status, and GC content of each gRNA, screening out the gRNAs with a specificity score greater than 80, a cutting efficiency score greater than 0.5, no off-target sites, and a GC content of 40-60%;
step a3: grading the gRNAs according to the distance N between the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23 (A is the highest grade);
step a4: proceeding from the highest grade to the lowest, first selecting the gRNA with the smallest relative position N as gRNA1; if the same grade contains two or more gRNA sequences, selecting from that grade the gRNA with the highest specificity score as gRNA2; otherwise, selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively, by extending 60 bp outward from the cleavage site and from the end of the mutation site farthest from the cleavage site; the sequence within this range is the Oligo sequence;
step a6: introducing a mutation site into the Oligo;
step a7: analyzing the complexity and GC content of the sequences within 500 bp upstream and downstream of the mutation site;
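Steps a2 to a4 can be sketched as a filter, a grading function, and a selection rule. The `Guide` record and its field names are hypothetical, and the scores are assumed to have been obtained already (for example from CRISPOR). Two readings are assumptions: gRNA2 is taken from the guides remaining in the grade after gRNA1 is chosen, and "the next grade" means the next non-empty grade.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    name: str
    specificity: float   # specificity score
    efficiency: float    # cutting efficiency score
    off_targets: int     # number of off-target sites reported
    gc: float            # GC content as a fraction, e.g. 0.5
    distance: int        # N: distance from cleavage site to mutation site (bp)


def passes_filters(g: Guide) -> bool:
    """Step a2 thresholds."""
    return (g.specificity > 80 and g.efficiency > 0.5
            and g.off_targets == 0 and 0.40 <= g.gc <= 0.60)


def grade(n: int) -> str:
    """Step a3 grades; A is the highest."""
    if n <= 5:
        return "A"
    if n <= 10:
        return "B"
    if n <= 23:
        return "C"
    return "D"


def select_pair(guides):
    """Step a4: return (gRNA1, gRNA2); either may be None if unavailable."""
    by_grade = {}
    for g in guides:
        if passes_filters(g):
            by_grade.setdefault(grade(g.distance), []).append(g)
    grades = [x for x in "ABCD" if x in by_grade]
    if not grades:
        return None, None
    top = by_grade[grades[0]]
    g1 = min(top, key=lambda g: g.distance)
    rest = [g for g in top if g is not g1]
    if rest:
        g2 = max(rest, key=lambda g: g.specificity)
    elif len(grades) > 1:
        g2 = min(by_grade[grades[1]], key=lambda g: g.distance)
    else:
        g2 = None
    return g1, g2
```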
when the mutation type is "insert", "delete", or "delete+insert", the method specifically comprises the following steps:
step b1: screening all gRNA sequences within 35 bp upstream and downstream of the deletion/insertion site, and acquiring information such as the specificity score, off-target status, and cutting efficiency score of each gRNA sequence through online software;
step b2: according to the specificity score, cutting efficiency score, off-target status, and GC content of each gRNA, screening out the gRNAs with a specificity score greater than 80, a cutting efficiency score greater than 0.5, no off-target sites, and a GC content of 40-60%;
step b3: grading the gRNAs according to the relative position N of the gRNA cleavage site and the editing region:
A: N = 0 (the cleavage site is inside the editing region); B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N < 23;
step b4: if grade A contains two or more gRNAs, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing the Oligo: if the cutting position is inside the editing region, extending 60 bp outward on each side with the editing region as the center to obtain the Oligo sequence; if the cutting position is not inside the editing region, extending 60 bp outward from the cutting position and from the end of the editing region farthest from it;
step b6: introducing a mutation site into the Oligo;
step b7: analyzing the complexity and GC content of the sequences within 500 bp upstream and downstream of the mutation site.
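Steps b3 and b4 can be sketched as follows, assuming the guides have already passed the step b2 thresholds. The tuple layout and function names are hypothetical. Two points the text leaves open are resolved by assumption: N is measured to the nearest edge of the editing region, and when grade A holds fewer than two guides, both guides come from the first lower grade that holds at least two.

```python
def distance_to_region(cut: int, start: int, end: int) -> int:
    """N = 0 if the cut lies inside [start, end]; else distance to the nearest edge."""
    if start <= cut <= end:
        return 0
    return start - cut if cut < start else cut - end


def grade_indel(n: int):
    """Step b3 grades; returns None when N falls outside the listed ranges."""
    if n == 0:
        return "A"
    if n <= 5:
        return "B"
    if n <= 10:
        return "C"
    if n < 23:
        return "D"
    return None


def select_indel_pair(guides, start: int, end: int):
    """guides: iterable of (name, specificity, cut_position). Returns up to two names."""
    buckets = {"A": [], "B": [], "C": [], "D": []}
    for name, spec, cut in guides:
        n = distance_to_region(cut, start, end)
        g = grade_indel(n)
        if g is not None:
            buckets[g].append((name, spec, n))
    if len(buckets["A"]) >= 2:
        # Grade A: rank by specificity score, highest first.
        best = sorted(buckets["A"], key=lambda t: t[1], reverse=True)[:2]
        return [t[0] for t in best]
    for g in "BCD":
        if len(buckets[g]) >= 2:
            # Lower grades: rank by N, smallest first.
            best = sorted(buckets[g], key=lambda t: t[2])[:2]
            return [t[0] for t in best]
    return []
```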
Further, the step D4, introducing a mutation site into the Oligo sequence, specifically comprises the following steps:
step D41: inputting an Oligo sequence and judging whether the PAM structure of the gRNA on the Oligo sequence has been changed; if so, outputting the Oligo sequence, and if not, executing step D42;
step D42: judging whether the gRNA is in the gene editing region; if so, executing step D43; if not, introducing a mutation into the PAM of the gRNA (NGG→NGC), and then executing step D46;
step D43: introducing synonymous mutations one at a time from the 3' end to the 5' end of the gRNA, and judging whether the mutation falls on the PAM structure NGG; if so, executing step D44, and if not, executing step D45;
step D44: judging whether the PAM structure has been changed; if not, returning to step D43; if yes, executing step D46;
step D45: judging whether the introduced mutation overlaps the target mutation point; if so, executing step D46; otherwise, returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
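The PAM check of step D41 and the NGG→NGC substitution of step D42 can be sketched on a plain string. This sketch assumes the PAM lies on the same strand as the Oligo text and that its 0-based start index is known; the function names are hypothetical, and the synonymous-mutation loop of steps D43 to D45 is not covered.

```python
def pam_is_intact(oligo: str, pam_start: int) -> bool:
    """True if the three bases at pam_start still read NGG."""
    pam = oligo[pam_start:pam_start + 3].upper()
    return len(pam) == 3 and pam[1:] == "GG"


def disrupt_pam(oligo: str, pam_start: int) -> str:
    """Rewrite an intact NGG PAM as NGC; return the oligo unchanged otherwise."""
    if not pam_is_intact(oligo, pam_start):
        return oligo
    i = pam_start + 2  # third base of the PAM
    return oligo[:i] + "C" + oligo[i + 1:]
```

Once the PAM on the Oligo no longer reads NGG, the repaired allele should no longer be recognized by the same gRNA, which is the apparent purpose of the check in step D41.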
An automatic design system for a gene editing point mutation scheme applies the steps of the automatic design method described above and comprises a mutation site interpretation module, a judgment module, a first scheme design module, a second scheme design module, a calculation module, and a result generation module;
the mutation site interpretation module is used for determining the mutation type and mutation site according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method;
the first scheme design module is used for designing a Base Editing scheme and for evaluating and screening gRNAs;
the second scheme design module is used for designing a CRISPR/Cas9-based RNP scheme, screening gRNAs, selecting the Oligo sequence, and introducing the mutation site;
the calculation module is used for performing sequence complexity analysis and GC content calculation on the sequences within 500 bp upstream and downstream of the mutation site, and storing the analysis results;
the result generation module is used for generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis results.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method for automatically designing a point mutation scheme for gene editing as described above when the program is executed by the processor.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of automatically designing a point mutation scheme for gene editing as described above.
One of the above technical solutions has the following beneficial effects. To address the problems that manual scheme design is tedious and time-consuming and depends heavily on the experience of laboratory technicians, a method is provided that comprehensively analyzes the sequences and gRNAs near a target mutation site and automatically designs a gene editing scheme, thereby realizing point mutation gene editing of a gene. One of four positioning methods is selected for the mutation site: Amino Acid, SNP (single nucleotide polymorphism site), Chromosomal Location (chromosome position), or Nucleotide Sequence (nucleotide sequence). The mutation type and mutation site are determined according to the chosen positioning method, the sequences upstream and downstream of the mutation site are comprehensively analyzed, and one of two types of point mutation scheme is designed: a Base Editing scheme, which comprises gRNA evaluation and screening, or an RNP point mutation scheme, which comprises gRNA screening, Oligo selection, and mutation site introduction. Designing the point mutation scheme online replaces the tedious and time-consuming manual process: the user only needs to select the site information parameters, and an optimal scheme is obtained through gRNA screening and the introduction of additional mutations into the Oligo, saving researchers a great deal of time.
Drawings
FIG. 1 is a method flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic flow chart of the introduction of mutation sites in Oligo according to the present invention;
FIG. 3 is a graph of output results for one embodiment of the invention;
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The following describes a method for automatically designing a point mutation scheme for gene editing according to an embodiment of the present invention with reference to fig. 1 to 2, comprising the steps of:
step A: determining mutation types and mutation sites according to a localization method, wherein the localization method comprises an Amino Acid method, a SNP method, a Chromosomal Location method and a Nucleotide Sequence method;
step B: judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method; if yes, executing step C, and if no, executing step D;
step C: designing a Base Editing scheme according to the mutation type and screening gRNA sequences, wherein the gRNA sequence must contain the mutation site;
step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNA sequences, selecting an Oligo sequence, and introducing the mutation site, wherein the gRNA sequence must contain the mutation site;
step E: performing sequence complexity analysis and GC content calculation on the gene sequence within 500 bp upstream and downstream of the mutation site, and storing the analysis results;
step F: generating and outputting a schematic diagram of the gRNA targeting region and a scheme report according to the analysis results.
In the prior art, CRISPR/Cas9 uses a specifically designed guide RNA (gRNA) to recognize a target sequence and direct the Cas9 endonuclease to cleave upstream of the PAM, causing a DNA double-strand break at the target site; after the break, the cell repairs the cleavage site by non-homologous end joining (NHEJ) or homology-directed repair (HDR), thereby achieving gene knockout, knock-in, or point mutation at the DNA level. Owing to its simple operation, low cost, and high editing efficiency, CRISPR/Cas9 is one of the most popular techniques in the life sciences and has been widely applied to gene-editing functional research in many kinds of organisms and cells, such as mammals (rats, mice, pigs, rabbits, monkeys, and the like), zebrafish, stem cells, tumor cell lines, bacteria, and fungi.
Designing a CRISPR/Cas9-based point mutation gene editing scheme may not seem difficult, but a beginner in gene editing needs several days to do so, and the result is not necessarily ideal; researchers with gene editing experience may also fail to select an optimal scheme because their information sources are incomplete.
In the embodiment of the invention, to address the problems that manual scheme design is tedious and time-consuming and depends heavily on the experience of laboratory technicians, a method is provided that comprehensively analyzes the sequences and gRNAs near a target mutation site and automatically designs a gene editing scheme, thereby realizing point mutation gene editing of a gene. One of four positioning methods is selected for the mutation site: Amino Acid, SNP (single nucleotide polymorphism site), Chromosomal Location (chromosome position), or Nucleotide Sequence (nucleotide sequence). The mutation type and mutation site are determined according to the chosen positioning method, the sequences upstream and downstream of the mutation site are comprehensively analyzed, and one of two types of point mutation scheme is designed: a Base Editing scheme, which comprises gRNA evaluation and screening, or an RNP point mutation scheme, which comprises gRNA screening, Oligo selection, and mutation site introduction. Designing the point mutation scheme online replaces the tedious and time-consuming manual process: the user only needs to select the site information parameters, and an optimal scheme is obtained through gRNA screening and the introduction of additional mutations into the Oligo, saving researchers a great deal of time.
Further, in step A, the Amino Acid method comprises the following steps: selecting a species, inputting a gene, querying the NCBI database and the Ensembl database, and acquiring the transcripts, gene sequence, and coding sequence data of the gene; selecting one transcript and one mutation type, and inputting the related parameters to obtain the final mutation site;
the SNP method comprises the following steps: inputting an SNP site number, querying the NCBI database to obtain the gene in which the site is located, the position of the site on the chromosome, and the alleles of the SNP site, and calculating the position of the SNP site on the gene from the chromosomal positions of the gene and the SNP site; selecting one of the allelic bases, and automatically calculating the mutation type from the base in the original sequence and the selected allelic base;
the Chromosomal Location method comprises the following steps: selecting the species and genome version, selecting the chromosome number, and querying a custom database to obtain the sequence of the chromosome; selecting one mutation type, inputting the "start position" and "end position" parameters, querying the Ensembl database to obtain the bases corresponding to that segment and the gene containing it, and calculating the position of the mutated segment on the gene. It should be noted that the chromosome sequences are derived from the NCBI database; because the sequence of a single chromosome can run to hundreds of millions of characters, the sequences are processed and simplified before being stored in a custom database on a server, which improves query speed.
The Nucleotide Sequence method comprises the following steps: selecting a species, inputting a gene, querying the NCBI database and the Ensembl database, and acquiring the transcripts, gene sequence, and coding sequence data of the gene; selecting one of the transcripts, inputting the "wild-type sequence" and "mutated sequence" parameters, and automatically calculating the mutation type and mutation site from the difference between the two sequences and the position of the wild-type sequence on the gene.
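The coordinate arithmetic in the SNP method, converting a chromosomal position into a position on the gene, can be sketched as below. Coordinates are assumed to be 1-based and inclusive, and the strand handling is an assumption not stated in the text; both function names are hypothetical.

```python
def position_on_gene(chrom_pos: int, gene_start: int, gene_end: int,
                     strand: str = "+") -> int:
    """1-based position within the gene, counted from the gene's 5' end."""
    if not gene_start <= chrom_pos <= gene_end:
        raise ValueError("position lies outside the gene")
    if strand == "+":
        return chrom_pos - gene_start + 1
    if strand == "-":
        # Assumption: minus-strand genes are numbered from their chromosomal end.
        return gene_end - chrom_pos + 1
    raise ValueError("strand must be '+' or '-'")


def mutation_label(ref_base: str, chosen_allele: str):
    """Describe the substitution as 'ref>alt', or None if the bases are equal."""
    ref, alt = ref_base.upper(), chosen_allele.upper()
    if ref == alt:
        return None
    return f"{ref}>{alt}"
```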
Further describing, in the Amino Acid method, one mutation type is selected, and relevant parameters are input to obtain final mutation sites, which correspond to the following steps:
when the mutation type is selected as "mutation": inputting the parameters "amino acid sequence number" and "mutated amino acid", and back-calculating from the amino acid sequence number to obtain the exact position of the mutation site on the gene; for example, if the amino acid sequence number is 50 and the original amino acid Ser (codon AGC) is mutated to Gly (codon GGC), the change is A→G, and the exact position of the A→G mutation on the gene is obtained from the position of the 50th amino acid on the gene;
when the mutation type is selected as "delete": inputting parameters of a start position and an end position, acquiring a sequence to be deleted from a coding sequence according to the two positions, and calculating the accurate position of the deleted sequence fragment on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "insert": inputting parameters of a start position and an inserted sequence, and calculating the position of the inserted sequence on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "delete+insert": the parameters "start position", "end position" and "inserted sequence" are input, the sequence fragment to be deleted is obtained from the coding sequence according to the start position and the end position, and the accurate position of the deleted sequence fragment on the gene is calculated.
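Each of the four cases above ends by mapping a coding-sequence position onto the gene "according to the distribution position of the coding region on the gene". A minimal sketch of that mapping follows, assuming the coding region is supplied as an ordered list of 1-based inclusive gene intervals, one per coding exon segment; the name `cds_to_gene` and the interval format are hypothetical.

```python
def cds_to_gene(cds_pos: int, coding_segments: list[tuple[int, int]]) -> int:
    """Map a 1-based CDS position to a 1-based position on the gene.

    coding_segments: ordered (start, end) gene coordinates, inclusive, of the
    coding part of each exon (assumed format).
    """
    if cds_pos < 1:
        raise ValueError("CDS position must be >= 1")
    remaining = cds_pos
    for start, end in coding_segments:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length  # skip this segment and the intron after it
    raise ValueError("CDS position beyond the end of the coding sequence")
```

With illustrative segments `[(101, 160), (301, 400)]`, the first base of the 50th codon (CDS position 148, as in the Ser-to-Gly example) maps to gene position 388; the start and end positions of a deletion map the same way.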
Further, in step B, judging whether the mutation site is suitable for point mutation gene editing by the Base Editing method, the judgment conditions comprise: a single base is mutated, and the base changes from C to T or from G to A.
Further, the step D, designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNAs, selecting an Oligo sequence, and introducing the mutation site, specifically comprises the following steps:
step D1: screening the gRNA sequences according to the mutation type, and grading the screened gRNAs;
step D2: selecting gRNA1 and gRNA2, the gRNAs with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2, respectively;
step D4: introducing a mutation site into the Oligo sequence.
Further, in the step D1, screening the gRNA sequences according to the mutation type and grading the screened gRNAs, the mutation types comprise four types: mutation, insertion, deletion, and deletion + insertion, and the corresponding operation steps are carried out for the different mutation types;
when the mutation type is "mutation", the method specifically comprises the following steps:
step a1: screening all gRNA sequences within 50bp ranges at the upstream and downstream of the mutation site, and acquiring information such as specificity score, off-target condition, cutting score and the like of the gRNA sequences through CRISPOR online software; it should be noted that CRISPOR online software is an existing biological database and online tool;
step a2: screening out gRNA with the specificity score of more than 80, the cutting efficiency of more than 0.5, no off-target and the GC content of 40-60% according to the conditions such as the specificity score, the cutting score, the off-target condition, the GC content and the like of the gRNA;
step a3: grading the gRNAs according to the distance N between the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23 (grade A is the highest);
step a4: working from the highest grade to the lowest, first selecting the gRNA with the smallest relative position N as gRNA1; if two or more gRNA sequences exist in the same grade, selecting the gRNA with the highest specificity score as gRNA2; otherwise, selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively, by extending outward by 60 bp from the cleavage site and from the end of the mutation site farthest from the cleavage site, respectively; the sequence within this range is the Oligo sequence;
step a6: introducing a mutation site into the Oligo;
step a7: analyzing the complexity and GC content of sequences within 500bp before and after the mutation site;
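Steps a2 to a4 can be sketched as a filter, a grading function and a pair selection. This is one possible reading of step a4, not a definitive implementation: the `Guide` record, its field names and the helper names are hypothetical, the scores are assumed to have been retrieved beforehand from the scoring tool, and gRNA2 is taken from the guides remaining in the top grade after gRNA1 has been chosen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Guide:
    seq: str            # protospacer sequence
    cut_pos: int        # cleavage position on the gene
    specificity: float  # specificity score reported by the scoring tool
    efficiency: float   # cutting score reported by the scoring tool
    off_targets: int    # number of reported off-target sites


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_filter(g: Guide) -> bool:
    """Step a2: specificity > 80, cutting efficiency > 0.5, no off-target, GC 40-60%."""
    return (g.specificity > 80 and g.efficiency > 0.5
            and g.off_targets == 0 and 0.40 <= gc_fraction(g.seq) <= 0.60)


def grade(n: int) -> str:
    """Step a3: grade by distance N between mutation site and cleavage site."""
    if n <= 5:
        return "A"
    if n <= 10:
        return "B"
    if n <= 23:
        return "C"
    return "D"


def select_pair(guides: List[Guide], mutation_pos: int) -> Tuple[Optional[Guide], Optional[Guide]]:
    """Step a4 (one interpretation): pick gRNA1 and gRNA2 from the best grades."""
    def dist(g: Guide) -> int:
        return abs(g.cut_pos - mutation_pos)

    by_grade: Dict[str, List[Guide]] = {}
    for g in guides:
        if passes_filter(g):
            by_grade.setdefault(grade(dist(g)), []).append(g)
    grades = [k for k in "ABCD" if k in by_grade]
    if not grades:
        return None, None
    top = by_grade[grades[0]]
    g1 = min(top, key=dist)                       # smallest N in the best grade
    rest = [g for g in top if g is not g1]
    if rest:                                      # same grade: highest specificity
        return g1, max(rest, key=lambda g: g.specificity)
    if len(grades) > 1:                           # otherwise: smallest N in next grade
        return g1, min(by_grade[grades[1]], key=dist)
    return g1, None
```

When only one guide survives the filter, the sketch returns it as gRNA1 and leaves gRNA2 empty; the source does not say how that case is handled.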
when the mutation type is "insert", "delete" or "delete+insert", the method specifically comprises the following steps:
step b1: screening all gRNA sequences within 35 bp upstream and downstream of the deletion/insertion site, and acquiring information such as the specificity score, off-target status and cutting score of each gRNA sequence through online software;
step b2: selecting the gRNAs with a specificity score greater than 80, a cutting efficiency greater than 0.5, no off-target sites and a GC content between 40% and 60%, according to the specificity score, cutting score, off-target status and GC content of the gRNAs;
step b3: grading the gRNAs according to the relative position N of the gRNA cleavage site and the editing region:
A: N = 0, i.e. the cleavage site is inside the editing region; B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N < 23;
step b4: if two or more gRNAs exist in grade A, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing the Oligo: if the cleavage site is inside the editing region, extending outward by 60 bp on each side with the editing region as the center to obtain the Oligo sequence; if the cleavage site is not inside the editing region, extending outward by 60 bp from the cleavage site and from the far end of the editing region, respectively;
step b6: introducing a mutation site into the Oligo;
step b7: analyzing the complexity and GC content of the sequences within 500 bp before and after the mutation site.
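The grading in step b3 and the Oligo range in step b5 can be sketched as below. The coordinates are assumed to be integer positions on the gene with the editing region given as an inclusive (start, end) pair; the function names are hypothetical, and a distance of 23 or more is left ungraded because the source defines no grade for it.

```python
from typing import Optional, Tuple


def distance_to_region(cut: int, start: int, end: int) -> int:
    """N = 0 when the cleavage site is inside the editing region, else distance to its nearer edge."""
    if start <= cut <= end:
        return 0
    return start - cut if cut < start else cut - end


def grade_indel(n: int) -> Optional[str]:
    """Step b3 grades; returns None where the source defines no grade (N >= 23)."""
    if n == 0:
        return "A"
    if n <= 5:
        return "B"
    if n <= 10:
        return "C"
    if n < 23:
        return "D"
    return None


def oligo_window(cut: int, start: int, end: int, arm: int = 60) -> Tuple[int, int]:
    """Step b5: return the (first, last) gene positions covered by the Oligo."""
    if start <= cut <= end:
        # cleavage site inside the editing region: extend 60 bp on each side of the region
        return start - arm, end + arm
    # otherwise extend outward from the cleavage site and from the far end of the region
    far_end = end if cut < start else start
    return min(cut, far_end) - arm, max(cut, far_end) + arm
```

A real implementation would also clamp the window to the bounds of the available gene sequence, which this sketch omits.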
Further, the step D4: introducing the mutation site into the Oligo sequence, specifically comprises the following steps:
step D41: inputting the Oligo sequence, and judging whether the PAM structure of the gRNA on the Oligo sequence has been changed; if so, outputting the Oligo sequence; if not, executing step D42;
step D42: judging whether the gRNA is in the gene editing region; if so, executing step D43; if not, introducing a random mutation into the PAM of the gRNA (NGG→NGC), and then executing step D46;
step D43: introducing synonymous mutations one at a time from the 3' end to the 5' end of the gRNA, and judging whether the introduced mutation is on the PAM structure NGG; if so, executing step D44; if not, executing step D45;
step D44: judging whether the PAM structure has been changed; if not, returning to step D43; if so, executing step D46;
step D45: judging whether the introduced mutation overlaps with the target mutation point; if not, executing step D46; otherwise, returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
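The PAM check of step D41 and the NGG→NGC replacement of step D42 can be sketched as follows. The sketch assumes the Oligo is written on the same strand as the gRNA and that the 0-based index of the PAM's first base within the Oligo is known; the function names are hypothetical, and the synonymous-mutation loop of steps D43 to D45 is not covered.

```python
def pam_intact(oligo: str, pam_start: int) -> bool:
    """True when the three bases at pam_start still read NGG."""
    pam = oligo[pam_start:pam_start + 3].upper()
    return len(pam) == 3 and pam[1:] == "GG"


def disrupt_pam(oligo: str, pam_start: int) -> str:
    """Replace NGG with NGC; return the Oligo unchanged if the PAM is already altered."""
    if not pam_intact(oligo, pam_start):
        return oligo
    last = pam_start + 2
    return oligo[:last] + "C" + oligo[last + 1:]
```

Altering the PAM on the Oligo prevents the repaired sequence from being recognized and cut again, which is why the flow only outputs the Oligo once the PAM no longer reads NGG or another blocking mutation has been introduced.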
This embodiment also discloses an automatic design system for a gene editing point mutation scheme, which adopts the steps of the above automatic design method for a gene editing point mutation scheme and comprises a mutation site interpretation module, a judgment module, a first scheme design module, a second scheme design module, a calculation module and a result generation module;
the mutation site interpretation module is used for determining the mutation type and the mutation site according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for point-mutation gene editing by the Base Editing method;
the first scheme design module is used for designing a Base Editing scheme and for evaluating and screening gRNAs;
the second scheme design module is used for designing a CRISPR/Cas9-based RNP scheme, screening gRNAs, selecting the Oligo sequence and introducing the mutation site;
the calculation module is used for carrying out sequence complexity analysis and GC content calculation on sequences within the range of 500bp before and after the mutation site, and storing analysis results;
the result generation module is used for generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result.
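The work of the calculation module can be sketched as below. The source does not state which complexity measure is used, so the Shannon entropy of the base composition is used here purely as a stand-in; the function names and the 0-based site index are likewise assumptions.

```python
import math
from collections import Counter


def flank(seq: str, site: int, radius: int = 500) -> str:
    """Return the sequence within `radius` bp before and after a 0-based site index."""
    lo = max(0, site - radius)
    hi = min(len(seq), site + radius + 1)
    return seq[lo:hi]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shannon_entropy(seq: str) -> float:
    """Entropy in bits of the base composition: 0 for a homopolymer, 2 for uniform A/C/G/T."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Low entropy or an extreme GC content in the flanking window would flag a region that may be difficult to amplify or sequence when the edit is verified.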
The embodiment also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the automatic design method of the gene editing point mutation scheme when executing the program.
The present embodiment also discloses a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method for automatically designing a point mutation scheme for gene editing as described above.
Other configurations and operations of the method and system for automatically designing a gene editing point mutation scheme according to the embodiments of the present invention are known to those skilled in the art and will not be described in detail herein.
All or part of each module in the automatic design system of the gene editing point mutation scheme can be realized by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the electronic device in hardware form, or may be stored in a memory of the electronic device in software form, so that the processor can call and execute the operations corresponding to each module.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described method embodiments. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above description of the specific embodiments of the present invention is merely for illustrating the technical route and features of the present invention, and is intended to enable those skilled in the art to understand the content of the present invention and implement it accordingly, but the present invention is not limited to the above-described specific embodiments. All changes or modifications that come within the scope of the appended claims are intended to be embraced therein.

Claims (10)

1. An automatic design method for a gene editing point mutation scheme, characterized by comprising the following steps:
step A: determining the mutation type and the mutation site according to a positioning method, wherein the positioning method comprises an AminoAcid method, an SNP method, a ChromosomalLocation method and a NucleotideSequence method;
step B: judging whether the mutation site is suitable for point-mutation gene editing by the Base Editing method; if so, executing step C; if not, executing step D;
step C: designing a Base Editing scheme according to the mutation type, and screening gRNA sequences, wherein the gRNA sequence needs to contain the mutation site;
step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNA sequences, selecting an Oligo sequence and introducing the mutation site, wherein the gRNA sequence needs to contain the mutation site;
step E: carrying out sequence complexity analysis and GC content calculation on the gene sequences within 500 bp before and after the mutation site, and storing the analysis results;
step F: generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis results.
2. The method for automatically designing a point mutation scheme for gene editing according to claim 1, wherein: in step A, the AminoAcid method includes the steps of: selecting a species, inputting a gene, querying the NCBI database and the Ensembl database and acquiring the transcripts, gene sequence and coding sequence data of the gene; selecting one transcript and one mutation type, and inputting the related parameters to obtain the final mutation site;
the SNP method comprises the following steps: inputting an SNP site number, inquiring an NCBI database to obtain the data of the gene where the site is located, the position of the site on a chromosome and the allele of the SNP site, and calculating the position of the SNP site on the gene through the positions of the gene and the SNP site on the chromosome; selecting one of the allelic bases, and automatically calculating mutation types according to the bases and the allelic bases of the original sequence;
the ChromosomalLocation method comprises the following steps: selecting species and genome versions, selecting chromosome numbers, and inquiring a custom database to obtain the gene sequence of the chromosome; selecting one mutation type, inputting parameters of a start position and an end position, inquiring an Ensembl database to obtain a base corresponding to the position segment and a gene containing the position segment, and calculating to obtain the position of the mutation position segment on the gene;
the NucleotideSequence method comprises the following steps: selecting a species, inputting a gene, querying the NCBI database and the Ensembl database and acquiring the transcripts, gene sequence and coding sequence data of the gene; selecting one of the transcripts, inputting the parameters "wild-type sequence" and "mutated sequence", and automatically calculating the mutation type and the mutation site according to the difference between the two sequences and the position of the wild-type sequence on the gene.
3. The method for automatically designing a point mutation scheme for gene editing according to claim 2, wherein: in the AminoAcid method, one mutation type is selected, and relevant parameters are input to obtain final mutation sites, which correspond to the following steps:
when the mutation type is selected as "mutation": inputting parameters of 'amino acid sequence number' and 'mutated amino acid', and reversely calculating according to the amino acid sequence number to obtain the accurate position of the mutation site on the gene;
when the mutation type is selected as "delete": inputting parameters of a start position and an end position, acquiring a sequence to be deleted from a coding sequence according to the two positions, and calculating the accurate position of the deleted sequence fragment on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "insert": inputting parameters of a start position and an inserted sequence, and calculating the position of the inserted sequence on the gene according to the distribution position of the coding region on the gene;
when the mutation type is selected as "delete+insert": the parameters "start position", "end position" and "inserted sequence" are input, the sequence fragment to be deleted is obtained from the coding sequence according to the start position and the end position, and the accurate position of the deleted sequence fragment on the gene is calculated.
4. The method for automatically designing a point mutation scheme for gene editing according to claim 1, wherein: the step C: judging whether the mutation site is suitable for point-mutation gene editing by the Base Editing method, wherein the judgment conditions comprise: a single base is mutated, and the base changes from C to T or from G to A.
5. The method for automatically designing a point mutation scheme for gene editing according to claim 1, wherein: the step D: designing a CRISPR/Cas9-based RNP scheme according to the mutation type, screening gRNAs, selecting an Oligo sequence and introducing the mutation site, specifically comprises the following steps:
step D1: screening the gRNA sequences according to the mutation type, and grading the screened gRNAs;
step D2: selecting gRNA1 and gRNA2 with the highest specificity scores among the gRNA sequences;
step D3: designing Oligo sequences for gRNA1 and gRNA2, respectively;
step D4: a mutation site is introduced in the Oligo sequence.
6. The method for automatically designing a point mutation scheme for gene editing according to claim 5, wherein: the step D1: screening the gRNA sequences according to the mutation type, and grading the screened gRNAs; the mutation types comprise four types, namely mutation, insertion, deletion, and deletion+insertion, and the corresponding operation steps are carried out for each mutation type;
when the mutation type is "mutation", the method specifically comprises the following steps:
step a1: screening all gRNA sequences within 50 bp upstream and downstream of the mutation site, and acquiring the specificity score, off-target status and cutting score of each gRNA sequence through the CRISPOR online software;
step a2: selecting the gRNAs with a specificity score greater than 80, a cutting efficiency greater than 0.5, no off-target sites and a GC content between 40% and 60%, according to the specificity score, cutting score, off-target status and GC content of the gRNAs;
step a3: grading the gRNAs according to the distance N between the mutation site and the gRNA cleavage site:
A: N ≤ 5; B: 5 < N ≤ 10; C: 10 < N ≤ 23; D: N > 23, A being the highest grade;
step a4: working from the highest grade to the lowest, first selecting the gRNA with the smallest relative position N as gRNA1; if two or more gRNA sequences exist in the same grade, selecting the gRNA with the highest specificity score as gRNA2; otherwise, selecting the gRNA with the smallest relative position N from the next grade as gRNA2;
step a5: designing Oligo sequences for gRNA1 and gRNA2 respectively, by extending outward by 60 bp from the cleavage site and from the end of the mutation site farthest from the cleavage site, respectively; the sequence within this range is the Oligo sequence;
step a6: introducing a mutation site into the Oligo;
step a7: analyzing the complexity and GC content of sequences within 500bp before and after the mutation site;
when the mutation type is "insert", "delete" or "delete+insert", the method specifically comprises the following steps:
step b1: screening all gRNA sequences within 35 bp upstream and downstream of the deletion/insertion site, and acquiring the specificity score, off-target status and cutting score of each gRNA sequence through online software;
step b2: selecting the gRNAs with a specificity score greater than 80, a cutting efficiency greater than 0.5, no off-target sites and a GC content between 40% and 60%, according to the specificity score, cutting score, off-target status and GC content of the gRNAs;
step b3: grading the gRNAs according to the relative position N of the gRNA cleavage site and the editing region:
A: N = 0, i.e. the cleavage site is inside the editing region; B: N ≤ 5; C: 5 < N ≤ 10; D: 10 < N < 23;
step b4: if two or more gRNAs exist in grade A, selecting the two gRNAs with the highest specificity scores as gRNA1 and gRNA2; otherwise, selecting the two gRNAs with the smallest relative position N from the next grade as gRNA1 and gRNA2;
step b5: designing the Oligo: if the cleavage site is inside the editing region, extending outward by 60 bp on each side with the editing region as the center to obtain the Oligo sequence; if the cleavage site is not inside the editing region, extending outward by 60 bp from the cleavage site and from the far end of the editing region, respectively;
step b6: introducing a mutation site into the Oligo;
step b7: analyzing the complexity and GC content of the sequences within 500 bp before and after the mutation site.
7. The method for automatically designing a point mutation scheme for gene editing according to claim 5, wherein: the step D4: introducing the mutation site into the Oligo sequence, specifically comprises the following steps:
step D41: inputting the Oligo sequence, and judging whether the PAM structure of the gRNA on the Oligo sequence has been changed; if so, outputting the Oligo sequence; if not, executing step D42;
step D42: judging whether the gRNA is in the gene editing region; if so, executing step D43; if not, introducing a random mutation into the PAM of the gRNA (NGG→NGC), and then executing step D46;
step D43: introducing synonymous mutations one at a time from the 3' end to the 5' end of the gRNA, and judging whether the introduced mutation is on the PAM structure NGG; if so, executing step D44; if not, executing step D45;
step D44: judging whether the PAM structure has been changed; if not, returning to step D43; if so, executing step D46;
step D45: judging whether the introduced mutation overlaps with the target mutation point; if not, executing step D46; otherwise, returning to step D43;
step D46: modifying the Oligo sequence and outputting it.
8. An automatic design system for a gene editing point mutation scheme, characterized in that: the system adopts the steps of the method for automatically designing a gene editing point mutation scheme according to any one of claims 1 to 7, and comprises a mutation site interpretation module, a judgment module, a first scheme design module, a second scheme design module, a calculation module and a result generation module;
the mutation site interpretation module is used for determining the mutation type and the mutation site according to a positioning method;
the judgment module is used for judging whether the mutation site is suitable for point-mutation gene editing by the Base Editing method;
the first scheme design module is used for designing a Base Editing scheme and for evaluating and screening gRNAs;
the second scheme design module is used for designing a CRISPR/Cas9-based RNP scheme, screening gRNAs, selecting the Oligo sequence and introducing the mutation site;
the calculation module is used for carrying out sequence complexity analysis and GC content calculation on sequences within a range of 500bp before and after the mutation site, and storing analysis results;
the result generation module is used for generating and outputting a gRNA targeting region schematic diagram and a scheme report according to the analysis result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of a method for automatically designing a point mutation scheme for gene editing according to any one of claims 1 to 7 when the program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of a method for automatically designing a point mutation scheme for gene editing according to any one of claims 1 to 7.
CN202210753169.9A 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme Active CN115148281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210753169.9A CN115148281B (en) 2022-06-29 2022-06-29 Automatic design method and system for gene editing point mutation scheme

Publications (2)

Publication Number Publication Date
CN115148281A CN115148281A (en) 2022-10-04
CN115148281B true CN115148281B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant