CN116665780A - gRNA, plasmid and primer design system - Google Patents

Publication number
CN116665780A
Authority
CN
China
Prior art keywords
grna
design
primer
target
plasmid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310619371.7A
Other languages
Chinese (zh)
Inventor
罗小舟
杨统见
凌路頔
谢尚波
杰·基斯林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202310619371.7A
Publication of CN116665780A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly

Abstract

The invention discloses a gRNA, plasmid and primer design system and belongs to the field of bioinformatics. The invention combines the open source code of the online gRNA design software CHOPCHOP with Python scripts and Primer3.0 to design gRNAs, plasmids and primers for target genes from any given genome and vector, which reduces the restrictions on research subjects and allows users to carry out CRISPRi-related research on a wider range of species. The invention performs CRISPRi design for single or multiple target genes in batches and directly outputs a complete plasmid map containing the gRNA and the vector, which facilitates comparison with and tracing of sequencing data during subsequent cloning. The system provides, quickly and in one place, all the primers and plasmid maps required for downstream wet-lab experiments, which effectively improves the working efficiency of researchers and saves time and cost.

Description

gRNA, plasmid and primer design system
Technical Field
The invention relates to a gRNA, plasmid and primer design system, in particular to a gRNA, plasmid and primer design system for CRISPR and its derivative technologies, such as CRISPRi, CRISPRa and CRISPR-based single-base editing, and belongs to the field of bioinformatics.
Background
CRISPR and its derivative technologies are widely used for gene and genome editing and for regulating gene expression, and they are efficient, convenient and fast. For example, CRISPR can be used for efficient gene knockout, knock-in and replacement; CRISPRi for knocking down gene expression; CRISPRa for activating gene expression; and CRISPR base editors for site-directed mutation. To ensure that CRISPR and its derivative tools work efficiently, however, a suitable Cas protein (different Cas proteins recognize different PAM sequences) must be selected to match the genetic characteristics of the target host, since different Cas proteins work with different efficiencies in different hosts, and high-quality sgRNAs are needed to avoid off-target effects. Therefore, when constructing plasmids for CRISPR and its derivatives, it is generally necessary to design high-quality gRNAs, to design the primers used to construct the plasmid, and to draw a target plasmid map that can be used to review and analyze experimental results, for example by comparing the map with sequencing data. In addition, in recent years one of the long-term goals of synthetic biology has been to develop, through interdisciplinary work, technologies and equipment that automate each stage of the design-build-test-learn (DBTL) cycle at high throughput, and to assemble them into large-scale "biofoundries" that provide laboratory automation and a higher level of reproducibility. Completing the relevant plasmid construction designs rapidly and at high throughput is necessary for carrying out high-throughput CRISPR, CRISPRi and related studies on automated platforms.
Prior-art patents relating to CRISPR include: (CN 20201154641.9), a method for simultaneously realizing gene editing and transcription regulation using a type I CRISPR-Cas system, which mainly applies CRISPR technology in a new way to combine the two functions of gene editing and transcription regulation; and (CN201980052855.X), amplification methods, systems and diagnostics based on CRISPR effector systems, which is mainly used to extend the use of CRISPR for nucleic acid detection in clinical diagnostics.
Some gRNA design software has also been developed, such as CHOPCHOP, but it can often be used only for particular genomes and does not cover the subsequent primer and plasmid map design. Users therefore have to spend considerable time and effort completing the primer design with other software, such as SnapGene, which is inefficient and ill-suited to subsequent automated and high-throughput experiments.
The problems of the prior art fall mainly into four areas. First, gRNAs can generally be designed only for a limited set of genomes: for a genome outside the database referenced by the software, the user cannot upload the genome directly for gRNA design but has to ask the software developer to add it, which is cumbersome and inefficient. Second, existing software only helps predict and evaluate candidate gRNAs; it does not provide the primer and plasmid map information required to construct the plasmid, so the user must complete the subsequent work by other means, which is inefficient and time-consuming and hinders the subsequent wet-lab work. Third, gRNAs can be designed for only one target at a time, with no function for designing gRNAs, primers and plasmid maps for multiple targets in batches. Fourth, the existing software is mainly available only as online versions.
Disclosure of Invention
In view of these problems, the invention aims to develop a gRNA, plasmid and primer design tool that accepts a user-supplied genome and a user-supplied backbone vector and supports batch design for multiple targets. In addition to the gRNAs and their quality evaluation, the output provides the primer sequences for plasmid construction and a plasmid map that facilitates comparison with and tracing of sequencing data during subsequent cloning. The tool is available in an online version and an offline version, so users can work with it at any time and in any place.
The invention aims to develop a gRNA, plasmid and primer design system. To achieve system compatibility and convenient one-stop design, the invention builds on the open source code of the online gRNA design software CHOPCHOP and uses bash and Python scripts on a Linux operating system to design, for a given genome and a given vector, multiple gRNAs, plasmids and primers for a single target or for multiple targets in batches. Targets are located by a unique locus tag (locus_tag) or by specific base position information, and the CRISPR system or its derivative system can be based on either of two Cas proteins, SpCas9 and Cas12a (Cpf1), to meet different design requirements. The output compressed package provides the gRNAs and their quality evaluation, a plasmid map containing annotation information, the primers required to construct the plasmid, and so on. The system has two modes: an online version deployed with Jupyter and an offline version packaged with a GUI.
The invention provides a gRNA, plasmid and primer design system, which comprises a genome adding and preprocessing module, a gRNA design and evaluation module, a primer design module and a plasmid map processing module:
(1) The genome adding and preprocessing module is used for uploading a genome and building an index for the newly added genome;
(2) The gRNA design and evaluation module is used for designing and screening gRNA sequences for the target genes of the genome and for analyzing factors that affect plasmid construction and working efficiency;
(3) The primer design module is used for designing the primer sequences for amplifying the overlap-containing gRNA fragment and the primers for amplifying the linearized vector fragment;
(4) The plasmid map processing module is used for drawing a complete plasmid map annotated with gRNA and primer information.
In one embodiment, in the gRNA design and evaluation module, the user sets the Cas protein type, the PAM sequence, the length of the gRNA sequence, the algorithm model for designing the gRNA, the design region of the gRNA and the strand (sense or antisense) on which the gRNA is designed as required, and off-target effects are analyzed.
In one embodiment, the primer design module comprises a gRNA primer design module and a vector primer design module.
In one embodiment, the gRNA primer design module adds overlaps homologous to the upstream and downstream ends of the linearized vector to the two sides of the gRNA sequence to generate a complete fragment, and generates its complementary strand.
In one embodiment, the primer is upstream and downstream in sequence from the 3' end to the 5' end of the primer.
In one embodiment, the vector primer design module is used for designing the primers for linearizing the vector backbone by PCR amplification.
In one embodiment, the system includes an online version or an offline version.
In one embodiment, the online version is deployed with Jupyter.
In one embodiment, the offline version is packaged with a graphical user interface (GUI) using PyInstaller.
The invention provides a method for designing high-quality gRNA, plasmids and primers in batches, which comprises the following specific steps:
(1) Inputting a target genome file, a linearized vector backbone file and a target file for the target genes, or selecting them from a local database;
(2) Setting the parameters for gRNA design and primer design;
(3) Designing and screening gRNAs according to the parameters of step (2);
(4) Designing and/or evaluating the upstream and downstream overlaps;
(5) Designing the gRNA primers according to the gRNAs generated in step (3);
(6) Designing the vector primers according to the linearized vector backbone file of step (1);
(7) Generating a plasmid map.
In one embodiment, in step (1), the user may upload the target genome file from the local machine to the server or enter an NCBI Assembly Accession ID.
In one embodiment, in step (1), the user may upload the linearized vector file from the local machine to the server, generating a local database.
In one embodiment, in step (1), the user may upload the target file in txt format from the local machine to the server, generating a local database.
In one embodiment, the target file contains a single target or multiple targets.
In one embodiment, a target may be a unique locus_tag or a specific base position.
In one embodiment, multiple targets are separated by tabs.
In one embodiment, in step (2), the user sets the Cas protein type, the PAM sequence, the length of the gRNA sequence, the algorithm model for designing the gRNA, the design region of the gRNA, the strand (sense or antisense) on which the gRNA is designed, the number of gRNAs designed per target, and the algorithm model for evaluating gRNA quality as required.
In one embodiment, in step (2), an overlap sequence is entered, or the length of the overlap is set.
In one embodiment, in step (2), the user sets the number, length, Tm value and GC content of the vector primers and the number of repeated bases at the 3' end as required.
In one embodiment, in step (3), the design and screening of the gRNAs is accomplished using CHOPCHOP and Python scripts.
In one embodiment, in step (3), the gRNAs are designed with CHOPCHOP according to the parameters of step (2).
In one embodiment, in step (3), the targets of the target genes are located using a Python script.
In one embodiment, in step (3), off-target effects are analyzed using the algorithm model (Algorithm) that evaluates gRNA quality.
In one embodiment, the off-target analysis covers the GC content and self-complementarity of the gRNA sequence and the number of its other binding sites on the genome.
In one embodiment, in step (4), the overlaps are designed using a Python script.
In one embodiment, in step (4), the length, GC content and free energy of the upstream and downstream overlap sequences are evaluated using Primer3.0.
In one embodiment, in step (5), the gRNA primers are designed using a Python script.
In one embodiment, in step (5), a Python script splices the upstream and downstream overlaps and the gRNA screened in step (3) into a complete primer and generates its complementary strand.
In one embodiment, the primer is upstream and downstream in sequence from the 3' end to the 5' end of the primer.
In one embodiment, in step (6), the primers for PCR amplification of the vector backbone are generated using Primer3.0.
In one embodiment, in step (7), the complete plasmid map of the gRNA and the vector is drawn by a Python script.
The present invention also provides a computer device programmed to perform the steps of the above method or a storage medium of the computer device having stored therein a computer program programmed to perform the above method.
The beneficial effects are that:
The invention develops a gRNA, plasmid and primer design system that combines the open source code of the online gRNA design software CHOPCHOP with Python scripts and Primer3.0. By uploading a genome file, a linearized vector backbone file and a target file, the user can design gRNAs, plasmids and primers for target genes from any given genome and vector, which reduces the restrictions on research subjects and allows users to carry out CRISPRi-related research on a wider range of species. The system allows the user to upload a target file containing information on multiple targets, separated by tabs, to perform CRISPRi design for single or multiple target genes in batches; the target information comprises the DNA fragment name, the editing site name (locus_tag) and the specific base position. The quality of the multiple gRNAs that are output is evaluated by an algorithm model built into CHOPCHOP to help the user choose among them, and a Python script draws the complete plasmid map containing the gRNA and the vector, which facilitates comparison with and tracing of sequencing data during subsequent cloning. A visual operation interface is also provided, and the system offers the convenience of both an online and an offline version. The system provides, quickly and in one place, all the primers and plasmid maps required for downstream wet-lab experiments, which effectively improves the working efficiency of researchers and saves time and cost.
Drawings
FIG. 1 is a schematic diagram of a gRNA, plasmid and primer design system;
FIG. 2 shows an on-line version of the operator interface and output data for a gRNA, plasmid and primer design system;
FIG. 3 is an off-line version of the operator interface for a gRNA, plasmid and primer design system;
FIG. 4 is a schematic diagram of a target file, wherein "ChrID" is the DNA fragment name, "Gene ID" is the editing site name, and "Start" and "End" are the specific base positions on the DNA fragment.
Detailed Description
The following provides definitions of some of the terms used in this specification. Unless otherwise defined, technical and scientific terms used in the following examples have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Meanwhile, in order to better understand the present invention, definitions and explanations of related terms are provided below.
The terms "guide RNA", "guide sequence" and "gRNA" are used interchangeably herein and have the meaning commonly understood by a person skilled in the art. In general, a gRNA comprises a sequence that binds to the Cas protein and a specific targeting sequence (target site) of about 20 nt, and the gRNA can bind to the Cas protein.
The terms "overlap", "overlap sequence", "overlap end" and "overlap region" are used interchangeably herein and refer to base sequences, added to both ends of the gRNA sequence designed in the present invention, that overlap with the plasmid or vector, so that the gRNA can then be constructed into the plasmid or vector of interest by homologous recombination. Their length may be defined by the user according to actual needs.
The term "vector" as used herein is a nucleic acid molecule capable of transporting another nucleic acid molecule linked to it. One type of vector is a "plasmid", which refers to a circular double-stranded DNA loop into which additional DNA fragments may be inserted, for example by standard molecular cloning techniques. In the present invention, the vector, after linearization, may be joined to the overlap-containing gRNA sequence by homologous recombination.
The terms "Cas protein" and "Cas nuclease" are used interchangeably herein and refer to a protein that includes at least one domain that interacts with a gRNA and that has DNA endonuclease activity. Cleavage of the target sequence can be achieved by directing the Cas protein to the target sequence via the gRNA. The Cas protein may be selected from class 2 CRISPR-Cas systems, i.e. single-protein Cas nucleases, e.g. the Cas9 protein and the Cpf1 protein.
Example 1
The system comprises 4 modules, namely a genome adding and preprocessing module, a gRNA designing and evaluating module, a primer designing module and a plasmid map processing module.
(1) The genome adding and preprocessing module is used for uploading a genome and building an index for the newly added genome, which prepares the data for the subsequent call to CHOPCHOP to design gRNAs. The genome may be uploaded from the local machine or downloaded into the system from the NCBI database.
(2) The gRNA design and evaluation module is used for designing and screening gRNA sequences for the target genes of the target genome. It designs and screens targets on the designated genome according to the user's requirements, and provides information on and a quality evaluation of the designed gRNA sequences, including the gRNA sequence, the PAM sequence, the specific base position and the strand. It also analyzes factors that may affect plasmid construction and working efficiency, namely off-target effects, including the GC content, the self-complementarity of the sequence and the number of other binding sites on the genome.
Regarding the gRNA design rules, the user selects the Cas protein type as required; enters the designated PAM sequence; sets the length of the gRNA sequence; selects the algorithm model for designing the gRNA; selects the design region of the gRNA according to the locus_tag of the target gene or the detailed base position information of the target gene in the target genome; and specifies whether the gRNA is designed on the sense or the antisense strand. To ensure that the relevant CRISPR tool works efficiently, the user may also design multiple gRNA sequences for a single target.
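The role of the PAM sequence and gRNA length settings can be illustrated with a minimal Python sketch that lists every candidate spacer followed by a 3' PAM on both strands. This is only an illustration and not the CHOPCHOP implementation; the function name find_spacers, the small IUPAC table and the default values (NGG, 20 nt) are assumptions, and Cas proteins with a 5' PAM such as Cas12a are not covered.

```python
import re

# Subset of IUPAC codes needed for common PAMs (assumption: no other codes are used).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spacers(seq, pam="NGG", spacer_length=20):
    """List (start, strand, spacer, pam) for each spacer followed by a 3' PAM.

    start is the 0-based forward-strand coordinate of the leftmost base of
    the whole site (spacer plus PAM).
    """
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_length, pam_regex))
    site_length = spacer_length + len(pam)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            start = match.start()
            if strand == "-":
                # Convert the position back to forward-strand coordinates.
                start = len(seq) - (match.start() + site_length)
            hits.append((start, strand, match.group(1), match.group(2)))
    return hits
```

Each hit is only a candidate; in the system described here the ranking and off-target evaluation are left to the CHOPCHOP algorithm model.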
(3) The primer design module is used for designing the primer sequences for PCR amplification of the overlap-containing gRNA fragment so as to construct a plasmid for gene editing, and comprises a gRNA primer design module and a vector primer design module.
(a) gRNA primer design module: designs the primers for PCR amplification of the overlap-containing gRNA fragment. Based on the gRNA sequence passed on by the gRNA evaluation module, it adds overlaps homologous to the upstream and downstream ends of the linearized vector to the two sides of the gRNA sequence, splices them into a "left overlap-gRNA-right overlap" oligonucleotide sequence, and generates the complementary strand to obtain the primer pair.
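The splicing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the script used by the system; the function names are hypothetical, and the complementary strand is written 5' to 3' as a reverse complement.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_grna_primer_pair(left_overlap, grna, right_overlap):
    """Splice left overlap, gRNA and right overlap; return both strands."""
    forward = (left_overlap + grna + right_overlap).upper()
    return forward, reverse_complement(forward)


# Toy sequences, far shorter than real overlaps and gRNAs.
forward, reverse = build_grna_primer_pair("AAC", "GGG", "TTA")
print(forward)  # AACGGGTTA
print(reverse)  # TAACCCGTT
```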
(b) Vector primer design module: designs the primers used to obtain the linearized vector backbone by PCR amplification. The primers are designed with Primer3.0, and the Tm value, GC content and length of each primer are reported.
(4) Plasmid map processing module: simulates the complete plasmid obtained after the gRNA is joined to the vector backbone and draws a complete plasmid map annotated with the gRNA and primer information.
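As a rough illustration of what simulating the complete plasmid involves, the following Python sketch joins a gRNA to a linearized backbone and records feature coordinates that a map file could later be annotated with. It rests on stated assumptions that are not in the original text: the overlaps are taken to be the last and first overlap_length bases of the linearized backbone, the circular plasmid is written starting at the first backbone base, and the function name and feature dictionaries are hypothetical.

```python
def assemble_plasmid(backbone, grna, overlap_length=20):
    """Simulate joining a gRNA to a linearized backbone by homologous overlaps.

    Returns (plasmid, fragment, features); feature coordinates are 0-based,
    end-exclusive positions on the plasmid sequence.
    """
    backbone, grna = backbone.upper(), grna.upper()
    if overlap_length * 2 > len(backbone):
        raise ValueError("overlaps are longer than the backbone allows")
    left_overlap = backbone[-overlap_length:]   # homologous to the backbone's 3' end
    right_overlap = backbone[:overlap_length]   # homologous to the backbone's 5' end
    fragment = left_overlap + grna + right_overlap
    # After recombination the overlaps are shared, so only the gRNA is added.
    plasmid = backbone + grna
    features = [
        {"label": "right_overlap", "start": 0, "end": overlap_length},
        {"label": "left_overlap", "start": len(backbone) - overlap_length,
         "end": len(backbone)},
        {"label": "gRNA", "start": len(backbone), "end": len(plasmid)},
    ]
    return plasmid, fragment, features
```

The feature list could then be written out as annotations in a map file; the file writer itself is not shown here.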
Example 2
The invention develops a gRNA, plasmid and primer design system whose online version runs on a Linux operating system using the image function of a remote server. The operation of the system comprises three stages, namely a data input and parameter setting stage, a data processing and analysis stage and a data output stage. The online version is deployed with Jupyter, as shown in figure 1, and its operation flow is as follows:
(1) In the data input and parameter setting stage:
S001 user-defined input files:
(a) The genome file of interest (InputGenome): the user may upload the genome file from the local machine to the target genome folder on the Jupyter server (InputGenome/), or enter an NCBI Assembly Accession ID. The user can then select a genome file from the genome folder in the main interface of the runme.ipynb application, which prepares the data for the subsequent calls to CHOPCHOP for gRNA design.
(b) The linearized vector backbone file (backup_vector): the user may upload the linearized vector backbone file from the local machine to the linearized vector backbone folder on the Jupyter server (backup_vector/). The user can then select a linearized vector backbone from that folder in the main interface of the runme.ipynb application.
(c) The target file for the target genes (TargetLists): the user may upload a target file in txt format to the target folder on the Jupyter server (TargetLists/). The target file contains a single target or multiple targets, and a target can be a unique locus_tag (figure 4a) or a specific base position (figure 4b). The user can then select a target file from that folder in the main interface of the runme.ipynb application.
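A target file of this kind could be read as in the minimal Python sketch below. The exact layout of the file is not specified in the text, so the sketch assumes a tab-separated file with a header row using the column names given for figure 4 ("ChrID", "Gene ID", "Start", "End"), where the position columns may be absent for locus_tag targets; the function name is hypothetical.

```python
import csv
import io


def parse_target_file(text):
    """Parse tab-separated target rows into dictionaries.

    Rows with Start/End are base-position targets; rows without them are
    treated as locus_tag targets (start and end are None).
    """
    targets = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        start, end = row.get("Start"), row.get("End")
        has_position = bool(start) and bool(end)
        targets.append({
            "chr": row["ChrID"],
            "name": row["Gene ID"],
            "start": int(start) if has_position else None,
            "end": int(end) if has_position else None,
        })
    return targets


example = "ChrID\tGene ID\tStart\tEnd\nchr1\tgeneA\t100\t200\nchr1\tb0002\n"
for target in parse_target_file(example):
    print(target)
```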
S002 parameter setting: clicking RUNME.ipynb to enter a specific parameter setting interface to perform specific parameter setting:
(a) Parameter settings for gRNA design
(1) Cas protein type (Model);
(2) PAM sequence (PAM);
(3) Length of gRNA (spacer_length);
(4) gRNA design area (Upstream/Downstream);
(5) Designing the positive and negative Strand of gRNA (gRNA_Strand);
(6) Designing the quantity (gRNA_number) of gRNA for each target point;
(7) An algorithmic model (Algorithm) for designing and evaluating gRNA quality;
(b) Parameter setting for primer design
An Overlap sequence (overlay_left/overlay_right) on the gRNA primer;
the number, length, tm, GC content, number of 3' -terminal repeat bases of the vector primer.
After selecting the genome file of S001 (a), the linearized vector backbone file of S001 (b) and the target file of S001 (c), and setting the parameters of S002, the user enters the name of the compressed file for the results and clicks RUN to start the data processing and analysis.
(2) Data processing analysis stage:
S003 gRNA design and screening: CHOPCHOP and Python scripts are used to design and screen the gRNAs. In this step, gRNA sequences are designed according to the genome file of S001 and the parameters set by the user in S002, three by default. Information on and a quality evaluation of the designed gRNAs are provided, including the gRNA sequence, the PAM sequence, the specific base position in the target gene and the strand; off-target effects are analyzed using the algorithm model (Algorithm).
The gRNAs are designed with CHOPCHOP according to the parameters set by the user in S002 and the design rules of CHOPCHOP.
The Python script locates the targets of the target genes based on the target file provided in S001, either by the locus_tag name of the gene or by the detailed base position of the target on the genome. When the former (locus_tag name) is chosen, the first base of the gene is taken as the +1 position reference, so that the user can design gRNAs in a designated region of the gene.
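The +1 reference can be made concrete with a short Python sketch that converts an upstream/downstream window around the first base of a gene into genomic coordinates. The handling of minus-strand genes, the 1-based inclusive coordinates and the function name are assumptions made for illustration; the text does not describe how the script treats them.

```python
def design_region(gene_start, gene_end, strand, upstream, downstream):
    """Return (region_start, region_end) in 1-based inclusive genome coordinates.

    The window covers `upstream` bases before the +1 base (the first base of
    the gene) and `downstream` bases starting at +1.
    """
    if strand == "+":
        first_base = gene_start
        region = (first_base - upstream, first_base + downstream - 1)
    elif strand == "-":
        # On the minus strand the gene starts at its highest coordinate.
        first_base = gene_end
        region = (first_base - downstream + 1, first_base + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the start of the sequence.
    return max(region[0], 1), region[1]


print(design_region(1000, 2000, "+", upstream=50, downstream=100))  # (950, 1099)
print(design_region(1000, 2000, "-", upstream=50, downstream=100))  # (1901, 2050)
```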
The algorithm model (Algorithm) is a module built into CHOPCHOP and is used to analyze the GC content and self-complementarity of the gRNA sequence and the number of its other binding sites on the genome.
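Two of these quantities are simple enough to sketch in Python. The GC content is a direct count; the self-complementarity measure below is a deliberately simplified stand-in (it counts short stretches whose reverse complement occurs further along the same sequence) and should not be read as the scoring that CHOPCHOP actually applies. The function names and the stem length of 4 are assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def self_complementarity(seq, stem=4):
    """Count k-mers whose reverse complement occurs downstream without overlap."""
    seq = seq.upper()
    kmers = [seq[i:i + stem] for i in range(len(seq) - stem + 1)]
    count = 0
    for i, kmer in enumerate(kmers):
        # Only partners that start after this k-mer ends can pair with it.
        if reverse_complement(kmer) in kmers[i + stem:]:
            count += 1
    return count


print(gc_content("GGCCAATT"))                # 50.0
print(self_complementarity("AAAACCCCTTTT"))  # 1
```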
S004 evaluation of the upstream and downstream overlaps: the length, GC content and free energy of the upstream and downstream overlap sequences are evaluated using Primer3.0.
S005 gRNA primer design: a Python script designs the gRNA primers by attaching the upstream and downstream overlaps to the two sides of the gRNA sequence screened in S003, splicing them into a complete fragment, and generating the complementary strand to obtain a primer pair.
S006 vector primer design: under the primer design parameters given by the user in S002, Primer3.0 is used to generate multiple high-quality primers, from which the user can choose, for linearizing the vector backbone by PCR amplification.
S007 plasmid map assembly: a Python script draws the complete plasmid map containing the gRNA and the vector, annotating the map with the gRNA designed in S003, the upstream and downstream overlaps evaluated in S004, the gRNA primers designed in S005 and the vector primers designed in S006.
(3) Data output stage
In the data output stage, the user returns to the main interface, clicks Result, downloads the result to the local machine and decompresses the file to obtain the required gRNA, primer and plasmid map information files. The primer information required to construct the target plasmid is stored in the a_gp.txt file, which gives the target name corresponding to each primer, its position on the genome, and the strand, sequence, length and GC content of the designed gRNA. The grna_result folder provides a subfolder of gRNA predictions for each target, containing a file named target name_result.txt, which lists all target sequences (including PAM sequences) of the gRNAs predicted for the target, their position information (base position on the genome and strand) and their quality evaluation (GC content, potential self-complementarity of the sequence, off-target effects and overall efficiency analysis). The plasmid map file is in GenBank format and is annotated with the gRNA and primer information.
Example 3
The invention develops a gRNA, plasmid and primer design system whose GUI-packaged local version can be run on a conventional Windows system. The operation of the system comprises three stages, namely a data input and parameter setting stage, a data processing and analysis stage and a data output stage. The local version is packaged with a GUI to provide visual operation, as shown in figure 3, and its operation flow is as follows:
(1) In the data input and parameter setting stage:
S001 user-defined input files:
(a) The genome file of interest (Genome): the user may upload the genome of interest from the local machine or enter an NCBI Assembly Accession ID;
(b) The linearized vector backbone file (Vector): the main interface allows the user to upload a linearized vector backbone file, and uploaded files are stored on the server to form a linearized vector backbone database, from which the user can select the vector backbone file of interest;
(c) The target file for the target genes (configuration): the user may upload a target file, and uploaded files are stored on the server to form a target database, from which the user can select the target file of interest. The target file is a tab-separated text file whose targets can be unique locus_tags (figure 4a) or specific base positions (figure 4b), and targets can be designed in batches;
S002 parameter setting: the parameters to be set include target-related parameters, gRNA-related parameters and primer-related parameters:
1. Target-related parameters: the target region type (target_type) and the upstream/downstream section of the target (Upstream/Downstream);
2. gRNA-related parameters: the CRISPR tool type (design_type), the Cas protein type (Model), the PAM sequence (PAM), the gRNA length (spacer_length), the position at which the gRNA is inserted into the vector (insert_site), whether to evaluate the gRNA (score_gc), the algorithm for evaluating the gRNA (screening_method), the number of gRNAs designed (grna_num), the maximum number of off-targets (max_off target) and the maximum number of mismatched bases of the gRNA (max_mismatch);
3. Primer-related parameters: the length of the upstream and downstream overlaps (Overlap_left/Overlap_Right), the length of the vector primers, the range of GC content, the method and range of the Tm calculation, the maximum allowed number of repeated bases at the 3' end of the primer, the number of primers designed and the ion concentrations of the PCR reaction.
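How the primer-related parameters constrain a candidate vector primer can be sketched as a simple filter in Python. This is not how Primer3.0 works internally: the Tm here uses the Wallace rule (2 °C per A/T, 4 °C per G/C) purely as a stand-in, and every function name and default threshold below is a hypothetical value chosen for illustration.

```python
def three_prime_run(primer):
    """Length of the run of identical bases at the 3' end of the primer."""
    run = 0
    for base in reversed(primer):
        if base != primer[-1]:
            break
        run += 1
    return run


def wallace_tm(primer):
    """Rough melting temperature in °C by the Wallace rule (an approximation)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_filters(primer, min_len=18, max_len=25, gc_range=(40.0, 60.0),
                   tm_range=(50, 65), max_3prime_run=3):
    """Check a candidate primer against length, GC, Tm and 3'-repeat limits."""
    primer = primer.upper()
    gc_percent = 100.0 * (primer.count("G") + primer.count("C")) / len(primer)
    return (min_len <= len(primer) <= max_len
            and gc_range[0] <= gc_percent <= gc_range[1]
            and tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and three_prime_run(primer) <= max_3prime_run)


print(passes_filters("ACGTACGTACGTACGTACGT"))  # True
print(passes_filters("ACGTACGTACGTACGTGGGG"))  # False: four G bases at the 3' end
```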
(2) Data processing analysis stage
S003 gRNA design and screening: CHOPCHOP is used to design the gRNAs, and a Python script is used to screen for gRNAs that meet the requirements. This stage designs and screens gRNA sequences for the target genes on the specified genome and reports, for each designed gRNA, the relevant information and a quality assessment, including the gRNA sequence, the PAM sequence, the specific base position of the target, and the strand (positive or negative). It also analyzes factors that may affect plasmid construction and working efficiency, including the GC content of the sequence, self-complementarity, and the number of other binding sites on the genome (the off-target effect).
The gRNAs are designed according to the gRNA-related parameters set by the user in S002 and the design rules of CHOPCHOP.
A Python script locates the targets based on the target file provided in S001, either by the locus_tag of the gene or by the detailed base position of the target on the genome. When the locus_tag is used, the first base of the gene serves as the +1 reference position, which allows the user to design gRNAs within a specified region of the gene.
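The +1 reference convention can be made concrete with a short coordinate conversion. This is a sketch under stated assumptions, not the system's script: genome coordinates are taken to be 1-based and inclusive, there is no position 0 (so -1 is the base immediately upstream of +1), and for a gene on the negative strand the +1 base is the gene's highest genome coordinate. The function names are hypothetical.

```python
def offset_to_genome(gene_start: int, gene_end: int, strand: str, offset: int) -> int:
    """Convert a gene-relative offset (+1 = first base of the gene) to a genome position."""
    if offset == 0:
        raise ValueError("offset 0 is undefined; positions run ..., -2, -1, +1, +2, ...")
    # Distance from the +1 base, measured in the direction of transcription.
    shift = offset - 1 if offset > 0 else offset
    if strand == "+":
        return gene_start + shift
    if strand == "-":
        return gene_end - shift
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def region_to_genome(gene_start, gene_end, strand, first, last):
    """Return (low, high) genome coordinates for the gene-relative region first..last."""
    a = offset_to_genome(gene_start, gene_end, strand, first)
    b = offset_to_genome(gene_start, gene_end, strand, last)
    return min(a, b), max(a, b)
```

For a gene spanning 100 to 500 on the positive strand, the region +1 to +50 maps to 100 to 149; for the same span on the negative strand it maps to 451 to 500.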
The algorithm for evaluating gRNAs (screening_method) is a module built into CHOPCHOP; the selected algorithm is used to analyze factors that may affect efficiency, including GC content, self-complementarity, and the number of other binding sites on the genome (the off-target effect).
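The screening criteria named above can be illustrated with a small filter. This sketch is a simplified stand-in and does not reproduce CHOPCHOP's scoring: self-complementarity is approximated by counting distinct 4-mers whose reverse complement also occurs in the sequence, the off-target count is assumed to be supplied with each candidate, and the GC range and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def self_complementary_kmers(seq: str, k: int = 4) -> int:
    """Count distinct k-mers whose reverse complement also occurs in seq."""
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for km in kmers if km.translate(_COMPLEMENT)[::-1] in kmers)


def screen_grnas(candidates, gc_min=40.0, gc_max=80.0, max_offtarget=0, grna_num=3):
    """Keep candidates within the GC range and off-target limit, best first.

    Each candidate is a dict with keys "seq" and "off_targets" (assumed shape).
    """
    kept = [
        c for c in candidates
        if gc_min <= gc_content(c["seq"]) <= gc_max
        and c["off_targets"] <= max_offtarget
    ]
    # Fewer off-targets first, then fewer self-complementary k-mers.
    kept.sort(key=lambda c: (c["off_targets"], self_complementary_kmers(c["seq"])))
    return kept[:grna_num]
```

Sorting after filtering means that grna_num only limits how many acceptable candidates are returned per target; it never admits a candidate that failed a threshold.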
S004 Design and evaluation of the upstream and downstream overlaps, using a Python script and Primer3.0: the Python script obtains the overlaps according to the primer-related parameters set in S002, and Primer3.0 evaluates the length, GC content, and free energy of the upstream and downstream overlap sequences.
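One way to obtain the overlaps is to slice them from the vector sequence on either side of the insertion site. The sketch below assumes that insert_site is a 0-based index into the vector sequence, that the overlaps do not wrap around the sequence origin, and that only length and GC content are reported; the free energy evaluation, which the text assigns to Primer3.0, is not reproduced. The function names are hypothetical.

```python
def get_overlaps(vector: str, insert_site: int, left_len: int, right_len: int):
    """Return (upstream_overlap, downstream_overlap) flanking insert_site."""
    if left_len < 0 or right_len < 0:
        raise ValueError("overlap lengths must be non-negative")
    if not (left_len <= insert_site <= len(vector) - right_len):
        raise ValueError("overlaps would run past the ends of the vector sequence")
    v = vector.upper()
    upstream = v[insert_site - left_len:insert_site]
    downstream = v[insert_site:insert_site + right_len]
    return upstream, downstream


def describe_overlap(seq: str) -> dict:
    """Report the length and GC percentage of an overlap sequence."""
    gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return {"length": len(seq), "gc_percent": gc}
```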
S005 gRNA primer design: a Python script designs the gRNA primers by attaching the upstream and downstream overlaps passed from S004 to the two sides of the gRNA sequence screened in S003, splicing them into a complete fragment, and generating the complementary strand to obtain a primer pair.
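The splicing step described here amounts to a concatenation followed by a reverse complement. The following minimal sketch assumes plain A/C/G/T sequences written 5' to 3'; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_grna_primers(upstream_overlap: str, grna: str, downstream_overlap: str):
    """Splice overlap + gRNA + overlap and return (forward, reverse) primers."""
    forward = (upstream_overlap + grna + downstream_overlap).upper()
    return forward, reverse_complement(forward)
```

For example, build_grna_primers("AAC", "GGG", "TTA") returns ("AACGGGTTA", "TAACCCGTT"); the two strands anneal to form the complete double-stranded fragment.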
S006 Vector primer design: Primer3.0 is used to generate high-quality primers under the primer design parameters set by the user in S002.
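Some of the primer constraints set in S002 can be checked with a simple pre-filter, sketched below. This is not Primer3.0: the melting temperature uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a rough estimate suited only to short oligonucleotides, and the "repeated bases at the 3' end" limit is interpreted as the length of the run of identical bases at the 3' terminus. Both choices, the function names, and the thresholds are assumptions.

```python
def three_prime_run(primer: str) -> int:
    """Length of the run of identical bases at the 3' end of the primer."""
    p = primer.upper()
    if not p:
        raise ValueError("empty primer")
    run = 1
    while run < len(p) and p[-run - 1] == p[-1]:
        run += 1
    return run


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def passes_prefilter(primer: str, gc_range=(40.0, 60.0), max_3p_run=3) -> bool:
    """Check GC percentage and the 3' end run against illustrative thresholds."""
    p = primer.upper()
    gc = 100.0 * (p.count("G") + p.count("C")) / len(p)
    return gc_range[0] <= gc <= gc_range[1] and three_prime_run(p) <= max_3p_run
```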
S007 Plasmid map assembly: a Python script draws a complete plasmid map containing the gRNA and the vector, and annotates the map with the gRNA designed in S003, the upstream and downstream overlaps designed in S004, the gRNA primers designed in S005, and the vector primers designed in S006.
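Before a map can be drawn, the feature coordinates on the assembled sequence have to be known. The sketch below shows one way to compute them; it assumes a 0-based insert_site with no wrap-around, reports features as 0-based half-open (start, end) intervals, and leaves out primer features and the writing of a GenBank file. The function name and the feature labels are hypothetical.

```python
def assemble_plasmid(vector: str, insert_site: int, grna: str,
                     left_len: int, right_len: int):
    """Insert the gRNA at insert_site and return (sequence, features)."""
    if not (left_len <= insert_site <= len(vector) - right_len):
        raise ValueError("overlaps would run past the ends of the vector sequence")
    v, g = vector.upper(), grna.upper()
    plasmid = v[:insert_site] + g + v[insert_site:]
    g_start = insert_site
    g_end = g_start + len(g)
    # Features are (label, start, end) on the assembled sequence.
    features = [
        ("upstream_overlap", g_start - left_len, g_start),
        ("gRNA", g_start, g_end),
        ("downstream_overlap", g_end, g_end + right_len),
    ]
    return plasmid, features
```

Each interval can be checked by slicing: plasmid[start:end] for the "gRNA" feature returns the inserted gRNA sequence.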
(3) Data output stage
In the data output stage, the results are written directly to a specified local path.
The output mainly comprises: 1) the primer information required to construct the target plasmids, stored in the final_CRISPRi_PrimerSeq.xlsx file under final_result; this file gives, for each primer, the corresponding target name, the position on the genome, the strand of the designed gRNA, and the sequence, length, and GC content of the designed gRNA; 2) the gRNA_result folder, which holds the gRNA prediction results for each target in a file named after the target, in the form "target name"_gRNA.xls; this file lists all predicted gRNA target sequences for that target (including the PAM sequence), their position information (base position on the genome and strand), and their quality assessment (GC content, potential self-complementarity, off-target effect, and overall efficiency analysis); 3) a plasmid map file in GenBank format, annotated with the gRNA and primer information; 4) the original input data files, the configuration file grna.conf recording the run parameters and input data, and the run log file run.log.
While the invention has been described with reference to the preferred embodiments, it is not limited thereto, and various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (25)

1. A gRNA, plasmid and primer design system, characterized by comprising a genome adding and preprocessing module, a gRNA design and evaluation module, a primer design module and a plasmid map processing module:
(1) The genome adding and preprocessing module is used for uploading a genome and constructing an index for the newly added genome;
(2) The gRNA design and evaluation module is used for designing and screening gRNA sequences for the target genes of the genome and analyzing factors affecting plasmid construction and working efficiency;
(3) The primer design module is used for designing primer sequences for amplifying the fragment containing the gRNA flanked by overlaps, and primers for amplifying the linearized vector fragment;
(4) The plasmid map processing module is used for drawing a complete plasmid map annotated with gRNA and primer information.
2. The gRNA, plasmid and primer design system of claim 1, wherein the gRNA design and evaluation module is configured to set, as required by a user, the Cas protein type, the PAM sequence, the length of the gRNA sequence, the algorithm model for gRNA design, the design region of the gRNA and the strand (positive or negative) on which the gRNA is designed, and to analyze off-target effects.
3. The gRNA, plasmid, and primer design system of claim 1, wherein the primer design module comprises a gRNA primer design module and a vector primer design module: the gRNA primer design module is used for attaching, on the two sides of the gRNA sequence, overlaps homologous to the upstream and downstream ends of the linearized vector, to generate a complete fragment and its complementary strand; the vector primer design module is used for designing primers for PCR amplification of the linearized vector backbone.
4. The gRNA, plasmid and primer design system according to any one of claims 1-3, characterized in that the system comprises an online version or an offline version.
5. The gRNA, plasmid, and primer design system of claim 4, wherein the online version is deployed via Jupyter, and the offline version is packaged with a graphical user interface.
6. A method for designing high-quality gRNAs, plasmids and primers in batches, characterized by comprising the following steps:
(1) Inputting, or selecting from a local database, a target genome file, a linearized vector backbone file and a target file of the target genes;
(2) Setting parameters for gRNA design and primer design;
(3) Designing and screening according to the parameters of step (2) to obtain gRNAs;
(4) Designing and/or evaluating the upstream and downstream overlaps;
(5) Designing gRNA primers according to the gRNAs generated in step (3);
(6) Designing vector primers according to the linearized vector backbone file of step (1);
(7) Generating a plasmid map.
7. The method of claim 6, wherein in step (1), the target genome file is uploaded by the user from a local machine to the server, or is specified by the user entering an NCBI Assembly Accession ID.
8. The method of claim 6, wherein in step (1), the user is allowed to upload the linearized vector file from a local machine to the server to generate a local database.
9. The method of claim 6, wherein in step (1), the user is allowed to upload the target file of the target genes in txt format from a local machine to the server to generate the local database.
10. The method of claim 9, wherein the target file contains a single target or multiple targets.
11. The method of claim 9, wherein the target is a unique locus_tag or a base position.
12. The method of claim 6, wherein in step (2), the user sets, as needed, the Cas protein type, the PAM sequence, the length of the gRNA sequence, the algorithm model for designing gRNAs, the design region of the gRNA, the strand (positive or negative) on which the gRNA is designed, the number of gRNAs designed per target and the algorithm model for evaluating gRNA quality.
13. The method of claim 6, wherein in step (2), an overlap sequence is input, or the length of the overlap is set.
14. The method according to claim 6, wherein in step (2), the number, length, Tm value, GC content and number of repeated bases at the 3' end of the vector primers are set.
15. The method of claim 6, wherein in step (3), the design and screening of the gRNA is accomplished using CHOPCHOP and python scripts.
16. The method of claim 15, wherein in step (3), the gRNA is designed according to the parameters of step (2) using a CHOPCHOP module.
17. The method of claim 15, wherein in step (3), the target of the target gene is located using a Python script.
18. The method of claim 15, wherein in step (3), the off-target effect is analyzed using the algorithmic model set forth in step (2) to evaluate the quality of the gRNA.
19. The method of claim 18, wherein the off-target effect comprises GC content, self-complementarity, and number of other binding sites on the genome for the gRNA sequence.
20. The method of claim 6, wherein in step (4), the evaluation of the length, GC content, and free energy of the upstream and downstream overlap sequences is achieved using Primer3.0.
21. The method of claim 6, wherein in step (5), the design of the gRNA primers is accomplished using a Python script.
22. The method of claim 6, wherein in step (5), the upstream and downstream overlaps and the gRNA screened in step (3) are spliced into complete primers using a Python script, and the complementary strands are generated.
23. The method of claim 6, wherein in step (6), Primer3.0 is used to generate primers for PCR amplification of the vector backbone.
24. The method of claim 6, wherein in step (7), drawing of the complete plasmid map comprising the gRNA and the linearized vector backbone is achieved by a Python script.
25. A computer device programmed to perform the steps of the method of any one of claims 6 to 24, or having stored in a storage medium thereof a computer program programmed to perform the method of any one of claims 6 to 24.
CN202310619371.7A 2023-05-30 2023-05-30 gRNA, plasmid and primer design system Pending CN116665780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619371.7A CN116665780A (en) 2023-05-30 2023-05-30 gRNA, plasmid and primer design system


Publications (1)

Publication Number Publication Date
CN116665780A true CN116665780A (en) 2023-08-29

Family

ID=87718439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619371.7A Pending CN116665780A (en) 2023-05-30 2023-05-30 gRNA, plasmid and primer design system

Country Status (1)

Country Link
CN (1) CN116665780A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination