CN106446600A - CRISPR/Cas9-based sgRNA design method - Google Patents
- Publication number
- CN106446600A (application CN201610341946.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention relates to a CRISPR/Cas9-based sgRNA design method. The method comprises the following steps: obtaining sgRNAs and the corresponding Cas9 cleavage efficiency values; building a personalized sgRNA design model; measuring the quality of the built model with the NDCG algorithm and updating a database; and designing sgRNAs and giving an assessment value for each sgRNA. Compared with the prior art, the method offers high accuracy, more complete feature coverage, a wider range of application, and a wider range of analyzable data.
Description
Technical field
The present invention relates to the field of gene editing research, and in particular to a method for designing sgRNAs based on CRISPR/Cas9 gene editing technology.
Background art
With the development of molecular biology, our understanding of the basic components of life has deepened, but much about the mechanisms of life processes, and especially the mechanisms of certain diseases, is still not understood. Studying the relationship between genes and phenotypes, and the interactions among genes, urgently requires an engineering technique that can rapidly knock out and insert genes in vivo. The CRISPR/Cas9 system emerged to meet this need.
The CRISPR/Cas9 system (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) is an easy-to-use and broadly applicable gene editing tool. The system consists mainly of a nuclease (Cas9) and a guide RNA (sgRNA) that performs target recognition. The sgRNA recognizes the target gene site through complementary base pairing and recruits Cas9 to cleave it, producing a double-strand break and thereby enabling gene editing at the DNA level. Because it is broadly applicable, convenient, and time-saving, it was rapidly adopted in many areas, and it is particularly advantageous for building cancer models and exploring gene therapy.
However, continued research has shown that different sgRNAs designed for the same gene in the same cell differ greatly in cleavage efficiency. If a highly efficient sgRNA cannot be designed, the only remedy is to increase the concentration, which introduces a large amount of unwanted genetic material into the cell and at the same time produces a high proportion of off-target events, greatly hampering research. Designing sgRNAs with high cleavage efficiency is therefore very important for gene research.
At present there are nearly 30 sgRNA design programs, which fall into two main classes. The first class summarizes rules from experiments, for example that one end of the matched sequence must carry a PAM sequence, that the 5' end should be GG, that the GC content should be kept at about 60%, and that the seed sequence does not tolerate mismatches, and then screens candidates directly against these conditions. The second class mainly uses statistical methods to assign a weight to each base and computes the specificity of the sgRNA, as in CRISPR Design. Both classes build a single general-purpose model. However, because different species and different cells are highly heterogeneous, the predictions of existing software are not very good; and because heterogeneity across experimental conditions also affects sgRNA cleavage efficiency, a general model assesses sgRNAs with relatively low accuracy.
Therefore, taking into account the heterogeneity of data across platforms and species, building personalized models from platform-specific or species-specific data to improve sgRNA specificity and efficiency is extremely important for research on the off-target problem of the CRISPR/Cas9 system.
Summary of the invention
The object of the present invention is to address the above problems by providing a CRISPR/Cas9-based sgRNA design method with high accuracy and a wide range of application.
To achieve this object, the present invention provides a CRISPR/Cas9-based sgRNA design method comprising the following steps:
1) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values, specifically:
11) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values from the literature;
12) obtain sgRNAs from the SRA database and calculate the corresponding Cas9 cleavage efficiency values;
13) classify the data obtained in steps 11) and 12) by species, cell type, and experimental condition into different reference genomes, and for each reference genome compile a table whose first column is the sgRNA name, second column is the sgRNA sequence, and third column is the corresponding Cas9 cleavage efficiency;
2) build a personalized sgRNA design model, specifically:
21) as required, extract from the corresponding reference genome the sequence information of the sgRNAs obtained in step 1);
22) binary-encode the sgRNA sequence information extracted in step 21) according to the binary rule;
23) for the sgRNAs obtained in step 21), determine the data type of the Cas9 cleavage efficiency: if it is numeric, go to step 24); if it is categorical, go to step 25);
24) on the sgRNA sequence information encoded in step 22), perform feature extraction with a Lasso model and build the personalized sgRNA design model by standard linear regression;
25) on the sgRNA sequence information encoded in step 22), perform feature selection with the L1 regularization of binary logistic regression, then build the personalized sgRNA design model with the L2 regularization of binary logistic regression;
3) use the NDCG algorithm to measure the quality of the personalized sgRNA design model built in step 2) and update the SRA database, specifically:
31) calculate the NDCG value of the personalized sgRNA design model built in step 2);
32) determine whether the existing SRA database already contains a corresponding personalized sgRNA model: if not, add the model to the SRA database; if so, go to step 33);
33) compare this personalized sgRNA model with the corresponding sgRNA model in the SRA database and store the one with the larger NDCG value in the SRA database;
4) design sgRNAs and give the assessment value of each sgRNA, specifically:
41) for the genome region given by the user, select a suitable reference genome from the SRA database and search it for all sgRNAs that satisfy the design rule; these are the designed sgRNAs;
42) assess the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
Preferably, the corresponding Cas9 cleavage efficiency value in step 12) is calculated as follows:
121) align the sgRNA and the corresponding next-generation sequencing reads to the reference genome;
122) extract the reads that contain the sgRNA;
123) determine whether an insertion or deletion has occurred in the DNA at the cut site and whether that insertion or deletion is a frameshift mutation;
124) compute the frameshift mutation rate of each sgRNA, specifically: frameshift mutation rate = (number of reads that contain the sgRNA and carry a frameshift mutation) / (total number of reads that contain the sgRNA);
125) take the frameshift mutation rate calculated in step 124) as the Cas9 cleavage efficiency value.
Preferably, the sgRNA sequence information in step 21) includes the sgRNA sequence, the marker fragment required for the sgRNA to recognize DNA (the PAM), and the bases upstream and downstream of the sgRNA spacer; the lengths of the upstream and downstream flanks of the spacer are either the platform default or a value set by the user.
Preferably, the binary rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001, and N to 0000.
Preferably, the feature extraction with the Lasso model in step 24) selects feature vectors by extracting non-zero weights, specifically:
where w is the weight vector of the features to be estimated, x is the feature vector of the selected sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value corresponding to each sgRNA; α is a constant and ||w||1 is the ℓ1-norm of the parameter vector. The Lasso model solves the least-squares loss function with the added penalty term α||w||1; by traversing the regularization matrix, the features with non-zero weights are extracted.
Preferably, the L1 regularization in step 25) is specifically:
where w and c are the weights and the intercept of the features to be estimated, X is the binary matrix of the encoded sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value corresponding to each sgRNA.
Preferably, the L2 regularization is specifically:
Preferably, the NDCG value of the personalized sgRNA design model in step 31) is calculated specifically as:
where DCG is the value calculated from the predicted ranking, IDCG is the ideal DCG calculated from the true ranking, and rel_i is the ranking value predicted at position i.
Preferably, the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the marker fragment required for the sgRNA to recognize DNA.
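The 20bp+PAM rule can be illustrated with a short search over a genome region. This is only a minimal sketch under stated assumptions: the PAM is taken to be NGG (the usual motif for the commonly used Streptococcus pyogenes Cas9, and consistent with the example sequences in the tables, but not stated by the rule itself), only the given strand is scanned, and the function name `find_candidates` is hypothetical.

```python
import re

# Assumption: PAM = NGG. The lookahead lets overlapping 23-nt windows all be reported.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_candidates(region):
    """Return (start, spacer, pam) for every 20 bp + NGG window on the given strand."""
    region = region.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(region)]


# The first sequence of Table 1 is itself a single 20 bp + PAM window.
hits = find_candidates("CGCAACCTGCTCAGCGCCTACGG")
for start, spacer, pam in hits:
    print(start, spacer, pam)
```

A fuller search would also scan the reverse complement and handle ambiguous bases, which this sketch leaves out.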
Compared with the prior art, the present invention has the following advantages:
(1) A personalized strategy is used for different cell types of different species, and modeling is done with data-driven machine learning algorithms, which significantly improves assessment accuracy.
(2) A new encoding rule is used, so the features found are more complete and are not limited to the PAM and the spacer.
(3) Users are given a workflow for building their own models, so the range of application is wider and is not limited to the species already in the database.
(4) The OTF rate of NGS data is used as the cleavage rate, which expands the range of data that can be analyzed.
(5) Users can upload their own data to expand the database, which accelerates data accumulation and helps resolve the current predicament in which insufficient data prevents optimal sgRNAs from being designed.
Brief description of the drawings
Fig. 1 is a flow chart of the method for building and assessing a personalized sgRNA model;
Fig. 2 is a flow chart of the method for designing and assessing sgRNAs.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and a specific embodiment. This embodiment is implemented on the basis of the technical solution of the present invention and gives a detailed implementation and a concrete operating procedure, but the scope of protection of the present invention is not limited to the following embodiment.
Abbreviations:
CRISPR: clustered regularly interspaced short palindromic repeats
Cas9: the enzyme associated with the type II CRISPR system
NGS: next-generation sequencing
PAM: protospacer-adjacent motif, the marker fragment required for the sgRNA to recognize DNA
sgRNA: the RNA that performs the guiding function in the CRISPR/Cas9 system
indel: an insertion or deletion in DNA caused by CRISPR/Cas9 editing
spacer: the roughly 20 bases of the sgRNA that pair complementarily with the target
OTF: out of frame, i.e. a frameshift mutation
Read: the sequence obtained from one reaction in high-throughput sequencing
This embodiment provides a CRISPR/Cas9-based sgRNA design method with a workflow for building a personalized sgRNA design model for each cell type of each species, so that models can be built and sgRNAs designed according to different needs. It comprises the following four steps:
(1) Data collection: The data collected from the literature generally fall into two types: sgRNAs with numeric cleavage efficiencies, or sgRNAs with categorical cleavage efficiencies (for example, a binary effective/ineffective label). NGS data downloaded from the SRA database are numeric only. Because NGS data, once the OTF rate has been computed, follow the same workflow as numeric data from the literature, this embodiment illustrates only two kinds of data: categorical data from the literature and NGS data.
Categorical data: For categorical data collected from the literature, this embodiment sets effective to 1 and ineffective to 0 and organizes the data in the format of Table 1.
Table 1
sgID | Sequence | Score |
sgRNA_1 | CGCAACCTGCTCAGCGCCTACGG | 1 |
sgRNA_2 | CAGTCTACATAACACGCCCATGG | 1 |
sgRNA_3 | CGCAACCTGCTCAGCGCCTACGG | 1 |
…… | …… | …… |
sgRNA_1_1 | GGCAACCGTGGCGGCAATCGAGG | 0 |
sgRNA_2_2 | CTTCTCGGAATTCGGTGAAGGTGG | 0 |
sgRNA_3_3 | AACCTCCCGGCTTCTCGGAATTCGG | 0 |
…… | …… | …… |
Numeric data: For numeric NGS data, BWA is first used to align the sgRNA sequences and the NGS reads to the human reference genome. The reads containing the sgRNA are extracted, and for each it is determined whether an indel occurred at the cut site and whether that indel is OTF. The OTF rate of each sgRNA is then computed (OTF rate = number of reads that contain the sgRNA and are OTF, divided by the total number of reads that contain the sgRNA). Finally the data are organized in the format of Table 2.
Table 2
sgID | Sequence | Score |
sgRNA_1 | CGCAACCTGCTCAGCGCCTACGG | 0.2345 |
sgRNA_2 | CAGTCTACATAACACGCCCATGG | 0.7846 |
sgRNA_3 | CGCAACCTGCTCAGCGCCTACGG | 0.2367 |
…… | …… | …… |
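The OTF-rate statistic described for numeric data can be sketched as follows. The sketch assumes that alignment and indel calling (for example with BWA, as the text describes) have already produced one signed indel length per read, and it treats an indel as out of frame when its length is not a multiple of 3, which is the usual definition of a frameshift rather than something the text spells out. The function names are hypothetical.

```python
def is_out_of_frame(indel_length):
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return indel_length != 0 and indel_length % 3 != 0


def otf_rate(indel_lengths):
    """OTF rate = reads with a frameshift indel / all reads containing the sgRNA.

    indel_lengths holds one signed integer per read that contains the sgRNA:
    0 for no indel, positive for an insertion, negative for a deletion.
    """
    if not indel_lengths:
        raise ValueError("no reads contain this sgRNA")
    otf_reads = sum(1 for length in indel_lengths if is_out_of_frame(length))
    return otf_reads / len(indel_lengths)


# Hypothetical indel calls for the reads covering one sgRNA cut site.
reads = [0, 1, -2, 3, -3, 0, 4, 0]
rate = otf_rate(reads)  # three frameshift reads (1, -2, 4) out of eight
print(rate)
```

The resulting value would fill the Score column of a table laid out like Table 2.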
(2) Model building: As shown in Fig. 1, the sequence information of the collected sgRNAs is extracted from the corresponding reference genome. Assuming the upstream and downstream flanks are set to 35 and 32 bases respectively, the extracted sequence is 90 (35+20+3+32) bases long:
CACCTGGTAT GTTCGTATCG GGCAGAATAT CGCAACCTGC TCAGCGCCTA CGGTCCATCT CGCTCAGGTA CGACTGACCG ACCCAGTCTA
The extracted sgRNA information is binary-encoded according to the rule shown in Table 3.
Table 3
Base | Code |
A | 1000 |
C | 0100 |
G | 0010 |
T | 0001 |
N | 0000 |
The 90 bases extracted above are then encoded as:
0100 1000 0100 0100 0001 0010 0010 0001 1000 0001
0010 0001 0001 0100 0010 0001 1000 0001 0100 0010
0010 0010 0100 1000 0010 1000 1000 0001 1000 0001
0100 0010 0100 1000 1000 0100 0100 0001 0010 0100
0001 0100 1000 0010 0100 0010 0100 0100 0001 1000
0100 0010 0010 0001 0100 0100 1000 0001 0100 0001
0100 0010 0100 0001 0100 1000 0010 0010 0001 1000
0100 0010 1000 0100 0001 0010 1000 0100 0100 0010
1000 0100 0100 0100 1000 0010 0001 0100 0001 1000
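The encoding rule (A 1000, C 0100, G 0010, T 0001, N 0000) amounts to a one-hot code per base. A minimal sketch, with hypothetical function names, that reproduces the first row of the encoded example above:

```python
# Encoding rule from the text: A 1000, C 0100, G 0010, T 0001, N 0000.
CODES = {"A": "1000", "C": "0100", "G": "0010", "T": "0001", "N": "0000"}


def encode_sequence(sequence):
    """Return one 4-bit code per base, in order."""
    return [CODES[base] for base in sequence.upper()]


def to_feature_vector(sequence):
    """Flatten the 4-bit codes into a list of 0/1 integers (4 features per base)."""
    return [int(bit) for code in encode_sequence(sequence) for bit in code]


# First ten bases of the 90-base example sequence.
first_ten = encode_sequence("CACCTGGTAT")
features = to_feature_vector("CACCTGGTAT")
print(" ".join(first_ten))
```

Under this scheme a 90-base window yields a vector of 360 binary features, one row of the matrix passed to the regression models.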
Features are extracted with machine learning methods, and the personalized sgRNA design model is built.
For categorical data, logistic regression is used to select features and build the prediction model. Binary logistic regression offers two optional regularizations; the present invention uses L1 regularization for feature selection and L2 regularization to build the model.
L1-regularized logistic regression solves the following sparse feature selection optimization problem:
where w and c are the weights and the intercept of the features to be estimated, X is the feature representation of the training samples, n is the number of training samples, and y is the cleavage efficiency value corresponding to each sgRNA.
L2-penalized logistic regression minimizes the following cost function:
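The two objectives referred to above are not reproduced in this text. For orientation, the widely used forms of L1- and L2-regularized binary logistic regression that match the symbols w, c, X, n, and y are given below. This is an assumption about what the original formulas look like, not a transcription: the constant C (a positive weight on the data term) does not appear among the symbols defined in the text, and in this convention the labels are coded as y_i in {-1, +1}, so the 1/0 labels of Table 1 would be mapped accordingly.

```latex
% L1-regularized logistic regression (used for feature selection)
\min_{w,\,c}\; \lVert w \rVert_1
  + C \sum_{i=1}^{n} \log\!\Bigl(\exp\bigl(-y_i\,(X_i^{\top} w + c)\bigr) + 1\Bigr)

% L2-penalized logistic regression (used to fit the final model)
\min_{w,\,c}\; \tfrac{1}{2}\, w^{\top} w
  + C \sum_{i=1}^{n} \log\!\Bigl(\exp\bigl(-y_i\,(X_i^{\top} w + c)\bigr) + 1\Bigr)
```

The L1 penalty drives many weights exactly to zero, which is why it serves for feature selection, while the L2 penalty only shrinks the weights of the features that were kept.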
For numeric data, a Lasso model is used for feature selection and standard linear regression is used to build the prediction model. Lasso is a linear model that estimates sparse coefficients; it selects feature vectors mainly by extracting non-zero weights. The objective function to be minimized is:
where w is the weight vector of the features to be estimated, x is the feature vector of the selected sgRNAs, n is the number of training samples, and y is the cleavage efficiency value corresponding to each sgRNA; α is a constant and ||w||1 is the ℓ1-norm of the parameter vector. The Lasso model solves the least-squares loss function with the added penalty term α||w||1; by traversing the regularization matrix, the features with non-zero weights are extracted. These features are regarded as the important elements affecting sgRNA cleavage efficiency.
After these features are selected, an assessment model is built with standard linear regression.
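The Lasso selection step can be illustrated with a small pure-Python solver. This is a sketch under stated assumptions, not the implementation behind the method: the objective is taken in the common form (1/(2n)) * ||Xw - y||^2 + alpha * ||w||_1, the solver is plain cyclic coordinate descent without an intercept (the text does not say which solver is used), and the function names and toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the update induced by the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||Xw - y||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm > 0 else 0.0
    return w


# Toy data: the score depends only on binary feature 0; feature 1 is noise.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1.0, 1.0, 0.0, 0.0]
w = lasso_coordinate_descent(X, y, alpha=0.1)

# Features with non-zero weight are the ones kept for the linear regression.
selected = [j for j, weight in enumerate(w) if weight != 0.0]
print(selected)
```

On real data the rows of X would be the binary-encoded sequences and y the numeric cleavage efficiencies; the selected columns would then be refitted with ordinary least squares, as the text describes.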
Modeling for both numeric and categorical data produces two files. One is an XML file containing the selected features and the cross-validation results; the other is a pkl file, a binary file containing the prediction model that was built.
The content of the XML file is as follows:
(3) Model assessment: The NDCG algorithm is used to measure the quality of the prediction model. NDCG (normalized discounted cumulative gain) is mainly used to measure the effectiveness of a ranking model. Its value represents the similarity between the predicted ranking and the actual ranking and ranges from 0 to 1, where 1 means the two rankings are identical; the larger the value, the better the model. The formula is as follows:
NDCG = DCG / IDCG
DCG (discounted cumulative gain) is the value calculated from the predicted ranking, and IDCG (ideal DCG) is the ideal DCG, calculated from the true ranking. The mathematical definition of DCG is as follows:
where rel_i is the ranking value predicted at position i.
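The ranking metric can be sketched in a few lines. The DCG definition used here, the sum of rel_i / log2(i + 1) over positions i, is one common convention and is an assumption, as is the choice of the true (benchmark) score as the gain rel_i; the text's own DCG formula may differ. The function names are hypothetical, and the sample scores are the first nine rows of Table 4.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for positions i = 1..k."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(true_scores, predicted_scores, k=None):
    """NDCG = DCG of the true scores in predicted order / DCG in ideal order."""
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_scores[i] for i in order][:k]
    ideal = sorted(true_scores, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Benchmark Score and Cage Score columns, first nine rows of Table 4.
benchmark = [0.789456865, 0.506422166, 0.325738326, 0.092078991, 0.072255575,
             0.037552375, 0.019922477, 0.017296539, 0.002206787]
cage = [0.6905, 0.6026, 0.5548, 0.5095, 0.4959, 0.4473, 0.3281, 0.3357, 0.3056]
value = ndcg(benchmark, cage)
print(value)
```

Because only ranks 7 and 8 are swapped in these nine rows, the value falls just below 1. It is not expected to match the reported Top 50 figure, which covers more rows and may rest on a different gain definition.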
Table 4 gives an example, in which sgID is the name of the sgRNA, seq is the spacer sequence of the sgRNA, Benchmark Score is the benchmark score, BS_rank is the rank by Benchmark Score, Cage Score is the score assigned by the prediction model of the present invention, and C_rank is the rank by Cage Score.
Table 4
sgID | seq | Benchmark Score | BS_rank | Cage Score | C_rank |
sg1000 | GCAGGTACCCTGCAACGTCGCGG | 0.789456865 | 1 | 0.6905 | 1 |
sg1001 | CTCCACTAGTCCCCGCGCCGCGG | 0.506422166 | 2 | 0.6026 | 2 |
sg1 | GTAATGGCTTCCTCGTGAGTTGG | 0.325738326 | 3 | 0.5548 | 3 |
sg1002 | GACTCCGTTGGGATCCGCGCCGG | 0.092078991 | 4 | 0.5095 | 4 |
sg10 | ATCTTAAGCAAACGCTTACCAGG | 0.072255575 | 5 | 0.4959 | 5 |
sg1003 | CCCGAAACGGTTGACTCCGTTGG | 0.037552375 | 6 | 0.4473 | 6 |
sg1004 | AGGCGCGCGATCCAGGTAGCTGG | 0.019922477 | 7 | 0.3281 | 8 |
sg100 | AAAAAGCTGATGAAGTTGTTTGG | 0.017296539 | 8 | 0.3357 | 7 |
sg1005 | CGGGGCCACCGCGACGTTGCAGG | 0.002206787 | 9 | 0.3056 | 9 |
…… | …… | …… | …… | …… | …… |
Top 50 NDCG = 0.876322904
Top 10% NDCG = 0.84340749
If the database does not contain this model, the model is added to the database. Otherwise the NDCG values of the two models are compared, and if the new model has a larger NDCG value than the existing model, it replaces the existing model in the database.
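The update rule can be written as a small keep-the-better-model routine. In this sketch the database is stood in for by an in-memory dictionary, and the key format, function name, and placeholder model objects are all hypothetical.

```python
def update_model_store(store, key, model, ndcg_value):
    """Add the model if the key is new; otherwise keep whichever has the larger NDCG.

    Returns True when the store was changed.
    """
    current = store.get(key)
    if current is None or ndcg_value > current["ndcg"]:
        store[key] = {"model": model, "ndcg": ndcg_value}
        return True
    return False


store = {}
key = ("human", "example_cell_type")  # hypothetical species / cell-type key
added = update_model_store(store, key, "model_v1", 0.84)     # no model yet: added
rejected = update_model_store(store, key, "model_v2", 0.80)  # lower NDCG: kept out
replaced = update_model_store(store, key, "model_v3", 0.88)  # higher NDCG: replaces
print(store[key]["model"])
```

With equal NDCG values the existing model is kept, matching the rule that only a larger value triggers an update.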
(4) Design and assessment: As shown in Fig. 2, the method either assesses sgRNAs that the user has already designed or designs sgRNAs for a genome region given by the user (for example chromosome 1, 1,000,000 to 1,002,000, hg19). The species or cell type of the sgRNAs to be assessed is determined first, and a suitable model is then selected for the assessment; if no suitable model exists, a similar model may be selected. This embodiment provides 10 models covering 8 cell types from 3 species to choose from. The output is shown in Table 5.
Table 5
At this point, the user can select the sgRNAs that suit their needs for the next stage of research.
Claims (9)
1. A CRISPR/Cas9-based sgRNA design method, characterized in that the method comprises the following steps:
1) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values, specifically:
11) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values from the literature;
12) obtain sgRNAs from the SRA database and calculate the corresponding Cas9 cleavage efficiency values;
13) classify the data obtained in steps 11) and 12) by species, cell type, and experimental condition into different reference genomes, and for each reference genome compile a table whose first column is the sgRNA name, second column is the sgRNA sequence, and third column is the corresponding Cas9 cleavage efficiency;
2) build a personalized sgRNA design model, specifically:
21) as required, extract from the corresponding reference genome the sequence information of the sgRNAs obtained in step 1);
22) binary-encode the sgRNA sequence information extracted in step 21) according to the binary rule;
23) for the sgRNAs obtained in step 21), determine the data type of the Cas9 cleavage efficiency: if it is numeric, go to step 24); if it is categorical, go to step 25);
24) on the sgRNA sequence information encoded in step 22), perform feature extraction with a Lasso model and build the personalized sgRNA design model by standard linear regression;
25) on the sgRNA sequence information encoded in step 22), perform feature selection with the L1 regularization of binary logistic regression, then build the personalized sgRNA design model with the L2 regularization of binary logistic regression;
3) use the NDCG algorithm to measure the quality of the personalized sgRNA design model built in step 2) and update the SRA database, specifically:
31) calculate the NDCG value of the personalized sgRNA design model built in step 2);
32) determine whether the existing SRA database already contains a corresponding personalized sgRNA model: if not, add the model to the SRA database; if so, go to step 33);
33) compare this personalized sgRNA model with the corresponding sgRNA model in the SRA database and store the one with the larger NDCG value in the SRA database;
4) design sgRNAs and give the assessment value of each sgRNA, specifically:
41) for the genome region given by the user, select a suitable reference genome from the SRA database and search it for all sgRNAs that satisfy the design rule; these are the designed sgRNAs;
42) assess the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
2. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the corresponding Cas9 cleavage efficiency value in step 12) is calculated as follows:
121) align the sgRNA and the corresponding next-generation sequencing reads to the reference genome;
122) extract the reads that contain the sgRNA;
123) determine whether an insertion or deletion has occurred in the DNA at the cut site and whether that insertion or deletion is a frameshift mutation;
124) compute the frameshift mutation rate of each sgRNA, specifically:
125) take the frameshift mutation rate calculated in step 124) as the Cas9 cleavage efficiency value.
3. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the sgRNA sequence information in step 21) includes the sgRNA sequence, the marker fragment required for the sgRNA to recognize DNA, and the bases upstream and downstream of the sgRNA spacer, the lengths of the upstream and downstream flanks of the spacer being either the platform default or a value set by the user.
4. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the binary rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001, and N to 0000.
5. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the feature extraction with the Lasso model in step 24) selects feature vectors by extracting non-zero weights, specifically:
where w is the weight vector of the features to be estimated, x is the feature vector of the selected sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value corresponding to each sgRNA; α is a constant and ||w||1 is the ℓ1-norm of the parameter vector; the Lasso model solves the least-squares loss function with the added penalty term α||w||1, and by traversing the regularization matrix, the features with non-zero weights are extracted.
6. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the L1 regularization in step 25) is specifically:
where w and c are the weights and the intercept of the features to be estimated, X is the binary matrix of the encoded sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value corresponding to each sgRNA.
7. The CRISPR/Cas9-based sgRNA design method according to claim 6, characterized in that the L2 regularization is specifically:
8. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the NDCG value of the personalized sgRNA design model in step 31) is calculated specifically as:
where DCG is the value calculated from the predicted ranking, IDCG is the ideal DCG calculated from the true ranking, and rel_i is the ranking value predicted at position i.
9. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the marker fragment required for the sgRNA to recognize DNA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446600A true CN106446600A (en) | 2017-02-22 |
CN106446600B CN106446600B (en) | 2019-10-18 |
Family
ID=58183551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610341946.3A Expired - Fee Related CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446600B (en) |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9999671B2 (en) | 2013-09-06 | 2018-06-19 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US10077453B2 (en) | 2014-07-30 | 2018-09-18 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US10113163B2 (en) | 2016-08-03 | 2018-10-30 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10167457B2 (en) | 2015-10-23 | 2019-01-01 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US10465176B2 (en) | 2013-12-12 | 2019-11-05 | President And Fellows Of Harvard College | Cas variants for gene editing |
US10508298B2 (en) | 2013-08-09 | 2019-12-17 | President And Fellows Of Harvard College | Methods for identifying a target site of a CAS9 nuclease |
CN110689922A (en) * | 2018-07-04 | 2020-01-14 | 赛业(广州)生物科技有限公司 | Method and system for GC content analysis of automatic parallelization knockout strategy |
US10597679B2 (en) | 2013-09-06 | 2020-03-24 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
US10745677B2 (en) | 2016-12-23 | 2020-08-18 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN111881324A (en) * | 2020-07-30 | 2020-11-03 | 苏州工业园区服务外包职业学院 | High-throughput sequencing data universal storage format structure, construction method and application thereof |
US10858639B2 (en) | 2013-09-06 | 2020-12-08 | President And Fellows Of Harvard College | CAS9 variants and uses thereof |
US11046948B2 (en) | 2013-08-22 | 2021-06-29 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US11268082B2 (en) | 2017-03-23 | 2022-03-08 | President And Fellows Of Harvard College | Nucleobase editors comprising nucleic acid programmable DNA binding proteins |
US11306324B2 (en) | 2016-10-14 | 2022-04-19 | President And Fellows Of Harvard College | AAV delivery of nucleobase editors |
US11319532B2 (en) | 2017-08-30 | 2022-05-03 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11542496B2 (en) | 2017-03-10 | 2023-01-03 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
US11661590B2 (en) | 2016-08-09 | 2023-05-30 | President And Fellows Of Harvard College | Programmable CAS9-recombinase fusion proteins and uses thereof |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
US11795443B2 (en) | 2017-10-16 | 2023-10-24 | The Broad Institute, Inc. | Uses of adenosine base editors |
CN117252306A (en) * | 2023-10-11 | 2023-12-19 | 中央民族大学 | Gene editing capability index calculation method |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103805606A (en) * | 2014-02-28 | 2014-05-21 | 青岛市畜牧兽医研究所 | Pair of small guide RNAs (Ribonucleic Acids) (sgRNAs) for specifically identifying sheep DKK1 gene and coded DNA (Deoxyribonucleic Acid) and application of sgRNAs |
CN104109687A (en) * | 2014-07-14 | 2014-10-22 | 四川大学 | Construction and application of Zymomonas mobilis CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-association proteins)9 system |
CN105255937A (en) * | 2015-08-14 | 2016-01-20 | 西北农林科技大学 | Method for expression of CRISPR sgRNA by eukaryotic cell III-type promoter and use thereof |
CN105296518A (en) * | 2015-12-01 | 2016-02-03 | 中国农业大学 | Homologous arm vector construction method used for CRISPR/Cas 9 technology |
CN105400779A (en) * | 2015-10-15 | 2016-03-16 | 芜湖医诺生物技术有限公司 | Target sequence, recognized by streptococcus thermophilus CRISPR-Cas9 system, of human CCR5 gene, sgRNA and application of CRISPR-Cas9 system |
2016-05-20: CN application CN201610341946.3A (patent CN106446600B), status: Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103805606A (en) * | 2014-02-28 | 2014-05-21 | 青岛市畜牧兽医研究所 | Pair of small guide RNAs (Ribonucleic Acids) (sgRNAs) for specifically identifying sheep DKK1 gene and coded DNA (Deoxyribonucleic Acid) and application of sgRNAs |
CN104109687A (en) * | 2014-07-14 | 2014-10-22 | 四川大学 | Construction and application of Zymomonas mobilis CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-association proteins)9 system |
CN105255937A (en) * | 2015-08-14 | 2016-01-20 | 西北农林科技大学 | Method for expression of CRISPR sgRNA by eukaryotic cell III-type promoter and use thereof |
CN105400779A (en) * | 2015-10-15 | 2016-03-16 | 芜湖医诺生物技术有限公司 | Target sequence, recognized by streptococcus thermophilus CRISPR-Cas9 system, of human CCR5 gene, sgRNA and application of CRISPR-Cas9 system |
CN105296518A (en) * | 2015-12-01 | 2016-02-03 | 中国农业大学 | Homologous arm vector construction method used for CRISPR/Cas 9 technology |
Non-Patent Citations (6)
Title |
---|
JOHN G DOENCH ET AL: "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9", 《NATURE BIOTECHNOLOGY》 * |
NICOLO FUSI ET AL: "In Silico Predictive Modeling of CRISPR/Cas9 guide efficiency", 《BIORXIV》 * |
YANG LEI ET AL: "CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR system in plants", 《MOLECULAR PLANT》 * |
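王立人: "Editing of large genomic DNA fragments mediated by the CRISPR/Cas system", 《China Doctoral Dissertations Full-text Database, Basic Sciences》 * |
谢胜松 ET AL: "sgRNA design and off-target effect evaluation in the CRISPR/Cas9 system", 《Hereditas (Beijing)》 * |
邵红伟 ET AL: "sgRNA screening for targeted editing of TCR genes with the CRISPR-Cas9 system", 《Journal of Jimei University (Natural Science)》 * |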
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12006520B2 (en) | 2011-07-22 | 2024-06-11 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US10508298B2 (en) | 2013-08-09 | 2019-12-17 | President And Fellows Of Harvard College | Methods for identifying a target site of a CAS9 nuclease |
US11920181B2 (en) | 2013-08-09 | 2024-03-05 | President And Fellows Of Harvard College | Nuclease profiling system |
US10954548B2 (en) | 2013-08-09 | 2021-03-23 | President And Fellows Of Harvard College | Nuclease profiling system |
US11046948B2 (en) | 2013-08-22 | 2021-06-29 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US10597679B2 (en) | 2013-09-06 | 2020-03-24 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
US11299755B2 (en) | 2013-09-06 | 2022-04-12 | President And Fellows Of Harvard College | Switchable CAS9 nucleases and uses thereof |
US9999671B2 (en) | 2013-09-06 | 2018-06-19 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US10682410B2 (en) | 2013-09-06 | 2020-06-16 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
US10858639B2 (en) | 2013-09-06 | 2020-12-08 | President And Fellows Of Harvard College | CAS9 variants and uses thereof |
US10912833B2 (en) | 2013-09-06 | 2021-02-09 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US10465176B2 (en) | 2013-12-12 | 2019-11-05 | President And Fellows Of Harvard College | Cas variants for gene editing |
US11124782B2 (en) | 2013-12-12 | 2021-09-21 | President And Fellows Of Harvard College | Cas variants for gene editing |
US11053481B2 (en) | 2013-12-12 | 2021-07-06 | President And Fellows Of Harvard College | Fusions of Cas9 domains and nucleic acid-editing domains |
US10704062B2 (en) | 2014-07-30 | 2020-07-07 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US11578343B2 (en) | 2014-07-30 | 2023-02-14 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US10077453B2 (en) | 2014-07-30 | 2018-09-18 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US12043852B2 (en) | 2015-10-23 | 2024-07-23 | President And Fellows Of Harvard College | Evolved Cas9 proteins for gene editing |
US11214780B2 (en) | 2015-10-23 | 2022-01-04 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10167457B2 (en) | 2015-10-23 | 2019-01-01 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US11999947B2 (en) | 2016-08-03 | 2024-06-04 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10113163B2 (en) | 2016-08-03 | 2018-10-30 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10947530B2 (en) | 2016-08-03 | 2021-03-16 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11702651B2 (en) | 2016-08-03 | 2023-07-18 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11661590B2 (en) | 2016-08-09 | 2023-05-30 | President And Fellows Of Harvard College | Programmable CAS9-recombinase fusion proteins and uses thereof |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US12084663B2 (en) | 2016-08-24 | 2024-09-10 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US11306324B2 (en) | 2016-10-14 | 2022-04-19 | President And Fellows Of Harvard College | AAV delivery of nucleobase editors |
US11820969B2 (en) | 2016-12-23 | 2023-11-21 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
US10745677B2 (en) | 2016-12-23 | 2020-08-18 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
US11542496B2 (en) | 2017-03-10 | 2023-01-03 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
US11268082B2 (en) | 2017-03-23 | 2022-03-08 | President And Fellows Of Harvard College | Nucleobase editors comprising nucleic acid programmable DNA binding proteins |
US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
US11932884B2 (en) | 2017-08-30 | 2024-03-19 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11319532B2 (en) | 2017-08-30 | 2022-05-03 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11795443B2 (en) | 2017-10-16 | 2023-10-24 | The Broad Institute, Inc. | Uses of adenosine base editors |
CN110689922B (en) * | 2018-07-04 | 2023-07-14 | 广州赛业百沐生物科技有限公司 | Method and system for GC content analysis of automatic parallelization knockout strategy |
CN110689922A (en) * | 2018-07-04 | 2020-01-14 | 赛业(广州)生物科技有限公司 | Method and system for GC content analysis of automatic parallelization knockout strategy |
US11643652B2 (en) | 2019-03-19 | 2023-05-09 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11795452B2 (en) | 2019-03-19 | 2023-10-24 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
US12031126B2 (en) | 2020-05-08 | 2024-07-09 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN111881324A (en) * | 2020-07-30 | 2020-11-03 | 苏州工业园区服务外包职业学院 | High-throughput sequencing data universal storage format structure, construction method and application thereof |
CN111881324B (en) * | 2020-07-30 | 2023-12-15 | 苏州工业园区服务外包职业学院 | High-throughput sequencing data general storage format structure, construction method and application thereof |
CN117252306A (en) * | 2023-10-11 | 2023-12-19 | 中央民族大学 | Gene editing capability index calculation method |
CN117252306B (en) * | 2023-10-11 | 2024-02-27 | 中央民族大学 | Gene editing capability index calculation method |
Also Published As
Publication number | Publication date |
---|---|
CN106446600B (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106446600B (en) | CRISPR/Cas9-based sgRNA design method | |
CA2424031C (en) | System and process for validating, aligning and reordering genetic sequence maps using ordered restriction map | |
CN108319984B (en) | Construction method and prediction method of a model for predicting woody plant leaf morphological and photosynthetic characteristics based on DNA methylation level | |
Linde et al. | Congruent species delineation of Tulasnella using multiple loci and methods | |
CN108564117B (en) | SVM-based poverty and life assisting identification method | |
CN101145171A (en) | Gene microarray data predication method based on independent component integrated study | |
Kolpakov et al. | Searching for gapped palindromes | |
CN114708910B (en) | Method for calculating enrichment score of cell subpopulations in cell sequencing by using single cell sequencing data | |
CN105808976A (en) | Recommendation model based miRNA target gene prediction method | |
CN109545283B (en) | Method for constructing phylogenetic tree based on sequence pattern mining algorithm | |
CN104966106A (en) | Biological age step-by-step predication method based on support vector machine | |
Willi et al. | Demographic processes linked to genetic diversity and positive selection across a species' range | |
CN115881232A (en) | ScRNA-seq cell type annotation method based on graph neural network and feature fusion | |
Gonzalez-Alvarez et al. | Comparing multiobjective swarm intelligence metaheuristics for DNA motif discovery | |
CN114093420A (en) | XGboost-based DNA recombination site prediction method | |
CN115995262B (en) | Method for analyzing corn genetic mechanism based on random forest and LASSO regression | |
CN101894216B (en) | Method of discovering SNP group related to complex disease from SNP information | |
CN115249538B (en) | Method for constructing an lncRNA-disease association prediction model based on a heterogeneous-graph generative adversarial network | |
CN105335626A (en) | Method for clustering lasso cluster characteristics based on network analysis | |
CN105046107B (en) | Method for discovering constrained motifs | |
Chen et al. | Multi-objective evolutionary triclustering with constraints of time-series gene expression data | |
Mitra et al. | Spatial Analysis of Tumor Heterogeneity Using Machine Learning Techniques | |
Li et al. | Information recognition of pathogenic modules in gene statistics of big data | |
CN113140255B (en) | Method for predicting interaction of lncRNA-miRNA of plant | |
CN116364191A (en) | Prediction method of single-cell m6A methylation spectrum |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191018 |