CN106446600A - CRISPR/Cas9-based sgRNA design method - Google Patents


Info

Publication number
CN106446600A
Authority
CN
China
Prior art keywords
sgrna
cas9
value
model
specially
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610341946.3A
Other languages
Chinese (zh)
Other versions
CN106446600B (en)
Inventor
刘琦
啜国晖
陈亚男
闫纪芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201610341946.3A
Publication of CN106446600A
Application granted
Publication of CN106446600B
Expired - Fee Related
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a CRISPR/Cas9-based sgRNA design method comprising the following steps: obtaining sgRNAs and the corresponding Cas9 cleavage efficiency values; building a personalized sgRNA design model; measuring the quality of the built model with the NDCG algorithm and updating a database; and designing sgRNAs and giving an assessment value for each sgRNA. Compared with the prior art, the method offers high accuracy, a more complete feature set, a wide range of application and a wide range of analyzable data.

Description

A CRISPR/Cas9-based sgRNA design method
Technical field
The present invention relates to the field of gene editing research, and in particular to a method for designing sgRNAs based on CRISPR/Cas9 gene editing technology.
Background
With the development of molecular biology, our understanding of the constituent elements of life has deepened, but much about the mechanisms of life processes, and especially about the mechanisms underlying the treatment of certain diseases, is still not understood. Studying the relationship between genes and phenotypes, and the interactions among genes, urgently requires an engineering technique that can quickly knock out and insert genes in vivo. The CRISPR/Cas9 system emerged to meet this need.
The CRISPR/Cas9 system (Clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) is a gene editing tool that is simple to operate and widely applicable. The system consists mainly of a nuclease (Cas9) and a guide RNA (sgRNA) that performs target recognition. The sgRNA recognizes the target site through complementary base pairing and then recruits Cas9 to cleave it, producing a double-strand break and thereby achieving gene editing at the DNA level. Because it is widely applicable, convenient and time-saving, it has been adopted rapidly in many areas, and it has great advantages in building cancer models and exploring gene therapy in particular.
However, continued research has shown that different sgRNAs designed for the same gene in the same cell differ greatly in cleavage efficiency. If a highly efficient sgRNA cannot be designed, the only remedy is to increase its concentration, which introduces a large amount of unwanted genetic material into the cell and produces a high proportion of off-target events, greatly inconveniencing researchers. Designing sgRNAs with high cleavage efficiency is therefore very important for gene research.
At present there are nearly 30 sgRNA design tools, which fall broadly into two classes. The first class summarizes rules for sgRNAs from experiments (for example, that one end of the matched target sequence must carry a PAM sequence, that the 5' end should be GG, that the GC content should be kept at about 60%, and that the seed sequence does not tolerate mismatches) and then screens candidates directly against these conditions. The second class mainly uses statistical methods to assign a weight to each base and so compute the specificity of an sgRNA, as in CRISPR Design. Both classes build a general-purpose model. However, because there is great heterogeneity between species and between cell types, existing software does not predict efficiency well; and because differences in experimental conditions also affect sgRNA cleavage efficiency, the assessment accuracy of a general model is relatively low.
Therefore, taking into account the heterogeneity of data across platforms and species, building personalized models from the data of each platform or species to improve the specificity and efficiency of sgRNAs is extremely important for research on the off-target problem of the CRISPR/Cas9 system.
Summary of the invention
The object of the invention is to address the above problems by providing a CRISPR/Cas9-based sgRNA design method with high accuracy and a wide range of application.
To achieve this object, the invention provides a CRISPR/Cas9-based sgRNA design method comprising the following steps:
1) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values, specifically:
11) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values from the literature;
12) obtain sgRNAs from the SRA database and calculate the corresponding Cas9 cleavage efficiency values;
13) classify the data obtained in steps 11) and 12) under different reference genomes according to species, cell type and experimental conditions, and for each reference genome compile a table whose first column is the sgRNA name, second column the sgRNA sequence and third column the corresponding Cas9 cleavage efficiency;
2) build a personalized sgRNA design model, specifically:
21) as required, extract from the corresponding reference genome the sequence information of the sgRNAs obtained in step 1);
22) binary-encode the sgRNA sequence information extracted in step 21) according to the binary encoding rule;
23) for the sgRNAs obtained in step 21), determine the data type of their Cas9 cleavage efficiency: if it is numeric, go to step 24); if it is categorical, go to step 25);
24) for the sgRNA sequence information encoded in step 22), perform feature extraction with a Lasso model and build the personalized sgRNA design model by standard linear regression;
25) for the sgRNA sequence information encoded in step 22), perform feature selection with the L1 regularization of binary logistic regression, and then build the personalized sgRNA design model with the L2 regularization of binary logistic regression;
3) use the NDCG algorithm to measure the quality of the personalized sgRNA design model built in step 2) and update the SRA database, specifically:
31) calculate the NDCG value of the personalized sgRNA design model built in step 2);
32) determine whether the existing SRA database already holds a corresponding personalized sgRNA model: if not, add the model to the SRA database; if so, go to step 33);
33) compare this personalized sgRNA model with the corresponding sgRNA model in the SRA database, and store the one with the larger NDCG value in the SRA database;
4) design sgRNAs and give an assessment value for each sgRNA, specifically:
41) for the genome region given by the user, choose a suitable reference genome from the SRA database and search it for all sgRNAs that satisfy the design rule; these are the designed sgRNAs;
42) assess the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
Preferably, the Cas9 cleavage efficiency value in step 12) is calculated as follows:
121) align the sgRNAs and the corresponding next-generation sequencing reads to the reference genome;
122) extract the reads that contain an sgRNA;
123) determine whether an insertion or deletion has been produced in the DNA at the cut site, and whether that insertion or deletion is a frameshift mutation;
124) compute the frameshift mutation rate of each sgRNA, specifically:
frameshift mutation rate = (number of reads that contain the sgRNA and carry a frameshift mutation) / (total number of reads that contain the sgRNA)
125) take the frameshift mutation rate calculated in step 124) as the Cas9 cleavage efficiency value.
Preferably, the sequence information of the sgRNA in step 21) comprises the sgRNA sequence, the PAM (the motif the sgRNA requires to recognize DNA) and the bases upstream and downstream of the sgRNA spacer; the number of upstream and downstream bases is either the platform default or a value set by the user.
Preferably, the binary encoding rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001 and N to 0000.
Preferably, feature extraction with the Lasso model in step 24) selects features by extracting the non-zero weights, specifically:
$$\min_{w} \frac{1}{2n} \lVert xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$$
where w is the vector of feature weights to be estimated, x is the feature vector of the selected sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value of each sgRNA; α is a constant and $\lVert w \rVert_1$ is the L1 norm of the parameter vector. The Lasso model minimizes the least-squares loss with the added penalty $\alpha \lVert w \rVert_1$; under this regularization, the features with non-zero weights are extracted.
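Preferably, the L1 regularization in step 25) is specifically:
$$\min_{w,c} \frac{1}{2} \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$
where w and c are the feature weights and the intercept to be estimated, X is the binary matrix of the encoded sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value of each sgRNA.
Preferably, the L2 regularization is specifically:
$$\min_{w,c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$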
Preferably, the NDCG value of the personalized sgRNA design model in step 31) is calculated as:
$$NDCG = \frac{DCG}{IDCG}$$
$$DCG = rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i}$$
where DCG is the value computed from the predicted ranking, IDCG is the ideal DCG computed from the true ranking, and $rel_i$ is the ranking value of the item at the i-th predicted position.
Preferably, the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the motif the sgRNA requires to recognize DNA.
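The 20bp+PAM design rule can be illustrated with a short search over a genomic region. The following is a minimal sketch rather than the patented implementation: it scans only the forward strand, and it assumes an NGG PAM (the motif commonly used with Streptococcus pyogenes Cas9), which the rule above does not itself specify. The function name `find_candidates` and its `pam_pattern` parameter are hypothetical; the example region is the 90-base sequence used in the embodiment.

```python
import re


def find_candidates(region, pam_pattern="[ACGT]GG"):
    """Return (start, spacer, pam) for every 20bp+PAM site on the forward strand.

    pam_pattern is an assumption (NGG); the design rule only says "PAM".
    """
    region = region.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    site = re.compile(r"(?=([ACGT]{20})(" + pam_pattern + r"))")
    return [(m.start(), m.group(1), m.group(2)) for m in site.finditer(region)]


# The 90-base example sequence from the embodiment.
region = ("CACCTGGTATGTTCGTATCGGGCAGAATATCGCAACCTGC"
          "TCAGCGCCTACGGTCCATCTCGCTCAGGTACGACTGACCGACCCAGTCTA")

for start, spacer, pam in find_candidates(region):
    print(start, spacer, pam)
```

A fuller search would also scan the reverse complement of the region; that is omitted here to keep the sketch short.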
Compared with the prior art, the invention has the following advantages:
(1) A personalized strategy is adopted for different cell types of different species, and modeling is done with data-driven machine learning algorithms, which markedly improves assessment accuracy.
(2) A new encoding rule is used, so the features found are more complete and are not limited to the PAM and the spacer.
(3) Users are given a workflow for building their own models, so the range of application is wider and is not limited to the species already in the database.
(4) The OTF rate of NGS data is used as the cleavage rate, which widens the range of data that can be analyzed.
(5) Users can upload their own data to extend the database, which speeds up data accumulation and helps resolve the current predicament in which insufficient data prevents optimal sgRNAs from being designed.
Brief description of the drawings
Fig. 1 is a flow chart of the method for building and evaluating a personalized sgRNA model;
Fig. 2 is a flow chart of the method for designing and assessing sgRNAs.
Detailed description
The invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and a concrete operating procedure, but the scope of protection of the invention is not limited to the following embodiment.
Abbreviations:
CRISPR: Clustered regularly interspaced short palindromic repeats
Cas9: the enzyme associated with the type II CRISPR system
NGS: Next Generation Sequencing
PAM: Protospacer-adjacent motif, the motif an sgRNA requires to recognize DNA
sgRNA: the RNA that performs the guiding function in the CRISPR/Cas9 system
indel: an insertion or deletion in the DNA caused by CRISPR/Cas9 editing
spacer: the roughly 20 bases of the sgRNA that undergo complementary base pairing
OTF: Out of frame, i.e. a frameshift mutation
Read: a sequence obtained from a single reaction in high-throughput sequencing
This embodiment provides a CRISPR/Cas9-based sgRNA design method: a workflow for building personalized sgRNA design models for different cell types of different species, so that models can be built and sgRNAs designed according to different needs. It comprises the following four steps:
(1) Data collection: the data collected from the literature generally fall into two classes: sgRNAs with numeric cleavage efficiency values, or sgRNAs with categorical cleavage efficiency labels (for example a binary effective/ineffective classification). NGS data downloaded from the SRA database are only of the numeric type. Because NGS data, once the OTF rate has been computed, follow the same workflow as numeric data collected from the literature, this embodiment illustrates only two kinds of data: categorical data from the literature and NGS data.
Categorical data: for categorical data collected from the literature, this embodiment codes effective as 1 and ineffective as 0 and arranges the data in the form of Table 1.
Table 1
sgID Sequence Score
sgRNA_1 CGCAACCTGCTCAGCGCCTACGG 1
sgRNA_2 CAGTCTACATAACACGCCCATGG 1
sgRNA_3 CGCAACCTGCTCAGCGCCTACGG 1
…… …… ……
sgRNA_1_1 GGCAACCGTGGCGGCAATCGAGG 0
sgRNA_2_2 CTTCTCGGAATTCGGTGAAGGTGG 0
sgRNA_3_3 AACCTCCCGGCTTCTCGGAATTCGG 0
…… …… ……
Numeric data: for numeric NGS data, BWA is first used to align the sgRNA sequences and the NGS reads to the human reference genome. Reads containing an sgRNA are extracted, and for each it is determined whether an indel has been produced at the cut site and whether the indel is OTF. The OTF rate of each sgRNA is then computed (OTF rate = number of reads that contain this sgRNA and are OTF, divided by the total number of reads that contain this sgRNA). Finally the data are arranged in the form of Table 2.
Table 2
sgID Sequence Score
sgRNA_1 CGCAACCTGCTCAGCGCCTACGG 0.2345
sgRNA_2 CAGTCTACATAACACGCCCATGG 0.7846
sgRNA_3 CGCAACCTGCTCAGCGCCTACGG 0.2367
…… …… ……
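The OTF rate calculation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patented pipeline: it assumes the net indel length at the cut site has already been extracted for each read containing the sgRNA (for example from the alignments), and it treats a read as out of frame when that net length is not a multiple of 3, which is the usual definition of a frameshift but is not spelled out in the text. The function names are hypothetical.

```python
def is_out_of_frame(net_indel_length):
    """Assumed frameshift test: net indel length is not a multiple of 3.

    Insertions are positive, deletions negative, 0 means no indel.
    """
    return net_indel_length % 3 != 0


def otf_rate(net_indel_lengths):
    """OTF rate = OTF reads containing the sgRNA / all reads containing the sgRNA."""
    if not net_indel_lengths:
        return 0.0
    otf_reads = sum(1 for length in net_indel_lengths if is_out_of_frame(length))
    return otf_reads / len(net_indel_lengths)


# Hypothetical net indel lengths for eight reads covering one sgRNA site.
reads = [0, 1, -2, 3, -3, 0, 4, 0]
print(otf_rate(reads))  # prints 0.375 (three of the eight reads are out of frame)
```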
(2) Model building: as shown in Fig. 1, the sequence information of the collected sgRNAs is extracted from the corresponding reference genome. Assuming the upstream and downstream flanks are set to 35 and 32 bases respectively, the extracted sequence is 90 (35+20+3+32) bases long, for example: CACCTGGTAT GTTCGTATCG GGCAGAATAT CGCAACCTGC TCAGCGCCTA CGGTCCATCT CGCTCAGGTA CGACTGACCG ACCCAGTCTA.
The extracted sgRNA sequence information is then binary-encoded according to the rule in Table 3.
Table 3
Base Code
A 1000
C 0100
G 0010
T 0001
N 0000
The 90 bases extracted above are then encoded as:
0100 1000 0100 0100 0001 0010 0010 0001 1000 0001
0010 0001 0001 0100 0010 0001 1000 0001 0100 0010
0010 0010 0100 1000 0010 1000 1000 0001 1000 0001
0100 0010 0100 1000 1000 0100 0100 0001 0010 0100
0001 0100 1000 0010 0100 0010 0100 0100 0001 1000
0100 0010 0010 0001 0100 0100 1000 0001 0100 0001
0100 0010 0100 0001 0100 1000 0010 0010 0001 1000
0100 0010 1000 0100 0001 0010 1000 0100 0100 0010
1000 0100 0100 0100 1000 0010 0001 0100 0001 1000
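The encoding above can be reproduced with a small lookup table. The sketch below is illustrative only; the names `ENCODING`, `encode_bases` and `to_feature_vector` are hypothetical, and flattening the codes into one 0/1 vector per sequence is an assumption about how the encoded sequence is handed to the models.

```python
# Binary encoding rule stated in the text: one 4-bit code per base.
ENCODING = {"A": "1000", "C": "0100", "G": "0010", "T": "0001", "N": "0000"}


def encode_bases(sequence):
    """Return the 4-bit code of every base, in order."""
    return [ENCODING[base] for base in sequence.upper()]


def to_feature_vector(sequence):
    """Flatten the codes into a list of 0/1 integers (4 features per base)."""
    return [int(bit) for code in encode_bases(sequence) for bit in code]


# The 90-base example sequence from the embodiment.
extracted = ("CACCTGGTATGTTCGTATCGGGCAGAATATCGCAACCTGC"
             "TCAGCGCCTACGGTCCATCTCGCTCAGGTACGACTGACCGACCCAGTCTA")

print(" ".join(encode_bases(extracted[:10])))  # first row of the encoding above
print(len(to_feature_vector(extracted)))       # prints 360 (90 bases x 4 bits)
```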
Features are then extracted with machine learning methods and the personalized sgRNA design model is built.
For categorical data, logistic regression is used both to select features and to build the prediction model. Binary logistic regression offers two optional regularizations; the invention uses L1 regularization for feature selection and L2 regularization to build the model.
L1-regularized logistic regression solves the following sparse feature selection optimization problem:
$$\min_{w,c} \frac{1}{2} \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$
where w and c are the feature weights and the intercept to be estimated, X is the feature representation of the training samples, n is the number of training samples, and y is the cleavage efficiency value of each sgRNA.
L2-penalized logistic regression minimizes the following cost function:
$$\min_{w,c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$
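To make the two penalized objectives concrete, the sketch below evaluates them for given weights; it does not perform the minimization. It assumes the form given in claims 6 and 7 (a penalty term plus C times the summed logistic loss, with the factor 1/2 on both penalties as written there) and assumes C is a constant that trades the penalty off against the loss, which the text does not define. Because this loss expects labels of -1 and +1, the 1/0 labels of Table 1 are remapped first. All function names are hypothetical.

```python
import math


def to_signed_labels(labels):
    """Map the 1 (effective) / 0 (ineffective) labels to +1 / -1."""
    return [1 if label == 1 else -1 for label in labels]


def logistic_objective(w, c, X, y, C=1.0, penalty="l2"):
    """Evaluate penalty + C * sum_i log(exp(-y_i (X_i . w + c)) + 1)."""
    if penalty == "l1":
        regularizer = 0.5 * sum(abs(wj) for wj in w)  # (1/2) * ||w||_1
    elif penalty == "l2":
        regularizer = 0.5 * sum(wj * wj for wj in w)  # (1/2) * w^T w
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(xij * wj for xij, wj in zip(xi, w)) + c)
        loss += math.log(math.exp(-margin) + 1.0)
    return regularizer + C * loss


# Two hypothetical encoded samples with labels 1 and 0.
X = [[1, 0], [0, 1]]
y = to_signed_labels([1, 0])
print(logistic_objective([1.0, -2.0], 0.0, X, y, penalty="l1"))
print(logistic_objective([1.0, -2.0], 0.0, X, y, penalty="l2"))
```

The L1 penalty grows linearly with each weight, which is what drives uninformative weights to exactly zero during minimization and makes it suitable for feature selection.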
For numeric data, feature selection is done with a Lasso model and the prediction model is built by standard linear regression. Lasso is a linear model that estimates sparse coefficients; it selects features mainly by extracting the non-zero weights. The objective function to minimize is:
$$\min_{w} \frac{1}{2n} \lVert xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$$
where w is the vector of feature weights to be estimated, x is the feature vector of the selected sgRNAs, n is the number of training samples, and y is the cleavage efficiency value of each sgRNA; α is a constant and $\lVert w \rVert_1$ is the L1 norm of the parameter vector. The Lasso model minimizes the least-squares loss with the added penalty $\alpha \lVert w \rVert_1$; under this regularization, the features with non-zero weights are extracted, and these features are regarded as the important elements affecting sgRNA cleavage efficiency.
After these features have been selected, an assessment model is built with a standard linear regression.
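The Lasso selection step can be sketched with plain coordinate descent. This is a minimal illustration and not the solver used by the inventors, which the text does not name: it minimizes the objective (1/(2n))·||Xw − y||² + α·||w||₁ without an intercept, and keeps the indices of the non-zero weights. The toy matrix, the scores and the value of α are hypothetical, and the subsequent standard linear regression on the selected features is omitted.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||Xw - y||^2 + alpha * ||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a feature that is always 0 carries no information
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical toy data: six samples, three binary features.
X = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
y = [0.8, 0.05, 0.8, 0.05, 0.0, 0.0]

weights = lasso_coordinate_descent(X, y, alpha=0.05)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(selected)  # prints [0]: only the first feature keeps a non-zero weight
```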
For both numeric and categorical data, modeling produces two files: an XML file, which contains the selected features and the cross-validation results, and a pkl file, a binary file that contains the fitted prediction model.
The content of the XML file is as follows:
(3) Model evaluation: the NDCG algorithm is used to measure the quality of the prediction model. NDCG (Normalized Discounted Cumulative Gain) is mainly used to measure the effectiveness of a ranking model. Its value represents the similarity between the predicted ranking and the actual ranking; it ranges from 0 to 1, where 1 means the two are identical, and a larger value indicates a better model. The formula is:
$$NDCG = \frac{DCG}{IDCG}$$
DCG (Discounted Cumulative Gain) is the value computed from the predicted ranking; IDCG (ideal DCG) is the ideal DCG, computed from the true ranking. DCG is defined mathematically as:
$$DCG = rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i}$$
where $rel_i$ is the ranking value of the item at the i-th predicted position.
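The NDCG measure can be sketched as follows. This is an illustration under stated assumptions: the benchmark score of each sgRNA is used as its rel value, which the text does not confirm, and the optional cut-off `k` is a hypothetical parameter for "top k" style evaluation. The data are the first nine rows of Table 4, so the printed value is not expected to match the Top 50 or Top 10% figures reported there.

```python
import math


def dcg(relevances):
    """DCG = rel_1 + sum over i >= 2 of rel_i / log2(i)."""
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(relevances, start=1)
    )


def ndcg(true_scores, predicted_scores, k=None):
    """DCG of the true scores in predicted order, divided by the ideal DCG."""
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_scores[i] for i in order]
    ideal = sorted(true_scores, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Benchmark Score and Cage Score columns of the first nine rows of Table 4.
benchmark = [0.789456865, 0.506422166, 0.325738326, 0.092078991, 0.072255575,
             0.037552375, 0.019922477, 0.017296539, 0.002206787]
cage = [0.6905, 0.6026, 0.5548, 0.5095, 0.4959, 0.4473, 0.3281, 0.3357, 0.3056]

print(ndcg(benchmark, cage))
```

Because the predicted ranking swaps only the seventh and eighth items, the value for these nine rows is slightly below 1.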
In the table below, sgID is the name of the sgRNA, seq is the spacer sequence of the sgRNA, Benchmark Score is the benchmark score, BS_rank is the rank by Benchmark Score, Cage Score is the score assigned by the prediction model of the invention, and C_rank is the rank by Cage Score, as shown in Table 4.
Table 4
sgID    seq                      Benchmark Score  BS_rank  Cage Score  C_rank
sg1000  GCAGGTACCCTGCAACGTCGCGG  0.789456865      1        0.6905      1
sg1001  CTCCACTAGTCCCCGCGCCGCGG  0.506422166      2        0.6026      2
sg1     GTAATGGCTTCCTCGTGAGTTGG  0.325738326      3        0.5548      3
sg1002  GACTCCGTTGGGATCCGCGCCGG  0.092078991      4        0.5095      4
sg10    ATCTTAAGCAAACGCTTACCAGG  0.072255575      5        0.4959      5
sg1003  CCCGAAACGGTTGACTCCGTTGG  0.037552375      6        0.4473      6
sg1004  AGGCGCGCGATCCAGGTAGCTGG  0.019922477      7        0.3281      8
sg100   AAAAAGCTGATGAAGTTGTTTGG  0.017296539      8        0.3357      7
sg1005  CGGGGCCACCGCGACGTTGCAGG  0.002206787      9        0.3056      9
……      ……                       ……               ……       ……          ……
Top 50 NDCG = 0.876322904
Top 10% NDCG = 0.84340749
If the database has no model of this kind, the new model is added to the database. Otherwise the NDCG values of the two models are compared, and if the new model's NDCG value is larger than that of the existing model, the new model replaces the existing one in the database.
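The update rule amounts to keeping, for each kind of model, whichever candidate has the larger NDCG value. The sketch below illustrates it with an in-memory dictionary standing in for the database; the key format, the function name `update_model_store` and the example values are all hypothetical.

```python
def update_model_store(store, key, model, ndcg_value):
    """Store the model if none exists for key, or if it beats the stored NDCG.

    Returns True when the store was changed.
    """
    current = store.get(key)
    if current is None or ndcg_value > current[1]:
        store[key] = (model, ndcg_value)
        return True
    return False


# Hypothetical usage: keys identify a species and cell type.
store = {}
print(update_model_store(store, ("species_x", "cell_y"), "model_a", 0.84))  # prints True
print(update_model_store(store, ("species_x", "cell_y"), "model_b", 0.80))  # prints False
print(update_model_store(store, ("species_x", "cell_y"), "model_c", 0.88))  # prints True
```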
(4) Design and assessment: as shown in Fig. 2, either sgRNAs that the user has already designed are assessed, or sgRNAs are designed for a genome region given by the user (for example chromosome 1, 1,000,000 to 1,002,000, hg19). The species or cell type of the sgRNAs to be assessed is determined first, and a suitable model is then selected for the assessment; if no suitable model exists, a similar model can be chosen. This embodiment provides 10 models covering 8 cell types from 3 species for users to choose from. The output is shown in Table 5.
Table 5
The user can then select the sgRNAs that suit their needs for the next stage of research.

Claims (9)

1. A CRISPR/Cas9-based sgRNA design method, characterized in that the method comprises the following steps:
1) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values, specifically:
11) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values from the literature;
12) obtain sgRNAs from the SRA database and calculate the corresponding Cas9 cleavage efficiency values;
13) classify the data obtained in steps 11) and 12) under different reference genomes according to species, cell type and experimental conditions, and for each reference genome compile a table whose first column is the sgRNA name, second column the sgRNA sequence and third column the corresponding Cas9 cleavage efficiency;
2) build a personalized sgRNA design model, specifically:
21) as required, extract from the corresponding reference genome the sequence information of the sgRNAs obtained in step 1);
22) binary-encode the sgRNA sequence information extracted in step 21) according to the binary encoding rule;
23) for the sgRNAs obtained in step 21), determine the data type of their Cas9 cleavage efficiency: if it is numeric, go to step 24); if it is categorical, go to step 25);
24) for the sgRNA sequence information encoded in step 22), perform feature extraction with a Lasso model and build the personalized sgRNA design model by standard linear regression;
25) for the sgRNA sequence information encoded in step 22), perform feature selection with the L1 regularization of binary logistic regression, and then build the personalized sgRNA design model with the L2 regularization of binary logistic regression;
3) use the NDCG algorithm to measure the quality of the personalized sgRNA design model built in step 2) and update the SRA database, specifically:
31) calculate the NDCG value of the personalized sgRNA design model built in step 2);
32) determine whether the existing SRA database already holds a corresponding personalized sgRNA model: if not, add the model to the SRA database; if so, go to step 33);
33) compare this personalized sgRNA model with the corresponding sgRNA model in the SRA database, and store the one with the larger NDCG value in the SRA database;
4) design sgRNAs and give an assessment value for each sgRNA, specifically:
41) for the genome region given by the user, choose a suitable reference genome from the SRA database and search it for all sgRNAs that satisfy the design rule; these are the designed sgRNAs;
42) assess the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
2. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the Cas9 cleavage efficiency value in step 12) is calculated as follows:
121) align the sgRNAs and the corresponding next-generation sequencing reads to the reference genome;
122) extract the reads that contain an sgRNA;
123) determine whether an insertion or deletion has been produced in the DNA at the cut site, and whether that insertion or deletion is a frameshift mutation;
124) compute the frameshift mutation rate of each sgRNA, specifically:
frameshift mutation rate = (number of reads that contain the sgRNA and carry a frameshift mutation) / (total number of reads that contain the sgRNA)
125) take the frameshift mutation rate calculated in step 124) as the Cas9 cleavage efficiency value.
3. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the sequence information of the sgRNA in step 21) comprises the sgRNA sequence, the PAM (the motif the sgRNA requires to recognize DNA) and the bases upstream and downstream of the sgRNA spacer, the number of upstream and downstream bases being either the platform default or a value set by the user.
4. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the binary encoding rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001 and N to 0000.
5. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that feature extraction with the Lasso model in step 24) selects features by extracting the non-zero weights, specifically:
$$\min_{w} \frac{1}{2n} \lVert xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$$
where w is the vector of feature weights to be estimated, x is the feature vector of the selected sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value of each sgRNA; α is a constant and $\lVert w \rVert_1$ is the L1 norm of the parameter vector. The Lasso model minimizes the least-squares loss with the added penalty $\alpha \lVert w \rVert_1$; under this regularization, the features with non-zero weights are extracted.
6. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the L1 regularization in step 25) is specifically:
$$\min_{w,c} \frac{1}{2} \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$
where w and c are the feature weights and the intercept to be estimated, X is the binary matrix of the encoded sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value of each sgRNA.
7. The CRISPR/Cas9-based sgRNA design method according to claim 6, characterized in that the L2 regularization is specifically:
$$\min_{w,c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$
8. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the NDCG value of the personalized sgRNA design model in step 31) is calculated as:
$$NDCG = \frac{DCG}{IDCG}$$
$$DCG = rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i}$$
where DCG is the value computed from the predicted ranking, IDCG is the ideal DCG computed from the true ranking, and $rel_i$ is the ranking value of the item at the i-th predicted position.
9. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the motif the sgRNA requires to recognize DNA.
CN201610341946.3A 2016-05-20 2016-05-20 A kind of design method of the sgRNA based on CRISPR/Cas9 Expired - Fee Related CN106446600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610341946.3A CN106446600B (en) 2016-05-20 2016-05-20 A kind of design method of the sgRNA based on CRISPR/Cas9

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610341946.3A CN106446600B (en) 2016-05-20 2016-05-20 A kind of design method of the sgRNA based on CRISPR/Cas9

Publications (2)

Publication Number Publication Date
CN106446600A true CN106446600A (en) 2017-02-22
CN106446600B CN106446600B (en) 2019-10-18

Family

ID=58183551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610341946.3A Expired - Fee Related CN106446600B (en) 2016-05-20 2016-05-20 A kind of design method of the sgRNA based on CRISPR/Cas9

Country Status (1)

Country Link
CN (1) CN106446600B (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9999671B2 (en) 2013-09-06 2018-06-19 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
CN110689922A (en) * 2018-07-04 2020-01-14 赛业(广州)生物科技有限公司 Method and system for GC content analysis of automatic parallelization knockout strategy
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
CN111261223A (en) * 2020-01-12 2020-06-09 湖南大学 CRISPR off-target effect prediction method based on deep learning
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
CN111613267A (en) * 2020-05-21 2020-09-01 中山大学 CRISPR/Cas9 off-target prediction method based on attention mechanism
CN111881324A (en) * 2020-07-30 2020-11-03 苏州工业园区服务外包职业学院 High-throughput sequencing data universal storage format structure, construction method and application thereof
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
CN117252306A (en) * 2023-10-11 2023-12-19 中央民族大学 Gene editing capability index calculation method
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103805606A (en) * 2014-02-28 2014-05-21 青岛市畜牧兽医研究所 Pair of small guide RNAs (Ribonucleic Acids) (sgRNAs) for specifically identifying sheep DKK1 gene and coded DNA (Deoxyribonucleic Acid) and application of sgRNAs
CN104109687A (en) * 2014-07-14 2014-10-22 四川大学 Construction and application of Zymomonas mobilis CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-association proteins)9 system
CN105255937A (en) * 2015-08-14 2016-01-20 西北农林科技大学 Method for expression of CRISPR sgRNA by eukaryotic cell III-type promoter and use thereof
CN105296518A (en) * 2015-12-01 2016-02-03 中国农业大学 Homologous arm vector construction method used for CRISPR/Cas 9 technology
CN105400779A (en) * 2015-10-15 2016-03-16 芜湖医诺生物技术有限公司 Target sequence, recognized by streptococcus thermophilus CRISPR-Cas9 system, of human CCR5 gene, sgRNA and application of CRISPR-Cas9 system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JOHN G DOENCH ET AL: "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9", 《NATURE BIOTECHNOLOGY》 *
NICOLO FUSI ET AL: "In Silico Predictive Modeling of CRISPR/Cas9 guide efficiency", 《BIORXIV》 *
YANG LEI ET AL: "CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR system in plants", 《MOLECULAR PLANT》 *
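WANG LIREN: "CRISPR/Cas system-mediated editing of large genomic DNA fragments", 《China Doctoral Dissertations Full-text Database, Basic Sciences》 *
XIE SHENGSONG ET AL: "sgRNA design and off-target effect evaluation in the CRISPR/Cas9 system", 《Hereditas (Beijing)》 *
SHAO HONGWEI ET AL: "sgRNA screening for targeted editing of TCR genes with the CRISPR-Cas9 system", 《Journal of Jimei University (Natural Science)》 *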

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12006520B2 (en) 2011-07-22 2024-06-11 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US9999671B2 (en) 2013-09-06 2018-06-19 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US12043852B2 (en) 2015-10-23 2024-07-23 President And Fellows Of Harvard College Evolved Cas9 proteins for gene editing
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11999947B2 (en) 2016-08-03 2024-06-04 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US12084663B2 (en) 2016-08-24 2024-09-10 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
CN110689922B (en) * 2018-07-04 2023-07-14 广州赛业百沐生物科技有限公司 Method and system for GC content analysis of automatic parallelization knockout strategy
CN110689922A (en) * 2018-07-04 2020-01-14 赛业(广州)生物科技有限公司 Method and system for GC content analysis of automatic parallelization knockout strategy
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
CN111261223A (en) * 2020-01-12 2020-06-09 湖南大学 CRISPR off-target effect prediction method based on deep learning
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US12031126B2 (en) 2020-05-08 2024-07-09 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
CN111613267A (en) * 2020-05-21 2020-09-01 中山大学 CRISPR/Cas9 off-target prediction method based on attention mechanism
CN111881324A (en) * 2020-07-30 2020-11-03 苏州工业园区服务外包职业学院 High-throughput sequencing data universal storage format structure, construction method and application thereof
CN111881324B (en) * 2020-07-30 2023-12-15 苏州工业园区服务外包职业学院 High-throughput sequencing data general storage format structure, construction method and application thereof
CN117252306A (en) * 2023-10-11 2023-12-19 中央民族大学 Gene editing capability index calculation method
CN117252306B (en) * 2023-10-11 2024-02-27 中央民族大学 Gene editing capability index calculation method

Also Published As

Publication number Publication date
CN106446600B (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN106446600B (en) CRISPR/Cas9-based sgRNA design method
CA2424031C (en) System and process for validating, aligning and reordering genetic sequence maps using ordered restriction map
CN108319984B (en) Construction method and prediction method for a DNA methylation level-based model predicting leaf morphological and photosynthetic characteristics of woody plants
Linde et al. Congruent species delineation of Tulasnella using multiple loci and methods
CN108564117B (en) SVM-based poverty and life assisting identification method
CN101145171A (en) Gene microarray data prediction method based on independent component ensemble learning
Kolpakov et al. Searching for gapped palindromes
CN114708910B (en) Method for calculating enrichment score of cell subpopulations in cell sequencing by using single cell sequencing data
CN105808976A (en) Recommendation model based miRNA target gene prediction method
CN109545283B (en) Method for constructing phylogenetic tree based on sequence pattern mining algorithm
CN104966106A (en) Biological age step-by-step prediction method based on support vector machine
Willi et al. Demographic processes linked to genetic diversity and positive selection across a species' range
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
Gonzalez-Alvarez et al. Comparing multiobjective swarm intelligence metaheuristics for DNA motif discovery
CN114093420A (en) XGBoost-based DNA recombination site prediction method
CN115995262B (en) Method for analyzing corn genetic mechanism based on random forest and LASSO regression
CN101894216B (en) Method of discovering SNP group related to complex disease from SNP information
CN115249538B (en) Construction method of an lncRNA-disease association prediction model based on a heterogeneous-graph generative adversarial network
CN105335626A (en) Method for clustering lasso cluster characteristics based on network analysis
CN105046107B (en) Method for discovering constrained motifs
Chen et al. Multi-objective evolutionary triclustering with constraints of time-series gene expression data
Mitra et al. Spatial Analysis of Tumor Heterogeneity Using Machine Learning Techniques
Li et al. Information recognition of pathogenic modules in gene statistics of big data
CN113140255B (en) Method for predicting plant lncRNA-miRNA interactions
CN116364191A (en) Prediction method of single-cell m6A methylation profile

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191018