CN106446600B - A kind of design method of the sgRNA based on CRISPR/Cas9 - Google Patents
- Publication number
- CN106446600B (application CN201610341946.3A)
- Authority
- CN
- China
- Prior art keywords
- sgrna
- cas9
- model
- value
- crispr
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Physiology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a CRISPR/Cas9-based sgRNA design method, characterized in that the method comprises the following steps: obtaining sgRNAs and the corresponding Cas9 cleavage-efficiency values; building a personalized sgRNA design model; measuring the quality of the personalized sgRNA design model with the NDCG algorithm and updating the database; and designing sgRNAs and giving an assessment score for each sgRNA. Compared with the prior art, the invention offers high accuracy, complete features, a wide range of application, and a wide range of analyzable data.
Description
Technical field
The present invention relates to the field of gene-editing research, and in particular to a method for designing sgRNAs based on CRISPR/Cas9 gene-editing technology.
Background art
With the development of molecular biology, our understanding of the elements that make up life has deepened, but much remains unknown about the mechanisms of life processes, in particular the mechanisms underlying certain diseases and their treatment. Studying the relationship between genes and phenotypes, and the interactions among genes, urgently requires an engineering technique that can rapidly knock out and insert genes in vivo. The CRISPR/Cas9 system emerged to meet exactly this need.
The CRISPR/Cas9 system (Clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) is an easy-to-use gene-editing tool with broad applicability. The system consists mainly of a nuclease (Cas9) and a guide RNA (sgRNA) that performs target recognition. The sgRNA recognizes the target site by base-pair complementarity and then recruits Cas9, which cleaves the DNA and produces a double-strand break, thereby achieving gene editing at the DNA level. Because it is broadly applicable, convenient, and fast, it was quickly adopted in many areas, and it has particular advantages for building cancer models and exploring gene therapy.
However, continued research has shown that different sgRNAs designed against the same gene in the same cell differ greatly in cleavage efficiency. If an efficient sgRNA cannot be designed, the only remedy is to increase its concentration, which introduces a large amount of unwanted genetic material into the cell and also produces a high proportion of off-target effects, causing considerable inconvenience to researchers. Designing an sgRNA with high cleavage efficiency is therefore extremely important for gene-level research.
There are currently nearly 30 sgRNA design tools, which fall into two broad classes. The first class summarizes sgRNA rules from experiments, for example that the end of the paired sgRNA sequence must carry a PAM sequence, that the 5′ end should be GG, that the GC content should be kept at around 60%, and that the seed sequence tolerates no mismatches, and then screens candidates directly against these conditions. The second class mainly uses statistical methods that assign a weight to each base to compute sgRNA specificity, such as CRISPR Design. Both classes build a single general-purpose model. However, because there is great heterogeneity between species and between cell types, the predictive performance of existing software is not very good; and because heterogeneity among experimental conditions also affects sgRNA cleavage efficiency, the assessment accuracy of a general model is relatively low.
Accordingly, taking into account the heterogeneity between data from different platforms and species, building personalized models from the data of each platform or species to improve sgRNA specificity and efficiency is extremely important for research on the off-target problem of the CRISPR/Cas9 system.
Summary of the invention
In view of the above problems, an object of the present invention is to provide a CRISPR/Cas9-based sgRNA design method with high accuracy and a wide range of application.
To achieve this object, the present invention provides a CRISPR/Cas9-based sgRNA design method comprising the following steps:
1) obtaining sgRNAs and the corresponding Cas9 cleavage-efficiency values, specifically:
11) obtaining sgRNAs and the corresponding Cas9 cleavage-efficiency values from the literature;
12) obtaining sgRNAs from the SRA database and computing the corresponding Cas9 cleavage-efficiency values;
13) classifying the data obtained in steps 11) and 12) into different reference genomes according to species, cell type, and experimental conditions, each reference genome having a table whose first column is the sgRNA name, second column is the sgRNA sequence, and third column is the corresponding Cas9 cleavage efficiency;
2) building a personalized sgRNA design model, specifically:
21) extracting, as required, the sequence information of the sgRNAs obtained in step 1) from the corresponding reference genome;
22) binary-encoding the sgRNA sequence information extracted in step 21) according to a binary rule;
23) for the sgRNAs obtained in step 21), determining the data type of their Cas9 cleavage efficiency: if it is numeric, proceeding to step 24); if it is categorical, proceeding to step 25);
24) for the sgRNA sequence information encoded in step 22), performing feature extraction with a Lasso model and building the personalized sgRNA design model by standard linear regression;
25) for the sgRNA sequence information encoded in step 22), performing feature selection with L1 regularization in binary logistic regression, and then building the personalized sgRNA design model with L2 regularization in binary logistic regression;
3) quality that designs a model of personalized sgRNA established in step 2) is measured with NDCG algorithm and updates SRA number
According to library, specifically:
31) the NDCG value that the personalized sgRNA established in step 2) designs a model is calculated;
32) judge whether there is corresponding personalization sgRNA model in existing SRA database, if being otherwise added to SRA
Database, if then entering step 33);
33) compare personalization sgRNA model and the sgRNA model in corresponding SRA database, select NDCG value big
One is stored in SRA database;
4) designing sgRNAs and giving an assessment score for each sgRNA, specifically:
41) selecting a suitable reference genome from the SRA database according to the genome region provided by the user, and searching it for all sgRNAs that satisfy the design rule, as the designed sgRNAs;
42) assessing the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
Preferably, computing the corresponding Cas9 cleavage-efficiency value in step 12) specifically comprises:
121) aligning the sgRNAs and the corresponding next-generation sequencing reads to the reference genome;
122) extracting the reads that contain the sgRNA;
123) determining whether an insertion or deletion (indel) on the DNA is produced at the cut site, and whether that indel is a frameshift mutation;
124) counting the frameshift mutation rate of each sgRNA, specifically the number of reads that contain the sgRNA and carry a frameshift mutation divided by the total number of reads that contain the sgRNA;
125) using the frameshift mutation rate computed in step 124) as the Cas9 cleavage-efficiency value.
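Steps 121) to 125) can be illustrated with a minimal Python sketch. It assumes the alignment (for example with BWA) has already been done and that each read has been reduced to a hypothetical `ReadRecord` recording whether it spans the sgRNA and the lengths of any indels at the cut site. Treating a read as OTF when its net indel length is not a multiple of 3 is also an assumption, not a detail stated in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ReadRecord:
    """Hypothetical per-read summary after alignment (not a BWA or SAM structure)."""
    covers_sgrna: bool  # the read spans the sgRNA target site
    indel_lengths: list = field(default_factory=list)  # +n insertion, -n deletion at the cut site


def is_out_of_frame(indel_lengths):
    """Assumption: OTF means an indel whose net length is not a multiple of 3."""
    return bool(indel_lengths) and sum(indel_lengths) % 3 != 0


def otf_rate(reads):
    """OTF rate = OTF reads containing the sgRNA / all reads containing the sgRNA."""
    covering = [r for r in reads if r.covers_sgrna]
    if not covering:
        return None  # no read spans the site, so no rate can be estimated
    n_otf = sum(1 for r in covering if is_out_of_frame(r.indel_lengths))
    return n_otf / len(covering)


reads = [
    ReadRecord(True, [-1]),   # 1-bp deletion: frameshift
    ReadRecord(True, [-3]),   # 3-bp deletion: in frame
    ReadRecord(True),         # read without an indel
    ReadRecord(False, [1]),   # does not span the sgRNA, so it is ignored
]
print(otf_rate(reads))  # one of the three covering reads is OTF
```

Returning `None` rather than 0 keeps "no coverage" distinguishable from "no frameshifts observed".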
Preferably, the sgRNA sequence information in step 21) includes the sgRNA sequence, the marker segment that the sgRNA requires to recognize the DNA (the PAM), and the bases upstream and downstream of the sgRNA spacer; the number of upstream and downstream bases is a platform default or a user-set value.
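Extracting the sequence information of step 21) amounts to slicing fixed-length windows around each spacer. The sketch below assumes a forward-strand reference string, a 0-based spacer start, a 20-nt spacer, and a 3-nt PAM; `extract_context` is a hypothetical helper, and its 35/32 flank defaults are taken from the example in the embodiment.

```python
def extract_context(reference, spacer_start, ups=35, dws=32, spacer_len=20, pam_len=3):
    """Slice the upstream flank, spacer, PAM and downstream flank of one sgRNA site.

    reference: forward-strand sequence; spacer_start: 0-based index of the first
    spacer base. Defaults follow the 35/32-base example (hypothetical helper).
    """
    pam_start = spacer_start + spacer_len
    dws_start = pam_start + pam_len
    if spacer_start - ups < 0 or dws_start + dws > len(reference):
        raise ValueError("flanking window runs past the end of the reference")
    return {
        "ups": reference[spacer_start - ups:spacer_start],
        "spa": reference[spacer_start:pam_start],
        "pam": reference[pam_start:dws_start],
        "dws": reference[dws_start:dws_start + dws],
    }


toy_reference = "A" * 35 + "C" * 20 + "TGG" + "G" * 32
ctx = extract_context(toy_reference, spacer_start=35)
print({k: len(v) for k, v in ctx.items()})  # {'ups': 35, 'spa': 20, 'pam': 3, 'dws': 32}
```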
Preferably, the binary rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001, and N to 0000.
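The binary rule of step 22) is a 4-bit one-hot code per base. The sketch below applies it and also generates column names in the style of the XML report shown in the embodiment (`ups_33_C`, `spa_1_C`, and so on). The position-numbering convention used for those names is an assumption, since the patent does not state it, and all function names are hypothetical.

```python
CODE = {"A": "1000", "C": "0100", "G": "0010", "T": "0001", "N": "0000"}
BASES = "ACGT"  # bit order inside each 4-bit code


def one_hot(seq):
    """Concatenate the 4-bit code of every base; unknown characters raise KeyError."""
    return [int(bit) for base in seq.upper() for bit in CODE[base]]


def feature_names(region, length):
    """Column names such as 'ups_33_C'.

    Hypothetical convention: positions are numbered 1..length from the 5' side.
    """
    return [f"{region}_{pos}_{base}" for pos in range(1, length + 1) for base in BASES]


def encode_context(ctx):
    """Encode an {'ups', 'spa', 'pam', 'dws'} dict into one bit vector plus column names."""
    bits, names = [], []
    for region in ("ups", "spa", "pam", "dws"):
        bits += one_hot(ctx[region])
        names += feature_names(region, len(ctx[region]))
    return bits, names


# The 90-base example sequence from the embodiment.
example = ("CACCTGGTATGTTCGTATCGGGCAGAATATCGCAACCTGCTCAGCGCCTACGG"
           "TCCATCTCGCTCAGGTACGACTGACCGACCCAGTCTA")
print(len(one_hot(example)))  # 360 (90 bases x 4 bits)
```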
Preferably, the feature extraction with the Lasso model in step 24) selects feature vectors by extracting the non-zero weights, specifically:

$$\min_{w}\ \frac{1}{2n}\lVert Xw - y\rVert_2^2 + \alpha\lVert w\rVert_1$$

where w is the weight vector of the features to be estimated, X is the matrix of feature vectors of the selected sgRNAs, n is the number of sgRNAs, and y is the vector of Cas9 cleavage-efficiency values of the sgRNAs; α is a constant and $\lVert w\rVert_1$ is the ℓ1 norm of the parameter vector. The Lasso model solves this least-squares loss with the added penalty $\alpha\lVert w\rVert_1$; by traversing the regularization path, the features with non-zero weights are extracted.
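The Lasso objective above can be minimized by cyclic coordinate descent with soft-thresholding, a standard solver for this problem; the patent does not say which solver it used. The pure-Python sketch below (hypothetical names `lasso_cd` and `selected_features`) centers X and y so that the unpenalized intercept drops out, then reports the indices of the non-zero weights as the selected features. Re-running it over a decreasing grid of α values would trace the regularization path.

```python
from itertools import product


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1.

    The intercept b is not penalized; centering X and y removes it from the updates.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centered problem with w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column, e.g. a base never seen at this position
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep resid = yc - Xc w up to date
                w[j] = new_wj
    b = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return w, b


def selected_features(w, tol=1e-10):
    """Indices of the non-zero weights, i.e. the features kept by the Lasso."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Toy data: only column 0 influences y.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [3 * row[0] + 1 for row in X]
w, b = lasso_cd(X, y, alpha=0.1)
print(selected_features(w))  # [0]
```

On this toy data the surviving weight is shrunk from 3 to about 2.6 by the penalty, which is one reason to refit the selected features with an unpenalized linear regression afterwards.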
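Preferably, the L1 regularization in step 25) is specifically:

$$\min_{w,c}\ \lVert w\rVert_1 + C\sum_{i=1}^{n}\log\left(\exp\left(-y_i\left(X_i^{\top}w + c\right)\right) + 1\right)$$

where w and c are the weights and intercept of the features to be estimated, X is the binary matrix of the encoded sgRNAs ($X_i$ being the row of the i-th sgRNA), n is the number of sgRNAs, $y_i$ is the Cas9 cleavage-efficiency label of the i-th sgRNA (effective/ineffective, taken as +1/-1 in this formulation), and C > 0 is the inverse regularization strength.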
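Preferably, the L2 regularization is specifically:

$$\min_{w,c}\ \frac{1}{2}w^{\top}w + C\sum_{i=1}^{n}\log\left(\exp\left(-y_i\left(X_i^{\top}w + c\right)\right) + 1\right)$$

where w, c, X, n, and $y_i$ are as defined for the L1 regularization and C > 0 is the inverse regularization strength.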
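The two logistic-regression objectives can be sketched with plain gradient methods: gradient descent for the L2 form and proximal gradient descent (ISTA, a soft-thresholding step after each gradient step) for the L1 form. This is a minimal illustration under stated assumptions, not the patent's implementation: labels 1/0 are mapped to +1/-1 to match the formulas, `fit_logistic` is a hypothetical name, and the step size comes from a conservative Lipschitz bound.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, labels, penalty="l2", C=1.0, n_iter=2000):
    """Minimize pen(w) + C * sum_i log(1 + exp(-y_i (x_i.w + c))), y_i in {-1, +1}.

    pen(w) = 0.5 * w.w for "l2" (gradient descent) or ||w||_1 for "l1"
    (proximal gradient / ISTA). labels are the 0/1 ineffective/effective calls.
    """
    y = [1.0 if v == 1 else -1.0 for v in labels]
    p = len(X[0])
    # Lipschitz bound of the smooth part: sigmoid' <= 1/4, intercept column adds 1.
    lip = C * sum(sum(v * v for v in row) + 1.0 for row in X) / 4.0
    if penalty == "l2":
        lip += 1.0  # curvature of 0.5 * w.w
    step = 1.0 / lip
    w, c = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gc = [0.0] * p, 0.0
        for row, yi in zip(X, y):
            z = sum(a * b for a, b in zip(row, w)) + c
            coef = -C * yi * _sigmoid(-yi * z)  # derivative of C*log(1+exp(-y z)) w.r.t. z
            for j in range(p):
                gw[j] += coef * row[j]
            gc += coef
        if penalty == "l2":
            w = [wj - step * (gj + wj) for wj, gj in zip(w, gw)]
        else:
            # Gradient step on the loss, then soft-threshold (prox of step * ||w||_1).
            w = [math.copysign(max(abs(v) - step, 0.0), v)
                 for v in (wj - step * gj for wj, gj in zip(w, gw))]
        c -= step * gc  # the intercept is never penalized
    return w, c


X = [[1, 0], [1, 1], [0, 0], [0, 1]]  # column 0 decides the label, column 1 is noise
labels = [1, 1, 0, 0]
w_l1, c_l1 = fit_logistic(X, labels, penalty="l1", C=10.0)
print([j for j, wj in enumerate(w_l1) if wj != 0.0])  # features kept by the L1 fit
```

In the spirit of step 25), the indices with non-zero weights from the `"l1"` fit would be kept and the `"l2"` fit rerun on those columns only.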
Preferably, computing the NDCG value of the personalized sgRNA design model in step 31) is specifically:

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$

where DCG is the value computed from the predicted ranking, IDCG is the ideal DCG computed from the true ranking, and $\mathrm{rel}_i$ (used in computing DCG) is the ranking value predicted at position i.
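NDCG can be computed as shown below. The text does not reproduce the patent's DCG formula, so this sketch assumes the common form $\mathrm{DCG} = \sum_i \mathrm{rel}_i / \log_2(i+1)$, with the true (benchmark) score of the sgRNA placed at position i of the predicted ranking used as $\mathrm{rel}_i$. The optional `k` gives top-k variants such as the top-50 and top-10% values reported in the embodiment; the function names are hypothetical.

```python
import math


def dcg(gains):
    """Assumed common form: DCG = sum_i gain_i / log2(i + 1), positions i = 1, 2, ..."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(true_scores, predicted_scores, k=None):
    """NDCG@k = DCG of the predicted ranking / DCG of the ideal (true) ranking.

    The true score of each sgRNA is used as its relevance gain (an assumption).
    """
    k = len(true_scores) if k is None else k
    order = sorted(range(len(true_scores)), key=lambda i: predicted_scores[i], reverse=True)
    idcg = dcg(sorted(true_scores, reverse=True)[:k])
    if idcg == 0:
        return 0.0
    return dcg([true_scores[i] for i in order[:k]]) / idcg


# First nine rows of Table 4: Benchmark Score (true) and Cage Score (predicted).
bench = [0.789456865, 0.506422166, 0.325738326, 0.092078991, 0.072255575,
         0.037552375, 0.019922477, 0.017296539, 0.002206787]
cage = [0.6905, 0.6026, 0.5548, 0.5095, 0.4959, 0.4473, 0.3281, 0.3357, 0.3056]
print(round(ndcg(bench, cage), 6))  # slightly below 1: only ranks 7 and 8 are swapped
```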
Preferably, the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the marker segment that the sgRNA requires to recognize the DNA; "+" indicates that the DNA segment is 20 bp long and also carries the PAM, i.e. 20 bp followed by the PAM.
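The 20bp+PAM rule can be applied by scanning both strands of the user's region. This sketch assumes the SpCas9 PAM `NGG` (every example sequence in the tables ends in GG) and uses a regex lookahead so that overlapping sites are all reported. `find_candidates` is a hypothetical helper, and among IUPAC codes only `N` is expanded.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(region, pam="NGG", spacer_len=20):
    """Return (strand, start, site) for every 20 bp + PAM site on either strand.

    start is the 0-based forward-strand coordinate of the leftmost base of the
    23-nt footprint; the NGG default assumes SpCas9 (hypothetical helper).
    """
    region = region.upper()
    pam_regex = pam.replace("N", "[ACGT]")
    # Lookahead so that overlapping sites are all reported.
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}}{pam_regex}))")
    site_len = spacer_len + len(pam)
    hits = []
    for strand, seq in (("+", region), ("-", revcomp(region))):
        for m in pattern.finditer(seq):
            start = m.start() if strand == "+" else len(region) - m.start() - site_len
            hits.append((strand, start, m.group(1)))
    return hits


print(find_candidates("a" * 20 + "tgg"))  # one forward-strand site at position 0
```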
Compared with the prior art, the present invention has the following advantages:
(1) A personalized strategy is used for different cell types of different species, and the models are built with data-driven machine-learning algorithms, which significantly improves assessment accuracy.
(2) A new encoding rule is used that is not limited to the spacer and PAM, so the features found are more complete.
(3) Users can build models themselves, so the range of application is wider and not limited to the species already in the database.
(4) The OTF rate of NGS data is used as the cleavage rate, which expands the range of data that can be analyzed.
(5) Users can upload their own data to expand the database, accelerating data accumulation and helping to resolve the current difficulty that insufficient data prevents the design of optimal sgRNAs.
Description of the drawings
Fig. 1 is a flowchart of the method for building the personalized sgRNA model and evaluating the model;
Fig. 2 is a flowchart of the method for designing and assessing sgRNAs.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and specific operating process, but the scope of protection of the invention is not limited to the following embodiment.
Abbreviations:
CRISPR: Clustered regularly interspaced short palindromic repeats
Cas9: CRISPR-associated protein 9, the nuclease of the type II CRISPR system
NGS: Next Generation Sequencing
PAM: Protospacer-adjacent motif, the marker segment the sgRNA requires to recognize the DNA
sgRNA: single guide RNA, the RNA that plays the guiding role in the CRISPR/Cas9 system
indel: an insertion or deletion on the DNA caused by CRISPR/Cas9 editing
spacer: the roughly 20 bases in the sgRNA that base-pair with the target
OTF: out of frame, i.e. a frameshift mutation
Read: a sequencing read, the sequence obtained from one reaction in high-throughput sequencing
This embodiment provides a CRISPR/Cas9-based sgRNA design method: a workflow in which users build their own personalized sgRNA design models for different cell types of different species, so that models can be built and sgRNAs designed according to different needs. It comprises the following four steps:
(1) Data collection: data collected from the literature generally fall into two types: sgRNAs with numeric cleavage-efficiency values, or sgRNAs with categorical cleavage-efficiency values (for example, a two-class effective/ineffective label). NGS data downloaded from the SRA database yield only the numeric type. Because NGS data, once the OTF rate has been computed, are of the same numeric type as the data collected from the literature, this embodiment illustrates only two kinds of data: categorical data from the literature and NGS data.
Categorical data: for categorical data collected from the literature, this embodiment assigns 1 to effective and 0 to ineffective, and organizes the data in the format of Table 1.
Table 1
sgID | Sequence | Score |
---|---|---|
sgRNA_1 | CGCAACCTGCTCAGCGCCTACGG | 1 |
sgRNA_2 | CAGTCTACATAACACGCCCATGG | 1 |
sgRNA_3 | CGCAACCTGCTCAGCGCCTACGG | 1 |
…… | …… | …… |
sgRNA_1_1 | GGCAACCGTGGCGGCAATCGAGG | 0 |
sgRNA_2_2 | CTTCTCGGAATTCGGTGAAGGTGG | 0 |
sgRNA_3_3 | AACCTCCCGGCTTCTCGGAATTCGG | 0 |
…… | …… | …… |
Numeric data: for NGS numeric data, the sgRNA sequences and the NGS reads are first aligned to the human reference genome with BWA; the reads containing each sgRNA are extracted, and it is determined whether an indel is produced at the cut site and whether the indel is OTF. The OTF rate of each sgRNA is then counted (OTF rate = number of reads that contain the sgRNA and are OTF divided by the total number of reads that contain the sgRNA). Finally, the data are organized in the format of Table 2.
Table 2
sgID | Sequence | Score |
---|---|---|
sgRNA_1 | CGCAACCTGCTCAGCGCCTACGG | 0.2345 |
sgRNA_2 | CAGTCTACATAACACGCCCATGG | 0.7846 |
sgRNA_3 | CGCAACCTGCTCAGCGCCTACGG | 0.2367 |
…… | …… | …… |
(2) Model building: as shown in Fig. 1, the sequence information of the collected sgRNAs is extracted from the corresponding reference genome. Suppose the upstream and downstream lengths are set to 35 and 32 bases, respectively; the extracted sequence then has 90 (35+20+3+32) bases:
CACCTGGTATGTTCGTATCGGGCAGAATATCGCAACCTGCTCAGCGCCTACGGTCCATCTCGCTCAGGTACGACTGACCGACCCAGTCTA
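The extracted sgRNA information is binary-encoded according to the rule shown in Table 3.
Table 3
Base | Code |
---|---|
A | 1000 |
C | 0100 |
G | 0010 |
T | 0001 |
N | 0000 |
The 90 bases extracted above are thus encoded as a 360-bit binary vector (4 bits per base).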
Features are extracted with machine-learning methods, and the personalized sgRNA design model is built.
For categorical data, features are selected with logistic regression and the prediction model is built. Binary logistic regression offers two optional regularizations; the present invention uses L1 regularization for feature selection and L2 regularization to build the model.
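L1-regularized logistic regression solves the following sparse feature-selection optimization problem:

$$\min_{w,c}\ \lVert w\rVert_1 + C\sum_{i=1}^{n}\log\left(\exp\left(-y_i\left(X_i^{\top}w + c\right)\right) + 1\right)$$

where w and c are the weights and intercept of the features to be estimated, X is the feature representation of the training samples ($X_i$ being the i-th sample), n is the number of training samples, $y_i$ is the cleavage-efficiency label of the i-th sgRNA (taken as +1/-1), and C > 0 is the inverse regularization strength.
L2-penalized logistic regression minimizes the cost function:

$$\min_{w,c}\ \frac{1}{2}w^{\top}w + C\sum_{i=1}^{n}\log\left(\exp\left(-y_i\left(X_i^{\top}w + c\right)\right) + 1\right)$$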
For numeric data, the Lasso model is used for feature selection, and standard linear regression is used to build the prediction model.
The Lasso is a linear model that estimates sparse coefficients; it selects feature vectors mainly by extracting non-zero weights. The objective function to minimize is:

$$\min_{w}\ \frac{1}{2n}\lVert Xw - y\rVert_2^2 + \alpha\lVert w\rVert_1$$

where w is the weight vector of the features to be estimated, X is the matrix of feature vectors of the selected sgRNAs, n is the number of training samples, and y is the vector of sgRNA cleavage-efficiency values; α is a constant and $\lVert w\rVert_1$ is the ℓ1 norm of the parameter vector. The Lasso solves this least-squares loss with the added penalty $\alpha\lVert w\rVert_1$; by traversing the regularization path, the features with non-zero weights are extracted, and these features are considered the important elements affecting sgRNA cleavage efficiency.
After these features are selected, an assessment model is built with standard linear regression.
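Once the Lasso has picked the features, the standard linear regression is an ordinary least-squares fit on those columns. The sketch solves the normal equations with Gauss-Jordan elimination in pure Python; `fit_ols` and `predict_ols` are hypothetical names. The four bits of one position always sum to 1 (unless the base is N) and are therefore collinear with the intercept, so the sketch assumes the selected subset does not contain such a complete group.

```python
from itertools import product


def fit_ols(X, y, features):
    """Least squares with an intercept on the Lasso-selected columns.

    Solves (A^T A) beta = A^T y, A = [1 | X[:, features]], by Gauss-Jordan
    elimination with partial pivoting; returns (intercept, weights).
    """
    A = [[1.0] + [float(row[j]) for j in features] for row in X]
    m = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[k] for r in A) for k in range(m)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("selected features are collinear (singular normal matrix)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[i][m] / M[i][i] for i in range(m)]
    return beta[0], beta[1:]


def predict_ols(intercept, weights, row, features):
    """Predicted cleavage efficiency of one encoded sgRNA."""
    return intercept + sum(wj * row[j] for wj, j in zip(weights, features))


X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [1 + 2 * row[0] - 0.5 * row[2] for row in X]
intercept, weights = fit_ols(X, y, features=[0, 2])
print(round(intercept, 6), [round(v, 6) for v in weights])
```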
Modeling both numeric and categorical data produces two files: an XML file containing the selected features and the cross-validation results, and a binary .pkl file containing the built prediction model.
The XML file contents are as follows:
<report>
<features method="lasso" n="36" ups="35" dws="32">
<ups_33_C/>
<ups_30_G/>
<ups_22_G/>
<ups_21_G/>
<ups_20_A/>
<ups_14_G/>
<ups_13_C/>
<ups_9_G/>
<ups_8_A/>
<ups_5_G/>
<ups_1_A/>
<spa_1_C/>
<spa_2_C/>
<spa_4_A/>
<spa_5_T/>
<spa_6_C/>
<spa_8_A/>
<spa_9_C/>
<spa_9_T/>
<spa_10_C/>
<spa_15_T/>
<spa_17_A/>
<spa_19_C/>
<pam_1_G/>
<pam_2_G/>
<dws_2_A/>
<dws_6_G/>
<dws_12_G/>
<dws_13_C/>
<dws_14_C/>
<dws_15_A/>
<dws_24_A/>
<dws_24_G/>
<dws_26_G/>
<dws_26_T/>
<dws_28_A/>
</features>
<cross_validation fold="5">
<metric>
<pearson_cor value="0.862"/>
<r2 value="0.683"/>
</metric>
</cross_validation>
</report>
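A report with the same element layout as the XML above, plus the `.pkl` model file, can be produced with Python's standard `xml.etree.ElementTree` and `pickle` modules. This is a sketch only: the function names and arguments are hypothetical, and the usage line takes a few feature names and the metric values from the example report.

```python
import pickle
import xml.etree.ElementTree as ET


def build_report(feature_names, method, ups, dws, folds, pearson, r2):
    """Serialize selected features and cross-validation metrics in the report layout."""
    report = ET.Element("report")
    feats = ET.SubElement(report, "features", method=method, n=str(len(feature_names)),
                          ups=str(ups), dws=str(dws))
    for name in feature_names:
        ET.SubElement(feats, name)  # one empty element per feature, e.g. <ups_33_C />
    cv = ET.SubElement(report, "cross_validation", fold=str(folds))
    metric = ET.SubElement(cv, "metric")
    ET.SubElement(metric, "pearson_cor", value=f"{pearson:.3f}")
    ET.SubElement(metric, "r2", value=f"{r2:.3f}")
    return ET.tostring(report, encoding="unicode")


def save_model(model, path):
    """Write the fitted model to a binary .pkl file."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


xml_text = build_report(["ups_33_C", "spa_1_C", "pam_1_G"], "lasso", 35, 32, 5, 0.862, 0.683)
print(xml_text)
```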
(3) Model assessment: the quality of the prediction model is measured with the NDCG algorithm. NDCG (Normalized Discounted Cumulative Gain) is mainly used to measure the effectiveness of a ranking model. Its value represents the similarity between the predicted ranking and the actual ranking and lies between 0 and 1, where 1 means the two rankings are identical; the larger the value, the better the model. The formula is:

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$

DCG (Discounted Cumulative Gain) is the value computed from the predicted ranking, and IDCG (ideal DCG) is the ideal DCG computed from the true ranking. The mathematical definition of DCG is as follows:
where $\mathrm{rel}_i$ is the ranking value predicted at position i.
In Table 4, sgID is the sgRNA name, seq is the sgRNA sequence (20-nt spacer plus PAM), Benchmark Score is the benchmark score, BS_rank is the rank by Benchmark Score, Cage Score is the score assessed by the prediction model of the present invention, and C_rank is the rank by Cage Score.
Table 4
sgID | seq | Benchmark Score | BS_rank | Cage Score | C_rank |
---|---|---|---|---|---|
sg1000 | GCAGGTACCCTGCAACGTCGCGG | 0.789456865 | 1 | 0.6905 | 1 |
sg1001 | CTCCACTAGTCCCCGCGCCGCGG | 0.506422166 | 2 | 0.6026 | 2 |
sg1 | GTAATGGCTTCCTCGTGAGTTGG | 0.325738326 | 3 | 0.5548 | 3 |
sg1002 | GACTCCGTTGGGATCCGCGCCGG | 0.092078991 | 4 | 0.5095 | 4 |
sg10 | ATCTTAAGCAAACGCTTACCAGG | 0.072255575 | 5 | 0.4959 | 5 |
sg1003 | CCCGAAACGGTTGACTCCGTTGG | 0.037552375 | 6 | 0.4473 | 6 |
sg1004 | AGGCGCGCGATCCAGGTAGCTGG | 0.019922477 | 7 | 0.3281 | 8 |
sg100 | AAAAAGCTGATGAAGTTGTTTGG | 0.017296539 | 8 | 0.3357 | 7 |
sg1005 | CGGGGCCACCGCGACGTTGCAGG | 0.002206787 | 9 | 0.3056 | 9 |
…… | …… | …… | …… | …… | …… |
Top-50 NDCG = 0.876322904
Top-10% NDCG = 0.84340749
If the database does not contain this model, the model is added to the database; otherwise, the NDCG values of the two models are computed and compared, and if the new model has a larger NDCG value than the existing model, the database is updated.
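This update rule amounts to keeping, for each model key, whichever model has the higher NDCG. The sketch assumes a simple in-memory dict keyed by a hypothetical (species, cell type) tuple rather than the actual database, and `update_model_db` is a hypothetical name.

```python
def update_model_db(db, key, model, new_ndcg):
    """Store model under key if the key is new or the model has a higher NDCG.

    db maps a hypothetical (species, cell type) key to an (ndcg, model) pair;
    returns True when the database was changed.
    """
    current = db.get(key)
    if current is None or new_ndcg > current[0]:
        db[key] = (new_ndcg, model)
        return True
    return False


db = {}
key = ("human", "HEK293T")  # hypothetical key
print(update_model_db(db, key, "model_v1", 0.80))   # True: new key, added
print(update_model_db(db, key, "model_v2", 0.75))   # False: lower NDCG, v1 kept
print(update_model_db(db, key, "model_v3", 0.876))  # True: higher NDCG, replaced
```

A tie leaves the existing model in place, matching the "larger NDCG" condition in the text.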
(4) Design and assessment: as shown in Fig. 2, to assess sgRNAs designed by the user, or to design sgRNAs for a genome region given by the user (for example, chromosome 1, 1,000,000 to 1,002,000, hg19), first determine the species or cell type of the sgRNAs to be assessed and then select a suitable model for the assessment; if no suitable model exists, a similar model may be chosen. This embodiment provides 10 models, covering 3 species and 8 cell types, to choose from. The output is shown in Table 5.
Table 5
At this point, the user can choose the sgRNAs that suit their needs for the next stage of research.
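The selection logic of step (4) can be sketched as follows: use an exact species and cell-type match when one exists, otherwise fall back to another model of the same species, then score and rank the candidate sgRNAs. Interpreting "similar model" as "same species" is an assumption; `choose_model`, `rank_candidates`, and the model dictionary are hypothetical, and the scoring lambdas are placeholders rather than real models.

```python
def choose_model(models, species, cell_type):
    """Exact (species, cell type) match first, else another model of the same species."""
    if (species, cell_type) in models:
        return models[(species, cell_type)]
    for (sp, _cell), model in sorted(models.items(), key=lambda item: item[0]):
        if sp == species:
            return model  # fallback: "similar" taken to mean "same species" (assumption)
    raise LookupError(f"no model available for species {species!r}")


def rank_candidates(candidates, score_fn):
    """Score each candidate sgRNA with the chosen model and sort best first."""
    scored = [(seq, score_fn(seq)) for seq in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical models: each is just a placeholder scoring function here.
models = {
    ("human", "HEK293T"): lambda seq: seq.count("G") / len(seq),
    ("mouse", "ES"): lambda seq: seq.count("C") / len(seq),
}
model = choose_model(models, "human", "K562")  # no K562 model, falls back to HEK293T
candidates = ["A" * 21 + "GG", "GCAGGTACCCTGCAACGTCGCGG"]
print(rank_candidates(candidates, model))
```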
Claims (7)
1. A CRISPR/Cas9-based sgRNA design method, characterized in that the method comprises the following steps:
1) obtaining sgRNAs and the corresponding Cas9 cleavage-efficiency values, specifically:
11) obtaining sgRNAs and the corresponding Cas9 cleavage-efficiency values from the literature;
12) obtaining sgRNAs from the SRA database and computing the corresponding Cas9 cleavage-efficiency values;
13) classifying the data obtained in steps 11) and 12) into different reference genomes according to species, cell type, and experimental conditions, each reference genome having a table whose first column is the sgRNA name, second column is the sgRNA sequence, and third column is the corresponding Cas9 cleavage efficiency;
2) building a personalized sgRNA design model, specifically:
21) extracting, as required, the sequence information of the sgRNAs obtained in step 1) from the corresponding reference genome;
22) binary-encoding the sgRNA sequence information extracted in step 21) according to a binary rule;
23) for the sgRNAs obtained in step 21), determining the data type of their Cas9 cleavage efficiency: if it is numeric, proceeding to step 24); if it is categorical, proceeding to step 25);
categorical data: for categorical data collected from the literature, effective is assigned 1 and ineffective is assigned 0;
numeric data: for NGS numeric data, the sgRNA sequences and the NGS reads are first aligned to the human reference genome with BWA, the reads containing the sgRNA are extracted, whether an indel is produced at the cut site and whether the indel is OTF are determined, and the OTF rate of each sgRNA is then counted, where OTF rate = number of reads that contain the sgRNA and are OTF divided by the total number of reads that contain the sgRNA;
24) for the sgRNA sequence information encoded in step 22), performing feature extraction with a Lasso model and building the personalized sgRNA design model by standard linear regression;
25) for the sgRNA sequence information encoded in step 22), performing feature selection with L1 regularization in binary logistic regression, and then building the personalized sgRNA design model with L2 regularization in binary logistic regression;
3) quality that designs a model of personalized sgRNA established in step 2) is measured with NDCG algorithm and updates SRA database,
Specifically:
31) the NDCG value that the personalized sgRNA established in step 2) designs a model is calculated;
32) judge whether there is corresponding personalization sgRNA model in existing SRA database, if being otherwise added to SRA data
Library, if then entering step 33);
33) compare personalization sgRNA model and the sgRNA model in corresponding SRA database, select one that NDCG value is big
It is stored in SRA database;
4) designing sgRNAs and giving an assessment score for each sgRNA, specifically:
41) selecting a suitable reference genome from the SRA database according to the genome region provided by the user, and searching it for all sgRNAs that satisfy the design rule, as the designed sgRNAs;
42) assessing the sgRNAs designed in step 41) with the personalized sgRNA model built in step 2).
2. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that computing the corresponding Cas9 cleavage-efficiency value in step 12) specifically comprises:
121) aligning the sgRNAs and the corresponding next-generation sequencing reads to the reference genome;
122) extracting the reads that contain the sgRNA;
123) determining whether an insertion or deletion (indel) on the DNA is produced at the cut site, and whether that indel is a frameshift mutation;
124) counting the frameshift mutation rate of each sgRNA, specifically the number of reads that contain the sgRNA and carry a frameshift mutation divided by the total number of reads that contain the sgRNA;
125) using the frameshift mutation rate computed in step 124) as the Cas9 cleavage-efficiency value.
3. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the sgRNA sequence information in step 21) includes the sgRNA sequence, the marker segment that the sgRNA requires to recognize the DNA, and the bases upstream and downstream of the sgRNA spacer, the number of upstream and downstream bases being a platform default or a user-set value.
4. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the binary rule in step 22) is specifically: A corresponds to 1000, C to 0100, G to 0010, T to 0001, and N to 0000.
5. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the feature extraction with the Lasso model in step 24) selects feature vectors by extracting the non-zero weights, specifically:

$$\min_{w}\ \frac{1}{2n}\lVert Xw - y\rVert_2^2 + \alpha\lVert w\rVert_1$$

where w is the weight vector of the features to be estimated, X is the matrix of feature vectors of the selected sgRNAs, n is the number of sgRNAs, and y is the vector of Cas9 cleavage-efficiency values of the sgRNAs; α is a constant and $\lVert w\rVert_1$ is the ℓ1 norm of the parameter vector; the Lasso model solves this least-squares loss with the added penalty $\alpha\lVert w\rVert_1$, and by traversing the regularization path, the features with non-zero weights are extracted.
6. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that computing the NDCG value of the personalized sgRNA design model in step 31) is specifically:

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$

where DCG is the value computed from the predicted ranking, IDCG is the ideal DCG computed from the true ranking, and $\mathrm{rel}_i$ (used in computing DCG) is the ranking value predicted at position i.
7. The CRISPR/Cas9-based sgRNA design method according to claim 1, characterized in that the design rule in step 41) is specifically:
20bp+PAM
where bp is the unit of DNA length and PAM is the marker segment that the sgRNA requires to recognize the DNA; "+" indicates that the DNA segment is 20 bp long and also carries the PAM, i.e. 20 bp followed by the PAM.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446600A CN106446600A (en) | 2017-02-22 |
CN106446600B true CN106446600B (en) | 2019-10-18 |
Family
ID=58183551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610341946.3A Expired - Fee Related CN106446600B (en) | 2016-05-20 | 2016-05-20 | A kind of design method of the sgRNA based on CRISPR/Cas9 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446600B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US20150044192A1 (en) | 2013-08-09 | 2015-02-12 | President And Fellows Of Harvard College | Methods for identifying a target site of a cas9 nuclease |
US9359599B2 (en) | 2013-08-22 | 2016-06-07 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US9526784B2 (en) | 2013-09-06 | 2016-12-27 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
US9340799B2 (en) | 2013-09-06 | 2016-05-17 | President And Fellows Of Harvard College | MRNA-sensing switchable gRNAs |
US9388430B2 (en) | 2013-09-06 | 2016-07-12 | President And Fellows Of Harvard College | Cas9-recombinase fusion proteins and uses thereof |
US9840699B2 (en) | 2013-12-12 | 2017-12-12 | President And Fellows Of Harvard College | Methods for nucleic acid editing |
EP3177718B1 (en) | 2014-07-30 | 2022-03-16 | President and Fellows of Harvard College | Cas9 proteins including ligand-dependent inteins |
EP3365356B1 (en) | 2015-10-23 | 2023-06-28 | President and Fellows of Harvard College | Nucleobase editors and uses thereof |
GB2568182A (en) | 2016-08-03 | 2019-05-08 | Harvard College | Adenosine nucleobase editors and uses thereof |
AU2017308889B2 (en) | 2016-08-09 | 2023-11-09 | President And Fellows Of Harvard College | Programmable Cas9-recombinase fusion proteins and uses thereof |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
KR102622411B1 (en) | 2016-10-14 | 2024-01-10 | 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 | AAV delivery of nucleobase editor |
WO2018119359A1 (en) | 2016-12-23 | 2018-06-28 | President And Fellows Of Harvard College | Editing of ccr5 receptor gene to protect against hiv infection |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
WO2018165629A1 (en) | 2017-03-10 | 2018-09-13 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
EP3601562A1 (en) | 2017-03-23 | 2020-02-05 | President and Fellows of Harvard College | Nucleobase editors comprising nucleic acid programmable dna binding proteins |
WO2018209320A1 (en) | 2017-05-12 | 2018-11-15 | President And Fellows Of Harvard College | Aptazyme-embedded guide rnas for use with crispr-cas9 in genome editing and transcriptional activation |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
EP3676376A2 (en) | 2017-08-30 | 2020-07-08 | President and Fellows of Harvard College | High efficiency base editors comprising gam |
KR20200121782A (en) | 2017-10-16 | 2020-10-26 | The Broad Institute, Inc. | Uses of adenosine base editor |
CN110751982B (en) * | 2018-07-04 | 2023-11-10 | Guangzhou Saiye Baimu Biotechnology Co., Ltd. | Intelligent parallelized knockout strategy screening method and system |
BR112021018606A2 (en) | 2019-03-19 | 2021-11-23 | Harvard College | Methods and compositions for editing nucleotide sequences |
CN111261223B (en) * | 2020-01-12 | 2022-05-03 | Hunan University | CRISPR off-target effect prediction method based on deep learning |
DE112021002672T5 (en) | 2020-05-08 | 2023-04-13 | President And Fellows Of Harvard College | METHODS AND COMPOSITIONS FOR EDIT BOTH STRANDS SIMULTANEOUSLY OF A DOUBLE STRANDED NUCLEOTIDE TARGET SEQUENCE |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | Sun Yat-sen University | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN111881324B (en) * | 2020-07-30 | 2023-12-15 | Suzhou Industrial Park Institute of Services Outsourcing | General storage format structure for high-throughput sequencing data, construction method and application thereof |
CN117252306B (en) * | 2023-10-11 | 2024-02-27 | Minzu University of China | Gene editing capability index calculation method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103805606A (en) * | 2014-02-28 | 2014-05-21 | Qingdao Institute of Animal Husbandry and Veterinary Medicine | Pair of sgRNAs specifically recognizing the sheep DKK1 gene, their encoding DNA, and application thereof |
CN104109687A (en) * | 2014-07-14 | 2014-10-22 | Sichuan University | Construction and application of a Zymomonas mobilis CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats / CRISPR-associated protein 9) system |
CN105255937A (en) * | 2015-08-14 | 2016-01-20 | Northwest A&F University | Method for expressing CRISPR sgRNA using a eukaryotic type III promoter and use thereof |
CN105296518A (en) * | 2015-12-01 | 2016-02-03 | China Agricultural University | Homology arm vector construction method for CRISPR/Cas9 technology |
CN105400779A (en) * | 2015-10-15 | 2016-03-16 | Wuhu Yinuo Biotechnology Co., Ltd. | Target sequence of the human CCR5 gene recognized by the Streptococcus thermophilus CRISPR-Cas9 system, sgRNA, and application of the CRISPR-Cas9 system |
2016-05-20: application CN201610341946.3A (CN106446600B), status: not active, Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
sgRNA design and off-target effect evaluation in the CRISPR/Cas9 system; Xie Shengsong et al.; Hereditas (Beijing); 2015-11-30; Vol. 37, No. 11, pp. 1125-1136 * |
Large-fragment genomic DNA editing mediated by the CRISPR/Cas system; Wang Liren; China Doctoral Dissertations Full-text Database, Basic Sciences; 2015-10-15; Vol. 2015, No. 10, p. A006-18 * |
Screening of sgRNAs for targeted editing of the TCR gene with the CRISPR-Cas9 system; Shao Hongwei et al.; Journal of Jimei University (Natural Science); 2015-07-31; Vol. 20, No. 4, pp. 265-270 * |
CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR system in plants; Yang Lei et al.; Molecular Plant; 2014-09-30; Vol. 7, No. 9, pp. 1494-1496 * |
In Silico Predictive Modeling of CRISPR/Cas9 guide efficiency; Nicolo Fusi et al.; bioRxiv; 2015-06-26; pp. 1-31 * |
Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9; John G Doench et al.; Nature Biotechnology; 2016-01-18; Vol. 34, pp. 184-191 * |
Also Published As
Publication number | Publication date |
---|---|
CN106446600A (en) | 2017-02-22 |
Similar Documents
Publication | Title |
---|---|
CN106446600B (en) | A design method for sgRNA based on CRISPR/Cas9 |
Zhao et al. | Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols |
CN104504304B (en) | Method and device for identifying clustered regularly interspaced short palindromic repeat (CRISPR) sequences |
CA2424031C (en) | System and process for validating, aligning and reordering genetic sequence maps using ordered restriction map |
CN108319984B (en) | Construction method and prediction method for a model predicting leaf morphology and photosynthetic characteristics of woody plants based on DNA methylation level |
CN108564117B (en) | SVM-based poverty and life assisting identification method |
CN106446597B (en) | Multi-species feature selection and method for identifying unknown genes |
CN107679367B (en) | Method and system for identifying co-regulation network function modules based on network node association degree |
JP2014505935A (en) | DNA sequence data analysis method |
CN105808976A (en) | miRNA target gene prediction method based on a recommendation model |
Hu et al. | Algorithm for discovering low-variance 3-clusters from real-valued datasets |
CN114708910B (en) | Method for calculating enrichment scores of cell subpopulations using single-cell sequencing data |
Williams et al. | Plant microRNA prediction by supervised machine learning using C5.0 decision trees |
CN112349346A (en) | Method for detecting structural variations in genomic regions |
CN114496092A (en) | miRNA-disease association prediction method based on a graph convolutional network |
CN111462820A (en) | Non-coding RNA prediction method based on feature screening and an ensemble algorithm |
CN104966106A (en) | Stepwise biological age prediction method based on a support vector machine |
CN109545283B (en) | Method for constructing phylogenetic trees based on a sequence pattern mining algorithm |
CN116525010A (en) | Single-cell transcriptome double-source multi-cell filtering method, medium and equipment |
CN115249538B (en) | Construction method of an lncRNA-disease association prediction model based on a heterogeneous-graph generative adversarial network |
CN115394348A (en) | lncRNA subcellular localization prediction method, device and medium based on a graph convolutional network |
CN108595914A (en) | High-precision prediction method for tobacco mitochondrial RNA (mtRNA) editing sites |
CN114093420A (en) | XGBoost-based DNA recombination site prediction method |
CN113035279A (en) | Method for identifying key modules of Parkinson's disease progression based on miRNA sequencing data |
Mu et al. | Investigation on tree molecular genome of Arabidopsis thaliana for internet of things |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-10-18 |