CN110070912A - Prediction method for CRISPR/Cas9 off-target effect - Google Patents
- Publication number: CN110070912A
- Application number: CN201910299222.0A
- Authority: CN (China)
- Prior art keywords: sample, dna, sgrna, crispr, feature
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a prediction method for CRISPR/Cas9 off-target effects, comprising the following steps: 1) constructing a data set containing positive and negative samples; 2) encoding the sample data set and adding features; 3) processing the sample data by feature selection; 4) constructing a Broad Learning predictor. The method predicts quickly and with high accuracy.
Description
Technical field
The present invention relates to gene technology, and specifically to a prediction method for CRISPR/Cas9 off-target effects.
Background art
Since CRISPR/Cas9 was first applied to gene editing, it has spread rapidly through the life sciences and transformed gene editing technology. The CRISPR/Cas9 system is the third generation of targeted genome editing technology, after zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), and can edit and modify DNA sequences at specific positions. The first two generations recognize DNA sequences through protein specificity, whereas CRISPR locates the target DNA by complementary base pairing with a 20-nt DNA sequence and is therefore more versatile. The CRISPR/Cas9 system consists of CRISPR sequence elements and the Cas9 nuclease. Guided by crRNA and tracrRNA, the Cas9 nuclease cuts the double-stranded DNA at a protospacer adjacent to a protospacer adjacent motif (PAM, usually NGG, where N is any base), forming a DNA double-strand break.
The CRISPR/Cas9 system has potential off-target effects when it targets an organism's genome. The Cas9 nuclease tolerates some mismatches between the sgRNA and the target DNA sequence. Besides cutting the DNA strand at the target site, the sgRNA may also partially match non-target DNA sequences that are highly homologous to the target site, activating Cas9 to cut the non-target DNA and producing an off-target effect. Off-target effects can cause a large amount of non-target cleavage in the genes of a genome, with uncontrollable consequences; this is also the biggest obstacle to clinical application of the CRISPR/Cas9 system.
An important current research direction for the CRISPR/Cas9 system is the prediction of on-target efficiency and off-target sites; accurately predicting the interaction between the CRISPR system and DNA sequences can be used to maximize on-target activity and minimize off-target effects. Most existing CRISPR/Cas9 off-target design tools simply search for off-target sites using sequence-matching scores and base mismatches. Other tools design an off-target site score to predict the off-target efficiency of a site. At present, off-target prediction research abroad relies mainly on sequence similarity and physicochemical properties, while domestic research more often uses convolutional neural networks to predict CRISPR/Cas9 sequences. By the technique used, these works fall mainly into the following groups:
based on support vector machines (SVM) (Wong N, Liu W, Wang X. WU-CRISPR: characteristics of functional guide RNAs for the CRISPR/Cas9 system. Genome Biol, 2015, 16: 218.);
based on random forests (Abadi S, Yan WX, Amar D, Mayrose I. A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput Biol, 2017, 13(10): e1005807.);
based on convolutional neural networks (Kim HK, Min S, Song M, Jung S, Choi JW, Kim Y, Lee S, Yoon S, Kim HH. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat Biotechnol, 2018, 36(3): 239-241.);
based on logistic regression (Prykhozhij SV, Rajan V, Gaston D, Berman JN. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PLoS One, 2015, 10(3): e0119372.);
based on Bayesian analysis (Hart T, Moffat J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics, 2016, 17: 164.).
All of the above techniques apply machine learning methods to the prediction of CRISPR/Cas9 off-target effects.
Summary of the invention
The object of the invention is to overcome two defects: building a CRISPR/Cas9 off-target library by biochemical experiments is time-consuming and costly, and conventional methods are slow and insufficiently accurate. To this end the invention provides a prediction method for CRISPR/Cas9 off-target effects. The method predicts quickly and with high accuracy.
The technical solution that achieves the object of the invention is:
A prediction method for CRISPR/Cas9 off-target effects which, unlike the prior art, includes the following steps:
1) Construct a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS, and BLESS experimental data. The sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each sequence obtained is a DNA sequence 23 bases long ending in NGG, where N is any one of A, C, G, T. Removing the positive sample sequences from these sequences yields the negative samples. The positive samples are over-sampled with the Bootstrap method, and the negative samples are under-sampled to select as many negative samples as there are positive samples. Because the human genome is very large, the resulting negative samples are very numerous and cause a severe imbalance between positive and negative samples; this imbalance harms the training process and can even cause training to fail, so the large set of negative samples must be resampled to resolve the extreme imbalance. Here the samples are drawn with the Bootstrap method: sampling with replacement is performed over the n original data points, the sample size remains n, and every observation in the original data has the same probability, 1/n, of being drawn each time. The resulting sample is called the Bootstrap sample and serves as the benchmark data set S, which can be expressed as formula (1):
$$S = S^{+} \cup S^{-} \qquad (1)$$
where the subset $S^{+}$ contains only positive samples, i.e. sgRNA sequences and the off-target sequences that in practice bind the CRISPR/Cas9 system sgRNA; the subset $S^{-}$ contains only negative samples, i.e. sites in the human genome that have four or fewer mismatches to the sgRNA sequence but cannot bind the sgRNA; and ∪ denotes the union of the two sets;
2) Encode the sample data set and add features: the sgRNA sequences and DNA sequences obtained in step 1) are one-hot encoded to obtain sequence vectors; the CFD score, CCTop score, CRISTA score, GC content, number of mismatches, and sgRNA-DNA sequence similarity score of each sgRNA-DNA pair are added to obtain the feature vector, and the corresponding binary classification label is generated at the same time.
The CRISTA score is based on random forest and regression models. It takes into account the effects of DNA bulges and RNA bulges on sgRNA editing efficiency and combines genome nucleotide content, sgRNA thermodynamic properties, sgRNA-target DNA base similarity, and other factors to generate the final CRISTA score.
The CCTop score first calculates an off-target mismatch score for each off-target site according to formula (2):
$$score_{off\text{-}target} = \sum_{mismatch} 1.2^{pos} \qquad (2)$$
where pos is the position of a mismatch in the off-target sequence, counted from the 5' end. The off-target mismatch scores are then used to calculate the off-target score according to formula (3), in which dist is the distance from each off-target site to the nearest exon and $total_{off\text{-}targets}$ is the number of off-target sites; this score considers only off-target sequences related to exons.
The CFD score first calculates, from the CD33 data, the reaction efficiency of sgRNA and DNA sequences for a single mismatch at each position and for each PAM type; for multiple mismatches, these reaction efficiencies are multiplied. For example, if an sgRNA-DNA pair has an A:G mismatch at position 3 and a T:C mismatch at position 5, and the PAM type is 'AG', the CFD score of this pair is CFD_score = P(active | A:G, 3) × P(active | T:C, 5) × P(active | AG), where each term is calculated from frequencies observed in the CD33 data. The CFD score can be expressed as formula (4):
$$CFD_{score} = \prod_{i} P(Y = 1 \mid X_i = 1) \qquad (4)$$
where Y = 1 indicates that the sgRNA can react with the DNA sequence and $X_i = 1$ indicates that a mismatch occurs at position i.
The GC content is the ratio of the number of 'G' and 'C' bases to the total number of bases in the DNA sequence, and can be calculated by formula (5):
$$GC = \frac{N_G + N_C}{N_{total}} \qquad (5)$$
where $N_G$ and $N_C$ are the numbers of 'G' and 'C' bases and $N_{total}$ is the total number of bases.
The number of mismatches is the number of positions at which the sgRNA sequence and the DNA sequence do not match, and the sgRNA-DNA similarity score is the proportion of the total sequence length at which the sgRNA sequence matches the DNA sequence;
3) Process the sample data by feature selection: the vectors obtained are processed with the sklearn module of Python. An extremely randomized trees trainer is built with the Extratree module and trained on the data to obtain the feature importance of the 190-dimensional vectors, and the 150 top-ranked features by importance are then selected for the next training step. This step both reduces the vector dimension, which speeds up subsequent training, and removes redundant features, which improves training accuracy. Feature importance uses the Gini coefficient, defined as follows:
For a binary classification problem, the target value is 0 or 1. For node m, let $R_m$ be the region containing its $N_m$ observations, and let
$$p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \qquad (6)$$
where $p_{mk}$ is the proportion of class k observed in node m, k is the class (0 or 1), and $y_i$ is the predicted value. The Gini coefficient is then given by formula (7):
$$H(X_m) = \sum_{k} p_{mk}(1 - p_{mk}) \qquad (7)$$
The larger the value of $H(X_m)$, the higher the discriminating ability of the feature; the features can therefore be ranked by this value, keeping the features that are needed and discarding the useless ones;
4) Construct the Broad Learning predictor: a broad learning algorithm with added BP (backpropagation) parameter tuning is used as the Broad Learning predictor for CRISPR/Cas9 off-target effects; the prediction result determines whether the sgRNA and the DNA sequence will react, thereby predicting the off-target effect of the CRISPR/Cas9 system.
Broad Learning is selected as the predictor and trained on the training samples. First the feature mapping nodes are constructed. The given input data $X \in \mathbb{R}^{N \times M}$ is defined as N samples with M-dimensional features; weights $W_{e_i}$ and biases $\beta_{e_i}$ are generated randomly, $\phi_i$ is the activation function, and the feature mapping nodes are defined by formula (8):
$$Z_i = \phi_i(X W_{e_i} + \beta_{e_i}), \quad i = 1, \dots, n \qquad (8)$$
The enhancement nodes are then constructed on the basis of the feature mapping nodes: weights $W_{h_j}$ and biases $\beta_{h_j}$ are generated randomly, $\xi_j$ is the activation function, and the enhancement nodes are defined by formula (9):
$$H_j = \xi_j(Z^n W_{h_j} + \beta_{h_j}), \quad j = 1, \dots, m \qquad (9)$$
where $Z^n = [Z_1, \dots, Z_n]$ and, likewise, $H^m = [H_1, \dots, H_m]$. The final output is $Y \in \mathbb{R}^{N \times C}$, where N is the number of samples and C is the number of sample classes; the output Y can be defined by formula (10):
$$Y = [Z^n \mid H^m] W^m \qquad (10)$$
where $W^m = [Z^n \mid H^m]^{+} Y$ and $[Z^n \mid H^m]^{+}$ is the pseudoinverse of $[Z^n \mid H^m]$. The weights are then tuned with the BP algorithm to obtain the final weights. Finally, ten broad learning models are obtained by assigning different weights to the broad learning model, which gives the outputs of ten classifiers, and the final output is obtained by a voting strategy.
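The encoding in step 2) can be illustrated with a short Python sketch. It assumes that the sgRNA and the DNA site are both given as 23-character strings over A, C, G, T (the sgRNA written in the DNA alphabet, PAM included), which gives 23 × 4 × 2 = 184 one-hot values plus six scalar features, in line with the 190 dimensions mentioned in step 3). The CFD, CCTop, and CRISTA scores are assumed to be precomputed by external tools, and all function names are hypothetical rather than taken from the patent.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a sequence as a flat list with 4 values (A, C, G, T) per base."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def gc_content(dna):
    """Formula (5): share of G and C bases in the DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def mismatch_count(sgrna, dna):
    """Number of positions where the two equally long sequences differ."""
    return sum(a != b for a, b in zip(sgrna.upper(), dna.upper()))


def build_feature_vector(sgrna, dna, cfd, cctop, crista):
    """One-hot encode both sequences and append the six scalar features.

    cfd, cctop and crista are assumed to come from external scoring tools.
    """
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA sequences must have the same length")
    mismatches = mismatch_count(sgrna, dna)
    similarity = (len(dna) - mismatches) / len(dna)
    scalars = [cfd, cctop, crista, gc_content(dna), float(mismatches), similarity]
    return one_hot(sgrna) + one_hot(dna) + scalars
```

For two 23-base inputs, `build_feature_vector` returns a list of 190 values; the binary label (1 for a positive sample, 0 for a negative one) would be stored alongside it.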
The Broad Learning predictor is built on the Broad Learning algorithm. On top of sequence information and sequence physicochemical properties, the predictor uses an ensemble learning approach and adds the CRISTA score, CCTop score, and CFD score described in step 2) as training inputs, which greatly improves the accuracy of the model.
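Two of the score features named above follow directly from the description in step 2) and can be sketched in a few lines of Python. The first function implements formula (2); the second multiplies per-mismatch and PAM activity frequencies as in the worked CFD example. The lookup tables are hypothetical placeholders supplied by the caller: the real CFD frequencies come from the CD33 data and are not reproduced here, and the function names are illustrative.

```python
def cctop_mismatch_score(mismatch_positions):
    """Formula (2): sum of 1.2**pos over mismatch positions counted from the 5' end."""
    return sum(1.2 ** pos for pos in mismatch_positions)


def cfd_style_score(mismatches, pam, mismatch_activity, pam_activity):
    """Multiply the activity frequency of each single mismatch and of the PAM.

    mismatches        -- list of (pair, position) tuples, e.g. [("A:G", 3), ("T:C", 5)]
    pam               -- PAM type, e.g. "AG"
    mismatch_activity -- dict mapping (pair, position) to P(active | pair, position)
    pam_activity      -- dict mapping PAM type to P(active | PAM)
    Both dicts are assumed inputs; no CD33-derived values are included here.
    """
    score = pam_activity[pam]
    for pair, position in mismatches:
        score *= mismatch_activity[(pair, position)]
    return score
```

With made-up frequencies of 0.5 for (A:G, 3), 0.4 for (T:C, 5), and 0.25 for the 'AG' PAM, the product is 0.5 × 0.4 × 0.25 = 0.05; a pair with no mismatches keeps only the PAM term.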
Compared with existing prediction techniques, this technical solution has the following advantages:
(1) Short run time: compared with traditional deep structures, Broad Learning does not require complicated multilayer computation and does not need backpropagation with the BP algorithm to adjust weights, which can greatly shorten the training time;
(2) High accuracy: the purpose-built CRISPR/Cas9 off-target predictor based on Broad Learning extracts features efficiently and thereby improves prediction accuracy.
The method predicts quickly and with high accuracy.
Description of the drawings
Fig. 1 is a schematic diagram of the prediction method in the embodiment.
Specific embodiment
The content of the invention is further described below with reference to the drawing and the embodiment, which do not limit the invention.
Embodiment:
Referring to Fig. 1, a prediction method for CRISPR/Cas9 off-target effects includes the following steps:
1) Construct a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS, and BLESS experimental data. The sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each sequence obtained is a DNA sequence 23 bases long ending in NGG, where N is any one of A, C, G, T. Excluding the positive sample sequences from these sequences yields the negative samples. The positive samples are over-sampled with the Bootstrap method, and the negative samples are under-sampled to select as many negative samples as there are positive samples. Because the human genome is very large, the resulting negative samples are very numerous and cause a severe imbalance between positive and negative samples; this imbalance harms the training process and can even cause training to fail, so the large set of negative samples must be resampled to resolve the extreme imbalance. Here the negative samples are randomly sampled with the Bootstrap method: sampling with replacement is performed over the n original data points, the sample size remains n, and every observation in the original data has the same probability, 1/n, of being drawn each time. The resulting sample is called the Bootstrap sample and serves as the benchmark data set S, which can be expressed as formula (1):
$$S = S^{+} \cup S^{-} \qquad (1)$$
where the subset $S^{+}$ contains only positive samples, i.e. sgRNA sequences and the off-target sequences that in practice bind the CRISPR/Cas9 system sgRNA; the subset $S^{-}$ contains only negative samples, i.e. sites in the human genome that meet the preset condition for the sgRNA sequence but cannot bind the sgRNA; and ∪ denotes the union of the two sets. This example obtains 1744 samples, of which 872 are positive and 872 are negative;
2) Encode the sample data set and add features: the sgRNA sequences and DNA sequences obtained in step 1) are one-hot encoded to obtain a 23*4*2 = 184-dimensional vector; the CFD score, CCTop score, CRISTA score, GC content, number of mismatches, and sgRNA-DNA sequence similarity score of each sgRNA-DNA pair are added to obtain a 190-dimensional vector, and the corresponding binary classification label (0 or 1) is generated at the same time.
The CRISTA score is based on random forest and regression models. It takes into account the effects of DNA bulges and RNA bulges on sgRNA editing efficiency and combines genome nucleotide content, sgRNA thermodynamic properties, sgRNA-target DNA sequence similarity, and other factors to generate the final CRISTA score.
The CCTop score first calculates an off-target mismatch score for each off-target site according to formula (2):
$$score_{off\text{-}target} = \sum_{mismatch} 1.2^{pos} \qquad (2)$$
where pos is the position of a mismatch in the off-target sequence, counted from the 5' end. The off-target mismatch scores are then used to calculate the off-target score according to formula (3), in which dist is the distance from each off-target site to the nearest exon and $total_{off\text{-}targets}$ is the number of off-target sites; this score considers only off-target sequences related to exons.
The CFD score first calculates, from the CD33 data, the reaction efficiency of sgRNA and DNA sequences for a single mismatch at each position and for each PAM type; for multiple mismatches, these reaction efficiencies are multiplied. For example, if an sgRNA-DNA pair has an A:G mismatch at position 3 and a T:C mismatch at position 5, and the PAM type is 'AG', the CFD score of this pair is CFD_score = P(active | A:G, 3) × P(active | T:C, 5) × P(active | AG), where each term is calculated from frequencies observed in the CD33 data. The CFD score can be expressed as formula (4):
$$CFD_{score} = \prod_{i} P(Y = 1 \mid X_i = 1) \qquad (4)$$
where Y = 1 indicates that the sgRNA can react with the DNA sequence and $X_i = 1$ indicates that a mismatch occurs at position i.
The GC content is the ratio of the number of 'G' and 'C' bases to the total number of bases in the DNA sequence, and can be calculated by formula (5):
$$GC = \frac{N_G + N_C}{N_{total}} \qquad (5)$$
where $N_G$ and $N_C$ are the numbers of 'G' and 'C' bases and $N_{total}$ is the total number of bases.
The number of mismatches is the number of positions at which the sgRNA sequence and the DNA sequence do not match, and the sgRNA-DNA similarity score is the proportion of the total sequence length at which the sgRNA sequence matches the DNA sequence;
3) Process the sample data by feature selection: the vectors obtained are processed with the sklearn module of Python. An extremely randomized trees trainer is built with the Extratree module and trained on the data to obtain the feature importance of the 190-dimensional vectors, and the 150 top-ranked features by importance are then selected for the next training step. This step both reduces the vector dimension, which speeds up subsequent training, and removes redundant features, which improves training accuracy. In this example, feature importance uses the Gini coefficient, defined as follows:
For a binary classification problem, the target value is 0 or 1. For node m, let $R_m$ be the region containing its $N_m$ observations, and let
$$p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \qquad (6)$$
where $p_{mk}$ is the proportion of class k observed in node m, k is the class (0 or 1), and $y_i$ is the predicted value. The Gini coefficient is then given by formula (7):
$$H(X_m) = \sum_{k} p_{mk}(1 - p_{mk}) \qquad (7)$$
The larger the value of $H(X_m)$, the higher the discriminating ability of the feature; the features can therefore be ranked by this value, keeping the features that are needed and discarding the useless ones. In this example the 150 top-ranked features by importance are taken as the final data, giving a 1744*150-dimensional matrix;
4) Construct the Broad Learning predictor: a broad learning algorithm with added BP (backpropagation) parameter tuning is used as the Broad Learning predictor for CRISPR/Cas9 off-target effects; the prediction result determines whether the sgRNA and the DNA sequence will react, thereby predicting the off-target effect of the CRISPR/Cas9 system.
This example selects Broad Learning as the predictor and trains it on the training samples. First the feature mapping nodes are constructed. The given input data $X \in \mathbb{R}^{N \times M}$ is defined as N samples with M-dimensional features; weights $W_{e_i}$ and biases $\beta_{e_i}$ are generated randomly, $\phi_i$ is the activation function (tanh in this example), and the feature mapping nodes are defined by formula (8):
$$Z_i = \phi_i(X W_{e_i} + \beta_{e_i}), \quad i = 1, \dots, n \qquad (8)$$
The enhancement nodes are then constructed on the basis of the feature mapping nodes: weights $W_{h_j}$ and biases $\beta_{h_j}$ are generated randomly, $\xi_j$ is the activation function (sigmoid in this example), and the enhancement nodes are defined by formula (9):
$$H_j = \xi_j(Z^n W_{h_j} + \beta_{h_j}), \quad j = 1, \dots, m \qquad (9)$$
where $Z^n = [Z_1, \dots, Z_n]$ and, likewise, $H^m = [H_1, \dots, H_m]$. The final output is $Y \in \mathbb{R}^{N \times C}$, where N is the number of samples and C is the number of sample classes; the output Y can be defined by formula (10):
$$Y = [Z^n \mid H^m] W^m \qquad (10)$$
where $W^m = [Z^n \mid H^m]^{+} Y$ and $[Z^n \mid H^m]^{+}$ is the pseudoinverse of $[Z^n \mid H^m]$. The weights are then tuned with the BP algorithm to obtain the final weights. Finally, ten broad learning models are obtained by assigning different weights to the broad learning model, which gives the outputs of ten classifiers, and the final output is obtained by a voting strategy.
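The predictor in step 4) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it builds tanh feature mapping nodes and sigmoid enhancement nodes with random weights, solves the output weights with the pseudoinverse, and combines ten models that differ only in their random seed by majority vote. The BP tuning step is omitted, and the class name, node counts, and the fan-in scaling of the random weights are choices made here for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BroadLearner:
    """Minimal broad learning classifier (no BP fine-tuning)."""

    def __init__(self, n_groups=10, nodes_per_group=10, n_enhance=50, seed=0):
        self.n_groups = n_groups
        self.nodes_per_group = nodes_per_group
        self.n_enhance = n_enhance
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Feature mapping nodes: Z_i = tanh(X W_ei + beta_ei)
        Z = np.hstack([np.tanh(X @ W + b) for W, b in self.feature_params])
        # Enhancement nodes: H = sigmoid(Z^n W_h + beta_h)
        H = sigmoid(Z @ self.W_h + self.b_h)
        return np.hstack([Z, H])  # the matrix [Z^n | H^m]

    def fit(self, X, labels, n_classes=2):
        n_features = X.shape[1]
        k = self.nodes_per_group
        # Random weights; the 1/sqrt(fan-in) scaling is an assumption of this sketch
        self.feature_params = [
            (self.rng.standard_normal((n_features, k)) / np.sqrt(n_features),
             self.rng.standard_normal(k))
            for _ in range(self.n_groups)
        ]
        z_dim = self.n_groups * k
        self.W_h = self.rng.standard_normal((z_dim, self.n_enhance)) / np.sqrt(z_dim)
        self.b_h = self.rng.standard_normal(self.n_enhance)
        Y = np.eye(n_classes)[labels]  # one-hot targets, shape (N, C)
        # Output weights from the pseudoinverse: W^m = [Z^n | H^m]^+ Y
        self.W_out = np.linalg.pinv(self._hidden(X)) @ Y
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.W_out, axis=1)


def train_ensemble(X, labels, n_models=10):
    """Train models that differ only in their random weights (seeds)."""
    return [BroadLearner(seed=s).fit(X, labels) for s in range(n_models)]


def majority_vote(models, X):
    """Combine 0/1 predictions; a tie is resolved as class 1."""
    preds = np.stack([m.predict(X) for m in models])
    return (preds.mean(axis=0) >= 0.5).astype(int)
```

In the setting of this example, `X` would be the 1744 × 150 feature matrix from step 3) and `labels` an integer array of 0/1 labels; because the number of models is even, the tie rule in `majority_vote` matters and is itself an assumption.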
Experimental results:
The results predicted by the method of this example were compared with those of the CFD method, a current mainstream method for predicting CRISPR/Cas9 off-target effects in the human genome (Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016; 34(2): 184-91.). The results are shown in Table 1:
Table 1: Experimental comparison with the CFD method
Prediction method | Accuracy | AUC value
---|---|---
CFD score | 0.897 | 0.91
Broad learning | 0.923 | 0.93
As Table 1 shows, the method of this example achieves higher accuracy than the CFD score (0.923 versus 0.897) and a higher AUC value (0.93 versus 0.91), which indicates that the predictor obtained by the method of this example has better prediction stability and classification performance.
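The two metrics reported in Table 1 can be computed as in the following Python sketch. It assumes binary labels (1 for a reacting sgRNA-DNA pair, 0 otherwise) and a real-valued score per sample; the AUC is computed as the fraction of positive-negative pairs that the score ranks correctly, with ties counted as one half. The function names are illustrative, and the patent does not state how its metrics were computed.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a data set of the size used in this example (1744 samples).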
Claims (1)
1. A prediction method for CRISPR/Cas9 off-target effects, characterized by comprising the following steps:
1) Construct a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS, and BLESS experimental data; the sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each sequence obtained is a DNA sequence 23 bases long ending in NGG, where N is any one of A, C, G, T; excluding the positive sample sequences from these sequences yields the negative samples; the positive samples are over-sampled with the Bootstrap method, and the negative samples are under-sampled to select as many negative samples as there are positive samples;
2) Encode the sample data set and add features: the positive and negative samples obtained in step 1), each consisting of an sgRNA sequence and a DNA sequence, are one-hot encoded; six features of each sgRNA-DNA pair, namely the CFD score, CCTop score, CRISTA score, GC content, number of mismatches, and sgRNA-DNA sequence similarity score, are added to form the feature vector together with the encoding, and the corresponding binary classification label is generated at the same time;
3) Process the sample data by feature selection: the feature vectors obtained are processed with the sklearn module of Python; an extremely randomized trees trainer is built with the Extratree module and trained on the data to obtain the feature importance of the feature vectors, and the 105 top-ranked features by importance are then selected for the next training step;
4) Construct the Broad Learning predictor: a broad learning algorithm with added BP parameter tuning is used as the Broad Learning predictor for CRISPR/Cas9 off-target effects; the prediction result determines whether the sgRNA will react with the DNA sequence, thereby predicting the off-target effect of the CRISPR/Cas9 system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910299222.0A CN110070912B (en) | 2019-04-15 | 2019-04-15 | Prediction method for CRISPR/Cas9 off-target effect |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110070912A true CN110070912A (en) | 2019-07-30 |
CN110070912B CN110070912B (en) | 2023-06-23 |
Family
ID=67367624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910299222.0A Active CN110070912B (en) | 2019-04-15 | 2019-04-15 | Prediction method for CRISPR/Cas9 off-target effect |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070912B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
CN111489787A (en) * | 2020-04-21 | 2020-08-04 | 桂林电子科技大学 | Method for predicting efficiency of targeted knockout of fixed-point DNA (deoxyribonucleic acid) by CRISPR/Cas9 |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN112086145A (en) * | 2020-09-02 | 2020-12-15 | 腾讯科技(深圳)有限公司 | Compound activity prediction method and device, electronic equipment and storage medium |
CN113611367A (en) * | 2021-08-05 | 2021-11-05 | 湖南大学 | CRISPR/Cas9 off-target prediction method based on VAE data enhancement |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710362A (en) * | 2009-12-10 | 2010-05-19 | 浙江大学 | microRNA target position point prediction method based on support vector machine |
WO2014093709A1 (en) * | 2012-12-12 | 2014-06-19 | The Broad Institute, Inc. | Methods, models, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof |
CA2923758A1 (en) * | 2013-09-27 | 2015-04-02 | Codexis, Inc. | Structure based predictive modeling |
WO2015089364A1 (en) * | 2013-12-12 | 2015-06-18 | The Broad Institute Inc. | Crystal structure of a crispr-cas system, and uses thereof |
WO2015113063A1 (en) * | 2014-01-27 | 2015-07-30 | Georgia Tech Research Corporation | Methods and systems for identifying crispr/cas off-target sites |
WO2016100974A1 (en) * | 2014-12-19 | 2016-06-23 | The Broad Institute Inc. | Unbiased identification of double-strand breaks and genomic rearrangement by genome-wide insert capture sequencing |
EP3130679A1 (en) * | 2015-08-13 | 2017-02-15 | Cladiac GmbH | Method and test system for the detection and/or quantification of a target nucleic acid in a sample |
CN106446602A (en) * | 2016-09-06 | 2017-02-22 | 中南大学 | Prediction method and system for RNA binding sites in protein molecules |
US20170185892A1 (en) * | 2015-12-27 | 2017-06-29 | Beijing University Of Technology | Intelligent detection method for Biochemical Oxygen Demand based on a Self-organizing Recurrent RBF Neural Network |
CN107742063A (en) * | 2017-10-20 | 2018-02-27 | 桂林电子科技大学 | A kind of prokaryotes σ54The Forecasting Methodology of promoter |
CN108549709A (en) * | 2018-04-20 | 2018-09-18 | 福州大学 | Fusion method of the multi-source heterogeneous data based on range learning algorithm inside and outside block chain |
CN109308935A (en) * | 2018-09-10 | 2019-02-05 | 天津大学 | A kind of method and application platform based on SVM prediction noncoding DNA |
Non-Patent Citations (1)
Title |
---|
Zhang Guishan et al.: "Application of machine learning methods in the CRISPR/Cas9 system", 《遗传》 (Hereditas), vol. 40, no. 09, pages 704-723 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
CN111261223B (en) * | 2020-01-12 | 2022-05-03 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
CN111489787A (en) * | 2020-04-21 | 2020-08-04 | 桂林电子科技大学 | Method for predicting efficiency of targeted knockout of fixed-point DNA (deoxyribonucleic acid) by CRISPR/Cas9 |
CN111489787B (en) * | 2020-04-21 | 2023-05-12 | 桂林电子科技大学 | Prediction method for CRISPR/Cas9 targeted knockout site DNA efficiency |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN112086145A (en) * | 2020-09-02 | 2020-12-15 | 腾讯科技(深圳)有限公司 | Compound activity prediction method and device, electronic equipment and storage medium |
CN112086145B (en) * | 2020-09-02 | 2024-04-16 | 腾讯科技(深圳)有限公司 | Compound activity prediction method and device, electronic equipment and storage medium |
CN113611367A (en) * | 2021-08-05 | 2021-11-05 | 湖南大学 | CRISPR/Cas9 off-target prediction method based on VAE data enhancement |
Also Published As
Publication number | Publication date |
---|---|
CN110070912B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110070912A (en) | Method for predicting CRISPR/Cas9 off-target effects | |
Kim et al. | SpCas9 activity prediction by DeepSpCas9, a deep learning–based model with high generalization performance | |
Song et al. | Sequence-specific prediction of the efficiencies of adenine and cytosine base editors | |
Chen et al. | Identify key sequence features to improve CRISPR sgRNA efficacy | |
Lavarenne et al. | The spring of systems biology-driven breeding | |
Kleftogiannis et al. | YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features | |
Zheng et al. | Exponentially decreased dimension number strategy based dynamic search fireworks algorithm for solving CEC2015 competition problems | |
Zhang et al. | Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on-and off-target activities | |
Fonseca et al. | Ranking beta sheet topologies with applications to protein structure prediction | |
Sutanto et al. | Assessing the use of secondary structure fingerprints and deep learning to classify RNA sequences | |
Li et al. | Characteristics and prediction of RNA structure | |
Niu et al. | SgRNA-RF: identification of SgRNA on-target activity with imbalanced datasets | |
Angenent-Mari et al. | Deep learning for RNA synthetic biology | |
Zhang et al. | Subcellular localization prediction of human proteins using multifeature selection methods | |
Toufikuzzaman et al. | CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction | |
Jayasundara et al. | Machine learning for plant microrna prediction: A systematic review | |
Morgado et al. | Learning sequence patterns of AGO-sRNA affinity from high-throughput sequencing libraries to improve in silico functional small RNA detection and classification in plants | |
Manuweera et al. | Computational methods for the ab initio identification of novel microRNA in plants: a systematic review | |
Lan et al. | Optimized sgRNA design by deep learning to balance the off-target effects and on-target activity of CRISPR/Cas9 | |
Wen et al. | Sea-ATI unravels novel vocabularies of plant active cistrome | |
Won et al. | Modeling promoter grammars with evolving hidden Markov models | |
Chen et al. | Optimizing precision genome editing through machine learning | |
Sutanto et al. | Assessing global-local secondary structure fingerprints to classify RNA sequences with deep learning | |
Chen | Modeling of Functional Gene Regulation Through Machine Learning and Deep Learning Methods | |
Chen et al. | Integrating machine learning and genome editing for crop improvement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||