CN110070912A - Prediction method for CRISPR/Cas9 off-target effect - Google Patents

Prediction method for CRISPR/Cas9 off-target effect

Info

Publication number
CN110070912A
Authority
CN
China
Prior art keywords
sample
dna
sgrna
crispr
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910299222.0A
Other languages
Chinese (zh)
Other versions
CN110070912B (en)
Inventor
樊永显
徐海波
张向文
张龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201910299222.0A
Publication of CN110070912A
Application granted
Publication of CN110070912B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a prediction method for CRISPR/Cas9 off-target effects, comprising the following steps: 1) constructing a data set containing positive and negative samples; 2) encoding the sample data set and adding features; 3) processing the sample data by feature selection; 4) constructing a Broad Learning predictor. The method predicts quickly and with high accuracy.

Description

Prediction method for CRISPR/Cas9 off-target effect
Technical field
The present invention relates to gene technology, and specifically to a prediction method for CRISPR/Cas9 off-target effects.
Background art
Since CRISPR/Cas9 technology was first applied to gene editing, the CRISPR/Cas9 system has swept rapidly through the life sciences and transformed gene-editing technology. The CRISPR/Cas9 system is the third generation of targeted genome-editing technology, following zinc finger nucleases and transcription activator-like effector nucleases, and can edit and modify DNA sequences at specific positions. The previous two generations of genome-editing technology recognize DNA sequences through protein specificity, whereas CRISPR locates the target DNA by complementary base pairing with a DNA sequence 20 nt long, and is therefore more versatile. The CRISPR/Cas9 system consists of CRISPR sequence elements and the Cas9 nuclease. Guided by crRNA and tracrRNA, the Cas9 nuclease cuts the double-stranded DNA at a protospacer that lies next to a protospacer adjacent motif (PAM, usually NGG, where N is any base), forming a DNA double-strand break. The CRISPR/Cas9 system has potential off-target effects when it targets a genome. The Cas9 nuclease tolerates a certain degree of mismatch between the sgRNA and the target DNA sequence. Besides directing cleavage of the target-site DNA strand, the sgRNA may also partially match non-target DNA sequences that are highly homologous to the target site, activating Cas9 nuclease cleavage of the non-target DNA and producing an off-target effect. Off-target effects can produce a large number of non-target cuts in the genes of a genome, with uncontrollable consequences; this is the biggest obstacle to the clinical application of the CRISPR/Cas9 system.
An important current research direction for the CRISPR/Cas9 system is the prediction of on-target efficiency and off-target sites; accurately predicting the interaction between the CRISPR system and DNA sequences can be used to maximize on-target activity and minimize off-target effects. Most existing CRISPR/Cas9 off-target design tools simply search for off-target sites using a sequence-matching score and base mismatches. Other tools predict the off-target efficiency of a site by designing an off-target site score. At present, off-target prediction in research abroad relies mainly on sequence similarity and physicochemical properties, while domestic work more often predicts CRISPR/Cas9 sequences with convolutional neural networks. In terms of the techniques used, these results fall mainly into the following groups: those based on support vector machines (SVM) (Wong N, Liu W, Wang X. WU-CRISPR: characteristics of functional guide RNAs for the CRISPR/Cas9 system. Genome Biol, 2015, 16: 218.), those based on random forests (Abadi S, Yan WX, Amar D, Mayrose I. A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput Biol, 2017, 13(10): e1005807.), those based on convolutional neural networks (Kim HK, Min S, Song M, Jung S, Choi JW, Kim Y, Lee S, Yoon S, Kim HH. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat Biotechnol, 2018, 36(3): 239-241.), those based on logistic regression (Prykhozhij SV, Rajan V, Gaston D, Berman JN. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PLoS One, 2015, 10(3): e0119372.), and those based on Bayesian analysis (Hart T, Moffat J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics, 2016, 17: 164.). The above techniques all apply machine learning methods to the prediction of CRISPR/Cas9 off-target effects.
Summary of the invention
The object of the invention is to overcome two defects, namely that building a CRISPR/Cas9 off-target library by biochemical experiment is time-consuming and costly, and that conventional methods are slow and insufficiently accurate, by providing a prediction method for CRISPR/Cas9 off-target effects. The method predicts quickly and with high accuracy.
The technical solution that achieves the object of the invention is as follows:
A prediction method for CRISPR/Cas9 off-target effects, differing from the prior art in that it comprises the following steps:
1) Construct a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS and BLESS experimental data. The sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each resulting sequence is a DNA sequence 23 bases long that ends in NGG, where N is any one of A, C, G and T. Removing the positive-sample sequences from these sequences yields the negative samples. The positive samples are oversampled with the Bootstrap method, and the negative samples are undersampled so that as many negative samples as positive samples are chosen. Because the human genome is huge, the enormous number of negative samples finally obtained makes the positive and negative samples highly imbalanced; this imbalance harms the training process and can even cause training to fail, so the very numerous negative samples must be resampled to resolve it. Here the samples are drawn with the Bootstrap method: resampling with replacement is performed over the n original data points, the sample size is still n, and each observation in the original data has the same probability, 1/n, of being drawn each time. The resulting sample is called a Bootstrap sample and serves as the benchmark data set S, which can be expressed as formula (1):
$$S = S^{+} \cup S^{-} \tag{1}$$
where the subset $S^{+}$ contains only positive samples, i.e. sgRNA sequences together with the off-target sequences that can actually bind the CRISPR/Cas9 system sgRNA; the subset $S^{-}$ contains only negative samples, i.e. sgRNA sequences together with sites in the human genome that have four or fewer mismatches but cannot bind the sgRNA; and ∪ denotes the union of the two sets;
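The balancing in step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `bootstrap_sample` and `balance`, the `seed` parameter, and the choice to undersample negatives without replacement are all assumptions made here for the example.

```python
import random

def bootstrap_sample(items, n, rng):
    """Draw n items with replacement; every item has probability 1/len(items) per draw."""
    return [rng.choice(items) for _ in range(n)]

def balance(positives, negatives, n_per_class, seed=0):
    """Build a balanced set S = S+ U S- as (sample, label) pairs.

    Assumption: positives are oversampled by bootstrap, negatives are
    undersampled without replacement; n_per_class must not exceed len(negatives).
    """
    rng = random.Random(seed)
    s_pos = bootstrap_sample(positives, n_per_class, rng)  # oversample positives
    s_neg = rng.sample(negatives, n_per_class)             # undersample negatives
    return [(p, 1) for p in s_pos] + [(n, 0) for n in s_neg]
```

With 5 positives and 100 negatives, `balance(pos, neg, 20)` returns 40 labelled samples, 20 per class.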
2) Encode the sample data set and add features: the sgRNA sequences and DNA sequences obtained in step 1) are one-hot encoded to obtain sequence vectors; the CFD score, CCTop score, CRISTA score, GC content, number of mismatches and sgRNA-DNA sequence similarity score of each sgRNA-DNA pair are appended to obtain the feature vector, and the corresponding binary class labels are generated. The CRISTA score is based on random forest and regression models; it takes into account the effect of DNA bulges and RNA bulges on sgRNA editing efficiency and combines genomic nucleotide content, sgRNA thermodynamic properties, sgRNA-target DNA base similarity and other factors to produce the final CRISTA score.
For the CCTop score, an off-target mismatch score is first computed for each off-target site according to formula (2):
$$score_{off\text{-}target} = \sum_{mismatch} 1.2^{pos} \tag{2}$$
where pos is the position of each mismatch in the off-target sequence, counted from the 5′ end. The off-target mismatch scores are then used to compute the off-target score according to formula (3), in which dist is the distance from each off-target site to the nearest exon and total_off-targets is the number of off-target sites; this score considers only off-target sequences associated with exons.
For the CFD score, the reaction efficiency of each single mismatch between sgRNA and DNA sequence is first computed from the CD33 data, for a mismatch at each position and for each PAM; for multiple mismatches the individual reaction efficiencies are multiplied together. For example, if an sgRNA-DNA pair has an A:G mismatch at position 3, a T:C mismatch at position 5 and the PAM type 'AG', the CFD score of this pair is CFD_score = P(active | A:G, 3) × P(active | T:C, 5) × P(active | AG), where each term is computed from frequencies observed in the CD33 data. The CFD score can be expressed as formula (4):
$$CFD_{score} = \prod_{i:\,X_i = 1} P(Y = 1 \mid X_i = 1) \tag{4}$$
where Y = 1 indicates that the sgRNA reacts with the DNA sequence and $X_i = 1$ indicates that a mismatch occurs at position i.
GC content is the proportion of the bases 'G' and 'C' among all bases in the DNA sequence and can be computed by formula (5):
$$GC = \frac{N_G + N_C}{N_A + N_C + N_G + N_T} \tag{5}$$
where $N_b$ is the number of occurrences of base b in the DNA sequence. The number of mismatches is the number of positions at which the sgRNA sequence and the DNA sequence do not match, and the sgRNA-DNA similarity score is the proportion of matching positions in the total sequence length;
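The three simple sequence features defined above (GC content, number of mismatches, similarity score) can be sketched as below. The function names are chosen here for illustration, and the sketch assumes the sgRNA is written in the DNA alphabet (T rather than U), that both strings have the same length, and that positions are compared one to one.

```python
def gc_content(dna):
    """Fraction of G and C bases in the sequence, as in formula (5)."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)

def mismatch_count(sgrna, dna):
    """Number of positions where the two equal-length sequences differ."""
    if len(sgrna) != len(dna):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(sgrna.upper(), dna.upper()))

def similarity(sgrna, dna):
    """Fraction of matching positions over the total sequence length."""
    return 1 - mismatch_count(sgrna, dna) / len(dna)
```

For instance, `gc_content("GGCCAATT")` is 0.5, and `"ACGT"` against `"ACGA"` has one mismatch and a similarity of 0.75.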
3) Process the sample data by feature selection: using the sklearn module of Python, the obtained vectors are processed; an extremely randomized trees (extreme random forest) model is built with the Extratree module and trained on the data to obtain the feature importance of each component of the 190-dimensional vector, and the top 150 components by feature importance are then selected for the next training step. This step both reduces the vector dimension, which speeds up subsequent training, and removes redundant features, which improves training accuracy. Feature importance uses the Gini coefficient, defined as follows:
For a binary classification problem the class target value is 0 or 1. For node m, let $R_m$ be the region covered by its $N_m$ observations, and let
$$p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \tag{6}$$
where $p_{mk}$ is the proportion of class k observed in node m, k is the class (0 or 1) and $y_i$ is the class value of observation i. The Gini coefficient is then given by formula (7):
$$H(X_m) = \sum_k p_{mk}(1 - p_{mk}) \tag{7}$$
Clearly, the larger the value of $H(X_m)$, the stronger the discriminating power of the feature. Features can therefore be ranked by this value, the features to keep selected, and the useless features discarded;
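Formulas (6) and (7) can be checked numerically with a short sketch. The function name `gini_impurity` is an assumption for illustration; the input is simply the list of class values of the observations that fall in one node.

```python
from collections import Counter

def gini_impurity(labels):
    """H = sum_k p_k * (1 - p_k), where p_k is the share of class k in the node."""
    n = len(labels)
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())
```

A node holding only one class gives 0, and a binary node split evenly between the two classes gives 0.5, the maximum for two classes.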
4) Construct the Broad Learning predictor: a broad learning algorithm with added BP parameter tuning serves as the Broad Learning predictor for CRISPR/Cas9 off-target effects; its prediction determines whether an sgRNA reacts with a DNA sequence, thereby predicting the off-target effect of the CRISPR/Cas9 system.
Broad Learning is selected as the predictor and trained on the training samples. Feature mapping nodes are constructed first: for given input data $X \in \mathbb{R}^{N \times M}$, i.e. N samples with M-dimensional features, weights $W_{e_i}$ and biases $\beta_{e_i}$ are generated at random and $\phi_i$ is an activation function; the feature mapping nodes are defined by formula (8):
$$Z_i = \phi_i(X W_{e_i} + \beta_{e_i}), \quad i = 1, \dots, n \tag{8}$$
Enhancement nodes are then built on the feature mapping nodes: weights $W_{h_j}$ and biases $\beta_{h_j}$ are generated at random and $\xi_j$ is an activation function; the enhancement nodes are defined by formula (9):
$$H_j = \xi_j(Z^n W_{h_j} + \beta_{h_j}), \quad j = 1, \dots, m \tag{9}$$
where $Z^n = [Z_1, \dots, Z_n]$. The final output is $Y \in \mathbb{R}^{N \times C}$, where N is the number of samples and C is the number of classes; the output Y can be defined by formula (10):
$$Y = [Z^n \mid H^m]\, W^m \tag{10}$$
where $W^m = [Z^n \mid H^m]^{+} Y$ and $[Z^n \mid H^m]^{+}$ is the pseudoinverse of $[Z^n \mid H^m]$. The weights are then tuned with the BP algorithm to obtain the final weights. Finally, ten broad learning models are obtained by assigning different weights to the broad learning model, giving the outputs of ten classifiers, and the final output is obtained by a voting strategy.
The Broad Learning predictor is built on the Broad Learning algorithm. In addition to sequence information and sequence physicochemical properties, the predictor uses an ensemble learning approach and adds the CRISTA score, CCTop score and CFD score described in step 2) as training features, which greatly improves the accuracy of the model.
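The core of the predictor (random feature mapping nodes, random enhancement nodes, and output weights solved by pseudoinverse) can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the patented model: it uses a single group of mapping nodes and a single group of enhancement nodes, the node counts `n_map` and `n_enh` and the names `fit_bls` and `predict_bls` are invented here, and the BP tuning step and the ten-model ensemble are left out. The tanh and sigmoid activations follow the embodiment.

```python
import numpy as np

def _nodes(X, We, be, Wh, bh):
    Z = np.tanh(X @ We + be)                      # feature mapping nodes
    H = 1.0 / (1.0 + np.exp(-(Z @ Wh + bh)))      # enhancement nodes (sigmoid)
    return np.hstack([Z, H])                      # A = [Z | H]

def fit_bls(X, Y, n_map=10, n_enh=20, seed=0):
    """Fit output weights W = pinv([Z | H]) @ Y with random hidden weights."""
    rng = np.random.default_rng(seed)
    We = rng.standard_normal((X.shape[1], n_map))
    be = rng.standard_normal(n_map)
    Wh = rng.standard_normal((n_map, n_enh))
    bh = rng.standard_normal(n_enh)
    A = _nodes(X, We, be, Wh, bh)
    W = np.linalg.pinv(A) @ Y                     # least-squares output weights
    return We, be, Wh, bh, W

def predict_bls(model, X):
    """Return class scores; argmax over axis 1 gives the predicted class."""
    We, be, Wh, bh, W = model
    return _nodes(X, We, be, Wh, bh) @ W
```

Here `Y` is a one-hot label matrix of shape (N, C). Because only the output weights are solved, training reduces to one pseudoinverse, which is the reason the method trains quickly.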
Compared with existing prediction techniques, this technical solution has the following advantages:
(1) Short running time: compared with traditional deep structures, Broad Learning does not require the complex computation of a multi-layer structure and does not need backpropagation with the BP algorithm to adjust weights, which can greatly shorten training time;
(2) High accuracy: the purpose-built CRISPR/Cas9 off-target predictor based on Broad Learning extracts features efficiently and thereby improves prediction accuracy.
The method predicts quickly and with high accuracy.
Description of the drawings
Fig. 1 is a schematic diagram of the prediction method in the embodiment.
Detailed description of the embodiments
The invention is further described below with reference to the drawing and an embodiment, which do not limit the invention.
Embodiment:
Referring to Fig. 1, a prediction method for CRISPR/Cas9 off-target effects comprises the following steps:
1) Construct a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS and BLESS experimental data. The sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each resulting sequence is a DNA sequence 23 bases long that ends in NGG, where N is any one of A, C, G and T. Excluding the positive-sample sequences from these sequences yields the negative samples. The positive samples are oversampled with the Bootstrap method, and the negative samples are undersampled so that as many negative samples as positive samples are chosen. Because the human genome is huge, the enormous number of negative samples finally obtained makes the positive and negative samples highly imbalanced; this imbalance harms the training process and can even cause training to fail, so the very numerous negative samples must be resampled to resolve it. Here the negative samples are randomly sampled with the Bootstrap method: resampling with replacement is performed over the n original data points, the sample size is still n, and each observation in the original data has the same probability, 1/n, of being drawn each time. The resulting sample is called a Bootstrap sample and serves as the benchmark data set S, which can be expressed as formula (1):
$$S = S^{+} \cup S^{-} \tag{1}$$
where the subset $S^{+}$ contains only positive samples, i.e. sgRNA sequences together with the off-target sequences that can actually bind the CRISPR/Cas9 system sgRNA; the subset $S^{-}$ contains only negative samples, i.e. sgRNA sequences together with sites in the human genome that meet the preset condition but cannot bind the sgRNA; and ∪ denotes the union of the two sets. This example yields 1744 samples, 872 positive and 872 negative;
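The candidate-site criteria in step 1) (23 bases, NGG ending, fewer than 4 mismatches) can be written as a simple predicate. This sketch only checks one candidate string; it does not reproduce the bowtie2 genome search. The function name `is_candidate_site`, the `max_mismatches` parameter, and the choice to count mismatches over the first 20 bases only (excluding the PAM) are assumptions made for the illustration.

```python
def is_candidate_site(target, site, max_mismatches=3):
    """Check whether a 23-nt site is a possible off-target of a 23-nt target.

    Both strings are 20 nt of protospacer followed by a 3-nt PAM.
    """
    target, site = target.upper(), site.upper()
    if len(target) != 23 or len(site) != 23:
        return False
    if set(site) - set("ACGT"):
        return False                      # reject ambiguous bases
    if not site.endswith("GG"):
        return False                      # NGG PAM: any base followed by GG
    mismatches = sum(a != b for a, b in zip(target[:20], site[:20]))
    return mismatches <= max_mismatches   # fewer than 4 mismatches
```

Sites that pass this check and are not among the experimentally confirmed positives would form the pool of negative samples.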
2) Encode the sample data set and add features: the sgRNA sequences and DNA sequences obtained in step 1) are one-hot encoded to obtain a 23*4*2 = 184-dimensional vector; the CFD score, CCTop score, CRISTA score, GC content, number of mismatches and sgRNA-DNA sequence similarity score of each sgRNA-DNA pair are appended to obtain a 190-dimensional vector, and the corresponding binary class labels 0 and 1 are generated. The CRISTA score is based on random forest and regression models; it takes into account the effect of DNA bulges and RNA bulges on sgRNA editing and combines genomic nucleotide content, sgRNA thermodynamic properties, sgRNA-target DNA sequence similarity and other factors to produce the final CRISTA score.
For the CCTop score, an off-target mismatch score is first computed for each off-target site according to formula (2):
$$score_{off\text{-}target} = \sum_{mismatch} 1.2^{pos} \tag{2}$$
where pos is the position of each mismatch in the off-target sequence, counted from the 5′ end. The off-target mismatch scores are then used to compute the off-target score according to formula (3), in which dist is the distance from each off-target site to the nearest exon and total_off-targets is the number of off-target sites; this score considers only off-target sequences associated with exons.
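Formula (2) can be sketched directly. The function name `cctop_mismatch_score` is illustrative, and the sketch assumes positions are numbered from 1 at the 5′ end and that the two strings passed in are the aligned protospacer sequences; formula (3), which combines these scores with exon distances, is not covered here.

```python
def cctop_mismatch_score(target, site):
    """Sum of 1.2**pos over mismatched positions, pos counted from 1 at the 5' end."""
    pairs = zip(target.upper(), site.upper())
    return sum(1.2 ** pos
               for pos, (a, b) in enumerate(pairs, start=1)
               if a != b)
```

Because the weight grows with position, a mismatch near the 3′ (PAM-proximal) end contributes more to the sum than one near the 5′ end.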
For the CFD score, the reaction efficiency of each single mismatch between sgRNA and DNA sequence is first computed from the CD33 data, for a mismatch at each position and for each PAM; for multiple mismatches the individual reaction efficiencies are multiplied together. For example, if an sgRNA-DNA pair has an A:G mismatch at position 3, a T:C mismatch at position 5 and the PAM type 'AG', the CFD score of this pair is CFD_score = P(active | A:G, 3) × P(active | T:C, 5) × P(active | AG), where each term is computed from frequencies observed in the CD33 data. The CFD score can be expressed as formula (4):
$$CFD_{score} = \prod_{i:\,X_i = 1} P(Y = 1 \mid X_i = 1) \tag{4}$$
where Y = 1 indicates that the sgRNA reacts with the DNA sequence and $X_i = 1$ indicates that a mismatch occurs at position i.
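The multiplication in the worked example can be sketched as a table lookup followed by a product. The function name `cfd_score` and the table layout are assumptions for illustration, and the probabilities passed in by the caller are placeholders: the real values come from the CD33 data and are not reproduced here.

```python
def cfd_score(mismatches, pam, mismatch_table, pam_table):
    """Multiply per-mismatch and PAM activity probabilities.

    mismatches: list of (pair, position) tuples, e.g. ("A:G", 3).
    mismatch_table: dict mapping (pair, position) to P(active | pair, position).
    pam_table: dict mapping the PAM type to P(active | PAM).
    """
    score = pam_table[pam]
    for pair, pos in mismatches:
        score *= mismatch_table[(pair, pos)]
    return score
```

A usage sketch with made-up numbers: with placeholder tables `{("A:G", 3): 0.5, ("T:C", 5): 0.8}` and `{"AG": 0.25}`, the pair from the example above would score 0.5 × 0.8 × 0.25.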
GC content is the proportion of the bases 'G' and 'C' among all bases in the DNA sequence and can be computed by formula (5):
$$GC = \frac{N_G + N_C}{N_A + N_C + N_G + N_T} \tag{5}$$
where $N_b$ is the number of occurrences of base b in the DNA sequence. The number of mismatches is the number of positions at which the sgRNA sequence and the DNA sequence do not match, and the sgRNA-DNA similarity score is the proportion of matching positions in the total sequence length;
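The dimension count in step 2) (23 × 4 × 2 = 184 one-hot values plus 6 scores = 190) can be made concrete with a short sketch. The base order A, C, G, T, the concatenation order (sgRNA first, then DNA, then scores) and the function names are assumptions; the patent specifies only the resulting dimensions.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode each base as 4 values, e.g. A -> [1, 0, 0, 0]."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def build_feature_vector(sgrna, dna, scores):
    """Concatenate two 23-nt one-hot encodings (184 values) with six scores.

    scores: CFD, CCTop, CRISTA, GC content, mismatch count, similarity.
    """
    if len(sgrna) != 23 or len(dna) != 23:
        raise ValueError("both sequences must be 23 nt long")
    if len(scores) != 6:
        raise ValueError("expected six additional features")
    return one_hot(sgrna) + one_hot(dna) + list(scores)
```

Each sequence contributes 23 × 4 = 92 values, so the returned list always has 92 + 92 + 6 = 190 entries.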
3) Process the sample data by feature selection: using the sklearn module of Python, the obtained vectors are processed; an extremely randomized trees (extreme random forest) model is built with the Extratree module and trained on the data to obtain the feature importance of each component of the 190-dimensional vector, and the top 150 components by feature importance are then selected for the next training step. This step both reduces the vector dimension, which speeds up subsequent training, and removes redundant features, which improves training accuracy. In this example feature importance uses the Gini coefficient, defined as follows:
For a binary classification problem the class target value is 0 or 1. For node m, let $R_m$ be the region covered by its $N_m$ observations, and let
$$p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \tag{6}$$
where $p_{mk}$ is the proportion of class k observed in node m, k is the class (0 or 1) and $y_i$ is the class value of observation i. The Gini coefficient is then given by formula (7):
$$H(X_m) = \sum_k p_{mk}(1 - p_{mk}) \tag{7}$$
Clearly, the larger the value of $H(X_m)$, the stronger the discriminating power of the feature. Features can therefore be ranked by this value, the features to keep selected, and the useless features discarded. In this example the top 150 features by feature importance are taken as the final data, giving a 1744*150 matrix;
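The selection of the top 150 of 190 features can be sketched with scikit-learn. The patent names only an "Extratree module"; using `ExtraTreesClassifier` from `sklearn.ensemble` is an assumption here, as are the function name `select_top_features`, the number of trees and the random seed.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def select_top_features(X, y, k=150, seed=0):
    """Rank columns of X by Gini-based importance and keep the k best.

    Returns the reduced matrix and the kept column indices (in original order).
    """
    forest = ExtraTreesClassifier(n_estimators=100, criterion="gini",
                                  random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    keep = np.sort(order[:k])
    return X[:, keep], keep
```

Applied to the 1744 × 190 matrix of this example, the call would return a 1744 × 150 matrix; the same `keep` indices must be applied to any later test data.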
4) Construct the Broad Learning predictor: a broad learning algorithm with added BP parameter tuning serves as the Broad Learning predictor for CRISPR/Cas9 off-target effects; its prediction determines whether an sgRNA reacts with a DNA sequence, thereby predicting the off-target effect of the CRISPR/Cas9 system.
This example selects Broad Learning as the predictor and trains it on the training samples. Feature mapping nodes are constructed first: for given input data $X \in \mathbb{R}^{N \times M}$, i.e. N samples with M-dimensional features, weights $W_{e_i}$ and biases $\beta_{e_i}$ are generated at random and $\phi_i$ is an activation function, here tanh; the feature mapping nodes are defined by formula (8):
$$Z_i = \phi_i(X W_{e_i} + \beta_{e_i}), \quad i = 1, \dots, n \tag{8}$$
Enhancement nodes are then built on the feature mapping nodes: weights $W_{h_j}$ and biases $\beta_{h_j}$ are generated at random and $\xi_j$ is an activation function, here sigmoid; the enhancement nodes are defined by formula (9):
$$H_j = \xi_j(Z^n W_{h_j} + \beta_{h_j}), \quad j = 1, \dots, m \tag{9}$$
where $Z^n = [Z_1, \dots, Z_n]$. The final output is $Y \in \mathbb{R}^{N \times C}$, where N is the number of samples and C is the number of classes; the output Y can be defined by formula (10):
$$Y = [Z^n \mid H^m]\, W^m \tag{10}$$
where $W^m = [Z^n \mid H^m]^{+} Y$ and $[Z^n \mid H^m]^{+}$ is the pseudoinverse of $[Z^n \mid H^m]$. The weights are then tuned with the BP algorithm to obtain the final weights. Finally, ten broad learning models are obtained by assigning different weights to the broad learning model, giving the outputs of ten classifiers, and the final output is obtained by a voting strategy.
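The final voting step over the ten classifiers can be sketched as a majority vote. The patent does not state the voting rule or how ties are resolved; a simple majority with ties going to class 0 is an assumption made here, as is the function name `majority_vote`.

```python
def majority_vote(predictions):
    """Combine 0/1 predictions from several models into one label per sample.

    predictions: list of per-model label lists, all of the same length.
    A sample is labelled 1 only if more than half of the models vote 1
    (with an even number of models, a tie therefore yields 0).
    """
    n_models = len(predictions)
    voted = []
    for sample_votes in zip(*predictions):
        ones = sum(sample_votes)
        voted.append(1 if 2 * ones > n_models else 0)
    return voted
```

With ten models an even split is possible, so the tie rule matters; using an odd number of models or averaging the class scores instead would avoid the ambiguity.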
Experimental results:
The results predicted by the method of this example were compared with those of the CFD method, the current mainstream method for predicting CRISPR/Cas9 off-targets in the human genome (Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016; 34(2): 184-91.); the results are shown in Table 1:
Table 1. Experimental comparison with the CFD method
Prediction method    Accuracy    AUC value
CFD score            0.897       0.91
Broad learning       0.923       0.93
As can be seen from Table 1, the method of this example achieves a higher accuracy than the CFD score and also a higher AUC value, which indicates that the predictor obtained by the method of this example has better prediction stability and classification performance.

Claims (1)

1. A prediction method for CRISPR/Cas9 off-target effects, characterized by comprising the following steps:
1) constructing a data set containing positive and negative samples: positive samples are obtained from published GUIDE-Seq, HTGTS and BLESS experimental data; the sgRNA is mapped to the human genome with the bowtie2 program to find DNA sequences with fewer than 4 mismatches to the target DNA sequence, which are taken as possible off-target sequences; each resulting sequence is a DNA sequence 23 bases long that ends in NGG, where N is any one of A, C, G and T; excluding the positive-sample sequences from these sequences yields the negative samples; the positive samples are oversampled with the Bootstrap method, and the negative samples are undersampled so that as many negative samples as positive samples are chosen;
2) encoding the sample data set and adding features: the positive and negative samples obtained in step 1), each consisting of an sgRNA sequence and a DNA sequence, are one-hot encoded; the six features CFD score, CCTop score, CRISTA score, GC content, number of mismatches and sgRNA-DNA sequence similarity score of each sgRNA-DNA pair are appended to form the feature vector, and the corresponding binary class labels are generated;
3) processing the sample data by feature selection: using the sklearn module of Python, the obtained feature vectors are processed; an extremely randomized trees (extreme random forest) model is built with the Extratree module and trained on the data to obtain the feature importance of the feature vector, and the top 105 features by feature importance are then selected for the next training step;
4) constructing the Broad Learning predictor: a broad learning algorithm with added BP parameter tuning serves as the Broad Learning predictor for CRISPR/Cas9 off-target effects; its prediction determines whether an sgRNA reacts with a DNA sequence, thereby predicting the off-target effect of the CRISPR/Cas9 system.
CN201910299222.0A 2019-04-15 2019-04-15 Prediction method for CRISPR/Cas9 off-target effect Active CN110070912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910299222.0A CN110070912B (en) 2019-04-15 2019-04-15 Prediction method for CRISPR/Cas9 off-target effect

Publications (2)

Publication Number Publication Date
CN110070912A true CN110070912A (en) 2019-07-30
CN110070912B CN110070912B (en) 2023-06-23

Family

ID=67367624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910299222.0A Active CN110070912B (en) 2019-04-15 2019-04-15 Prediction method for CRISPR/Cas9 off-target effect

Country Status (1)

Country Link
CN (1) CN110070912B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261223A (en) * 2020-01-12 2020-06-09 湖南大学 CRISPR off-target effect prediction method based on deep learning
CN111489787A (en) * 2020-04-21 2020-08-04 桂林电子科技大学 Method for predicting efficiency of targeted knockout of fixed-point DNA (deoxyribonucleic acid) by CRISPR/Cas9
CN111613267A (en) * 2020-05-21 2020-09-01 中山大学 CRISPR/Cas9 off-target prediction method based on attention mechanism
CN112086145A (en) * 2020-09-02 2020-12-15 腾讯科技(深圳)有限公司 Compound activity prediction method and device, electronic equipment and storage medium
CN113611367A (en) * 2021-08-05 2021-11-05 湖南大学 CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710362A (en) * 2009-12-10 2010-05-19 浙江大学 microRNA target position point prediction method based on support vector machine
WO2014093709A1 (en) * 2012-12-12 2014-06-19 The Broad Institute, Inc. Methods, models, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
CA2923758A1 (en) * 2013-09-27 2015-04-02 Codexis, Inc. Structure based predictive modeling
WO2015089364A1 (en) * 2013-12-12 2015-06-18 The Broad Institute Inc. Crystal structure of a crispr-cas system, and uses thereof
WO2015113063A1 (en) * 2014-01-27 2015-07-30 Georgia Tech Research Corporation Methods and systems for identifying crispr/cas off-target sites
WO2016100974A1 (en) * 2014-12-19 2016-06-23 The Broad Institute Inc. Unbiased identification of double-strand breaks and genomic rearrangement by genome-wide insert capture sequencing
EP3130679A1 (en) * 2015-08-13 2017-02-15 Cladiac GmbH Method and test system for the detection and/or quantification of a target nucleic acid in a sample
CN106446602A (en) * 2016-09-06 2017-02-22 中南大学 Prediction method and system for RNA binding sites in protein molecules
US20170185892A1 (en) * 2015-12-27 2017-06-29 Beijing University Of Technology Intelligent detection method for Biochemical Oxygen Demand based on a Self-organizing Recurrent RBF Neural Network
CN107742063A (en) * 2017-10-20 2018-02-27 桂林电子科技大学 A kind of prokaryotes σ54The Forecasting Methodology of promoter
CN108549709A (en) * 2018-04-20 2018-09-18 福州大学 Fusion method of the multi-source heterogeneous data based on range learning algorithm inside and outside block chain
CN109308935A (en) * 2018-09-10 2019-02-05 天津大学 A kind of method and application platform based on SVM prediction noncoding DNA

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710362A (en) * 2009-12-10 2010-05-19 浙江大学 microRNA target position point prediction method based on support vector machine
WO2014093709A1 (en) * 2012-12-12 2014-06-19 The Broad Institute, Inc. Methods, models, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
CA2923758A1 (en) * 2013-09-27 2015-04-02 Codexis, Inc. Structure based predictive modeling
WO2015089364A1 (en) * 2013-12-12 2015-06-18 The Broad Institute Inc. Crystal structure of a crispr-cas system, and uses thereof
WO2015113063A1 (en) * 2014-01-27 2015-07-30 Georgia Tech Research Corporation Methods and systems for identifying crispr/cas off-target sites
WO2016100974A1 (en) * 2014-12-19 2016-06-23 The Broad Institute Inc. Unbiased identification of double-strand breaks and genomic rearrangement by genome-wide insert capture sequencing
EP3130679A1 (en) * 2015-08-13 2017-02-15 Cladiac GmbH Method and test system for the detection and/or quantification of a target nucleic acid in a sample
US20170185892A1 (en) * 2015-12-27 2017-06-29 Beijing University Of Technology Intelligent detection method for Biochemical Oxygen Demand based on a Self-organizing Recurrent RBF Neural Network
CN106446602A (en) * 2016-09-06 2017-02-22 中南大学 Prediction method and system for RNA binding sites in protein molecules
CN107742063A (en) * 2017-10-20 2018-02-27 桂林电子科技大学 Method for predicting prokaryotic σ54 promoters
CN108549709A (en) * 2018-04-20 2018-09-18 福州大学 Method for fusing multi-source heterogeneous data inside and outside a blockchain based on a distance learning algorithm
CN109308935A (en) * 2018-09-10 2019-02-05 天津大学 Method and application platform for predicting non-coding DNA based on SVM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张桂珊 et al.: "Application of machine learning methods in the CRISPR/Cas9 system", 《遗传》, vol. 40, no. 09, pages 704-723 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261223A (en) * 2020-01-12 2020-06-09 湖南大学 CRISPR off-target effect prediction method based on deep learning
CN111261223B (en) * 2020-01-12 2022-05-03 湖南大学 CRISPR off-target effect prediction method based on deep learning
CN111489787A (en) * 2020-04-21 2020-08-04 桂林电子科技大学 Method for predicting the efficiency of CRISPR/Cas9 targeted knockout of site-specific DNA
CN111489787B (en) * 2020-04-21 2023-05-12 桂林电子科技大学 Method for predicting the efficiency of CRISPR/Cas9 targeted knockout of site-specific DNA
CN111613267A (en) * 2020-05-21 2020-09-01 中山大学 CRISPR/Cas9 off-target prediction method based on attention mechanism
CN112086145A (en) * 2020-09-02 2020-12-15 腾讯科技(深圳)有限公司 Compound activity prediction method and device, electronic equipment and storage medium
CN112086145B (en) * 2020-09-02 2024-04-16 腾讯科技(深圳)有限公司 Compound activity prediction method and device, electronic equipment and storage medium
CN113611367A (en) * 2021-08-05 2021-11-05 湖南大学 CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Also Published As

Publication number Publication date
CN110070912B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110070912A (en) Method for predicting CRISPR/Cas9 off-target effects
Kim et al. SpCas9 activity prediction by DeepSpCas9, a deep learning–based model with high generalization performance
Song et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors
Chen et al. Identify key sequence features to improve CRISPR sgRNA efficacy
Lavarenne et al. The spring of systems biology-driven breeding
Kleftogiannis et al. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features
Zheng et al. Exponentially decreased dimension number strategy based dynamic search fireworks algorithm for solving CEC2015 competition problems
Zhang et al. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities
Fonseca et al. Ranking beta sheet topologies with applications to protein structure prediction
Sutanto et al. Assessing the use of secondary structure fingerprints and deep learning to classify RNA sequences
Li et al. Characteristics and prediction of RNA structure
Niu et al. SgRNA-RF: identification of SgRNA on-target activity with imbalanced datasets
Angenent-Mari et al. Deep learning for RNA synthetic biology
Zhang et al. Subcellular localization prediction of human proteins using multifeature selection methods
Toufikuzzaman et al. CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction
Jayasundara et al. Machine learning for plant microRNA prediction: A systematic review
Morgado et al. Learning sequence patterns of AGO-sRNA affinity from high-throughput sequencing libraries to improve in silico functional small RNA detection and classification in plants
Manuweera et al. Computational methods for the ab initio identification of novel microRNA in plants: a systematic review
Lan et al. Optimized sgRNA design by deep learning to balance the off-target effects and on-target activity of CRISPR/Cas9
Wen et al. Sea-ATI unravels novel vocabularies of plant active cistrome
Won et al. Modeling promoter grammars with evolving hidden Markov models
Chen et al. Optimizing precision genome editing through machine learning
Sutanto et al. Assessing global-local secondary structure fingerprints to classify RNA sequences with deep learning
Chen Modeling of Functional Gene Regulation Through Machine Learning and Deep Learning Methods
Chen et al. Integrating machine learning and genome editing for crop improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant