CN106446600A - CRISPR/Cas9-based sgRNA design method - Google Patents
- Publication number
- CN106446600A (application CN201610341946.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The present invention relates to a method for designing sgRNA based on CRISPR/Cas9. The method comprises the following steps: obtaining sgRNAs and the corresponding Cas9 cleavage efficiency values; establishing a personalized sgRNA design model; using the NDCG algorithm to measure the quality of the established personalized sgRNA design model and updating the database; and designing sgRNAs and giving an evaluation value for each sgRNA. Compared with the prior art, the invention offers high accuracy, more complete features, a wide range of application, and a wide range of analyzable data.
Description
Technical field
The invention relates to the field of gene editing research, and in particular to a method for designing sgRNA based on CRISPR/Cas9 gene editing technology.
Background
With the development of molecular biology, people have gained a deeper understanding of the constituent elements of life, but much remains unclear about the mechanisms of life processes, especially the mechanisms underlying certain diseases. Studying the relationship between genes and phenotypes, and the interactions between genes, urgently requires an engineering technology that can rapidly knock out and insert genes in vivo. The CRISPR/Cas9 system emerged at the right time to meet this need.
The CRISPR/Cas9 system (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) is a gene editing tool that is simple to operate and widely applicable. The system consists mainly of a nuclease (Cas9) and a guide RNA (sgRNA). The sgRNA recognizes the target gene site through complementary base pairing and then recruits Cas9 to cleave it, producing a double-strand break and thereby enabling gene editing at the DNA level. Because it is widely applicable, convenient, and time-saving, it was quickly adopted in many areas, and it is especially advantageous for building cancer models and exploring gene therapy.
However, researchers have found that different sgRNAs designed for the same gene in the same cell differ greatly in cleavage efficiency. If a highly efficient sgRNA cannot be designed, the only remedy is to increase the concentration, which introduces a large amount of unwanted genetic material into the cell and produces a high proportion of off-target events, greatly inconveniencing researchers. Designing an sgRNA with high cleavage efficiency is therefore very important for genetic research.
At present there are nearly 30 sgRNA design programs, which fall into two main categories. The first summarizes rules from experiments (for example, one end of the paired sgRNA sequence must carry a PAM sequence, the 5' end should be GG, the GC content should be kept at about 60%, and the seed sequence cannot tolerate mismatches) and then screens candidates directly against those conditions. The second uses statistical methods to assign a weight to each base in order to calculate sgRNA specificity, as in CRISPR Design. Both types of software build a general-purpose model. However, because of the large heterogeneity between species and between cell types, existing software predicts poorly; and because heterogeneity across experimental conditions also affects sgRNA cleavage efficiency, the evaluation accuracy of a general model is relatively low.
Therefore, taking into account the heterogeneity between data from different platforms and species, and building personalized models from platform-specific or species-specific data to improve the specificity and efficiency of sgRNAs, is extremely important for research on the off-target problem of the CRISPR/Cas9 system.
Summary of the invention
The purpose of the present invention is to address the above problems by providing a CRISPR/Cas9-based sgRNA design method with high accuracy and a wide range of application.
To achieve this purpose, the present invention provides a method for designing sgRNA based on CRISPR/Cas9, comprising the following steps:
1) Obtain sgRNAs and the corresponding Cas9 cleavage efficiency values, specifically:
11) obtain sgRNAs and the corresponding Cas9 cleavage efficiency values from the literature;
12) obtain sgRNAs from the SRA database, and calculate the corresponding Cas9 cleavage efficiency values;
13) classify the data obtained in steps 11) and 12) by species, cell type, and experimental condition under different reference genomes; for each reference genome, list a table whose first column is the sgRNA name, whose second column is the sgRNA sequence, and whose third column is the corresponding Cas9 cleavage efficiency;
2) Establish a personalized sgRNA design model, specifically:
21) as required, extract from the corresponding reference genome the sequence information of the sgRNAs obtained in step 1);
22) binary-encode the sgRNA sequence information extracted in step 21) according to the binary rule;
23) for the sgRNAs obtained in step 21), determine the data type of the Cas9 cleavage efficiency: if it is numerical, go to step 24); if it is categorical, go to step 25);
24) for the sgRNA sequence information encoded in step 22), perform feature extraction with a Lasso model, and establish the personalized sgRNA design model by standard linear regression;
25) for the sgRNA sequence information encoded in step 22), perform feature selection with L1-regularized binary logistic regression, and then establish the personalized sgRNA design model with L2-regularized binary logistic regression;
3) Use the NDCG algorithm to measure the quality of the personalized sgRNA design model established in step 2) and update the SRA database, specifically:
31) calculate the NDCG value of the personalized sgRNA design model established in step 2);
32) determine whether the existing SRA database already holds a corresponding personalized sgRNA model; if not, add the model to the SRA database; if so, go to step 33);
33) compare the personalized sgRNA model with the corresponding sgRNA model in the SRA database, and store the one with the larger NDCG value in the SRA database;
4) Design sgRNAs and give an evaluation value for each sgRNA, specifically:
41) according to the genomic region given by the user, select a suitable reference genome from the SRA database, search it for all sgRNAs that satisfy the design rule, and take them as the designed sgRNAs;
42) evaluate the sgRNAs designed in step 41) with the personalized sgRNA model established in step 2).
Preferably, the Cas9 cleavage efficiency value in step 12) is calculated as follows:
121) align the sgRNAs and the corresponding next-generation sequencing reads to the reference genome;
122) take out the reads that contain the sgRNA;
123) determine whether an insertion or deletion in the DNA has occurred at the cleavage site, and whether that insertion or deletion is a frameshift mutation;
124) compute the frameshift mutation rate of each sgRNA, specifically: frameshift mutation rate = (number of reads that contain the sgRNA and carry a frameshift mutation) / (total number of reads that contain the sgRNA);
125) use the frameshift mutation rate calculated in step 124) as the Cas9 cleavage efficiency value.
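Steps 123) to 125) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes alignment has already produced, for each read containing the sgRNA, the net indel length at the cleavage site, and it assumes the usual definition of a frameshift (an indel whose length is not a multiple of 3). The names `is_frameshift` and `otf_rate` are hypothetical.

```python
def is_frameshift(indel_length: int) -> bool:
    """Assumed definition: an indel is a frameshift when its length is not a multiple of 3."""
    return indel_length != 0 and indel_length % 3 != 0


def otf_rate(indel_lengths: list[int]) -> float:
    """Fraction of the reads containing one sgRNA that carry a frameshift indel.

    indel_lengths holds one signed value per read: positive for an
    insertion, negative for a deletion, 0 when the read has no indel.
    """
    if not indel_lengths:
        raise ValueError("no reads contain this sgRNA")
    frameshift_reads = sum(1 for n in indel_lengths if is_frameshift(n))
    return frameshift_reads / len(indel_lengths)


# Hypothetical reads for one sgRNA (net indel length at the cleavage site).
reads = [0, -1, 2, -3, 0, 4, -6, 1]
print(otf_rate(reads))
```

The resulting rate would then serve as the numerical cleavage efficiency value for that sgRNA.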
Preferably, the sgRNA sequence information in step 21) includes the sgRNA sequence, the marker fragment required for the sgRNA to recognize DNA (the PAM), and the bases upstream and downstream of the sgRNA spacer; the length of the upstream and downstream flanks is either the platform default or a value set by the user.
Preferably, the binary rule in step 22) is: A corresponds to 1000, C to 0100, G to 0010, T to 0001, and N to 0000.
Preferably, feature extraction with the Lasso model in step 24) selects feature vectors by extracting the non-zero weights, specifically:
where w is the vector of estimated feature weights, x is the feature vector of the selected sgRNA, n is the number of sgRNAs, y is the Cas9 cleavage efficiency value corresponding to the sgRNA, α is a constant, and ||w||_1 is the L1 norm of the parameter vector. The Lasso model solves the least-squares loss function with the added penalty α||w||_1; by traversing the regularization matrix, the features with non-zero weights are extracted.
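The objective itself is not reproduced in this text. Assuming the usual Lasso formulation, which matches the symbols defined above (w, x, y, n, α), it can be written as follows; the scaling factor 1/(2n) is an assumption taken from the common convention rather than from the source.

```latex
\min_{w} \; \frac{1}{2n} \sum_{i=1}^{n} \left( x_i^{\top} w - y_i \right)^2 + \alpha \lVert w \rVert_1
```

Larger values of α drive more weights to exactly zero, so the features that keep a non-zero weight are the ones selected.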
Preferably, the L1 regularization in step 25) is specifically:
where w and c are the estimated feature weights and intercept, X is the binary matrix of encoded sgRNAs, n is the number of sgRNAs, and y is the Cas9 cleavage efficiency value corresponding to each sgRNA.
Preferably, the L2 regularization is specifically:
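Neither objective is reproduced in this text. A common way to write L1- and L2-regularized binary logistic regression that is consistent with the symbols w, c, X, n, and y defined above is sketched below. Two things are assumptions not stated in the source: the constant C (inverse regularization strength), and the label convention y_i ∈ {−1, +1}, whereas the embodiment codes the classes as 1 and 0.

```latex
% L1-regularized logistic regression (feature selection)
\min_{w,c} \; \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\!\left( 1 + \exp\!\left( -y_i \left( X_i^{\top} w + c \right) \right) \right)

% L2-regularized logistic regression (model building)
\min_{w,c} \; \frac{1}{2} w^{\top} w + C \sum_{i=1}^{n} \log\!\left( 1 + \exp\!\left( -y_i \left( X_i^{\top} w + c \right) \right) \right)
```

The two objectives share the same loss term and differ only in the penalty: the L1 penalty yields sparse weights suited to feature selection, while the L2 penalty shrinks weights without zeroing them.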
Preferably, the NDCG value of the personalized sgRNA design model established in step 31) is calculated as:
where DCG is the value computed from the predicted ranking, IDCG is the ideal DCG computed from the true ranking, and rel_i is the predicted ranking value at position i.
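The formula is not reproduced in this text. The ratio NDCG = DCG/IDCG follows from the definitions above; the exact form of DCG does not, so the exponential-gain, log2-discount form below is an assumption (another common variant uses rel_i directly as the gain). The symbol p, the number of ranked positions, is introduced here for illustration.

```latex
\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}, \qquad
\mathrm{DCG} = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}
```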
Preferably, the design rule in step 41) is:
20bp+PAM
where bp is the unit of DNA length and PAM is the marker fragment required for the sgRNA to recognize DNA.
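The 20bp+PAM rule can be illustrated with a short search over a genomic region. This is a sketch under stated assumptions: the PAM is taken to be NGG (the source does not name a PAM sequence), only the forward strand is scanned, and `find_candidates` is a hypothetical helper name.

```python
import re


def find_candidates(region: str, spacer_len: int = 20, pam_pattern: str = "[ACGT]GG"):
    """Return (start, spacer, pam) for every forward-strand spacer+PAM site.

    pam_pattern defaults to NGG, which is an assumption for illustration.
    """
    region = region.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    site = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_pattern}))")
    return [(m.start(), m.group(1), m.group(2)) for m in site.finditer(region)]


# Hypothetical 24-base region containing exactly one 20bp+NGG site.
region = "ACGT" * 5 + "AGGC"
for start, spacer, pam in find_candidates(region):
    print(start, spacer, pam)
```

A fuller search would also scan the reverse complement, since target sites can lie on either strand.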
Compared with the prior art, the present invention has the following beneficial effects:
(1) A personalized strategy is used for different cell types of different species, with modeling by data-driven machine learning algorithms, which greatly improves evaluation accuracy.
(2) A new encoding rule makes the discovered features more complete, not limited to the region between the PAM and the spacer.
(3) Users are given a workflow for building their own models, which widens the range of application beyond the few species in the database.
(4) Using the OTF rate of NGS data as the cleavage rate expands the range of data that can be analyzed.
(5) Users can upload their own data to extend the database, which speeds up data accumulation and helps resolve the current difficulty that optimal sgRNAs cannot be designed well because data are insufficient.
Description of drawings
Figure 1 is a flowchart of the method for establishing and evaluating a personalized sgRNA model;
Figure 2 is a flowchart of the method for designing and evaluating sgRNAs.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the invention, and a detailed implementation and specific operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
Explanation of abbreviations:
CRISPR: clustered regularly interspaced short palindromic repeats
Cas9: an enzyme associated with the type II CRISPR system
NGS: next-generation sequencing
PAM: protospacer-adjacent motif, the marker fragment required for the sgRNA to recognize DNA
sgRNA: the RNA that acts as the guide in the CRISPR/Cas9 system
indel: an insertion or deletion in DNA caused by CRISPR/Cas9 editing
spacer: the roughly 20 bases of the sgRNA that take part in complementary base pairing
OTF: out of frame, a frameshift mutation
read: a sequence obtained from one reaction in high-throughput sequencing
This embodiment provides a CRISPR/Cas9-based sgRNA design method with a workflow for building personalized sgRNA design models for different cell types of different species. Models can be built and sgRNAs designed according to different needs. The method comprises the following four steps:
(1) Data collection: data collected from the literature generally fall into two types: sgRNAs with numerical cleavage efficiencies, or sgRNAs with categorical cleavage efficiencies (for example, a binary effective/ineffective label). NGS data downloaded from the SRA database are numerical only. Because NGS data, once the OTF rate has been computed, follow the same workflow as numerical data collected from the literature, this embodiment describes only two kinds of data: categorical literature data and NGS data.
Categorical data: for categorical data collected from the literature, this embodiment defines effective as 1 and ineffective as 0, and organizes the data in the format of Table 1.
Table 1
Numerical data: for numerical NGS data, first use BWA to align the sgRNA sequences and the NGS reads separately to the human reference genome; take out the reads that contain the sgRNA; determine whether an indel has occurred at the cleavage site and whether the indel is OTF; then compute the OTF rate of each sgRNA (OTF rate = the number of reads that contain the sgRNA and are OTF, divided by the total number of reads that contain the sgRNA). Finally, organize the results in the format of Table 2.
Table 2
(2) Model building: as shown in Figure 1, extract the sequence information of the collected sgRNAs from the corresponding reference genome. Assuming the upstream and downstream flanks are set to 35 and 32 bases respectively, the extracted sequence is 90 (35+20+3+32) bases long:
CACCTGGTAT GTTCGTATCG GGCAGAATAT CGCAACCTGC TCAGCGCCTA CGGTCCATCT CGCTCAGGTA CGACTGACCG ACCCAGTCTA
The extracted sgRNA information is binary-encoded according to the rule in Table 3.
Table 3
The 90 bases extracted above are then encoded as:
0100 1000 0100 0100 0001 0010 0010 0001 1000 0001
0010 0001 0001 0100 0010 0001 1000 0001 0100 0010
0010 0010 0100 1000 0010 1000 1000 0001 1000 0001
0100 0010 0100 1000 1000 0100 0100 0001 0010 0100
0001 0100 1000 0010 0100 0010 0100 0100 0001 1000
0100 0010 0010 0001 0100 0100 1000 0001 0100 0001
0100 0010 0100 0001 0100 1000 0010 0010 0001 1000
0100 0010 1000 0100 0001 0010 1000 0100 0100 0010
1000 0100 0100 0100 1000 0010 0001 0100 0001 1000
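The encoding above can be reproduced with a short Python sketch. The base-to-bits mapping is the one stated in the text (A 1000, C 0100, G 0010, T 0001, N 0000); the names `ENCODING` and `encode_sequence` are hypothetical and chosen for illustration.

```python
# Mapping from the text: each base becomes four bits.
ENCODING = {"A": "1000", "C": "0100", "G": "0010", "T": "0001", "N": "0000"}


def encode_sequence(seq: str) -> list[int]:
    """Binary-encode a DNA sequence into a flat list of 0/1 features."""
    bits = []
    for base in seq.upper():
        bits.extend(int(b) for b in ENCODING[base])
    return bits


# First ten bases of the 90-base example sequence.
first_row = encode_sequence("CACCTGGTAT")
print(" ".join("".join(map(str, first_row[i:i + 4])) for i in range(0, 40, 4)))
```

A 90-base window therefore yields a feature vector of 360 binary values, one row of the matrix X used for model fitting.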
Features are extracted with machine learning methods to establish the personalized sgRNA design model.
For categorical data, logistic regression is used to select features and build the prediction model. Binary logistic regression offers two regularization options; the invention uses L1 regularization for feature selection and L2 regularization for model building.
L1-regularized logistic regression solves the following optimization problem for sparse feature selection:
where w and c are the estimated feature weights and intercept, X is the feature representation of the training samples, n is the number of training samples, and y is the cleavage efficiency value corresponding to each sgRNA.
L2-penalized logistic regression minimizes the following cost function:
For numerical data, a Lasso model is used for feature selection and standard linear regression for building the prediction model. Lasso is a linear model that estimates sparse coefficients and selects features mainly by extracting the non-zero weights. The objective function to minimize is:
where w is the vector of estimated feature weights, x is the feature vector of the selected sgRNA, n is the number of training samples, y is the cleavage efficiency value corresponding to the sgRNA, α is a constant, and ||w||_1 is the L1 norm of the parameter vector. The Lasso model solves the least-squares loss function with the added penalty α||w||_1; by traversing the regularization matrix, the features with non-zero weights are extracted, and these features are considered the important elements affecting sgRNA cleavage efficiency.
After these features are selected, a standard linear regression is used to build the evaluation model.
Modeling of both numerical and categorical data produces two files: an xml file containing the selected features and the cross-validation results, and a pkl file, a binary file containing the established prediction model.
The content of the xml file is as follows:
(3) Model evaluation: the NDCG algorithm is used to measure the quality of the prediction model. NDCG (Normalized Discounted Cumulative Gain) is mainly used to measure the performance of a ranking model. Its value represents the similarity between the predicted ranking and the actual ranking and lies between 0 and 1, where 1 indicates complete agreement; the larger the value, the better the model. The formula is as follows:
DCG (Discounted Cumulative Gain) is the value computed from the predicted ranking; IDCG (ideal DCG) is the ideal DCG, computed from the true ranking. The mathematical definition of DCG is as follows:
where rel_i is the predicted ranking value at position i.
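The evaluation can be sketched in Python as follows. This is an illustration under assumptions, not the patent's code: the gain is taken as 2^rel − 1 with a log2(i + 1) discount, rel_i is taken to be the true score of the sgRNA that the model places at rank i, and the function names and the cutoff parameter `k` are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with the assumed gain 2^rel - 1 and log2 discount."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(true_scores, predicted_scores, k=None):
    """NDCG of a predicted ranking, optionally restricted to the top k positions."""
    # Rank sgRNAs by predicted score, then read off their true scores in that order.
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    predicted_order = [true_scores[i] for i in order][:k]
    ideal_order = sorted(true_scores, reverse=True)[:k]
    ideal = dcg(ideal_order)
    return dcg(predicted_order) / ideal if ideal > 0 else 0.0


# Hypothetical benchmark scores and model scores for four sgRNAs.
benchmark = [0.9, 0.4, 0.7, 0.1]
model = [0.8, 0.5, 0.3, 0.2]
print(ndcg(benchmark, model))
```

A model that reproduces the benchmark order exactly scores 1; any misordering near the top of the list lowers the value more than one near the bottom.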
In Table 4, sgID is the name of the sgRNA, seq is the spacer sequence of the sgRNA, Benchmark Score is the benchmark score, BS_rank is the rank by Benchmark Score, Cage is the score given by the prediction model of the invention, and C_rank is the rank by Cage.
Table 4
Top 50 NDCG = 0.876322904
Top 10% NDCG = 0.84340749
If the database does not contain this model, the model is added to the database. Otherwise the NDCG values of the two models are computed and compared, and if the new model's NDCG value is larger than the existing model's, the new model replaces it in the database.
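The update rule amounts to keeping, for each model key, whichever model has the larger NDCG value. A minimal sketch, assuming the database can be represented as a dictionary keyed by a hypothetical (species, cell type) tuple; `update_model_store` is an illustrative name.

```python
def update_model_store(store, key, model, ndcg_value):
    """Insert the model if the key is new, or replace the stored one if NDCG improves.

    store maps a key such as (species, cell_type) to a (model, ndcg) pair.
    Returns True when the store was changed.
    """
    current = store.get(key)
    if current is None or ndcg_value > current[1]:
        store[key] = (model, ndcg_value)
        return True
    return False


# Hypothetical usage: model objects are stand-in strings here.
store = {}
update_model_store(store, ("human", "HEK293T"), "model_v1", 0.84)
update_model_store(store, ("human", "HEK293T"), "model_v2", 0.88)
print(store[("human", "HEK293T")])
```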
(4) Design and evaluation: as shown in Figure 2, either evaluate sgRNAs that the user has already designed, or design sgRNAs for a genomic region given by the user (for example, chromosome 1, 1,000,000 to 1,002,000, hg19). First determine the species or cell type of the sgRNAs to be evaluated, then select a suitable model for the evaluation; if no suitable model exists, a similar model can be chosen. This embodiment provides 10 models covering 8 cell types from 3 species. The output is shown in Table 5.
Table 5
At this point, users can select the sgRNAs that suit their needs for the next stage of research.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A design method of sgRNA based on CRISPR/Cas9 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341946.3A CN106446600B (en) | 2016-05-20 | 2016-05-20 | A design method of sgRNA based on CRISPR/Cas9 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446600A true CN106446600A (en) | 2017-02-22 |
CN106446600B CN106446600B (en) | 2019-10-18 |
Family
ID=58183551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610341946.3A Expired - Fee Related CN106446600B (en) | 2016-05-20 | 2016-05-20 | A design method of sgRNA based on CRISPR/Cas9 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446600B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103805606A (en) * | 2014-02-28 | 2014-05-21 | 青岛市畜牧兽医研究所 | Pair of small guide RNAs (Ribonucleic Acids) (sgRNAs) for specifically identifying sheep DKK1 gene and coded DNA (Deoxyribonucleic Acid) and application of sgRNAs |
CN104109687A (en) * | 2014-07-14 | 2014-10-22 | 四川大学 | Construction and application of Zymomonas mobilis CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-association proteins)9 system |
CN105255937A (en) * | 2015-08-14 | 2016-01-20 | 西北农林科技大学 | Method for expression of CRISPR sgRNA by eukaryotic cell III-type promoter and use thereof |
CN105400779A (en) * | 2015-10-15 | 2016-03-16 | 芜湖医诺生物技术有限公司 | Target sequence, recognized by streptococcus thermophilus CRISPR-Cas9 system, of human CCR5 gene, sgRNA and application of CRISPR-Cas9 system |
CN105296518A (en) * | 2015-12-01 | 2016-02-03 | 中国农业大学 | Homologous arm vector construction method used for CRISPR/Cas 9 technology |
Non-Patent Citations (6)
Title |
---|
John G. Doench et al.: "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9", Nature Biotechnology * |
Nicolo Fusi et al.: "In Silico Predictive Modeling of CRISPR/Cas9 guide efficiency", bioRxiv * |
Yang Lei et al.: "CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR system in plants", Molecular Plant * |
Wang Liren (王立人): "CRISPR/Cas system-mediated editing of large genomic DNA fragments", China Doctoral Dissertations Full-text Database, Basic Sciences * |
Xie Shengsong (谢胜松) et al.: "sgRNA design and off-target effect evaluation in the CRISPR/Cas9 system", Hereditas (遗传) * |
Shao Hongwei (邵红伟) et al.: "sgRNA screening for targeted editing of the TCR gene with the CRISPR-Cas9 system", Journal of Jimei University (Natural Science) * |
Cited By (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12006520B2 (en) | 2011-07-22 | 2024-06-11 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US11920181B2 (en) | 2013-08-09 | 2024-03-05 | President And Fellows Of Harvard College | Nuclease profiling system |
US10508298B2 (en) | 2013-08-09 | 2019-12-17 | President And Fellows Of Harvard College | Methods for identifying a target site of a CAS9 nuclease |
US10954548B2 (en) | 2013-08-09 | 2021-03-23 | President And Fellows Of Harvard College | Nuclease profiling system |
US11046948B2 (en) | 2013-08-22 | 2021-06-29 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US10858639B2 (en) | 2013-09-06 | 2020-12-08 | President And Fellows Of Harvard College | CAS9 variants and uses thereof |
US9999671B2 (en) | 2013-09-06 | 2018-06-19 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US11299755B2 (en) | 2013-09-06 | 2022-04-12 | President And Fellows Of Harvard College | Switchable CAS9 nucleases and uses thereof |
US10597679B2 (en) | 2013-09-06 | 2020-03-24 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
US10682410B2 (en) | 2013-09-06 | 2020-06-16 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
US10912833B2 (en) | 2013-09-06 | 2021-02-09 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US11053481B2 (en) | 2013-12-12 | 2021-07-06 | President And Fellows Of Harvard College | Fusions of Cas9 domains and nucleic acid-editing domains |
US10465176B2 (en) | 2013-12-12 | 2019-11-05 | President And Fellows Of Harvard College | Cas variants for gene editing |
US11124782B2 (en) | 2013-12-12 | 2021-09-21 | President And Fellows Of Harvard College | Cas variants for gene editing |
US12215365B2 (en) | 2013-12-12 | 2025-02-04 | President And Fellows Of Harvard College | Cas variants for gene editing |
US10077453B2 (en) | 2014-07-30 | 2018-09-18 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US11578343B2 (en) | 2014-07-30 | 2023-02-14 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US10704062B2 (en) | 2014-07-30 | 2020-07-07 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US12043852B2 (en) | 2015-10-23 | 2024-07-23 | President And Fellows Of Harvard College | Evolved Cas9 proteins for gene editing |
US11214780B2 (en) | 2015-10-23 | 2022-01-04 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10167457B2 (en) | 2015-10-23 | 2019-01-01 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10947530B2 (en) | 2016-08-03 | 2021-03-16 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11999947B2 (en) | 2016-08-03 | 2024-06-04 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11702651B2 (en) | 2016-08-03 | 2023-07-18 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10113163B2 (en) | 2016-08-03 | 2018-10-30 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11661590B2 (en) | 2016-08-09 | 2023-05-30 | President And Fellows Of Harvard College | Programmable CAS9-recombinase fusion proteins and uses thereof |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US12084663B2 (en) | 2016-08-24 | 2024-09-10 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US11306324B2 (en) | 2016-10-14 | 2022-04-19 | President And Fellows Of Harvard College | AAV delivery of nucleobase editors |
US10745677B2 (en) | 2016-12-23 | 2020-08-18 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
US11820969B2 (en) | 2016-12-23 | 2023-11-21 | President And Fellows Of Harvard College | Editing of CCR2 receptor gene to protect against HIV infection |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
US11542496B2 (en) | 2017-03-10 | 2023-01-03 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
US11268082B2 (en) | 2017-03-23 | 2022-03-08 | President And Fellows Of Harvard College | Nucleobase editors comprising nucleic acid programmable DNA binding proteins |
US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
US11319532B2 (en) | 2017-08-30 | 2022-05-03 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11932884B2 (en) | 2017-08-30 | 2024-03-19 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11795443B2 (en) | 2017-10-16 | 2023-10-24 | The Broad Institute, Inc. | Uses of adenosine base editors |
US12157760B2 (en) | 2018-05-23 | 2024-12-03 | The Broad Institute, Inc. | Base editors and uses thereof |
CN110689922A (en) * | 2018-07-04 | 2020-01-14 | 赛业(广州)生物科技有限公司 | Method and system for GC content analysis of automatic parallelization knockout strategy |
CN110689922B (en) * | 2018-07-04 | 2023-07-14 | 广州赛业百沐生物科技有限公司 | A method and system for automatic parallel knockout strategy GC content analysis |
US11795452B2 (en) | 2019-03-19 | 2023-10-24 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11643652B2 (en) | 2019-03-19 | 2023-05-09 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
US12031126B2 (en) | 2020-05-08 | 2024-07-09 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | A CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN111881324B (en) * | 2020-07-30 | 2023-12-15 | 苏州工业园区服务外包职业学院 | High-throughput sequencing data general storage format structure, construction method and application thereof |
CN111881324A (en) * | 2020-07-30 | 2020-11-03 | 苏州工业园区服务外包职业学院 | High-throughput sequencing data universal storage format structure, construction method and application thereof |
CN117252306B (en) * | 2023-10-11 | 2024-02-27 | 中央民族大学 | Gene editing capability index calculation method |
CN117252306A (en) * | 2023-10-11 | 2023-12-19 | 中央民族大学 | A method for calculating gene editing capability index |
Also Published As
Publication number | Publication date |
---|---|
CN106446600B (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106446600B (en) | A design method of sgRNA based on CRISPR/Cas9 | |
CN111798921B (en) | RNA binding protein prediction method and device based on multi-scale attention convolution neural network | |
CN108319984B (en) | Construction method and prediction method of prediction model of leaf phenotypic characteristics and photosynthetic characteristics of woody plants based on DNA methylation level | |
CN104866863B (en) | A kind of biomarker screening technique | |
CN106295246A (en) | Find the lncRNA relevant to tumor and predict its function | |
CN111462820A (en) | Noncoding RNA prediction method based on feature screening and ensemble algorithm | |
CN110111843A (en) | Method, equipment and the storage medium that nucleic acid sequence is clustered | |
CN105808976A (en) | Recommendation model based miRNA target gene prediction method | |
CN111613274A (en) | A deep learning-based method for predicting CRISPR/Cas9 sgRNA activity | |
CN111613267A (en) | A CRISPR/Cas9 off-target prediction method based on attention mechanism | |
Khan et al. | Detecting N6-methyladenosine sites from RNA transcriptomes using random forest | |
CN115064220A (en) | A single-cell method for cross-species cell type identification | |
Hickl et al. | Binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets | |
Yardibi et al. | The trend of breeding value research in animal science: bibliometric analysis | |
Lv et al. | Machine learning for biological sequence analysis | |
CN105279396A (en) | Excavation method of plant drought-tolerant gene module | |
CN115394348A (en) | IncRNA subcellular localization prediction method, equipment and medium based on graph convolution network | |
CN113658641B (en) | Phage classification method, device, equipment and storage medium | |
CN118435284A (en) | A method for predicting genetic editing activity through deep learning and its use | |
Chai et al. | Integrating multi-omics data with deep learning for predicting cancer prognosis | |
CN101894216B (en) | Method of discovering SNP group related to complex disease from SNP information | |
Jia et al. | EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention | |
Sun et al. | CRISPR-M: Predicting sgRNA off-target effect using a multi-view deep learning network | |
Agüero-Chapin et al. | An alignment-free approach for eukaryotic ITS2 annotation and phylogenetic inference | |
CN112365930B (en) | Method for determining optimal sequence alignment threshold value for gene database |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191018 |