WO2017181735A2 - Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them

Info

Publication number
WO2017181735A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
recognition site
sgrna recognition
gene
sgrna
Prior art date
Application number
PCT/CN2017/000248
Other languages
English (en)
French (fr)
Inventor
陈庄
刘文华
蒋宗勇
张群洁
戴彰言
俞婷
陈中健
朱翠
Original Assignee
广东省农业科学院农业生物基因研究中心
Priority date
2016-04-20
Filing date
2017-03-22
Publication date
2017-10-26
Application filed by 广东省农业科学院农业生物基因研究中心
Publication of WO2017181735A2

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/11 - DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 - Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing

Description

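Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them
Technical Field
The invention belongs to the technical field of genomics and bioinformatics. Specifically, it relates to highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and to a method for screening them.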
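Background Art
Genome editing mediated by the CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 system is the third generation of genome editing technology, following zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) (Brouns et al., 2008). CRISPR-Cas9 is an adaptive immune system found in bacteria that acts against phage genomes and horizontally transferred plasmids; the Cas9 protein, which has endonuclease activity, specifically recognizes and cleaves double-stranded DNA under the guidance of an sgRNA. CRISPR/Cas9 technology therefore consists mainly of two parts: an sgRNA that binds the genome specifically through complementary base pairing, and the Cas9 nuclease, which is targeted to a specific genomic sequence bearing a PAM and cleaves it (Barrangou, 2014).
A target gene can be knocked out by changing the sgRNA site. However, Doench et al. found that different sgRNAs have different editing activities (Doench et al., 2014); Zhang et al., studying the PAM, found that NGG gives the highest editing efficiency (Zhang et al., 2014); and Farboud et al. found that sgRNAs ending in GG at the 3' end significantly increase genome editing efficiency (Farboud and Meyer, 2015). In addition, many studies have shown that CRISPR/Cas9 technology has off-target effects. Fu et al. found that the tolerance of the Cas9 nuclease for 1-2 mismatched bases at an off-target site depends on where the mismatches lie, and that off-target sites with 5 mismatched bases can still be cleaved by Cas9 (Fu et al., 2013). Hsu et al. likewise found that the tolerance of Cas9 for mismatches depends on both the number and the position of the mismatched bases (Hsu et al., 2013). Lin et al. and Wang et al. separately found that Cas9 can cleave an off-target site even when it contains a single-base bulge (Lin et al., 2014; Wang et al., 2015). CRISPR/Cas9 technology therefore carries a serious off-target risk (Shengsong et al., 2015).
Several software tools for sgRNA design and/or off-target evaluation for CRISPR/Cas9 already exist, each with its own strengths and weaknesses. Examples include CRISPR Design, developed by Feng Zhang's laboratory at the Broad Institute of the Massachusetts Institute of Technology (USA); ZiFiT, developed by the Zinc Finger Consortium; and Cas9 Design, E-CRISP, Cas-OFFinder, CRISPR-P and others. For genome-wide studies of non-model species, however, these tools struggle to meet all of the following requirements at once:
1) Batch computation: most tools are offered as online versions, which makes batch computation difficult;
2) Searching non-model species: in whole-genome analyses, the genomes of some non-model species are not included on the web server, and genome assembly updates and different annotation versions also strongly affect the results;
3) SNP-corrected genomes: sgRNA recognition depends on sequence similarity. The subject of a study is sometimes not the standard reference genome, and mutations, particularly those in the target gene, affect the screening of sgRNA recognition-site guide sequences;
4) Scoring of screening results: the mechanism and probability of sgRNA off-targeting are still under study with no definitive conclusion, and most tools do not output the intermediate scoring steps that would support later manual screening;
5) Position of the sgRNA site within the protein-coding gene, and alternative splicing: edits near the N-terminus of a protein-coding gene are more efficient because they are more likely to create a premature stop codon, and for genes with several alternative splicing forms, mutation of every transcript needs to be considered. Many programs do not take both points into account.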
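Summary of the Invention
In view of this, and to overcome the above shortcomings of the prior art, the present invention provides highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them.
To achieve this object, the invention adopts the following technical solution:
A method for screening highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing, comprising the following steps:
1) Select the exon sequences of the protein-coding genes annotated in the pig whole-genome sequence, and mark the overlap status of exons between the different splicing patterns of alternatively spliced genes for use in the search in step 5);
2) Using a script, select sites with the sequence pattern 5'-GN20GG-3' from all exon sequences obtained from all protein-coding genes in step 1), remove sequences that cross exon boundaries, and use the remaining sequences as the data basis for the subsequent screening of specific sgRNA recognition-site guide sequences;
3) Align all candidate sgRNA recognition-site guide sequences to the pig whole-genome sequence. By sequence homology analysis, first remove candidates that have an exact match at any genomic position other than the original site; then find all off-target sites with no more than 5 mismatched bases, and determine whether each off-target site lies in an exon or intron of a functional gene or in an intergenic region;
4) Build a scoring matrix and score all candidate sgRNA recognition-site guide sequences;
5) Tally the scores of the guide sequences and select the 3 highest-scoring sgRNA recognition-site guide sequences in each protein-coding gene. When fewer than 3 guide sequences satisfy the maximum limit on the total score, change the value of X in the pattern 5'-GNXGG-3', decreasing it stepwise from 20 to 16, and repeat steps 3)-5) until qualifying guide sequences are obtained. For genes with alternative splicing, in order to knock out the transcripts produced by all splicing patterns of the target gene with the fewest sgRNAs, regions shared by the different transcripts are the preferred regions for screening sgRNA recognition sites; if not enough sites can be found there, the non-overlapping regions are screened as well, so that the final result contains a sufficient number of guide sequences for every alternatively spliced form. For example, if a gene has 3 alternatively spliced transcripts and only 1 guide sequence is found in the region shared by all 3, the non-overlapping regions are screened to satisfy the rule that each transcript needs 3 sites, and the final number of guide sequences may be between 3 and 7.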
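The transcript-coverage rule in step 5) can be illustrated with a small sketch. It is an illustration under assumptions, not the software described in the patent: the `Site` record, the `pick_sites` helper and the greedy ordering (sites lying in more transcripts first, then higher score) are hypothetical choices made here, and the example assumes that each transcript has two transcript-specific sites available.

```python
from collections import namedtuple

# Hypothetical record: a guide site, the transcripts it lies in, and its score.
Site = namedtuple("Site", ["name", "transcripts", "score"])

def pick_sites(sites, transcripts, per_transcript=3):
    """Greedily pick sites until every transcript has `per_transcript` sites.

    Sites shared by more transcripts are tried first, so the shared
    (overlapping) regions are used before transcript-specific ones.
    """
    need = {t: per_transcript for t in transcripts}
    chosen = []
    ordered = sorted(sites, key=lambda s: (-len(s.transcripts), -s.score))
    for site in ordered:
        # Keep the site only if some transcript it covers still needs sites.
        if any(need[t] > 0 for t in site.transcripts):
            chosen.append(site)
            for t in site.transcripts:
                need[t] = max(0, need[t] - 1)
    return chosen

# Worked example from the text: 3 transcripts, 1 site in the shared region,
# and (assumed here) 2 transcript-specific sites for each transcript.
transcripts = ["T1", "T2", "T3"]
sites = [Site("shared", frozenset(transcripts), 0.9)]
for t in transcripts:
    sites += [Site(f"{t}_a", frozenset([t]), 0.8), Site(f"{t}_b", frozenset([t]), 0.7)]

selected = pick_sites(sites, transcripts)
print(len(selected))  # 1 shared + 2 per transcript = 7
```

With fully transcript-specific extra sites the selection reaches the upper end of the range quoted in the text; sites shared by two transcripts would reduce the count.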
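In some embodiments, the scoring matrix is built as follows. First, a penalty is calculated for each off-target site of a candidate sgRNA recognition-site guide sequence: ① the penalty for a mismatch position within the sequence starts at 100% (5' end) and decreases gradually to 0% (3' end) (the decay curve is an adjustable parameter); ② for multiple mismatches the penalties are multiplied, so that off-target sites with several mismatched bases receive lower scores; ③ off-target sites located in exons of functional genes, in introns, or in intergenic regions each receive an additional penalty (adjustable; the defaults are 200% for exons, 100% for introns, and no penalty for intergenic regions); ④ the maximum score of a single off-target site is set to 1.5 (adjustable). Second, the total score of the candidate guide sequence is calculated: ① the scores of all off-target sites are summed; ② a penalty of 10% (adjustable) of the total score is applied according to the position of the guide-sequence site as a percentage of the total CDS length of the gene; the closer the site is to the translation start, the higher the assumed editing efficiency and the smaller the penalty; ③ a maximum value for the total score of a selected guide sequence is set (adjustable). Third, the parameters of the algorithm are optimized according to the literature and experimental data for the target species.
In some embodiments, before step 1) the method further comprises the step of aligning the resequencing data of the target sample to the reference genome with SOAP, identifying and correcting the SNPs of the target sample with SOAPsnp, and thereby obtaining the genome data used for the analysis. This step is optional and applies when the target genome differs substantially from the reference genome.
In some embodiments, the sequenced pig genome contains 21,630 protein-coding genes.
In some embodiments, 2,386 genes have alternative splicing.
The invention also provides the highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing obtained by the above screening method.
Using the pig whole-genome sequence and the annotation of its protein-coding genes, and drawing on recent research results on sgRNA activity and off-target probability, the invention predicts highly efficient, specific sgRNA recognition-site guide sequences for CRISPR-Cas9 gene editing across all pig protein-coding genes, and provides a method and software applicable to any species with a whole-genome sequence. Compared with the prior art, the invention has the following significant advantages:
1. The pig-specific sgRNA recognition-site guide sequences obtained by the screening method have undergone rigorous screening and checking and include guide sequences for CRISPR-Cas9 editing of all pig protein-coding genes, which is critical to the success of CRISPR-Cas9 gene editing as a whole. The algorithms for identifying, scoring and checking specific sgRNA recognition sites, and the corresponding software for predicting and evaluating sgRNA target sites in pig functional genes, can be widely applied to sgRNA-specific site prediction in non-model species with whole-genome sequences;
2. The pig-specific guide sequences obtained can be used to knock out individual pig functional genes accurately. A pooled sgRNA library assembled from the sgRNA target sites of functional genes across the whole genome can also be used to build a CRISPR-Cas9 editing library of the functional genes in the pig genome, for screening genes in pig cells related to different stress factors.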
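Brief Description of the Drawings
FIG. 1 is a flowchart of the method of Example 1 of the invention for screening highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing.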
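Detailed Description
The following examples further illustrate the invention and do not limit it. Where specific experimental conditions and methods are not stated in the examples, the technical means used are conventional means well known to those skilled in the art.
Example 1: Method for screening highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing
FIG. 1 is a flowchart of the screening method of this example. The experimental sample is the sequenced pig (Sus scrofa, Duroc) genome (version 10.2), with an assembly length of 2.8 Gb. Because this example uses a sequenced breed, the SNP correction step is omitted; for samples from the sequenced Duroc pig, Wuzhishan pig, or Tibetan wild boar, the reference genome of the sequenced strain can be used directly.
The screening method comprises the following specific steps: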
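1) Classification and screening of protein-coding genes in the pig genome
In the Ensembl database (www.ensembl.org), 30,582 genes are annotated in the pig genome. After removing transposon-derived genes and genes without an annotated protein-coding sequence (CDS), 21,630 protein-coding genes remain.
Of these, 19,244 genes have a single splicing pattern and 2,386 have alternative splicing. For genes with alternative splicing, regions shared by the different transcripts are preferred and the remaining, differing regions serve as alternatives, so that the final result contains a sufficient number of sgRNA recognition-site guide sequences for each splice form.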
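Finding the regions shared by all transcripts of a gene amounts to intersecting their CDS interval lists. The sketch below shows one way to do that; the function names, the half-open `(start, end)` coordinate convention and the example coordinates are assumptions made for illustration, and each transcript's intervals are assumed not to overlap one another.

```python
def intersect_two(a, b):
    """Intersect two sorted lists of half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def shared_regions(transcripts):
    """Regions present in every transcript's CDS interval list."""
    regions = sorted(transcripts[0])
    for exons in transcripts[1:]:
        regions = intersect_two(regions, sorted(exons))
    return regions

# Hypothetical coordinates for three transcripts of one gene.
t1 = [(100, 200), (300, 400), (500, 600)]
t2 = [(100, 200), (320, 400)]
t3 = [(150, 200), (300, 380), (500, 600)]
print(shared_regions([t1, t2, t3]))  # [(150, 200), (320, 380)]
```

Candidate sites falling inside the returned regions hit every transcript at once, which is why the text searches them first.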
2) sgRNA target site prediction
Using a script, 23-bp sites with the sequence structure 5'-GN20GG-3' were selected from all CDSs as candidate sgRNA target sites.
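A pattern search of this kind can be sketched with a regular expression. The patent's script is not reproduced in the text, so the following is only an assumed illustration: the function names are hypothetical, searching both strands is an assumption, and sites containing ambiguous bases are simply skipped.

```python
import re

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_sites(cds, n=20):
    """Return (start, strand, site) for every 5'-G N{n} GG-3' match.

    `start` is the 0-based offset of the match on the forward strand.
    A lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(r"(?=(G[ACGT]{%d}GG))" % n)
    length = n + 3
    hits = []
    for m in pattern.finditer(cds):
        hits.append((m.start(), "+", m.group(1)))
    rc = revcomp(cds)
    for m in pattern.finditer(rc):
        # Convert the offset on the reverse strand back to forward coordinates.
        hits.append((len(cds) - m.start() - length, "-", m.group(1)))
    return hits

# Made-up 30-nt sequence containing one forward-strand site.
cds = "ATA" + "G" + "ACGTACGTACGTACGTACGT" + "GG" + "TATA"
for hit in find_sites(cds):
    print(hit)
```

Passing a smaller `n` (19, 18, and so on) gives the shorter patterns used as a fallback later in the example.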
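3) Screening of potential off-target sites
All candidate sgRNA target sites were aligned to the whole-genome sequence to find, for each sgRNA guide sequence, the off-target sites with 5 or fewer mismatches; guide sequences whose target site occurs identically elsewhere in the genome were deleted;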
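The mismatch criterion can be made concrete with a brute-force scan. This is a stand-in for illustration only: the text does not name the alignment tool used for this step, a real genome would need an indexed aligner, and the function names, the forward-strand-only scan and the toy sequences are assumptions.

```python
def mismatch_positions(guide, window):
    """0-based positions (5' to 3') where two equal-length strings differ."""
    return [i for i, (g, w) in enumerate(zip(guide, window)) if g != w]

def scan_off_targets(guide, genome, own_start, max_mismatches=5):
    """Naive forward-strand scan for near matches of `guide` in `genome`.

    Returns (has_exact_copy, off_targets). `has_exact_copy` is True when
    the guide matches perfectly somewhere other than its own site, in which
    case the candidate would be discarded.
    """
    k = len(guide)
    has_exact_copy = False
    off_targets = []
    for start in range(len(genome) - k + 1):
        if start == own_start:
            continue  # skip the intended target itself
        mm = mismatch_positions(guide, genome[start:start + k])
        if not mm:
            has_exact_copy = True
        elif len(mm) <= max_mismatches:
            off_targets.append((start, mm))
    return has_exact_copy, off_targets

# Toy data: the guide sits at offset 0; a 2-mismatch copy sits at offset 12.
guide = "GACGTACGTAGG"
genome = guide + "GTCGTACCTAGG" + "TTTTTTTTTTTT"
exact, hits = scan_off_targets(guide, genome, own_start=0)
print(exact, hits)
```

The mismatch positions are kept alongside each hit because the scoring step weights mismatches by where they fall in the sequence.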
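4) Scoring of sgRNA guide-sequence recognition sites
According to the studies of mismatch tolerance cited in the Background, recognition specificity is lower the closer a position is to the 5' end of the sequence, and in a protein-coding gene, edits closer to the N-terminus have a greater effect on protein structure.
Each candidate sgRNA guide sequence was scored. First, a penalty was calculated for each off-target site of the candidate guide sequence: ① the penalty for a mismatch position within the sequence starts at 100% (5' end) and decreases gradually to 0% (3' end) (the decay curve is a linear decrease); ② for multiple mismatches the penalties are multiplied, so that off-target sites with several mismatches receive lower scores; ③ off-target sites located in exons of functional genes, in introns, or in intergenic regions receive a penalty multiplier, set to 300%, 200% and 100% respectively; ④ the maximum score of a single off-target site was set to 1.5, and candidate guide sequences with any off-target site scoring above this value were removed. Second, the total score of the candidate guide sequence was calculated: ① the scores of all off-target sites were summed; ② a penalty of 10% of the total score was applied according to the position of the guide-sequence site as a percentage of the total CDS length of the gene; ③ the maximum total score of a selected sgRNA guide sequence was set to 300 points.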
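The text gives the scoring parameters but not explicit formulas, so the sketch below is only one possible reading of them. The function names, the way the factors are combined (product of linear positional weights times the region multiplier) and the form of the CDS-position penalty are all assumptions; only the numeric parameters (300%/200%/100%, 1.5, 10%, 300) come from Example 1.

```python
# Region multipliers from Example 1: exon 300%, intron 200%, intergenic 100%.
REGION_FACTOR = {"exon": 3.0, "intron": 2.0, "intergenic": 1.0}
MAX_SITE_SCORE = 1.5     # cap on a single off-target site (Example 1)
MAX_TOTAL_SCORE = 300.0  # cap on the total score (Example 1)

def position_weight(pos, length):
    """Linear weight: 1.0 at the 5' end (pos 0) down to 0.0 at the 3' end."""
    return 1.0 - pos / (length - 1)

def off_target_score(mismatches, length, region):
    """Product of positional weights times the region multiplier."""
    score = REGION_FACTOR[region]
    for pos in mismatches:
        score *= position_weight(pos, length)
    return score

def total_score(off_targets, length, cds_fraction, position_penalty=0.10):
    """Return the total score, or None if the candidate is rejected.

    `off_targets` is a list of (mismatch_positions, region) pairs and
    `cds_fraction` is the site's position as a fraction of the CDS length.
    """
    site_scores = [off_target_score(mm, length, region) for mm, region in off_targets]
    if any(s > MAX_SITE_SCORE for s in site_scores):
        return None  # an off-target site exceeds the per-site cap
    total = sum(site_scores)
    # Assumed form of the positional penalty: up to 10% of the total,
    # scaled by how far the site lies from the translation start.
    total += total * position_penalty * cds_fraction
    return total if total <= MAX_TOTAL_SCORE else None

# Hypothetical candidate with two off-target sites, halfway along the CDS.
example = [([0, 11], "exon"), ([5], "intergenic")]
print(total_score(example, length=23, cds_fraction=0.5))
```

Under this reading, an exonic off-target site with a single mismatch at the 5' end scores 3.0 and causes the candidate to be rejected, while mismatches near the 3' end drive an off-target site's score toward zero.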
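5) Screening of results and statistics
The sgRNA guide-sequence scores were tallied and the 3 highest-scoring guide sequences were selected for each transcript. When fewer than 3 guide sequences met the conditions, steps 3)-5) were repeated with progressively shorter patterns (5'-GN19GG-3', 5'-GN18GG-3', 5'-GN17GG-3', and so on) to find guide sequences that met the requirements.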
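The length fallback can be sketched as a loop over pattern lengths. The input format, the function name and the choice to keep guides already found at longer lengths while topping up from shorter ones are assumptions for illustration; the range 20 down to 16 follows the Summary.

```python
def select_guides(passing_by_length, wanted=3, lengths=(20, 19, 18, 17, 16)):
    """Pick up to `wanted` guides, trying shorter N-lengths only when needed.

    `passing_by_length[x]` is a list of guides found with 5'-G N{x} GG-3'
    that passed the filters, already ordered best first (hypothetical input).
    """
    chosen = []
    for x in lengths:
        for guide in passing_by_length.get(x, []):
            if len(chosen) == wanted:
                return chosen
            chosen.append((x, guide))
        if len(chosen) == wanted:
            return chosen
    return chosen

# Hypothetical gene with one passing 20-mer site and several shorter ones.
passing = {20: ["g20_a"], 19: [], 18: ["g18_a", "g18_b", "g18_c"]}
print(select_guides(passing))
# [(20, 'g20_a'), (18, 'g18_a'), (18, 'g18_b')]
```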
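Of the 21,630 genes, 18,838 (87% of the total) had target sites suitable for CRISPR-Cas9 editing: 18,318 genes had 3 or more specific sgRNA recognition-site guide sequences and 520 genes had 1-2. The remaining 2,792 genes had no sgRNA target site suitable for single CRISPR-Cas9 editing because their sequences are highly repetitive.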
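The reported counts are internally consistent, as a short check shows (all figures are taken from the text; the variable names are only labels):

```python
total_genes = 21630
single_splicing, alternative_splicing = 19244, 2386
three_or_more, one_or_two, no_site = 18318, 520, 2792

with_site = three_or_more + one_or_two
assert single_splicing + alternative_splicing == total_genes
assert with_site == 18838
assert with_site + no_site == total_genes
print(f"{with_site / total_genes:.1%}")  # 87.1%
```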
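6) Algorithm optimization and software development
Based on the above analysis steps, the algorithm was developed into a Perl software package for Linux systems.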
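If the experimental sample is a pig from an unsequenced strain (for example Landrace or Meihua pigs), the screening method additionally begins with a genome SNP correction step: the resequencing data of the target sample are aligned to the reference genome with SOAP, the SNPs of the target sample are identified and corrected with SOAPsnp, and the resulting genome data are used for the analysis. Adding genome SNP correction can improve the specificity and accuracy of the subsequent steps. The other steps are the same as in Example 1.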
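Conceptually, SNP correction replaces reference bases with the sample's bases before candidate sites are searched. The sketch below shows only that substitution step; it does not reproduce SOAP or SOAPsnp command lines or file formats, and the `(position, ref_base, alt_base)` input is a hypothetical simplification.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with single-base substitutions applied.

    `snps` is an iterable of (position, ref_base, alt_base) with 0-based
    positions; a mismatch between `ref_base` and the reference is an error.
    """
    seq = list(reference)
    for pos, ref_base, alt_base in snps:
        if seq[pos] != ref_base:
            raise ValueError(f"reference has {seq[pos]!r} at {pos}, expected {ref_base!r}")
        seq[pos] = alt_base
    return "".join(seq)

reference = "GACGTACGTAGG"
snps = [(1, "A", "T"), (7, "G", "C")]  # hypothetical SNP calls
print(apply_snps(reference, snps))  # GTCGTACCTAGG
```

Checking the expected reference base guards against applying SNP calls to the wrong assembly version or coordinate system.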
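The examples above present only some embodiments of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make variations and improvements without departing from the concept of the invention, and these all fall within its scope of protection. The scope of protection of the patent is therefore defined by the appended claims.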

Claims (6)

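  1. A method for screening highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing, characterized by comprising the following steps:
    1) selecting the exon sequences of the protein-coding genes annotated in the pig whole-genome sequence, and marking the overlap status of exons between the different splicing patterns of alternatively spliced genes for use in the search in step 5);
    2) using a script, selecting sites with the sequence pattern 5'-GN20GG-3' from all exon sequences obtained from all protein-coding genes in step 1), removing sequences that cross exon boundaries, and using the remaining sequences as the data basis for the subsequent screening of specific sgRNA recognition-site guide sequences;
    3) aligning all candidate sgRNA recognition-site guide sequences to the pig whole-genome sequence; by sequence homology analysis, first removing candidates that have an exact match at any genomic position other than the original site, then finding all off-target sites with no more than 5 mismatched bases, and determining whether each off-target site lies in an exon or intron of a functional gene or in an intergenic region;
    4) building a scoring matrix and scoring all candidate sgRNA recognition-site guide sequences;
    5) tallying the scores of the guide sequences and selecting the 3 highest-scoring sgRNA recognition-site guide sequences in each protein-coding gene; when fewer than 3 guide sequences satisfy the maximum limit on the total score, changing the value of X in the pattern 5'-GNXGG-3', decreasing it stepwise from 20 to 16, and repeating steps 3)-5) until qualifying guide sequences are obtained; for genes with alternative splicing, searching first for guide sequences in the regions shared by the different splicing patterns and, if there are too few, using the non-overlapping regions to make up the number, so that the transcripts of every splice form are covered.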
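  2. The method according to claim 1, characterized in that the scoring matrix of step 4) is built as follows: first, a penalty is calculated for each off-target site of a candidate sgRNA recognition-site guide sequence: ① the penalty for a mismatch position within the sequence starts at 100% at the 5' end and decreases gradually to 0% at the 3' end; ② for multiple mismatches the penalties are multiplied, so that off-target sites with several mismatched bases receive lower scores; ③ off-target sites located in exons of functional genes, in introns, or in intergenic regions each receive an additional penalty; ④ the maximum score of a single off-target site is set to 1.5; second, the total score of the candidate guide sequence is calculated: ① the scores of all off-target sites are summed; ② a penalty of 10% of the total score is applied according to the position of the guide-sequence site as a percentage of the total CDS length of the gene, the closer the site is to the translation start, the higher the assumed editing efficiency and the smaller the penalty; ③ a maximum value for the total score of a selected guide sequence is set; third, the parameters of the algorithm are optimized according to the literature and experimental data for the target species.
  3. The method according to claim 1, characterized in that, before step 1), it further comprises the step of aligning the resequencing data of the target sample to the reference genome with SOAP, identifying and correcting the SNPs of the target sample with SOAPsnp, and obtaining the genome data used for the analysis.
  4. The method according to claim 1, characterized in that the sequenced pig genome in step 1) contains 21,630 protein-coding genes.
  5. The method according to claim 1, characterized in that 2,386 genes in step 1) have alternative splicing.
  6. Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing obtained by the screening method of any one of claims 1 to 5.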
PCT/CN2017/000248 2016-04-20 2017-03-22 Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them WO2017181735A2 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610248143.3 2016-04-20
CN201610248143.3A CN105886616B (zh) 2016-04-20 2016-04-20 Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them

Publications (1)

Publication Number Publication Date
WO2017181735A2 (zh) 2017-10-26

Family

ID=56705013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/000248 WO2017181735A2 (zh) 2016-04-20 2017-03-22 Highly efficient, specific sgRNA recognition-site guide sequences for pig gene editing and a method for screening them

Country Status (2)

Country Link
CN (1) CN105886616B (zh)
WO (1) WO2017181735A2 (zh)

Also Published As

Publication number Publication date
CN105886616B (zh) 2020-08-07
CN105886616A (zh) 2016-08-24

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17785223

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 17785223

Country of ref document: EP

Kind code of ref document: A2