CN106460041A - Detection of high variability regions between protein sequence sets representing a binary phenotype
- Publication number
- CN106460041A (application CN201580016184.3A)
- Authority
- CN
- China
- Prior art keywords
- motif
- data set
- data
- protein sequence
- phenotype
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioethics (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
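The invention provides a computer-based bioinformatics method for identifying protein sequence differences between sets of sequences that have been assigned to different phenotype data sets. The method involves querying a database to identify common sequence motifs within a first phenotype data set and another phenotype data set of protein sequences, calculating the pairwise correlation between the motifs of each data set, and calculating the variation between the data sets to identify one or more motifs that are conserved in a given data set and are therefore associated with the phenotype of that data set (Figure 1).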
Description
Cross-Reference to Related Applications
This application claims priority to U.S. Provisional Patent Application No. 61/970,287, filed March 25, 2014.
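Technical Field
The invention relates generally to methods and materials for computationally identifying regions of high variability between two sets of protein sequences that represent a binary phenotype, such as high-risk and low-risk human papillomavirus motifs from early gene proteins.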
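Background
An ongoing pursuit in the field of bioinformatics is the development of a framework for detecting sequence sites with high variability between two data sets of similar protein sequences that have different phenotypes.
For example, human papillomavirus (HPV), with more than 100 genotypes, is a highly complex group of human pathogenic viruses that nevertheless have relatively similar protein sequences. Oncogenic types of HPV can induce malignant transformation in the presence of cofactors. Indeed, more than 99% of all cervical cancers and the majority of genital cancers are the result of oncogenic HPV types. These HPV types have also been increasingly linked to other epithelial cancers involving the skin, larynx, and esophagus.
Research investigating HPV neoplasia is complicated by the inability to produce mature HPV virions efficiently in animal models. As a result, there is a persistent limitation on fully elucidating the oncogenic potential in HPV-infected cells. More generally, the ability to distinguish the different phenotypes of similar protein sequences would be very useful.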
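Summary of the Invention
The present disclosure relates to a new method for identifying sequence differences in binary phenotype data sets. For example, the method can be applied to detect potential therapeutic targets in high-risk HPV by examining conserved regions within the protein sequences of the HPV early genes and looking for their presence in known low-risk types.
Accordingly, in one embodiment, a computer-implemented bioinformatics method identifies protein sequence differences between sets of sequences assigned to different phenotype data sets. The method proceeds by: querying a database to identify common sequence motifs within a first phenotype data set and another phenotype data set of protein sequences; calculating the pairwise correlation between the motifs of each data set; and calculating the variation between the data sets to identify one or more motifs that are conserved in a given data set and are therefore associated with the phenotype of that data set.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The materials, methods, and examples are illustrative only and are not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and drawings, and from the claims.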
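Brief Description of the Drawings
Figure 1. Strategy for identifying motifs associated with high-risk HPV. High-risk motifs were identified using MEME on a training set of 13 high-risk reference sequences. These motifs were then applied to the set of 12 low-risk reference sequences using MAST to determine the frequency of occurrence of each motif in the two sets.
In addition, MAST and BLAST were used to search for these motifs in viral sequences in the NCBI protein database, in human ORFs, and in HPV types outside the two designated risk categories.
Figure 2. HPV protein map. The location of each significant site is highlighted within its respective gene. The locations of known conserved motifs in these HPV early genes were also determined; these conserved motifs were detected in this analysis but were filtered out because they were not significant for oncogenicity. They include the zinc-binding sites of E6 and E7, the pRB-binding site of E7, and the di-leucine motifs in the first domain of E5.
Figure 3 shows, in tabular form, the statistically significant motifs, their frequency in each data set, and their location in the gene and putative function. A chi-square test with Yates' correction yielded 10 statistically significant motifs out of the 112 motifs identified by MEME. These motifs were then queried individually against a data set of other HPV isolates not classified by risk, and the frequencies of these motifs are also shown in the table. The amino acid range of each motif in HPV16 and its putative function are indicated in the last two columns.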
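Detailed Description of the Invention
The computational methods used in this study allow for the detection of sequence sites with high variability between two data sets of similar protein sequences that have different phenotypes. In one embodiment, these methods are applied to the study of HPV.
Previously studied sequence comparison techniques examined the phylogeny of sequences within one set but were limited in revealing variation between sequences or data sets. For example, in the case of HPV, previous comparative genomics studies focused on one or two genes (mainly the known oncogenes E6 and E7) or investigated only a few HPV types at a time, typically HPV16, HPV18, and HPV45.
The bioinformatics methodology used herein provides a systematic, comprehensive, and unsupervised approach to determining regions of the HPV proteome that contribute to oncogenesis. Statistically significant motifs indicate variation between HR (high-risk) and LR (low-risk) types in their respective proteomic regions. These regions can then be regarded as sites that potentially contribute to oncogenesis, and they can be evaluated according to the putative function of the protein region. The method can also be generalized to identify variation between two different data sets.
The methods herein have the potential to be used as a discovery tool for HPV therapeutic targets. This serves as a precursor step to designing drugs that target the significant regions to prevent malignant transformation. Moreover, these procedures constitute a comprehensive and unbiased analysis that can be translated to investigate viruses other than HPV or different classes of proteins.
Embodiments will be further described in the following examples, which do not limit the scope of the invention described in the claims.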
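Examples
In one embodiment of the method, computational sequence analysis tools, such as MEME and MAST (meme.sdsc.edu/meme/intro.html), together with statistical analysis, are used to determine sequence motifs that are significant for HPV oncogenicity. MEME identifies motifs, which are short sequence features conserved within a data set of similar nucleotide or protein sequences. MAST is an alignment search tool that uses MEME output to search for these motifs in a user-defined database or a public knowledge source. Together with these techniques, a chi-square test with Yates' continuity correction is used to find significant motifs present in the two data sets.
Referring back to Figure 1, HPV protein reference sequences for the genes E1, E2, E4, E5, E6, E7, L1, and L2 of 13 high-risk and 12 low-risk types were retrieved from the NCBI Reference Sequence database (www.ncbi.nlm.nih.gov/RefSeq/). The high-risk data set comprised types HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68, while the low-risk group comprised types HPV6, 11, 40, 42, 43, 44, 53, 54, 61, 72, 73, and 81. The HPV51 reference sequence lacked gene annotations, and the HPV35 reference sequence had an erroneous protein output for E2. These two reference sequences were replaced with the whole-genome entries P26554 and P27220 from UniProtKB/Swiss-Prot.
In addition, because the E4 and E5 genes are only sparsely annotated in most reference sequence entries, their protein sequences were retrieved from the NIAID HPV database PaVe (pave.niaid.nih.gov), which includes revised and re-annotated submissions of selected reference sequences. As a result, only 12 of the 13 high-risk types and only 9 of the 12 low-risk types had a designated E5 gene in PaVe.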
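The two data sets described above can be written down as plain configuration and checked before any motif search is run. The following is a minimal sketch only; the names `HIGH_RISK_TYPES`, `LOW_RISK_TYPES`, `validate_datasets`, and `sequence_labels`, as well as the `HPV16_E6`-style label scheme, are assumptions made for illustration and do not appear in the source.

```python
from typing import Dict, List

# HPV type numbers and gene names copied from the data set description above.
HIGH_RISK_TYPES: List[int] = [16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68]
LOW_RISK_TYPES: List[int] = [6, 11, 40, 42, 43, 44, 53, 54, 61, 72, 73, 81]
GENES: List[str] = ["E1", "E2", "E4", "E5", "E6", "E7", "L1", "L2"]


def validate_datasets(high: List[int], low: List[int]) -> Dict[str, int]:
    """Check that the two phenotype sets are duplicate-free and disjoint."""
    if len(set(high)) != len(high) or len(set(low)) != len(low):
        raise ValueError("a type is listed twice within one data set")
    overlap = set(high) & set(low)
    if overlap:
        raise ValueError(f"types assigned to both phenotypes: {sorted(overlap)}")
    return {"high_risk": len(high), "low_risk": len(low)}


def sequence_labels(types: List[int], genes: List[str]) -> List[str]:
    """Build hypothetical per-protein labels such as 'HPV16_E6'."""
    return [f"HPV{t}_{g}" for t in types for g in genes]


sizes = validate_datasets(HIGH_RISK_TYPES, LOW_RISK_TYPES)
```

The sizes returned by `validate_datasets` should match the 13 high-risk and 12 low-risk types stated in the text.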
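To identify common sequence motifs within the HR HPV proteome, the MEME (Multiple Em for Motif Elicitation) program suite (meme.sdsc.edu/memecgibin/rneme.cgi) was used. For each gene, the 13 HR HPV types were evaluated with MEME, specifying a motif width of at least 6 and at most 10 amino acids. Motifs were allowed to repeat, and the maximum number of motifs was adjusted according to the size of the gene. The adjustment ensured that no two elicited motifs had a pairwise correlation greater than 0.60. This correlation was calculated from the MAST (Motif Alignment Search Tool) results generated from the MEME results. To determine the frequency of these motifs in the LR HPV types, a separate MAST search was performed on the 12 LR HPV types using the motifs identified in the HR HPV types. The motif frequency in each viral proteome was determined.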
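The MEME and MAST settings described above can be sketched as command lines for a local MEME Suite installation. This is an illustration under stated assumptions, not the authors' actual invocation: the source cites a web server, the choice of the `anr` (any number of repetitions) model is one reading of "motifs were allowed to repeat", and all file names, the `max_motifs` value, and the helper names are hypothetical. The commands are only assembled here, not executed.

```python
from typing import Dict, List, Tuple


def build_meme_command(fasta_path: str, out_dir: str, max_motifs: int,
                       min_width: int = 6, max_width: int = 10) -> List[str]:
    """Assemble a MEME call for protein motifs of width 6 to 10 with repeats allowed."""
    if min_width > max_width:
        raise ValueError("min_width must not exceed max_width")
    return ["meme", fasta_path, "-protein",
            "-mod", "anr",                  # assumption: repeats modeled with 'anr'
            "-nmotifs", str(max_motifs),    # tuned per gene size in the text
            "-minw", str(min_width), "-maxw", str(max_width),
            "-oc", out_dir]


def build_mast_command(meme_output: str, fasta_path: str, out_dir: str) -> List[str]:
    """Assemble a MAST call that searches sequences for motifs found by MEME."""
    return ["mast", meme_output, fasta_path, "-oc", out_dir]


def pairs_over_threshold(correlations: Dict[Tuple[str, str], float],
                         threshold: float = 0.60) -> List[Tuple[str, str]]:
    """Return motif pairs whose pairwise correlation exceeds the threshold."""
    return sorted(pair for pair, r in correlations.items() if r > threshold)


# Hypothetical file names for one gene.
meme_cmd = build_meme_command("hr_E6.fasta", "meme_E6", max_motifs=8)
mast_cmd = build_mast_command("meme_E6/meme.xml", "lr_E6.fasta", "mast_lr_E6")
```

If `pairs_over_threshold` returns any pair, the maximum number of motifs for that gene would be lowered and MEME rerun, which mirrors the adjustment described in the text.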
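To quantify the variation between the two sets (HR HPV and LR HPV), the frequency with which each individual high-risk motif occurs in the 12 LR HPV types was evaluated. The assumption here is that motifs preferentially conserved in HR HPV sequences, compared with LR HPV sequences, will have oncogenic potential. First, the presence of a motif in each type was identified, disregarding repeated occurrences. The number of HPV types with at least one occurrence of each motif was then totaled. To select HR HPV-specific motifs, a chi-square test with Yates' continuity correction was performed on the frequency of each motif between the two data sets. This conservative correction was applied to avoid overestimating statistical significance.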
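The presence counting described above (one count per HPV type, regardless of how often a motif repeats within that type) can be sketched as follows. The input format, a list of (type, motif) pairs parsed from MAST output, and the function name are assumptions for illustration; the sample hits are invented and are not results from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_types_with_motif(hits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each motif, the HPV types in which it occurs at least once."""
    types_by_motif: Dict[str, Set[str]] = defaultdict(set)
    for hpv_type, motif_id in hits:
        # A set ignores repeated occurrences of a motif within the same type.
        types_by_motif[motif_id].add(hpv_type)
    return {motif: len(types) for motif, types in types_by_motif.items()}


# Invented example: motif_1 occurs twice in HPV16 but is counted once.
example_hits = [
    ("HPV16", "motif_1"), ("HPV16", "motif_1"),
    ("HPV18", "motif_1"), ("HPV31", "motif_2"),
]
presence = count_types_with_motif(example_hits)
```

Running the same function separately on the high-risk and low-risk hits gives the two counts per motif that feed the chi-square test.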
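The significance test was set up under the null hypothesis (H0) that the frequency of a given motif in the high-risk data set is the same as in the low-risk data set. The null hypothesis is therefore rejected in favor of the alternative (H1) if the frequency of a given motif in the high-risk data set exceeds that in the low-risk data set. Using one degree of freedom (for a binary data set), a p-value was calculated for each motif (α = 0.05), and the p-values were then used to rank the motifs.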
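The test described above can be sketched for a single motif as a 2×2 table (motif present or absent, high-risk or low-risk) using the standard Yates-corrected statistic, chi2 = n(|ad − bc| − n/2)² / ((a+b)(c+d)(a+c)(b+d)). This is a minimal sketch under stated assumptions: the function names are hypothetical, the directional criterion is implemented as a simple frequency comparison on top of the two-sided p-value, and the counts in the example are invented rather than taken from Figure 3.

```python
import math
from typing import Dict, List, Tuple


def yates_chi_square(present_hr: int, total_hr: int,
                     present_lr: int, total_lr: int) -> Tuple[float, float]:
    """Return (chi2, p) for a 2x2 table with Yates' continuity correction, 1 df."""
    a, b = present_hr, total_hr - present_hr
    c, d = present_lr, total_lr - present_lr
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative and not exceed totals")
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    diff = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * diff ** 2 / denom
    # Survival function of the chi-square distribution with one degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def is_high_risk_specific(present_hr: int, total_hr: int, present_lr: int,
                          total_lr: int, alpha: float = 0.05) -> bool:
    """Significant at alpha and more frequent in the high-risk set."""
    _, p = yates_chi_square(present_hr, total_hr, present_lr, total_lr)
    return p < alpha and present_hr / total_hr > present_lr / total_lr


def rank_motifs(counts: Dict[str, Tuple[int, int]],
                total_hr: int, total_lr: int) -> List[Tuple[str, float]]:
    """Rank motifs by ascending p-value; counts map motif -> (HR count, LR count)."""
    scored = [(m, yates_chi_square(hr, total_hr, lr, total_lr)[1])
              for m, (hr, lr) in counts.items()]
    return sorted(scored, key=lambda item: item[1])


# Invented counts: a motif found in 13 of 13 HR types and 2 of 12 LR types.
chi2, p = yates_chi_square(13, 13, 2, 12)
```

A motif present in every type of both sets yields a degenerate table and is never flagged, which is consistent with the filtering of motifs that are conserved across both risk groups.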
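The method described above serves as a methodology for computationally identifying regions of higher variability between two sets of protein sequences that represent a binary phenotype, although evaluating more than two sets is possible. It is applied in particular to determining sequence factors in high-risk HPV that may be responsible for neoplasia. These sites could potentially be targets for therapeutic agents that prevent malignancies resulting from high-risk HPV infection. The procedure can be extrapolated to evaluate phenotypic differences between viruses and to investigate specific properties of similar proteins.
In the above examples, a non-transitory computer-readable storage medium containing a computer program for specifying the listed functionality may be used.
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the claims.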
Claims (7)
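1. A computer-implemented bioinformatics method for identifying protein sequence differences between sets of sequences assigned to different phenotype data sets, the method comprising:
querying a database to identify common sequence motifs within a first phenotype data set and another phenotype data set of protein sequences;
calculating the pairwise correlation between the motifs of each data set; and
calculating the variation between the data sets to identify one or more motifs that are conserved in a given data set and are therefore associated with the phenotype of that data set.
2. The method of claim 1, wherein the database comprises the Multiple EM for Motif Elicitation program suite.
3. The method of claim 1, wherein a motif width of a minimum of six amino acids and a maximum of ten amino acids is specified.
4. The method of claim 1, wherein the pairwise correlation is calculated by the Motif Alignment Search Tool.
5. The method of claim 1, wherein the variation in the frequency of each motif between two data sets is calculated by a chi-square test with Yates' continuity correction.
6. The method of claim 1, wherein oncogenicity is one of the phenotype data sets.
7. A computer-implemented bioinformatics method for identifying protein sequence differences between sets of human papillomavirus sequences assigned to different phenotype data sets, the method comprising:
querying a database to identify common sequence motifs within a first phenotype data set and another phenotype data set of protein sequences;
calculating the pairwise correlation between the motifs of each data set; and
calculating the variation between the data sets to identify one or more motifs that are conserved in a given data set and are therefore associated with the phenotype of that data set.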
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201461970287P | 2014-03-25 | 2014-03-25 | |
US61/970,287 | 2014-03-25 | ||
PCT/US2015/021262 WO2015148216A1 (en) | 2014-03-25 | 2015-03-18 | Detection of high variability regions between protein sequence sets representing a binary phenotype |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106460041A (zh) | 2017-02-22 |
Family
ID=54196238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580016184.3A Pending CN106460041A (zh) | Detection of high variability regions between protein sequence sets representing a binary phenotype | 2015-03-18 | 2015-03-18 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20170177788A1 (zh) |
EP (1) | EP3122904A4 (zh) |
JP (1) | JP2017514213A (zh) |
CN (1) | CN106460041A (zh) |
CA (1) | CA2942923A1 (zh) |
WO (1) | WO2015148216A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018013579A1 (en) | 2016-07-11 | 2018-01-18 | Arizona Board Of Regents On Behalf Of Arizona State University | Sweat as a biofluid for analysis and disease identification |
US11208640B2 (en) | 2017-07-21 | 2021-12-28 | Arizona Board Of Regents On Behalf Of Arizona State University | Modulating human Cas9-specific host immune response |
US11524063B2 (en) | 2017-11-15 | 2022-12-13 | Arizona Board Of Regents On Behalf Of Arizona State University | Materials and methods relating to immunogenic epitopes from human papillomavirus |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102485904A (zh) * | 2010-12-03 | 2012-06-06 | 浙江中医药大学附属第一医院 | Method for predicting mammalian microRNA genes |
-
2015
- 2015-03-18 CN CN201580016184.3A patent/CN106460041A/zh active Pending
- 2015-03-18 JP JP2016558213A patent/JP2017514213A/ja active Pending
- 2015-03-18 CA CA2942923A patent/CA2942923A1/en not_active Abandoned
- 2015-03-18 WO PCT/US2015/021262 patent/WO2015148216A1/en active Application Filing
- 2015-03-18 EP EP15768463.0A patent/EP3122904A4/en not_active Withdrawn
- 2015-03-18 US US15/128,405 patent/US20170177788A1/en not_active Abandoned
Non-Patent Citations (3)
Title |
---|
PAUL K.S. CHAN et al.: "Geographical distribution and oncogenic risk association of human papillomavirus type 58 E6 and E7 sequence variations", INT. J. CANCER * |
TIMOTHY L. BAILEY et al.: "MEME: discovering and analyzing DNA and protein sequence motifs", NUCLEIC ACIDS RESEARCH * |
WILLIAM DAMPIER et al.: "Host sequence motifs shared by HIV predict response to antiretroviral therapy", BMC MEDICAL GENOMICS * |
Also Published As
Publication number | Publication date |
---|---|
EP3122904A1 (en) | 2017-02-01 |
CA2942923A1 (en) | 2015-10-01 |
US20170177788A1 (en) | 2017-06-22 |
WO2015148216A1 (en) | 2015-10-01 |
EP3122904A4 (en) | 2017-11-22 |
JP2017514213A (ja) | 2017-06-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170222 |