CN111095421B - Context aware delta algorithm for genomic files - Google Patents

Context aware delta algorithm for genomic files

Info

Publication number
CN111095421B
Authority
CN
China
Prior art keywords
file
genome
source
delta compression
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880054764.5A
Other languages
English (en)
Chinese (zh)
Other versions
CN111095421A (zh)
Inventor
A. Maharana
M. C. Constantinescu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN111095421A
Application granted
Publication of CN111095421B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN201880054764.5A 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files Active CN111095421B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/693,019 2017-08-31
US15/693,019 US11163726B2 (en) 2017-08-31 2017-08-31 Context aware delta algorithm for genomic files
PCT/IB2018/056009 WO2019043481A1 (en) 2017-08-31 2018-08-09 DELTA ALGORITHM SENSITIVE TO THE CONTEXT FOR GENOMIC FILES

Publications (2)

Publication Number Publication Date
CN111095421A (zh) 2020-05-01
CN111095421B (zh) 2024-02-02

Family

ID=65435154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880054764.5A Active CN111095421B (zh) 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files

Country Status (5)

Country Link
US (1) US11163726B2
JP (1) JP7157141B2
CN (1) CN111095421B
GB (1) GB2578709B
WO (1) WO2019043481A1

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10554220B1 (en) * 2019-01-30 2020-02-04 International Business Machines Corporation Managing compression and storage of genomic data
US11188503B2 (en) * 2020-02-18 2021-11-30 International Business Machines Corporation Record-based matching in data compression
CN113012755B (zh) * 2021-04-12 2023-10-27 Liaocheng University Retrieval method for genome ATCG
US12580047B2 (en) 2022-01-18 2026-03-17 Dell Products L.P. Biological sequence compression using sequence alignment
US12530320B2 (en) 2022-01-18 2026-01-20 Dell Products L.P. File compression using sequence splits and sequence alignment
US12572509B2 (en) 2022-01-18 2026-03-10 Dell Products L.P. Structure based file compression using sequence alignment
US12511260B2 (en) * 2022-01-18 2025-12-30 Dell Products L.P. File compression using sequence alignment
US12353358B2 (en) 2022-01-18 2025-07-08 Dell Products L.P. Adding content to compressed files using sequence alignment
US12339811B2 (en) 2022-04-12 2025-06-24 Dell Products L.P. Compressing multiple dimension files using sequence alignment
US11977517B2 (en) 2022-04-12 2024-05-07 Dell Products L.P. Warm start file compression using sequence alignment
US12216621B2 (en) * 2022-04-12 2025-02-04 Dell Products L.P. Hyperparameter optimization in file compression using sequence alignment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
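CN1680589A (zh) * 2003-06-06 2005-10-12 Li Zhiguang Screening of human leukocyte antigen typing probes for gene chips and method of application thereof
CN101535945A (zh) * 2006-04-25 2009-09-16 Infovell Inc Full text query and search systems and methods of use
CN106687966A (zh) * 2014-08-05 2017-05-17 Illumina Cambridge Limited Methods and systems for data analysis and compression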

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040086861A1 (en) * 2000-04-19 2004-05-06 Satoshi Omori Method and device for recording sequence information on nucleotides and amino acids
US7657383B2 (en) * 2004-05-28 2010-02-02 International Business Machines Corporation Method, system, and apparatus for compactly storing a subject genome
WO2006052242A1 (en) * 2004-11-08 2006-05-18 Seirad, Inc. Methods and systems for compressing and comparing genomic data
US20110119240A1 (en) * 2009-11-18 2011-05-19 Dana Shapira Method and system for generating a bidirectional delta file
WO2011106629A2 (en) 2010-02-26 2011-09-01 Life Technologies Corporation Modified proteins and methods of making and using same
WO2012092515A2 (en) 2010-12-30 2012-07-05 Life Technologies Corporation Methods, systems, and computer readable media for nucleic acid sequencing
EP2595076B1 (en) * 2011-11-18 2019-05-15 Tata Consultancy Services Limited Compression of genomic data
US9715574B2 (en) * 2011-12-20 2017-07-25 Michael H. Baym Compressing, storing and searching sequence data
EP2608096B1 (en) 2011-12-24 2020-08-05 Tata Consultancy Services Ltd. Compression of genomic data file
GB2507751A (en) * 2012-11-07 2014-05-14 Ibm Storing data files in a file system which provides reference data files
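CN103546160B (zh) 2013-09-22 2016-07-06 Shanghai Jiao Tong University Hierarchical compression method for gene sequences based on multiple reference sequences
CN104699998A (zh) 2013-12-06 2015-06-10 International Business Machines Corp Method and apparatus for compressing and decompressing a genome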
NL2012222C2 (en) * 2014-02-06 2015-08-10 Genalice B V A method of storing/reconstructing a multitude of sequences in/from a data storage structure.

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
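Comparison of four commonly used biological sequence alignment software packages; Chen Fengzhen; Li Ling; Cao Lichao; Yan Zhixiang; Bioinformatics (No. 01); full text *
Tetrahymena functional genomics database incremental update 2016: construction of life-history and meiosis transcriptome and phosphoproteome resources; Yang Wentao; Zhang Jing; Yan Guanxiong; Tian Miao; Yuan Dongxia; Miao Wei; Zeng Honghui; Xiong Jie; Genomics and Applied Biology (No. 06); full text *
Predicting yeast nucleosome positioning using increment of diversity combined with a support vector machine; Zhao Xiujuan; Pei Zhiyong; Liu Jia; Cai Lu; Acta Biophysica Sinica (No. 05); full text *
Data integration and updating in semantically heterogeneous biological data sources; Yang Sen; Xia Yan; Cao Shunliang; Deng Xubin; Zhu Yangyong; Computer Engineering (No. 08); full text *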

Also Published As

Publication number Publication date
CN111095421A (zh) 2020-05-01
US20190065518A1 (en) 2019-02-28
JP2020533666A (ja) 2020-11-19
GB2578709B (en) 2020-09-23
JP7157141B2 (ja) 2022-10-19
GB2578709A (en) 2020-05-20
GB202003514D0 (en) 2020-04-29
WO2019043481A1 (en) 2019-03-07
US11163726B2 (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN111095421B (zh) Context aware delta algorithm for genomic files
US11550826B2 (en) Method and system for generating a geocode trie and facilitating reverse geocode lookups
US9842132B2 (en) Bloom filter index for device discovery
US20160321254A1 (en) Unsolicited bulk email detection using url tree hashes
US10216802B2 (en) Presenting answers from concept-based representation of a topic oriented pipeline
US10380257B2 (en) Generating answers from concept-based representation of a topic oriented pipeline
US10031764B2 (en) Managing executable files
KR20210118099A (ko) Vector string search instruction
US10970249B2 (en) Format aware file system with file-to-object decomposition
US20230401457A1 (en) Data facet generation and recommendation
US20190288967A1 (en) Lossy text source coding by word length
US10318262B2 (en) Smart hashing to reduce server memory usage in a distributed system
US11290532B2 (en) Tape reconstruction from object storage
US10162934B2 (en) Data de-duplication system using genome formats conversion
US11157477B2 (en) Handling queries in document systems using segment differential based document text-index modelling
US20170220584A1 (en) Identifying Linguistically Related Content for Corpus Expansion Management
CN111625615A (zh) Text extraction and processing
EP4577925A1 (en) Real-time resolution in identity graph data structures
CN112988778A (zh) Method and apparatus for processing database query scripts
US10404274B2 (en) Space compression for file size reduction
US11681865B2 (en) Annotating a log based on log documentation
US11188503B2 (en) Record-based matching in data compression
US11177824B2 (en) Dictionary embedded expansion procedure
US20230102594A1 (en) Code page tracking and use for indexing and searching
US20210082581A1 (en) Determining novelty of a clinical trial against an existing trial corpus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant