GB2578709B - Context aware delta algorithm for genomic files - Google Patents

Context aware delta algorithm for genomic files

Info

Publication number
GB2578709B
Authority
GB
United Kingdom
Prior art keywords
context aware
delta algorithm
genomic
files
genomic files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
GB2003514.3A
Other languages
English (en)
Other versions
GB2578709A (en)
GB202003514D0 (en)
Inventor
Maharana Adyasha
Corneliu Constantinescu Mihail
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of GB202003514D0
Publication of GB2578709A
Application granted
Publication of GB2578709B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
GB2003514.3A 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files Active GB2578709B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/693,019 US11163726B2 (en) 2017-08-31 2017-08-31 Context aware delta algorithm for genomic files
PCT/IB2018/056009 WO2019043481A1 (en) 2017-08-31 2018-08-09 DELTA ALGORITHM SENSITIVE TO THE CONTEXT FOR GENOMIC FILES

Publications (3)

Publication Number Publication Date
GB202003514D0 GB202003514D0 (en) 2020-04-29
GB2578709A GB2578709A (en) 2020-05-20
GB2578709B GB2578709B (en) 2020-09-23

Family

ID=65435154

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2003514.3A Active GB2578709B (en) 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files

Country Status (5)

Country Link
US (1) US11163726B2
JP (1) JP7157141B2
CN (1) CN111095421B
GB (1) GB2578709B
WO (1) WO2019043481A1

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10554220B1 (en) * 2019-01-30 2020-02-04 International Business Machines Corporation Managing compression and storage of genomic data
US11188503B2 (en) * 2020-02-18 2021-11-30 International Business Machines Corporation Record-based matching in data compression
CN113012755B (zh) * 2021-04-12 2023-10-27 聊城大学 Retrieval method for genome ATCG
US12580047B2 (en) 2022-01-18 2026-03-17 Dell Products L.P. Biological sequence compression using sequence alignment
US12530320B2 (en) 2022-01-18 2026-01-20 Dell Products L.P. File compression using sequence splits and sequence alignment
US12572509B2 (en) 2022-01-18 2026-03-10 Dell Products L.P. Structure based file compression using sequence alignment
US12511260B2 (en) * 2022-01-18 2025-12-30 Dell Products L.P. File compression using sequence alignment
US12353358B2 (en) 2022-01-18 2025-07-08 Dell Products L.P. Adding content to compressed files using sequence alignment
US12339811B2 (en) 2022-04-12 2025-06-24 Dell Products L.P. Compressing multiple dimension files using sequence alignment
US11977517B2 (en) 2022-04-12 2024-05-07 Dell Products L.P. Warm start file compression using sequence alignment
US12216621B2 (en) * 2022-04-12 2025-02-04 Dell Products L.P. Hyperparameter optimization in file compression using sequence alignment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103546160A (zh) * 2013-09-22 2014-01-29 上海交通大学 Hierarchical gene sequence compression method based on multiple reference sequences
US8972201B2 (en) * 2011-12-24 2015-03-03 Tata Consultancy Services Limited Compression of genomic data file
US20160306919A1 (en) * 2013-12-06 2016-10-20 International Business Machines Corporation Genome compression and decompression

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US20040086861A1 (en) * 2000-04-19 2004-05-06 Satoshi Omori Method and device for recording sequence information on nucleotides and amino acids
CN1680589A (zh) * 2003-06-06 2005-10-12 李志广 Screening of human leukocyte antigen typing probes for gene chips and method of application thereof
US7657383B2 (en) * 2004-05-28 2010-02-02 International Business Machines Corporation Method, system, and apparatus for compactly storing a subject genome
WO2006052242A1 (en) * 2004-11-08 2006-05-18 Seirad, Inc. Methods and systems for compressing and comparing genomic data
CN101535945A (zh) * 2006-04-25 2009-09-16 英孚威尔公司 Full-text query and search system and method of use thereof
US20110119240A1 (en) * 2009-11-18 2011-05-19 Dana Shapira Method and system for generating a bidirectional delta file
WO2011106629A2 (en) 2010-02-26 2011-09-01 Life Technologies Corporation Modified proteins and methods of making and using same
WO2012092515A2 (en) 2010-12-30 2012-07-05 Life Technologies Corporation Methods, systems, and computer readable media for nucleic acid sequencing
EP2595076B1 (en) * 2011-11-18 2019-05-15 Tata Consultancy Services Limited Compression of genomic data
US9715574B2 (en) * 2011-12-20 2017-07-25 Michael H. Baym Compressing, storing and searching sequence data
GB2507751A (en) * 2012-11-07 2014-05-14 Ibm Storing data files in a file system which provides reference data files
NL2012222C2 (en) * 2014-02-06 2015-08-10 Genalice B V A method of storing/reconstructing a multitude of sequences in/from a data storage structure.
GB2530012A (en) * 2014-08-05 2016-03-16 Illumina Cambridge Ltd Methods and systems for data analysis and compression

Also Published As

Publication number Publication date
CN111095421A (zh) 2020-05-01
US20190065518A1 (en) 2019-02-28
JP2020533666A (ja) 2020-11-19
CN111095421B (zh) 2024-02-02
JP7157141B2 (ja) 2022-10-19
GB2578709A (en) 2020-05-20
GB202003514D0 (en) 2020-04-29
WO2019043481A1 (en) 2019-03-07
US11163726B2 (en) 2021-11-02

Similar Documents

Publication Publication Date Title
GB2578709B (en) Context aware delta algorithm for genomic files
EP3237017A4 (en) Systems and methods for genome modification and regulation
EP3274813A4 (en) Access files
EP3241301A4 (en) Encrypted file storage
EP3132025A4 (en) Methods and compositions for modifying genomic dna
EP3157527A4 (en) Ezh2 inhibitors for treating lymphoma
EP3157536A4 (en) Methods for treating overweight or obesity
ZA201800782B (en) Compounds and methods for inhibiting jak
EP3117004A4 (en) Genomic insulator elements and uses thereof
EP3120278A4 (en) Methods and systems for genome comparison
EP3107902A4 (en) Compounds and methods for inhibiting fascin
EP3304275A4 (en) PROTECTION OF DATA FILES
PT3183295T (pt) Fractionated alkylated cyclodextrin compositions and processes for preparing and using the same
EP3164394A4 (en) Gls1 inhibitors for treating disease
EP3165625A4 (en) Wire material for steel wire, and steel wire
GB201711552D0 (en) Secure file transfer
AP2016009570A0 (en) Triaminopyrimidine compounds useful for preventing or treating malaria
EP3230188A4 (en) Evacuation controller
EP3224155A4 (en) Non-slip cable tie
EP3206347A4 (en) Method for calling routing algorithm, sdn controller, and sdn-oaf
EP3168190A4 (en) Method for purifying chlorosilane
EP3250218A4 (en) Methods for treating obesity
EP3244896A4 (en) Methods for treating pulmonary hypertension
EP3168973A4 (en) Cross regulation circuit for multiple outputs and cross regulation method thereof
EP3263608A4 (en) Method for catalyst removal

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20201123