JP7157141B2 - Context aware delta algorithm for genomic files - Google Patents

Context aware delta algorithm for genomic files

Info

Publication number
JP7157141B2
Authority
JP
Japan
Prior art keywords
file
genome
files
source
based differential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2020509515A
Other languages
English (en)
Japanese (ja)
Other versions
JP2020533666A (ja)
JP2020533666A5
Inventor
マハラナ、アジャシャ
コンスタンチネスキュ、ミハイル、コルネリウ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of JP2020533666A
Publication of JP2020533666A5
Application granted
Publication of JP7157141B2
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
JP2020509515A 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files Active JP7157141B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/693,019 2017-08-31
US15/693,019 US11163726B2 (en) 2017-08-31 2017-08-31 Context aware delta algorithm for genomic files
PCT/IB2018/056009 WO2019043481A1 (en) 2017-08-31 2018-08-09 DELTA ALGORITHM SENSITIVE TO THE CONTEXT FOR GENOMIC FILES

Publications (3)

Publication Number Publication Date
JP2020533666A (ja) 2020-11-19
JP2020533666A5 2021-02-12
JP7157141B2 (ja) 2022-10-19

Family

ID=65435154

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2020509515A Active JP7157141B2 (ja) 2017-08-31 2018-08-09 Context aware delta algorithm for genomic files

Country Status (5)

Country Link
US (1) US11163726B2
JP (1) JP7157141B2
CN (1) CN111095421B
GB (1) GB2578709B
WO (1) WO2019043481A1

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10554220B1 (en) * 2019-01-30 2020-02-04 International Business Machines Corporation Managing compression and storage of genomic data
US11188503B2 (en) * 2020-02-18 2021-11-30 International Business Machines Corporation Record-based matching in data compression
CN113012755B (zh) * 2021-04-12 2023-10-27 聊城大学 Retrieval method for genome ATCG
US12580047B2 (en) 2022-01-18 2026-03-17 Dell Products L.P. Biological sequence compression using sequence alignment
US12530320B2 (en) 2022-01-18 2026-01-20 Dell Products L.P. File compression using sequence splits and sequence alignment
US12572509B2 (en) 2022-01-18 2026-03-10 Dell Products L.P. Structure based file compression using sequence alignment
US12511260B2 (en) * 2022-01-18 2025-12-30 Dell Products L.P. File compression using sequence alignment
US12353358B2 (en) 2022-01-18 2025-07-08 Dell Products L.P. Adding content to compressed files using sequence alignment
US12339811B2 (en) 2022-04-12 2025-06-24 Dell Products L.P. Compressing multiple dimension files using sequence alignment
US11977517B2 (en) 2022-04-12 2024-05-07 Dell Products L.P. Warm start file compression using sequence alignment
US12216621B2 (en) * 2022-04-12 2025-02-04 Dell Products L.P. Hyperparameter optimization in file compression using sequence alignment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080077607A1 (en) 2004-11-08 2008-03-27 Seirad Inc. Methods and Systems for Compressing and Comparing Genomic Data
US20110119240A1 (en) 2009-11-18 2011-05-19 Dana Shapira Method and system for generating a bidirectional delta file
US20130132353A1 (en) 2011-11-18 2013-05-23 Tata Consultancy Services Limited Compression Of Genomic Data

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040086861A1 (en) * 2000-04-19 2004-05-06 Satoshi Omori Method and device for recording sequence information on nucleotides and amino acids
CN1680589A (zh) * 2003-06-06 2005-10-12 李志广 Screening of human leukocyte antigen typing probes for gene chips and method of applying the same
US7657383B2 (en) * 2004-05-28 2010-02-02 International Business Machines Corporation Method, system, and apparatus for compactly storing a subject genome
CN101535945A (zh) * 2006-04-25 2009-09-16 英孚威尔公司 Full-text query and search system and method of using the same
WO2011106629A2 (en) 2010-02-26 2011-09-01 Life Technologies Corporation Modified proteins and methods of making and using same
WO2012092515A2 (en) 2010-12-30 2012-07-05 Life Technologies Corporation Methods, systems, and computer readable media for nucleic acid sequencing
US9715574B2 (en) * 2011-12-20 2017-07-25 Michael H. Baym Compressing, storing and searching sequence data
EP2608096B1 (en) 2011-12-24 2020-08-05 Tata Consultancy Services Ltd. Compression of genomic data file
GB2507751A (en) * 2012-11-07 2014-05-14 Ibm Storing data files in a file system which provides reference data files
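CN103546160B (zh) 2013-09-22 2016-07-06 上海交通大学 Hierarchical gene sequence compression method based on multiple reference sequences
CN104699998A (zh) 2013-12-06 2015-06-10 国际商业机器公司 Method and apparatus for compressing and decompressing a genome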
NL2012222C2 (en) * 2014-02-06 2015-08-10 Genalice B V A method of storing/reconstructing a multitude of sequences in/from a data storage structure.
GB2530012A (en) * 2014-08-05 2016-03-16 Illumina Cambridge Ltd Methods and systems for data analysis and compression

Also Published As

Publication number Publication date
CN111095421A (zh) 2020-05-01
US20190065518A1 (en) 2019-02-28
JP2020533666A (ja) 2020-11-19
GB2578709B (en) 2020-09-23
CN111095421B (zh) 2024-02-02
GB2578709A (en) 2020-05-20
GB202003514D0 (en) 2020-04-29
WO2019043481A1 (en) 2019-03-07
US11163726B2 (en) 2021-11-02

Similar Documents

Publication Publication Date Title
JP7157141B2 (ja) Context aware delta algorithm for genomic files
US10540383B2 (en) Automatic ontology generation
US20200183986A1 (en) Method and system for document similarity analysis
US10216802B2 (en) Presenting answers from concept-based representation of a topic oriented pipeline
US20160321254A1 (en) Unsolicited bulk email detection using url tree hashes
US9876853B2 (en) Storlet workflow optimization leveraging clustered file system placement optimization features
US10380257B2 (en) Generating answers from concept-based representation of a topic oriented pipeline
US20230315883A1 (en) Method to privately determine data intersection
US10031764B2 (en) Managing executable files
US10884986B2 (en) Analyzing and correcting corruption which caused filesystem checker failure so that the filesystem checker will run without error
US20230401457A1 (en) Data facet generation and recommendation
US10970249B2 (en) Format aware file system with file-to-object decomposition
US11290532B2 (en) Tape reconstruction from object storage
US20170220584A1 (en) Identifying Linguistically Related Content for Corpus Expansion Management
US11429579B2 (en) Building a word embedding model to capture relational data semantics
US10162934B2 (en) Data de-duplication system using genome formats conversion
US11157477B2 (en) Handling queries in document systems using segment differential based document text-index modelling
US12019645B2 (en) Record management in time series database
US10404274B2 (en) Space compression for file size reduction
US11177824B2 (en) Dictionary embedded expansion procedure
CN114461845A (zh) Information query method, apparatus, medium, and computing device
US20190258705A1 (en) Applying Matching Data Transformation Information Based on a User's Editing of Data within a Document
US11797561B2 (en) Reducing character set conversion
US20230102594A1 (en) Code page tracking and use for indexing and searching
US7386570B2 (en) Method, system and program product for providing high performance data lookup

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20201228

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20210122

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20220118

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20220324

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20220426

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20220502

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20220614

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20220927

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20221006

R150 Certificate of patent or registration of utility model

Ref document number: 7157141

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150