CN111095421B - Context-aware delta algorithm for genomic files - Google Patents
Context-aware delta algorithm for genomic files
- Publication number
- CN111095421B (Application No. CN201880054764.5A)
- Authority
- CN
- China
- Prior art keywords
- file
- genome
- source
- delta compression
- files
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (3)
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| US15/693,019 | | 2017-08-31 | | |
| US15/693,019 | US11163726B2 (en) | 2017-08-31 | 2017-08-31 | Context aware delta algorithm for genomic files |
| PCT/IB2018/056009 | WO2019043481A1 (en) | 2017-08-31 | 2018-08-09 | DELTA ALGORITHM SENSITIVE TO THE CONTEXT FOR GENOMIC FILES |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111095421A (zh) | 2020-05-01 |
| CN111095421B (zh) | 2024-02-02 |
Family ID: 65435154
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201880054764.5A | Active | CN111095421B (zh) | 2017-08-31 | 2018-08-09 | Context-aware delta algorithm for genomic files |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11163726B2 |
| JP (1) | JP7157141B2 |
| CN (1) | CN111095421B |
| GB (1) | GB2578709B |
| WO (1) | WO2019043481A1 |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10554220B1 (en) * | 2019-01-30 | 2020-02-04 | International Business Machines Corporation | Managing compression and storage of genomic data |
| US11188503B2 (en) * | 2020-02-18 | 2021-11-30 | International Business Machines Corporation | Record-based matching in data compression |
| CN113012755B (zh) * | 2021-04-12 | 2023-10-27 | Liaocheng University | Retrieval method for genome ATCG sequences |
| US12580047B2 (en) | 2022-01-18 | 2026-03-17 | Dell Products L.P. | Biological sequence compression using sequence alignment |
| US12530320B2 (en) | 2022-01-18 | 2026-01-20 | Dell Products L.P. | File compression using sequence splits and sequence alignment |
| US12572509B2 (en) | 2022-01-18 | 2026-03-10 | Dell Products L.P. | Structure based file compression using sequence alignment |
| US12511260B2 (en) * | 2022-01-18 | 2025-12-30 | Dell Products L.P. | File compression using sequence alignment |
| US12353358B2 (en) | 2022-01-18 | 2025-07-08 | Dell Products L.P. | Adding content to compressed files using sequence alignment |
| US12339811B2 (en) | 2022-04-12 | 2025-06-24 | Dell Products L.P. | Compressing multiple dimension files using sequence alignment |
| US11977517B2 (en) | 2022-04-12 | 2024-05-07 | Dell Products L.P. | Warm start file compression using sequence alignment |
| US12216621B2 (en) * | 2022-04-12 | 2025-02-04 | Dell Products L.P. | Hyperparameter optimization in file compression using sequence alignment |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
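| CN1680589A (zh) * | 2003-06-06 | 2005-10-12 | Li Zhiguang | Screening of human leukocyte antigen typing probes for gene chips and method of application thereof |
| CN101535945A (zh) * | 2006-04-25 | 2009-09-16 | Infovell, Inc. | Full-text query and search systems and methods of use |
| CN106687966A (zh) * | 2014-08-05 | 2017-05-17 | Illumina Cambridge Ltd. | Methods and systems for data analysis and compression |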
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040086861A1 (en) * | 2000-04-19 | 2004-05-06 | Satoshi Omori | Method and device for recording sequence information on nucleotides and amino acids |
| US7657383B2 (en) * | 2004-05-28 | 2010-02-02 | International Business Machines Corporation | Method, system, and apparatus for compactly storing a subject genome |
| WO2006052242A1 (en) * | 2004-11-08 | 2006-05-18 | Seirad, Inc. | Methods and systems for compressing and comparing genomic data |
| US20110119240A1 (en) * | 2009-11-18 | 2011-05-19 | Dana Shapira | Method and system for generating a bidirectional delta file |
| WO2011106629A2 (en) | 2010-02-26 | 2011-09-01 | Life Technologies Corporation | Modified proteins and methods of making and using same |
| WO2012092515A2 (en) | 2010-12-30 | 2012-07-05 | Life Technologies Corporation | Methods, systems, and computer readable media for nucleic acid sequencing |
| EP2595076B1 (en) * | 2011-11-18 | 2019-05-15 | Tata Consultancy Services Limited | Compression of genomic data |
| US9715574B2 (en) * | 2011-12-20 | 2017-07-25 | Michael H. Baym | Compressing, storing and searching sequence data |
| EP2608096B1 (en) | 2011-12-24 | 2020-08-05 | Tata Consultancy Services Ltd. | Compression of genomic data file |
| GB2507751A (en) * | 2012-11-07 | 2014-05-14 | Ibm | Storing data files in a file system which provides reference data files |
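| CN103546160B (zh) | 2013-09-22 | 2016-07-06 | Shanghai Jiao Tong University | Hierarchical gene sequence compression method based on multiple reference sequences |
| CN104699998A (zh) | 2013-12-06 | 2015-06-10 | International Business Machines Corporation | Method and apparatus for compressing and decompressing a genome |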
| NL2012222C2 (en) * | 2014-02-06 | 2015-08-10 | Genalice B V | A method of storing/reconstructing a multitude of sequences in/from a data storage structure. |
2017
- 2017-08-31 US US15/693,019 (US11163726B2): Active
2018
- 2018-08-09 JP JP2020509515A (JP7157141B2): Active
- 2018-08-09 GB GB2003514.3A (GB2578709B): Active
- 2018-08-09 WO PCT/IB2018/056009 (WO2019043481A1): Ceased
- 2018-08-09 CN CN201880054764.5A (CN111095421B): Active
Non-Patent Citations (4)
| Title |
|---|
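| Comparison of four commonly used biological sequence alignment software packages; Chen Fengzhen, Li Ling, Cao Lichao, Yan Zhixiang; 生物信息学 [Bioinformatics], No. 01; full text * |
| Incremental update of the Tetrahymena Functional Genomics Database 2016: construction of life-cycle and meiosis transcriptome and phosphoproteome resources; Yang Wentao, Zhang Jing, Yan Guanxiong, Tian Miao, Yuan Dongxia, Miao Wei, Zeng Honghui, Xiong Jie; 基因组学与应用生物学 [Genomics and Applied Biology], No. 06; full text * |
| Prediction of yeast nucleosome positioning by increment of diversity combined with a support vector machine; Zhao Xiujuan, Pei Zhiyong, Liu Jia, Cai Lu; 生物物理学报 [Acta Biophysica Sinica], No. 05; full text * |
| Data integration and updating in semantically heterogeneous biological data sources; Yang Sen, Xia Yan, Cao Shunliang, Deng Xubin, Zhu Yangyong; 计算机工程 [Computer Engineering], No. 08; full text * |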
Also Published As
| Publication number | Publication date |
|---|---|
| CN111095421A (zh) | 2020-05-01 |
| US20190065518A1 (en) | 2019-02-28 |
| JP2020533666A (ja) | 2020-11-19 |
| GB2578709B (en) | 2020-09-23 |
| JP7157141B2 (ja) | 2022-10-19 |
| GB2578709A (en) | 2020-05-20 |
| GB202003514D0 (en) | 2020-04-29 |
| WO2019043481A1 (en) | 2019-03-07 |
| US11163726B2 (en) | 2021-11-02 |
Similar Documents
| Publication | Title |
|---|---|
| CN111095421B (zh) | Context-aware delta algorithm for genomic files |
| US11550826B2 (en) | Method and system for generating a geocode trie and facilitating reverse geocode lookups |
| US9842132B2 (en) | Bloom filter index for device discovery |
| US20160321254A1 (en) | Unsolicited bulk email detection using url tree hashes |
| US10216802B2 (en) | Presenting answers from concept-based representation of a topic oriented pipeline |
| US10380257B2 (en) | Generating answers from concept-based representation of a topic oriented pipeline |
| US10031764B2 (en) | Managing executable files |
| KR20210118099A (ko) | Vector string search instruction |
| US10970249B2 (en) | Format aware file system with file-to-object decomposition |
| US20230401457A1 (en) | Data facet generation and recommendation |
| US20190288967A1 (en) | Lossy text source coding by word length |
| US10318262B2 (en) | Smart hashing to reduce server memory usage in a distributed system |
| US11290532B2 (en) | Tape reconstruction from object storage |
| US10162934B2 (en) | Data de-duplication system using genome formats conversion |
| US11157477B2 (en) | Handling queries in document systems using segment differential based document text-index modelling |
| US20170220584A1 (en) | Identifying Linguistically Related Content for Corpus Expansion Management |
| CN111625615A (zh) | Text extraction and processing |
| EP4577925A1 (en) | Real-time resolution in identity graph data structures |
| CN112988778A (zh) | Method and apparatus for processing database query scripts |
| US10404274B2 (en) | Space compression for file size reduction |
| US11681865B2 (en) | Annotating a log based on log documentation |
| US11188503B2 (en) | Record-based matching in data compression |
| US11177824B2 (en) | Dictionary embedded expansion procedure |
| US20230102594A1 (en) | Code page tracking and use for indexing and searching |
| US20210082581A1 (en) | Determining novelty of a clinical trial against an existing trial corpus |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |