CA3126012A1 - Method and system for content agnostic file indexing - Google Patents
- Publication number
- CA3126012A1
- Authority
- CA
- Canada
- Prior art keywords
- chunks
- binary data
- chunk
- data file
- pregenerated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6052—Synchronisation of encoder and decoder
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/244,332 | 2019-01-10 | | |
US16/244,332 US11138152B2 (en) | 2017-10-11 | 2019-01-10 | Method and system for content agnostic file indexing |
PCT/US2020/012661 WO2020146448A1 (en) | 2019-01-10 | 2020-01-08 | Method and system for content agnostic file indexing |
Publications (1)
Publication Number | Publication Date |
---|---|
CA3126012A1 (en) | 2020-07-16 |
Family
ID=71520909
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3126012A Pending CA3126012A1 (en) | 2019-01-10 | 2020-01-08 | Method and system for content agnostic file indexing |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP3908937A4 (ko) |
JP (1) | JP2022518194A (ko) |
KR (1) | KR20210110875A (ko) |
AU (1) | AU2020205970A1 (ko) |
CA (1) | CA3126012A1 (ko) |
WO (1) | WO2020146448A1 (ko) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11138152B2 (en) | 2017-10-11 | 2021-10-05 | Lognovations Holdings, Llc | Method and system for content agnostic file indexing |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5594435A (en) * | 1995-09-13 | 1997-01-14 | Philosophers' Stone Llc | Permutation-based data compression |
US7882139B2 (en) * | 2003-09-29 | 2011-02-01 | Xunlei Networking Technologies, Ltd | Content oriented index and search method and system |
US20050071151A1 (en) * | 2003-09-30 | 2005-03-31 | Ali-Reza Adl-Tabatabai | Compression-decompression mechanism |
EP1676368A4 (en) * | 2003-10-17 | 2008-10-08 | Pacbyte Software Pty Ltd | DATA COMPRESSION SYSTEM AND METHOD |
EP2153527A4 (en) * | 2006-09-01 | 2010-09-08 | Pacbyte Software Pty Ltd | METHOD AND SYSTEM FOR SENDING A DATA FILE VIA A DATA NETWORK |
US8533166B1 (en) * | 2010-08-20 | 2013-09-10 | Brevity Ventures LLC | Methods and systems for encoding/decoding files and transmission thereof |
US9639543B2 (en) * | 2010-12-28 | 2017-05-02 | Microsoft Technology Licensing, Llc | Adaptive index for data deduplication |
US11138152B2 (en) * | 2017-10-11 | 2021-10-05 | Lognovations Holdings, Llc | Method and system for content agnostic file indexing |
2020
- 2020-01-08: KR application KR1020217025238A, publication KR20210110875A (ko), status unknown
- 2020-01-08: WO application PCT/US2020/012661, publication WO2020146448A1 (en), status unknown
- 2020-01-08: CA application CA3126012A, publication CA3126012A1 (en), status active (pending)
- 2020-01-08: EP application EP20737931.4A, publication EP3908937A4 (en), status active (pending)
- 2020-01-08: JP application JP2021540318A, publication JP2022518194A (ja), status active (pending)
- 2020-01-08: AU application AU2020205970A, publication AU2020205970A1 (en), status active (pending)
Also Published As
Publication number | Publication date |
---|---|
KR20210110875A (ko) | 2021-09-09 |
WO2020146448A1 (en) | 2020-07-16 |
EP3908937A1 (en) | 2021-11-17 |
JP2022518194A (ja) | 2022-03-14 |
EP3908937A4 (en) | 2022-09-28 |
AU2020205970A1 (en) | 2021-08-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11138152B2 (en) | Method and system for content agnostic file indexing | |
US11899641B2 (en) | Trie-based indices for databases | |
US20220093210A1 (en) | System and method for characterizing biological sequence data through a probabilistic data structure | |
US8554561B2 (en) | Efficient indexing of documents with similar content | |
US10680645B2 (en) | System and method for data storage, transfer, synchronization, and security using codeword probability estimation | |
KR20130062889A (ko) | Data compression method and system | |
US20050187898A1 (en) | Data Lookup architecture | |
US10146817B2 (en) | Inverted index and inverted list process for storing and retrieving information | |
US11899624B2 (en) | System and method for random-access manipulation of compacted data files | |
US11544225B2 (en) | Method and system for content agnostic file indexing | |
CA3126012A1 (en) | Method and system for content agnostic file indexing | |
Lou et al. | Data deduplication with random substitutions | |
CN112416879B (zh) | Block-level data deduplication method based on the NTFS file system | |
US20220245097A1 (en) | Hashing with differing hash size and compression size | |
JP6291435B2 (ja) | Program and cluster system | |
US11995060B2 (en) | Hashing a data set with multiple hash engines | |
US20220245104A1 (en) | Hashing for deduplication through skipping selected data | |
Vaddeman et al. | Data formats | |
US20240202166A1 (en) | Generating compressed column slabs for storage in a database system | |
Нікітін et al. | Modification of hashing algorithm to increase rate of operations in nosql databases | |
Nikitin et al. | Modification of hashing algorithm to increase rate of operations in NOSQL databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | Effective date: 20231229 |