CA3126012A1 - Method and system for content agnostic file indexing - Google Patents

Method and system for content agnostic file indexing

Info

Publication number
CA3126012A1
CA3126012A1 CA3126012A CA3126012A CA3126012A1 CA 3126012 A1 CA3126012 A1 CA 3126012A1 CA 3126012 A CA3126012 A CA 3126012A CA 3126012 A CA3126012 A CA 3126012A CA 3126012 A1 CA3126012 A1 CA 3126012A1
Authority
CA
Canada
Prior art keywords
chunks
binary data
chunk
data file
pregenerated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3126012A
Other languages
English (en)
French (fr)
Inventor
Christopher Mcelveen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lognovations Holdings LLC
Original Assignee
Lognovations Holdings LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-10
Filing date
2020-01-08
Publication date
2020-07-16
Priority claimed from US16/244,332 (US11138152B2)
Application filed by Lognovations Holdings LLC
Publication of CA3126012A1
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6052 Synchronisation of encoder and decoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CA3126012A 2019-01-10 2020-01-08 Method and system for content agnostic file indexing Pending CA3126012A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/244,332 2019-01-10
US16/244,332 US11138152B2 (en) 2017-10-11 2019-01-10 Method and system for content agnostic file indexing
PCT/US2020/012661 WO2020146448A1 (en) 2019-01-10 2020-01-08 Method and system for content agnostic file indexing

Publications (1)

Publication Number Publication Date
CA3126012A1 (en) 2020-07-16

Family

ID=71520909

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3126012A Pending CA3126012A1 (en) 2019-01-10 2020-01-08 Method and system for content agnostic file indexing

Country Status (6)

Country Link
EP (1) EP3908937A4 (en)
JP (1) JP2022518194A (ja)
KR (1) KR20210110875A (ko)
AU (1) AU2020205970A1 (en)
CA (1) CA3126012A1 (en)
WO (1) WO2020146448A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138152B2 (en) 2017-10-11 2021-10-05 Lognovations Holdings, Llc Method and system for content agnostic file indexing

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5594435A (en) * 1995-09-13 1997-01-14 Philosophers' Stone Llc Permutation-based data compression
US7882139B2 (en) * 2003-09-29 2011-02-01 Xunlei Networking Technologies, Ltd Content oriented index and search method and system
US20050071151A1 (en) * 2003-09-30 2005-03-31 Ali-Reza Adl-Tabatabai Compression-decompression mechanism
EP1676368A4 (en) * 2003-10-17 2008-10-08 Pacbyte Software Pty Ltd Data compression system and method
EP2153527A4 (en) * 2006-09-01 2010-09-08 Pacbyte Software Pty Ltd Method and system for sending a data file via a data network
US8533166B1 (en) * 2010-08-20 2013-09-10 Brevity Ventures LLC Methods and systems for encoding/decoding files and transmission thereof
US9639543B2 (en) * 2010-12-28 2017-05-02 Microsoft Technology Licensing, Llc Adaptive index for data deduplication
US11138152B2 (en) * 2017-10-11 2021-10-05 Lognovations Holdings, Llc Method and system for content agnostic file indexing

Also Published As

Publication number Publication date
KR20210110875A (ko) 2021-09-09
WO2020146448A1 (en) 2020-07-16
EP3908937A1 (en) 2021-11-17
JP2022518194A (ja) 2022-03-14
EP3908937A4 (en) 2022-09-28
AU2020205970A1 (en) 2021-08-05

Similar Documents

Publication Publication Date Title
US11138152B2 (en) Method and system for content agnostic file indexing
US11899641B2 (en) Trie-based indices for databases
US20220093210A1 (en) System and method for characterizing biological sequence data through a probabilistic data structure
US8554561B2 (en) Efficient indexing of documents with similar content
US10680645B2 (en) System and method for data storage, transfer, synchronization, and security using codeword probability estimation
KR20130062889A (ko) Data compression method and system
US20050187898A1 (en) Data Lookup architecture
US10146817B2 (en) Inverted index and inverted list process for storing and retrieving information
US11899624B2 (en) System and method for random-access manipulation of compacted data files
US11544225B2 (en) Method and system for content agnostic file indexing
CA3126012A1 (en) Method and system for content agnostic file indexing
Lou et al. Data deduplication with random substitutions
CN112416879B (zh) Block-level data deduplication method based on the NTFS file system
US20220245097A1 (en) Hashing with differing hash size and compression size
JP6291435B2 (ja) Program and cluster system
US11995060B2 (en) Hashing a data set with multiple hash engines
US20220245104A1 (en) Hashing for deduplication through skipping selected data
Vaddeman et al. Data formats
US20240202166A1 (en) Generating compressed column slabs for storage in a database system
Nikitin et al. Modification of hashing algorithm to increase rate of operations in NOSQL databases

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 2023-12-29