CN103582880B - Compression match enumeration - Google Patents

Compression match enumeration

Info

Publication number
CN103582880B
Authority
CN
China
Prior art keywords
node
trie
suffix
data
suffix array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201180071391.0A
Other languages
English (en)
Chinese (zh)
Other versions
CN103582880A (zh)
Inventor
B.A.米克尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Publication of CN103582880A
Application granted
Publication of CN103582880B
Expired - Fee Related
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6011 Encoder aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN201180071391.0A 2011-06-03 2011-10-09 Compression match enumeration Expired - Fee Related CN103582880B (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US13/152,733 2011-06-03
US13/152,733 US8493249B2 (en) 2011-06-03 2011-06-03 Compression match enumeration
US13/152733 2011-06-03
PCT/US2011/055532 WO2012166190A1 (en) 2011-06-03 2011-10-09 Compression match enumeration

Publications (2)

Publication Number Publication Date
CN103582880A (zh) 2014-02-12
CN103582880B (zh) 2017-05-03

Family

ID=47259701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180071391.0A Expired - Fee Related CN103582880B (zh) 2011-06-03 2011-10-09 Compression match enumeration

Country Status (6)

Country Link
US (2) US8493249B2
EP (1) EP2715568A4
JP (1) JP5873925B2
KR (2) KR101865264B1
CN (1) CN103582880B
WO (1) WO2012166190A1

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8493249B2 (en) 2011-06-03 2013-07-23 Microsoft Corporation Compression match enumeration
JP5766588B2 (ja) * 2011-11-16 2015-08-19 Clarion Co., Ltd. Search terminal device, search server device, and center-linked search system
US9231615B2 (en) 2012-10-24 2016-01-05 Seagate Technology Llc Method to shorten hash chains in Lempel-Ziv compression of data with repetitive symbols
WO2014117353A1 (en) * 2013-01-31 2014-08-07 Hewlett-Packard Development Company, L.P. Incremental update of a shape graph
US9760546B2 (en) * 2013-05-24 2017-09-12 Xerox Corporation Identifying repeat subsequences by left and right contexts
US10565182B2 (en) * 2015-11-23 2020-02-18 Microsoft Technology Licensing, Llc Hardware LZMA compressor
CN108664459B (zh) * 2018-03-22 2021-09-17 Research Institute of Sun Yat-sen University in Shunde District, Foshan Adaptive suffix array merging method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
US4906991A (en) * 1988-04-29 1990-03-06 Xerox Corporation Textual substitution data compression with finite length search windows
US5406279A (en) * 1992-09-02 1995-04-11 Cirrus Logic, Inc. General purpose, hash-based technique for single-pass lossless data compression
US7124034B2 (en) * 1999-12-24 2006-10-17 International Business Machines Corporation Method for changing a target array, a method for analyzing a structure, and an apparatus, a storage medium and a transmission medium therefor
CN1894696A (zh) * 2003-12-23 2007-01-10 Intel Corporation Method and apparatus for detecting patterns in a data stream

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
US5442350A (en) 1992-10-29 1995-08-15 International Business Machines Corporation Method and means providing static dictionary structures for compressing character data and expanding compressed data
US5978795A (en) 1997-01-14 1999-11-02 Microsoft Corporation Temporally ordered binary search method and system
KR19990015114A (ko) * 1997-08-01 1999-03-05 구자홍 Character recognizer using character connection information
US6047283A (en) 1998-02-26 2000-04-04 Sap Aktiengesellschaft Fast string searching and indexing using a search tree having a plurality of linked nodes
US6751624B2 (en) * 2000-04-04 2004-06-15 Globalscape, Inc. Method and system for conducting a full text search on a client system by a server system
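JP2002269096A (ja) 2001-03-08 2002-09-20 Ricoh Co Ltd Character string restoration method, apparatus therefor, and recording medium
KR100793505B1 (ko) * 2006-05-30 2008-01-14 University of Ulsan Industry-Academic Cooperation Foundation Method for extracting siRNA base sequences applicable to multiple target mRNAs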
US7453377B2 (en) 2006-08-09 2008-11-18 Reti Corporation Apparatus and methods for searching a pattern in a compressed data
US8099415B2 (en) * 2006-09-08 2012-01-17 Simply Hired, Inc. Method and apparatus for assessing similarity between online job listings
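JP4714127B2 (ja) * 2006-11-27 2011-06-29 Hitachi, Ltd. Symbol string search method, program, and apparatus, and trie generation method, program, and apparatus therefor
JP4439013B2 (ja) 2007-04-25 2010-03-24 S. Grants Co., Ltd. Bit string search method and search program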
US8812508B2 (en) 2007-12-14 2014-08-19 Hewlett-Packard Development Company, L.P. Systems and methods for extracting phases from text
US8676815B2 (en) * 2008-05-07 2014-03-18 City University Of Hong Kong Suffix tree similarity measure for document clustering
US8108353B2 (en) * 2008-06-11 2012-01-31 International Business Machines Corporation Method and apparatus for block size optimization in de-duplication
US8515961B2 (en) * 2010-01-19 2013-08-20 Electronics And Telecommunications Research Institute Method and apparatus for indexing suffix tree in social network
US8493249B2 (en) 2011-06-03 2013-07-23 Microsoft Corporation Compression match enumeration
TWI443539B (zh) * 2012-01-06 2014-07-01 Univ Nat Central Method for data analysis by means of a weighted suffix tree

Also Published As

Publication number Publication date
US20120306670A1 (en) 2012-12-06
US9065469B2 (en) 2015-06-23
KR101865264B1 (ko) 2018-06-07
KR20180066254A (ko) 2018-06-18
JP5873925B2 (ja) 2016-03-01
EP2715568A4 (en) 2016-01-06
US20130307710A1 (en) 2013-11-21
CN103582880A (zh) 2014-02-12
WO2012166190A1 (en) 2012-12-06
KR101926324B1 (ko) 2019-02-26
KR20140038441A (ko) 2014-03-28
JP2014520318A (ja) 2014-08-21
EP2715568A1 (en) 2014-04-09
US8493249B2 (en) 2013-07-23

Similar Documents

Publication Publication Date Title
CN103582880B (zh) Compression match enumeration
Gueniche et al. Compact prediction tree: A lossless model for accurate sequence prediction
US8838551B2 (en) Multi-level database compression
CN111460311A (zh) Trie-based search processing method, apparatus, device, and storage medium
US8659451B2 (en) Indexing compressed data
US8698657B2 (en) Methods and systems for compressing and decompressing data
CN107291785A (zh) Data search method and apparatus
US20090063465A1 (en) System and method for string processing and searching using a compressed permuterm index
US20100278446A1 (en) Structure of hierarchical compressed data structure for tabular data
Ferragina et al. On the bit-complexity of Lempel-Ziv compression
US9720927B2 (en) Method and system for database storage management
Yamamoto et al. Faster compact on-line Lempel-Ziv factorization
Belazzougui et al. Bidirectional variable-order de Bruijn graphs
JPS6356726B2
US8996531B1 (en) Inverted index and inverted list process for storing and retrieving information
US20120110025A1 (en) Coding order-independent collections of words
JP5544998B2 (ja) Text processing device, text processing method, and text processing program
Vey Differential direct coding: a compression algorithm for nucleotide sequence data
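CN115617818B (zh) Method for batch updating an MPT tree in a blockchain, electronic device, and storage medium
JP5939259B2 (ja) Collation control program, collation control device, and collation control method
CN114443866B (zh) Data processing method, apparatus, computing device, and medium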
Hoang et al. Dictionary selection using partial matching
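CN110807092A (zh) Data processing method and apparatus
CN110545108B (zh) Data processing method, apparatus, electronic device, and computer-readable storage medium
JP5521064B1 (ja) ID assignment device, method, and program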

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: MICROSOFT TECHNOLOGY LICENSING LLC

Free format text: FORMER OWNER: MICROSOFT CORP.

Effective date: 20150623

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20150623

Address after: Washington State

Applicant after: MICROSOFT TECHNOLOGY LICENSING, LLC

Address before: Washington State

Applicant before: Microsoft Corp.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170503