CN104679776A - Method and device for compressing inverted indexes - Google Patents

Method and device for compressing inverted indexes

Info

Publication number
CN104679776A
Authority
CN
China
Prior art keywords
document
compression
inverted index
byte
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310631164.XA
Other languages
Chinese (zh)
Other versions
CN104679776B (en)
Inventor
汤善敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310631164.XA
Publication of CN104679776A
Application granted
Publication of CN104679776B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for compressing inverted indexes. The method comprises the following steps: acquiring a compression request for an inverted index; according to the compression request, dividing the inverted index to be stored into two parts, document numbers and document weights; compressing the document numbers and the document weights separately, and recording the correspondence between the document numbers and the document weights. The invention further discloses a device for compressing inverted indexes. The method and the device not only save storage space but also effectively improve retrieval performance.

Description

Method and device for compressing inverted indexes
Technical field
The present invention relates to the field of the Internet, and in particular to a method and device for compressing inverted indexes.
Background
The inverted index is the most widely used index storage format in web search. In full-text search, it stores the mapping from a word to the locations where that word occurs in a document or a group of documents. Because a keyword usually matches a large number of web pages, the inverted index is compressed before it is stored, and at search time the stored inverted index is consulted to find the corresponding search results. The commonly used inverted index compression techniques are the following:
First, variable-length encoding applied directly to the values, in which each value is represented by several bytes. Each byte uses only its low 7 bits for data, and the most significant bit indicates whether the encoding of the value has ended.
Second, sorting the array, recording the differences between adjacent values, and then encoding the differences for storage with a scheme such as unary code or Group VarInt.
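The two prior-art schemes above can be sketched as follows. This is a minimal illustration rather than any particular system's format: the function names are hypothetical, and the convention that a set high bit means "more bytes follow" is an assumption, since the text only says that the high bit marks whether the value has ended. Plain variable-length bytes stand in for the unary or Group VarInt coding of the differences.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer with 7 data bits per byte."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # assumed convention: high bit set, more bytes follow
        else:
            out.append(low7)         # high bit clear, this is the last byte
            return bytes(out)


def decode_varints(data: bytes) -> list:
    """Decode a concatenation of varints back into integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def delta_encode(sorted_numbers: list) -> bytes:
    """Store the gaps between consecutive sorted values as varints."""
    out = bytearray()
    previous = 0
    for number in sorted_numbers:
        out += encode_varint(number - previous)
        previous = number
    return bytes(out)


def delta_decode(data: bytes) -> list:
    """Rebuild the sorted values by accumulating the decoded gaps."""
    numbers, running = [], 0
    for gap in decode_varints(data):
        running += gap
        numbers.append(running)
    return numbers
```

Note that `delta_decode` has to walk the byte stream from the start of the list to recover any single value, which is the decompression cost the following paragraph refers to.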
Although existing inverted index compression techniques have been optimized to some degree for both compression ratio and subsequent lookup, a search still has to decompress some small blocks of the inverted index to determine whether a given value is in the inverted list, and the intermediate steps also require operations such as memory copies. For a CPU-intensive retrieval module, therefore, existing inverted index compression techniques cannot effectively improve retrieval performance in practice, and sometimes even have the opposite effect.
Summary of the invention
The main purpose of the embodiments of the present invention is to provide a method and device for compressing inverted indexes, with the aim of effectively improving retrieval performance.
To achieve the above purpose, an embodiment of the present invention provides a method for compressing an inverted index, comprising the following steps:
Acquiring an inverted index compression request;
According to the inverted index compression request, dividing the inverted index to be stored into two parts, document numbers and document weights;
Compressing the document numbers and the document weights separately, and recording the correspondence between the document numbers and the document weights.
An embodiment of the present invention further provides a device for compressing an inverted index, comprising:
A request acquisition module, configured to acquire an inverted index compression request;
A data splitting module, configured to divide, according to the inverted index compression request, the inverted index to be stored into two parts, document numbers and document weights;
A document number compression module, configured to compress the document numbers;
A document weight compression module, configured to compress the document weights;
A recording module, configured to record the correspondence between the document numbers and the document weights.
When compressing an inverted index, the embodiments of the present invention first split the inverted index into two parts, document numbers and document weights, then compress the two parts separately and record the correspondence between the document numbers and the document weights. An inverted index compressed in this way not only saves storage space but also effectively improves retrieval performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture to which the inverted index compression method of the present invention is applied;
Fig. 2 is a schematic flowchart of the inverted index compression method of the present invention;
Fig. 3 is a schematic structural diagram of the inverted index of the present invention;
Fig. 4 is a detailed flowchart of compressing the document numbers and the document weights separately in the inverted index compression method of the present invention;
Fig. 5 is a detailed flowchart of compressing the document numbers of each block in the inverted index compression method of the present invention;
Fig. 6 is a functional block diagram of a preferred embodiment of the inverted index compression device of the present invention;
Fig. 7 is a detailed functional block diagram of the document number compression module in the inverted index compression device of the present invention.
The realization of the purpose, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
The present invention provides an inverted index compression method. When used to compress an inverted index, the method not only reduces storage space but also reduces the number of disk accesses and improves retrieval performance. The method can run on a single computing device or on a system composed of multiple computing devices. The computing device is described below.
Referring to Fig. 1, an example structure of a computing device to which the inverted index compression method of the present invention is applied is shown. The computing device may include components such as a processor 101, a memory module 102, an input module 103, a communication module 104, and a display module 105. Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the computing device, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.
Specifically, the memory module 102 can be used to store software programs and data. All information in the computing device, including the input raw data, software programs, intermediate results, and final results, is kept in the memory module 102. The processor 101 performs various functional applications and data processing by running the software programs and data stored in the memory module 102. The memory module 102 may include one or more computer-readable storage media, comprising both internal memory and external memory. The internal memory holds the data and programs that the processor is currently executing; its contents are lost on power-off. The external memory is generally non-volatile memory that can retain information for a long time, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input module 103 can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Taking the currently common touch screen as an example, the touch screen may include two parts, a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation, and sends the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 101, and can receive and execute commands sent by the processor 101. It can be understood that the input module may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display module 105 can be used to display information entered by the user, information provided to the user, and the various graphical user interfaces of the computing device, which may be composed of graphics, text, icons, video, and any combination thereof. The display module 105 may include a display panel 141, such as an LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode) panel. It can be understood that the input module 103 and the display module 105 may be implemented as two independent components or integrated to implement the input and output functions together.
The communication module 104 can be used for communication between the computing device and the outside. It may include an RF circuit, WiFi, and the like. The RF circuit can be used to receive and send signals while transmitting and receiving information or during communication; in particular, after receiving downlink information from a base station, it passes the information to one or more processors 101 for processing, and it sends uplink data to the base station. Generally, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. The RF circuit can also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, and SMS (Short Messaging Service). WiFi is a short-range wireless transmission technology; through WiFi the computing device can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. It can be understood that WiFi is not an essential part of the computing device and may be omitted as needed without changing the essence of the invention.
The processor 101 is the control center of the computing device. It connects the various parts of the whole computing device through various interfaces and lines, and performs the various functions of the computing device and processes data by running or executing the software programs stored in the memory module 102 and calling the data stored in the memory module 102, thereby monitoring the computing device as a whole. Preferably, the processor 101 may include one or more processing cores; for example, the processor 101 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 101.
The computing device also includes a power module 106 (such as a battery) that supplies power to the components. The power module 106 can be logically connected to the processor 101 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power module 106 may also include any components such as one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator. Although not shown, the computing device may also include a camera, a Bluetooth module, and the like, which are not described here.
Based on the hardware structure of the computing device described above, the inverted index compression method proposed by the present invention runs on this computing device and is described in detail below with reference to that hardware structure. Referring to Fig. 2, the inverted index compression method comprises:
Step S110: acquiring an inverted index compression request;
As shown in Fig. 3, the inverted index is the most widely used index storage format in web search. It contains multiple search terms; each search term appears in multiple documents, and each document has a corresponding weight value that represents, for example, the frequency, importance, and specific positions of the term's occurrences in the document. The inverted index is built and stored in external memory for access by the processor. Because the inverted index holds a very large amount of document information, it needs to be compressed when stored so that it can be stored and accessed efficiently. The inverted index compression request may be sent after the inverted index table is updated or after the inverted index is built.
Step S120: according to the inverted index compression request, dividing the inverted index to be stored into two parts, document numbers and document weights;
Step S130: compressing the document numbers and the document weights separately, and recording the correspondence between the document numbers and the document weights.
When compressing an inverted index, the embodiments of the present invention first split the inverted index into two parts, document numbers and document weights, then compress the two parts separately and record the correspondence between the document numbers and the document weights. An inverted index compressed in this way not only saves storage space but also reduces the number of lookups.
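As a rough illustration of the split in step S120, the sketch below separates one word's inverted list into two parallel sequences. The function name and the use of list position as the recorded correspondence are assumptions made for illustration; the text does not fix how the correspondence is stored.

```python
def split_postings(postings):
    """Split (document number, weight) pairs into two parallel lists.

    Assumption: the correspondence is positional, so the i-th document
    number belongs to the i-th weight after sorting by document number.
    """
    ordered = sorted(postings)
    doc_numbers = [doc for doc, _ in ordered]
    weights = [weight for _, weight in ordered]
    return doc_numbers, weights
```

Keeping the two sequences separate lets each one be compressed with a scheme suited to its own value distribution.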
Referring to Fig. 4, step S130 may specifically comprise:
Step S131: partitioning the document numbers into blocks, and setting corresponding bytes to record the number of documents in each block and the block number;
If the document numbers are divided into too many blocks, the header information takes up many bytes; if they are divided into too few, the compression effect is poor. In practice, therefore, the block division standard and the number of bytes used for the header of each block should be decided according to the size of the inverted index and the maximum document number. For example, if the inverted index of term 1 covers 20 million web pages, the document numbers of the pages lie in the range [0, 19999999], so a division standard of 65536 can be adopted: the document numbers of the first block are 0 to 65535, those of the second block are 65536 to 131071, and so on for the following blocks. The header of each block is then represented with 4 bytes, of which two bytes hold the block number and two bytes hold the number of documents in the block.
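The partitioning in step S131 can be sketched as below, assuming a division standard of 65536 as in the example. The names `partition` and `pack_header`, the field order, and the little-endian byte order are all assumptions; the text only specifies two bytes for the block number and two bytes for the document count.

```python
import struct
from collections import defaultdict

BLOCK_SIZE = 65536  # division standard M taken from the example above


def partition(doc_numbers):
    """Group document numbers by block, keeping only the in-block offset."""
    blocks = defaultdict(list)
    for number in doc_numbers:
        blocks[number // BLOCK_SIZE].append(number % BLOCK_SIZE)
    return dict(sorted(blocks.items()))


def pack_header(block_number, doc_count):
    """Pack a 4-byte block header: two bytes each for number and count.

    Assumed layout: little-endian unsigned 16-bit fields. A count of
    exactly 65536 would not fit and raises struct.error in this sketch.
    """
    return struct.pack("<HH", block_number, doc_count)
```

With 20 million documents the highest block number is 19999999 // 65536 = 305, which fits comfortably in two bytes, and each in-block offset fits in two bytes as well.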
Step S132: compressing the document numbers of each block;
After the document numbers are partitioned, the document numbers of each block are compressed.
Step S133: deduplicating all the document weights corresponding to each word, and determining according to the overall compression ratio whether to compress the deduplicated document weights;
Because the weights of the same word in different documents are likely to be similar, the document weights are first deduplicated, and whether to compress the deduplicated document weights is then determined according to the overall compression ratio. In this embodiment, the overall compression ratio is calculated as follows:
Overall compression ratio = (number of bytes after compression) / (number of bytes before compression), where: number of bytes before compression = number of postings in the inverted list × document weight byte count; number of bytes after compression = document weight byte count × number of weights after deduplication + (number of postings in the inverted list × n + 7) / 8; and n = log2(number of weights after deduplication), which is also the number of bits used to represent the position of a deduplicated document weight. When the overall compression ratio is less than 1, it is determined that the deduplicated document weights are to be compressed.
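The ratio can be evaluated with integer arithmetic as in the sketch below. The function name is hypothetical, and rounding n up to a whole number of bits is an assumption, since the text gives n = log2 of the deduplicated count without saying how non-powers of two are handled.

```python
def overall_compression_ratio(weights, weight_bytes):
    """Return (bytes after compression) / (bytes before compression)."""
    if not weights:
        raise ValueError("the inverted list must not be empty")
    count = len(weights)                  # number of postings
    distinct = len(set(weights))          # number of weights after deduplication
    n = (distinct - 1).bit_length()       # ceil(log2(distinct)), assumed rounding
    before = count * weight_bytes
    after = weight_bytes * distinct + (count * n + 7) // 8
    return after / before
```

For example, eight 4-byte weights with three distinct values give n = 2, 32 bytes before, and 4 × 3 + (8 × 2 + 7) // 8 = 14 bytes after, a ratio of 0.4375. A ratio below 1 means the deduplicated form is the smaller one.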
Step S134: storing the deduplicated document weights in their original byte width at the head of each block, and setting corresponding bits to record, for each document, the position of the deduplicated document weight it points to.
Once it is determined that the deduplicated document weights are to be compressed, the deduplicated document weights are stored in their original bytes, these original bytes being the bytes indicated by the block header information set when the document numbers were partitioned. Corresponding bits are also set to record, for each document, the position of the deduplicated document weight that it points to.
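One way to realize the deduplicated weight table with per-document bit pointers is sketched below. The names, the sorted table order, and the little-endian bit packing are assumptions for illustration, not the layout prescribed by the text.

```python
def pack_weights(weights):
    """Build a deduplicated weight table plus n-bit pointers per document."""
    table = sorted(set(weights))
    position = {weight: i for i, weight in enumerate(table)}
    n = (len(table) - 1).bit_length()     # bits per pointer, may be 0
    packed = 0
    for i, weight in enumerate(weights):
        packed |= position[weight] << (i * n)
    index_bytes = packed.to_bytes((len(weights) * n + 7) // 8, "little")
    return table, n, index_bytes


def weight_at(table, n, index_bytes, i):
    """Fetch the weight of the i-th document by reading only its pointer."""
    first_bit = i * n
    start = first_bit // 8
    end = (first_bit + n + 7) // 8        # one past the last byte touched
    chunk = int.from_bytes(index_bytes[start:end], "little")
    slot = (chunk >> (first_bit % 8)) & ((1 << n) - 1)
    return table[slot]
```

Because every pointer has the same width, `weight_at` touches a fixed number of bytes regardless of the list length, so a weight is read without decompressing its neighbours.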
Two prior-art compression schemes, variable-length compression and difference compression, are compared below with the inverted index compression method described above. The comparison results are shown in Table 1 below:
As the table shows, the inverted index compression algorithm of the present invention performs better than the prior-art variable-length and difference compression algorithms: its compression ratio is smaller, its decompression bandwidth is larger, and its decompression time is shorter. When this algorithm was applied to an existing web search system, the total size of the inverted index was reduced to half of the original, which not only greatly reduced the number of disk accesses but also took no time for decompression, greatly improving overall retrieval performance.
In the prior art, when the inverted index is uncompressed, a separate bitmap or Bloom filter is built for words with long inverted lists to filter out part of the document numbers, and the corresponding document weight is finally obtained by binary search. For words with shorter inverted lists, binary search is used directly to filter document numbers and obtain the corresponding document weights. Although this approach reduces the number of binary searches and improves retrieval performance to some extent, it wastes additional space and cannot avoid an O(log(n)) process for obtaining the document weight. The inverted index compression method of the present invention partitions the values into blocks and represents the inverted list with bitmaps, so that the whole lookup becomes a two-level index combined with binary search or bitmap filtering of documents, and after a document is filtered its document weight can be obtained directly in O(1), which greatly improves retrieval performance.
Further, referring to Fig. 5, step S132 comprises:
Step S1321: calculating the number of bytes A = M / 8 required to represent the document numbers of each block with a bitmap, where M is the value of the division standard;
Step S1322: calculating the number of bytes B = (number of documents in the block) × N required to represent the document numbers of each block with N bytes each, where N is the number of bytes used to represent one document number;
Step S1323: comparing A and B, and choosing according to the result either bitmap compression or the N-byte representation of the document numbers.
When A is less than B, the document numbers are compressed with the bitmap method; when A is greater than or equal to B, the document numbers are represented directly with N bytes each.
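The choice between the two representations, together with a membership test on the result, might look like the sketch below. It assumes M = 65536 and N = 2 as defaults; the names, the "bitmap"/"raw" tags, and the little-endian record layout are illustrative assumptions.

```python
def compress_block(offsets, block_size=65536, n_bytes=2):
    """Pick bitmap or fixed-width storage for one block, per A and B."""
    size_a = block_size // 8              # A = M / 8, bitmap size in bytes
    size_b = len(offsets) * n_bytes       # B = documents in block * N
    if size_a < size_b:
        bitmap = bytearray(size_a)
        for offset in offsets:
            bitmap[offset >> 3] |= 1 << (offset & 7)
        return "bitmap", bytes(bitmap)
    raw = b"".join(o.to_bytes(n_bytes, "little") for o in sorted(offsets))
    return "raw", raw


def contains(kind, payload, offset, n_bytes=2):
    """Test whether an in-block offset is present in a compressed block."""
    if kind == "bitmap":
        return bool(payload[offset >> 3] & (1 << (offset & 7)))
    low, high = 0, len(payload) // n_bytes
    while low < high:                     # binary search over fixed-width records
        mid = (low + high) // 2
        value = int.from_bytes(payload[mid * n_bytes:(mid + 1) * n_bytes], "little")
        if value < offset:
            low = mid + 1
        elif value > offset:
            high = mid
        else:
            return True
    return False
```

With these defaults A is 8192 bytes, so a block switches to the bitmap once it holds more than 4096 documents; the bitmap test is a single bit probe and needs no decompression.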
Corresponding to the method embodiment above, the present invention also provides a device for compressing an inverted index. Referring to Fig. 6, the device comprises:
A request acquisition module 110, configured to acquire an inverted index compression request;
A data splitting module 120, configured to divide, according to the inverted index compression request, the inverted index to be stored into two parts, document numbers and document weights;
A document number compression module 130, configured to compress the document numbers;
A document weight compression module 140, configured to compress the document weights;
A recording module 150, configured to record the correspondence between the document numbers and the document weights.
When compressing an inverted index, the embodiments of the present invention first split the inverted index into two parts, document numbers and document weights, then compress the two parts separately and record the correspondence between the document numbers and the document weights. An inverted index compressed in this way not only saves storage space but also reduces the number of lookups.
Further, referring to Fig. 7, the document number compression module 130 may comprise:
A blocking unit 131, configured to partition the document numbers into blocks and set corresponding bytes to record the number of documents in each block and the block number;
A compression unit 132, configured to compress the document numbers of each block.
Further, if the document numbers are divided into too many blocks, the header information takes up many bytes; if they are divided into too few, the compression effect is poor. In the blocking unit 131, the block division standard and the number of bytes used for the header of each block are therefore decided according to the size of the inverted index and the maximum document number. For example, if the inverted index of term 1 covers 20 million web pages, the document numbers of the pages lie in the range [0, 19999999], so a division standard of 65536 can be adopted: the document numbers of the first block are 0 to 65535, those of the second block are 65536 to 131071, and so on for the following blocks. The header of each block is then represented with 4 bytes, of which two bytes hold the block number and two bytes hold the number of documents in the block.
Further, the compression unit 132 is configured to:
Calculate the number of bytes A = M / 8 required to represent the document numbers of each block with a bitmap, where M is the value of the block division standard;
Calculate the number of bytes B = (number of documents in the block) × N required to represent the document numbers of each block with N bytes each, where N is the number of bytes used to represent one document number;
Compare A and B, and choose according to the result either bitmap compression or the N-byte representation of the document numbers.
Further, the document weight compression module 140 is configured to:
Deduplicate all the document weights corresponding to each word, and determine according to the overall compression ratio whether to compress the deduplicated document weights;
Store the deduplicated document weights in their original byte width at the head of each block, and set corresponding bits to record, for each document, the position of the deduplicated document weight it points to.
For the specific working principle of each functional module in the inverted index compression device, refer to the foregoing method embodiment; the details are not repeated here.
As Table 1 above shows, the inverted index compression algorithm of the present invention performs better than the prior-art variable-length and difference compression algorithms: its compression ratio is smaller, its decompression bandwidth is higher, and its decompression time is shorter. When this algorithm was applied to an existing web search system, the total size of the inverted index was reduced to half of the original, which not only greatly reduced the number of disk accesses but also took no time for decompression, greatly improving overall retrieval efficiency.
In the prior art, when the inverted index is uncompressed, a separate bitmap or Bloom filter is built for words with long inverted lists to filter out part of the document numbers, and the corresponding document weight is finally obtained by binary search. For words with shorter inverted lists, binary search is used directly to filter document numbers and obtain the corresponding document weights. This approach reduces the number of binary searches and improves retrieval performance to some extent, but it wastes additional space and cannot avoid an O(log(n)) process for obtaining the document weight. The inverted index compression method of the present invention partitions the values into blocks and represents the inverted list with bitmaps, so that the whole lookup becomes a two-level index combined with binary search or bitmap filtering of documents, and after a document is filtered its document weight can be obtained directly in O(1), which greatly improves retrieval efficiency.
It should be noted that, in this document, the terms "comprise" and "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The inverted index compression device described above is implemented by a number of instructions, which are stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and are executed by a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the scope of its patent. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims (11)

1. a compression method for inverted index, is characterized in that, comprises the following steps:
Obtain compressing inverted index request;
According to described compressing inverted index request, the inverted index that will store is divided into document code and document weight two parts;
Respectively document code and document weight are compressed, and record the corresponding relation of document code and document weight.
2. the compression method of inverted index as claimed in claim 1, is characterized in that, describedly carries out compression to document code and comprises:
Piecemeal is carried out to document code, and corresponding byte is set for recording number and the block number of document in every block;
Compression process is carried out to the document code of every block.
3. the compression method of inverted index as claimed in claim 2, is characterized in that, describedly carries out piecemeal to document code, and arranges corresponding byte and comprise for the number and block number recording document in every block:
The byte quantity that the standard of divided block and the header information of each piece represent is decided according to inverted index amount and maximum document code.
4. the compression method of inverted index as claimed in claim 2, it is characterized in that, the described document code to every block carries out compression process and comprises:
Calculating bitmap represents the byte number A=M/8 required for the document code of each piecemeal, and wherein M is the numerical value of the standard of divided block;
The document number * N of the byte number B=block required for document code of each piecemeal of calculating N byte representation, wherein N is for representing document code byte number;
When A is less than B, bits compression method is adopted to compress document code; When A is more than or equal to B, directly use N number of byte representation document code.
5. The method for compressing an inverted index according to any one of claims 1-4, wherein compressing the document weights comprises:
Deduplicating all document weights corresponding to each word, and determining, according to the overall compression ratio, whether to compress the deduplicated document weights;
Storing the deduplicated document weights at the head of each block in their original byte width, and setting corresponding bits for recording, for each document, the position of its weight among the deduplicated document weights.
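A rough Python sketch of the weight scheme in claim 5 follows: the distinct weights are stored once, and each document keeps only a few bits pointing at its weight. The names `encode_weights` and `decode_weights`, the sorted order of the distinct weights, the bit-packing layout, and the minimum of one bit per document are assumptions for illustration. The sketch handles a single non-empty list rather than per-block headers.

```python
def encode_weights(weights):
    """Deduplicate weights and pack one small index per document."""
    distinct = sorted(set(weights))
    position = {w: i for i, w in enumerate(distinct)}
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1; use at least 1 bit.
    bits_per_doc = max(1, (len(distinct) - 1).bit_length())
    packed = 0
    for i, w in enumerate(weights):
        packed |= position[w] << (i * bits_per_doc)
    n_bytes = (len(weights) * bits_per_doc + 7) // 8
    return distinct, bits_per_doc, packed.to_bytes(n_bytes, "little")


def decode_weights(distinct, bits_per_doc, packed, count):
    """Recover the original weight of each of `count` documents."""
    value = int.from_bytes(packed, "little")
    mask = (1 << bits_per_doc) - 1
    return [distinct[(value >> (i * bits_per_doc)) & mask]
            for i in range(count)]


distinct, bits, packed = encode_weights([5, 9, 5, 7])
restored = decode_weights(distinct, bits, packed, 4)
```

Four weights with three distinct values need two bits per document, so all four pointers fit in one byte.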
6. The method for compressing an inverted index according to claim 5, wherein the overall compression ratio is the ratio of the number of bytes after compression to the number of bytes before compression, where: number of bytes before compression = number of inverted entries (postings) × number of bytes per document weight; number of bytes after compression = number of bytes per document weight × number of document weights after deduplication + (number of inverted entries × n + 7)/8, where n = log2(number of document weights after deduplication).
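The ratio in claim 6 can be checked with a short worked example in Python. The function name `weight_compression_ratio` and its parameter names are hypothetical. Rounding n up to a whole number of bits (with a minimum of 1) and using integer division for the "+7)/8" term are assumptions, since the claim does not state how fractional values are handled.

```python
def weight_compression_ratio(num_postings, weight_bytes, num_distinct):
    """Return (bytes after compression) / (bytes before compression)."""
    # n = log2(number of distinct weights), rounded up to whole bits (assumed).
    n = max(1, (num_distinct - 1).bit_length())
    before = num_postings * weight_bytes
    after = weight_bytes * num_distinct + (num_postings * n + 7) // 8
    return after / before


# 1000 postings, 4-byte weights, 16 distinct weights:
# before = 4000 bytes, after = 64 + 500 = 564 bytes.
ratio = weight_compression_ratio(1000, 4, 16)
```

In this example the ratio is well below 1, so deduplication pays off. When nearly every weight is distinct, the ratio exceeds 1 and a caller would presumably keep the weights uncompressed, although the claim does not fix a threshold.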
7. A device for compressing an inverted index, comprising:
A request acquisition module, configured to obtain an inverted index compression request;
A data splitting module, configured to divide, according to the inverted index compression request, the inverted index to be stored into two parts: document codes and document weights;
A document code compression module, configured to compress the document codes;
A document weight compression module, configured to compress the document weights;
A recording module, configured to record the correspondence between the document codes and the document weights.
8. The device for compressing an inverted index according to claim 7, wherein the document code compression module comprises:
A blocking unit, configured to partition the document codes into blocks and to set corresponding bytes for recording the number of documents in each block and the block number;
A compression unit, configured to compress the document codes of each block.
9. The device for compressing an inverted index according to claim 7, wherein the blocking unit is configured to:
Determine the block-division standard and the number of bytes used for the header information of each block according to the size of the inverted index and the maximum document code.
10. The device for compressing an inverted index according to claim 8, wherein the compression unit is configured to:
Calculate the number of bytes A = M/8 required to represent the document codes of each block as a bitmap, where M is the value of the block-division standard;
Calculate the number of bytes B = (number of documents in the block) × N required to represent the document codes of each block with N bytes each, where N is the number of bytes used to represent one document code;
When A is less than B, compress the document codes using bitmap compression; when A is greater than or equal to B, represent each document code directly with N bytes.
11. The device for compressing an inverted index according to any one of claims 7-10, wherein the document weight compression module is configured to:
Deduplicate all document weights corresponding to each word, and determine, according to the overall compression ratio, whether to compress the deduplicated document weights;
Store the deduplicated document weights at the head of each block in their original byte width, and set corresponding bits for recording, for each document, the position of its weight among the deduplicated document weights.
CN201310631164.XA 2013-11-29 2013-11-29 The compression method and device of inverted index Active CN104679776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310631164.XA CN104679776B (en) 2013-11-29 2013-11-29 The compression method and device of inverted index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310631164.XA CN104679776B (en) 2013-11-29 2013-11-29 The compression method and device of inverted index

Publications (2)

Publication Number Publication Date
CN104679776A (en) 2015-06-03
CN104679776B (en) 2019-08-27

Family

ID=53314833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310631164.XA Active CN104679776B (en) 2013-11-29 2013-11-29 The compression method and device of inverted index

Country Status (1)

Country Link
CN (1) CN104679776B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1737791A (en) * 2005-09-08 2006-02-22 无敌科技(西安)有限公司 Data compression method by finite exhaustive optimization
US20090089256A1 (en) * 2007-10-01 2009-04-02 Frederik Transier Compressed storage of documents using inverted indexes
CN102708187A (en) * 2012-05-14 2012-10-03 成都信息工程学院 Reverse index mixed compression and decompression method based on Hbase database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FALK SCHOLER et al.: "Compression of inverted indexes for fast query evaluation", 《PROCEEDINGS OF THE 25TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL 2002》 *
LIU XIAOZHU et al.: "Efficient self-indexing technique for random-access blocked inverted files", 《CHINESE JOURNAL OF COMPUTERS》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019184A (en) * 2017-09-04 2019-07-16 北京字节跳动网络技术有限公司 A kind of method of the orderly integer array of compression and decompression
CN110019184B (en) * 2017-09-04 2021-04-27 北京字节跳动网络技术有限公司 Method for compressing and decompressing ordered integer array
CN110825936A (en) * 2018-07-23 2020-02-21 北京小度互娱科技有限公司 Method, system and storage medium for generating inverted index and searching by using inverted index
CN110825936B (en) * 2018-07-23 2024-04-30 北京小度互娱科技有限公司 Method, system and storage medium for generating reverse index and searching by reverse index

Also Published As

Publication number Publication date
CN104679776B (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN104899204A (en) Data storage method and device
CN108536753B (en) Method for determining repeated information and related device
CN104063362B (en) A kind of truncation of a string method and device
US11705923B2 (en) Method and apparatus for storing data, and computer device and storage medium thereof
CN104850507A (en) Data caching method and data caching device
CN104572889A (en) Method, device and system for recommending search terms
CN102571820B (en) For transmitting the method for data, compression service device and terminal
CN105243638A (en) Image uploading method and apparatus
CN104516887A (en) Webpage data search method, device and system
CN104572690A (en) Webpage data acquisition method, webpage data acquisition device and webpage data acquisition system
CN104516888A (en) Authority query method and device of multi-dimensional data
CN105335653A (en) Abnormal data detection method and apparatus
CN105659503A (en) System and method for providing multi-user power saving codebook optimization
CN105659502A (en) System and method for conserving power consumption in a memory system
CN109241031B (en) Model generation method, model using method, device, system and storage medium
CN105047185A (en) Method, device and system for obtaining audio frequency of accompaniment
CN104679776A (en) Method and device for compressing inverted indexes
CN103634032A (en) Data transferring method and system and mobile terminal
CN113220651B (en) Method, device, terminal equipment and storage medium for compressing operation data
CN110018886B (en) Application state switching method and device, electronic equipment and readable storage medium
CN105095286A (en) Page recommendation method and device
CN111767280A (en) Data processing method, device and storage medium
CN108256017B (en) Method and device for data storage and computer equipment
Roh et al. Energy-efficient two-dimensional skyline query processing in wireless sensor networks
CN110798222B (en) Data compression method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant