CN106202154A - Inverted index representation method and system based on a data deduplication framework - Google Patents
- Publication number: CN106202154A
- Application number: CN201610464499.0A
- Authority: CN (China)
- Prior art keywords: sequence, pattern, inverted, index, sequence pattern
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An inverted index representation method and system based on a data deduplication framework, applicable to search engines and to social network data processing. The method comprises: 1. traversing the inverted lists in the inverted index, and identifying and recording the sequence patterns that repeat across different inverted lists; 2. computing the length of each sequence pattern and handling it according to that length, then assigning each sequence pattern a pattern number according to the lexicographic order of the sequence patterns; 3. reducing the inverted index according to the sequence patterns, and storing the sequence patterns and the reduced inverted lists separately; 4. difference processing: computing the differences between adjacent document IDs in each sequence pattern, and representing each pattern number as a 2-tuple that records the pattern number and the position offset of the next pattern number. The invention can effectively remove duplicate data from an inverted index, reduce the number of document IDs to encode, improve the compression ratio of the inverted index and, at the same time, shorten the query response time of the search engine and improve the user experience.
Description
Technical field
The invention belongs to the technical field of inverted index compression for search engines, and in particular relates to an inverted index representation method and system based on a data deduplication framework. The invention is equally applicable to data compression and query problems on social network graphs.
Background art
The inverted index is the most widely used data structure in modern search engines. It consists of two parts: a dictionary and inverted lists. The dictionary stores the terms obtained by processing the document collection, the document frequency of each term, and a pointer to the inverted list of that term. An inverted list consists of multiple postings, each of which corresponds to one document containing the term. The information recorded in a posting includes the document ID (docID for short), the term frequency (the number of times the term occurs in the document), position information (where the term occurs in the document), and so on. In the present invention we assume that each inverted list consists only of a series of docIDs. Fig. 3 shows the structure schematically.
With the rapid growth of the Internet, the storage space occupied by inverted indexes has expanded sharply and, at the same time, scanning an inverted index takes longer, which lowers the query processing efficiency of the search engine. To cope with the continuous growth of index data, a large number of inverted index compression methods have been proposed. Compressing the inverted index not only reduces the storage space it occupies but can also significantly improve query processing efficiency.
For any term t, the corresponding inverted list is usually written as <d_1, d_2, d_3, …, d_{f_t}>, where f_t is the document frequency of the term and d_1, d_2, d_3, …, d_{f_t} are the original docIDs. Because the docIDs in an inverted list are sorted in ascending order, it has been proposed to represent them in d-gap form: every docID except the first is replaced by the difference between it and the docID immediately before it, which gives an inverted list of the form <d_1, d_2 - d_1, d_3 - d_2, …, d_{f_t} - d_{f_t - 1}>. This list is then variable-length encoded to achieve compression. Because the differences between docIDs (d-gaps) are much smaller than the original docIDs, and smaller values tend to need fewer bits to encode, an inverted index in d-gap form achieves a higher compression ratio than one in the plain form.
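As a minimal sketch of the d-gap form described above (the function names `to_dgaps` and `from_dgaps` are illustrative and do not come from the patent), the conversion and its inverse could look like this in Python. The variable-length encoding step that follows it is not shown.

```python
from itertools import accumulate

def to_dgaps(doc_ids):
    """Convert an ascending docID list into d-gap form (first value kept as-is)."""
    return [d - p for d, p in zip(doc_ids, [0] + doc_ids[:-1])]

def from_dgaps(gaps):
    """Recover the original docIDs by prefix-summing the gaps."""
    return list(accumulate(gaps))

postings = [1, 2, 5, 14, 20, 39, 40, 41, 42]
gaps = to_dgaps(postings)       # [1, 1, 3, 9, 6, 19, 1, 1, 1]
restored = from_dgaps(gaps)     # same values as postings
```

The gaps are no larger than the docIDs they replace, which is what lets a variable-length code spend fewer bits per value.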
Although the values an inverted index in d-gap form has to encode are smaller on average, the number of values to encode during compression is not reduced. By observation we found that inverted indexes contain a large amount of duplicate data: although no docID is repeated within a single list, different lists may contain identical docID sequences. For example, suppose the inverted lists l1, l2 and l3 have the following form:
l1→{1,2,5,14,20,39,40,41,42}
l2→{1,2,5,6,9,10,14,16,39,40,41,50}
l3→{1,2,5,11,14,39,40,41,43,50}
All of l1, l2 and l3 contain the docID sequences {1,2,5} and {39,40,41}. If we let A = {1,2,5} and B = {39,40,41}, then l1, l2 and l3 can be reduced to the following forms, respectively:
l’1→{A,14,20,B,42}
l’2→{A,6,9,10,14,16,B,50}
l’3→{A,11,14,B,43,50}
Clearly |A| + |B| + |l’1| + |l’2| + |l’3| < |l1| + |l2| + |l3|, that is, the number of values to encode is reduced; and the fewer values there are to encode, the less storage space is needed after encoding.
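The inequality can be checked directly on the example. The short Python sketch below only counts values; the labels "A" and "B" are placeholders for the two shared patterns and the helper name `total_values` is illustrative.

```python
# Lists from the example above; "A" and "B" stand in for the two shared patterns.
original = [
    [1, 2, 5, 14, 20, 39, 40, 41, 42],
    [1, 2, 5, 6, 9, 10, 14, 16, 39, 40, 41, 50],
    [1, 2, 5, 11, 14, 39, 40, 41, 43, 50],
]
patterns = {"A": [1, 2, 5], "B": [39, 40, 41]}
reduced = [
    ["A", 14, 20, "B", 42],
    ["A", 6, 9, 10, 14, 16, "B", 50],
    ["A", 11, 14, "B", 43, 50],
]

def total_values(lists):
    """Number of values that would have to be encoded."""
    return sum(len(lst) for lst in lists)

before = total_values(original)                                  # 9 + 12 + 10
after = total_values(patterns.values()) + total_values(reduced)  # 3 + 3 + 5 + 8 + 6
print(before, after)  # 31 25
```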
Beyond search engines, the inverted index model also applies to processing the relational data of social network topology graphs. A typical scenario is a complete social network in which users are related to other users, for example by subscribing to or following one another; we refer to all such relations as friend relations. In graph-theoretic terms, there is a graph in which each vertex represents a specific user, and an edge connecting any two vertices A and B means that user A and user B are friends. A typical application on this topology is to search, among all friends of a given user, for the users whose names have a given string as a prefix, and to return them as the candidate result set.
For this scenario there are currently two solutions. The first records, for each vertex in the graph, the other vertices adjacent to it. At query time, all vertices connected to the querying user are traversed in turn, and a string comparison decides whether the user name of each vertex has the query string as a prefix. One drawback of this method is that every friend vertex must be traversed, even though the vertices whose names contain the query prefix are usually a small minority; another is that string matching consumes considerable computing resources. The second, better-performing solution stores the relational data in an inverted index structure. It builds a one-to-many mapping between each user and that user's friends (the user inverted list) and, based on statistics, a one-to-many mapping between each string and all users whose names have it as a prefix (the string inverted list). At query time, the user inverted list of the querying user and the string inverted list of the query string are read, and their intersection is the candidate result set. Under this model, the compression methods and retrieval algorithms of general inverted indexes apply equally.
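The second solution amounts to intersecting two ascending ID lists. A minimal Python sketch follows; the toy data and the names `friends_of`, `users_with_prefix` and `friends_with_prefix` are hypothetical and chosen only for illustration, and the linear merge is one common way to intersect sorted lists rather than a procedure specified by the patent.

```python
def intersect_sorted(a, b):
    """Intersect two ascending ID lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical toy index: user inverted lists and string (prefix) inverted lists.
friends_of = {7: [2, 5, 9, 12, 30]}
users_with_prefix = {"al": [5, 8, 12, 41]}

def friends_with_prefix(user_id, prefix):
    """Candidate result set: friends of user_id whose name starts with prefix."""
    return intersect_sorted(friends_of.get(user_id, []),
                            users_with_prefix.get(prefix, []))

result = friends_with_prefix(7, "al")  # [5, 12]
```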
Summary of the invention
The object of the invention is to solve the problem that existing inverted index compression methods have to encode every docID, by providing a novel inverted index representation method and system based on a data deduplication framework that can effectively remove duplicate data from the inverted index, reduce the number of docIDs to encode, and improve the compression ratio of the inverted index.
The invention first provides an inverted index representation method based on a data deduplication framework. With reference to Fig. 1, its main steps are:
Step 1 (S101): traverse every inverted list in the inverted index, and identify and record the sequence patterns that repeat across different inverted lists.
Step 2 (S102): compute the pattern length of each repeated sequence pattern identified in step 1 and act on it accordingly: when the pattern length is less than a threshold k, delete the pattern; when it is greater than or equal to k, keep it. The value of k is between 4 and 6. Then assign a pattern number to each remaining sequence pattern according to the lexicographic order of the sequence patterns.
Step 3 (S103): reduce the inverted index according to the sequence patterns that remain after the deletion in step 2, then store the sequence patterns and the reduced inverted lists separately, keeping the pattern length and pattern number of each sequence pattern.
Step 4 (S104): perform difference processing.
Within the content of each sequence pattern, compute the differences between adjacent docIDs.
For the docIDs in each reduced inverted list, the first element keeps its original value. For every other docID, if the previous element is a docID, subtract the original value of that previous element; if the previous element is a pattern number, subtract the largest element of the sequence pattern it refers to.
For the pattern numbers in an inverted list, the first keeps its original value and every other one has the original value of the nearest preceding pattern number subtracted from it. Each pattern number is represented as a 2-tuple consisting of the pattern number (or pattern number difference) and the position offset of the next pattern number. In addition, the head of each inverted list records the position offset of the first pattern number in the list. The result is the new inverted index.
Here a sequence pattern is a docID sequence that repeats across different inverted lists, and the pattern length is the number of docIDs in the sequence pattern.
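Step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `simplify_patterns` is hypothetical, duplicate candidates are merged, and "lexicographic order" is taken to mean element-wise numeric comparison of the docID sequences, which is how Python compares tuples of integers. How the candidate patterns are found in step 1 is not shown.

```python
def simplify_patterns(candidates, k=4):
    """Drop patterns shorter than k, then number the survivors 1..m
    in lexicographic order of their docID sequences."""
    kept = sorted({tuple(p) for p in candidates if len(p) >= k})
    return {number: list(p) for number, p in enumerate(kept, start=1)}

# Hypothetical candidate patterns as they might come out of step 1.
candidates = [[21, 39, 40, 49], [1, 2, 3], [5, 6, 7, 8, 9], [21, 39, 40, 49]]
numbered = simplify_patterns(candidates, k=4)
# {1: [5, 6, 7, 8, 9], 2: [21, 39, 40, 49]}; [1, 2, 3] is dropped because 3 < k
```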
To achieve the above object, the invention also provides an inverted index representation system based on a data deduplication framework. With reference to Fig. 2, the system comprises:
A pattern recognition module, which traverses every inverted list in the inverted index and identifies and records the sequence patterns that repeat across different inverted lists.
A pattern simplification module, which, starting from the result of the pattern recognition module, computes the pattern length of each sequence pattern and acts on it accordingly: when the pattern length is less than the threshold k it deletes the pattern, and when it is greater than or equal to k it keeps the pattern; it then assigns a pattern number to each remaining sequence pattern according to the lexicographic order of the sequence patterns.
An index reduction module, which reduces the inverted index according to the sequence patterns left by the pattern simplification module and then stores the sequence patterns and the reduced inverted lists separately, keeping the pattern length and pattern number of each sequence pattern.
A difference processing module, which performs difference processing on the sequence patterns and reduced inverted lists stored by the index reduction module: for both the sequence patterns and the reduced inverted lists, it computes the differences between adjacent elements and replaces the original elements with them; for each pattern number in an inverted list, it records the number (or number difference) and the position offset of the next pattern number. The result is the new inverted index.
Here a sequence pattern is a docID sequence that repeats across different inverted lists, and the pattern length is the number of docIDs in the sequence pattern.
The advantages and beneficial effects of the invention are that it effectively removes duplicate data from the inverted index, reduces the number of docIDs to encode, and improves the compression ratio of the inverted index. The invention can be widely applied to search engine performance optimization and inverted index compression.
Brief description of the drawings
Fig. 1 is a flow chart of the inverted index representation method based on a data deduplication framework according to the invention;
Fig. 2 is a schematic diagram of the inverted index representation system based on a data deduplication framework according to the invention;
Fig. 3 is a schematic diagram of the basic structure of an inverted index in the prior art.
Detailed description of the embodiments
To make the above objects, features and advantages of the invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
The flow of the inverted index representation method based on a data deduplication framework is shown in Fig. 1. The inverted index representation system that implements the method is shown in Fig. 2.
We call a run of numerically consecutive docIDs in an inverted list an interval sequence. For example, the sequence {10,11,12,13,14} is one interval sequence, whereas the sequence {10,11,13,14} contains two: the first is {10,11} and the second is {13,14}. By observation we found that inverted lists contain a large number of such interval sequences. The invention therefore proposes two strategies for identifying docID sequences that repeat across different lists: C1, identify arbitrary repeated docID sequences; C2, identify only repeated interval sequences. When the sequence patterns are interval sequences, we use a run-length representation to optimize how the sequence patterns are stored and thereby further improve the compression ratio of the inverted index. The representations of a sequence pattern are as follows.
For strategy C1, suppose a sequence pattern M containing n docIDs is given:
M = {d_1, d_2, ..., d_n}
Using the d-gap representation while keeping the pattern length n, the corresponding form is:
M_d-gap = {n, d_1, d_2 - d_1, ..., d_n - d_{n-1}}
For strategy C2, suppose a sequence pattern M containing n docIDs is given:
M = {d_1, d_1 + 1, ..., d_1 + n - 1}
Using the run-length representation while keeping the pattern length n, the corresponding form is:
M_run-length = {n, d_1}
A sequence pattern based on an interval sequence is expressed as {pattern length, first docID} because, with the run-length representation, 2 integers are enough to represent the original n docIDs. The number of integers to store is therefore reduced by (n-2), and the larger n is, the more integers run-length saves.
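The two pattern representations can be sketched in Python as below. The function names are illustrative and not taken from the patent; the run-length encoder assumes its input really is an interval sequence and raises an error otherwise.

```python
def encode_dgap_pattern(pattern):
    """Strategy C1: {n, d_1, d_2 - d_1, ..., d_n - d_{n-1}}."""
    gaps = [b - a for a, b in zip(pattern, pattern[1:])]
    return [len(pattern), pattern[0]] + gaps

def encode_run_length_pattern(pattern):
    """Strategy C2: {n, d_1}; only valid for numerically consecutive docIDs."""
    if any(b - a != 1 for a, b in zip(pattern, pattern[1:])):
        raise ValueError("not an interval sequence")
    return [len(pattern), pattern[0]]

def decode_run_length_pattern(encoded):
    """Rebuild the n consecutive docIDs from (n, d_1)."""
    n, first = encoded
    return list(range(first, first + n))

c1 = encode_dgap_pattern([21, 39, 40, 49])             # [4, 21, 18, 1, 9]
c2 = encode_run_length_pattern([10, 11, 12, 13, 14])   # [5, 10]
back = decode_run_length_pattern(c2)                   # [10, 11, 12, 13, 14]
```

For the five-element interval sequence, two integers are stored instead of five, a saving of n - 2 = 3 integers.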
Suppose there is an inverted index containing three inverted lists l1, l2 and l3:
l1→{1,2,3,14,20,21,39,40,49,51,55}
l2→{1,2,3,9,10,11,14,21,39,40,49,55}
l3→{1,2,3,14,16,39,49,53,55}
After steps S101, S102 and S103, the inverted index representation based on the data deduplication framework is obtained:
A→{1,2,3}
B→{21,39,40,49}
l’1→{A,14,20,B,51,55}
l’2→{A,9,10,11,14,B,55}
l’3→{A,14,16,39,49,53,55}
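The reduction of l1, l2 and l3 shown above can be sketched as follows. This is an illustration under assumptions, not the patent's procedure: it assumes a pattern only matches where its docIDs appear as a contiguous run in the list (which holds in this example), tries longer patterns first at each position, and uses the hypothetical function name `reduce_list`. Pattern discovery itself (step S101) is taken as given.

```python
def reduce_list(postings, patterns):
    """Replace each contiguous occurrence of a pattern by its label."""
    ordered = sorted(patterns.items(), key=lambda kv: -len(kv[1]))
    out, i = [], 0
    while i < len(postings):
        for label, seq in ordered:
            if seq and postings[i:i + len(seq)] == seq:
                out.append(label)      # emit the pattern label once
                i += len(seq)          # skip the docIDs it covers
                break
        else:
            out.append(postings[i])    # no pattern starts here: keep the docID
            i += 1
    return out

patterns = {"A": [1, 2, 3], "B": [21, 39, 40, 49]}
l1 = [1, 2, 3, 14, 20, 21, 39, 40, 49, 51, 55]
l2 = [1, 2, 3, 9, 10, 11, 14, 21, 39, 40, 49, 55]
l3 = [1, 2, 3, 14, 16, 39, 49, 53, 55]
reduced = [reduce_list(l, patterns) for l in (l1, l2, l3)]
```

In l3 the docIDs 39 and 49 stay as plain values because the full sequence of B does not occur there.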
First, difference processing is applied to the docIDs in the inverted lists l’1, l’2 and l’3. The rules are: 1) a docID that is the first element of an inverted list stays unchanged; 2) for every remaining docID, if the previous adjacent element is a docID, that element is subtracted from it; 3) if the previous adjacent element is a pattern number, the largest docID in the sequence pattern referred to by that pattern number is subtracted from it. This gives the processed inverted lists:
l”1→{A,11,6,B,2,4}
l”2→{A,6,1,1,3,B,6}
l”3→{A,11,2,23,10,4,2}
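The three difference rules can be sketched in Python as below. The function name `delta_encode_reduced` is illustrative; pattern labels are represented as strings, and the largest docID of a pattern is taken to be its last element because patterns are ascending.

```python
def delta_encode_reduced(reduced, patterns):
    """Apply difference processing to the docIDs of one reduced inverted list."""
    out = []
    prev = None  # original value of the previous docID, or max docID of the previous pattern
    for item in reduced:
        if isinstance(item, str):          # pattern label: left as-is at this stage
            out.append(item)
            prev = patterns[item][-1]      # ascending pattern, so last element is the maximum
        else:
            out.append(item if prev is None else item - prev)
            prev = item
    return out

patterns = {"A": [1, 2, 3], "B": [21, 39, 40, 49]}
reduced_lists = [
    ["A", 14, 20, "B", 51, 55],
    ["A", 9, 10, 11, 14, "B", 55],
    ["A", 14, 16, 39, 49, 53, 55],
]
deltas = [delta_encode_reduced(r, patterns) for r in reduced_lists]
```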
We then describe each pattern number in an inverted list with a 2-tuple of integers. The first value of the 2-tuple is the pattern number itself (when it is the first pattern number in the list) or the difference between the pattern number and the nearest preceding pattern number (when it is not the first pattern number in the list). The second value is the position offset between this pattern number and the next pattern number in the list (0 if this is the last pattern number in the list). According to the lexicographic order of the contents of A and B, we assign them the numbers 1 and 2 in ascending order. The inverted lists above can then be described as:
l”1→{(1,3),11,6,(1,0),2,4}
l”2→{(1,5),6,1,1,3,(1,0),6}
l”3→{(1,0),11,2,23,10,4,2}
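The conversion of pattern labels into 2-tuples can be sketched as follows. The function name `encode_pattern_refs` is illustrative. The function also returns the 1-based position of the first pattern number in the list; returning 0 for a list without any pattern number is an assumption of this sketch, since the text does not say how that case is stored.

```python
def encode_pattern_refs(delta_list, pattern_no):
    """Replace each pattern label by (number or number difference, offset to next pattern)."""
    positions = [i for i, x in enumerate(delta_list) if isinstance(x, str)]
    out = list(delta_list)
    prev_no = None
    for rank, pos in enumerate(positions):
        no = pattern_no[delta_list[pos]]
        first = no if prev_no is None else no - prev_no
        is_last = rank + 1 == len(positions)
        offset = 0 if is_last else positions[rank + 1] - pos
        out[pos] = (first, offset)
        prev_no = no
    head_offset = positions[0] + 1 if positions else 0  # 0 when there is no pattern: assumption
    return head_offset, out

pattern_no = {"A": 1, "B": 2}
head1, enc1 = encode_pattern_refs(["A", 11, 6, "B", 2, 4], pattern_no)
head2, enc2 = encode_pattern_refs(["A", 6, 1, 1, 3, "B", 6], pattern_no)
head3, enc3 = encode_pattern_refs(["A", 11, 2, 23, 10, 4, 2], pattern_no)
```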
For an inverted list that contains pattern numbers, one extra integer must be stored to give the position offset of the first pattern number within the list. In the example above, the first pattern number of every inverted list is A and it is the head element of the list, so this offset is 1 in all three cases. In addition, difference processing must be applied to the docIDs inside each sequence pattern: the first docID stays unchanged and every other element has the previous adjacent element subtracted from it. The final inverted index can be described as:
A→{1,1,1}
B→{21,18,1,9}
l”1→{(1,3),11,6,(1,0),2,4}
l”2→{(1,5),6,1,1,3,(1,0),6}
l”3→{(1,0),11,2,23,10,4,2}
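As a consistency check on the final representation, the sketch below decodes it back to the original lists. The function name `decode_list` is illustrative and the decoding procedure is inferred from the encoding rules above rather than quoted from the patent. The second tuple value (the position offset) is not needed for a full sequential decode and is ignored here.

```python
from itertools import accumulate

def decode_list(encoded, pattern_gaps):
    """Rebuild the original docIDs of one inverted list."""
    patterns = {no: list(accumulate(g)) for no, g in pattern_gaps.items()}
    out = []
    pattern_no = 0   # running pattern number (numbers are stored as differences)
    prev = 0         # last restored docID
    for item in encoded:
        if isinstance(item, tuple):
            pattern_no += item[0]
            out.extend(patterns[pattern_no])
        else:
            out.append(prev + item)
        prev = out[-1]
    return out

pattern_gaps = {1: [1, 1, 1], 2: [21, 18, 1, 9]}   # A and B in d-gap form
l1 = decode_list([(1, 3), 11, 6, (1, 0), 2, 4], pattern_gaps)
l2 = decode_list([(1, 5), 6, 1, 1, 3, (1, 0), 6], pattern_gaps)
l3 = decode_list([(1, 0), 11, 2, 23, 10, 4, 2], pattern_gaps)
```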
Embodiment 2
On the TREC GOV2 data set we compared, for several forms of index, the number of bits needed per docID after encoding and the corresponding decompression speed. EF denotes the inverted index representation based on an optimal partitioning strategy with Elias-Fano coding; TD denotes the inverted index based on traditional d-gaps; R denotes the index representation based on the data deduplication framework (the suffixes I and II denote the repeated-sequence identification strategy used, corresponding to strategies C1 and C2 above). The data sets used are described as follows:
(1) TREC GOV2 is a data set crawled from the .gov domain in 2004, containing more than 25 million web pages in total;
(2) we use the TREC 2009 query set as the query test set, containing 32244 queries in total, to measure the average decompression speed of each form of index on this query set;
(3) URL denotes the data set obtained by reordering the GOV2 data set by web page address; TMF and IBDA denote the data sets obtained by reordering the GOV2 data set with the TMF and IBDA strategies.
Table 1
Table 1 compares the actual compression performance of each inverted index representation under the different forms and reordering strategies; the R and TD methods both use OptPForD coding. The experimental results show that, with this coding, the index based on the data deduplication framework compresses better than the index in traditional d-gap form, with the compression ratio improving by more than 10% in every case; compared with the EF method, R-I and R-II still keep some advantage. Moreover, because R-I can identify more sequence patterns than R-II, its compression is better.
Table 2
Table 2 gives the corresponding decompression speed comparison. Because Elias-Fano coding does not involve an explicit decompress-and-restore operation, its statistics are omitted from the table. The experimental results show that the index based on the data deduplication framework is more favourable to numeric decoding than the index in traditional d-gap form; on inverted indexes reordered with IBDA and TMF in particular, the R methods obtain a significant decompression speedup. Compared with R-I, R-II can recover a complete sequence pattern from just the pattern length and the first docID, which saves a large number of memory accesses, so its decompression speed is generally higher than that of R-I.
For social network data, we compared the actual compression performance of the above index representations on a publicly released partial relational data set from Facebook; the method names and their meanings are the same as in the experiments above. This data set contains about 51 million users and records the subscription relations between them. We arranged the raw data into inverted index form and, using a query set, measured the decompression speed of the data touched by the queries.
Table 3
Table 4
Table 3 compares the actual compression performance of the inverted index in the original order and after the user IDs are reordered with IBDA and TMF. It shows that, combined with the IBDA and TMF reordering strategies, both R methods compress better. R-I still keeps a certain advantage over the EF method, and the R methods always achieve a higher compression ratio than the traditional TD method. Table 4 compares the decompression speed of each method under the different reordering strategies. It shows that, except in the original order, the R methods after reordering decompress about 3.8%-17.1% faster than the TD method.
The inverted index representation method and system of the invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the invention, and the description of the above embodiments is only intended to help understand the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (2)
1. An inverted index representation method based on a data deduplication framework, characterized by comprising:
Step 1: traversing every inverted list in the inverted index, and identifying and recording the sequence patterns that repeat across different inverted lists;
Step 2: computing the pattern length of each repeated sequence pattern identified in step 1, and acting according to the pattern length: when the pattern length is less than a threshold k, deleting the pattern; when the pattern length is greater than or equal to the threshold k, keeping the pattern; then assigning a pattern number to each sequence pattern according to the lexicographic order of the sequence patterns;
Step 3: reducing the inverted index according to the sequence patterns remaining after the deletion in step 2, then storing the sequence patterns and the reduced inverted lists separately, and keeping the pattern length and pattern number of each sequence pattern;
Step 4: performing difference processing: computing the differences between adjacent document IDs in the content of each sequence pattern;
for the document IDs in each reduced inverted list, the first element keeps its original value; for every other document ID, when the previous element is a document ID, the original value of the previous adjacent element is subtracted, and when the previous element is a pattern number, the largest element of the previous adjacent sequence pattern is subtracted;
for the pattern numbers in an inverted list, the first keeps its original value and every other one has the original value of the nearest preceding pattern number subtracted from it; each pattern number is represented as a 2-tuple comprising the pattern number or pattern number difference and the position offset of the next pattern number; in addition, the head of each inverted list records the position offset of the first pattern number in the list, thereby obtaining a new inverted index;
wherein the sequence pattern is a document ID sequence that repeats across different inverted lists, and the pattern length is the number of document IDs in the sequence pattern.
2. An inverted index representation system based on a data deduplication framework, characterized by comprising:
a pattern recognition module, configured to traverse every inverted list in the inverted index and to identify and record the sequence patterns that repeat across different inverted lists;
a pattern simplification module, configured to compute, from the result of the pattern recognition module, the pattern length of each sequence pattern and to act according to the pattern length: when the pattern length is less than a threshold k, deleting the pattern, and when the pattern length is greater than or equal to the threshold k, keeping the pattern; and then to assign a pattern number to each sequence pattern according to the lexicographic order of the sequence patterns;
an index reduction module, configured to reduce the inverted index according to the sequence patterns left by the pattern simplification module, and then to store the sequence patterns and the reduced inverted lists separately, keeping the pattern length and pattern number of each sequence pattern;
a difference processing module, configured to perform difference processing on the sequence patterns and reduced inverted lists stored by the index reduction module: for the sequence patterns and the reduced inverted lists, computing the differences between adjacent elements and replacing the original elements with them; for each pattern number in an inverted list, recording the number or number difference and the position offset of the next pattern number, thereby obtaining a new inverted index;
wherein the sequence pattern is a document ID sequence that repeats across different inverted lists, and the pattern length is the number of document IDs in the sequence pattern.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610464499.0A CN106202154B (en) | 2016-06-21 | 2016-06-21 | A kind of inverted index expression method and system based on data de-duplication framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610464499.0A CN106202154B (en) | 2016-06-21 | 2016-06-21 | A kind of inverted index expression method and system based on data de-duplication framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106202154A true CN106202154A (en) | 2016-12-07 |
CN106202154B CN106202154B (en) | 2019-04-02 |
Family
ID=57461729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610464499.0A Active CN106202154B (en) | 2016-06-21 | 2016-06-21 | A kind of inverted index expression method and system based on data de-duplication framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202154B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783516A (en) * | 2019-02-19 | 2019-05-21 | 北京奇艺世纪科技有限公司 | A kind of query statement retrieval answering method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1769398A4 (en) * | 2004-06-18 | 2009-01-21 | Reel Two Ltd | Data collection cataloguing and searching method and system |
CN103235794A (en) * | 2013-04-02 | 2013-08-07 | 中国科学院计算技术研究所 | Method and system for expressing inverted index based on document sequence number processing |
Also Published As
Publication number | Publication date |
---|---|
CN106202154B (en) | 2019-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9934324B2 (en) | Index structure to accelerate graph traversal | |
Belazzougui et al. | Optimal lower and upper bounds for representing sequences | |
Holley et al. | The survival of contact processes | |
Braun et al. | Effectively and efficiently mining frequent patterns from dense graph streams on disk | |
CN104348490A (en) | Combined data compression algorithm based on effect optimization | |
CN104050247A (en) | Method for realizing quick retrieval of mass videos | |
WO2014107988A1 (en) | Method and system for discovering and analyzing micro-blog user group structure | |
Ladra et al. | Scalable and queryable compressed storage structure for raster data | |
EP2344959A2 (en) | Index compression in databases | |
Kontopoulos et al. | A space efficient scheme for persistent graph representation | |
CN105701200A (en) | Data warehouse security OLAP method on memory cloud computing platform | |
CN103914483B (en) | File memory method, device and file reading, device | |
CN103092992B (en) | Vector data elder generation based on Key/Value type NoSQL data base sequence quadtree coding and indexing means | |
CN110389950A (en) | A kind of big data cleaning method quickly run | |
Pibiri et al. | Dynamic elias-fano representation | |
CN105740428A (en) | B+ tree-based high-dimensional disc indexing structure and image search method | |
CN110097581B (en) | Method for constructing K-D tree based on point cloud registration ICP algorithm | |
CN106202154A (en) | A kind of inverted index based on data de-duplication framework represents method and system | |
CN107077481B (en) | Information processing apparatus, information processing method, and computer-readable storage medium | |
CN113362405A (en) | StOMP (static latent image processing) -based compressed sensing image reconstruction system construction method | |
Grabowski et al. | Tight and simple web graph compression for forward and reverse neighbor queries | |
CN107169066A (en) | One kind is based on kdTree and the timing diagram data processing method of multivalued decision diagram | |
Ren et al. | Efficient processing of shortest path queries in evolving graph sequences | |
CN109361686A (en) | A kind of compression method reducing sensing data time redundancy | |
Lee et al. | Dynamic rank-select structures with applications to run-length encoded texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||
OL01 | Intention to license declared |