CN105989061A - Rapid indexing method for repeated detection of multi-dimensional data under sliding window - Google Patents
- Publication number
- CN105989061A (application CN201510066798.4A)
- Authority
- CN
- China
- Prior art keywords
- sliding window
- bloom filter
- filter matrix
- subwindow
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a rapid indexing method for duplicate detection of multi-dimensional data under a sliding window. The method uses a compressed counting Bloom filter matrix array to maintain the data items inside the sliding window. Multiple sub-windows are maintained inside the sliding window: the head sub-window receives new elements as the window slides, and the tail sub-window evicts old elements as the window slides. Each independent sub-window consists of one counting Bloom filter matrix, which provides dimension reduction for multi-dimensional data and maintains counter units internally. Because all counting Bloom filter matrices have the same design capacity and share the same group of k hash functions, duplicate-element detection efficiency is improved. Because a system base clock is maintained in the counter units, implicit deletion of elements from the sliding window is supported. Because multi-dimensional data is maintained in matrix form, the combination error rate of multi-dimensional data is reduced, which lowers the overall false-positive rate.
Description
Technical field
The present invention relates to a fast indexing method and system for duplicate detection in massive multi-dimensional data, and in particular to an indexing method that performs duplicate detection on massive multi-dimensional data under the sliding-window data stream model. It belongs to the field of big data computing.
Background
With the development of the mobile Internet and Web 2.0, the global volume of data is growing at a striking rate: the data generated worldwide amounted to 0.49 ZB in 2008 (1 ZB = $10^{21}$ bytes), 0.8 ZB in 2009, 1.2 ZB in 2010, and 1.82 ZB in 2011. IDC expects that by 2020 humanity will produce more than 40 ZB of data. High-speed, massive network data carries complex information. It may contain many kinds of business data streams, such as IP service flows, user click streams, user query streams, and web server logs. It may also contain various security events, which pose a serious threat to network security, so network traffic monitoring is particularly important.
In network traffic monitoring systems, duplicate detection of multi-dimensional data is an important preprocessing step. Take the network service flows in a network monitoring and management system as an example: each flow is uniquely determined by a five-tuple (source address, dest address, source port, dest port, protocol). Representing and querying this set of five-dimensional elements requires an efficient algorithm to keep the system efficient.
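As a point of reference for the five-tuple described above, the following minimal sketch detects repeated flows with an exact in-memory set. The class name `FlowKey`, its field names, and the sample addresses are illustrative assumptions, not taken from the patent. An exact set never misjudges, but its memory grows with the number of distinct flows, which is the cost the compact index structures discussed below try to avoid.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Five-tuple identifying one network service flow (field names are illustrative)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


def find_repeats(flows):
    """Return the flows that repeat an earlier five-tuple, using an exact set."""
    seen = set()
    repeats = []
    for flow in flows:
        if flow in seen:
            repeats.append(flow)
        else:
            seen.add(flow)
    return repeats


# Hypothetical sample data: the third flow repeats the first.
flows = [
    FlowKey("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),
    FlowKey("10.0.0.2", "10.0.0.9", 40001, 80, "tcp"),
    FlowKey("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),
]
repeats = find_repeats(flows)
```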
In stream computing scenarios, computation windows are currently divided into the following types according to how the window boundaries move. The first is the fixed window model, in which both ends of the window are fixed; it does little to reflect the timeliness of data. The second is the landmark window model, in which the left end is fixed and the right end moves forward. A landmark window contains the data items that occur between a specific point in time and the current time; setting several landmarks over the lifetime of a data stream is equivalent to dividing the stream into several independent smaller streams that are examined separately. The third is the jumping window model, in which the left end advances in jumps and the right end advances by sliding. It reflects the continuous change of a data stream better than the landmark window model, but because elements are evicted from the tail of the window in batches, the number of valid elements in the window fluctuates noticeably. The fourth is the sliding window model, in which both ends slide forward together: stale data items are deleted as new ones are inserted. It is regarded as the ideal model for data stream monitoring and analysis.
Under the sliding window model, the main fast indexing methods for detecting duplicate multi-dimensional data are the following:
The first method is indexing by hashing combined with counting. Hash indexing is a convenient and efficient indexing mechanism for big data: it records the presence of multi-dimensional data by counting the data items that share a hash value. When the sliding window inserts an element, the corresponding counter is incremented by 1; when it deletes an element, the corresponding counter is decremented by 1, and if the counter reaches 0 the corresponding entry is deleted. This strategy has drawbacks, however. First, hashing is not an exact method, so collisions between data items are inevitable, and the quality of the collision-handling method is decisive for the index. Second, hash-based indexing consumes a large amount of memory.
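The insert and delete rules of the hash-plus-counting approach can be sketched as follows. This is a minimal illustration under assumptions: the class name `HashCountingWindow` is hypothetical, Python's built-in `hash` stands in for the unspecified hash function, and collisions are not handled, so two different elements with the same hash value would be counted together, as the paragraph above warns.

```python
from collections import Counter, deque


class HashCountingWindow:
    """Sliding window that counts elements by hash value (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()     # hash values in arrival order
        self.counts = Counter()  # hash value -> occurrences inside the window

    def insert(self, element):
        """Insert element; return True if its hash was already in the window."""
        key = hash(element)
        duplicate = self.counts[key] > 0
        self.items.append(key)
        self.counts[key] += 1            # insertion: add 1 to the counter
        if len(self.items) > self.capacity:
            old = self.items.popleft()   # the window slides: evict the oldest item
            self.counts[old] -= 1        # deletion: subtract 1 from the counter
            if self.counts[old] == 0:
                del self.counts[old]     # counter reached 0: drop the entry
        return duplicate
```

Every distinct hash value inside the window needs its own counter entry, which reflects the memory cost noted above.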
The second method is the multi-dimensional Bloom filter (MDBF) index. MDBF uses as many standard Bloom filters as the element has dimensions, and decomposes the representation and querying of a multi-dimensional element directly into the representation and querying of subsets of single attribute values: one standard Bloom filter represents each attribute. To query an element, MDBF checks whether every attribute value of the element is present in its corresponding standard Bloom filter, and on that basis decides whether the element belongs to the set. This method also has drawbacks. First, it is weak at deleting elements from a sliding window and cannot slide the window accurately at the granularity of individual data items. Second, because the multiple hashes in a Bloom filter can collide, and because membership is verified only on each dimension independently, the element false-positive rate is high.
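The weakness of per-dimension verification can be shown with a small sketch. The classes `BloomFilter` and `MDBF` below are illustrative assumptions (the hash construction with SHA-256 and the sizes `m=1024`, `k=3` are arbitrary choices, not from the patent). Two elements are inserted, and a third element that was never inserted, but whose attributes each appear in one of the inserted elements, is reported as present.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over one attribute (illustrative sketch)."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, value):
        data = str(value).encode()
        # k hash functions derived by prefixing the input with the function index.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p] = 1

    def __contains__(self, value):
        return all(self.bits[p] for p in self._positions(value))


class MDBF:
    """One standard Bloom filter per dimension, checked independently."""

    def __init__(self, dims):
        self.filters = [BloomFilter() for _ in range(dims)]

    def add(self, element):
        for f, v in zip(self.filters, element):
            f.add(v)

    def __contains__(self, element):
        return all(v in f for f, v in zip(self.filters, element))


index = MDBF(dims=2)
index.add(("10.0.0.1", 80))
index.add(("10.0.0.2", 443))
# Never inserted, but each attribute occurs in some inserted element,
# so every per-dimension check succeeds: a combination error.
never_inserted = ("10.0.0.1", 443)
```

Because a Bloom filter has no false negatives, `never_inserted in index` is guaranteed to be true here regardless of the hash values.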
In summary, a fast indexing method is essential for duplicate data detection in a sliding window. Improving the efficiency of duplicate detection and reducing its false-positive rate are the key problems in designing a fast index structure.
Summary of the invention
The main object of the present invention is to provide a fast indexing method and system for duplicate detection of multi-dimensional data under a sliding window that improves the efficiency of duplicate detection and reduces its false-positive rate, thereby solving the problem of multi-dimensional duplicate detection under the sliding window model.
The invention mainly comprises the following aspects.
First, in the design of the fast index structure, the invention uses a compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) to maintain the data items in the sliding window. Specifically, multiple sub-windows are maintained in the sliding window; the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner. Each independent sub-window consists of one counting Bloom filter matrix (CCBFM), which provides dimension reduction for multi-dimensional data and maintains counter units internally.
Second, building on the index structure above, with respect to duplicate-element detection efficiency: all counting Bloom filter matrices in the invention use the same design capacity and share the same group of k hash functions. This reduces the time complexity of an element query from O(kg) to O(k), where g is the number of sub-windows, and so improves duplicate-element detection efficiency.
Third, building on the index structure above, with respect to applicability to sliding-window stream computing: the data items of each independent sub-window are maintained by a counting Bloom filter matrix (CCBFM). By maintaining a system base clock in the counter units, the invention supports implicit deletion of elements from the sliding window, which improves the system's suitability for sliding-window operations.
Compared with the prior art, the main innovations and beneficial effects of the invention are as follows:
1) For the design of fast index structures for big data, the invention proposes a compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) index structure. The index structure maintains multiple sub-windows in the sliding window, and each independent sub-window consists of one counting Bloom filter matrix (CCBFM).
2) Building on this index structure, the invention improves duplicate-element detection efficiency by giving all counting Bloom filter matrices the same design capacity and having them share the same group of k hash functions; it supports implicit deletion of elements from the sliding window by maintaining a system base clock in the counter units; and it reduces the combination error rate of multi-dimensional data, and with it the overall false-positive rate, by maintaining multi-dimensional data in matrix form.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sliding window model;
Fig. 2 is a schematic diagram of the duplicate data detection index structure under the sliding window model;
Fig. 3 is a schematic diagram of merged storage and shared hash functions under the sliding window model;
Fig. 4 is a workflow diagram of multi-dimensional duplicate detection at a data processing node;
Fig. 5 is a plot of the false-positive rate test.
Detailed description
To make the above objects, features, and advantages of the present invention clearer, the invention is further described below through specific embodiments and the accompanying drawings.
In the design of the fast index structure for the sliding window, the invention uses a compressed counting Bloom filter matrix array (CCBFMA) to maintain the data items in the sliding window. This indexing method offers better duplicate-element detection efficiency and a lower false-positive rate than existing solutions.
The multi-dimensional compressed counting Bloom filter matrix array (CCBFMA) consists of a group of structurally identical compressed counting Bloom filter matrices (CCBFM). Each CCBFM consists of m counter units of bit width d, where $d = \log_2(N/g)$. Here N is the total element capacity of the sliding window and g is the number of sub-windows maintained in the sliding window, so the design capacity of each sub-window is N/g. All sub-windows maintain the data items that flow through them in first-in, first-out (FIFO) order: the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner.
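The sizing rule above can be worked through numerically. The sketch below is illustrative: the function names are hypothetical, rounding the bit width up with `ceil` is an assumption for the case where N/g is not a power of two, and the sample values of N, g, and m are chosen only for the arithmetic.

```python
import math


def counter_bit_width(total_capacity, sub_windows):
    """Bit width d = log2(N/g) of one counter unit, rounded up (assumption)."""
    per_window = total_capacity / sub_windows  # design capacity N/g of a sub-window
    return math.ceil(math.log2(per_window))


def ccbfma_bits(total_capacity, sub_windows, counters_per_matrix):
    """Total counter storage: g matrices, each with m counters of d bits."""
    d = counter_bit_width(total_capacity, sub_windows)
    return sub_windows * counters_per_matrix * d


# Hypothetical sizing: N = 2^20 elements, g = 16 sub-windows, m = 2^20 counters.
d = counter_bit_width(2**20, 16)            # N/g = 65536, so d = 16
total_bits = ccbfma_bits(2**20, 16, 2**20)  # 16 * 2^20 * 16 bits
```

With these sample values the counters occupy 268435456 bits, that is, 32 MiB.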
Each CCBFM maintains the data of all dimensions for one independent sub-window. By maintaining the system base clock in the counter units of the CCBFM, the invention deletes elements from the sliding window implicitly. To determine whether an element x is a valid element of the current sliding window: first, for the membership test on x, the CCBFM eliminates the combination error rate through its matrix form, which greatly reduces the element false-positive rate; second, if the corresponding CCBFM reports x as present, the counter maintained in its counter units is checked against the current base clock, and if it exceeds the base clock, x is considered not to be a valid element.
Fig. 1 shows the sliding-window data stream computation model. Both ends of the sliding window slide forward together: stale data items are deleted as new ones are inserted. It is regarded as the ideal model for data stream monitoring and analysis.
Fig. 2 shows the duplicate data detection index structure under the sliding window model. According to this structure, the invention maintains a compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) index in RAM. The index maintains multiple sub-windows in the sliding window, and the data of all dimensions of each independent sub-window is held in one counting Bloom filter matrix (CCBFM).
Fig. 3 shows how storage is merged and hash functions are shared under the sliding window model, where k is the number of hash functions, g is the number of sub-windows, and $d = \log_2(N/g)$ is the counter bit width. As the figure shows, all CCBFMs are structurally identical, and the counter units that have the same coordinates in different Bloom filters are mapped to and stored in the same vector, so that they can be read together in a single memory access. Because all g Bloom filters share the same group of k hash functions, the time complexity of determining whether an element x is a valid element in any of the current sub-windows is reduced from O(kg) to O(k).
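The merged layout can be sketched as a table with one row per counter coordinate and one column per sub-window. The class `MergedCounters`, its method names, and the SHA-256-based hash construction are assumptions made for illustration; the `hash_calls` field exists only to show that a query over all g sub-windows evaluates k hash functions rather than k times g.

```python
import hashlib


class MergedCounters:
    """Counters with the same coordinate in all g filters share one row (sketch)."""

    def __init__(self, g, m, k):
        self.g, self.m, self.k = g, m, k
        self.rows = [[0] * g for _ in range(m)]  # rows[position][sub_window]
        self.hash_calls = 0                      # instrumentation for the example

    def _positions(self, element):
        key = repr(element).encode()
        positions = []
        for i in range(self.k):
            self.hash_calls += 1
            digest = hashlib.sha256(bytes([i]) + key).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def write(self, sub_window, element, value):
        """Store value in the k counters of element for one sub-window."""
        for p in self._positions(element):
            self.rows[p][sub_window] = value

    def windows_containing(self, element):
        """Sub-windows whose k counters for element are all non-zero."""
        rows = [self.rows[p] for p in self._positions(element)]  # k row reads
        return [w for w in range(self.g) if all(row[w] for row in rows)]
```

In this sketch each row read stands in for the single memory access described above; a Python list does not model the hardware-level read, so only the hash count and the number of row reads are meaningful.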
Fig. 4 shows the workflow of multi-dimensional duplicate detection. As the figure shows, element duplicate detection in the sliding-window data stream scenario comprises the following core steps.
(1) Initialize the system base clock, the element detection flag, and the system data structures;
(2) Receive an input element e, which consists of w dimensions, i.e. (e1, e2, ..., ew);
(3) Check whether e exists in the CCBFMA. If it does not, go to step (4) and begin the new-element insertion procedure; if it does, go to step (8);
(4) Write e (e1, e2, ..., ew) into the head CCBFM;
(5) Write the system base clock into the k corresponding counter units of that CCBFM;
(6) Determine whether e is the last element of the head sub-window. If so, reset the base clock, delete the sub-window at the tail of the queue, and create a new head sub-window; if not, increment the base clock;
(7) Set the global flag to false; the procedure ends;
(8) Determine whether element ei is present in the tail sub-window. If not, go to step (9); if so, go to step (10);
(9) Set the global flag to true; the procedure ends;
(10) Determine whether the value of the corresponding counter unit is greater than the system base clock. If so, set the global flag to true and end the procedure; if not, set the global flag to false and end the procedure.
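The ten steps above can be sketched in a few dozen lines. This is a simplified illustration under several assumptions that the text does not fix: the class name `CCBFMASketch` is hypothetical; the w dimensions are joined into one key before hashing, standing in for the matrix's dimension reduction; the base clock counts the elements already written to the head sub-window and is incremented just before it is written; the smallest of the k counters is taken as an element's timestamp; a query that matches only an expired tail entry is treated as new and inserted; and at least two sub-windows are used.

```python
import hashlib
from collections import deque


class CCBFMASketch:
    """Simplified duplicate detector following steps (1) to (10) (illustrative)."""

    def __init__(self, g, capacity, m, k):
        # Step (1): initialise the base clock and the data structures.
        self.g, self.capacity, self.m, self.k = g, capacity, m, k
        self.clock = 0  # elements already written to the head sub-window
        # Index 0 is the tail (oldest) sub-window, index -1 the head (newest).
        self.windows = deque(([0] * m for _ in range(g)), maxlen=g)

    def _positions(self, element):
        # Assumption: all dimensions are joined into one key before hashing.
        key = "\x1f".join(map(str, element)).encode()
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def is_duplicate(self, element):
        """Steps (3) and (8) to (10): return the flag without modifying state."""
        pos = self._positions(element)
        for idx in range(self.g - 1, -1, -1):  # newest to oldest
            stamps = [self.windows[idx][p] for p in pos]
            if not all(stamps):
                continue                       # not present in this sub-window
            if idx > 0:
                return True                    # step (9): found outside the tail
            # Step (10): a tail entry is valid only if its counter value
            # is greater than the base clock.
            return min(stamps) > self.clock
        return False

    def process(self, element):
        """Steps (2) to (7): detect, and insert the element if it is new."""
        if self.is_duplicate(element):
            return True
        self.clock += 1
        head = self.windows[-1]
        for p in self._positions(element):     # steps (4) and (5)
            head[p] = self.clock
        if self.clock == self.capacity:        # step (6): head sub-window is full
            self.clock = 0
            self.windows.append([0] * self.m)  # maxlen drops the tail sub-window
        return False                           # step (7)
```

Appending to a `deque` with `maxlen` discards the oldest sub-window in one operation, while entries in the new tail expire one by one as the clock advances, which is the implicit deletion the steps describe.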
To demonstrate the applicability of the invention to multi-dimensional duplicate detection relative to the conventional method, the following experiment was constructed on a real data set.
Experimental environment: a single server, dual-socket six-core, with 32 GB of memory;
Experimental data: a real domain name data set
Experiment content: compare the detection error rate of this method and of the MDBF indexing method on multi-dimensional data. 1000 records are inserted into the data set, and query requests for 6000000 records are tested at coverage rates of 0, 0.2, 0.4, 0.6, 0.8, and 1. (The coverage rate is the probability that an individual attribute of a queried record has an identical copy in the data set; a coverage rate of 1 means that every attribute is a repeat.)
Experimental results: Table 1 lists the detailed data, and Fig. 5 plots the false-positive rate test, with the coverage rate on the horizontal axis and the false-positive rate on the vertical axis.
Table 1. Experimental results
No. | Indexing method | Coverage rate | Number of queries | Number of errors |
---|---|---|---|---|
1 | MDBF | 0 | 6000000 | 272 |
2 | MDBF | 0.2 | 6000000 | 1200235 |
3 | MDBF | 0.4 | 6000000 | 2400175 |
4 | MDBF | 0.6 | 6000000 | 3600105 |
5 | MDBF | 0.8 | 6000000 | 4800060 |
6 | MDBF | 1 | 6000000 | 6000000 |
7 | CCBFMA | 0 | 6000000 | 1814 |
8 | CCBFMA | 0.2 | 6000000 | 3374 |
9 | CCBFMA | 0.4 | 6000000 | 3024 |
10 | CCBFMA | 0.6 | 6000000 | 4678 |
11 | CCBFMA | 0.8 | 6000000 | 5316 |
12 | CCBFMA | 1 | 6000000 | 6994 |
The experimental results show that the CCBFMA indexing method has a markedly lower false-positive rate than the MDBF indexing method at every coverage rate from 0.2 to 1 (for example, 3374 errors versus 1200235 at a coverage rate of 0.2); only at a coverage rate of 0 does MDBF produce fewer errors (272 versus 1814). Moreover, because the MDBF indexing method does not eliminate the combination error rate, all of its queries are misjudged when the coverage rate is 1, whereas CCBFMA does not have this problem.
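The error counts in Table 1 can be converted into error rates with a few lines of arithmetic. The sketch below only divides the reported error counts by the 6000000 queries per run; the variable names are illustrative.

```python
QUERIES = 6_000_000

# Error counts copied from Table 1, keyed by coverage rate.
errors = {
    "MDBF": {0: 272, 0.2: 1200235, 0.4: 2400175, 0.6: 3600105, 0.8: 4800060, 1: 6000000},
    "CCBFMA": {0: 1814, 0.2: 3374, 0.4: 3024, 0.6: 4678, 0.8: 5316, 1: 6994},
}

# Error rate = number of errors / number of queries.
rates = {
    method: {coverage: count / QUERIES for coverage, count in by_coverage.items()}
    for method, by_coverage in errors.items()
}

for method, by_coverage in rates.items():
    for coverage, rate in by_coverage.items():
        print(f"{method:7s} coverage={coverage:<4} error rate={rate:.4%}")
```

For MDBF the error rate tracks the coverage rate closely (about 20% at a coverage rate of 0.2 and 100% at 1), while for CCBFMA it stays below 0.12% in every run.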
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention shall be as defined in the claims.
Claims (7)
1. A fast indexing method for duplicate detection of multi-dimensional data under a sliding window, the steps of which comprise:
1) maintaining multiple sub-windows in the sliding window, wherein all sub-windows maintain the data items that flow through them in first-in, first-out (FIFO) order, the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner;
2) maintaining the data items in the sliding window with a compressed counting Bloom filter matrix array index structure, wherein each counting Bloom filter matrix maintains one sub-window of the sliding window and contains data of multiple dimensions.
2. The method of claim 1, characterized in that each counting Bloom filter matrix consists of a number of counter units, the bit width of a counter unit being $d = \log_2(N/g)$, where N is the total element capacity of the sliding window, g is the number of sub-windows in the sliding window, and N/g is the design capacity of each sub-window.
3. The method of claim 1, characterized in that all counting Bloom filter matrices use the same design capacity and share the same group of k hash functions.
4. The method of claim 3, characterized in that the counter units that have the same coordinates in all counting Bloom filter matrices are mapped to and stored in the same vector, and are read together in a single memory access.
5. The method of claim 1, characterized in that a system base clock is maintained in the counter units of the counting Bloom filter matrix in order to delete elements from the sliding window implicitly.
6. The method of claim 5, characterized in that whether an element x is a valid element of the current sliding window is determined as follows: first, for the membership test on x, the counting Bloom filter matrix eliminates the combination error rate through its matrix form, thereby greatly reducing the element false-positive rate; second, if the corresponding counting Bloom filter matrix reports x as present, the counter maintained in its counter units is checked against the current base clock, and if it exceeds the base clock, x is considered not to be a valid element.
7. The method of claim 6, characterized in that the steps of element duplicate detection in the sliding-window data stream scenario are as follows:
(1) initialize the system base clock, the element detection flag, and the system data structures;
(2) receive an input element e, which consists of w dimensions, i.e. (e1, e2, ..., ew);
(3) check whether e exists in the compressed counting Bloom filter matrix array; if it does not, go to step (4) and begin the new-element insertion procedure; if it does, go to step (8);
(4) write e (e1, e2, ..., ew) into the counting Bloom filter matrix at the head of the queue;
(5) write the system base clock into the k corresponding counter units of that counting Bloom filter matrix;
(6) determine whether e is the last element of the head sub-window; if so, reset the base clock, delete the sub-window at the tail of the queue, and create a new head sub-window; if not, increment the base clock;
(7) set the global flag to false; the procedure ends;
(8) determine whether element ei is present in the tail sub-window; if not, go to step (9); if so, go to step (10);
(9) set the global flag to true; the procedure ends;
(10) determine whether the value of the corresponding counter unit is greater than the system base clock; if so, set the global flag to true and end the procedure; if not, set the global flag to false and end the procedure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510066798.4A CN105989061B (en) | 2015-02-09 | 2015-02-09 | Rapid indexing method for repeated detection of multi-dimensional data under sliding window |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105989061A (en) | 2016-10-05 |
CN105989061B (en) | 2019-11-26 |
Family
ID=57038169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510066798.4A Active CN105989061B (en) | Rapid indexing method for repeated detection of multi-dimensional data under sliding window | 2015-02-09 | 2015-02-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105989061B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102253820A (en) * | 2011-06-16 | 2011-11-23 | 华中科技大学 | Stream type repetitive data detection method |
CN103336771A (en) * | 2013-04-02 | 2013-10-02 | 江苏大学 | Data similarity detection method based on sliding window |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108694074B (en) * | 2017-04-07 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Method for acquiring counting information and server |
CN108694074A (en) * | 2017-04-07 | 2018-10-23 | 腾讯科技(深圳)有限公司 | A kind of method and server obtaining count information |
CN106997391B (en) * | 2017-04-10 | 2020-11-03 | 华北电力大学(保定) | Method for rapidly screening steady-state working condition data in large-scale process data |
CN106997391A (en) * | 2017-04-10 | 2017-08-01 | 华北电力大学(保定) | A kind of method of steady state condition data in quick screening large scale process data |
CN110704419A (en) * | 2018-06-21 | 2020-01-17 | 中兴通讯股份有限公司 | Data structure, data indexing method, device and equipment, and storage medium |
CN109582640A (en) * | 2018-11-15 | 2019-04-05 | 深圳市酷开网络科技有限公司 | A kind of data deduplication storage method, device and storage medium based on sliding window |
CN109582640B (en) * | 2018-11-15 | 2020-12-01 | 深圳市酷开网络科技有限公司 | Sliding window-based data deduplication storage method and device and storage medium |
CN109815234A (en) * | 2018-12-29 | 2019-05-28 | 杭州中科先进技术研究院有限公司 | A kind of multiple cuckoo filter under streaming computing model |
CN110083743A (en) * | 2019-03-28 | 2019-08-02 | 哈尔滨工业大学(深圳) | A kind of quick set of metadata of similar data detection method based on uniform sampling |
CN112529613A (en) * | 2020-11-27 | 2021-03-19 | 广州华多网络科技有限公司 | Method and device for processing user continuous login data and transferring virtual resources |
CN112751869A (en) * | 2020-12-31 | 2021-05-04 | 中国人民解放军战略支援部队航天工程大学 | Network abnormal flow detection method and device based on sliding window group |
CN112751869B (en) * | 2020-12-31 | 2023-07-14 | 中国人民解放军战略支援部队航天工程大学 | Method and device for detecting abnormal network traffic based on sliding window group |
CN112688837B (en) * | 2021-03-17 | 2021-06-08 | 中国人民解放军国防科技大学 | Network measurement method and device based on time sliding window |
CN112688837A (en) * | 2021-03-17 | 2021-04-20 | 中国人民解放军国防科技大学 | Network measurement method and device based on time sliding window |
CN114595280A (en) * | 2022-05-10 | 2022-06-07 | 鹏城实验室 | Time member query method, device, terminal and medium based on sliding window |
CN114595280B (en) * | 2022-05-10 | 2022-08-02 | 鹏城实验室 | Time member query method, device, terminal and medium based on sliding window |
Also Published As
Publication number | Publication date |
---|---|
CN105989061B (en) | 2019-11-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||