CN105989061A - Rapid indexing method for repeated detection of multi-dimensional data under sliding window - Google Patents

Rapid indexing method for repeated detection of multi-dimensional data under sliding window

Info

Publication number
CN105989061A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510066798.4A
Other languages
Chinese (zh)
Other versions
CN105989061B (en)
Inventor
王勇
王树鹏
王振宇
王曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a rapid indexing method for duplicate detection of multi-dimensional data under a sliding window. The method uses a compressed counting Bloom filter matrix array to maintain the data items in the sliding window. Multiple sub-windows are maintained within the sliding window: the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner. Each independent sub-window consists of one counting Bloom filter matrix, which provides dimension reduction for multi-dimensional data and maintains counter units internally. Because all counting Bloom filter matrices use the same design capacity and share the same group of k hash functions, the efficiency of duplicate-element detection is effectively improved; because a system base clock is maintained in the counter units, implicit deletion of elements from the sliding window is effectively supported; and because multi-dimensional data is maintained in matrix form, the combination error rate of multi-dimensional data is effectively reduced, which lowers the overall misjudgment rate.

Description

A rapid indexing method for duplicate detection of multi-dimensional data under a sliding window
Technical field
The invention relates to a rapid indexing method and system for duplicate detection of massive multi-dimensional data, and in particular to an indexing method that detects duplicates in massive multi-dimensional data under the sliding-window data stream model. It belongs to the field of big data computing.
Background
With the development of the mobile Internet and Web 2.0, the global volume of data is growing at a striking rate: the data generated worldwide amounted to 0.49 ZB in 2008 (1 ZB = 10^21 bytes), 0.8 ZB in 2009, 1.2 ZB in 2010, and 1.82 ZB in 2011. IDC expects that by 2020 humanity will produce more than 40 ZB of data. High-speed, massive network data carries complex information, including many kinds of business data streams such as IP service flows, user click streams, user query streams, and web server logs. It may also contain various security events that pose a serious threat to network security, which makes network traffic monitoring particularly important.
In network traffic monitoring systems, duplicate detection of multi-dimensional data is an important preprocessing step. Take the network service flows in a network monitoring and management system as an example: each flow is uniquely identified by a five-tuple (source address, dest address, source port, dest port, protocol). Representing and querying sets of such five-dimensional elements requires efficient algorithms to keep the system efficient.
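To make the five-tuple concrete, the following minimal Python sketch represents a flow as a named tuple and checks for duplicates with an exact set. The names `FlowKey` and `is_duplicate` are hypothetical, and the exact set only illustrates the problem; it is not the index structure proposed by the invention.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical five-tuple that identifies one network service flow."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


def is_duplicate(flow, seen):
    """Return True if the flow was seen before; otherwise record it."""
    if flow in seen:
        return True
    seen.add(flow)
    return False


seen_flows = set()
first = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
same = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
other = FlowKey("10.0.0.1", "10.0.0.2", 40001, 80, "tcp")
```

An exact set grows without bound on a stream, which is one reason approximate, window-bounded structures are of interest here.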
In stream computing scenarios, computation windows fall into the following main types, according to how the window boundaries move over the data stream. The first is the fixed window model, in which both the left and right ends of the window are fixed; a fixed window does little to reflect the timeliness of the data. The second is the landmark window model, in which the left end is fixed and the right end moves forward. A landmark window contains the data items that occur between a specific point in time and the current time; placing several landmarks over the lifetime of a data stream is equivalent to splitting the stream into several independent smaller streams that are examined separately. The third is the jumping window model, in which the left end advances in jumps while the right end slides forward. A jumping window reflects the continuous change of the data stream better than a landmark window, but because elements are evicted from the end of the window in batches, the number of valid elements in the window fluctuates noticeably. The fourth is the sliding window model, in which both ends slide forward together: stale data items are deleted as new ones are inserted. The sliding window is regarded as the ideal model for data stream monitoring and analysis.
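The difference between the jumping and sliding models can be illustrated with a small Python sketch. It assumes count-based windows (capacity measured in elements rather than time), and the function names `sliding_window` and `jumping_window` are hypothetical.

```python
from collections import deque


def sliding_window(stream, capacity):
    """Yield the window after each arrival; one old item leaves per new item."""
    window = deque(maxlen=capacity)
    for item in stream:
        window.append(item)  # a full deque silently drops its oldest item
        yield list(window)


def jumping_window(stream, capacity, batch):
    """Yield the window after each arrival; old items leave in batches."""
    window = deque()
    for item in stream:
        window.append(item)
        if len(window) > capacity:
            for _ in range(batch):  # evict a whole batch at once
                window.popleft()
        yield list(window)


sliding_sizes = [len(w) for w in sliding_window(range(8), 4)]
jumping_sizes = [len(w) for w in jumping_window(range(8), 4, 2)]
```

Once full, the sliding window keeps a constant number of elements, while the jumping window's size oscillates after each batch eviction, which is the fluctuation described above.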
Under the sliding window model, the main rapid indexing methods for detecting duplicate multi-dimensional data are the following:
The first method combines hashing with counting. Hash indexing is a convenient and efficient indexing mechanism for big data: by counting the data items that share a hash value, it records the presence of multi-dimensional data. When the sliding window inserts an element, the corresponding counter is incremented by 1; when an element is deleted, the corresponding counter is decremented by 1, and if the counter reaches 0 the corresponding entry is deleted. The hash-counting strategy has drawbacks, however. First, hashing is non-deterministic in the sense that collisions between data items are unavoidable, and the quality of the collision handling is decisive for the index. Second, hashing consumes a large amount of memory.
The second method is the multi-dimensional Bloom filter (MDBF) index. MDBF consists of as many standard Bloom filters as the element has dimensions and decomposes the representation and querying of a multi-dimensional element directly into the representation and querying of subsets of single attribute values: each dimension of the element is represented by its own standard Bloom filter. To query an element, MDBF checks whether every attribute value of the multi-dimensional element is present in the corresponding standard Bloom filter and decides set membership on that basis. This method also has shortcomings. First, it is weak at deleting elements from the sliding window and cannot slide the window accurately item by item. Second, because the multiple hashes in a Bloom filter may collide, and because membership is verified only on each dimension independently, the element misjudgment rate is high.
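The weakness of per-dimension verification can be reproduced with a short Python sketch of the MDBF idea. The class names `BloomFilter` and `MDBF`, the SHA-256 based hashing, and the sizes `m` and `k` are assumptions chosen for illustration, not details taken from the MDBF literature or from this patent.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; one byte per bit keeps the sketch readable."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, value):
        # Derive k positions by salting a SHA-256 digest (an assumption).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits[p] = 1

    def __contains__(self, value):
        return all(self.bits[p] for p in self._positions(value))


class MDBF:
    """One independent Bloom filter per dimension."""

    def __init__(self, dims, m=1024, k=3):
        self.filters = [BloomFilter(m, k) for _ in range(dims)]

    def add(self, element):
        for f, attr in zip(self.filters, element):
            f.add(attr)

    def __contains__(self, element):
        # Membership is checked on each dimension independently.
        return all(attr in f for f, attr in zip(self.filters, element))


mdbf = MDBF(dims=2)
mdbf.add(("a.example", "1.1.1.1"))
mdbf.add(("b.example", "2.2.2.2"))
mixed_is_reported = ("a.example", "2.2.2.2") in mdbf  # never inserted as a pair
```

The pair `("a.example", "2.2.2.2")` was never inserted, yet it is reported as present because each attribute was seen in some other element. This is the combination error that maintaining the dimensions jointly is meant to avoid.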
In summary, a rapid indexing method is essential to duplicate detection in a sliding window. In designing a rapid index structure, improving the efficiency of duplicate detection and reducing its misjudgment rate are the key problems.
Summary of the invention
The main object of the invention is to provide a rapid indexing method and system for duplicate detection of multi-dimensional data under a sliding window that improves the efficiency of duplicate-element detection, reduces the misjudgment rate of duplicate detection, and thereby effectively solves the problem of multi-dimensional duplicate detection under the sliding window model.
The invention mainly comprises the following aspects.
First, in the design of the rapid index structure, the invention uses a compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) to maintain the data items in the sliding window. Specifically, multiple sub-windows are maintained within the sliding window: the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner. Each independent sub-window consists of one counting Bloom filter matrix (CCBFM), which provides dimension reduction for multi-dimensional data and maintains counter units internally.
Second, based on this index structure, and with regard to duplicate-element detection efficiency, all counting Bloom filter matrices in the invention use the same design capacity and share the same group of k hash functions. This reduces the time complexity of an element query from O(kg) to O(k), where g is the number of sub-windows, and effectively improves duplicate-element detection efficiency.
Third, based on this index structure, and with regard to applicability to sliding-window stream computing, the data items of each independent sub-window are maintained by a counting Bloom filter matrix (CCBFM). Maintaining a system base clock in the counter units effectively supports implicit deletion of elements from the sliding window and improves the system's suitability for sliding-window operations.
Compared with the prior art, the main innovations and beneficial effects of the invention are as follows:
1) For the design of rapid index structures for big data, the invention proposes a compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) index structure. The index structure maintains multiple sub-windows within the sliding window, and each independent sub-window consists of one counting Bloom filter matrix (CCBFM).
2) Based on this index structure, having all counting Bloom filter matrices use the same design capacity and share the same group of k hash functions effectively improves duplicate-element detection efficiency; maintaining a system base clock in the counter units effectively supports implicit deletion of elements from the sliding window; and maintaining multi-dimensional data in matrix form effectively reduces the combination error rate of multi-dimensional data and lowers the overall misjudgment rate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sliding window model;
Fig. 2 is a schematic diagram of the duplicate detection index structure under the sliding window model;
Fig. 3 is a schematic diagram of merged, shared hash functions under the sliding window model;
Fig. 4 is a workflow diagram of multi-dimensional data duplicate detection on a data processing node;
Fig. 5 is a plot of the error rate test results.
Detailed description
To make the above objects, features, and advantages of the invention easier to understand, the invention is further described below through specific embodiments and the drawings.
In the design of the rapid index structure for the sliding window, the invention uses a compressed counting Bloom filter matrix array (CCBFMA) to maintain the data items in the sliding window. This indexing method offers better duplicate-element detection efficiency and a lower misjudgment rate than existing solutions.
The multi-dimensional compressed counting Bloom filter matrix array (CCBFMA) consists of a group of isomorphic compressed counting Bloom filter matrices (CCBFMs). Each CCBFM consists of m counter units of bit width d, where d = log_2(N/g). Here N is the total element capacity of the sliding window and g is the number of sub-windows the invention maintains within the sliding window, so the design capacity of each sub-window is N/g. All sub-windows maintain the data items flowing through them in FIFO order: the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner.
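As a worked example of the sizing formula, the following Python sketch computes the counter bit width d from N and g. The function names are hypothetical, and rounding d up to a whole number of bits when N/g is not a power of two is an assumption; the text only states d = log_2(N/g).

```python
def counter_bit_width(total_capacity, sub_windows):
    """Bit width d of one counter unit, d = log_2(N/g), rounded up."""
    per_window = total_capacity // sub_windows  # design capacity N/g
    if per_window < 2:
        return 1  # keep at least one bit (assumption for degenerate sizes)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without floats.
    return (per_window - 1).bit_length()


def matrix_bits(counter_units, bit_width):
    """Total bits of one CCBFM holding m counter units of width d."""
    return counter_units * bit_width


d = counter_bit_width(4096, 4)      # N = 4096, g = 4, so N/g = 1024
size_bits = matrix_bits(8192, d)    # m = 8192 counter units (arbitrary example)
```

With N = 4096 and g = 4, each sub-window holds 1024 elements and each counter needs 10 bits, enough to store any clock value from 0 to 1023.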
Each CCBFM maintains the data of all dimensions for one independent sub-window. By maintaining a system base clock in the counter units of the CCBFM, the invention deletes elements from the sliding window implicitly. The procedure for deciding whether an element x is a valid element of the current sliding window is as follows. First, for the existence check on x, the CCBFM eliminates the combination error rate through its matrix form, which greatly reduces the element misjudgment rate. Second, if the corresponding CCBFM judges that x exists, the counter maintained in its counter units is checked against the current base clock; if it exceeds the base clock, x is not considered a valid element.
Fig. 1 shows the sliding-window computation model for data streams. Both ends of the sliding window slide forward together: stale data items are deleted as new ones are inserted. The sliding window is regarded as the ideal model for data stream monitoring and analysis.
Fig. 2 shows the duplicate detection index structure under the sliding window model. Under this structure, the invention maintains the compressed counting Bloom filter matrix array (CCBFMA, Compressed Counting Bloom Filter Matrix Array) index in RAM. The index maintains multiple sub-windows within the sliding window, and the data of all dimensions in each independent sub-window is held in one counting Bloom filter matrix (CCBFM).
Fig. 3 shows how hash functions are merged and shared under the sliding window model, where k is the number of hash functions, g is the number of sliding sub-windows, and d = log_2(N/g) is the bit width. As the figure shows, all CCBFMs are isomorphic, and the counter units that have the same coordinates in different Bloom filters are mapped to and stored in the same vector, so that they can all be read in a single memory access. Because all g Bloom filters share the same group of k hash functions, the time complexity of deciding whether an element x is a valid element in any of the current sub-windows is reduced from O(kg) to O(k).
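The shared-coordinate layout can be sketched in Python by packing the g counters that share one coordinate into a single integer word. The names `pack`, `unpack`, and `PackedCounters` are hypothetical, and using one Python integer per coordinate is only an assumed stand-in for a machine word or cache line.

```python
def pack(counters, d):
    """Pack one d-bit counter per sub-window into a single integer word."""
    word = 0
    for i, value in enumerate(counters):
        assert 0 <= value < (1 << d)
        word |= value << (i * d)
    return word


def unpack(word, g, d):
    """Recover the g counters stored in one word."""
    mask = (1 << d) - 1
    return [(word >> (i * d)) & mask for i in range(g)]


class PackedCounters:
    """words[p] holds counter unit p of all g sub-window filters."""

    def __init__(self, m, g, d):
        self.g, self.d = g, d
        self.words = [0] * m

    def read(self, p):
        # One access returns the counters of every sub-window at coordinate p.
        return unpack(self.words[p], self.g, self.d)

    def write(self, p, window, value):
        shift = window * self.d
        mask = ((1 << self.d) - 1) << shift
        self.words[p] = (self.words[p] & ~mask) | (value << shift)


pc = PackedCounters(m=8, g=4, d=3)
pc.write(5, 2, 7)   # coordinate 5, sub-window 2
pc.write(5, 0, 1)   # coordinate 5, sub-window 0
```

Because the k hash functions are shared, a query touches k such words in total, rather than k positions in each of the g filters.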
Fig. 4 shows the workflow of multi-dimensional data duplicate detection. As the figure shows, element duplicate detection in the sliding-window data stream scenario mainly comprises the following core steps.
(1) Initialize the system base clock, the element detection flag, and the system data structures;
(2) Receive the input element e, which consists of w dimensions, i.e. (e1, e2...ew);
(3) Check whether element e exists in the CCBFMA. If it does not, go to step (4) and enter the new-element insertion flow; if it does, go to step (8);
(4) Write e (e1, e2...ew) into the head CCBFM;
(5) Write the system base clock into the k counter units of the corresponding CCBFM;
(6) Determine whether e is the last element of the head sub-window. If so, reset the base clock, delete the sub-window at the tail of the queue, and create a new head sliding sub-window; if not, increment the base clock;
(7) Set the global flag to false; the flow ends;
(8) Determine whether element ei is present in the tail sub-window. If not, go to step (9); if so, go to step (10);
(9) Set the global flag to true; the flow ends;
(10) Determine whether the value of the corresponding counter unit is greater than the system base clock. If so, set the global flag to true and end the flow; if not, set the global flag to false and end the flow.
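The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions rather than the patented implementation: each CCBFM is flattened to a one-dimensional list of counter cells addressed by hashing the whole tuple with SHA-256, a stored timestamp is the base clock plus one so that 0 can mean "empty", and a query that ends in step (9) or (10) does not insert the element, because the listed steps end there. The names `CCBFMASketch` and `check_and_insert` are hypothetical.

```python
import hashlib
from collections import deque


class CCBFMASketch:
    def __init__(self, sub_windows=3, capacity=2, cells=4096, k=3):
        self.c, self.m, self.k = capacity, cells, k
        # Index 0 is the head sub-window, index -1 the tail (step 1).
        self.windows = deque([0] * cells for _ in range(sub_windows))
        self.clock = 0  # elements written into the current head so far

    def _cells(self, element):
        # Hash all dimensions jointly, so attributes are never checked
        # one dimension at a time (assumed stand-in for the matrix form).
        key = "\x1f".join(map(str, element))
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}|{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _stamp(self, window, element):
        # 0 if absent; otherwise an upper bound on the stored timestamp.
        return min(window[p] for p in self._cells(element))

    def check_and_insert(self, element):
        """Return the flag: True for a duplicate, False otherwise."""
        *newer, tail = self.windows
        if any(self._stamp(w, element) for w in newer):
            return True                      # steps (8)-(9)
        tail_stamp = self._stamp(tail, element)
        if tail_stamp:
            return tail_stamp > self.clock   # step (10)
        head = self.windows[0]
        for p in self._cells(element):       # steps (4)-(5)
            head[p] = self.clock + 1
        self.clock += 1
        if self.clock == self.c:             # step (6): head is full
            self.windows.pop()               # drop the tail sub-window
            self.windows.appendleft([0] * self.m)
            self.clock = 0
        return False                         # step (7)


sketch = CCBFMASketch(sub_windows=3, capacity=2)
flags = [sketch.check_and_insert((x,)) for x in "ABACD"]
```

In this sketch a tail element stays valid only while its timestamp is greater than the base clock, so with g sub-windows of capacity c the most recent (g - 1) * c inserted elements count as valid, and older tail entries expire one by one without any explicit delete.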
To show the applicability of the invention relative to a conventional method in the multi-dimensional duplicate detection scenario, the following experiment was constructed on a real data set.
Experimental environment: a single server with two six-core CPUs and 32 GB of memory.
Experimental data: a real domain name data set.
Experiment: compare the detection error rates of this method and the MDBF indexing method on multi-dimensional data. 1000 records are inserted into the data set, and 6000000 records are then queried at coverage values of 0, 0.2, 0.4, 0.6, 0.8, and 1. Coverage denotes the probability that an attribute of the queried data has an identical copy in the data set; a coverage of 1 means that all attributes are repeated.
Experimental results: Table 1 lists the detailed data, and Fig. 5 plots the error rate test, with coverage on the horizontal axis and the misjudgment rate on the vertical axis.
Table 1. Experimental results
No.   Indexing method   Coverage   Queries   Errors
1     MDBF              0          6000000   272
2     MDBF              0.2        6000000   1200235
3     MDBF              0.4        6000000   2400175
4     MDBF              0.6        6000000   3600105
5     MDBF              0.8        6000000   4800060
6     MDBF              1          6000000   6000000
7     CCBFMA            0          6000000   1814
8     CCBFMA            0.2        6000000   3374
9     CCBFMA            0.4        6000000   3024
10    CCBFMA            0.6        6000000   4678
11    CCBFMA            0.8        6000000   5316
12    CCBFMA            1          6000000   6994
The experimental results show that, at coverage values of 0.2 and above, the CCBFMA indexing method has a far lower misjudgment rate than the MDBF indexing method; only at a coverage of 0 does MDBF produce fewer errors (272 versus 1814). Moreover, because the MDBF indexing method does not eliminate the combination error rate, every one of its queries is misjudged when the coverage is 1, whereas CCBFMA does not have this problem.
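The misjudgment rates behind this comparison follow directly from Table 1 by dividing each error count by the number of queries. The short Python sketch below does that arithmetic; the variable names are hypothetical and the figures are copied from the table.

```python
QUERIES = 6_000_000

# Error counts from Table 1, keyed by coverage.
errors = {
    "MDBF": {0: 272, 0.2: 1200235, 0.4: 2400175,
             0.6: 3600105, 0.8: 4800060, 1: 6000000},
    "CCBFMA": {0: 1814, 0.2: 3374, 0.4: 3024,
               0.6: 4678, 0.8: 5316, 1: 6994},
}

# Misjudgment rate = errors / queries, per method and coverage.
rates = {
    method: {coverage: count / QUERIES for coverage, count in by_cov.items()}
    for method, by_cov in errors.items()
}

for method, by_cov in rates.items():
    for coverage, rate in by_cov.items():
        print(f"{method:7s} coverage={coverage:<4} rate={rate:.6f}")
```

For MDBF the rate tracks the coverage itself almost exactly, which is what one would expect if nearly every query whose attributes are individually present is reported as a duplicate.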
The above embodiments only illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (7)

1. A rapid indexing method for duplicate detection of multi-dimensional data under a sliding window, comprising the steps of:
1) maintaining multiple sub-windows within the sliding window, wherein all sub-windows maintain the data items flowing through them in FIFO order, the head sub-window receives new elements in a sliding manner, and the tail sub-window evicts old elements in a sliding manner;
2) maintaining the data items in the sliding window with a compressed counting Bloom filter matrix array index structure, wherein each counting Bloom filter matrix maintains one sub-window of the sliding window and contains data of multiple dimensions.
2. The method of claim 1, characterized in that each counting Bloom filter matrix consists of a number of counter units of bit width d = log_2(N/g), where N is the total element capacity of the sliding window, g is the number of sub-windows in the sliding window, and N/g is the design capacity of each sub-window.
3. The method of claim 1, characterized in that all counting Bloom filter matrices use the same design capacity and share the same group of k hash functions.
4. The method of claim 3, characterized in that the counter units that have the same coordinates in all counting Bloom filter matrices are mapped to and stored in the same vector and are read together in a single memory access.
5. The method of claim 1, characterized in that a system base clock is maintained in the counter units of the counting Bloom filter matrix in order to delete elements from the sliding window implicitly.
6. The method of claim 5, characterized in that whether an element x is a valid element of the current sliding window is determined as follows: first, for the existence check on x, the counting Bloom filter matrix eliminates the combination error rate through its matrix form, thereby greatly reducing the element misjudgment rate; second, if the corresponding counting Bloom filter matrix judges that x exists, the counter maintained in its counter units is checked against the current base clock, and if it exceeds the base clock, x is not considered a valid element.
7. The method of claim 6, characterized in that element duplicate detection in the sliding-window data stream scenario comprises the following steps:
(1) initializing the system base clock, the element detection flag, and the system data structures;
(2) receiving the input element e, which consists of w dimensions, i.e. (e1, e2...ew);
(3) checking whether element e exists in the compressed counting Bloom filter matrix array; if it does not, going to step (4) and entering the new-element insertion flow; if it does, going to step (8);
(4) writing e (e1, e2...ew) into the head counting Bloom filter matrix;
(5) writing the system base clock into the k counter units of the corresponding counting Bloom filter matrix;
(6) determining whether e is the last element of the head sub-window; if so, resetting the base clock, deleting the sub-window at the tail of the queue, and creating a new head sliding sub-window; if not, incrementing the base clock;
(7) setting the global flag to false and ending the flow;
(8) determining whether element ei is present in the tail sub-window; if not, going to step (9); if so, going to step (10);
(9) setting the global flag to true and ending the flow;
(10) determining whether the value of the corresponding counter unit is greater than the system base clock; if so, setting the global flag to true and ending the flow; if not, setting the global flag to false and ending the flow.
CN201510066798.4A 2015-02-09 2015-02-09 Multidimensional data repeats detection fast indexing method under a kind of sliding window Active CN105989061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510066798.4A CN105989061B (en) 2015-02-09 2015-02-09 Multidimensional data repeats detection fast indexing method under a kind of sliding window

Publications (2)

Publication Number Publication Date
CN105989061A true CN105989061A (en) 2016-10-05
CN105989061B CN105989061B (en) 2019-11-26

Family

ID=57038169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510066798.4A Active CN105989061B (en) 2015-02-09 2015-02-09 Multidimensional data repeats detection fast indexing method under a kind of sliding window

Country Status (1)

Country Link
CN (1) CN105989061B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102253820A (en) * 2011-06-16 2011-11-23 华中科技大学 Stream type repetitive data detection method
CN103336771A (en) * 2013-04-02 2013-10-02 江苏大学 Data similarity detection method based on sliding window

Also Published As

Publication number Publication date
CN105989061B (en) 2019-11-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant