CN109815234A - A multiple cuckoo filter under a streaming computing model - Google Patents



Publication number
CN109815234A
Authority
CN
China
Prior art keywords
data
cuckoo
cuckoo filter
sliding window
filter
Legal status
Granted
Application number
CN201811635873.4A
Other languages
Chinese (zh)
Other versions
CN109815234B (en)
Inventor
范小朋
吴梦露
Current Assignee
Hangzhou Zhongke Advanced Technology Development Co ltd
Original Assignee
Hangzhou China Science Advanced Technology Research Institute Co Ltd
Priority date
2018-12-29
Filing date
2018-12-29
Publication date
2019-05-28
Application filed by Hangzhou China Science Advanced Technology Research Institute Co Ltd
Priority to CN201811635873.4A
Publication of CN109815234A
Application granted
Publication of CN109815234B
Status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multiple cuckoo filter under a streaming computing model. The multiple cuckoo filter mainly consists of as many standard cuckoo filters as there are data streams: one standard cuckoo filter is assigned to each data stream and processes that stream, so that representing and querying the multi-dataset formed by the data streams is decomposed into representing and querying multiple single datasets. A sliding window is established for each cuckoo filter and splits its data stream along time boundaries; all standard cuckoo filters are queried simultaneously, and the sliding windows are used to determine whether the same specified object exists in different data streams at the same time. The invention inherits the advantages of the cuckoo filter well; when processing large volumes of streaming data, it greatly reduces computation and space occupancy, lowers the false positive rate, and facilitates accurate data queries, with significant technical effects.

Description

A multiple cuckoo filter under a streaming computing model
Technical field
The present invention relates to cuckoo filters in the field of computer big data, and more particularly to a multiple cuckoo filter for indexing massive multidimensional data under a streaming computing model.
Background art
With the rapid development of related industries such as the mobile Internet, Web 2.0, and smart devices, the amount of data generated by humans is growing exponentially. Massive data increasingly exhibits the characteristics of big data: huge volume, diverse types, and high-velocity flows. The multidimensional nature of data is becoming ever more apparent, and storing massive multidimensional data, computing and analyzing it in real time, and indexing and searching large-scale data all pose serious challenges to information systems.
Unlike low-dimensional data, multidimensional data allows a system to record large amounts of comprehensive information and, through applications, to provide richer services to users. However, distributed processing of multidimensional data suffers a sharp drop in performance in areas such as indexing, and in particular the memory it occupies grows rapidly as the number of dimensions increases.
Summary of the invention
The main object of the present invention is to propose a multiple cuckoo filter for indexing massive multidimensional data under a streaming computing model, so as to provide a computational basis for establishing correlations among multiple data streams.
The technical solution adopted by the present invention is as follows:
Based on the cuckoo filter, the present invention designs a multiple cuckoo filter data structure. The multiple cuckoo filter mainly consists of as many standard cuckoo filters as there are data streams: one standard cuckoo filter is assigned to each data stream to be processed, and each standard cuckoo filter processes its own data stream. Representing and querying the multi-dataset of the data streams is thereby decomposed into representing and querying multiple single datasets. The false positive rate of the multiple cuckoo filter is reduced by controlling, and in particular increasing, the fingerprint size f that each data element stores in the index.
When a query is made at any given time, the standard cuckoo filters of all data streams are queried simultaneously to determine whether the same specified object exists in the different data streams at the same time; if it does, True is returned; otherwise, False is returned.
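The decomposition into one standard filter per stream can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the bucket count, the bucket size b = 4, the fingerprint size f = 8, the MD5-based hashing (MD5 is one of the hash functions the description lists), and the class names `CuckooFilter` and `MultipleCuckooFilter` are illustrative assumptions. Each filter uses partial-key cuckoo hashing as in the Fan et al. paper cited below, and `query` returns True only when every stream's filter reports the element.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash taken from the first 8 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "little")


class CuckooFilter:
    """Minimal standard cuckoo filter (partial-key cuckoo hashing)."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps i ^ h(fp) inside [0, num_buckets).
        assert num_buckets & (num_buckets - 1) == 0
        self.m, self.b, self.f = num_buckets, bucket_size, fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _alt(self, i: int, fp: int) -> int:
        # The alternate bucket depends only on the current bucket and the fingerprint.
        return (i ^ _h(fp.to_bytes(4, "little"))) & (self.m - 1)

    def _locate(self, item: str):
        fp = _h(b"fp" + item.encode()) & ((1 << self.f) - 1)  # f-bit fingerprint
        i1 = _h(item.encode()) & (self.m - 1)
        return fp, i1, self._alt(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: evict fingerprints along a random kick path.
        i, path = self.rng.choice((i1, i2)), []
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            path.append((i, slot))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Insertion failed: undo the kicks so no stored fingerprint is lost.
        for i, slot in reversed(path):
            self.buckets[i][slot], fp = fp, self.buckets[i][slot]
        return False

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


class MultipleCuckooFilter:
    """One standard cuckoo filter per data stream."""

    def __init__(self, num_streams: int, **cf_args):
        self.filters = [CuckooFilter(**cf_args) for _ in range(num_streams)]

    def insert(self, stream_id: int, item: str) -> bool:
        return self.filters[stream_id].insert(item)

    def query(self, item: str) -> bool:
        # True only if the item is reported by every stream's filter.
        return all(cf.contains(item) for cf in self.filters)


mcf = MultipleCuckooFilter(num_streams=3)
streams = [["a", "x", "b"], ["c", "x"], ["x", "d", "e"]]
for sid, items in enumerate(streams):
    for item in items:
        mcf.insert(sid, item)
print(mcf.query("x"))  # True: "x" was inserted into every stream's filter
```

A query for an element missing from some stream usually returns False, but it can still return True through fingerprint collisions, which is the false positive case analyzed later in the description.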
A sliding window is established for each cuckoo filter. Starting from the head of its data stream, each sliding window splits the stream along time boundaries, so that in any time period one segment of source data is obtained for each data stream; the fingerprints in the sliding windows are then compared to check whether a specified element exists in multiple data streams at the same time.
Each sliding window stores the indices of all entries of its cuckoo filter and is implemented as a queue. To check whether multiple sliding windows simultaneously contain element x, the final storage location of the hash mapping of x in every standard cuckoo filter is obtained first; then, as the sliding windows move over time without offset relative to one another, the query checks whether that hash mapping result is among the location indices stored in the corresponding sliding window.
The standard cuckoo filter of each data stream starts from the head of that data stream.
The specified object is a data fragment or the character value of a data fragment after processing.
The number of standard cuckoo filters is generated dynamically according to the number of data streams, or is set in advance.
The main idea of the invention is to design a multiple cuckoo filter based on the cuckoo filter data structure and algorithm, decomposing the representation and querying of a multi-dataset into the representation and querying of multiple single datasets.
In the present invention, the query algorithm of the multiple cuckoo filter is implemented on top of C++ source code for the cuckoo filter data structure. At any given time, the query checks whether the hash mapping result of the specified object is among the location indices stored in the sliding windows of the multiple standard cuckoo filters.
The present invention implements and analyzes the false positive rate. The false positive rate of the multiple cuckoo filter depends on the bucket size, the number of cuckoo filters, the total number of set elements, the sliding window size, the step by which the window moves each time, and the fingerprint size.
In the present invention, a fingerprint is a digital fingerprint, i.e., a unique character value of a data fragment, such as its MD5 value.
The present invention proposes a multiple cuckoo filter that decomposes the representation and querying of multiple data streams into the representation and querying of multiple single data streams. As many standard cuckoo filters are used as there are data streams, each representing and querying the objects in one data stream.
The data streams input to the present invention are not limited to large-capacity data sets; for example, they may be file streams.
The program code that builds the multiple cuckoo filter data structure is not limited to C++, and the script that invokes the program is not limited to Linux shell; a Python script may be used, for example. The hash function used is not limited to MurmurHash; alternatives include BobHash, SuperFastHash, MD5Hash, and SHA1Hash.
In a specific implementation, the number of standard cuckoo filters is not limited to being generated dynamically from the number of data streams. Alternatively, k cuckoo filters can be allocated in advance: when an insertion into one cuckoo filter fails, the remaining elements of the set continue into the next cuckoo filter, and once all elements have been inserted, the space occupied by idle cuckoo filters is released.
In the present invention, the sliding window stores the indices of all entries of the cuckoo filter; that is, the buckets and entries of the cuckoo filter are numbered starting from 0, which greatly reduces computation and space occupancy.
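The pre-allocation variant can be sketched in Python as follows. This is an illustrative assumption, not the patented code: the tiny filter dimensions (4 buckets of 2 entries), the MD5-based hashing, and the names `CuckooFilter` and `PreallocatedFilterChain` are hypothetical. Following the description, elements go into the current filter until an insertion fails, the rest continue into the next filter, and empty filters are dropped once all elements are inserted.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash from the first 8 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "little")


class CuckooFilter:
    """Compact standard cuckoo filter; insert() returns False when it is full."""

    def __init__(self, num_buckets=4, bucket_size=2, fp_bits=8, max_kicks=50, seed=0):
        assert num_buckets & (num_buckets - 1) == 0  # power of two for XOR indexing
        self.m, self.b, self.f, self.max_kicks = num_buckets, bucket_size, fp_bits, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _alt(self, i, fp):
        return (i ^ _h(fp.to_bytes(4, "little"))) & (self.m - 1)

    def _locate(self, item):
        fp = _h(b"fp" + item.encode()) & ((1 << self.f) - 1)
        i1 = _h(item.encode()) & (self.m - 1)
        return fp, i1, self._alt(i1, fp)

    def count(self):
        return sum(len(bucket) for bucket in self.buckets)

    def contains(self, item):
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def insert(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        i, path = self.rng.choice((i1, i2)), []
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            path.append((i, slot))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        for i, slot in reversed(path):  # undo the kicks so nothing is lost
            self.buckets[i][slot], fp = fp, self.buckets[i][slot]
        return False


class PreallocatedFilterChain:
    """k cuckoo filters allocated up front; elements overflow to the next filter."""

    def __init__(self, k, **cf_args):
        self.filters = [CuckooFilter(seed=j, **cf_args) for j in range(k)]
        self.current = 0  # index of the filter currently receiving elements

    def insert(self, item):
        while self.current < len(self.filters):
            if self.filters[self.current].insert(item):
                return True
            self.current += 1  # this filter is full: continue with the next one
        return False  # all k filters are full

    def release_idle(self):
        # Call after all insertions: drop filters that never received an element.
        self.filters = [cf for cf in self.filters if cf.count() > 0]

    def contains(self, item):
        return any(cf.contains(item) for cf in self.filters)


chain = PreallocatedFilterChain(k=20)
items = [f"item{n}" for n in range(40)]
inserted = [chain.insert(x) for x in items]
chain.release_idle()
print(all(inserted), len(chain.filters))
```

Because a fresh filter always accepts its first two elements in this configuration, 20 filters are enough for 40 elements here; a real deployment would size k from the expected data volume.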
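The description states only that buckets and entries are numbered from 0. One plausible scheme, assumed here rather than taken from the source, is row-major numbering, so that a window can hold plain integers instead of (bucket, slot) pairs. The helper names are hypothetical.

```python
def entry_index(bucket: int, slot: int, bucket_size: int) -> int:
    """Number every entry from 0: (bucket, slot) -> bucket * b + slot."""
    assert 0 <= slot < bucket_size
    return bucket * bucket_size + slot


def entry_location(index: int, bucket_size: int) -> tuple:
    """Inverse mapping: global entry index -> (bucket, slot)."""
    return divmod(index, bucket_size)


# With b = 4 entries per bucket, slot 2 of bucket 3 is entry 14.
print(entry_index(3, 2, 4))   # 14
print(entry_location(14, 4))  # (3, 2)
```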
The beneficial effects of the present invention are:
Because the multiple cuckoo filter of the invention is designed on the basis of the cuckoo filter, it inherits the cuckoo filter's advantages well: support for dynamic insertion and reliable deletion of elements, better query performance, correlated storage locations, and, under certain conditions, lower space usage.
The present invention not only greatly reduces computation and space occupancy but also substantially reduces the false positive rate, facilitating accurate data queries.
Compared with conventional cuckoo filters, the multiple cuckoo filter of the invention supports searching multiple data streams for objects that satisfy a specified relationship at the same time, and therefore has broader application prospects than existing cuckoo filters.
Brief description of the drawings
Fig. 1 is a schematic diagram of the query logic of the multiple cuckoo filter data structure of the present invention;
Fig. 2 is a graph of query time versus sliding window size;
Fig. 3 is a graph of query time versus the total number of set elements.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention proposes a cuckoo filter under a streaming computing model that adapts dynamically as the number of data streams changes. To make the objectives, technical solution, and effects of the invention clearer, the invention is further described below with reference to the drawings and examples.
1. Data structure
As shown in Fig. 1, the multiple cuckoo filter consists of as many standard cuckoo filters as there are data streams, each data stream corresponding to one standard cuckoo filter; that is, the representation and querying of a multi-dataset are decomposed into the representation and querying of multiple single datasets. As many standard cuckoo filters are used as there are data streams, each representing and querying the objects in one data stream.
Table 1: Symbols used for the multiple cuckoo filter
To simulate an environment with large volumes of streaming data, the present invention designs the multiple cuckoo filter data structure shown in Fig. 1.
Suppose there are n data streams, treated as n data sets, each containing at least a million data objects (elements); that is, n sets, each with at least a million elements, are constructed, and n corresponding standard cuckoo filters are generated dynamically.
As shown in Fig. 1, each cuckoo filter is windowed as follows, which solves the technical problem of an unbounded data source that never terminates: a sliding window is established for each cuckoo filter; starting from the head of its data stream, each sliding window splits the stream along time boundaries, so that in each time period n segments of source data (the query data in the sliding windows) are obtained. The fingerprints in the sliding windows are then compared to check whether a specified element exists in multiple data streams at the same time. Identical fingerprints are regarded as the same specified element.
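Windowing can be sketched in Python with a bounded queue. As a simplifying assumption, this sketch splits the stream by element count rather than by time boundaries, and `sliding_windows` and `window_count` are hypothetical helper names. Because it is a generator, it also works on an unbounded source, which is the case the windowing is meant to handle.

```python
from collections import deque
from itertools import count, islice


def sliding_windows(stream, w, k):
    """Yield successive windows of size w over a (possibly unbounded) stream,
    advancing k elements at a time; a trailing partial step is not emitted."""
    if not 0 < k <= w:
        raise ValueError("require 0 < k <= w so that no element is skipped")
    it = iter(stream)
    window = deque(islice(it, w), maxlen=w)  # queue holding the current window
    if len(window) < w:
        return  # stream shorter than one window
    yield tuple(window)
    while True:
        step = list(islice(it, k))
        if len(step) < k:
            return  # finite stream exhausted
        window.extend(step)  # maxlen drops the k oldest elements from the head
        yield tuple(window)


def window_count(m, w, k):
    """Number of full windows over m elements: floor((m - w) / k) + 1."""
    return (m - w) // k + 1 if m >= w else 0


print(list(sliding_windows(range(10), w=4, k=2)))
# [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7), (6, 7, 8, 9)]
print(window_count(10, 4, 2))  # 4
print(list(islice(sliding_windows(count(), w=3, k=3), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```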
2. Search algorithm
The element query procedure of the multiple cuckoo filter algorithm is as follows:
Input: num and x, the number of data streams and the element to be queried, respectively.
The query proceeds as follows:
Step 1. Add the data to the i-th cuckoo filter, and record the location of the element x to be queried in that filter in the entry item_index[i]. Create a sliding window for the filter with a randomly generated size, and insert the data at the corresponding positions of the cuckoo filter into the sliding window. When this is done, perform the same operation on the next filter, until every filter has completed these steps.
Step 2. Start the query.
Case 1: for a single filter, if the current sliding window contains element x, the filter contains x; return true.
Case 2: for a single filter, if the current sliding window does not contain x and has not yet reached the last data item in the filter, move the window down and continue querying.
Case 3: for a single filter, if the current sliding window does not contain x and has already reached the last data item in the filter, x is not present in the filter; return false.
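The three cases for a single filter can be sketched in Python as a window scan. The list `entries` stands in for the filter's stored entries in window order, `target` plays the role of the recorded location `item_index[i]`, and the window size range and the step k are hypothetical; the description only states that the window size is random.

```python
import random


def query_single_filter(entries, target, w, k=1):
    """Slide a window of size w down one filter's entries, k entries at a time."""
    if not 0 < k <= w:
        raise ValueError("require 0 < k <= w")
    start = 0
    while True:
        if target in entries[start:start + w]:
            return True   # Case 1: x is in the current window
        if start + w >= len(entries):
            return False  # Case 3: the window already covers the last entry
        start += k        # Case 2: move the window down and keep querying


# Step 1 (hypothetical setup): a random window size for this filter.
rng = random.Random(7)
entries = list(range(100))  # stand-in for one filter's entry indices
w = rng.randint(10, 20)
print(query_single_filter(entries, 57, w, k=5))   # True
print(query_single_filter(entries, 500, w, k=5))  # False
```

Requiring k <= w guarantees that every entry passes through at least one window, so Case 3 really means the element is absent from this filter.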
The sliding window stores the indices of all entries of the cuckoo filter (the index data are the fingerprints generated by hashing) and is implemented as a queue. To check whether multiple sliding windows simultaneously contain element x, the final storage location of the hash mapping of x in every standard cuckoo filter is obtained first; then, as the sliding windows move over time without offset relative to one another, the query checks whether the hash mapping result is among the location indices stored in the corresponding sliding windows. Successful queries are timed to evaluate performance.
If the hash mapping result is not found among the location indices stored in the corresponding sliding windows, the query fails and returns false. This failure does not mean that x was not successfully inserted into the cuckoo filters; it means that, as the windows move over time, there is no moment at which the sliding windows of all the cuckoo filters contain the element simultaneously.
If the hash mapping result is found among the location indices stored in the corresponding sliding windows, the query succeeds and returns true, and identical elements x are considered to exist. A false positive is nevertheless possible: the hash fingerprint of another element may collide with that of x and happen to fall within the sliding window. The false positive rate is analyzed below.
3. False positive rate analysis
For a standard cuckoo filter, consider the worst-case query: looking up an element that is not in the set, which must examine all 2b entries in the two candidate buckets.
For each entry, the probability of matching the stored fingerprint and returning a false query result is at most $1/2^f$. After 2b fingerprint comparisons, the upper bound on the false positive rate is:
$$\epsilon_{CF} = 1 - \left(1 - \frac{1}{2^f}\right)^{2b} \approx \frac{2b}{2^f}$$
Ignoring the dynamic windows, a query on the multiple cuckoo filter follows its set-query operation: every element query must examine all the cuckoo filters. The false positive rate of the multiple cuckoo filter is the probability that at least one of the cuckoo filters misjudges x.
The false positive rate of each cuckoo filter is $\epsilon_{CF}$, so the probability that none of the s cuckoo filters misjudges is $(1-\epsilon_{CF})^s$. The upper bound on the joint false positive rate of the s cuckoo filters is:
$$1 - (1-\epsilon_{CF})^s = 1 - \left(1 - \frac{1}{2^f}\right)^{2bs} \approx \frac{2bs}{2^f}$$
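The two bounds above can be evaluated numerically with a short Python sketch. The values b = 4 and s = 3 are example inputs chosen for illustration, not the settings used in the experiments.

```python
def eps_cf(b: int, f: int) -> float:
    """Bound for one standard cuckoo filter: 1 - (1 - 2^-f)^(2b)."""
    return 1 - (1 - 2.0 ** -f) ** (2 * b)


def eps_multi(b: int, f: int, s: int) -> float:
    """Joint bound for s filters: 1 - (1 - eps_CF)^s = 1 - (1 - 2^-f)^(2bs)."""
    return 1 - (1 - eps_cf(b, f)) ** s


for f in (8, 12, 16):
    exact, approx = eps_multi(4, f, 3), 2 * 4 * 3 / 2 ** f
    print(f"b=4, s=3, f={f:2d}: exact={exact:.6f}  approx 2bs/2^f={approx:.6f}")
```

Each additional fingerprint bit roughly halves the bound. The linear approximation slightly overestimates the exact expression, since $1-(1-x)^n \le nx$ for $0 \le x \le 1$.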
If the dynamic window changes of the cuckoo filters are taken into account, assume that a data set contains m elements (data objects) in total, the sliding window size is w, and the window moves k elements at a time; then $\lfloor (m-w)/k \rfloor + 1$ sliding windows are generated in total.
Over the dynamic window movements of the s cuckoo filters, the windows are compared … times in total, and the false positive rate $\epsilon_{MCF}$ of the multiple cuckoo filter is calculated as: …
The above relationship shows that the false positive rate of the multiple cuckoo filter depends on the bucket size, the number of cuckoo filters, the total number of elements in the data set, the sliding window size, the step by which the window moves each time, and the fingerprint size. Here, the set refers to all the data within the dynamic windows.
In particular, increasing the fingerprint size f significantly reduces the false positive rate $\epsilon_{MCF}$; for larger data sets, choosing a larger fingerprint size f keeps the false positive rate of the multiple cuckoo filter low.
Consequently, in the multiple cuckoo filter of the invention, the standard cuckoo filters are independent of one another, and the false positive rate can be lower than that of a standard cuckoo filter.
In the specific implementation, the multiple cuckoo filter was implemented using C++ source code for the cuckoo filter data structure.
The experiments are divided into three groups, analyzing the relationship between query time and, respectively, the number of cuckoo filters, the sliding window size, and the total number of set elements. As shown in Fig. 2, the total number of elements was initially set to 1,000,000, the initial sliding window size was set between 50,000 and 100,000, and the window step was set to 2,000.
The multiple cuckoo filters move their sliding windows down simultaneously. If the specified element is found in all the corresponding sliding windows at some moment, True is returned; otherwise, False is returned. When one sliding window reaches its maximum position, that window is held fixed while the other windows keep moving until each reaches the end of its filter.
The detailed timing data are given in Tables 2 to 6. It was found that once there are more than 4 data streams, it becomes difficult to find the fingerprint of an element contained in all the dynamic windows at the same time. The number of cuckoo filters was therefore increased from 1 to 5, and the tests measured the time and the number of queries needed to find a specified element present simultaneously in the sliding windows of the multiple cuckoo filters at the same moment.
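The lockstep movement with windows held at their ends can be sketched in Python as follows. The sketch assumes that all windows advance by the same step k and that each stream is a list of stored entry indices; `targets[i]` stands for the storage location of x in filter i (the `item_index[i]` of Step 1). The function name is hypothetical.

```python
def query_all_windows(streams, targets, sizes, k):
    """Move one window per stream down in lockstep, k entries per step.

    Returns True as soon as window i contains targets[i] for every i at the
    same step. A window that has reached the end of its stream is held fixed
    while the others keep moving; the query fails once all windows are at
    their ends. Assumes k <= every window size so no entry is skipped.
    """
    if k < 1:
        raise ValueError("k must be positive")
    starts = [0] * len(streams)
    while True:
        windows = [s[st:st + w] for s, st, w in zip(streams, starts, sizes)]
        if all(t in win for t, win in zip(targets, windows)):
            return True
        at_end = [st + w >= len(s) for s, st, w in zip(streams, starts, sizes)]
        if all(at_end):
            return False
        starts = [st if end else st + k for st, end in zip(starts, at_end)]


streams = [[1, 2, 3, 9, 5, 6], [7, 8, 9, 1, 2, 3]]
print(query_all_windows(streams, [9, 9], sizes=[2, 2], k=1))  # True (both hold 9 at step 2)
print(query_all_windows([[9, 0, 0, 0], [0, 0, 0, 9]], [9, 9], sizes=[2, 2], k=1))  # False
```

The second call returns False even though 9 occurs in both streams, because the two occurrences never fall inside their windows at the same step; this matches the description's point that a False result does not mean the element was never inserted.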
Tables 2 to 5 show that as the number of cuckoo filters increases from 1 to 4, the number of successful queries drops sharply and the query time increases linearly. As Table 6 shows, with 5 or more cuckoo filters it is hard to find any moment at which all the sliding windows contain the fingerprint of the specified element.
Table 2: 1 cuckoo filter, 50 queries per group
Table 3: 2 cuckoo filters, 100 queries per group
Table 4: 3 cuckoo filters, 100 queries per group
Table 5: 4 cuckoo filters, 100 queries per group
Table 6: 5 cuckoo filters, 100 queries per group
Experiments were further conducted on the relationship between query time and sliding window size; here the sliding window sizes were set explicitly rather than generated randomly. With 2 cuckoo filters, one sliding window was set to the maximum, and the size of the other was increased from 20,000 to 160,000 in steps of 20,000. With 3 cuckoo filters, the first sliding window was set to the maximum, and the other two were increased successively in steps of 10,000. As Fig. 3 shows, increasing the number of cuckoo filters leads to longer query times; in general, for the same number of cuckoo filters, query time tends to decrease as the sliding window grows.
In addition, the total number of elements in the data sets was gradually increased to test its effect on query time. As Fig. 3 shows, query time increases as the total size of the data sets grows.
These implementations show that the invention inherits the advantages of the cuckoo filter well; when processing large volumes of streaming data, it greatly reduces computation and space occupancy, lowers the false positive rate, and facilitates accurate data queries, with significant and prominent technical effects.

Claims (7)

1. A multiple cuckoo filter under a streaming computing model, characterized in that: the multiple cuckoo filter mainly consists of as many standard cuckoo filters as there are data streams, one standard cuckoo filter being provided for each data stream; each standard cuckoo filter processes its own data stream, so that the representation and querying of the multi-dataset of the data streams are decomposed into the representation and querying of multiple single datasets; and the false positive rate of the multiple cuckoo filter is reduced by controlling, and in particular increasing, the fingerprint size f stored in the index for each data element.
2. The multiple cuckoo filter under a streaming computing model according to claim 1, characterized in that: when a query is made at any given time, the standard cuckoo filters of all data streams are queried simultaneously to determine whether the same specified object exists in the different data streams at the same time; if it does, True is returned; otherwise, False is returned.
3. The multiple cuckoo filter under a streaming computing model according to claim 2, characterized in that: a sliding window is established for each cuckoo filter; starting from the head of the corresponding data stream, each sliding window splits the data stream along time boundaries, so that in any time period one segment of source data is obtained for each data stream; and the fingerprints in the sliding windows are compared to check whether a specified element exists in multiple data streams at the same time.
4. The multiple cuckoo filter under a streaming computing model according to claim 2, characterized in that: the sliding window stores the indices of all entries of the cuckoo filter and uses a queue data structure; to determine whether multiple sliding windows simultaneously contain element x, the final storage location of the hash mapping of x in every standard cuckoo filter is obtained first, and, as the sliding windows move over time without offset, it is checked whether the hash mapping result is among the location indices stored in the corresponding sliding windows.
5. The multiple cuckoo filter under a streaming computing model according to claim 1, characterized in that: the standard cuckoo filter of each data stream starts from the head of that data stream.
6. The multiple cuckoo filter under a streaming computing model according to claim 1, characterized in that: the specified object is a data fragment or the character value of a data fragment after processing.
7. The multiple cuckoo filter under a streaming computing model according to claim 1, characterized in that: the number of standard cuckoo filters is generated dynamically according to the number of data streams, or is set in advance.
CN201811635873.4A 2018-12-29 2018-12-29 Multiple cuckoo filter under STREAMING computational model Active CN109815234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811635873.4A CN109815234B (en) 2018-12-29 2018-12-29 Multiple cuckoo filter under STREAMING computational model

Publications (2)

Publication Number Publication Date
CN109815234A true CN109815234A (en) 2019-05-28
CN109815234B CN109815234B (en) 2021-01-08

Family

ID=66602770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635873.4A Active CN109815234B (en) 2018-12-29 2018-12-29 Multiple cuckoo filter under STREAMING computational model

Country Status (1)

Country Link
CN (1) CN109815234B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116599A (en) * 2012-11-30 2013-05-22 浙江工商大学 Urban mass data flow fast redundancy elimination method based on improved Bloom filter structure
US20160134503A1 (en) * 2014-11-07 2016-05-12 Arbor Networks, Inc. Performance enhancements for finding top traffic patterns
CN105989061A (en) * 2015-02-09 2016-10-05 中国科学院信息工程研究所 Rapid indexing method for repeated detection of multi-dimensional data under sliding window

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIN FAN ET AL.: "Cuckoo Filter: Practically Better Than Bloom", 《PROCEEDINGS OF THE 10TH ACM INTERNATIONAL ON CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND TECHNOLOGIES》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111478769A (en) * 2020-03-18 2020-07-31 西安电子科技大学 Distributed credible identity authentication method, system, storage medium and terminal
CN111339058B (en) * 2020-03-24 2023-05-16 中国人民解放军国防科技大学 Aggregation synchronization method and device
CN111339058A (en) * 2020-03-24 2020-06-26 中国人民解放军国防科技大学 Set synchronization method and device
CN111552693B (en) * 2020-04-30 2023-04-07 南方科技大学 Tag cuckoo filter
CN111552692A (en) * 2020-04-30 2020-08-18 南方科技大学 Plus-minus cuckoo filter
CN111552693A (en) * 2020-04-30 2020-08-18 南方科技大学 Tag cuckoo filter
CN111552692B (en) * 2020-04-30 2023-04-07 南方科技大学 Plus-minus cuckoo filter
CN112149416A (en) * 2020-09-09 2020-12-29 南京大学 Method for detecting hot spot academic research topic in distributed academic data warehouse
CN112149416B (en) * 2020-09-09 2023-08-22 南京大学 Method for detecting hot academic research topics in distributed academic data warehouse
CN111858651A (en) * 2020-09-22 2020-10-30 中国人民解放军国防科技大学 Data processing method and data processing device
CN112597345B (en) * 2020-10-30 2023-05-12 深圳市检验检疫科学研究院 Automatic acquisition and matching method for laboratory data
CN112597345A (en) * 2020-10-30 2021-04-02 深圳市检验检疫科学研究院 Laboratory data automatic acquisition and matching method
CN112507689B (en) * 2021-01-20 2023-08-01 中国地质大学(武汉) Space range-keyword query method under distributed subscription and release mode
CN112507689A (en) * 2021-01-20 2021-03-16 中国地质大学(武汉) Spatial range-keyword query method under distributed subscription and release mode
CN113535706A (en) * 2021-08-03 2021-10-22 重庆赛渝深科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN113535706B (en) * 2021-08-03 2023-05-23 佛山赛思禅科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN114844638A (en) * 2022-07-03 2022-08-02 浙江九州量子信息技术股份有限公司 Big data volume secret key duplication removing method and system based on cuckoo filter

Also Published As

Publication number Publication date
CN109815234B (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN109815234A (en) A kind of multiple cuckoo filter under streaming computing model
CN105320775B (en) The access method and device of data
CN104536959B (en) A kind of optimization method of Hadoop accessing small high-volume files
CN103593436B (en) file merging method and device
CN101866358B (en) Multidimensional interval querying method and system thereof
CN111913955A (en) Data sorting processing device, method and storage medium
US8364751B2 (en) Automated client/server operation partitioning
EP2199935A2 (en) Method and system for dynamically partitioning very large database indices on write-once tables
CN107436813A (en) A kind of method and system of meta data server dynamic load leveling
CN107329987A (en) A kind of search system based on mongo databases
CN110688382B (en) Data storage query method and device, computer equipment and storage medium
WO2021047373A1 (en) Big data-based column data processing method, apparatus, and medium
CN110515920A (en) A kind of mass small documents access method and system based on Hadoop
CN102214236A (en) Method and system for processing mass data
JP2022547673A (en) DATA PROCESSING METHOD AND RELATED DEVICE, AND COMPUTER PROGRAM
CN109766318A (en) File reading and device
CN104462349B (en) A kind of document handling method and device
CN109117426A (en) Distributed networks database query method, apparatus, equipment and storage medium
CN110019017B (en) High-energy physical file storage method based on access characteristics
CN116089364B (en) Storage file management method and device, AI platform and storage medium
CN112540954B (en) Multi-level storage construction and online migration method in directory unit
CN109828984B (en) Analysis processing method and device, computer storage medium and terminal
Zhao et al. LS-AMS: An adaptive indexing structure for realtime search on microblogs
CN108614879A (en) Small documents processing method and device
CN112540843B (en) Resource allocation method and device, storage equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 Room 501, building 9, No. 20, kekeyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Patentee after: Hangzhou Zhongke advanced technology development Co.,Ltd.

Address before: 310026 Room 501, building 9, 20 kejiyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Patentee before: HANGZHOU ZHONGKE ADVANCED TECHNOLOGY RESEARCH INSTITUTE Co.,Ltd.