CN117891858A - Space-time efficient parallel approximate member query method and system - Google Patents



Publication number
CN117891858A
Authority
CN
China
Prior art keywords: pipeline stage, data element, data, index, pipeline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410293870.6A
Other languages
Chinese (zh)
Inventor: 黄河, 杜扬, 孙玉娥, 舒亚鹏, 陆俊, 侯劲松, 蒋明, 谢民, 于浩, 李振伟, 王韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202410293870.6A priority Critical patent/CN117891858A/en
Publication of CN117891858A publication Critical patent/CN117891858A/en
Pending legal-status Critical Current

Landscapes: Collating Specific Patterns (AREA)

Abstract

The invention relates to a space-time efficient parallel approximate member query method and system in the technical field of computers, comprising the steps of: acquiring a data set to be operated on; dividing the data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage; each pipeline stage successively processing the data elements waiting for operation; when a pipeline stage completes an operation on a data element, judging whether an operation result is returned; if the data element's operation fails in the current pipeline stage, the data element enters the next pipeline stage for operation. The invention designs a pipelined parallel approximate member query method that avoids memory access conflicts, aiming to break through the existing performance bottleneck by utilizing the parallelism provided by a multi-core CPU (Central Processing Unit) or programmable hardware to realize a more space-time efficient approximate member query data structure, while also removing the limitation that the length of a cuckoo filter must be a power of 2.

Description

Space-time efficient parallel approximate member query method and system
Technical Field
The invention relates to the technical field of computers, in particular to a space-time efficient parallel approximate member query method and a space-time efficient parallel approximate member query system.
Background
The approximate membership query (AMQ: Approximate Membership Query) has wide application in the fields of databases, data mining, bioinformatics, network measurement, etc. A filter is a compact probabilistic data structure that can be used to handle the approximate member query problem, namely: the data elements in a data set are stored in the filter, and the question of whether a given data element belongs to the data set is answered according to the filter alone. Since the memory space used by a filter is much smaller than the size of the data set it represents, the filter can still be stored in higher layers of the memory hierarchy (e.g., DRAM or SRAM) even when the data set is so large that it can only be stored on disk. Another advantage of a filter is that the entire data set does not need to be searched to answer queries about the presence of particular data elements, thereby improving query performance by reducing storage access overhead.
The approximate membership query (AMQ) is defined as follows: given a data set S, a compact data structure is built by sequentially inserting all data elements of S. For a data element e, the approximate member query method supports a lookup operation that answers whether e belongs to the data set S. The data structure may produce false positives: a lookup of data element e may show that it belongs to data set S even though e does not actually belong to S. Meanwhile, the processing speed of an approximate member query method depends on the complexity of its operations: the more complex the operations, the lower the throughput. Thus, an approximate member query data structure requires a balance between space efficiency and operation throughput.
There have been a number of methods for approximate member queries, including Bloom filters, quotient filters (Quotient Filter), cuckoo filters (Cuckoo Filter), and their variants. The Bloom filter maintains a bit array of length m, hashes each data element to several bits in the array, and sets those bits to '1' to indicate that the data element belongs to the data set, enabling fast insertion and lookup but not supporting deletion. Some variants of the Bloom filter that support deletion have been proposed; for example, the counting Bloom filter replaces the bit array with a counter array to support deletion, but occupies more space. Unlike the Bloom filter, the cuckoo filter uses a hash table to store fingerprints corresponding to the data elements of the data set; it supports deletion, handles dynamic changes in the filter more effectively, and uses less space when the false positive rate is low, and has thus become an excellent alternative to the Bloom filter in many applications. Recent studies on the cuckoo filter have focused mainly on improving its performance, including improving space utilization, improving operation throughput, and reducing the false positive rate. However, these methods are limited to single-threaded execution and face a performance bottleneck: their operation throughput is limited by CPU single-core performance, which has grown at a significantly slower rate in recent years. Furthermore, the length of the cuckoo filter (the number of buckets) must be strictly a power of 2, i.e. 2^k; otherwise, using an exclusive-OR operation to compute one candidate bucket from the other may produce an index outside the table.
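For context, the power-of-2 restriction can be seen in a minimal sketch of standard partial-key cuckoo hashing (the SHA-256-based fingerprint hash and all names below are illustrative assumptions, not the patent's method):

```python
import hashlib

def fp_hash(fingerprint: int) -> int:
    """Hash derived from the fingerprint alone (partial-key hashing)."""
    digest = hashlib.sha256(fingerprint.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def alt_bucket(i: int, fingerprint: int, m: int) -> int:
    """The other candidate bucket; correct only when m is a power of 2."""
    return (i ^ fp_hash(fingerprint)) % m

m = 1 << 10                  # table length as a power of 2
i1, f = 123, 0xBEEF
i2 = alt_bucket(i1, f, m)
# XOR is self-inverse, so the pair of candidate buckets is symmetric:
assert alt_bucket(i2, f, m) == i1
```

When m is not a power of 2, `i ^ fp_hash(f)` can exceed the table range and the reduction `% m` destroys the self-inverse property, which is the limitation the invention removes by using addition and subtraction instead of exclusive-OR.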
Whether the parallelism provided by a multi-core CPU or programmable hardware can be leveraged to improve filter performance is a potential opportunity. However, all cuckoo filters face a common challenge when executed concurrently by multiple threads: memory access conflicts. The cuckoo filter encodes the data elements of the data set into a shared hash table, and the relocation of data elements, i.e., recursively kicking data elements from a full bucket to their alternate bucket using the partial-key cuckoo hashing method, often causes memory access conflicts when multiple threads attempt to relocate different data elements simultaneously, resulting in reduced performance or additional synchronization overhead. The work most relevant to parallel execution of the cuckoo filter is the parallel cuckoo hashing method, which is designed specifically for cuckoo hash tables; the cuckoo filter is a compact version of the cuckoo hash table that stores only fingerprints and uses partial-key cuckoo hashing to find the alternate location of a data element based only on its fingerprint. However, few parallel cuckoo hashing methods can be applied to cuckoo filters, and those that can neither perform well nor support programmable hardware.
Disclosure of Invention
Therefore, the invention aims to solve the technical problem that the prior-art approximate member query method based on the cuckoo filter faces a performance bottleneck when executed in a single thread.
In order to solve the technical problems, in a first aspect, the present invention provides a space-time efficient parallel approximate member query method, which includes:
acquiring a data set to be operated on; the operations include an insert operation, a lookup operation, and a delete operation;
dividing data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage;
Each pipeline stage sequentially and continuously processes data elements waiting for operation;
when a pipeline stage completes its operation on a data element, judging whether an operation result is returned;
if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails after all pipeline stages have been traversed or a preset number of kick-out operations has been reached;
if not, the data element's operation has failed in the current pipeline stage, and the data element enters the next pipeline stage for operation.
In one embodiment of the present invention, in each pipeline stage, the data elements waiting for operation are processed successively; the sources of the data elements include:
data elements assigned to the current pipeline stage awaiting their first operation, and data elements whose operation failed in the previous pipeline stage.
In one embodiment of the present invention, the specific steps of performing the inserting operation include:
acquiring a data element to be inserted, and calculating the fingerprint of the data element to be inserted;
combining the fingerprint of the data element to be inserted with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be inserted in the current pipeline stage;
judging whether the candidate bucket contains an empty slot; if yes, storing the fingerprint of the data element to be inserted into an empty slot, and returning a result of successful insertion; if not, randomly selecting one slot in the candidate bucket for a kick-out operation;
judging whether the preset number of kick-out operations has been reached; if yes, abandoning the insertion and returning a result of insertion failure; if not, the fingerprint of the data element to be inserted enters the next pipeline stage to continue the insertion operation until an empty slot appears or the maximum number of kick-out operations is reached.
In one embodiment of the invention, the kick-out operation includes: randomly selecting a slot from the candidate bucket and exchanging the value of that slot with the fingerprint of the data element to be inserted.
In one embodiment of the present invention, the specific steps of performing the lookup operation include:
acquiring a data element to be looked up, and calculating the fingerprint of the data element to be looked up;
combining the fingerprint of the data element to be looked up with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be looked up in the current pipeline stage;
judging whether the value of any slot in the candidate bucket of the current pipeline stage equals the fingerprint value of the data element to be looked up; if yes, returning a result of successful lookup;
if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a result of lookup failure;
if not, entering the next pipeline stage to continue the lookup operation.
In one embodiment of the present invention, the specific steps of performing the deletion operation include:
acquiring a data element to be deleted, and calculating the fingerprint of the data element to be deleted;
combining the fingerprint of the data element to be deleted with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be deleted in the current pipeline stage;
judging whether the value of any slot in the candidate bucket equals the fingerprint value of the data element to be deleted; if yes, executing the deletion, setting the corresponding slot to empty, and returning a result of successful deletion;
if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, the data element to be deleted is not stored in the filter, and a result of deletion failure is returned;
if not, entering the next pipeline stage to continue the deletion operation.
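The deletion steps above can be sketched as a minimal, self-contained model; the per-stage bucket hash, the parameters, and the starting-stage rule below are illustrative assumptions rather than the patent's exact construction:

```python
STAGES, M, SLOTS = 4, 64, 4
EMPTY = 0                     # sentinel; fingerprints are assumed nonzero
table = [[[EMPTY] * SLOTS for _ in range(M)] for _ in range(STAGES)]

def bucket_index(f, stage):
    # Placeholder per-stage hash (assumption); the real method derives each
    # stage's candidate bucket from the previous stage's bucket index.
    return (f * (2 * stage + 3)) % M

def delete(f):
    """Probe each stage's candidate bucket; clear the slot on a match."""
    stage = f % STAGES                    # start where f was first inserted
    for _ in range(STAGES):
        bucket = table[stage][bucket_index(f, stage)]
        for i, slot in enumerate(bucket):
            if slot == f:                 # fingerprint found: clear the slot
                bucket[i] = EMPTY
                return True
        stage = (stage + 1) % STAGES
    return False                          # all stages traversed, no match

f = 17
table[f % STAGES][bucket_index(f, f % STAGES)][0] = f
assert delete(f) and not delete(f)        # the second delete misses
```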
In one embodiment of the present invention, before acquiring the data set to be operated, initializing the filter according to the user requirement specifically includes:
initializing the number of sub-filters of the filter;
Initializing each sub-filter; each sub-filter comprises a plurality of buckets, each bucket comprising a plurality of storage slots; each sub-filter provides a candidate bucket for each data element;
Correspondingly distributing the sub-filters to pipeline stages; each sub-filter corresponds to a pipeline stage for pipelined parallel operation.
In one embodiment of the present invention, during the operation on a data element, the index of the candidate bucket is calculated by a pipeline hash method, which specifically includes:
calculating the candidate bucket index of the next pipeline stage from the candidate bucket index of the current pipeline stage, in stage order;
calculating the candidate bucket index of the initial pipeline stage from the candidate bucket index of the last pipeline stage.
In a second aspect, the present invention provides a space-time efficient parallel approximate member query system, which is applied to the space-time efficient parallel approximate member query method described in any one of the foregoing embodiments, and includes an initialization module, a pipeline hash module, and a pipeline parallel module, where:
the initialization module is used for initializing the filter according to the user requirements;
the pipeline hash module is used for calculating the indexes of the candidate buckets of each pipeline stage through the pipeline hash method;
the pipeline parallel module is used for acquiring a data set to be operated on; dividing the data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage; each pipeline stage successively processing the data elements waiting for operation; when a pipeline stage completes its operation on a data element, judging whether an operation result is returned; if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails after all pipeline stages have been traversed or a preset number of kick-out operations has been reached; if not, the data element's operation has failed in the current pipeline stage, and the data element enters the next pipeline stage for operation.
In one embodiment of the present invention, the pipeline parallel module further includes an insert operation module, a lookup operation module, and a delete operation module, wherein:
the insert operation module is used for acquiring a data element to be inserted and calculating its fingerprint; combining the fingerprint of the data element to be inserted with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be inserted in the current pipeline stage; judging whether the candidate bucket contains an empty slot; if yes, storing the fingerprint of the data element to be inserted into an empty slot and returning a result of successful insertion; if not, randomly selecting one slot in the candidate bucket for a kick-out operation; judging whether the preset number of kick-out operations has been reached; if yes, abandoning the insertion and returning a result of insertion failure; if not, letting the fingerprint of the data element to be inserted enter the next pipeline stage to continue the insertion operation until an empty slot appears or the maximum number of kick-out operations is reached;
the lookup operation module is used for acquiring a data element to be looked up and calculating its fingerprint; combining the fingerprint of the data element to be looked up with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be looked up in the current pipeline stage; judging whether the value of any slot in the candidate bucket of the current pipeline stage equals the fingerprint value of the data element to be looked up; if yes, returning a result of successful lookup; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a result of lookup failure; if not, entering the next pipeline stage to continue the lookup operation;
the delete operation module is used for acquiring a data element to be deleted and calculating its fingerprint; combining the fingerprint of the data element to be deleted with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be deleted in the current pipeline stage; judging whether the value of any slot in the candidate bucket equals the fingerprint value of the data element to be deleted; if yes, executing the deletion, setting the corresponding slot to empty, and returning a result of successful deletion; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, the data element to be deleted is not stored in the filter and a result of deletion failure is returned; if not, entering the next pipeline stage to continue the deletion operation.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
The space-time efficient parallel approximate member query method and system eliminate the parallel conflicts and locking operations of traditional parallel solutions. The method is highly friendly to multithreaded execution and to programmable hardware such as FPGAs and P4 chips, because it allows filter operations (including insertion, lookup, and deletion) to be distributed over multiple threads (or hardware pipeline stages), each maintaining its own sub-filter. It also has a competitive advantage in space utilization, because each element can explore more candidate buckets during insertion and is therefore less likely to be evicted or to cause a space overflow. In addition, the hash algorithm uses addition and subtraction instead of exclusive-OR, which effectively removes the limitation that the table length of the cuckoo filter must be a power of 2.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings, in which:
FIG. 1 is a flow chart of a space-time efficient parallel approximate member query method provided in a preferred embodiment of the present invention;
FIG. 2 is a flow chart of steps provided in a preferred embodiment of the present invention for pipelined parallel operation;
FIG. 3 is a flowchart showing the steps of an insert operation provided in a preferred embodiment of the present invention;
FIG. 4 is a flowchart illustrating the steps of a lookup operation provided in a preferred embodiment of the present invention;
FIG. 5 is a flowchart showing the steps of a delete operation provided in a preferred embodiment of the present invention;
FIG. 6 is a functional block diagram of a space-time efficient parallel approximation member query system provided in a preferred embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
In order to solve the problem that the existing cuckoo-filter-based approximate member query method faces a performance bottleneck when executed in a single thread, the invention designs a pipelined parallel approximate member query method that avoids memory access conflicts, aiming to break through the existing performance bottleneck by utilizing the parallelism provided by a multi-core CPU (Central Processing Unit) or programmable hardware to realize a more space-time efficient approximate member query data structure, while also removing the limitation that the length of the cuckoo filter must be a power of 2.
Referring to fig. 1, fig. 1 is a flowchart of a space-time efficient parallel approximate member query method according to an embodiment of the present invention. As shown in fig. 1, the method provided by the embodiment of the invention includes the following steps:
S101: acquiring a data set to be operated on; the operations include an insert operation, a lookup operation, and a delete operation.
S102: dividing the data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage.
S103: each pipeline stage in turn continually processes data elements awaiting operation.
In some possible embodiments, in each pipeline stage, the data elements waiting for operation are processed successively; the sources of the data elements include:
data elements assigned to the current pipeline stage awaiting their first operation, and data elements whose operation failed in the previous pipeline stage.
S104: when a pipeline stage completes its operation on a data element, judging whether an operation result is returned.
S105: if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails after all pipeline stages have been traversed or a preset number of kick-out operations has been reached; if not, the data element's operation has failed in the current pipeline stage, and the data element enters the next pipeline stage for operation.
In a specific embodiment, as shown in fig. 2, which is a flowchart of the steps of the pipelined parallel operation provided by the present invention, four pipeline stages are taken as an example:
a data set is acquired that needs to be manipulated.
The data elements in the data set are divided into 4 disjoint sub-data sets corresponding to the 4 pipeline stages; that is, the data elements are uniformly distributed to their corresponding pipeline stages to start the first target operation.
It should be noted that, during insertion, this prevents a non-uniform memory distribution and improves memory utilization and parallel efficiency; during lookup and deletion, since most data elements were stored into the filter by their first insertion attempt, starting the first operation at the corresponding pipeline stage effectively reduces the number of operations. Parallelism is thereby improved.
Each pipeline stage in turn continually processes data elements awaiting operation.
Specifically, the data elements come from two sources: data elements assigned to the pipeline stage awaiting their first operation, and data elements whose operation failed in the previous pipeline stage and that are waiting for this pipeline stage to operate on them.
After a pipeline stage operates on an element, it is judged whether the operation has finished, i.e., whether an operation result (success or failure) is returned. If yes, the operation result is returned and the next step is executed; otherwise, the element's operation has failed in the current pipeline stage, the element enters the next pipeline stage for operation, and execution returns to the previous step.
Notably, the insertion operation requires continuous kicking and re-insertion, not ending until the insertion succeeds (operation success) or the maximum number of kick-outs is reached (operation failure); a lookup or deletion returns operation success as soon as it succeeds in some pipeline stage, or ends with operation failure after all pipeline stages have been traversed.
The operation result, namely operation success or operation failure, is returned.
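The pipeline flow described above can be sketched as a sequential simulation (an assumption for illustration; in the real system each stage is a thread or hardware pipeline stage, and `try_stage` stands in for a per-stage insert/lookup/delete attempt). Each stage pulls only from its own queue and touches only its own sub-filter, so memory access conflicts are avoided by construction:

```python
from collections import deque

STAGES = 4
queues = [deque() for _ in range(STAGES)]

def try_stage(stage, item):
    # Placeholder per-stage operation (assumption): here an element only
    # succeeds one stage after its starting stage, to exercise forwarding.
    return (item + 1) % STAGES == stage

def run(items):
    results = {}
    for x in items:                          # uniform first-stage assignment
        queues[x % STAGES].append((x, 0))    # (element, stages visited)
    while any(queues):
        for stage in range(STAGES):
            if not queues[stage]:
                continue
            x, hops = queues[stage].popleft()
            if try_stage(stage, x):
                results[x] = True            # operation succeeded here
            elif hops + 1 < STAGES:
                queues[(stage + 1) % STAGES].append((x, hops + 1))
            else:
                results[x] = False           # all stages traversed: failure
    return results

assert all(run(range(8)).values())           # each element succeeds after one hop
```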
In some possible embodiments, before acquiring the data set to be operated, the filter is further initialized according to the requirement of the user, which specifically includes:
initializing the number of sub-filters of the filter;
Initializing each sub-filter; each sub-filter comprises a plurality of buckets, each bucket comprising a plurality of storage slots; each sub-filter provides a candidate bucket for each data element;
Correspondingly distributing the sub-filters to pipeline stages; each sub-filter corresponds to a pipeline stage for pipelined parallel operation.
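The initialization steps above can be sketched as follows (the function name, parameters, and empty-slot sentinel are illustrative assumptions, not the patent's API): the filter is an array of sub-filters, one per pipeline stage, each holding a table of buckets subdivided into slots.

```python
EMPTY = 0  # sentinel for an empty slot (fingerprints assumed nonzero)

def init_filter(num_stages=4, num_buckets=1024, slots_per_bucket=4):
    """One sub-filter (a bucket table) per pipeline stage."""
    return [
        [[EMPTY] * slots_per_bucket for _ in range(num_buckets)]
        for _ in range(num_stages)
    ]

filt = init_filter()
assert len(filt) == 4            # one sub-filter per pipeline stage
assert len(filt[0]) == 1024      # buckets per sub-filter
assert len(filt[0][0]) == 4      # slots per bucket
```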
In some possible embodiments, during the operation on a data element, the index of the candidate bucket is calculated by the pipeline hash method, which specifically includes:
calculating the candidate bucket index of the next pipeline stage from the candidate bucket index of the current pipeline stage, in stage order.
Illustratively, the insertion range of a fingerprint is defined by the sub-filter length m, i.e., the number of buckets in a sub-filter. For the specific case of 4 pipeline stages, three different mutually coprime numbers, denoted δ1, δ2, and δ3, must be determined in order to calculate the indexes of the candidate buckets. The 4 candidate bucket indexes h1, h2, h3, h4 of a data element can then be derived according to equation (1), each stage's index being computed from the preceding stage's index, the corresponding constant, and the fingerprint.
The candidate bucket index of the initial pipeline stage is calculated from the candidate bucket index of the last pipeline stage.
Illustratively, as shown in equation (2), the modular multiplicative inverse δ⁻¹ of δ participates in the calculation; owing to the defining property of the multiplicative inverse, δ·δ⁻¹ ≡ 1, the candidate bucket index of the initial pipeline stage can be calculated from the candidate bucket index of the last pipeline stage.
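A minimal numeric sketch of a pipeline hash consistent with the description above (the exact forms of equations (1) and (2) appear in the original drawings; the concrete formula below, with constants δ assumed coprime to the sub-filter length m so that a modular multiplicative inverse exists for any m, is an illustrative assumption):

```python
M = 1000                     # sub-filter length; need NOT be a power of 2
DELTAS = (3, 7, 11)          # constants assumed pairwise coprime to M

def next_bucket(h, f, j):
    """Stage j+1 candidate bucket from the stage-j bucket h and fingerprint f."""
    return (DELTAS[j] * h + f) % M

def prev_bucket(h_next, f, j):
    """Invert next_bucket via the modular multiplicative inverse of δ_j."""
    inv = pow(DELTAS[j], -1, M)          # exists because gcd(δ_j, M) == 1
    return (inv * (h_next - f)) % M

def first_from_last(h_last, f):
    """Analogue of equation (2): recover the stage-1 bucket from stage 4."""
    h = h_last
    for j in (2, 1, 0):
        h = prev_bucket(h, f, j)
    return h

# Walk a fingerprint's candidate buckets forward through all 4 stages,
# then recover the first bucket from the last using only the inverses.
f, h1 = 456, 123
h4 = next_bucket(next_bucket(next_bucket(h1, f, 0), f, 1), f, 2)
assert first_from_last(h4, f) == h1
```

Because only addition, subtraction, and multiplication modulo m are used, the round trip works for any table length m, unlike the XOR-based scheme of the standard cuckoo filter.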
in some possible embodiments, the specific steps of performing the inserting operation include:
acquiring a data element to be inserted, and calculating the fingerprint of the data element to be inserted;
combining the fingerprint of the data element to be inserted with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be inserted in the current pipeline stage;
judging whether the candidate bucket contains an empty slot; if yes, storing the fingerprint of the data element to be inserted into an empty slot, and returning a result of successful insertion; if not, randomly selecting one slot in the candidate bucket for a kick-out operation;
judging whether the preset number of kick-out operations has been reached; if yes, abandoning the insertion and returning a result of insertion failure; if not, the fingerprint of the data element to be inserted enters the next pipeline stage to continue the insertion operation until an empty slot appears or the maximum number of kick-out operations is reached.
In some possible embodiments, the kick-out operation includes: randomly selecting a slot from the candidate bucket and exchanging the value of that slot with the fingerprint of the data element to be inserted.
As shown in fig. 3, which is a flowchart illustrating the steps of the insert operation according to a preferred embodiment of the present invention:
The data element to be inserted is acquired.
The fingerprint f of the data element e is calculated.
The index of the candidate bucket of the data element in the current pipeline stage is calculated using equation (1) or equation (2) above, combining the fingerprint f with the index of the current pipeline stage and the index of the previous pipeline stage (the index of the previous pipeline stage is not needed when the data element is operated on for the first time).
It should be noted that if all elements arriving for the first time were inserted starting from the first pipeline stage, the earlier pipeline stages would be busy all the time while the later pipeline stages sat idle, and the memory distribution would be uneven. Therefore, the embodiment of the invention uniformly distributes the elements across the four pipeline stages to start the insertion operation, so as to fully utilize the memory space and the parallel capacity.
After the index of the candidate bucket of the fingerprint in the current pipeline stage is calculated, it is judged whether the corresponding candidate bucket contains an empty slot; if yes, the fingerprint is stored directly into one empty slot and the insertion result is returned; otherwise, a kick-out operation is performed and the next step is executed.
If no empty slot exists in the candidate bucket of the current pipeline stage, one slot in the candidate bucket is randomly selected for the kick-out operation.
Specifically, the kick-out operation is: the values of the selected slot and the fingerprint f are exchanged. At this point the data element to be inserted has been stored in the filter, and the kicked-out fingerprint continues to be inserted.
It is judged whether the maximum number of kick-outs has been reached; if yes, the insertion result is returned; if not, the fingerprint enters the next pipeline stage to continue the insertion operation.
Specifically, there are two kinds of returned insertion results: 1. the fingerprint is successfully stored into an empty slot, and the returned insertion result is: insertion succeeded; 2. the maximum number of kick-outs is reached, the insertion is abandoned, and the returned insertion result is: insertion failed.
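The insert steps above can be sketched as a self-contained model (the per-stage bucket hash, parameters, and starting-stage rule are illustrative assumptions; the real method derives each stage's candidate bucket via equations (1)/(2)):

```python
import random

STAGES, M, SLOTS, MAX_KICKS = 4, 64, 4, 8
EMPTY = 0                      # sentinel; fingerprints are assumed nonzero
table = [[[EMPTY] * SLOTS for _ in range(M)] for _ in range(STAGES)]
random.seed(1)                 # deterministic victim choice for the demo

def bucket_index(f, stage):
    # Placeholder per-stage bucket hash (assumption).
    return (f * (2 * stage + 3)) % M

def insert(f):
    """Insert fingerprint f, moving to the next stage after each kick-out."""
    stage = f % STAGES                       # spread first insertions evenly
    for _ in range(MAX_KICKS):
        bucket = table[stage][bucket_index(f, stage)]
        for i, slot in enumerate(bucket):
            if slot == EMPTY:                # empty slot found: store, done
                bucket[i] = f
                return True
        victim = random.randrange(SLOTS)     # bucket full: kick a victim out
        bucket[victim], f = f, bucket[victim]
        stage = (stage + 1) % STAGES         # victim retries in the next stage
    return False                             # preset kick-out limit reached

ok = insert(17)                              # lands in an empty slot
```

Note how the kicked-out fingerprint always moves to the next stage's sub-filter, so no two stages ever write the same table, mirroring the conflict-free property claimed above.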
In some possible embodiments, the specific steps of performing the lookup operation include:
acquiring a data element to be looked up, and calculating the fingerprint of the data element to be looked up;
combining the fingerprint of the data element to be looked up with the index of the current pipeline stage and the index of the previous pipeline stage, and calculating the index of the candidate bucket of the data element to be looked up in the current pipeline stage;
judging whether the value of any slot in the candidate bucket of the current pipeline stage equals the fingerprint value of the data element to be looked up; if yes, returning a result of successful lookup;
if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a result of lookup failure;
if not, entering the next pipeline stage to continue the lookup operation.
Fig. 4 is a flowchart illustrating the steps of a lookup operation according to an embodiment of the present invention; the preferred steps are:
the data element to be looked up is obtained.
The fingerprint of the data element to be searched is calculated.
The index of the candidate storage bucket of the data element in the current pipeline stage is calculated using equation (1) or equation (2) above, combining the fingerprint with the index of the current pipeline stage and the index of the previous pipeline stage (the index of the previous pipeline stage is not needed when the data element is operated on for the first time).
During insertion, in order to fully utilize the memory space and the parallel capability, elements are distributed to different pipeline stages for their first insertion, and most elements are inserted into the filter on the first attempt. Thus, the lookup operation also maps each element to its corresponding pipeline stage for the first lookup.
After the index of the candidate storage bucket of the fingerprint in the current pipeline stage is calculated, it is judged whether the value of any storage tank in that candidate storage bucket equals the fingerprint. If so, the search succeeds and the search result of the data element is returned; otherwise, the search continues with the next step.
If the value of a storage tank in the candidate storage bucket of the current pipeline stage equals the fingerprint, a successful search is returned. If no storage tank matches, it is judged whether all pipeline stages have been searched; if so, the search fails and the search result of the data element is returned; otherwise, the next pipeline stage is entered and the search continues.
Specifically, the search result of the data element is returned. There are two possible results: 1. a storage tank in some candidate storage bucket holds the same fingerprint as the data element, and a successful search is returned; 2. no storage tank matching the fingerprint is found in any pipeline stage, and a search failure is returned.
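Illustratively, the lookup path can be sketched as follows: probe the candidate storage bucket in each pipeline stage in turn, succeeding on the first fingerprint match. The layout and hash functions are illustrative assumptions (matching conventions chosen for this sketch, not the patent's exact equations (1)/(2)).

```python
import zlib

# Assumed parameters: K pipeline stages, M buckets per stage, S slots.
K, M, S = 4, 8, 2
stages = [[[None] * S for _ in range(M)] for _ in range(K)]

def fingerprint(x) -> int:
    return zlib.crc32(str(x).encode()) % 255 + 1  # illustrative fingerprint

def bucket_index(fp: int, stage: int) -> int:
    return (fp + stage) % M                       # stand-in pipeline hash

def lookup(x) -> bool:
    fp = fingerprint(x)
    stage = zlib.crc32(str(x).encode()) % K  # first probe: insertion stage
    for _ in range(K):                       # one candidate bucket per stage
        if fp in stages[stage][bucket_index(fp, stage)]:
            return True                      # a slot value equals the fp
        stage = (stage + 1) % K
    return False                             # all stages searched: absent
```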
In some possible embodiments, the specific steps of performing the deletion operation include:
acquiring a data element to be deleted, and calculating to obtain a fingerprint of the data element to be deleted;
Combining the fingerprint of the data element to be deleted, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be deleted in the current pipeline stage;
Judging whether the value of a storage groove in the candidate storage bucket is equal to the fingerprint value of the data element to be deleted; if yes, executing the deleting operation, setting the corresponding storage groove to be empty, and returning a successful deleting result;
If not, it is judged whether all pipeline stages have been searched; if all pipeline stages have been searched, this indicates that the data element to be deleted is not stored in the filter, and a deletion failure result is returned;
If not, entering the next pipeline stage to continue deleting operation.
Fig. 5 is a flowchart illustrating the steps of a deletion operation provided in the present invention; the preferred steps are:
the data element to be deleted is acquired.
The fingerprint of the data element to be deleted is calculated.
The index of the candidate storage bucket of the data element in the current pipeline stage is calculated using equation (1) or equation (2) above, combining the fingerprint with the index of the current pipeline stage and the index of the previous pipeline stage (the index of the previous pipeline stage is not needed when the data element is operated on for the first time).
In the insertion process, in order to fully utilize the memory space and the parallel capability, elements are distributed to different pipeline stages for their first insertion, and most elements are inserted into the filter on the first attempt. Thus, the delete operation also maps each element to its corresponding pipeline stage to attempt the first deletion.
After the index of the candidate storage bucket of the fingerprint in the current pipeline stage is calculated, it is judged whether the value of any storage tank in that candidate storage bucket equals the fingerprint. If so, the deletion operation is executed: the corresponding storage tank is set to empty, the deletion succeeds, and the deletion result of the data element is returned; otherwise, the search continues with the next step.
If no storage tank in the candidate storage bucket of the current pipeline stage matches the fingerprint, it is judged whether all pipeline stages have been searched; if so, the data element is not stored in the filter, the deletion fails, and the deletion result of the data element is returned; otherwise, the next step is executed.
The next pipeline stage is entered and the deletion is attempted again.
Specifically, the deletion result of the data element is returned. There are two possible results: 1. a storage tank in some candidate storage bucket holds the same fingerprint as the data element, the deletion operation is executed, and a successful deletion result is returned; 2. no storage tank matching the fingerprint is found in any pipeline stage, i.e., the data element is not stored in the filter, and a deletion failure result is returned.
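Illustratively, the deletion path can be sketched as follows: locate a storage tank whose value equals the fingerprint, clear it, and report failure if no pipeline stage holds it. As in the other sketches, the layout and hash functions are assumptions, not the patent's exact equations.

```python
import zlib

# Assumed parameters: K pipeline stages, M buckets per stage, S slots.
K, M, S = 4, 8, 2
stages = [[[None] * S for _ in range(M)] for _ in range(K)]

def fingerprint(x) -> int:
    return zlib.crc32(str(x).encode()) % 255 + 1  # illustrative fingerprint

def bucket_index(fp: int, stage: int) -> int:
    return (fp + stage) % M                       # stand-in pipeline hash

def delete(x) -> bool:
    fp = fingerprint(x)
    stage = zlib.crc32(str(x).encode()) % K
    for _ in range(K):                       # probe each stage in turn
        bucket = stages[stage][bucket_index(fp, stage)]
        if fp in bucket:
            bucket[bucket.index(fp)] = None  # set the storage tank empty
            return True                      # deletion succeeded
        stage = (stage + 1) % K
    return False                             # element was never stored
```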
The above is a specific implementation step of three operations in the embodiment of the present invention, where all operations may be performed in parallel in a pipeline.
Based on the same inventive concept, an embodiment of the present application further provides a space-time efficient parallel approximate member query system corresponding to the method provided above. Since the principle by which the system solves the problem is similar to that of the method, the implementation of the system may refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 6, fig. 6 is a functional block diagram of a space-time efficient parallel approximate member query system 100 according to the present application, which is applied to the space-time efficient parallel approximate member query method according to any one of the foregoing embodiments, and includes an initialization module 110, a pipeline hash module 120, and a pipeline parallel module 130, where:
the initialization module 110 is configured to initialize the filter according to a user requirement.
Exemplary, the embodiment of the present invention provides an initialization module 110, which can initialize a filter according to a user requirement, specifically:
Initializing the number of sub-filters of the filter, i.e., how many sub-filters the filter contains;
Initializing each sub-filter; each sub-filter contains a number of buckets, and each bucket contains a number of storage tanks (slots); each sub-filter provides one candidate storage bucket for each element, so the invention allocates a plurality of candidate storage positions to each data element, and a data element can probe more candidate storage buckets during insertion;
The sub-filters are allocated one-to-one to the pipeline stages (Stages). When the filter works, each sub-filter corresponds to one pipeline stage to realize pipeline-parallel operation.
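Illustratively, the initialization described above (a configurable number of sub-filters, each with a fixed number of buckets and storage tanks per bucket, one sub-filter per pipeline stage) can be sketched as follows. The parameter names `d`, `m`, and `s` are assumptions, since the patent's symbols are not reproduced in this text.

```python
def init_filter(d: int, m: int, s: int):
    # stage -> bucket -> storage tank; None marks an empty storage tank.
    # Sub-filter i is paired with pipeline stage i by its position in the
    # outer list.
    return [[[None] * s for _ in range(m)] for _ in range(d)]
```

Each inner list is allocated separately, so writing to one storage tank does not alias any other, which matters when distinct pipeline stages update their sub-filters concurrently.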
The pipeline hash module 120 is configured to calculate, through pipeline hash, an index of a candidate bucket of the pipeline stage.
Illustratively, an embodiment of the present invention provides a pipeline hash module 120, for which a pipeline hash method is specifically designed: the candidate bucket index in the next pipeline stage is obtained from the current candidate bucket index and the fingerprint of the element, and an element whose operation failed in the last pipeline stage obtains the index of its candidate bucket in the first pipeline stage through the same pipeline hash method; see formula (1) and formula (2) above.
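A hypothetical sketch of such a pipeline hash follows. Formulas (1) and (2) themselves are not reproduced in this text; what the description does state is that addition and subtraction replace exclusive-or, so the table length need not be a power of two. One form consistent with that statement (and only an assumption) is:

```python
def next_bucket_index(cur: int, fp: int, m: int) -> int:
    # Assumed form of formula (1): derive the candidate bucket index in
    # the NEXT pipeline stage from the current index and the element's
    # fingerprint, using addition modulo m instead of XOR.
    return (cur + fp) % m

def prev_bucket_index(nxt: int, fp: int, m: int) -> int:
    # Assumed form of formula (2): the inverse step. Addition mod m is
    # invertible for ANY m, unlike XOR-based partial-key hashing, which
    # requires m to be a power of two.
    return (nxt - fp) % m
```

Note that `m = 7` works here just as well as `m = 8`: the invertibility needed to recover an evicted element's alternate bucket does not depend on `m` being a power of two.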
The pipeline parallel module 130 is configured to obtain a data set to be operated; dividing data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage; each pipeline stage sequentially and continuously processes data elements waiting for operation; when the operation of each pipeline stage on the data element is completed, judging whether an operation result is returned or not; if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails and the complete pipeline stage is traversed or the preset kick-out operation times are reached; if not, the data element fails to operate in the current pipeline stage, and enters the next pipeline stage to operate.
Illustratively, embodiments of the present invention provide a pipelined parallel module 130 that implements corresponding operations on data elements in a pipelined parallel manner.
After the arrival of the data elements, the data elements are evenly distributed to the pipeline stages using equation (3).
(3);
If all data elements began their first operation in the first pipeline stage, the parallelism of the filter would be greatly reduced and the memory distribution would be uneven.
The data elements sequentially enter corresponding pipeline stages to perform target operation, and different pipeline stages can work simultaneously.
When the operation of the data element in the current pipeline stage fails, if the ending condition is not met, the data element enters the next pipeline stage to continue the corresponding operation, as shown in the formula (4), until the operation is successful or the ending condition is met.
(4);
After the data element enters the next pipeline stage from the current pipeline stage, the current pipeline stage does not need to wait for the operation of other pipeline stages, and can directly operate on the next arriving data element.
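Illustratively, the even first-stage assignment described above can be sketched as follows. Equation (3) is not reproduced in this text, so a deterministic hash-modulo mapping is assumed here as one natural realization; the name `first_stage` and the use of CRC32 are illustrative choices.

```python
import zlib

def first_stage(x, k: int) -> int:
    # Assign each arriving data element to one of k pipeline stages for
    # its first operation, spreading load (and memory use) across stages.
    return zlib.crc32(str(x).encode()) % k
```

A deterministic mapping matters here: insert, lookup, and delete must all start from the same stage for a given element, or a stored fingerprint could be missed.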
In some possible embodiments, the pipeline parallel module further includes an insert operation module, a find operation module, and a delete operation module, wherein:
The inserting operation module is used for obtaining the data element to be inserted and calculating to obtain the fingerprint of the data element to be inserted; combining the fingerprint of the data element to be inserted, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be inserted in the current pipeline stage; judging whether an empty storage tank exists in the candidate storage bucket; if yes, storing the fingerprints of the data elements to be inserted into an empty storage tank, and returning a successful insertion result; if not, randomly selecting one storage tank in the candidate storage barrel to kick out; judging whether the preset kicking operation times are reached or not; if yes, discarding the insertion and returning a result of the insertion failure; if not, the fingerprint of the data element to be inserted enters the next pipeline stage to continue the insertion operation until an empty storage tank appears or the maximum kick-out operation times are reached.
Illustratively, the kicked-out data element then enters the next pipeline stage to be reinserted into the filter. This kick-out process continues until an empty storage tank is found or the number of kick-outs reaches a preset threshold.
The searching operation module is used for acquiring the data element to be searched and calculating to obtain the fingerprint of the data element to be searched; combining the fingerprint of the data element to be searched, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be searched in the current pipeline stage; judging whether the value of a storage tank in a candidate storage bucket of the current pipeline stage is equal to the fingerprint value of the data element to be searched; if yes, returning a successful searching result; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a searching failure result; if not, entering the next pipeline stage to continue the searching operation.
Illustratively, if no storage tank matching the fingerprint is found in the candidate storage buckets of all pipeline stages, a result is returned indicating that the data element is not present in the data set.
The deleting operation module is used for acquiring the data element to be deleted and calculating the fingerprint of the data element to be deleted; combining the fingerprint of the data element to be deleted, the index of the current pipeline stage and the index of the previous pipeline stage to calculate the index of the candidate storage bucket of the data element to be deleted in the current pipeline stage; judging whether the value of a storage tank in the candidate storage bucket is equal to the fingerprint value of the data element to be deleted; if yes, executing the deleting operation, setting the corresponding storage tank to empty, and returning a successful deletion result; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, this indicates that the data element to be deleted is not stored in the filter, and a deletion failure result is returned; if not, entering the next pipeline stage to continue the deleting operation.
Illustratively, if no storage tank matching the fingerprint is found in the candidate storage buckets of all pipeline stages, a result is returned indicating that the data element is not present in the data set and cannot be deleted.
In summary, the space-time efficient parallel approximate member query system provided by the invention mainly comprises an initialization module, a pipeline hash module and a pipeline parallel module, wherein the pipeline parallel module comprises three operation modules: the device comprises an insertion operation module, a searching operation module and a deleting operation module. The initialization module is responsible for initializing the filter according to the user demand. The pipeline parallel module executes corresponding target operation on the arrived data elements through a pipeline parallel method: an inserting operation of inserting data elements in the data set into the filter; a search operation, searching whether the data element is stored in the filter, namely searching whether a given data element exists in the data set; and a deleting operation, wherein whether a given data element exists in the filter or not is searched, and if so, the data element is deleted from the filter. The pipeline parallel module may pass through multiple pipeline stages when performing a destination operation on a data element. The pipeline hash module gives the index of the candidate storage bucket of the data element in the current pipeline stage when the data element reaches each pipeline stage.
Further, compared with the prior art, the algorithm provided by the embodiment of the invention eliminates the parallel conflicts and locking operations of traditional parallel solutions. It is highly friendly to multithreaded execution and to programmable hardware such as FPGAs and P4 chips, because it allows filter operations (including insertion, lookup, and deletion) to be allocated to multiple threads (or hardware pipeline stages), each maintaining one sub-filter. It also has a competitive advantage in space utilization, because it allows each element to explore more candidate buckets at insertion time, i.e., an element is less likely to be evicted or to cause space overflow. In addition, the hash algorithm uses addition and subtraction instead of exclusive-or, which effectively removes the limitation that the table length in a cuckoo filter must be a power of 2.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above embodiments are merely examples given for clarity of illustration and are not limiting. Other variations and modifications will be apparent to those of ordinary skill in the art in light of the foregoing description; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom are within the scope of the present invention.

Claims (10)

1. A space-time efficient parallel approximate member query method, comprising:
acquiring a data set to be operated; the operations include an insert operation, a find operation, and a delete operation;
dividing data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage;
Each pipeline stage sequentially and continuously processes data elements waiting for operation;
When the operation of each pipeline stage on the data element is completed, judging whether an operation result is returned or not;
if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails and the complete pipeline stage is traversed or the preset kick-out operation times are reached;
if not, the data element fails to operate in the current pipeline stage, and enters the next pipeline stage to operate.
2. The space-time efficient parallel approximate member query method of claim 1, wherein in each pipeline stage, data elements waiting for operation are processed successively, the sources of the data elements include:
assigned to a data element of the current pipeline stage awaiting the first operation, or to a data element of the previous pipeline stage failing to operate.
3. The space-time efficient parallel approximate member query method of claim 1, wherein the specific step of performing the inserting operation comprises:
Acquiring a data element to be inserted, and calculating to obtain a fingerprint of the data element to be inserted;
Combining the fingerprint of the data element to be inserted, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be inserted in the current pipeline stage;
judging whether an empty storage tank exists in the candidate storage bucket; if yes, storing the fingerprints of the data elements to be inserted into an empty storage tank, and returning a successful insertion result; if not, randomly selecting one storage tank in the candidate storage barrel to kick out;
Judging whether the preset kicking operation times are reached or not; if yes, discarding the insertion and returning a result of the insertion failure; if not, the fingerprint of the data element to be inserted enters the next pipeline stage to continue the insertion operation until an empty storage tank appears or the maximum kick-out operation times are reached.
4. The space-time efficient parallel approximate member query method of claim 3, wherein the kick-out operation comprises: randomly selecting a storage tank from the candidate storage bucket, and exchanging the value of the storage tank with the fingerprint of the data element to be inserted.
5. The space-time efficient parallel approximate member query method of claim 1, wherein the specific steps of performing the lookup operation comprise:
Acquiring a data element to be searched, and calculating to obtain a fingerprint of the data element to be searched;
Combining the fingerprint of the data element to be searched, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be searched in the current pipeline stage;
Judging whether the value of a storage tank in a candidate storage bucket of the current pipeline stage is equal to the fingerprint value of the data element to be searched; if yes, returning a successful searching result;
if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a searching failure result;
If not, entering the next pipeline stage to continue the searching operation.
6. The space-time efficient parallel approximate member query method of claim 1, wherein the specific step of performing the delete operation comprises:
acquiring a data element to be deleted, and calculating to obtain a fingerprint of the data element to be deleted;
Combining the fingerprint of the data element to be deleted, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be deleted in the current pipeline stage;
Judging whether the value of a storage groove in the candidate storage bucket is equal to the fingerprint value of the data element to be deleted; if yes, executing the deleting operation, setting the corresponding storage groove to be empty, and returning a successful deleting result;
if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, this indicates that the data element to be deleted is not stored in the filter, and a deletion failure result is returned;
If not, entering the next pipeline stage to continue deleting operation.
7. The space-time efficient parallel approximate member query method of claim 1, further comprising initializing a filter according to user requirements before acquiring the data set to be operated, and specifically comprising:
initializing the number of sub-filters of the filter;
Initializing each sub-filter; each sub-filter comprises a plurality of buckets, each bucket comprising a plurality of storage slots; each sub-filter provides a candidate bucket for each data element;
Correspondingly distributing the sub-filters to pipeline stages; each sub-filter corresponds to a pipeline stage for pipelined parallel operation.
8. The space-time efficient parallel approximate member query method of claim 1, wherein in the process of operating on data elements, indexes of candidate buckets are also calculated through pipeline hash, and the method specifically comprises the following steps:
Calculating candidate bucket indexes of the next pipeline stage according to the bucket index sequence of the current pipeline stage;
candidate bucket indices for the initial pipeline stage are calculated based on the bucket index for the last pipeline stage.
9. A space-time efficient parallel approximate member query system, applied to the space-time efficient parallel approximate member query method of any of the above claims 1-8, characterized by comprising an initialization module, a pipeline hash module and a pipeline parallel module, wherein:
the initialization module is used for initializing the filter according to the user requirements;
The pipeline hash module is used for calculating indexes of candidate storage barrels of the pipeline stage through a pipeline hash method;
the pipeline parallel module is used for acquiring a data set to be operated; dividing data elements in the data set into a plurality of disjoint sub-data sets, wherein each sub-data set corresponds to a pipeline stage; each pipeline stage sequentially and continuously processes data elements waiting for operation; when the operation of each pipeline stage on the data element is completed, judging whether an operation result is returned or not; if yes, returning a result of successful operation, or returning a result of failed operation when the operation fails and the complete pipeline stage is traversed or the preset kick-out operation times are reached; if not, the data element fails to operate in the current pipeline stage, and enters the next pipeline stage to operate.
10. The space-time efficient parallel approximate member query system of claim 9, wherein the pipelined parallel module further comprises an insert operation module, a find operation module, and a delete operation module, wherein:
The inserting operation module is used for obtaining the data element to be inserted and calculating to obtain the fingerprint of the data element to be inserted; combining the fingerprint of the data element to be inserted, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be inserted in the current pipeline stage; judging whether an empty storage tank exists in the candidate storage bucket; if yes, storing the fingerprints of the data elements to be inserted into an empty storage tank, and returning a successful insertion result; if not, randomly selecting one storage tank in the candidate storage barrel to kick out; judging whether the preset kicking operation times are reached or not; if yes, discarding the insertion and returning a result of the insertion failure; if not, enabling the fingerprint of the data element to be inserted into the next pipeline stage to continue the insertion operation until an empty storage tank appears or the maximum kicking operation times are reached;
The searching operation module is used for acquiring the data element to be searched and calculating to obtain the fingerprint of the data element to be searched; combining the fingerprint of the data element to be searched, the index of the current pipeline stage and the index of the previous pipeline stage, and calculating to obtain the index of the candidate storage bucket of the data element to be searched in the current pipeline stage; judging whether the value of a storage tank in a candidate storage bucket of the current pipeline stage is equal to the fingerprint value of the data element to be searched; if yes, returning a successful searching result; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, returning a searching failure result; if not, entering the next pipeline stage to continue searching operation;
The deleting operation module is used for acquiring the data element to be deleted and calculating the fingerprint of the data element to be deleted; combining the fingerprint of the data element to be deleted, the index of the current pipeline stage and the index of the previous pipeline stage to calculate the index of the candidate storage bucket of the data element to be deleted in the current pipeline stage; judging whether the value of a storage tank in the candidate storage bucket is equal to the fingerprint value of the data element to be deleted; if yes, executing the deleting operation, setting the corresponding storage tank to empty, and returning a successful deletion result; if not, judging whether all pipeline stages have been searched; if all pipeline stages have been searched, this indicates that the data element to be deleted is not stored in the filter, and a deletion failure result is returned; if not, entering the next pipeline stage to continue the deleting operation.
CN202410293870.6A 2024-03-14 2024-03-14 Space-time efficient parallel approximate member query method and system Pending CN117891858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410293870.6A CN117891858A (en) 2024-03-14 2024-03-14 Space-time efficient parallel approximate member query method and system


Publications (1)

Publication Number Publication Date
CN117891858A true CN117891858A (en) 2024-04-16

Family

ID=90639712


Country Status (1)

Country Link
CN (1) CN117891858A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320654A (en) * 2014-05-28 2016-02-10 中国科学院深圳先进技术研究院 Dynamic bloom filter and element operating method based on same
US20180121482A1 (en) * 2016-10-28 2018-05-03 Microsoft Technology Licensing, Llc. Change monitoring spanning graph queries
CN110046164A (en) * 2019-04-16 2019-07-23 中国人民解放军国防科技大学 Index independent grain distribution filter, consistency grain distribution filter and operation method
CN111858651A (en) * 2020-09-22 2020-10-30 中国人民解放军国防科技大学 Data processing method and data processing device
CN112148928A (en) * 2020-09-18 2020-12-29 鹏城实验室 Cuckoo filter based on fingerprint family
CN113641681A (en) * 2021-10-13 2021-11-12 南京大数据集团有限公司 Space self-adaptive mass data query method
CN114706834A (en) * 2022-03-18 2022-07-05 中国人民解放军国防科技大学 High-efficiency dynamic set management method and system
CN115510092A (en) * 2022-09-27 2022-12-23 青海师范大学 Approximate member query optimization method based on cuckoo filter
CN116701440A (en) * 2023-06-15 2023-09-05 泉城省实验室 Cuckoo filter and data insertion, query and deletion method
CN117539408A (en) * 2024-01-09 2024-02-09 华中科技大学 Integrated index system for memory and calculation and key value pair memory system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG HANCHENG et al.: "A Survey of Filter Data Structures", Computer Science, 31 January 2024 (2024-01-31), pages 35 - 40 *
HUANG HAOTIAN et al.: "A Secondary Index for Approximate Membership Queries in Immutable Storage Environments", Journal of Guilin University of Electronic Technology, 31 October 2023 (2023-10-31), pages 345 - 354 *

Similar Documents

Publication Publication Date Title
Sha et al. Technical report: Accelerating dynamic graph analytics on gpus
US10831758B2 (en) Partitioning and repartitioning for data parallel operations
KR101367450B1 (en) Performing concurrent rehashing of a hash table for multithreaded applications
CN110083601B (en) Key value storage system-oriented index tree construction method and system
US7558802B2 (en) Information retrieving system
US7953778B2 (en) Efficient support of consistent cyclic search with read-copy update and parallel updates
Awad et al. Engineering a high-performance GPU B-Tree
US7805427B1 (en) Integrated search engine devices that support multi-way search trees having multi-column nodes
Khorasani et al. Stadium hashing: Scalable and flexible hashing on gpus
US11113316B2 (en) Localized data affinity system and hybrid method
CN106599091B (en) RDF graph structure storage and index method based on key value storage
Pandey et al. Vector quotient filters: Overcoming the time/space trade-off in filter design
Jiang et al. Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation
US9626428B2 (en) Apparatus and method for hash table access
CN117891858A (en) Space-time efficient parallel approximate member query method and system
US20220269675A1 (en) Hash-based data structure
Sun et al. A collision-mitigation cuckoo hashing scheme for large-scale storage systems
CN110245130A (en) Data duplicate removal method, device, computer equipment and storage medium
Kulikov et al. Efficient hardware-agnostic DBMs operator implementation using SYCL
Zhang et al. Eunomia: Scaling concurrent index structures under contention using HTM
Kurpicz et al. Scalable Distributed String Sorting
CN115905246B (en) KV caching method and device based on dynamic compression prefix tree
Gautam et al. NUMA-aware spatio-textual similarity join
Pandey et al. Beyond Bloom: A Tutorial on Future Feature-Rich Filters.
Berney Parallel Cache-Efficient Algorithms on GPUs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination