CN113360516B - Collection member management method - Google Patents
- Publication number
- CN113360516B (application No. CN202110920389.1A)
- Authority
- CN
- China
- Prior art keywords
- fingerprint information
- bucket
- value
- relocation
- set member
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Abstract
The present invention relates to set member management techniques. It provides a set member management method based on first-in first-out and minimum active number strategies, comprising a set member insertion method, a set member judgment method, and a set member deletion method, each based on those strategies. The invention addresses three problems of current set member management methods: the unbalanced load and the extra time spent traversing a bucket for blank slots, both caused by the random selection strategy used at insertion; the large time overhead of traversal queries during set member judgment; and the low efficiency of data compaction during set member deletion. The method is suitable for set member management.
Description
Technical Field
The invention relates to computer information representation and retrieval technology, and in particular to a set member management method based on first-in first-out and minimum active number strategies.
Background
Efficient representation of data sets and accurate judgment of set membership are two core problems of the current data era. As data sets grow, storing the data in complete form makes both the storage overhead and the time spent on member management (judging, updating, and deleting set members) grow explosively, to a point that is unacceptable. Applications therefore need a representation with small storage overhead and small time overhead, that is, fast insertion, fast judgment, and fast update and deletion of members. Current techniques for representing data sets and judging set membership mainly take the following three forms: 1) Techniques based on the Bloom filter and its variants. These are static implementations that identify the existence of a set member by a Boolean value. Time and space efficiency are guaranteed, but deletion of set members is not supported, so dynamic data updates cannot be handled. 2) Techniques that scale the set size dynamically through a data structure such as a linked list in order to improve space utilization. When set members are deleted, however, and especially when the data is updated at high frequency, the deletions greatly increase the error rate of membership judgment and may even invalidate the whole linked list.
3) Techniques based on the cuckoo filter, which compute the fingerprint information of a set member with two specific hash functions, store that fingerprint information in a table to identify the member, and judge membership by matching fingerprint information. In the conventional cuckoo filter and its existing variants, insertion basically follows a random selection strategy. For example, if the filter finds by traversal that neither candidate bucket is full, it selects one of them at random; the selected bucket may be nearly full while the other is empty, so the traversal costs time, the insertion load becomes unbalanced, and the number of relocation operations increases. As another example, when a set member is to be placed in an empty slot of a candidate bucket, the filter traverses the bucket to locate its empty slots and then randomly selects one of them; every new member traverses the candidate bucket once on insertion, which costs considerable time.
The random selection strategy of the conventional cuckoo filter therefore incurs a large time overhead when the data changes highly dynamically. Some prior art first counts the fingerprints in the two candidate buckets and then selects the bucket holding fewer fingerprints as the insertion bucket, which balances the insertion load; but when both candidate buckets are full this algorithm fails. Moreover, when relocation occurs, and especially as the number of relocations grows and the storage space fills up, a set member that has just been inserted may be kicked out by the cuckoo filter's random kick-out mechanism, and after being kicked out twice it may fail to find an insertion position. In summary, existing set member management methods do not adequately solve the time overhead caused by random selection and random insertion during insertion, nor the problem of cyclic relocation of set members.
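The random selection behaviour criticised above can be illustrated with a minimal sketch of a conventional cuckoo filter. This is not code from any cited application: the class name `CuckooFilter`, the SHA-256 based hashing, the 16-bit fingerprint, and the parameters `num_buckets`, `slots`, and `max_kicks` are all assumptions chosen for illustration.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash derived from SHA-256 (an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Conventional cuckoo filter with random bucket and victim selection."""

    def __init__(self, num_buckets=64, slots=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR mapping an involution.
        assert num_buckets & (num_buckets - 1) == 0
        self.n, self.slots, self.max_kicks = num_buckets, slots, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _alt(self, i, fp):
        """Other candidate bucket of fingerprint fp currently in bucket i."""
        return (i ^ _h(fp.to_bytes(2, "big"))) % self.n

    def _candidates(self, item):
        fp = _h(b"fp" + item.encode()) & 0xFFFF
        i1 = _h(item.encode()) % self.n
        return fp, i1, self._alt(i1, fp)

    def insert(self, item):
        fp, i1, i2 = self._candidates(item)
        open_buckets = [i for i in (i1, i2) if len(self.buckets[i]) < self.slots]
        if open_buckets:
            # Random choice between non-full buckets ignores their load.
            self.buckets[self.rng.choice(open_buckets)].append(fp)
            return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Random victim: a fingerprint inserted a moment ago may be kicked.
            victim = self.rng.randrange(self.slots)
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        return False  # the last evicted fingerprint is dropped

    def contains(self, item):
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]
```

Both random choices in `insert` are the points at which the load imbalance and the repeated kick-outs described above can arise.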
For example, patent application No. 201510982653.9, entitled "an efficient dynamic data set member management method", discloses a method comprising a member insertion method, a member determination method, a member deletion method, and a data set compaction method. It adapts to changes in the size of a dynamic set through a dynamic cuckoo filter and judges set membership by matching stored member fingerprint information. Its core technique is to chain tables together in a form similar to a linked list: when relocation occurs, if the fingerprint information being relocated has already been relocated twice within a table, that is, it cannot be stored in that table, it is stored at the same position in a continuation table found by following a pointer. In other words, table continuation solves the problem of fingerprint information finding no storage position during relocation, and the insertion, determination, deletion, and compaction methods all operate on the chained tables. The disadvantages are as follows. First, when fingerprint information is relocated, the method cannot tell whether that fingerprint information has already been relocated repeatedly. Second, the relocation algorithm only inserts the kicked-out fingerprint information into a new table, without checking whether the other candidate bucket in the original table has an empty slot; this is efficient in extreme cases but wastes space when the original table still has empty slots.
In addition, the insertion algorithm of that application uses a random selection strategy, which causes a surge in relocation operations and unbalanced insertion. Furthermore, the data set compaction method is embedded in the deletion operation, and when compacting it knows neither the number of set members nor the number of continuation tables, so it traverses the set members one by one, which is inefficient.
As another example, patent application No. 201910419541.0, entitled "method and system for representing approximate data set based on insertion position selection", belongs to the field of computer information representation. The method maintains a relocation count for each hash bucket in the cuckoo filter and, for a member X to be inserted, performs the following steps: obtain its fingerprint information ξ_X and its two candidate hash buckets; if ξ_X is already stored in one of the two candidate hash buckets, the insertion operation ends, otherwise check whether the buckets have empty slots; if only one of them has an empty slot, insert ξ_X into that bucket; if neither has an empty slot, insert ξ_X by a relocation operation into the candidate hash bucket with the smaller relocation count and update that bucket's relocation count; if both have empty slots, insert ξ_X into the candidate hash bucket with the smaller global insertion number. The application thus supports both deletion of set members and efficient insertion. In other words, on insertion the algorithm compares the fingerprint counts of the two candidate buckets and uses the bucket with fewer fingerprints as the insertion bucket, which balances the insertion load and abandons random selection of the bucket.
Likewise, when a relocation operation occurs, the bucket with the smaller relocation count, obtained from the per-bucket relocation counts, is selected as the insertion bucket. The disadvantages are as follows. First, the load-balancing count proposed in that application works well in general, but once both candidate buckets are full the relocation count used for load balancing becomes completely ineffective: one bucket ends up unused while the other takes on all the insertion work. Second, although the algorithm abandons random selection when choosing the candidate bucket, it still selects the slot within the bucket at random, which requires one more traversal of the bucket to find the empty slots. The whole insertion thus traverses the bucket twice, once to check whether the fingerprint information already exists and once to find an empty slot, and the second traversal can in practice be avoided to reduce the time overhead. In the extreme case where the filter has few vacancies, fingerprint information that has just been inserted may be randomly selected as the kicked-out fingerprint, and if this happens twice it becomes an orphan fingerprint. Because the algorithm cannot identify whether a fingerprint has already been relocated, a fingerprint that has just been relocated can be kicked out again and the operation repeats, which that application does not consider.
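The bucket selection rules attributed to application No. 201910419541.0 above can be sketched as a small decision function. The names `Bucket`, `relocation_count`, `insert_count`, and `choose_action` are hypothetical, ties are broken in favour of the first bucket as an assumption, and `insert_count` stands in for the "global insertion number", whose exact definition is not given in this text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    capacity: int
    fingerprints: List[int] = field(default_factory=list)
    relocation_count: int = 0   # per-bucket relocation count
    insert_count: int = 0       # stand-in for the "global insertion number"

    def has_empty_slot(self) -> bool:
        return len(self.fingerprints) < self.capacity


def choose_action(fp: int, b1: Bucket, b2: Bucket) -> Tuple[str, Optional[Bucket]]:
    """Return ('done' | 'insert' | 'relocate', target bucket or None)."""
    if fp in b1.fingerprints or fp in b2.fingerprints:
        return "done", None                      # already stored
    e1, e2 = b1.has_empty_slot(), b2.has_empty_slot()
    if e1 and e2:                                # both have room
        return "insert", b1 if b1.insert_count <= b2.insert_count else b2
    if e1 or e2:                                 # exactly one has room
        return "insert", b1 if e1 else b2
    # Neither has room: relocate into the bucket with fewer relocations.
    return "relocate", b1 if b1.relocation_count <= b2.relocation_count else b2
```

The function only selects the bucket. Choosing the slot inside it, which the text above says is still done at random, is left out.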
It can be seen that prior-art set member management methods based on the cuckoo filter basically follow a random selection strategy, which leads to low efficiency and to unbalanced insertion of fingerprint information across candidate buckets. In addition, most conventional cuckoo filters cannot tell, when a relocation occurs, whether a set member has already been relocated, so one member may be relocated repeatedly and fall into cyclic relocation that consumes a large amount of time. The member relocation algorithm also follows a random selection strategy, that is, a member is randomly selected from the candidate bucket and kicked out to its other candidate bucket; a member that has only just been inserted may therefore be selected and kicked out twice and become an orphan member.
Disclosure of Invention
The present invention aims to solve the above problems in the prior art and provides a set member management method.
To solve the problems of unbalanced load and of the extra time spent traversing a bucket for blank slots, both caused by the random selection strategy at insertion, the invention adopts the following technical scheme: a set member management method comprising a set member insertion method, the set member insertion method comprising the following steps:
Step 2, during insertion, using the set member judgment method to judge whether the fingerprint information tag of the member to be inserted already exists in the corresponding hash candidate buckets; if so, ending the insertion operation; if not, selecting, of the two hash candidate buckets, the one with the smaller storage area active count value for the insertion. During the insertion, the slot to insert into is calculated from the insertion bit address information of that hash candidate bucket; if that slot is not blank, whether the bucket still has a blank slot is judged from its storage area abnormal blank slot count; if it has none, a relocation operation is entered, otherwise the tag is inserted at random into one of the blank slots.
Further, the insertion is specified in more detail as follows:
In step 1, the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step 2 comprises the following specific steps:
Step 201, during insertion, obtaining by hash function calculation the fingerprint information of the set member ξ_X to be inserted and the two corresponding hash candidate buckets;
Step 202, using the set member judgment method to judge whether the corresponding fingerprint information tag exists in the two hash candidate buckets; if so, ending the insertion operation, otherwise entering step 203;
Step 203, reading the storage area active count values ACI of the two hash candidate buckets, comparing them, and marking the hash candidate bucket with the smaller active count value as the insertion bucket minWB;
Step 204, defining an intermediate variable TINS, reading the INS value in the recording area of minWB, and calculating TINS from it; if the slot indicated by TINS is a blank slot, entering step 205, otherwise entering step 206, where L0 denotes the start address of the hash candidate bucket minWB and m denotes the maximum number of fingerprint information tags the storage area can hold;
Step 205, inserting the fingerprint information tag, consisting of the fingerprint information and a relocation flag value of 0, into the blank slot indicated by TINS, and updating the recording area of minWB;
Step 206, judging whether the storage area abnormal blank slot count ERI in the recording area of minWB is 0; if so, entering the relocation operation, otherwise entering step 207;
Step 207, inserting the fingerprint information tag, consisting of the fingerprint information and a relocation flag value of 0, at random into one of the abnormal blank slots of minWB, and updating the recording area of minWB.
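The branching of steps 203 to 207 can be summarised in a short sketch. It covers only the decisions: the formula that derives TINS from INS is not reproduced in this text, so the check "is the slot at TINS blank" is passed in as a caller-supplied predicate. The names `Record`, `plan_insert`, and the action labels are hypothetical, and breaking an ACI tie in favour of the first bucket is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Record:
    """Subset of a bucket's recording area used by steps 203 to 207."""
    ACI: int  # storage area active count value
    ERI: int  # storage area abnormal blank slot count


def plan_insert(rec_a: Record, rec_b: Record,
                tins_is_blank: Callable[[Record], bool]) -> Tuple[Record, str]:
    """Pick the insertion bucket and the action to take in it."""
    # Step 203: the bucket with the smaller active count becomes minWB.
    min_wb = rec_a if rec_a.ACI <= rec_b.ACI else rec_b
    # Step 204: is the slot derived from INS blank?
    if tins_is_blank(min_wb):
        return min_wb, "insert_at_TINS"            # step 205
    # Step 206: no abnormal blank slot left means relocation.
    if min_wb.ERI == 0:
        return min_wb, "relocate"
    return min_wb, "insert_at_random_blank"        # step 207
```

Because the bucket is chosen from ACI alone and the slot from INS alone, the common case needs no traversal of the bucket to find a free position.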
In particular, in step 201, obtaining by hash function calculation the fingerprint information of the set member ξ_X to be inserted and the two corresponding hash candidate buckets specifically comprises the following steps:
Further, when a relocation operation occurs, the fingerprint information that has been stored the longest is selected as the one to be kicked out and replaced, which avoids the cyclic relocation of fingerprint information caused by a random selection strategy. In step 206, the relocation operation comprises the following steps:
Step 206A, assigning the fingerprint information tag to be inserted into minWB to an intermediate variable Temp, and assigning the fingerprint information tag in the slot pointed to by EXC to another intermediate variable Temp1;
Step 206B, setting the relocation flag value of the fingerprint information tag in Temp to 1 and inserting it into the slot pointed to by EXC;
Step 206D, judging whether the relocation flag value of the fingerprint information tag in Temp1 is 1; if so, entering step 206E, otherwise entering step 206G;
Step 206E, calculating the intermediate variable addf, then setting the relocation flag value of the fingerprint information tag in Temp1 to 2 and inserting it into the empty slot pointed to by addf in the cache area of minWB;
Step 206G, calculating the other hash candidate bucket of the fingerprint information in Temp1 and marking it as BUTemp1;
Step 206H, reading the INS value in the recording area of BUTemp1 and calculating TINS from it;
Step 206I, judging whether the slot indicated by TINS is a blank slot; if so, entering step 206L, otherwise entering step 206J;
Step 206J, reading the ERI value in the recording area of BUTemp1 and judging whether it is 0; if so, taking Temp1 as the fingerprint information tag to be inserted and returning to step 206A, otherwise entering step 206K;
Step 206K, setting the relocation flag value of the fingerprint information tag in Temp1 to 1, inserting it at random into one of the abnormal blank slots of BUTemp1, setting ERI = ERI-1 and ACI = ACI+1 in the recording area of BUTemp1, and ending the relocation operation;
Step 206L, setting the relocation flag value of the fingerprint information tag in Temp1 to 1, inserting it into the blank slot indicated by TINS, and updating the recording area of BUTemp1, including ACI = ACI+1; the relocation operation ends.
To solve the problem of the large time overhead of traversal queries during set member judgment, the invention adopts the following technical scheme: a set member management method comprising a set member judgment method, the set member judgment method comprising the following steps:
Step I, pre-establishing a cuckoo filter for storing the fingerprint information of set members, and uniformly dividing it into a storage area and a cache area. The storage area stores fingerprint information whose relocation flag value is 0 or 1, and the cache area stores fingerprint information whose relocation flag value is 2. A relocation flag value of 0 means the fingerprint information of the set member has not been relocated, a value of 1 means it has been relocated once, and a value of 2 means it has been relocated twice. A recording area is also maintained for each hash candidate bucket in the cuckoo filter; it holds the storage area active count value, the storage area abnormal blank slot count, the cache area fingerprint count value, the exchange bit address information, and the insertion bit address information. In each hash candidate bucket, the unit stored in a slot is a fingerprint information tag made up of two elements, the fingerprint information of the set member and the relocation flag value;
Step II, comparing the storage area active count values of the two hash candidate buckets; first traversing the hash candidate bucket with the smaller storage area active count value to query whether it holds the fingerprint information tag of the set member to be judged; if so, the query succeeds; otherwise traversing the other hash candidate bucket in the same way; if the tag is found there the query succeeds, otherwise the query fails.
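Steps I and II can be pictured with a small data-structure sketch. The field names ACI, ERI, HFN, EXC, and INS follow the text; the class name `Bucket`, the `cache_size` parameter, the list-based slot layout (storage area first, cache area after it), and the plain linear scan in `holds` are assumptions made to keep the example short. The scan stands in for the per-bucket traversal and is not the reverse step described later.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Tag = Tuple[int, int]  # (fingerprint, relocation flag value 0, 1 or 2)


@dataclass
class Bucket:
    m: int            # capacity of the storage area (flags 0 and 1)
    cache_size: int   # capacity of the cache area (flag 2), assumed fixed
    ACI: int = 0      # storage area active count value
    ERI: int = 0      # storage area abnormal blank slot count
    HFN: int = 0      # cache area fingerprint count value
    EXC: int = 0      # exchange bit address information
    INS: int = 0      # insertion bit address information
    slots: List[Optional[Tag]] = field(default_factory=list)

    def __post_init__(self):
        # Slots 0..m-1 form the storage area, the rest the cache area.
        self.slots = [None] * (self.m + self.cache_size)

    def holds(self, fp: int) -> bool:
        return any(tag is not None and tag[0] == fp for tag in self.slots)


def query(fp: int, b1: Bucket, b2: Bucket) -> Tuple[bool, int]:
    """Step II: probe the bucket with the smaller ACI first.

    Returns (found, number of buckets probed).
    """
    first, second = (b1, b2) if b1.ACI <= b2.ACI else (b2, b1)
    for probes, bucket in enumerate((first, second), start=1):
        if bucket.holds(fp):
            return True, probes
    return False, 2
```

A call such as `query(0xAB, b1, b2)` returns how many buckets were probed alongside the result, which makes the effect of the ACI ordering visible.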
Further, the judgment is specified in more detail as follows:
In step I, the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step II comprises the following specific steps:
Step II.1, during judgment, obtaining by hash function calculation the fingerprint information of the set member ξ_Y to be judged and the two corresponding hash candidate buckets;
Step II.2, reading the storage area active count values ACI of the two hash candidate buckets, comparing them, and marking the hash candidate bucket with the smaller active count value as minWB and the other hash candidate bucket as maxWB;
Step II.3, performing the reverse step on minWB to query by traversal whether the fingerprint information tag is in minWB; if so, the query succeeds and the query operation ends, otherwise entering step II.4;
Step II.4, performing the reverse step on maxWB to query by traversal whether the fingerprint information tag is in maxWB; if so, the query succeeds and the query operation ends; otherwise the query fails and the query operation ends.
Specifically, in step II.1, obtaining by hash function calculation the fingerprint information of the set member ξ_Y to be judged and the two corresponding hash candidate buckets specifically comprises the following steps:
Further, the reverse step used in step II.3 and step II.4 comprises the following specific steps:
Step III.1, reading the EXC value in the recording area of the hash candidate bucket on which the reverse step operates, and assigning the EXC value to an intermediate variable Temp;
Step III.2, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp; if they match, returning query success and exiting the reverse step, otherwise entering step III.3;
Step III.3, reading the INS value in the recording area of the hash candidate bucket on which the reverse step operates, and judging whether Temp and the INS value are the same; if so, entering step III.4, otherwise entering step III.9;
Step III.4, reading the HFN value in the recording area of the hash candidate bucket on which the reverse step operates, and judging whether the HFN value equals 0; if so, the query fails and the reverse step is exited, otherwise entering step III.5;
Step III.5, assigning the HFN value to an intermediate variable Temp1 and calculating another intermediate variable Temp2 = L0 + m + HFN - 1, where L0 denotes the start address of the hash candidate bucket on which the reverse step operates and m denotes the maximum number of fingerprint information tags the storage area can hold;
Step III.6, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp2; if they match, returning query success and exiting the reverse step, otherwise entering step III.7;
Step III.7, letting Temp1 = Temp1 - 1 and judging whether Temp1 equals 0; if so, the query fails and the reverse step is exited, otherwise entering step III.8;
Step III.8, letting Temp2 = Temp2 - 1 and returning to step III.6;
Step III.9, letting Temp = L0 + {m + (Temp - L0) - 1} mod m and returning to step III.2.
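Steps III.1 to III.9 translate almost line by line into code. The sketch below assumes a flat list `mem` indexed by slot address, in which a bucket occupies addresses L0 to L0+m-1 (storage area) followed by its cache area, and in which each slot holds either `None` or a `(fingerprint, relocation_flag)` tuple. The function name `reverse_lookup` and this memory model are illustrative assumptions; the control flow and both address formulas are taken from the steps above.

```python
from typing import List, Optional, Tuple

Tag = Tuple[int, int]  # (fingerprint, relocation flag value)


def reverse_lookup(mem: List[Optional[Tag]], L0: int, m: int,
                   EXC: int, INS: int, HFN: int, fp: int) -> bool:
    """Search one hash candidate bucket for fingerprint fp."""
    temp = EXC                                        # III.1
    while True:
        tag = mem[temp]                               # III.2
        if tag is not None and tag[0] == fp:
            return True
        if temp == INS:                               # III.3
            break
        # III.9: step one slot backwards, wrapping inside the storage area.
        temp = L0 + (m + (temp - L0) - 1) % m

    if HFN == 0:                                      # III.4: cache is empty
        return False
    temp1 = HFN                                       # III.5
    temp2 = L0 + m + HFN - 1                          # last used cache slot
    while True:
        tag = mem[temp2]                              # III.6
        if tag is not None and tag[0] == fp:
            return True
        temp1 -= 1                                    # III.7
        if temp1 == 0:
            return False
        temp2 -= 1                                    # III.8
```

The first loop visits only the slots between EXC and INS, walking backwards with wrap-around, and the second loop visits only the HFN occupied cache slots, so blank parts of the bucket are never scanned. The first loop terminates provided INS lies inside the storage area, which the sketch assumes.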
To solve the problem of the low efficiency of data compaction during set member deletion, the invention adopts the following technical scheme: a set member management method comprising a set member deletion method, the set member deletion method comprising the following steps:
Step A, pre-establishing a cuckoo filter for storing the fingerprint information of set members, and uniformly dividing it into a storage area and a cache area. The storage area stores fingerprint information whose relocation flag value is 0 or 1, and the cache area stores fingerprint information whose relocation flag value is 2. A relocation flag value of 0 means the fingerprint information of the set member has not been relocated, a value of 1 means it has been relocated once, and a value of 2 means it has been relocated twice. A recording area is also maintained for each hash candidate bucket in the cuckoo filter; it holds the storage area active count value, the storage area abnormal blank slot count, the cache area fingerprint count value, the exchange bit address information, and the insertion bit address information. In each hash candidate bucket, the unit stored in a slot is a fingerprint information tag made up of two elements, the fingerprint information of the set member and the relocation flag value;
Step B, during deletion, using the set member judgment method to judge whether the fingerprint information tag of the set member to be deleted exists in the two corresponding hash candidate buckets; if not, the deletion fails; if so, deleting the fingerprint information tag and judging, from the cache area fingerprint count value of that hash candidate bucket, whether its cache area holds any set members; if it holds none, the deletion is complete; if it does, migrating the first set member in the cache area to the blank slot left by the deleted fingerprint information, and ending the deletion operation.
Further, the deletion is specified in more detail as follows:
In step A, the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step B comprises the following specific steps:
step B1, during deletion, obtaining, through hash function calculation, the fingerprint information of the set member ξ_Z to be deleted and the two corresponding hash candidate buckets;
step B2, determining, by the set member determination method, whether the fingerprint information tag to which the fingerprint information belongs is stored in the storage area of either of the two hash candidate buckets; if so, go to step B3, otherwise go to step B15;
step B3, marking the hash candidate bucket that stores the fingerprint information tag to which the fingerprint information belongs as bucket_Z, then reading the HFN value in the recording area of bucket_Z and judging whether the HFN value is equal to 0; if so, go to step B7, otherwise go to step B4;
step B4, calculating the intermediate variable addfDEL = L0 + m + HFN - 1, and then assigning the fingerprint information tag stored in the slot pointed to by addfDEL to another intermediate variable Temp2, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z, and m represents the maximum number of fingerprint information tags that the storage area can hold;
step B5, deleting the fingerprint information tag stored in the slot pointed to by addfDEL, setting the relocation flag value of the fingerprint information tag in Temp2 to 1, and inserting it so that it overwrites the fingerprint information tag to which the fingerprint information belongs;
step B6, letting ACI = ACI - 1 and HFN = HFN - 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends;
step B8, reading the INS value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to INS; if so, go to step B12, otherwise go to step B9;
step B9, reading the EXC value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to EXC; if so, go to step B10, otherwise go to step B11;
step B10, letting ACI = ACI - 1 and EXC = L0 + {m + (EXC - L0) - 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends;
step B11, letting ACI = ACI - 1 and ERI = ERI + 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends;
step B12, reading the EXC value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to EXC; if so, go to step B13, otherwise go to step B14;
step B13, letting ACI = ACI - 1, EXC = L0 + {m + (EXC - L0) - 1} mod m and INS = L0 + {m + (INS - L0) + 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends;
step B14, letting ACI = ACI - 1 and INS = L0 + {m + (INS - L0) + 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends;
step B15, determining, by the set member determination method, whether the fingerprint information tag to which the fingerprint information belongs is stored in the cache area of either of the two hash candidate buckets; if so, go to step B16, otherwise the deletion fails and the deletion operation ends;
step B16, marking the hash candidate bucket whose cache area stores the fingerprint information tag to which the fingerprint information belongs as bucket_Z, calculating the intermediate variable addfDEL = L0 + m + HFN - 1, then overwriting the fingerprint information tag to which the fingerprint information belongs with the fingerprint information tag stored in the slot pointed to by addfDEL, and at the same time setting the slot pointed to by addfDEL to empty, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z, HFN is the HFN value of the corresponding hash candidate bucket bucket_Z, and m represents the maximum number of fingerprint information tags that the storage area can hold;
step B17, letting ACI = ACI - 1 and HFN = HFN - 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Specifically, in step B1, the fingerprint information of the set member ξ_Z to be deleted and the two corresponding hash candidate buckets are obtained through hash function calculation as follows:
The beneficial effects of the invention are as follows. In the set member insertion method of the set member management method, the candidate bucket that stores the fingerprint information of a set member is selected by an adaptive load balancing strategy based on the minimum active number. This overcomes the failure of insertion load balancing algorithms in the prior art when candidate buckets are fully loaded, and keeps the number of fingerprints in the candidate buckets close to equal under different conditions, so that the fingerprint load is balanced among the candidate buckets. At the same time, the random selection strategy is abandoned in favor of a first-in first-out strategy: when fingerprint information of set members is inserted into a candidate bucket, it is arranged in the bucket in insertion order, and, as the relocation operation steps show, when a relocation operation occurs the fingerprint information that has been in the bucket the longest is selected as the fingerprint information to be kicked out. This effectively reduces repeated relocation of the fingerprint information of set members and improves the insertion efficiency.
By adopting a structure of three areas and one tag, the number of orphan fingerprints is kept as small as possible, and the number of times each piece of fingerprint information has been relocated is known (not relocated: relocation flag value 0; relocated once: relocation flag value 1; relocated twice: relocation flag value 2). Relocation operations are therefore never directed at set members in the cache area, and set members in the cache area cannot be relocated, which avoids the cyclic relocation that arises when the number of relocations of a piece of fingerprint information cannot be identified, and improves operating efficiency. Building on this mechanism, a data compaction technique is provided: set members in the cache area are moved into unfilled slots of the storage area, which avoids wasting storage resources in the cache area, improves the use of storage resources in the storage area, and greatly improves space utilization.
In the set member determination method and the set member deletion method of the set member management method, the bucket storage area is traversed by a member determination method based on the first-in first-out strategy, so that the time complexity equals that of sequential traversal when the bucket is full, and the time overhead is smaller than that of sequential traversal when the bucket is not full. The number of queries is reduced as far as possible and the traversal query efficiency is improved.
Drawings
Fig. 1 is a processing flowchart of insertion in the set member management method according to the embodiment of the present invention.
Fig. 2 is a processing flowchart of the relocation operation in the set member management method according to the embodiment of the present invention.
Fig. 3 is a processing flowchart of determination in the set member management method according to the embodiment of the present invention.
Fig. 4 is a processing flowchart of the reverse step in the set member management method according to the embodiment of the present invention.
Fig. 5 is a processing flowchart of deletion in the set member management method according to the embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the embodiments and the accompanying drawings.
To solve the problems of unbalanced load and of the extra time overhead of traversing a bucket for blank slots, both caused by the random selection strategy used during insertion in existing set member management methods, the invention adopts the following technical scheme:
a collection member management method comprising a collection member insertion method, the collection member insertion method comprising the steps of:
step 1, pre-establishing a cuckoo filter for storing fingerprint information of set members, and uniformly dividing each hash candidate bucket into a storage area and a cache area, wherein the storage area stores fingerprint information whose relocation flag value is 0 or 1 and the cache area stores fingerprint information whose relocation flag value is 2; a relocation flag value of 0 means that the fingerprint information of the set member has not undergone a relocation operation, a relocation flag value of 1 means that it has undergone one relocation operation, and a relocation flag value of 2 means that it has undergone two relocation operations; at the same time, maintaining a recording area in each hash candidate bucket of the cuckoo filter, the recording area of each hash candidate bucket comprising a storage area active number count value, a storage area abnormal blank slot count, a cache area fingerprint count value, exchange bit address information and insertion bit address information; for each hash candidate bucket, the unit stored in a slot is a fingerprint information tag formed by two elements, namely the fingerprint information of the set member and the relocation flag value.
For convenience of subsequent description and computer operation, in this step the storage area active number count value may be recorded as ACI, the storage area abnormal blank slot count as ERI, the cache area fingerprint count value as HFN, the exchange bit address information as EXC, and the insertion bit address information as INS.
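The bucket layout described in step 1 can be pictured with a small data structure. The following Python sketch is only an illustration: the class name `Bucket`, the parameter `cache_size`, the helper `prev_in_ring`, and the choice to start both pointers at L0 with all slots blank are assumptions, since the text does not state initial values or the size of the cache area.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A fingerprint information tag: (fingerprint, relocation flag value 0/1/2).
Tag = Tuple[int, int]


@dataclass
class Bucket:
    """One hash candidate bucket: storage area, cache area and recording area."""
    m: int                      # slots in the storage area (flag values 0 and 1)
    cache_size: int             # slots in the cache area (flag value 2); hypothetical
    L0: int = 0                 # start address of the bucket
    # Recording area fields, named as in the text.
    ACI: int = 0                # storage area active number count value
    ERI: int = 0                # storage area abnormal blank slot count
    HFN: int = 0                # cache area fingerprint count value
    EXC: Optional[int] = None   # exchange bit address information
    INS: Optional[int] = None   # insertion bit address information
    slots: List[Optional[Tag]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: every slot starts blank and unset pointers start at L0.
        if not self.slots:
            self.slots = [None] * (self.m + self.cache_size)
        if self.EXC is None:
            self.EXC = self.L0
        if self.INS is None:
            self.INS = self.L0

    def prev_in_ring(self, addr: int) -> int:
        """Return L0 + {m + (addr - L0) - 1} mod m, the ring step used in the text."""
        return self.L0 + (self.m + (addr - self.L0) - 1) % self.m
```

Slot addresses are absolute in this sketch, so the slot at address `a` lives at `slots[a - L0]`, and the cache area occupies addresses `L0 + m` onward.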
Step 2, during insertion, judging by the set member determination method whether the fingerprint information tag of the member to be inserted already exists in the corresponding hash candidate buckets; if so, the insertion operation ends; if not, selecting, of the two hash candidate buckets, the one with the smaller storage area active number count value for the insertion operation; during the insertion operation, the target slot is calculated from the insertion bit address information of that hash candidate bucket and the tag is inserted there; if the target slot is not a blank slot, judging from the storage area abnormal blank slot count of that hash candidate bucket whether it still has a blank slot; if not, entering the relocation operation, otherwise inserting the tag into a randomly chosen blank slot.
To give the insertion steps in more detail, with the flowchart shown in fig. 1, step 2 may include the following specific steps:
Step 201, during insertion, obtaining, through hash function calculation, the fingerprint information of the set member ξ_X to be inserted and the two corresponding hash candidate buckets.
In this step, the fingerprint information of the set member ξ_X to be inserted and the two corresponding hash candidate buckets are obtained through hash function calculation as follows:
Step 202, judging by the set member determination method whether the corresponding fingerprint information tag exists in the two hash candidate buckets; if so, the insertion operation ends, otherwise go to step 203.
Step 203, reading the storage area active number count values ACI of the two hash candidate buckets, comparing them, and marking the hash candidate bucket with the smaller active number count value as the insertion bucket minWB.
Step 204, defining an intermediate variable TINS, reading the INS value in the recording area of minWB, and then calculating TINS = L0 + {m + (INS - L0) - 1} mod m; if the slot pointed to by TINS is a blank slot, go to step 205, otherwise go to step 206, wherein L0 represents the start address of the corresponding hash candidate bucket minWB, and m represents the maximum number of fingerprint information tags that the storage area can hold.
Step 205, inserting the fingerprint information tag consisting of the fingerprint information and a relocation flag value of 0 into the blank slot pointed to by TINS, and letting INS = L0 + {m + (INS - L0) - 1} mod m and ACI = ACI + 1 in the recording area of minWB; the insertion operation ends.
Step 206, judging whether the storage area abnormal blank slot count ERI in the recording area of minWB is 0; if so, entering the relocation operation steps, otherwise go to step 207.
Step 207, inserting the fingerprint information tag consisting of the fingerprint information and a relocation flag value of 0 into a randomly chosen abnormal blank slot of minWB, and letting ERI = ERI - 1 and ACI = ACI + 1 in the recording area of minWB; the insertion operation ends.
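Steps 202 to 207 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: buckets are plain dictionaries holding only a storage area, `new_bucket` and `insert` are hypothetical names, the membership check of step 202 is replaced by a naive scan, INS is assumed to start at L0, and the relocation operation is only signalled by a return value. The sketch also assumes that, once the slot at TINS is occupied, every blank slot left in the storage area is an abnormal blank slot.

```python
import random
from typing import Any, Dict


def new_bucket(m: int, L0: int = 0) -> Dict[str, Any]:
    """Hypothetical minimal bucket: storage area only, INS assumed to start at L0."""
    return {"L0": L0, "m": m, "ACI": 0, "ERI": 0, "INS": L0, "slots": [None] * m}


def insert(fp: int, b1: Dict[str, Any], b2: Dict[str, Any]) -> str:
    """Return 'exists', 'inserted', or 'relocate' (relocation itself is not shown)."""
    # Step 202 stand-in: a naive scan instead of the set member determination method.
    for b in (b1, b2):
        if any(t is not None and t[0] == fp for t in b["slots"]):
            return "exists"
    # Step 203: pick the bucket with the smaller active number count value.
    min_wb = b1 if b1["ACI"] <= b2["ACI"] else b2
    L0, m = min_wb["L0"], min_wb["m"]
    # Step 204: TINS = L0 + {m + (INS - L0) - 1} mod m.
    tins = L0 + (m + (min_wb["INS"] - L0) - 1) % m
    if min_wb["slots"][tins - L0] is None:
        # Step 205: insert with relocation flag value 0 and advance INS.
        min_wb["slots"][tins - L0] = (fp, 0)
        min_wb["INS"] = tins
        min_wb["ACI"] += 1
        return "inserted"
    # Step 206: no abnormal blank slot left, so relocation is required.
    if min_wb["ERI"] == 0:
        return "relocate"
    # Step 207: fill a randomly chosen abnormal blank slot.
    blanks = [i for i, t in enumerate(min_wb["slots"]) if t is None]
    min_wb["slots"][random.choice(blanks)] = (fp, 0)
    min_wb["ERI"] -= 1
    min_wb["ACI"] += 1
    return "inserted"
```

Because INS moves one slot backwards around the ring on each insertion, the tags end up laid out in insertion order, which is what lets the relocation and query steps find the oldest tag without searching.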
In the relocation operation, the fingerprint information that has been in the bucket the longest is selected to be kicked out and replaced, which avoids the cyclic relocation of fingerprint information caused by a random selection strategy. The flowchart is shown in fig. 2. In step 206, the relocation operation includes the following steps:
Step 206A, assigning the fingerprint information tag to be inserted into minWB to the intermediate variable Temp, and assigning the fingerprint information tag in the slot pointed to by EXC to another intermediate variable Temp1.
Step 206B, setting the relocation flag value of the fingerprint information tag in Temp to 1, and inserting it into the slot pointed to by EXC.
Step 206C, letting ACI = ACI + 1, INS = L0 + {m + (INS - L0) - 1} mod m, and EXC = L0 + {m + (INS - L0) - 1} mod m in the recording area of minWB.
Step 206D, judging whether the relocation flag value of the fingerprint information tag in Temp1 is 1; if so, go to step 206E, otherwise go to step 206G.
Step 206E, calculating the intermediate variable addf = L0 + m + HFN, then setting the relocation flag value of the fingerprint information tag in Temp1 to 2 and inserting it into the blank slot pointed to by addf in the cache area of minWB.
Step 206F, letting ACI = ACI + 1 and HFN = HFN + 1 in the recording area of minWB; the relocation operation ends.
Step 206G, calculating the other hash candidate bucket of the fingerprint information tag in Temp1 and marking it as BUTemp1.
Step 206H, reading the INS value in the recording area of BUTemp1, and then calculating TINS = L0 + {m + (INS - L0) - 1} mod m.
Step 206I, judging whether the slot pointed to by TINS is a blank slot; if so, go to step 206L, otherwise go to step 206J.
Step 206J, reading the ERI value in the recording area of BUTemp1 and judging whether it is 0; if so, returning to step 206A with the fingerprint information tag in Temp1 as the fingerprint information tag to be inserted, otherwise go to step 206K.
Step 206K, setting the relocation flag value of the fingerprint information tag in Temp1 to 1, inserting it into a randomly chosen abnormal blank slot of BUTemp1, and letting ERI = ERI - 1 and ACI = ACI + 1 in the recording area of BUTemp1; the relocation operation ends.
Step 206L, setting the relocation flag value of the fingerprint information tag in Temp1 to 1, inserting it into the blank slot pointed to by TINS, and letting INS = L0 + {m + (INS - L0) - 1} mod m and ACI = ACI + 1 in the recording area of BUTemp1; the relocation operation ends.
Note that a set member undergoing relocation can only have a relocation flag value of 0 or 1, because relocation operations act only on set members in the storage area, and set members in the storage area can only have a relocation flag value of 0 or 1.
The invention aims to solve the problem of large time overhead in traversing query in the set member judgment process in the existing set member management method, and adopts the technical scheme that:
a set member management method comprising a set member determination method, the set member determination method comprising the steps of:
step I, pre-establishing a cuckoo filter for storing fingerprint information of set members, and uniformly dividing each hash candidate bucket into a storage area and a cache area, wherein the storage area stores fingerprint information whose relocation flag value is 0 or 1 and the cache area stores fingerprint information whose relocation flag value is 2; a relocation flag value of 0 means that the fingerprint information of the set member has not undergone a relocation operation, a relocation flag value of 1 means that it has undergone one relocation operation, and a relocation flag value of 2 means that it has undergone two relocation operations; at the same time, maintaining a recording area in each hash candidate bucket of the cuckoo filter, the recording area of each hash candidate bucket comprising a storage area active number count value, a storage area abnormal blank slot count, a cache area fingerprint count value, exchange bit address information and insertion bit address information; for each hash candidate bucket, the unit stored in a slot is a fingerprint information tag formed by two elements, namely the fingerprint information of the set member and the relocation flag value.
For convenience of subsequent description and computer operation, in this step the storage area active number count value may be recorded as ACI, the storage area abnormal blank slot count as ERI, the cache area fingerprint count value as HFN, the exchange bit address information as EXC, and the insertion bit address information as INS.
Step II, comparing the storage area active number count values of the two hash candidate buckets; first traversing the hash candidate bucket with the smaller storage area active number count value to query whether the fingerprint information tag of the set member to be determined exists there; if so, the query succeeds; otherwise traversing the other hash candidate bucket to query whether the fingerprint information tag of the set member to be determined exists there; if so, the query succeeds, otherwise the query fails.
To give the determination steps in more detail, with the flowchart shown in fig. 3, step II may include the following specific steps:
Step II.1, during determination, obtaining, through hash function calculation, the fingerprint information of the set member ξ_Y to be determined and the two corresponding hash candidate buckets.
In this step, the fingerprint information of the set member ξ_Y to be determined and the two corresponding hash candidate buckets are obtained through hash function calculation as follows:
Step II.2, reading the storage area active number count values ACI of the two hash candidate buckets, comparing them, marking the hash candidate bucket with the smaller active number count value as minWB, and marking the other hash candidate bucket as maxWB.
Step II.3, performing the reverse step on minWB to traverse and query whether the fingerprint information tag to which the fingerprint information belongs is in minWB; if so, the query succeeds and the query operation ends, otherwise go to step II.4.
Step II.4, performing the reverse step on maxWB to traverse and query whether the fingerprint information tag to which the fingerprint information belongs is in maxWB; if so, the query succeeds and the query operation ends; otherwise the query fails and the query operation ends.
The flowchart of the reverse step is shown in fig. 4. In step II.3 and step II.4, the reverse step may include the following specific steps:
Step III.1, reading the EXC value in the recording area of the hash candidate bucket operated on by the reverse step, and assigning the EXC value to the intermediate variable Temp.
Step III.2, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp; if they match, returning query success and exiting the reverse step, otherwise go to step III.3.
Step III.3, reading the INS value in the recording area of the hash candidate bucket operated on by the reverse step, and judging whether Temp and the INS value are the same; if so, go to step III.4, otherwise go to step III.9.
Step III.4, reading the HFN value in the recording area of the hash candidate bucket operated on by the reverse step, and judging whether the HFN value is equal to 0; if so, the query fails and the reverse step is exited, otherwise go to step III.5.
Step III.5, assigning the HFN value to the intermediate variable Temp1 and calculating another intermediate variable Temp2 = L0 + m + HFN - 1, wherein L0 indicates the start address of the hash candidate bucket operated on by the reverse step, and m indicates the maximum number of fingerprint information tags that the storage area can hold.
Step III.6, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp2; if they match, returning query success and exiting the reverse step, otherwise go to step III.7.
Step III.7, letting Temp1 = Temp1 - 1 and judging whether the value of Temp1 is equal to 0; if so, the query fails and the reverse step is exited, otherwise go to step III.8.
Step III.8, letting Temp2 = Temp2 - 1, and then returning to step III.6.
Step III.9, letting Temp = L0 + {m + (Temp - L0) - 1} mod m, and returning to step III.2.
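The reverse step above amounts to a ring walk from EXC to INS over the storage area, followed by a backward scan of the occupied part of the cache area. The following Python sketch illustrates steps III.1 to III.9 under stated assumptions: the function name `reverse_lookup` and the flat `slots` list indexed by `address - L0` are hypothetical, and INS is assumed to lie inside the storage area so that the ring walk terminates.

```python
from typing import List, Optional, Tuple

Tag = Tuple[int, int]  # (fingerprint, relocation flag value)


def reverse_lookup(slots: List[Optional[Tag]], L0: int, m: int,
                   EXC: int, INS: int, HFN: int, fp: int) -> bool:
    """Return True if fingerprint fp is found in the bucket, else False."""
    temp = EXC                                   # III.1: start at the exchange bit
    while True:
        tag = slots[temp - L0]
        if tag is not None and tag[0] == fp:     # III.2: match in the storage area
            return True
        if temp == INS:                          # III.3: reached the insertion bit
            break
        temp = L0 + (m + (temp - L0) - 1) % m    # III.9: step backwards in the ring
    if HFN == 0:                                 # III.4: cache area is empty
        return False
    temp1 = HFN                                  # III.5: number of cache tags left
    temp2 = L0 + m + HFN - 1                     # III.5: last occupied cache slot
    while True:
        tag = slots[temp2 - L0]
        if tag is not None and tag[0] == fp:     # III.6: match in the cache area
            return True
        temp1 -= 1                               # III.7
        if temp1 == 0:
            return False
        temp2 -= 1                               # III.8
```

When the storage area is not full, the walk visits only the slots between EXC and INS rather than all m slots, which is the saving over sequential traversal that the description claims.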
In order to solve the problem of low efficiency of a data compaction technology in a set member deleting process in the existing set member management method, the adopted technical scheme is as follows:
the set member management method comprises a set member deleting method, wherein the set member deleting method comprises the following steps:
step A, pre-establishing a cuckoo filter for storing fingerprint information of set members, and uniformly dividing each hash candidate bucket into a storage area and a cache area, wherein the storage area stores fingerprint information whose relocation flag value is 0 or 1 and the cache area stores fingerprint information whose relocation flag value is 2; a relocation flag value of 0 means that the fingerprint information of the set member has not undergone a relocation operation, a relocation flag value of 1 means that it has undergone one relocation operation, and a relocation flag value of 2 means that it has undergone two relocation operations; at the same time, maintaining a recording area in each hash candidate bucket of the cuckoo filter, the recording area of each hash candidate bucket comprising a storage area active number count value, a storage area abnormal blank slot count, a cache area fingerprint count value, exchange bit address information and insertion bit address information; for each hash candidate bucket, the unit stored in a slot is a fingerprint information tag formed by two elements, namely the fingerprint information of the set member and the relocation flag value.
For convenience of subsequent description and computer operation, in this step the storage area active number count value may be recorded as ACI, the storage area abnormal blank slot count as ERI, the cache area fingerprint count value as HFN, the exchange bit address information as EXC, and the insertion bit address information as INS.
Step B, during deletion, judging by the set member determination method whether the fingerprint information tag of the set member to be deleted exists in the two corresponding hash candidate buckets; if not, the deletion fails; if so, deleting the fingerprint information tag and judging, from the cache area fingerprint count value of the hash candidate bucket, whether set members exist in the cache area of that bucket; if not, only the deletion is performed; if so, migrating the first set member found in the cache area to the blank slot left by the deleted fingerprint information; the deletion operation then ends.
To give the deletion steps in more detail, with the flowchart shown in fig. 5, step B may include the following specific steps:
Step B1, during deletion, obtaining, through hash function calculation, the fingerprint information of the set member ξ_Z to be deleted and the two corresponding hash candidate buckets.
In this step, the fingerprint information of the set member ξ_Z to be deleted and the two corresponding hash candidate buckets are obtained through hash function calculation as follows:
Step B2, determining, by the set member determination method, whether the fingerprint information tag to which the fingerprint information belongs is stored in the storage area of either of the two hash candidate buckets; if so, go to step B3, otherwise go to step B15.
Step B3, marking the hash candidate bucket that stores the fingerprint information tag to which the fingerprint information belongs as bucket_Z, then reading the HFN value in the recording area of bucket_Z and judging whether the HFN value is equal to 0; if so, go to step B7, otherwise go to step B4.
Step B4, calculating the intermediate variable addfDEL = L0 + m + HFN - 1, and then assigning the fingerprint information tag stored in the slot pointed to by addfDEL to another intermediate variable Temp2, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z, and m represents the maximum number of fingerprint information tags that the storage area can hold.
Step B5, deleting the fingerprint information tag stored in the slot pointed to by addfDEL, setting the relocation flag value of the fingerprint information tag in Temp2 to 1, and inserting it so that it overwrites the fingerprint information tag to which the fingerprint information belongs.
Step B6, letting ACI = ACI - 1 and HFN = HFN - 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Step B8, reading the INS value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to INS; if so, go to step B12, otherwise go to step B9.
Step B9, reading the EXC value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to EXC; if so, go to step B10, otherwise go to step B11.
Step B10, letting ACI = ACI - 1 and EXC = L0 + {m + (EXC - L0) - 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Step B11, letting ACI = ACI - 1 and ERI = ERI + 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Step B12, reading the EXC value in the recording area of bucket_Z, and judging whether the address information of the slot storing the fingerprint information tag to which the fingerprint information belongs is equal to EXC; if so, go to step B13, otherwise go to step B14.
Step B13, letting ACI = ACI - 1, EXC = L0 + {m + (EXC - L0) - 1} mod m, and INS = L0 + {m + (INS - L0) + 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Step B14, letting ACI = ACI - 1 and INS = L0 + {m + (INS - L0) + 1} mod m in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
Step B15, determining, by the set member determination method, whether the fingerprint information tag to which the fingerprint information belongs is stored in the cache area of either of the two hash candidate buckets; if so, go to step B16, otherwise the deletion fails and the deletion operation ends.
Step B16, marking the hash candidate bucket whose cache area stores the fingerprint information tag to which the fingerprint information belongs as bucket_Z, calculating the intermediate variable addfDEL = L0 + m + HFN - 1, then overwriting the fingerprint information tag to which the fingerprint information belongs with the fingerprint information tag stored in the slot pointed to by addfDEL, and at the same time setting the slot pointed to by addfDEL to empty, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z, HFN is the HFN value of the corresponding hash candidate bucket bucket_Z, and m indicates the maximum number of fingerprint information tags that the storage area can hold.
Step B17, letting ACI = ACI - 1 and HFN = HFN - 1 in the recording area of bucket_Z; the deletion succeeds and the deletion operation ends.
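The data compaction part of the deletion method (steps B4 to B6 for a tag found in the storage area, and steps B16 to B17 for a tag found in the cache area) can be sketched as below. This Python fragment is an illustration only: the function name `delete_with_compaction` and the dictionary layout are assumptions, the position of the tag to delete is taken as already known, and the branch for an empty cache area (steps B8 to B14, which only adjust EXC, INS, and ERI) is deliberately left out.

```python
from typing import Any, Dict


def delete_with_compaction(bucket: Dict[str, Any], victim_addr: int) -> bool:
    """Delete the tag at victim_addr by moving the last cache tag over it.

    Returns False when the cache area is empty, because that branch is not
    covered by this sketch.
    """
    L0, m, slots = bucket["L0"], bucket["m"], bucket["slots"]
    if bucket["HFN"] == 0:
        return False
    addf_del = L0 + m + bucket["HFN"] - 1        # B4 / B16: last occupied cache slot
    moved = slots[addf_del - L0]
    if victim_addr < L0 + m:
        # B5: the victim is in the storage area, so the moved tag gets flag value 1.
        slots[victim_addr - L0] = (moved[0], 1)
    else:
        # B16: the victim is in the cache area, so the moved tag keeps its flag value.
        slots[victim_addr - L0] = moved
    slots[addf_del - L0] = None                  # empty the slot pointed to by addfDEL
    bucket["ACI"] -= 1                           # B6 / B17
    bucket["HFN"] -= 1
    return True
```

Overwriting the deleted tag with the last cache tag keeps the occupied part of the cache area contiguous, so the address L0 + m + HFN - 1 always points at the last cached tag without any scan.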
The method for managing the set members based on the first-in first-out and minimum active number strategies in the embodiment of the invention can comprise the set member inserting method based on the first-in first-out and minimum active number strategies, the set member judging method based on the first-in first-out and minimum active number strategies and the set member deleting method based on the first-in first-out and minimum active number strategies.
Claims (10)
1. A set member management method, comprising a set member insertion method, the set member insertion method comprising the steps of:
step 1, pre-establishing a cuckoo filter for storing fingerprint information of a set member, and uniformly dividing a storage area and a cache area, wherein the storage area is used for storing fingerprint information with a relocation flag value of 0 or 1, the cache area is used for storing fingerprint information with a relocation flag value of 2, the fingerprint information with a relocation flag value of 0 means that the fingerprint information of the set member is not subjected to relocation operation, the fingerprint information with a relocation flag value of 1 means that the fingerprint information of the set member has recently undergone relocation operation, the fingerprint information with a relocation flag value of 2 means that the fingerprint information of the set member has recently undergone 2 relocation operations, and simultaneously maintaining a recording area in each hash candidate bucket in the cuckoo filter, the recording area corresponding to each hash candidate bucket comprises a storage area active number count value, a storage area count value, and a cache area count value, The storage area abnormal blank slot position count, the cache area fingerprint count value, the exchange bit address information and the insertion bit address information are stored, and for each hash candidate bucket, the slot position storage unit is a fingerprint information label formed by two elements of the fingerprint information of the set member and the repositioning mark value;
and 2, during insertion, judging whether a fingerprint information label of the member to be inserted exists in the corresponding hash candidate bucket by adopting a set member judgment method, if so, finishing the insertion operation, if not, selecting one of the two hash candidate buckets with the least storage area active count value to carry out the insertion operation, during the insertion operation, calculating the inserted slot position insertion according to the insertion position address information of the hash candidate bucket, if the inserted slot position is not a blank slot position, judging whether the hash candidate bucket still has a blank slot position according to the abnormal blank slot position count of the storage area of the hash candidate bucket, if not, entering the relocation operation, otherwise, randomly inserting the blank slot position.
2. The set member management method of claim 1, wherein in step 1 the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step 2 comprises the following specific steps:
Step 201, during insertion, obtaining, through hash function calculation, the fingerprint information of the set member to be inserted ξ_X and the two corresponding hash candidate buckets;
Step 202, judging, by the set member judgment method, whether the corresponding fingerprint information tag exists in the two hash candidate buckets; if yes, ending the insertion operation; otherwise entering step 203;
Step 203, reading and comparing the storage area active count values ACI of the two hash candidate buckets, and selecting the hash candidate bucket with the smaller active count value as the insertion bucket, denoted minWB;
Step 204, defining an intermediate variable TINS, reading the INS value in the recording area of minWB, and then calculating TINS [formula not reproduced]; if the slot pointed to by TINS is a blank slot, entering step 205, otherwise entering step 206, wherein L0 represents the start address of the corresponding hash candidate bucket minWB and m represents the maximum number of fingerprint tags that the storage area can hold;
Step 205, inserting the fingerprint information tag consisting of the fingerprint information and a relocation flag value of 0 into the blank slot pointed to by TINS, and, in the recording area of minWB, [assignments not reproduced];
Step 206, judging whether the storage area abnormal blank slot count ERI in the recording area of minWB is equal to 0; if yes, entering the relocation operation, otherwise entering step 207;
Step 207, randomly inserting the fingerprint information tag consisting of the fingerprint information and a relocation flag value of 0 into an abnormal blank slot of minWB, and, in the recording area of minWB, [assignments not reproduced].
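The branching in steps 204 to 207 can be summarized with a small decision function. This is a hedged sketch: the names `Action` and `next_insert_action` are hypothetical, and the test `eri == 0` follows claim 1's wording that relocation is entered when no blank slot remains. The sketch does not compute TINS or update the recording area, since those formulas are not given in this text.

```python
from enum import Enum


class Action(Enum):
    INSERT_AT_TINS = "step 205"
    RELOCATE = "relocation operation"
    INSERT_RANDOM_ABNORMAL = "step 207"


def next_insert_action(tins_slot_is_blank: bool, eri: int) -> Action:
    """Return which branch the insertion takes in the chosen bucket minWB."""
    if tins_slot_is_blank:
        # Step 204: the slot pointed to by TINS is free, so insert there.
        return Action.INSERT_AT_TINS
    if eri == 0:
        # Step 206: no abnormal blank slot is left, so a tag must be relocated.
        return Action.RELOCATE
    # Step 207: reuse one of the abnormal blank slots, chosen at random.
    return Action.INSERT_RANDOM_ABNORMAL
```

Only the last branch involves a random choice, and it is restricted to slots already known to be blank.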
3. The set member management method according to claim 2, wherein in step 2, obtaining, through hash function calculation, the fingerprint information of the set member to be inserted ξ_X and the two corresponding hash candidate buckets is specifically performed as follows: [formulas not reproduced]
4. The set member management method according to claim 2 or 3, wherein in step 206 the relocation operation comprises the following steps:
Step 206A, assigning the fingerprint information tag to be inserted into minWB to an intermediate variable Temp, and assigning the fingerprint information tag stored in the slot pointed to by EXC to another intermediate variable Temp1;
Step 206B, setting the relocation flag value of the fingerprint information tag in Temp to 1 and inserting the tag into the slot pointed to by EXC;
Step 206D, judging whether the relocation flag value of the fingerprint information tag in Temp1 is 1; if yes, entering step 206E, otherwise entering step 206G;
Step 206E, calculating an intermediate variable addf [formula not reproduced], then setting the relocation flag value of the fingerprint information tag in Temp1 to 2 and inserting the tag into the blank slot pointed to by addf in the cache area of minWB;
Step 206G, calculating the other hash candidate bucket of Temp1 and denoting it BUTemp1;
Step 206H, reading the INS value in the recording area of BUTemp1, and then calculating TINS [formula not reproduced];
Step 206I, judging whether the slot pointed to by TINS is a blank slot; if yes, entering step 206L, otherwise entering step 206J;
Step 206J, reading the ERI value in the recording area of BUTemp1 and judging whether it is 0; if yes, taking Temp1 as the fingerprint information tag to be inserted and returning to step 206A; otherwise entering step 206K;
Step 206K, setting the relocation flag value of the fingerprint information tag in Temp1 to 1 and randomly inserting the tag into an abnormal blank slot of BUTemp1, and, in the recording area of BUTemp1, [assignments not reproduced];
Step 206L, setting the relocation flag value of the fingerprint information tag in Temp1 to 1 and inserting the tag into the blank slot pointed to by TINS, and, in the recording area of BUTemp1, [assignments not reproduced].
5. A set member management method, characterized by comprising a set member judgment method, wherein the set member judgment method comprises the following steps:
Step I, pre-establishing a cuckoo filter for storing fingerprint information of set members, and uniformly dividing it into a storage area and a cache area, wherein the storage area stores fingerprint information with a relocation flag value of 0 or 1 and the cache area stores fingerprint information with a relocation flag value of 2; a relocation flag value of 0 means that the fingerprint information of the set member has not undergone a relocation operation, a value of 1 means that it has recently undergone one relocation operation, and a value of 2 means that it has recently undergone two relocation operations; and simultaneously maintaining a recording area in each hash candidate bucket of the cuckoo filter, wherein the recording area of each hash candidate bucket comprises a storage area active count value, a storage area abnormal blank slot count, a cache area fingerprint count value, exchange bit address information, and insertion bit address information, and wherein, for each hash candidate bucket, the storage unit of a slot is a fingerprint information tag formed by two elements: the fingerprint information of the set member and the relocation flag value;
Step II, comparing the storage area active count values of the two hash candidate buckets; first traversing the hash candidate bucket with the smaller storage area active count value to query whether the fingerprint information tag of the set member to be judged exists in it; if so, the query succeeds; otherwise traversing the other hash candidate bucket to query whether the tag exists in it; if so, the query succeeds; otherwise the query fails.
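The query order in step II can be illustrated as follows. This is a minimal sketch under stated assumptions: each bucket is modelled as a plain dictionary with an `"ACI"` count and a `"tags"` list of `(fingerprint, flag)` pairs, and the function name `lookup` is hypothetical. A simple linear scan stands in for the traversal inside each bucket.

```python
from typing import Dict, Tuple


def lookup(fp: int, bucket_a: Dict, bucket_b: Dict) -> Tuple[bool, int]:
    """Query the bucket with the smaller ACI first, then the other one.

    Returns (found, number_of_buckets_probed).
    """
    # sorted() is stable, so bucket_a is probed first when the counts are equal.
    first, second = sorted((bucket_a, bucket_b), key=lambda b: b["ACI"])
    for probes, bucket in enumerate((first, second), start=1):
        if any(tag_fp == fp for tag_fp, _flag in bucket["tags"]):
            return True, probes
    return False, 2


a = {"ACI": 3, "tags": [(5, 0), (6, 1), (7, 0)]}
b = {"ACI": 1, "tags": [(9, 0)]}
print(lookup(9, a, b))   # found in the less loaded bucket after one probe
print(lookup(6, a, b))   # found only after probing both buckets
```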
6. The set member management method of claim 5, wherein in step I the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step II comprises the following specific steps:
Step II.1, during judgment, obtaining, through hash function calculation, the fingerprint information of the set member to be judged ξ_Y and the two corresponding hash candidate buckets;
Step II.2, reading and comparing the storage area active count values ACI of the two hash candidate buckets, denoting the hash candidate bucket with the smaller active count value as minWB and the other hash candidate bucket as maxWB;
Step II.3, performing the reverse step on minWB to traverse and query whether the fingerprint information tag is in minWB; if so, the query succeeds and the query operation ends; otherwise entering step II.4;
7. The set member management method according to claim 6, wherein in step II, obtaining, through hash function calculation, the fingerprint information of the set member to be judged ξ_Y and the two corresponding hash candidate buckets is specifically performed as follows: [formulas not reproduced]
8. The set member management method according to claim 6 or 7, wherein in step II.3 and step II.4 the reverse step comprises the following specific steps:
Step III.1, reading the EXC value in the recording area of the hash candidate bucket on which the reverse step operates, and assigning the EXC value to an intermediate variable Temp;
Step III.2, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp; if they match, returning query success and exiting the reverse step; otherwise entering step III.3;
Step III.3, reading the INS value in the recording area of the hash candidate bucket on which the reverse step operates, and judging whether Temp and the INS value are the same; if yes, entering step III.4, otherwise entering step III.9;
Step III.4, reading the HFN value in the recording area of the hash candidate bucket on which the reverse step operates, and judging whether the HFN value is equal to 0; if yes, the query fails and the reverse step is exited; otherwise entering step III.5;
Step III.5, assigning the HFN value to an intermediate variable Temp1, and calculating another intermediate variable Temp2 [formula not reproduced], wherein L0 represents the start address of the hash candidate bucket on which the reverse step operates and m represents the maximum number of fingerprint tags that the storage area can hold;
Step III.6, matching the fingerprint information to be queried against the fingerprint information in the fingerprint information tag stored in the slot pointed to by Temp2; if they match, returning query success and exiting the reverse step; otherwise entering step III.7;
Step III.7, letting Temp1 = Temp1 - 1 and judging whether Temp1 is equal to 0; if yes, the query fails and the reverse step is exited; otherwise entering step III.8;
Step III.8, letting Temp2 = Temp2 - 1, and then returning to step III.6;
Step III.9, letting Temp = L0 + {m + (Temp - L0) - 1} mod m, and returning to step III.2.
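The reverse step of claim 8 can be sketched in Python as a backward circular scan of the storage area followed by a backward scan of the cache area. Several details here are assumptions made for illustration: the bucket is modelled as a flat list `slots` indexed by address, the cache area is taken to start at address L0 + m, and the starting value of Temp2 is taken as L0 + m + HFN - 1, borrowed from the addfDEL formula in step B16 because the formula in step III.5 is not given in this text. The function name `reverse_lookup` is hypothetical.

```python
from typing import List, Optional, Tuple

Tag = Tuple[int, int]  # (fingerprint, relocation flag)


def reverse_lookup(slots: List[Optional[Tag]], L0: int, m: int,
                   EXC: int, INS: int, HFN: int, fp: int) -> bool:
    """Scan backwards from EXC to INS in the storage area, then scan the cache."""
    temp = EXC                                     # step III.1
    while True:
        tag = slots[temp]
        if tag is not None and tag[0] == fp:       # step III.2
            return True
        if temp == INS:                            # step III.3
            break
        temp = L0 + (m + (temp - L0) - 1) % m      # step III.9: one slot back, wrapping

    if HFN == 0:                                   # step III.4: cache area is empty
        return False
    temp2 = L0 + m + HFN - 1                       # assumed start: last occupied cache slot
    for _ in range(HFN):                           # steps III.5 to III.8
        tag = slots[temp2]
        if tag is not None and tag[0] == fp:
            return True
        temp2 -= 1
    return False


# Storage area at addresses 0..3 (m = 4), cache area at addresses 4..5.
slots = [(10, 0), (11, 1), None, (12, 0), (20, 2), None]
print(reverse_lookup(slots, L0=0, m=4, EXC=1, INS=3, HFN=1, fp=12))
```

The expression in step III.9 adds m before taking the remainder so that stepping back from the first slot of the bucket wraps to the last storage slot instead of producing a negative offset.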
9. A set member management method, characterized by comprising a set member deleting method, wherein the set member deleting method comprises the following steps:
Step A, pre-establishing a cuckoo filter for storing fingerprint information of set members, and uniformly dividing it into a storage area and a cache area, wherein the storage area stores fingerprint information with a relocation flag value of 0 or 1 and the cache area stores fingerprint information with a relocation flag value of 2; a relocation flag value of 0 means that the fingerprint information of the set member has not undergone a relocation operation, a value of 1 means that it has recently undergone one relocation operation, and a value of 2 means that it has recently undergone two relocation operations; and simultaneously maintaining a recording area in each hash candidate bucket of the cuckoo filter, wherein the recording area of each hash candidate bucket comprises a storage area active count value, a storage area abnormal blank slot count, a cache area fingerprint count value, exchange bit address information, and insertion bit address information, and wherein, for each hash candidate bucket, the storage unit of a slot is a fingerprint information tag formed by two elements: the fingerprint information of the set member and the relocation flag value;
Step B, during deletion, judging by a set member judgment method whether the fingerprint information tag of the set member to be deleted exists in the two corresponding hash candidate buckets; if not, the deletion fails; if so, deleting the fingerprint information tag and judging, from the cache area fingerprint count value of the hash candidate bucket, whether any set member exists in the cache area of that bucket; if not, the deletion of the set member is complete; if so, migrating the first set member in the cache area to the blank slot left by the deleted fingerprint information, and ending the deletion operation.
10. The set member management method of claim 9, wherein in step A the storage area active count value is denoted ACI, the storage area abnormal blank slot count is denoted ERI, the cache area fingerprint count value is denoted HFN, the exchange bit address information is denoted EXC, and the insertion bit address information is denoted INS;
Step B comprises the following specific steps:
Step B1, during deletion, obtaining, through hash function calculation, the fingerprint information of the set member to be deleted ξ_Z and the two corresponding hash candidate buckets;
Step B2, judging, by the set member judgment method, whether the fingerprint information tag of the fingerprint information is stored in the storage area of the two hash candidate buckets; if it is, entering step B3, otherwise entering step B15;
Step B3, denoting the hash candidate bucket that holds the fingerprint information tag as bucket_Z, then reading the HFN value in the recording area of bucket_Z and judging whether the HFN value is equal to 0; if yes, entering step B7, otherwise entering step B4;
Step B4, calculating an intermediate variable addfDEL [formula not reproduced], and then assigning the fingerprint information tag stored in the slot pointed to by addfDEL to another intermediate variable Temp2, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z and m represents the maximum number of fingerprint tags that the storage area can hold;
Step B5, deleting the fingerprint information tag stored in the slot pointed to by addfDEL, setting the relocation flag value of the fingerprint information tag in Temp2 to 1, and inserting it so that it overwrites the fingerprint information tag of the fingerprint information to be deleted;
Step B6, in the recording area of bucket_Z, letting ACI = ACI - 1 and HFN = HFN - 1; the deletion succeeds and the deletion operation ends;
Step B8, reading the INS value in the recording area of bucket_Z, and judging whether the address of the slot storing the fingerprint information tag is equal to INS; if yes, entering step B12, otherwise entering step B9;
Step B9, reading the EXC value in the recording area of bucket_Z, and judging whether the address of the slot storing the fingerprint information tag is equal to EXC; if yes, entering step B10, otherwise entering step B11;
Step B10, in the recording area of bucket_Z, letting ACI = ACI - 1 and EXC = L0 + {m + (EXC - L0) - 1} mod m; the deletion succeeds and the deletion operation ends;
Step B11, in the recording area of bucket_Z, letting ACI = ACI - 1 and ERI = ERI + 1; the deletion succeeds and the deletion operation ends;
Step B12, reading the EXC value in the recording area of bucket_Z, and judging whether the address of the slot storing the fingerprint information tag is equal to EXC; if yes, entering step B13, otherwise entering step B14;
Step B13, in the recording area of bucket_Z, letting ACI = ACI - 1, EXC = L0 + {m + (EXC - L0) - 1} mod m, and INS = L0 + {m + (INS - L0) + 1} mod m; the deletion succeeds and the deletion operation ends;
Step B14, in the recording area of bucket_Z, letting ACI = ACI - 1 and INS = L0 + {m + (INS - L0) + 1} mod m; the deletion succeeds and the deletion operation ends;
Step B15, judging, by the set member judgment method, whether the fingerprint information tag of the fingerprint information is stored in the cache area of the two hash candidate buckets; if it is, entering step B16; otherwise the deletion fails and the deletion operation ends;
Step B16, denoting the hash candidate bucket whose cache area holds the fingerprint information tag as bucket_Z, calculating an intermediate variable addfDEL = L0 + m + HFN - 1, then overwriting the fingerprint information tag of the fingerprint information to be deleted with the fingerprint information tag stored in the slot pointed to by addfDEL, and at the same time emptying the slot pointed to by addfDEL, wherein L0 represents the start address of the corresponding hash candidate bucket bucket_Z, HFN is the HFN value of bucket_Z, and m represents the maximum number of fingerprint tags that the storage area can hold;
Step B17, in the recording area of bucket_Z, letting ACI = ACI - 1 and HFN = HFN - 1; the deletion succeeds and the deletion operation ends.
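The recording-area updates in steps B8 to B14, which appear to cover deletion from the storage area when the cache area is empty, can be sketched as follows. The names `Record` and `update_after_storage_delete` are hypothetical, and addresses are assumed to be integers. The pointer arithmetic is taken directly from steps B10, B13, and B14.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Recording area of one hash candidate bucket (hypothetical model)."""
    ACI: int
    ERI: int
    HFN: int
    EXC: int
    INS: int


def update_after_storage_delete(rec: Record, slot: int, L0: int, m: int) -> Record:
    """Apply the B10/B11/B13/B14 updates after the tag at `slot` is removed."""
    def step_back(addr: int) -> int:
        return L0 + (m + (addr - L0) - 1) % m

    def step_forward(addr: int) -> int:
        return L0 + (m + (addr - L0) + 1) % m

    at_ins = slot == rec.INS          # step B8
    at_exc = slot == rec.EXC          # steps B9 and B12
    rec.ACI -= 1                      # common to all four branches
    if at_ins and at_exc:             # step B13
        rec.EXC = step_back(rec.EXC)
        rec.INS = step_forward(rec.INS)
    elif at_ins:                      # step B14
        rec.INS = step_forward(rec.INS)
    elif at_exc:                      # step B10
        rec.EXC = step_back(rec.EXC)
    else:                             # step B11: the hole becomes an abnormal blank slot
        rec.ERI += 1
    return rec


rec = Record(ACI=3, ERI=0, HFN=0, EXC=8, INS=11)
print(update_after_storage_delete(rec, slot=9, L0=8, m=4))
```

A deletion that hits neither pointer only increments ERI, which is what later allows step 207 to reuse the hole without scanning the bucket for blank slots.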
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110920389.1A CN113360516B (en) | 2021-08-11 | 2021-08-11 | Collection member management method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113360516A (en) | 2021-09-07 |
CN113360516B (en) | 2021-11-26 |
Family
ID=77522965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110920389.1A Active CN113360516B (en) | 2021-08-11 | 2021-08-11 | Collection member management method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113360516B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113886391B (en) * | 2021-10-11 | 2023-03-28 | 成都信息工程大学 | Data processing method of double-fingerprint storage cuckoo filter based on discrete type |
CN114003660B (en) * | 2021-11-05 | 2022-06-03 | 广州宸祺出行科技有限公司 | Method and device for efficiently synchronizing real-time data to click House based on flash |
CN115048402B (en) * | 2022-08-16 | 2022-11-18 | 成都信息工程大学 | Self-adaptive dynamic data set member inserting, deleting and retrieving method with time effect |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046164A (en) * | 2019-04-16 | 2019-07-23 | 中国人民解放军国防科技大学 | Index independent grain distribution filter, consistency grain distribution filter and operation method |
CN111552693A (en) * | 2020-04-30 | 2020-08-18 | 南方科技大学 | Tag cuckoo filter |
CN112148928A (en) * | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110276744A1 (en) * | 2010-05-05 | 2011-11-10 | Microsoft Corporation | Flash memory cache including for use with persistent key-value store |
US9256549B2 (en) * | 2014-01-17 | 2016-02-09 | Netapp, Inc. | Set-associative hash table organization for efficient storage and retrieval of data in a storage system |
CN105099957B (en) * | 2015-08-20 | 2018-05-18 | 电子科技大学 | A kind of data packet forwarding method based on software checking book |
CN105630955B (en) * | 2015-12-24 | 2019-01-29 | 华中科技大学 | A kind of data acquisition system member management method of high-efficiency dynamic |
US11088951B2 (en) * | 2017-01-16 | 2021-08-10 | Intel Corporation | Flow classification apparatus, methods, and systems |
CN107256130B (en) * | 2017-06-06 | 2019-09-24 | 华中科技大学 | Data store optimization method and system based on Cuckoo Hash calculation |
US11762828B2 (en) * | 2018-02-27 | 2023-09-19 | Advanced Micro Devices, Inc. | Cuckoo filters and cuckoo hash tables with biasing, compression, and decoupled logical sparsity |
CN109800228B (en) * | 2018-12-28 | 2023-03-10 | 深圳竹云科技有限公司 | Method for efficiently and quickly solving hash conflict |
CN110222088B (en) * | 2019-05-20 | 2021-08-31 | 华中科技大学 | Data approximate set representation method and system based on insertion position selection |
CN112541102B (en) * | 2020-12-11 | 2023-07-11 | 深圳供电局有限公司 | Abnormal data filtering method, device, equipment and storage medium |
CN113050894A (en) * | 2021-04-20 | 2021-06-29 | 南京理工大学 | Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN113360516A (en) | 2021-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113360516B (en) | Collection member management method | |
TWI702506B (en) | System, machine readable medium, and machine-implemented method for merge tree garbage metrics | |
EP2209074A1 (en) | Data storage processing method, data searching method and devices thereof | |
CN103136243B (en) | File system duplicate removal method based on cloud storage and device | |
KR20190119080A (en) | Stream Selection for Multi-Stream Storage | |
CN111552692B (en) | Plus-minus cuckoo filter | |
CN107122130B (en) | Data deduplication method and device | |
CN108604165A (en) | Storage device | |
CN114201120B (en) | Data reading and writing method, device and related equipment | |
US20240104059A1 (en) | Method for Service Processing and System, Device, and Medium | |
US7752206B2 (en) | Method and data processing system for managing a mass storage system | |
CN111913925B (en) | Data processing method and system in distributed storage system | |
WO2013075306A1 (en) | Data access method and device | |
CN109634873A (en) | Solid-state disk host system log information method, apparatus, equipment and medium | |
JP2009169688A (en) | Storage device, data migration device, and data migration method | |
CN109558456A (en) | A kind of file migration method, apparatus, equipment and readable storage medium storing program for executing | |
US20080162591A1 (en) | Method of Logging Transactions and a Method of Reversing a Transaction | |
CN112558868B (en) | Method, device and equipment for storing configuration data | |
CN113253932B (en) | Read-write control method and system for distributed storage system | |
CN105389128B (en) | A kind of solid state hard disk date storage method and storage control | |
CN115048402B (en) | Self-adaptive dynamic data set member inserting, deleting and retrieving method with time effect | |
CN101131649A (en) | Updating speed improving method for read-only memory of device with flash memory | |
CN113704190A (en) | Data writing method and device | |
CN114063919B (en) | Physical block allocation sequence acquisition method and data recovery method for SSD | |
CN112015710B (en) | Method and device for determining directory slicing relationship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||