CN108632131B - Email address matching method based on fingerprint type variable-length bloom filter - Google Patents
- Publication number
- CN108632131B (application CN201710158175.9A)
- Authority
- CN
- China
- Prior art keywords
- bit
- value
- fingerprint
- bits
- separator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/42—Mailbox-related aspects, e.g. synchronisation of mailboxes
Abstract
The invention provides an email address matching method based on a fingerprint-type variable-length counting Bloom filter (FPVLCBF). The FPVLCBF represents a set with a vector V of length m. Each dimension i of V consists of three parts: a separator occupying 1 bit, a counter of a bits, and a fingerprint field of b bits. The first k hash functions compute hash values as in a traditional Bloom filter, the last hash function computes the fingerprint value of an element, and an index table accelerates lookups. By combining fingerprints with the counting Bloom filter, the FPVLCBF reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible.
Description
Technical Field
The invention relates to the technical field of mail address filtering, in particular to a mail address matching method based on a fingerprint type variable-length bloom filter.
Background
Email address matching is a foundational technique in mail gateways. As the internet grows more complex and network traffic steadily increases, the main requirement for mail gateways today is higher processing speed and lower memory consumption under heavy traffic.
At present, most email address matching methods use multi-pattern matching algorithms. The classic AC (Aho-Corasick) algorithm, for example, matches email addresses with good time efficiency, but it requires a large amount of memory when the pattern rule set is large. In email address matching, address lengths fall within a fairly fixed range, and filtering amounts to testing whether a given address belongs to a set of addresses. Counting Bloom filters are widely used for such membership tests, but their space complexity is high and their false positive rate is also high.
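For reference, the traditional counting Bloom filter discussed above can be sketched in a few lines of Python. This is a minimal illustration and not part of the patent text: the class name, the SHA-256-based hashing, and the parameters `m` and `k` are assumptions chosen for the example.

```python
import hashlib


class CountingBloomFilter:
    """Baseline counting Bloom filter: m counters, k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _cells(self, x):
        # Derive k cell indexes by salting one hash (an illustrative choice).
        cells = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode("utf-8")).digest()
            cells.append(int.from_bytes(digest[:8], "big") % self.m)
        return cells

    def insert(self, x):
        for c in self._cells(x):
            self.counters[c] += 1

    def delete(self, x):
        # Assumes x was inserted earlier; deleting a non-member corrupts counts.
        for c in self._cells(x):
            self.counters[c] -= 1

    def query(self, x):
        # True means "possibly present"; False means "definitely absent".
        return all(self.counters[c] > 0 for c in self._cells(x))


cbf = CountingBloomFilter(m=64, k=3)
cbf.insert("alice@example.com")
print(cbf.query("alice@example.com"))
```

In a packed implementation each of the m cells reserves a fixed-width counter (commonly 4 bits) even when the cell is empty, and a query cannot tell which element raised a counter. These are the space and false positive costs the background refers to.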
Disclosure of Invention
The invention improves the traditional counting Bloom filter algorithm by combining fingerprints with the counting Bloom filter. The resulting fingerprint-based variable-length counting Bloom filter (FPVLCBF) reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible. The invention is well suited to matching email addresses in high-traffic environments.
To solve this technical problem, the invention provides an email address matching method based on a fingerprint-type variable-length Bloom filter. The FPVLCBF represents a set with a vector V of length m. Each dimension i of V consists of three parts: the first is a separator occupying 1 bit, the second is a counter of a bits, and the third is a fingerprint field of b bits. The first k hash functions compute hash values as in a traditional Bloom filter, the last hash function computes the fingerprint value of an element, and the FPVLCBF uses an index table to speed up lookups.
The insertion steps of the FPVLCBF are as follows:
a) for the element x to be inserted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits, shift all bits after the fingerprint bits one position to the right, and set the newly added bit to 1;
e) update the index table.
The query steps of the FPVLCBF are as follows:
a) for the queried element x, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k, using the same search as in insertion step b);
c) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0; if not, the element is not in the FPVLCBF; if so, go to step d);
d) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1; if so, the query succeeds; otherwise go to step e);
e) for each position whose count is 1, check whether its fingerprint bits equal the fingerprint function value h_p(x); if all of them match, the element is in the set; otherwise it is not.
The deletion steps of the FPVLCBF are as follows:
a) for the element x to be deleted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k, using the same search as in insertion step b);
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits and shift all bits from the second bit after the fingerprint bits onward one position to the left, overwriting the first bit after the fingerprint bits;
e) update the index table.
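The separator search of step b) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: it follows the bit order implied by steps b) to d) (a 0 separator, then b fingerprint bits, then the count stored as a run of 1 bits), it models the bit array as a Python list, and it represents the index table as two parallel lists with the hypothetical names `index_cells` and `index_offsets`. The text says "smaller than"; the sketch also accepts an equal entry so that cell 0 can be looked up.

```python
from bisect import bisect_right


def find_separator(bits, b, index_cells, index_offsets, target):
    """Return the bit offset of the separator that opens cell `target`."""
    # Last indexed cell at or before the target (equality is an assumption).
    slot = bisect_right(index_cells, target) - 1
    cell, pos = index_cells[slot], index_offsets[slot]
    while cell < target:
        pos += 1 + b                       # skip the separator and b fingerprint bits
        while pos < len(bits) and bits[pos] == 1:
            pos += 1                       # skip the run of 1 bits (the count)
        cell += 1                          # pos now rests on the next separator
    return pos


# Three cells with b = 2: counts 0, 2, 1 and fingerprints 00, 10, 01.
bits = [0, 0, 0,   0, 1, 0, 1, 1,   0, 0, 1, 1]
index_cells, index_offsets = [0, 2], [0, 8]   # offsets recorded for cells 0 and 2

print(find_separator(bits, 2, index_cells, index_offsets, 1))  # 3
print(find_separator(bits, 2, index_cells, index_offsets, 2))  # 8
```

Because the fingerprint field is skipped blindly (b bits at a time), 1 bits inside a fingerprint are never mistaken for count bits. The index table only bounds how many cells the scan has to walk.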
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
The invention improves the traditional counting Bloom filter by combining fingerprints with the counting Bloom filter. The resulting fingerprint-based variable-length counting Bloom filter (FPVLCBF) reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible.
Drawings
To illustrate the embodiments of the invention or the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a comparison table of misjudgment times and misjudgment rates of CBF, VLCBF, FPCBF and FPVLCBF;
FIG. 2 is a graph comparing space consumption for a Count Bloom Filter (CBF), a Variable Length Count Bloom Filter (VLCBF), a fingerprint count bloom filter (FPCBF), and an FPVLCBF;
FIG. 3 is a comparison table of misjudgment times and misjudgment rates of CBF, VLCBF, FPCBF and FPVLCBF;
FIG. 4 is a graph comparing the false positive rate of the FPVLCBF for different numbers of fingerprint bits.
Detailed Description
For a better understanding of the technical scheme, it is described in detail below with reference to the drawings and specific embodiments.
As shown in FIGS. 1-4, this embodiment initializes the FPVLCBF as a vector of dimension m. Each element initially occupies (b + 1) bits, where b is the number of fingerprint bits, so the initialized bit array occupies m × (b + 1) bits, and all bits of the array are set to 0.
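The initialization arithmetic can be checked with a short Python sketch. The function names and the `stride` parameter of the index table are assumptions made for illustration; the text does not specify how densely the index table samples the cells.

```python
def init_bits(m, b):
    """All-zero bit array: each of the m cells starts as 1 separator + b fingerprint bits."""
    return [0] * (m * (b + 1))


def build_index(m, b, stride):
    """Record the separator offset of every `stride`-th cell (hypothetical layout)."""
    cells = list(range(0, m, stride))
    offsets = [c * (b + 1) for c in cells]   # valid while every count is still 0
    return cells, offsets


bits = init_bits(m=8, b=4)          # 8 cells x 5 bits = 40 bits
cells, offsets = build_index(m=8, b=4, stride=4)
print(len(bits), cells, offsets)    # 40 [0, 4] [0, 20]
```

The offsets are exact only for the freshly initialized array. Once counts grow, every recorded offset that lies after the modified cell has to move, which is what the "update the index table" step is for.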
The insertion procedure for the FPVLCBF is as follows:
a) for the element x to be inserted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits, shift all bits after the fingerprint bits one position to the right, and set the newly added bit to 1;
e) update the index table.
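The insertion steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the bit array is a Python list, the hash functions are derived from SHA-256 with a salt (the patent does not name its hash functions), and the index table is left out, so the separator search scans from cell 0. With an index table, step e) would add 1 to every recorded offset that lies after the modified cell.

```python
import hashlib


def _hash(x, salt, mod):
    """Illustrative salted hash; stands in for h_1..h_k and h_p."""
    digest = hashlib.sha256(f"{salt}:{x}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % mod


def find_separator(bits, b, target):
    """Step b) without an index table: walk cell by cell from offset 0."""
    pos = 0
    for _ in range(target):
        pos += 1 + b                       # separator + fingerprint bits
        while pos < len(bits) and bits[pos] == 1:
            pos += 1                       # run of 1 bits holding the count
    return pos


def insert(bits, m, b, k, x):
    fp = _hash(x, "fp", 2 ** b)                              # step a): h_p(x)
    fp_bits = [(fp >> (b - 1 - j)) & 1 for j in range(b)]
    for i in range(k):
        sep = find_separator(bits, b, _hash(x, i, m))        # steps a) and b)
        for j in range(b):
            bits[sep + 1 + j] ^= fp_bits[j]                  # step c): XOR fingerprint
        bits.insert(sep + 1 + b, 1)                          # step d): shift right, add a 1


m, b, k = 4, 3, 2
bits = [0] * (m * (b + 1))
insert(bits, m, b, k, "alice@example.com")
print(len(bits))   # 18: the array grew by one bit per hash function
```

`list.insert` performs the right shift of step d) implicitly. A packed bit array would need an explicit shift of every later bit, which is the cost the index table update accompanies.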
The query procedure for the FPVLCBF is as follows:
a) for the queried element x, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0; if not, the element is not in the FPVLCBF; if so, go to step d);
d) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1; if so, the query succeeds; otherwise go to step e);
e) for each position whose count is 1, check whether its fingerprint bits equal the fingerprint function value h_p(x); if all of them match, the element is in the set; otherwise it is not.
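The decision logic of query steps c) to e) is easiest to see on a cell-level model. The Python sketch below keeps one count and one fingerprint per cell in plain lists instead of the packed bit array, so the separator search of step b) is not shown. The class name and the SHA-256-based hashing are assumptions for illustration only.

```python
import hashlib


def _hash(x, salt, mod):
    """Illustrative salted hash; stands in for h_1..h_k and h_p."""
    digest = hashlib.sha256(f"{salt}:{x}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % mod


class LogicalFPVLCBF:
    """Cell-level model: counts[i] and fps[i] stand in for the packed bits."""

    def __init__(self, m, b, k):
        self.m, self.b, self.k = m, b, k
        self.counts = [0] * m
        self.fps = [0] * m

    def _cells(self, x):
        return [_hash(x, i, self.m) for i in range(self.k)]

    def insert(self, x):
        fp = _hash(x, "fp", 2 ** self.b)
        for c in self._cells(x):
            self.counts[c] += 1
            self.fps[c] ^= fp

    def query(self, x):
        fp = _hash(x, "fp", 2 ** self.b)
        cells = self._cells(x)
        if any(self.counts[c] == 0 for c in cells):
            return False                    # step c): some count is 0
        if all(self.counts[c] > 1 for c in cells):
            return True                     # step d): every count exceeds 1
        # step e): cells with count 1 must hold exactly this fingerprint
        return all(self.fps[c] == fp for c in cells if self.counts[c] == 1)


f = LogicalFPVLCBF(m=64, b=8, k=3)
f.insert("alice@example.com")
print(f.query("alice@example.com"))   # True
```

A cell with count 1 was touched by exactly one insertion, so its stored value is that element's fingerprint. Step e) uses this to reject queries that a plain counting Bloom filter would accept.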
The deletion procedure for the FPVLCBF is as follows:
a) for the element x to be deleted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits and shift all bits from the second bit after the fingerprint bits onward one position to the left, overwriting the first bit after the fingerprint bits;
e) update the index table.
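Deletion steps c) and d) for a single cell can be sketched in Python as follows. The sketch takes the cell number and fingerprint value directly, so a caller would apply it once per hash value h_i(x). As assumptions for illustration, the bit array is a Python list, the index table is omitted (the search scans from cell 0), and the guard against deleting from an empty cell is an addition that the text does not describe.

```python
def find_separator(bits, b, target):
    """Step b) without an index table: walk cell by cell from offset 0."""
    pos = 0
    for _ in range(target):
        pos += 1 + b                       # separator + fingerprint bits
        while pos < len(bits) and bits[pos] == 1:
            pos += 1                       # run of 1 bits holding the count
    return pos


def delete_from_cell(bits, b, cell, fp):
    sep = find_separator(bits, b, cell)
    counter_start = sep + 1 + b
    # Added safety check (not in the text): refuse to delete from a count of 0.
    if counter_start >= len(bits) or bits[counter_start] != 1:
        raise ValueError("cell count is already 0")
    for j in range(b):
        bits[sep + 1 + j] ^= (fp >> (b - 1 - j)) & 1   # step c): XOR fingerprint
    del bits[counter_start]                            # step d): shift left by one bit


# Three cells with b = 2: counts 0, 2, 1 and fingerprints 00, 10, 01.
bits = [0, 0, 0,   0, 1, 0, 1, 1,   0, 0, 1, 1]
delete_from_cell(bits, 2, 1, 0b10)
print(bits)   # [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
```

Removing the first count bit with `del` is equivalent to shifting every later bit one position to the left. Because XOR is its own inverse, applying h_p(x) again cancels the contribution the element made at insertion.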
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
1. An email address matching method based on a fingerprint-type variable-length Bloom filter, characterized in that the FPVLCBF represents a set with a vector V of length m; each dimension i of V consists of three parts, the first being a separator occupying 1 bit, the second a counter of a bits, and the third a fingerprint field of b bits; the first k hash functions compute hash values as in a traditional Bloom filter, the last hash function computes the fingerprint value of an element, and the FPVLCBF uses an index table to speed up lookups;
the insertion steps of the FPVLCBF are as follows:
a) for the element x to be inserted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits, shift all bits after the fingerprint bits one position to the right, and set the newly added bit to 1;
e) update the index table;
the query steps of the FPVLCBF are as follows:
a) for the queried element x, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0; if not, the element is not in the FPVLCBF; if so, go to step d);
d) check whether all counts corresponding to the hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1; if so, the query succeeds; otherwise go to step e);
e) for each position whose count is 1, check whether its fingerprint bits equal the fingerprint function value h_p(x); if all of them match, the element is in the set; otherwise it is not;
the deletion steps of the FPVLCBF are as follows:
a) for the element x to be deleted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) locate the separator corresponding to each hash value h_i(x), i = 1, 2, 3, …, k: find the last value in the index table that is smaller than h_i(x) and start from the separator (a 0 bit) at the bit position recorded for it; on encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count; repeat until the separator corresponding to the hash value is reached;
c) skip the separator bit, XOR the value of the fingerprint bits with the fingerprint function value h_p(x), and write the result back to the fingerprint bits;
d) skip the fingerprint bits and shift all bits from the second bit after the fingerprint bits onward one position to the left, overwriting the first bit after the fingerprint bits;
e) update the index table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710158175.9A CN108632131B (en) | 2017-03-16 | 2017-03-16 | Email address matching method based on fingerprint type variable-length bloom filter |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710158175.9A CN108632131B (en) | 2017-03-16 | 2017-03-16 | Email address matching method based on fingerprint type variable-length bloom filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108632131A (en) | 2018-10-09 |
CN108632131B (en) | 2020-10-20 |
Family
ID=63686749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710158175.9A Active CN108632131B (en) | 2017-03-16 | 2017-03-16 | Email address matching method based on fingerprint type variable-length bloom filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108632131B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100098081A1 (en) * | 2004-02-09 | 2010-04-22 | Sarang Dharmapurikar | Longest prefix matching for network address lookups using bloom filters |
CN102243657A (en) * | 2011-07-06 | 2011-11-16 | 太原理工大学 | Expandable Bloom Filter method |
US20130218900A1 (en) * | 2010-03-10 | 2013-08-22 | Emc Corporation | Index searching using a bloom filter |
CN104978522A (en) * | 2014-04-10 | 2015-10-14 | 北京启明星辰信息安全技术有限公司 | Method and device for detecting malicious code |
CN105429968A (en) * | 2015-11-06 | 2016-03-23 | 北京数智源科技股份有限公司 | Load ownership network evidence-obtaining method and system based on Bloom filters |
Non-Patent Citations (2)
Title |
---|
"An Efficient Data Fingerprint Query Algorithm Based on Two-Leveled Bloom Filter"; Bin Zhou, et al.; Journal of Multimedia, Vol. 8, No. 2; 2013-04-30; full text * |
"Improving counting Bloom filter performance with fingerprints"; Salvatore Pontarelli, et al.; Information Processing Letters, Volume 116, Issue 4, April 2016, Pages 304-309; 2016-04-30; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN108632131A (en) | 2018-10-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP3276501B1 (en) | Traffic classification method and device, and storage medium | |
EP1419621A1 (en) | Methods and systems for fast packet forwarding | |
CN107431660B (en) | Search device, search method, and recording medium | |
EP3917099A1 (en) | Stream classification method and device | |
US10771386B2 (en) | IP routing search | |
US8271635B2 (en) | Multi-tier, multi-state lookup | |
CN105005586A (en) | Degree feature replacement policy based stream type graph sampling method | |
Song et al. | Packet classification using coarse-grained tuple spaces | |
CN108632131B (en) | Email address matching method based on fingerprint type variable-length bloom filter | |
WO2017065795A1 (en) | Incremental update of a neighbor graph via an orthogonal transform based indexing | |
CN112087389B (en) | Message matching table look-up method, system, storage medium and terminal | |
US20160301658A1 (en) | Method, apparatus, and computer-readable medium for efficient subnet identification | |
US9529835B2 (en) | Online compression for limited sequence length radix tree | |
CN104901947B (en) | One kind is based on TCAM serial numbers matching process and device | |
CN106657128B (en) | Data packet filtering method and device based on wildcard mask rule | |
US9361404B2 (en) | Offline radix tree compression with key sequence skip | |
KR101587756B1 (en) | Apparatus and method for searching string data using bloom filter pre-searching | |
US9355133B2 (en) | Offline compression for limited sequence length radix tree | |
CN106027369A (en) | Email address characteristic oriented email address matching method | |
US10476785B2 (en) | IP routing search | |
US20060041734A1 (en) | Associating mac addresses with addresses in a look-up table | |
US9442927B2 (en) | Offline generation of compressed radix tree with key sequence skip | |
Hsieh et al. | A novel dynamic router-tables design for IP lookup and update | |
CN110493136B (en) | Resource name coding method and device, electronic equipment and storage medium | |
WO2021004543A1 (en) | Range information encoding and matching method, and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||