CN108632131B - Email address matching method based on fingerprint type variable-length bloom filter - Google Patents

Publication number
CN108632131B
Authority
CN
China
Prior art keywords
bit
value
fingerprint
bits
separator
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710158175.9A
Other languages
Chinese (zh)
Other versions
CN108632131A (en)
Inventor
王志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Insec Information Technology Co ltd
Original Assignee
Harbin Insec Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-03-16
Filing date
2017-03-16
Publication date
2020-10-20
2017-03-16 Application filed by Harbin Insec Information Technology Co ltd
2017-03-16 Priority to CN201710158175.9A
2018-10-09 Publication of CN108632131A
2020-10-20 Application granted; publication of CN108632131B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 Mailbox-related aspects, e.g. synchronisation of mailboxes

Abstract

The invention provides a mail address matching method based on a fingerprint-type variable-length counting Bloom filter (FPVLCBF). The FPVLCBF represents a set with a vector V of length m. Each dimension i of V consists of three parts: a separator occupying 1 bit, a counter of a bits, and a fingerprint field of b bits. The first k hash functions compute hash values as in a traditional Bloom filter, and the last hash function computes the fingerprint value of an element. The FPVLCBF also requires an index table to speed up lookups. By combining the idea of fingerprints with that of the counting Bloom filter, the FPVLCBF reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible.

Description

Email address matching method based on fingerprint type variable-length bloom filter
Technical Field
The invention relates to the technical field of mail address filtering, in particular to a mail address matching method based on a fingerprint type variable-length bloom filter.
Background
Mail address matching is a foundational technology for mail gateways. As the internet grows more complex and network traffic increases, the main requirement for mail gateways is higher processing speed and lower memory consumption under heavy traffic.
At present, most mail address matching methods use multi-pattern matching algorithms. Taking the classic AC (Aho-Corasick) algorithm as an example, matching mail addresses with it is efficient in time. However, when the pattern rule set is large, the AC algorithm requires a large amount of memory. In mail address matching, the length of a mail address falls within a basically fixed range, and mail address filtering amounts to testing whether a given mail address belongs to a set of mail addresses. Counting Bloom filters are widely used for such membership tests, but their space complexity is high and so is their false positive rate.
Disclosure of Invention
The invention improves the traditional counting Bloom filter algorithm by combining the idea of fingerprints with that of the counting Bloom filter. It provides a fingerprint-based variable-length counting Bloom filter (FPVLCBF), which reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible. The invention is well suited to matching mail addresses in high-traffic environments.
To solve this technical problem, the invention provides a mail address matching method based on a fingerprint-type variable-length Bloom filter. The FPVLCBF represents a set with a vector V of length m. Each dimension i of V consists of three parts: the first part is a separator occupying 1 bit, the second part is a counter of a bits, and the third part is a fingerprint field of b bits. The first k hash functions compute hash values as in a traditional Bloom filter, the last hash function computes the fingerprint value of an element, and the FPVLCBF requires an index table to speed up lookups.
The insertion procedure for the FPVLCBF is as follows:
a) For the element x to be inserted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Skip the separator bit, XOR the fingerprint field with the fingerprint function value h_p(x), and write the result back to the fingerprint field.
d) Skip the fingerprint field, shift all bits after it one position to the right, and set the bit thus added to 1.
e) Update the index table.
The query procedure for the FPVLCBF is as follows:
a) For the queried element x, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Check whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0. If not, the element is not in the FPVLCBF; if so, go to step d.
d) Check whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1. If so, the query succeeds; otherwise, go to step e.
e) For each position whose count is 1, check whether its fingerprint field equals the fingerprint function value h_p(x). If all of them match, the element is in the set; otherwise it is not.
The deletion procedure for the FPVLCBF is as follows:
a) For the element x to be deleted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Skip the separator bit, XOR the fingerprint field with the fingerprint function value h_p(x), and write the result back to the fingerprint field.
d) Skip the fingerprint field, and shift all bits from the second bit after the fingerprint field onward one position to the left, overwriting the first bit after the fingerprint field.
e) Update the index table.
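The separator search in step b) can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the field order implied by the traversal steps (a 0 separator, then b fingerprint bits, then a counter stored as a run of 1 bits), models the bit array as a Python list, and uses a hypothetical index table that records the bit position of every `stride`-th separator. It also starts from the last indexed cell that is less than or equal to the target, rather than strictly smaller, so that cell 0 is covered. The names `build_index`, `find_separator`, and `stride` are illustrative.

```python
from bisect import bisect_right

def skip_cell(bits, b, pos):
    """From a separator at `pos`, return the position of the next separator."""
    pos += 1 + b                      # skip the separator bit and b fingerprint bits
    while pos < len(bits) and bits[pos] == 1:
        pos += 1                      # skip the run of 1s that encodes the counter
    return pos

def build_index(bits, b, m, stride):
    """Record (cell number, bit position) for every `stride`-th separator."""
    cells, positions, pos = [], [], 0
    for cell in range(m):
        if cell % stride == 0:
            cells.append(cell)
            positions.append(pos)
        pos = skip_cell(bits, b, pos)
    return cells, positions

def find_separator(bits, b, target, index):
    """Return the bit position of the separator of cell `target`."""
    cells, positions = index
    i = bisect_right(cells, target) - 1   # last indexed cell <= target (assumption)
    cell, pos = cells[i], positions[i]
    while cell < target:                  # walk forward, counting separators
        pos = skip_cell(bits, b, pos)
        cell += 1
    return pos

# Four cells with b = 2: (fp=00, count=0), (fp=10, count=2), (fp=11, count=1), (fp=00, count=0)
bits = [0, 0, 0,  0, 1, 0, 1, 1,  0, 1, 1, 1,  0, 0, 0]
index = build_index(bits, b=2, m=4, stride=2)
print(index)                                   # ([0, 2], [0, 8])
print(find_separator(bits, 2, 3, index))       # 12
```

A smaller `stride` shortens the forward walk at the cost of a larger index table; the source does not state how densely the index table is populated.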
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
The invention improves the traditional counting Bloom filter by combining the idea of fingerprints with that of the counting Bloom filter. The resulting fingerprint-based variable-length counting Bloom filter (FPVLCBF) reduces the false positive rate of the traditional counting Bloom filter and improves its space efficiency as far as possible.
Drawings
To illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a comparison table of misjudgment times and misjudgment rates of CBF, VLCBF, FPCBF and FPVLCBF;
FIG. 2 is a graph comparing space consumption for a Count Bloom Filter (CBF), a Variable Length Count Bloom Filter (VLCBF), a fingerprint count bloom filter (FPCBF), and an FPVLCBF;
FIG. 3 is a comparison table of misjudgment times and misjudgment rates of CBF, VLCBF, FPCBF and FPVLCBF;
FIG. 4 is a comparison graph of the false positive rate of the FPVLCBF with different numbers of fingerprint bits.
Detailed Description
For a better understanding of the technical solution, it is described in detail below with reference to the drawings and specific embodiments.
As shown in FIGS. 1-4, the FPVLCBF of this embodiment is initialized as a vector of dimension m in which each element occupies (b + 1) bits, where b is the number of fingerprint bits. The initialized bit array therefore occupies m × (b + 1) bits in total, and all of its bits are set to 0.
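The initialization can be illustrated with a short sketch. It assumes that the (b + 1) bits of an empty element are the separator bit followed by b fingerprint bits, and that the counter is a run of 1 bits appended after the fingerprint, which is empty at initialization. The bit array is modeled as a Python list for readability, and the helper names `init_fpvlcbf` and `decode` are hypothetical.

```python
def init_fpvlcbf(m, b):
    """Return m empty cells: 1 separator bit + b fingerprint bits each, all 0."""
    return [0] * (m * (b + 1))

def decode(bits, b):
    """Return the logical view [(fingerprint, count), ...] of a bit array."""
    cells, pos = [], 0
    while pos < len(bits):
        fp = 0
        for bit in bits[pos + 1 : pos + 1 + b]:   # fingerprint follows the separator
            fp = (fp << 1) | bit
        pos += 1 + b
        count = 0
        while pos < len(bits) and bits[pos] == 1:  # counter is a run of 1s
            count += 1
            pos += 1
        cells.append((fp, count))
    return cells

m, b = 8, 4
bits = init_fpvlcbf(m, b)
print(len(bits))            # 40, i.e. m * (b + 1)
print(decode(bits, b)[:2])  # [(0, 0), (0, 0)]
```

Under these assumptions an empty filter with m = 8 and b = 4 occupies 40 bits, and the array grows by one bit for every counter increment.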
The insertion procedure for the FPVLCBF is as follows:
a) For the element x to be inserted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Skip the separator bit, XOR the fingerprint field with the fingerprint function value h_p(x), and write the result back to the fingerprint field.
d) Skip the fingerprint field, shift all bits after it one position to the right, and set the bit thus added to 1.
e) Update the index table.
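The insertion steps above can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the bit array is a Python list, the separator is found by a plain linear scan (the index table of steps b and e is omitted), and the hash functions are a hypothetical salted SHA-256 construction, since the source does not name them.

```python
import hashlib

def _hash(x, salt, modulus):
    """Hypothetical salted hash; the patent does not specify its hash functions."""
    digest = hashlib.sha256(f"{salt}:{x}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus

def find_separator(bits, b, target):
    """Linear scan for the separator of cell `target` (no index table)."""
    pos = 0
    for _ in range(target):
        pos += 1 + b                              # separator + fingerprint bits
        while pos < len(bits) and bits[pos] == 1:
            pos += 1                              # counter stored as a run of 1s
    return pos

def insert_into_cell(bits, b, cell, fp):
    sep = find_separator(bits, b, cell)
    for j in range(b):                            # step c: XOR h_p(x) into the fingerprint
        bits[sep + 1 + j] ^= (fp >> (b - 1 - j)) & 1
    bits.insert(sep + 1 + b, 1)                   # step d: shift right and add a 1

def insert(bits, m, b, k, x):
    fp = _hash(x, "fp", 1 << b)                   # step a: fingerprint value h_p(x)
    for i in range(1, k + 1):                     # step a: hash values h_1(x) .. h_k(x)
        insert_into_cell(bits, b, _hash(x, str(i), m), fp)

m, b, k = 8, 4, 3
bits = [0] * (m * (b + 1))
insert(bits, m, b, k, "alice@example.com")
print(len(bits))   # 43: each of the k updates lengthened the array by one bit
```

`list.insert` performs the right shift of step d implicitly; a packed bit array would need an explicit shift of everything after the fingerprint field.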
The query procedure for the FPVLCBF is as follows:
a) For the queried element x, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Check whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0. If not, the element is not in the FPVLCBF; if so, go to step d.
d) Check whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1. If so, the query succeeds; otherwise, go to step e.
e) For each position whose count is 1, check whether its fingerprint field equals the fingerprint function value h_p(x). If all of them match, the element is in the set; otherwise it is not.
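The decision logic of query steps c) to e) can be sketched on a decoded, logical view of the filter. The sketch assumes each cell has already been read out as a `(count, fingerprint)` pair and that the k hashed positions and the fingerprint value h_p(x) are given; the function name `query` and this representation are illustrative, not taken from the source.

```python
def query(cells, positions, fp):
    """cells: list of (count, fingerprint); positions: the k hashed cells; fp: h_p(x)."""
    counts = [cells[p][0] for p in positions]
    if any(c == 0 for c in counts):
        return False                  # step c: some count is 0, so x was never inserted
    if all(c > 1 for c in counts):
        return True                   # step d: no cell with count 1 is left to check
    # step e: every cell with count 1 must hold exactly the fingerprint of x
    return all(cells[p][1] == fp for p in positions if cells[p][0] == 1)

cells = [(0, 0b000), (1, 0b101), (2, 0b011), (1, 0b101)]
print(query(cells, [1, 3], 0b101))   # True
print(query(cells, [1, 2], 0b111))   # False
print(query(cells, [0, 1], 0b101))   # False
```

A cell whose count exceeds 1 holds the XOR of several fingerprints, so it cannot be compared with h_p(x) directly. That is why step d falls back to the ordinary counting Bloom filter answer when no cell has a count of exactly 1.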
The deletion procedure for the FPVLCBF is as follows:
a) For the element x to be deleted, compute the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x).
b) For each hash value h_i(x), i = 1, 2, 3, …, k, locate the corresponding separator. To do so, find the last value in the index table that is smaller than h_i(x) and start from the separator 0 at the bit position recorded for it in the index table. On encountering a 0, skip b bits, then skip the run of bits whose value is 1 until the next 0 is found, and add 1 to the separator count. Repeating this always reaches the separator corresponding to the hash value.
c) Skip the separator bit, XOR the fingerprint field with the fingerprint function value h_p(x), and write the result back to the fingerprint field.
d) Skip the fingerprint field, and shift all bits from the second bit after the fingerprint field onward one position to the left, overwriting the first bit after the fingerprint field.
e) Update the index table.
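Deletion steps c) and d) for a single cell can be sketched as below, again with the bit array modeled as a Python list and a linear separator scan in place of the index table. The guard against deleting from a cell whose counter is already zero is an added assumption; the source does not describe that case.

```python
def find_separator(bits, b, target):
    """Linear scan for the separator of cell `target` (no index table)."""
    pos = 0
    for _ in range(target):
        pos += 1 + b                              # separator + fingerprint bits
        while pos < len(bits) and bits[pos] == 1:
            pos += 1                              # counter stored as a run of 1s
    return pos

def delete_from_cell(bits, b, cell, fp):
    sep = find_separator(bits, b, cell)
    first_counter_bit = sep + 1 + b
    # Assumed guard: refuse to delete when the counter is empty.
    if first_counter_bit >= len(bits) or bits[first_counter_bit] != 1:
        raise ValueError("counter is already zero")
    for j in range(b):                            # step c: XOR h_p(x) out of the fingerprint
        bits[sep + 1 + j] ^= (fp >> (b - 1 - j)) & 1
    del bits[first_counter_bit]                   # step d: left shift over the first counter bit

# Two cells with b = 2: (fp=01, count=1) and (fp=10, count=1)
bits = [0, 0, 1, 1,  0, 1, 0, 1]
delete_from_cell(bits, 2, 1, 0b10)
print(bits)   # [0, 0, 1, 1, 0, 0, 0]
delete_from_cell(bits, 2, 0, 0b01)
print(bits)   # [0, 0, 0, 0, 0, 0]
```

Because XOR is its own inverse, deleting an element with the same fingerprint that was inserted restores the fingerprint field, and removing one counter bit restores the array length.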
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A mail address matching method based on a fingerprint-type variable-length Bloom filter, characterized in that an FPVLCBF uses a vector V of length m to represent a set, each dimension i of the vector V consisting of three parts, the first part being a separator occupying 1 bit, the second part being a counter of a bits, and the third part being a fingerprint field of b bits; the first k hash functions are used to compute hash values as in a traditional Bloom filter, the last hash function is used to compute the fingerprint value of an element, and the FPVLCBF requires an index table to speed up lookups;
the insertion procedure for the FPVLCBF is as follows:
a) for the element x to be inserted, computing the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) for each hash value h_i(x), i = 1, 2, 3, …, k, locating the corresponding separator by finding the last value in the index table that is smaller than h_i(x), starting from the separator 0 at the bit position recorded for it in the index table, skipping b bits on encountering a 0, then skipping the run of bits whose value is 1 until the next 0 is found, and adding 1 to the separator count, so that the separator corresponding to the hash value is always reached in this way;
c) skipping the separator bit, XORing the fingerprint field with the fingerprint function value h_p(x), and writing the result back to the fingerprint field;
d) skipping the fingerprint field, shifting all bits after it one position to the right, and setting the bit thus added to 1;
e) updating the index table;
the query procedure for the FPVLCBF is as follows:
a) for the queried element x, computing the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) for each hash value h_i(x), i = 1, 2, 3, …, k, locating the corresponding separator by finding the last value in the index table that is smaller than h_i(x), starting from the separator 0 at the bit position recorded for it in the index table, skipping b bits on encountering a 0, then skipping the run of bits whose value is 1 until the next 0 is found, and adding 1 to the separator count, so that the separator corresponding to the hash value is always reached in this way;
c) checking whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 0; if not, the element is not in the FPVLCBF; if so, going to step d;
d) checking whether the counts for all hash values h_i(x), i = 1, 2, 3, …, k, are greater than 1; if so, the query succeeds; otherwise, going to step e;
e) for each position whose count is 1, checking whether its fingerprint field equals the fingerprint function value h_p(x); if all of them match, the element is in the set; otherwise the element is not in the set;
the deletion procedure for the FPVLCBF is as follows:
a) for the element x to be deleted, computing the hash values h_i(x), i = 1, 2, 3, …, k, and the fingerprint function value h_p(x);
b) for each hash value h_i(x), i = 1, 2, 3, …, k, locating the corresponding separator by finding the last value in the index table that is smaller than h_i(x), starting from the separator 0 at the bit position recorded for it in the index table, skipping b bits on encountering a 0, then skipping the run of bits whose value is 1 until the next 0 is found, and adding 1 to the separator count, so that the separator corresponding to the hash value is always reached in this way;
c) skipping the separator bit, XORing the fingerprint field with the fingerprint function value h_p(x), and writing the result back to the fingerprint field;
d) skipping the fingerprint field, and shifting all bits from the second bit after the fingerprint field onward one position to the left, overwriting the first bit after the fingerprint field;
e) updating the index table.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710158175.9A CN108632131B (en) 2017-03-16 2017-03-16 Email address matching method based on fingerprint type variable-length bloom filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710158175.9A CN108632131B (en) 2017-03-16 2017-03-16 Email address matching method based on fingerprint type variable-length bloom filter

Publications (2)

Publication Number Publication Date
CN108632131A (en) 2018-10-09
CN108632131B (en) 2020-10-20

Family

ID=63686749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710158175.9A Active CN108632131B (en) 2017-03-16 2017-03-16 Email address matching method based on fingerprint type variable-length bloom filter

Country Status (1)

Country Link
CN (1) CN108632131B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100098081A1 (en) * 2004-02-09 2010-04-22 Sarang Dharmapurikar Longest prefix matching for network address lookups using bloom filters
CN102243657A (en) * 2011-07-06 2011-11-16 太原理工大学 Expandable Bloom Filter method
US20130218900A1 (en) * 2010-03-10 2013-08-22 Emc Corporation Index searching using a bloom filter
CN104978522A (en) * 2014-04-10 2015-10-14 北京启明星辰信息安全技术有限公司 Method and device for detecting malicious code
CN105429968A (en) * 2015-11-06 2016-03-23 北京数智源科技股份有限公司 Load ownership network evidence-obtaining method and system based on Bloom filters

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100098081A1 (en) * 2004-02-09 2010-04-22 Sarang Dharmapurikar Longest prefix matching for network address lookups using bloom filters
US20130218900A1 (en) * 2010-03-10 2013-08-22 Emc Corporation Index searching using a bloom filter
CN102243657A (en) * 2011-07-06 2011-11-16 太原理工大学 Expandable Bloom Filter method
CN104978522A (en) * 2014-04-10 2015-10-14 北京启明星辰信息安全技术有限公司 Method and device for detecting malicious code
CN105429968A (en) * 2015-11-06 2016-03-23 北京数智源科技股份有限公司 Load ownership network evidence-obtaining method and system based on Bloom filters

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"An Efficient Data Fingerprint Query Algorithm Based on Two-Leveled Bloom Filter"; Bin Zhou et al.; Journal of Multimedia, Vol. 8, No. 2; 2013-04-30; entire document *
"Improving counting Bloom filter performance with fingerprints"; Salvatore Pontarelli et al.; Information Processing Letters, Volume 116, Issue 4, April 2016, Pages 304-309; 2016-04-30; entire document *

Also Published As

Publication number Publication date
CN108632131A (en) 2018-10-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant