CN107870925B - Character string filtering method and related device - Google Patents

Character string filtering method and related device

Info

Publication number
CN107870925B
Authority
CN
China
Prior art keywords
character string
prefix
string
stored
network node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610850786.5A
Other languages
Chinese (zh)
Other versions
CN107870925A (en)
Inventor
武昊
刘斌
林栋�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Huawei Technologies Co Ltd
Priority to CN201610850786.5A
Publication of CN107870925A
Application granted
Publication of CN107870925B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science
  • Databases & Information Systems
  • Theoretical Computer Science
  • Computational Linguistics
  • Data Mining & Analysis
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Data Exchanges In Wide-Area Networks

Abstract

An embodiment of the invention discloses a character string filtering method and a related device. An interface filtering device is provided for a network node and comprises a first filtering component and a second filtering component, which respectively store character strings and character string prefixes identifying data related to the network node. The interface filtering device receives an acquisition request sent by a data requester before the network node does, and matches the character string to be identified carried in the acquisition request against the stored character strings and character string prefixes. If the character string to be identified matches any stored character string or character string prefix, the data requested by the acquisition request is determined to be related to the network node. The interface filtering device can therefore effectively filter out acquisition requests for data unrelated to the network node, so that, as far as possible, the network node only processes acquisition requests for data related to it, which improves the utilization of the processing resources of the network node.

Description

Character string filtering method and related device
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and a related apparatus for filtering a character string.
Background
In a communication network, data may be identified by a character string, and a network node that stores the data may record the character string corresponding to the stored data. When a data requester wishes to retrieve a piece of data, it may send the network node an acquisition request carrying the character string that identifies the data. The network node receiving the acquisition request can match the character string carried in the acquisition request against the character strings it has recorded; a successful match indicates that the network node is able to provide the data to the data requester.
However, for most requests, the data requester does not send the acquisition request to a designated network node but propagates it, for example, by broadcast, so any network node holding data may receive it. Not every network node stores the requested data, and in fact a large portion of the acquisition requests received by a network node ask for data unrelated to the data that node stores. If the network node performs matching on every received acquisition request, it wastes a large amount of system resources on irrelevant acquisition requests, and its efficiency drops.
A network node therefore needs to filter received acquisition requests, so that requests unrelated to the data it stores are filtered out before processing and system efficiency improves. However, there is currently no effective solution for filtering acquisition requests for network nodes.
Disclosure of Invention
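In order to solve the above technical problem, embodiments of the present invention provide a character string filtering method and a related device, which filter out acquisition requests unrelated to a network node before the network node processes them.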
In a first aspect, the present invention provides a character string filtering method applied to an interface filtering apparatus of a network node, where the interface filtering apparatus at least includes a first filtering component and a second filtering component, a character string used for identifying data related to the network node is stored in the first filtering component, and a character string prefix used for identifying data related to the network node is stored in the second filtering component, and the method includes:
the interface filtering device receives an acquisition request, wherein the acquisition request comprises a character string to be identified of requested data;
the interface filtering device judges whether the character string to be identified is matched with the character string stored by the first filtering component and the character string prefix stored by the second filtering component;
if the character string to be identified is matched with any one of the character strings stored in the first filtering component or any one of the character string prefixes stored in the second filtering component, the interface filtering device identifies the requested data as data related to the network node;
and the interface filtering device uploads the acquisition request to the network node.
Optionally, after the interface filtering apparatus determines whether the character string to be identified matches a character string stored in the first filtering component or a character string prefix stored in the second filtering component, the method further includes:
if the character string to be identified matches neither any character string stored in the first filtering component nor any character string prefix stored in the second filtering component, the interface filtering apparatus identifies the requested data as data unrelated to the network node;
the interface filtering apparatus discards the acquisition request without uploading the acquisition request to the network node.
Optionally, the network node is a network node in a named data networking (NDN) network.
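The decision flow of the first aspect can be illustrated with a minimal Python sketch. It is not the claimed implementation: plain sets stand in for the two filtering components, and the names `InterfaceFilter`, `covers` and `handle`, the "/"-delimited name format, and the component-aligned prefix rule are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


def covers(prefix: str, name: str) -> bool:
    """True if `prefix` is a component-aligned prefix of the "/"-delimited `name`."""
    if prefix == "/":
        return name.startswith("/")
    return name == prefix or name.startswith(prefix.rstrip("/") + "/")


@dataclass
class InterfaceFilter:
    # Assumption: sets replace the filtering components for readability.
    strings: set = field(default_factory=set)    # first filtering component
    prefixes: set = field(default_factory=set)   # second filtering component

    def is_related(self, name: str) -> bool:
        # Related if the name matches any stored string or any stored prefix.
        return name in self.strings or any(covers(p, name) for p in self.prefixes)

    def handle(self, name: str) -> str:
        # "upload" passes the request to the network node; "discard" drops it.
        return "upload" if self.is_related(name) else "discard"


f = InterfaceFilter(strings={"/a/b/3"}, prefixes={"/video"})
print(f.handle("/a/b/3"))        # exact match in the first component
print(f.handle("/video/1/seg"))  # covered by a stored prefix
print(f.handle("/videos/1"))     # "/videos" is not under "/video"
```

Treating "/video" as not covering "/videos/1" is a design choice of this sketch; a purely character-based prefix rule would accept it.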
In a second aspect, the present invention provides an interface filtering apparatus for character string filtering, where the interface filtering apparatus is configured to filter acquisition requests for a corresponding network node, and the interface filtering apparatus at least includes a first filtering component and a second filtering component, where the first filtering component includes a string Bloom filter (BF), and the second filtering component includes a prefix BF;
the string BF is used for storing a string for identifying data related to the network node;
the prefix BF is used for storing a string prefix for identifying data related to the network node;
the interface filtering device is used for receiving an acquisition request which can be received by the network node, wherein the acquisition request comprises a character string to be identified of the requested data;
the string BF and the prefix BF are used for matching the character string to be identified against the stored character strings and character string prefixes; if the character string to be identified matches any character string stored in the string BF or any character string prefix stored in the prefix BF, the interface filtering device is further used for identifying the requested data as data related to the network node and uploading the acquisition request to the network node.
Optionally, if the character string to be identified matches neither any character string stored in the string BF nor any character string prefix stored in the prefix BF, the interface filtering device is further configured to identify the requested data as data unrelated to the network node, and to discard the acquisition request without uploading it to the network node.
Optionally, the first filtering component further includes a string counting Bloom filter (CBF) corresponding to the string BF, and the second filtering component further includes a prefix CBF corresponding to the prefix BF:
the string CBF is used for recording the usage of a first hash table, the first hash table is the hash table corresponding to the first filtering component, and the first hash table is used for recording the hash values of the character strings stored in the string BF;
the prefix CBF is used for recording the usage of a second hash table, the second hash table is the hash table corresponding to the second filtering component, and the second hash table is used for recording the hash values of the character string prefixes stored in the prefix BF.
Optionally, when the character strings stored in the string BF change, the string CBF is further configured to update the usage of the first hash table according to the changed content;
when the character string prefixes stored in the prefix BF change, the prefix CBF is further configured to update the usage of the second hash table according to the changed content.
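The pairing of a BF with a CBF can be sketched as follows, as an illustration rather than the patented structure: the bit array plays the role of the BF that is read on lookup, and the per-slot counters play the role of the CBF that records how many stored items use each slot, so a bit can be cleared safely when the stored content changes. The class name, the parameters `m` and `k`, and the SHA-256 based slot derivation are assumptions of this sketch.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [False] * m   # the BF: consulted when matching a request
        self.counts = [0] * m     # the CBF: usage count of every slot

    def _slots(self, item: str):
        # Derive k slot indices from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for s in self._slots(item):
            self.counts[s] += 1
            self.bits[s] = True

    def remove(self, item: str) -> None:
        # Only call this for items that were previously added.
        for s in self._slots(item):
            self.counts[s] -= 1
            if self.counts[s] == 0:
                self.bits[s] = False

    def __contains__(self, item: str) -> bool:
        return all(self.bits[s] for s in self._slots(item))


cbf = CountingBloomFilter()
cbf.add("/a/b/3")
cbf.add("/a/b/4")
cbf.remove("/a/b/3")
print("/a/b/4" in cbf)  # True: removing one item never clears another's bits
```

Keeping the counters separate from the bit array means the lookup path only needs the compact bit array, while updates are handled through the counters.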
In a third aspect, the present invention provides a regulation and control method for an interface filtering apparatus, which is applied to the interface filtering apparatus, wherein the interface filtering apparatus at least includes a first filtering component and a second filtering component, the first filtering component includes a character string BF, and the second filtering component includes a prefix BF; the character string BF is used for storing character strings for identifying data related to the network node; the prefix BF is used for storing character string prefixes for identifying data related to the network node; the interface filtering apparatus records the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF; and the method comprises the following steps:
calculating the character string false positive degree of the character string BF according to the number of the character strings stored in the character string BF, wherein the character string false positive degree identifies the probability that the character string BF erroneously determines, as matching, a character string to be identified that matches none of the character strings stored in the character string BF;
if the character string false positive degree exceeds a first preset threshold, deleting a first character string meeting a first length condition from the character string BF, wherein the first character string is a character string stored in the character string BF;
and saving the first character string as a character string prefix into the prefix BF.
Optionally, after the calculating, according to the number of the character strings stored in the character string BF, the character string false positive degree of the character string BF, the method further includes:
if the character string false positive degree is lower than a second preset threshold, deleting a first character string prefix meeting a second length condition from the prefix BF, wherein the first character string prefix is a character string prefix stored in the prefix BF;
and saving the first character string prefix as a character string into the character string BF.
Optionally, after the deleting the first character string prefix meeting the second length condition from the prefix BF, the method further includes:
storing the character string prefixes subordinate to the first character string prefix into the prefix BF according to the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF.
Optionally, the method further includes:
calculating the prefix false positive degree of the prefix BF according to the number of the character string prefixes stored in the prefix BF, wherein the prefix false positive degree identifies the probability that the prefix BF erroneously determines, as matching, a character string to be identified that matches none of the character string prefixes stored in the prefix BF;
if the prefix false positive degree exceeds a third preset threshold, determining a plurality of character string prefixes meeting a first length condition, wherein the plurality of character string prefixes belong to the same common prefix;
merging the plurality of character string prefixes into the common prefix, and deleting the plurality of character string prefixes from the prefix BF.
Optionally, after the calculating, according to the number of the character string prefixes stored in the prefix BF, the prefix false positive degree of the prefix BF, the method further includes:
if the prefix false positive degree is lower than a fourth preset threshold, deleting a second character string prefix meeting a second length condition from the prefix BF, wherein the second character string prefix is a character string prefix stored in the prefix BF;
and saving the second character string prefix as a character string into the character string BF.
Optionally, after the deleting the second character string prefix meeting the second length condition from the prefix BF, the method further includes:
storing the character string prefixes subordinate to the second character string prefix into the prefix BF according to the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF.
Optionally, the first length condition is having the maximum length among the character strings stored in the character string BF.
Optionally, the second length condition is having the minimum length among the character string prefixes stored in the prefix BF.
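The threshold-driven regulation of the third aspect can be sketched as below. The text only states that the false positive degree is calculated from the number of stored items; this sketch assumes the standard Bloom filter estimate p = (1 - e^(-kn/m))^k for n items, m bits and k hash functions. Sets again stand in for the two BFs, and `regulate`, `high` and `low` are hypothetical names for the regulation step and the first and second preset thresholds.

```python
import math


def bloom_false_positive(n: int, m: int, k: int) -> float:
    """Standard estimate of a Bloom filter's false-positive probability."""
    return (1.0 - math.exp(-k * n / m)) ** k


def regulate(strings: set, prefixes: set, m: int, k: int,
             high: float, low: float) -> None:
    """Move one item between the two stores depending on the string false positive degree."""
    p = bloom_false_positive(len(strings), m, k)
    if p > high and strings:
        longest = max(strings, key=len)     # first length condition: maximum length
        strings.remove(longest)
        prefixes.add(longest)               # saved as a string prefix
    elif p < low and prefixes:
        shortest = min(prefixes, key=len)   # second length condition: minimum length
        prefixes.remove(shortest)
        strings.add(shortest)               # saved as a string


strings, prefixes = {"/a", "/a/b", "/a/b/3"}, set()
regulate(strings, prefixes, m=8, k=2, high=0.1, low=0.01)
print(sorted(strings), sorted(prefixes))  # ['/a', '/a/b'] ['/a/b/3']
```

The sketch omits the optional step of restoring subordinate prefixes from the recorded association relationship.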
In a fourth aspect, the present invention provides a character string filtering apparatus applied to an interface filtering apparatus of a network node, where the interface filtering apparatus at least includes a first filtering component and a second filtering component, a character string used for identifying data related to the network node is stored in the first filtering component, a character string prefix used for identifying data related to the network node is stored in the second filtering component, and the character string filtering apparatus includes a receiving unit, a determining unit, a first identifying unit, and a sending unit:
the receiving unit is used for receiving an acquisition request, wherein the acquisition request comprises a character string to be identified of the requested data;
the judging unit is used for judging whether the character string to be identified is matched with the character string stored by the first filtering component and the character string prefix stored by the second filtering component; if the character string to be identified is matched with any one of the character strings stored in the first filtering component or any one of the character string prefixes stored in the second filtering component, triggering the first identification unit;
the first identification unit is used for identifying the requested data as data related to the network node;
the sending unit is configured to upload the acquisition request to the network node.
Optionally, the character string filtering apparatus further comprises a second identifying unit and a discarding unit:
if the character string to be identified matches neither any character string stored in the first filtering component nor any character string prefix stored in the second filtering component, the judging unit triggers the second identifying unit;
the second identification unit is used for identifying the requested data as data irrelevant to the network node;
the discarding unit is configured to discard the acquisition request without uploading the acquisition request to the network node.
Optionally, the network node is in an NDN.
In a fifth aspect, the present invention provides a regulation and control device for an interface filtering device, which is applied to the interface filtering device, wherein the interface filtering device at least includes a first filtering component and a second filtering component, the first filtering component includes a character string BF, and the second filtering component includes a prefix BF; the character string BF is used for storing character strings for identifying data related to the network node; the prefix BF is used for storing character string prefixes for identifying data related to the network node; the interface filtering device records the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF; and the regulation and control device comprises a calculation unit, a deleting unit and a storage unit:
the calculation unit is used for calculating the character string false positive degree of the character string BF according to the number of the character strings stored in the character string BF, wherein the character string false positive degree identifies the probability that the character string BF erroneously determines, as matching, a character string to be identified that matches none of the character strings stored in the character string BF; if the character string false positive degree exceeds a first preset threshold, the calculation unit triggers the deleting unit;
the deleting unit is used for deleting a first character string meeting a first length condition from the character string BF, wherein the first character string is a character string stored in the character string BF;
and the storage unit is used for storing the first character string as a character string prefix into the prefix BF.
Optionally, if the character string false positive degree is lower than a second preset threshold, the calculation unit triggers the deleting unit, and the deleting unit is further configured to delete a first character string prefix meeting a second length condition from the prefix BF, where the first character string prefix is a character string prefix stored in the prefix BF;
the storage unit is further configured to store the first character string prefix as a character string in the character string BF.
Optionally, the storage unit is further configured to store the character string prefixes subordinate to the first character string prefix into the prefix BF according to the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF.
Optionally, the regulation and control device further includes a determining unit and a merging unit:
the calculation unit is further configured to calculate the prefix false positive degree of the prefix BF according to the number of the character string prefixes stored in the prefix BF, where the prefix false positive degree identifies the probability that the prefix BF erroneously determines, as matching, a character string to be identified that matches none of the character string prefixes stored in the prefix BF; if the prefix false positive degree exceeds a third preset threshold, the calculation unit triggers the determining unit;
the determining unit is configured to determine a plurality of character string prefixes meeting a first length condition, where the plurality of character string prefixes belong to the same common prefix;
the merging unit is configured to merge the plurality of character string prefixes into the common prefix, and delete the plurality of character string prefixes from the prefix BF.
Optionally, if the prefix false positive degree is lower than a fourth preset threshold, the calculation unit triggers the deleting unit, and the deleting unit is further configured to delete a second character string prefix meeting a second length condition from the prefix BF, where the second character string prefix is a character string prefix stored in the prefix BF;
the storage unit is further configured to store the second character string prefix as a character string in the character string BF.
Optionally, the storage unit is further configured to store the character string prefixes subordinate to the second character string prefix into the prefix BF according to the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF.
Optionally, the first length condition is having the maximum length among the character strings stored in the character string BF.
Optionally, the second length condition is having the minimum length among the character string prefixes stored in the prefix BF.
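The behaviour of the determining unit and the merging unit can be illustrated with the following sketch, under stated assumptions: prefixes are "/"-delimited, length is measured in name components, the longest prefixes are the merge candidates, and "belonging to the same common prefix" is read as sharing the same parent. The function names `components` and `merge_siblings` are hypothetical.

```python
from collections import defaultdict


def components(name: str) -> list:
    """Split a "/"-delimited name into its non-empty components."""
    return [c for c in name.split("/") if c]


def merge_siblings(prefixes: set):
    """Replace the largest group of longest prefixes sharing a parent with that parent.

    Returns the common prefix that was stored, or None if nothing was merged.
    """
    if not prefixes:
        return None
    depth = max(len(components(p)) for p in prefixes)
    groups = defaultdict(list)
    for p in prefixes:
        parts = components(p)
        if len(parts) == depth:                       # assumed first length condition
            groups["/" + "/".join(parts[:-1])].append(p)
    parent, members = max(groups.items(), key=lambda kv: len(kv[1]))
    if len(members) < 2:
        return None                                   # nothing to merge
    prefixes.difference_update(members)               # delete the merged prefixes
    prefixes.add(parent)                              # keep only their common prefix
    return parent


stored = {"/a/b/1", "/a/b/2", "/c"}
print(merge_siblings(stored), sorted(stored))  # /a/b ['/a/b', '/c']
```

Merging reduces the number of stored prefixes at the cost of coarser matching: after the merge, any name under "/a/b" matches, not only names under "/a/b/1" and "/a/b/2".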
It can be seen from the above technical solutions that an interface filtering apparatus is provided for a network node, and that the interface filtering apparatus includes a first filtering component and a second filtering component, which respectively store character strings and character string prefixes identifying data related to the network node. The interface filtering apparatus can receive an acquisition request sent by a data requester before the network node does, and match the character string to be identified carried in the acquisition request against the stored character strings and character string prefixes. If the character string to be identified matches any character string or character string prefix in the interface filtering apparatus, it can be determined that the data requested by the acquisition request is related to the network node, and the acquisition request can be uploaded to the network node for processing. The interface filtering apparatus can thus effectively filter out acquisition requests for data unrelated to the network node and ensure, as far as possible, that the data requested by the acquisition requests the network node needs to process is related to the network node, thereby improving both the utilization of the network node's processing resources and its processing efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for filtering a string according to an embodiment of the present invention;
Fig. 2 is a device structure diagram of an interface filtering device for character string filtering according to an embodiment of the present invention;
Fig. 3 is a hardware structure diagram of an NDN-NIC architecture according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for controlling an interface filtering apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating a regulation and control of an interface filtering apparatus in an NDN scenario according to an embodiment of the present invention;
Fig. 6 is a schematic diagram illustrating a regulation and control of an interface filtering apparatus in an NDN scenario according to an embodiment of the present invention;
Fig. 7 is a flowchart of a method for controlling an interface filtering apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic diagram illustrating a regulation and control of an interface filtering apparatus in an NDN scenario according to an embodiment of the present invention;
Fig. 9a is a schematic diagram illustrating adjustment and control between BF-CS and BF-FIB in an NDN scenario according to an embodiment of the present invention;
Fig. 9b is a schematic diagram illustrating adjustment and control between BF-CS and BF-FIB in an NDN scenario according to an embodiment of the present invention;
Fig. 9c is a schematic diagram illustrating adjustment and control between BF-CS and BF-FIB in an NDN scenario according to an embodiment of the present invention;
Fig. 9d is a schematic diagram illustrating adjustment and control between BF-CS and BF-FIB in an NDN scenario according to an embodiment of the present invention;
Fig. 10 is a block diagram of a string filter according to an embodiment of the present invention;
Fig. 11 is a device structure diagram of a regulation and control device for an interface filtering device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a communication network, data may be identified by a character string. For example, in named data networking (NDN), content (Data) is identified by a name (i.e., a character string) rather than by the address of the network node storing the content, i.e., the destination host. When a data requester (Consumer) wishes to obtain a piece of data, it may send an acquisition request carrying the character string that identifies the data. The acquisition request has no designated receiver; for example, in an NDN, the acquisition request (Interest) sent by a content requester is propagated over a shared medium (for example, a Wi-Fi environment) without specifying a receiver. The acquisition request may be received by any network node on the shared medium, such as a terminal (host) storing data. However, not every received acquisition request asks for data related to the receiving network node. In some scenarios, more than ninety percent of the acquisition requests received by a network node request data unrelated to that node, i.e., the data the network node can provide is not the data requested. Processing these unrelated acquisition requests wastes a large amount of the network node's system resources (CPU computation, table lookups, memory accesses, etc.), which hinders the adoption of communication networks that identify data by character strings.
A network node therefore needs to filter received acquisition requests, so that requests unrelated to the data it stores are filtered out before processing and system efficiency improves. However, there is currently no effective solution for filtering acquisition requests for network nodes.
To this end, an embodiment of the present invention provides a character string filtering method and a related apparatus. An interface filtering apparatus is provided for a network node and includes a first filtering component and a second filtering component, which respectively store character strings and character string prefixes identifying data related to the network node. The interface filtering apparatus can receive an acquisition request sent by a data requester before the network node does, and match the character string to be identified carried in the acquisition request against the stored character strings and character string prefixes. If the character string to be identified matches any character string or character string prefix in the interface filtering apparatus, the data requested by the acquisition request is determined to be related to the network node, and the acquisition request is uploaded to the network node for processing. The interface filtering apparatus can thus effectively filter out acquisition requests for data unrelated to the network node and ensure, as far as possible, that the data requested by the acquisition requests the network node needs to process is related to the network node, thereby improving both the utilization of the network node's processing resources and its processing efficiency.
The interface filtering device provided by the embodiment of the invention may correspond one-to-one with a network node and be arranged on the interface between the network node and the network, so that it receives an acquisition request before the network node does. The interface filtering device filters the acquisition requests: requests for data unrelated to the network node are screened out and are not passed to the network node for processing, which avoids wasting the network node's system resources. In the NDN scenario, the interface filtering device may be disposed on the Network Interface Card (NIC) of a terminal in the NDN, forming an interface filtering device with an NDN-NIC architecture. The task of identifying and filtering acquisition requests is completed on the NIC hardware, so that, as far as possible, all acquisition requests passed up to the software protocol stack of the network node are requests the network node can handle.
Next, an implementation is first described in which the interface filtering device configured for the network node is used to reduce the network node's processing of irrelevant acquisition requests.
Before introducing the filtering method flow, an interface filtering apparatus provided in an embodiment of the present invention is first introduced, where the interface filtering apparatus provides a filtering function of an acquisition request for a corresponding network node. The interface filtering device at least comprises a first filtering component and a second filtering component, wherein a character string used for identifying data related to the network node is stored in the first filtering component, and a character string prefix used for identifying the data related to the network node is stored in the second filtering component.
By way of example, a string associated with a network node may be understood as a string of data associated with the network node. In the present invention, a character string identifies data, and may be, for example, the name of a file or content, or an ID or other identifier of data. The data related to the network node may include not only data stored in the network node but also data that is not stored on the network node but can be quickly acquired by it, such as data stored in a neighbor node of the network node. The string associated with the network node may be a string identifying such data, and the string prefix associated with the network node may be a portion of such a string that begins at the start of the string. For example, if the string of a piece of data is "/a/b/3", the starting point of the string is the leftmost "/", and the string prefixes of the string may include "/", "/a/b", etc. A string prefix matches all strings that begin with it; for example, the string prefix "/" matches all strings that begin with "/", such as "/a" and "/b/1".
It can be seen that the string corresponds to the full name of the data, while the string prefix corresponds to the partial name of the data.
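The relation between a string and its string prefixes can be illustrated with a short sketch. It assumes, as the examples in this description suggest but do not state explicitly, that prefixes are taken at "/" component boundaries and that the full name counts as its own longest prefix. The function name `name_prefixes` is hypothetical.

```python
def name_prefixes(name):
    """Return the component-wise prefixes of a '/'-delimited name, shortest first.

    Assumption: prefixes end at '/' boundaries, and the full name is
    included as its own longest prefix.
    """
    parts = [p for p in name.split("/") if p]
    prefixes = ["/"]
    for i in range(1, len(parts) + 1):
        prefixes.append("/" + "/".join(parts[:i]))
    return prefixes


print(name_prefixes("/a/b/3"))  # ['/', '/a', '/a/b', '/a/b/3']
```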
In the context of the NDN, the string stored in the first filtering component may be a Content name (string of data) stored in a Content Store (CS) of a network node of the NDN. The string prefix stored in the second filtering component may be a name prefix (string prefix) stored in a Forwarding Information Base (FIB) or a Pending Interest Table (PIT) of the network node.
Besides the first filtering component and the second filtering component, the interface filtering apparatus may further include other filtering components, for example, other filtering components that store character string prefixes.
Fig. 1 is a flowchart of a method of filtering a character string according to an embodiment of the present invention, where the method is applied to an interface filtering apparatus of a network node, and in a selectable application scenario, the network node is a network node in an NDN, and the method includes:
S101: The interface filtering device receives an acquisition request, wherein the acquisition request comprises a character string to be identified of the requested data.
For example, the interface filtering apparatus may receive an acquisition request from the network before the network node does. The acquisition request requests data, and the data has a character string used for identification. Because the interface filtering apparatus has not yet determined whether the data is related to the network node, the character string of the data, which is the basis for that determination, is referred to as the character string to be recognized.
S102: The interface filtering device judges whether the character string to be identified matches a character string stored by the first filtering component or a character string prefix stored by the second filtering component. If the character string to be identified matches any one of the character strings stored in the first filtering component or any one of the character string prefixes stored in the second filtering component, S103 is executed.
For example, the character string to be recognized may be matched against the first filtering component and the second filtering component sequentially, or against both at the same time. Taking the character string to be recognized "/a/b/3" as an example, if the character strings stored in the first filtering component include "/a/b/3", then "/a/b/3" is considered to match a character string in the first filtering component. If the string prefixes stored in the second filtering component include one of "/a/b", "/a" and "/", then "/a/b/3" is also considered to match a string prefix stored in the second filtering component.
S103: the interface filtering means identifies the requested data as data associated with the network node.
S104: The interface filtering device uploads the acquisition request to the network node.
When the data corresponding to the character string to be recognized is confirmed to be the data related to the network node through the character string to be recognized, the acquisition request can be uploaded to the network node, and the network node performs subsequent processing on the acquisition request.
It should be noted that, when performing the matching in S102, if the character string to be identified matches neither any character string stored in the first filtering component nor any character string prefix stored in the second filtering component, the interface filtering apparatus identifies the requested data as data unrelated to the network node. That is, even if the acquisition request were provided to the network node, the network node could not provide the data to the sender of the acquisition request. In this case, the interface filtering apparatus may discard the acquisition request without uploading it to the network node.
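The flow S101 to S104, together with the discard branch, can be sketched as follows. This is only an illustration: plain Python sets stand in for the two filtering components, prefixes are assumed to end at "/" boundaries, and the names `InterfaceFilter`, `is_related` and `handle` are hypothetical.

```python
def name_prefixes(name):
    """Component-wise prefixes of a '/'-delimited name (assumed convention)."""
    parts = [p for p in name.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class InterfaceFilter:
    """Sketch of the interface filtering apparatus using exact sets."""

    def __init__(self, strings, prefixes):
        self.strings = set(strings)    # first filtering component
        self.prefixes = set(prefixes)  # second filtering component

    def is_related(self, name):
        # S102: match against stored strings, then against stored prefixes.
        if name in self.strings:
            return True
        return any(p in self.prefixes for p in name_prefixes(name))

    def handle(self, name):
        # S103/S104: upload related requests; otherwise discard.
        return "upload" if self.is_related(name) else "discard"


nic = InterfaceFilter(strings={"/a/b/3"}, prefixes={"/c"})
print(nic.handle("/a/b/3"))  # upload
print(nic.handle("/c/1"))    # upload
print(nic.handle("/a/b/4"))  # discard
```

Exact sets never produce false positives, so the sketch shows only the decision logic, not the behaviour of the Bloom filters used in the apparatus.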
It can be seen that an interface filtering device is provided for a network node, and the interface filtering device comprises a first filtering component and a second filtering component that store the character strings and character string prefixes of data related to the network node. The interface filtering device can receive an acquisition request sent by a data requesting party before the network node does, and match the character string to be identified carried in the acquisition request against the stored character strings and character string prefixes. If the character string to be identified matches any character string or character string prefix in the interface filtering device, the device can determine that the data requested in the acquisition request is related to the network node and upload the acquisition request to the network node for processing. The interface filtering device can thus effectively filter out acquisition requests for data unrelated to the network node, so that the data requested by the acquisition requests to be processed by the network node is related to the network node as far as possible, which improves the utilization rate of the processing resources of the network node and its processing efficiency.
The interface filtering device is described next from the perspective of the device. As shown in fig. 2, the interface filtering apparatus 200 is configured to filter acquisition requests for the corresponding network node. The interface filtering apparatus 200 at least includes a first filtering component 210 and a second filtering component 220, where the first filtering component 210 includes a string Bloom Filter (BF) 211, and the second filtering component 220 includes a prefix BF 221.
A BF is a data structure for presence detection (membership query) and is widely used for flow classification and lookup structures in networks. Data stored in a BF is recorded by setting positions in the hash table of the BF. The character string BF and the prefix BF in the embodiment of the invention are both BFs.
The string BF211 is used to hold strings for identifying data related to said network node.
The prefix BF221 is used to hold a string prefix that identifies data related to the network node.
The interface filtering apparatus 200 is configured to receive an acquisition request that can be received by the network node, where the acquisition request includes a character string to be identified of the requested data.
The character string BF211 and the prefix BF221 are used for matching the character string to be identified against the stored character strings and character string prefixes. If the character string to be identified matches any one of the character strings stored in the character string BF or any one of the character string prefixes stored in the prefix BF, the interface filtering device is further used for identifying the requested data as data related to the network node and uploading the acquisition request to the network node.
Optionally, if the character string to be identified matches neither any character string stored in the character string BF nor any character string prefix stored in the prefix BF, the interface filtering device is further configured to identify the requested data as data unrelated to the network node, and to discard the acquisition request without uploading it to the network node.
Optionally, the first filtering component 210 further includes a string Counting Bloom Filter (CBF) 212 corresponding to the string BF211, and the second filtering component 220 further includes a prefix CBF222 corresponding to the prefix BF 221.
The character string CBF212 is configured to record a use condition of a first hash table, where the first hash table is a hash table corresponding to the first filtering component, and the first hash table is configured to record a hash value of a character string stored in the character string BF 211.
The prefix CBF222 is configured to record a use condition of a second hash table, where the second hash table is a hash table corresponding to the second filtering component, and the second hash table is configured to record a hash value of a string prefix stored in the prefix BF 221.
It should be noted that a CBF is also a kind of BF.
In a BF, stored character strings are recorded through a hash table. Taking the character string BF211 as an example, when the character string BF211 stores a character string, the hash values corresponding to the character string are recorded at several positions in the first hash table. When the character string BF211 needs to match a character string to be recognized, hash calculation is performed on the character string to be recognized, and the obtained hash values are compared with the first hash table. If every position in the first hash table that corresponds to a hash value of the character string to be recognized has a record, the character string BF211 considers that the character string to be recognized matches a character string stored in itself; if one or several of the corresponding positions have no record, the character string BF211 considers that the character string to be recognized does not match any character string stored in itself. The character string CBF212 corresponding to the character string BF211 may be used to record the usage of the first hash table, for example, which positions in the first hash table are recorded and how many times each is recorded. For example, the hash values of the string 1 stored in the string BF211 are located at positions a1, b2 and c2 in the first hash table, and the hash values of the string 2 are located at positions a1, b5 and c8 in the first hash table, so the usage of the first hash table recorded in the string CBF212 may be that the position a1 is recorded twice, and the positions b2, c2, b5 and c8 are each recorded once.
The prefix BF221 may also perform hash calculation on the stored string prefix, and record the obtained hash value in the second hash table, and similarly, the record of the prefix CBF222 on the second hash table is similar to the record of the string CBF on the first hash table, which is not described herein again.
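The storing and matching behaviour of a BF described above can be sketched as follows. This is a minimal illustration, not the hash scheme of the embodiment: the use of SHA-256 with a per-hash index as salt, the table size `m` and the number of hashes `k` are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch with m positions and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # the hash table of the BF

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index
        # (an illustrative choice, not taken from the source).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Storing an item records all of its positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # A match requires every corresponding position to be recorded.
        return all(self.bits[pos] for pos in self._positions(item))


string_bf = BloomFilter(m=1024, k=3)
string_bf.add("/a/b/3")
print("/a/b/3" in string_bf)  # True: a stored string always matches
```

A stored item always matches, whereas an item that was never stored may still match if all of its positions happen to be recorded by other items.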
Optionally, when the character string stored in the character string BF211 is changed, the character string CBF212 is further configured to update the usage of the first hash table according to the change.
When the character string prefix stored in the prefix BF221 is changed, the prefix CBF222 is further configured to update the usage of the second hash table according to the change.
For example, changing the character string stored in the character string BF211 may include deleting an existing character string, may include storing a new character string, and may include modifying an existing character string. The character string stored in the character string BF211 may be changed actively, or may be changed correspondingly according to the change of the data related to the network node, for example, the network node deletes the originally stored data a, and the character string BF211 also needs to delete the character string of the originally stored data a correspondingly, so as to improve the filtering accuracy.
When the character string stored in the character string BF211 is changed, the character string CBF212 is further configured to update the usage of the first hash table according to the changed content, for example, the hash value of the character string 1 stored in the character string BF211 is located at positions a1, b2, and c2 in the first hash table, and the hash value of the character string 2 is located at positions a1, b5, and c8 in the first hash table, so that the usage of the first hash table recorded by the character string CBF212 may be that the position a1 is recorded twice, and the positions b2, c2, b5, and c8 are recorded once. If the string 2 is deleted, the string CBF212 may subtract 1 from the record values corresponding to the positions a1, b5, and c8 in the first hash table, and the usage of the first hash table may be that the positions a1, b2, and c2 are recorded once.
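The counting example above can be reproduced with a small sketch. Position labels such as "a1" are taken from the example in the text. The class name `CountingTable` and its methods are hypothetical, and a real CBF would derive positions from hash values instead of receiving them directly.

```python
from collections import Counter


class CountingTable:
    """Sketch of the usage record a CBF keeps for a hash table."""

    def __init__(self):
        self.counts = Counter()  # position label -> number of records

    def add(self, positions):
        for p in positions:
            self.counts[p] += 1

    def remove(self, positions):
        for p in positions:
            if self.counts[p] <= 0:
                raise ValueError(f"position {p} has no record to remove")
            self.counts[p] -= 1

    def recorded_positions(self):
        # Positions whose count is above zero are the ones set in the BF.
        return {p for p, c in self.counts.items() if c > 0}


cbf = CountingTable()
cbf.add(["a1", "b2", "c2"])     # string 1
cbf.add(["a1", "b5", "c8"])     # string 2
print(cbf.counts["a1"])         # 2
cbf.remove(["a1", "b5", "c8"])  # delete string 2
print(sorted(cbf.recorded_positions()))  # ['a1', 'b2', 'c2']
```

Because position a1 is shared by both strings, its count drops from two to one when string 2 is deleted, so string 1 still matches afterwards.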
To explain with a specific application scenario: when the network node is in the NDN, the interface filtering apparatus may use an NDN-NIC architecture, whose structure may be as shown in fig. 3. In the NDN-NIC architecture, the interface filtering apparatus is composed of a software part, the NDN-NIC driver, and a hardware part, the NDN-NIC hardware. Three filters are constructed in the NDN-NIC driver using Counting Bloom Filters and are implemented in software; the NDN-NIC hardware implements three corresponding filters based on Bloom Filters, which correspond one-to-one to the three CBFs in the driver. That is, BF-CS corresponds to CBF-CS, BF-FIB corresponds to CBF-FIB, and BF-PIT corresponds to CBF-PIT.
When the NDN-NIC is updated, the CBF in software is updated first. The CBF records which positions in the corresponding hash table change (e.g., 0->1 and 1->0). Subsequently, in the BF, the bits that change 0->1 in the CBF are updated first, and then the bits that change 1->0 are updated. Thus, the hardware can continue to process received acquisition requests during the update.
After receiving the acquisition request, the NDN-NIC can directly perform matching operation in the BF of hardware without passing through the CBF in driver.
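The update ordering described above (bits changing from 0 to 1 first, bits changing from 1 to 0 afterwards) can be sketched as follows. The function names and the representation of a BF as a set of recorded positions are assumptions made for illustration. The observation in the closing note is one plausible reading of why the ordering helps, and is not stated in the source.

```python
def plan_bf_update(old_bits, new_bits):
    """Order bit changes so that every 0->1 change precedes every 1->0 change."""
    set_steps = [("set", b) for b in sorted(new_bits - old_bits)]
    clear_steps = [("clear", b) for b in sorted(old_bits - new_bits)]
    return set_steps + clear_steps


def apply_steps(bits, steps):
    """Apply the steps one by one; return a snapshot of the bits after each step."""
    current = set(bits)
    snapshots = []
    for action, position in steps:
        if action == "set":
            current.add(position)
        else:
            current.discard(position)
        snapshots.append(set(current))
    return snapshots


old = {1, 2, 3}  # positions recorded before the update
new = {2, 3, 4}  # positions recorded after the update
steps = plan_bf_update(old, new)
snapshots = apply_steps(old, steps)
print(steps)  # [('set', 4), ('clear', 1)]
```

With this ordering, all new positions are already recorded before any old position is cleared, so items stored both before and after the update match at every intermediate step.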
It can be seen that an interface filtering device is provided for a network node. The interface filtering device at least comprises a first filtering component and a second filtering component; the first filtering component comprises a character string BF, and the second filtering component comprises a prefix BF. The character string BF is used for storing character strings that identify data related to the network node, and the prefix BF is used for storing character string prefixes that identify data related to the network node. The interface filtering device can receive an acquisition request sent by a data requesting party before the network node does, and match the character string to be identified carried in the acquisition request against the character strings stored in the character string BF and the character string prefixes stored in the prefix BF. It thereby effectively filters out acquisition requests for data irrelevant to the network node, so that the data requested by the acquisition requests to be processed by the network node is related to the network node as far as possible, which improves the utilization rate of the processing resources of the network node and its processing efficiency.
The filtering of acquisition requests is implemented by the BFs in the interface filtering device arranged for the network node. Because the storage space of a BF is limited, when the amount of stored data is large, the cases in which a character string to be recognized that should not match is erroneously determined to match increase greatly; this situation is called a false positive. False positives cause many acquisition requests for data unrelated to the network node to pass the interface filtering device unfiltered and be uploaded to the network node for processing, so that the interface filtering device cannot work properly.
As an example of a false positive, after a large number of character strings have been stored in the character string BF, most positions in the first hash table may already be recorded. In this case, when the character string BF matches a character string to be recognized, the positions in the first hash table that correspond to the hash values of the character string to be recognized are likely to have been recorded by the hash values of other character strings stored in the character string BF. Even if the character string to be recognized does not match any character string stored in the character string BF, the character string BF still determines that it matches a stored character string, because the corresponding positions in the first hash table are all recorded.
The degree of false positive of a BF increases exponentially with the amount of data stored in this BF, and it can be seen that the amount of data stored in BF has a significant effect on false positive.
The degree of false positive in a BF can be calculated using the following formula:
$$\text{degree of false positive} = \left(1 - e^{-kn/m}\right)^{k}$$
Wherein k is the number of positions recorded in the hash table by the hash value of one data, m is the number of positions in the hash table, and n is the number of data stored in BF.
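The formula can be evaluated directly. The sketch below uses the symbols k, m and n as defined above. The function name `bf_false_positive` and the sample parameter values are illustrative only.

```python
import math


def bf_false_positive(k, m, n):
    """Degree of false positive (1 - e^(-k*n/m))^k of a BF.

    k: positions recorded per stored item
    m: number of positions in the hash table
    n: number of items stored in the BF
    """
    return (1.0 - math.exp(-k * n / m)) ** k


# The degree of false positive grows as more items are stored in the same table.
for n in (100, 500, 1000, 2000):
    print(n, round(bf_false_positive(k=3, m=4096, n=n), 4))
```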
In general, the number of character strings to be stored in the character string BF in the interface filtering apparatus is much greater than the number of character string prefixes to be stored in the prefix BF. The character string BF is therefore more likely to suffer from character string false positives, that is, it is more likely to erroneously determine as a match a character string to be recognized that does not match any of the character strings stored in the character string BF.
In order to solve the problem caused by high BF false positives, especially the impact on the character string BF, the embodiment of the present invention provides a method for controlling an interface filtering apparatus. The method is applied to the interface filtering apparatus, which may be the interface filtering apparatus mentioned in the embodiments corresponding to fig. 1 and fig. 2. The embodiment of the present invention mainly provides three operations, namely, the move operation (Transform), the merge operation (Aggregation) and the rollback operation (Reversion), which are described below.
The interface filtering device at least comprises a first filtering component and a second filtering component, wherein the first filtering component comprises a character string BF, and the second filtering component comprises a prefix BF; the string BF is used for storing a string for identifying data related to the network node; the prefix BF is used to hold a string prefix that identifies data associated with the network node.
The interface filtering device records the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF. The association may be recorded, for example, by a tree structure, and the nodes in the tree structure may be characters in a character string.
Fig. 4 is a flowchart of a method for controlling an interface filtering apparatus according to an embodiment of the present invention, where the method includes:
S401: Calculate the degree of character string false positive of the character string BF according to the number of character strings stored in the character string BF, wherein the character string false positive identifies the probability that the character string BF erroneously determines, as a match, a character string to be recognized that does not match any character string stored in the character string BF. If the character string false positive exceeds a first preset threshold, S402 is executed.
For example, the string false positive and the prefix false positive mentioned in this embodiment are both a type of BF false positive, that is, the string false positive is a false positive for string BF, and the prefix false positive is a false positive for prefix BF.
Since it is clear that the degree of false positives is related to the number of saved data in BF, the degree of string false positives can be calculated from the number of strings in the string BF. If the false positive of the character string is high, for example, exceeds a first preset threshold (for example, 25%), then the effect of matching the character string using the character string BF is poor, and the character string to be recognized of a lot of data irrelevant to the network node is erroneously determined as matching, so that a large number of irrelevant acquisition requests are uploaded to the network node, and a processing load is imposed on the network node.
In order to reduce character string false positives, the most direct way is to reduce the number of character strings stored in the character string BF, so it is necessary to determine a suitable character string to delete from the character string BF.
S402: Delete a first character string meeting a first length condition from the character string BF, wherein the first character string is a character string stored in the character string BF.
S403: Save the first character string as a character string prefix into the prefix BF.
For example, it is necessary to determine the character string for deletion in consideration of not having an excessive influence on the filtering accuracy of the character string BF, and also in consideration of not having an excessive influence on the filtering accuracy of the prefix BF when the character string becomes a character string prefix because the deleted character string is stored as a character string prefix in the prefix BF. Therefore, in the embodiment of the present invention, the length of the character string is used as the determination condition, and when a character string is long, even if the character string is used as the prefix of the character string, the range of the character string that can be matched with the prefix of the character string is not too large, and the filter precision of the prefix BF is not greatly affected.
The operation of deleting a character string from the character string BF and storing it as a character string prefix in the prefix BF may be referred to as a move operation (Transform). The move operation is mainly applied to the character string BF. The effect on the filtering accuracy of the prefix BF that results from converting a character string into a character string prefix may be referred to as a prefix matching false positive (prefix match false positive). For example, when "/a/b/3" is a character string, the character string to be recognized "/a/b/3/1" does not match it. When "/a/b/3" is a character string prefix, however, the character string to be recognized "/a/b/3/1", which originally could not be matched, matches the character string prefix "/a/b/3". Using a character string as a character string prefix can therefore match some character strings to be recognized that originally could not be matched. Such a character string to be recognized corresponds to data that is not related to the network node, so an acquisition request carrying it should not be uploaded to the network node. Once the character string "/a/b/3" is used as a character string prefix, however, an acquisition request carrying the character string to be recognized "/a/b/3/1" is uploaded to the network node, which places an unnecessary processing burden on the network node. It can be seen that prefix matching false positives need to be considered when performing move operations.
Therefore, a character string whose character string length meets the first length condition can be determined from the character string BF as the character string removed from the character string BF.
The length of the character string described herein may be determined according to the number of characters included in the character string. The larger the number of characters of the character string, the longer the length is relatively.
Optionally, the first length condition is the maximum value of the lengths of the strings stored in the string BF. That is, the first character string may be the longest character string in the character string BF, which has the least impact on the prefix matching false positives of the prefix BF.
With reference to the specific NDN application scenario, as shown in fig. 5, a CS controller (CS-controller) first sends LGT_CS to query the longest entry x (i.e., the longest string) stored in a name tree, where the name tree records the association relationship between the strings stored in the BF-CS and the string prefixes stored in the BF-FIB.
DEL_CS is then sent to delete the queried x from the BF-CS. Next, the message ADD_FIB for adding x is sent to the FIB controller (FIB-controller), and LBL_FIB is sent to update the state of the x node in the name tree (CS->FIB). Finally, ADD_BF_FIB is sent to the BF-FIB to add x, which completes the Transform operation for one table entry.
It can be seen that, through the calculation of the number of the stored character strings in the character string BF and the degree of the character string false positives of the character string BF, when the character string false positives exceed a first preset threshold, a first character string meeting a first length condition in the character string BF is deleted from the character string BF, so that the number of the stored character strings in the character string BF is reduced, thereby effectively reducing the degree of the false positives in the character string BF, improving the filtering efficiency of the character string BF, and meanwhile, in order to improve the filtering accuracy, the first character string can be stored as a character string prefix in the prefix BF, and the prefix BF realizes a matching function for the first character string through the prefix formed by the first character string.
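A single move operation (S401 to S403) can be sketched as follows. Plain sets stand in for the string BF and the prefix BF. The 25% threshold is the example value mentioned above, and the function names and the tie-breaking rule between equally long strings are assumptions.

```python
import math


def bf_false_positive(k, m, n):
    """Degree of false positive of a BF holding n items (formula given above)."""
    return (1.0 - math.exp(-k * n / m)) ** k


def transform_once(strings, prefixes, k, m, first_threshold=0.25):
    """Move the longest stored string to the prefix set if the string
    false positive exceeds the first preset threshold.

    Returns the moved string, or None if no move was needed.
    """
    if not strings or bf_false_positive(k, m, len(strings)) <= first_threshold:
        return None
    # First length condition: the longest string (ties broken alphabetically).
    longest = max(strings, key=lambda s: (len(s), s))
    strings.remove(longest)  # S402: delete from the string BF
    prefixes.add(longest)    # S403: save as a string prefix
    return longest


strings = {"/a", "/a/b", "/a/b/3"}
prefixes = set()
print(transform_once(strings, prefixes, k=1, m=4))  # /a/b/3
```

The deliberately tiny table (m=4) makes three stored strings exceed the threshold. A caller that wants to bring the false positive below the threshold could invoke the function repeatedly, which is a design choice left open here.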
Optionally, for S401, if the false positive of the character string is lower than a second preset threshold, the first character string prefix meeting a second length condition may be deleted from the prefix BF, where the first character string prefix is a character string prefix stored in the prefix BF.
The first character string prefix is then saved as a character string into the character string BF.
For example, when the string false positive of the string BF is low (e.g., 10%), the number of strings in the string BF may be increased accordingly, so as to use the string BF efficiently.
To determine the first string prefix from the prefix BF, the present invention provides a rollback operation (Reversion), in which a string prefix is selected from the prefix BF and moved to the string BF as a string, and a plurality of string prefixes subordinate to the moved string prefix are added to the prefix BF. The rollback operation is mainly applied to the prefix BF.
When the first character string prefix is determined from the prefix BF, the prefix matching false positive of the prefix BF can be considered: selecting a shorter character string prefix effectively reduces the prefix matching false positives in the prefix BF. The second length condition may therefore favor shorter lengths; optionally, the second length condition is the minimum value of the lengths of the character string prefixes stored in the prefix BF.
Further, after the first string prefix meeting the second length condition is removed from the prefix BF, the method further includes:
storing the character string prefixes subordinate to the first character string prefix into the prefix BF according to the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF.
For example, deleting the first string prefix and adding a plurality of string prefixes subordinate to it can reduce the influence of the deletion on the filtering accuracy of the prefix BF.
For example, after the character string prefix "/a/b/3" is removed from the prefix BF, other possible combinations subordinate to "/a/b/3", such as "/a/b/3/1" and "/a/b/3/b", may be determined from the association relationship between the character strings stored in the character string BF and the character string prefixes stored in the prefix BF. "/a/b/3/1" and "/a/b/3/b" may then be added to the prefix BF to offset, to some extent, the influence of the deletion of the character string prefix "/a/b/3" on the filtering accuracy of the prefix BF.
The following description refers to a specific NDN application scenario, as shown in fig. 6. Because of the concept of prefix matching false positives, the NDN-NIC must balance the two kinds of false positives and keep the total false positive at a minimum. The NDN-NIC sets upper-bound and lower-bound thresholds for BF false positives. When the false positive of the BF is lower than the lower-bound threshold, a rollback operation is used to expand some of the BF-FIB elements so as to reduce prefix matching false positives. Each rollback operation may select one string prefix maintained by the BF-FIB, convert it into a corresponding string in the BF-CS, and then add the other string prefixes subordinate to the selected string prefix to the BF-FIB.
The rollback operation execution process is shown in fig. 6. The FIB-controller first sends an SRT_FIB message to query the shortest prefix x that has a plurality of successors (x1, x2), and then sends an ADD_CS message to the CS-controller to add the character string x. The CS-controller adds the string x to the BF-CS with the ADD_BF_CS command and updates the state of x in the name tree with an LBL_CS message (at this time, if x is found not to exist in the original CS, the x that was just added is deleted). On the other end, the BF-controller deletes x from the BF-FIB using the DEL_BF_FIB command, adds x1 and x2 to the BF-FIB using ADD_BF_FIB, and finally uses RMV_LBL messages to record in the name tree the new state of x1 and x2 as character string prefixes in the BF-FIB.
Besides the false positives of the character string BF, attention also needs to be paid to the number of character string prefixes in the prefix BF: if the number of character string prefixes is too large, the prefix BF suffers from the same false positive problem as the character string BF.
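One rollback step can be sketched as follows. Plain sets stand in for the BF-CS and BF-FIB, the `successors` mapping stands in for the name tree, and `cs_names` stands in for the names actually present in the CS. All of these names are hypothetical, and the tie-breaking between equally short prefixes is an assumption.

```python
def reversion_once(strings, prefixes, successors, cs_names):
    """Replace the shortest prefix that has several successors by those
    successors, and move the prefix itself to the string set.

    Returns the selected prefix x, or None if no prefix qualifies.
    """
    candidates = [p for p in prefixes if len(successors.get(p, ())) >= 2]
    if not candidates:
        return None
    # Second length condition: the shortest prefix (ties broken alphabetically).
    x = min(candidates, key=lambda p: (len(p), p))
    prefixes.remove(x)               # delete x from the prefix set
    prefixes.update(successors[x])   # add the subordinate prefixes x1, x2, ...
    if x in cs_names:                # keep x as a string only if the CS holds it
        strings.add(x)
    return x


strings = set()
prefixes = {"/a/b/3", "/c/long/name"}
successors = {"/a/b/3": ["/a/b/3/1", "/a/b/3/b"]}
print(reversion_once(strings, prefixes, successors, cs_names={"/a/b/3"}))  # /a/b/3
print(sorted(prefixes))  # ['/a/b/3/1', '/a/b/3/b', '/c/long/name']
```

The sketch adds x to the string set only when the CS holds it, which has the same net effect as adding x and then deleting it when it is found not to exist in the original CS.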
On the basis of the embodiment corresponding to fig. 4, fig. 7 is a flowchart of a method for controlling an interface filtering apparatus according to an embodiment of the present invention, where the method includes:
S701: Calculate the degree of prefix false positive of the prefix BF according to the number of character string prefixes stored in the prefix BF, wherein the prefix false positive identifies the probability that the prefix BF erroneously determines, as a match, a character string to be identified that does not match any character string prefix stored in the prefix BF. If the prefix false positive exceeds a third preset threshold, S702 is executed; if the prefix false positive is lower than a fourth preset threshold, S704 is executed.
For example, the way of calculating prefix false positives is consistent with the way of calculating false positives of BF in the conventional way, and is not described herein again. Through the third preset threshold and the fourth preset threshold, the prefix false positive of the prefix BF can be controlled within a certain range, the influence of overhigh prefix false positive on the filtering accuracy of the prefix BF is avoided, and the inefficient use of overlow prefix false positive on the prefix BF is also avoided. For convenience of calculation, the third preset threshold may be the same as the first preset threshold, and the fourth preset threshold may be the same as the second preset threshold.
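For illustration only, the check in S701 can be sketched as follows. The sketch assumes the classical Bloom filter estimate p = (1 - e^(-kn/m))^k for a filter of m bits with k hash functions holding n items; the function names, the parameters `m_bits` and `k_hashes`, and the threshold values used in the example are hypothetical and are not taken from the embodiment.

```python
import math


def bf_false_positive(n_items, m_bits, k_hashes):
    """Classical estimate of a Bloom filter's false-positive probability."""
    if n_items == 0:
        return 0.0
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def next_step_for_prefix_bf(n_prefixes, m_bits, k_hashes,
                            third_threshold, fourth_threshold):
    """Return the step that follows S701 (assumed mapping of outcomes)."""
    p = bf_false_positive(n_prefixes, m_bits, k_hashes)
    if p > third_threshold:
        return "S702"   # too many prefixes stored: merge
    if p < fourth_threshold:
        return "S704"   # prefix BF under-used: roll back
    return "none"       # within the configured range


# Hypothetical sizes and thresholds, chosen only to exercise each branch.
for n in (1, 100, 1000):
    print(n, next_step_for_prefix_bf(n, 1024, 3, 0.1, 0.001))
```

Because the estimate grows monotonically with the number of stored items, comparing it against two thresholds amounts to keeping the number of stored string prefixes within a band.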
S702: Determine a plurality of string prefixes that meet a first length condition, where the plurality of string prefixes belong to the same common prefix.
For example, the first length condition may favor longer string prefixes. Because a longer string prefix covers fewer strings than a shorter one, merging longer string prefixes introduces less prefix-matching false positive into the prefix BF.
S703: Merge the plurality of string prefixes into the common prefix, and delete the plurality of string prefixes from the prefix BF.
This operation, in which two string prefixes are merged into the common prefix they share and the two string prefixes are deleted, is the merge (aggregation) operation. The merge operation is mainly applied to the prefix BF.
Taking the merging of two string prefixes as an example, the string prefixes "/a/c/3" and "/a/c/b" both belong to the same common prefix "/a/c", which is the part shared by the plurality of string prefixes. Through the merge operation, the string prefixes "/a/c/3" and "/a/c/b" are merged into the string prefix "/a/c": the prefix BF deletes the originally stored string prefixes "/a/c/3" and "/a/c/b" and stores the string prefix "/a/c".
The merge operation effectively reduces the number of string prefixes in the prefix BF and thereby reduces the prefix false positive of the prefix BF.
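As a non-limiting sketch, the merge of S702 and S703 can be expressed as below. A plain Python set stands in for the contents of the prefix BF, length is measured in name components, and ties are broken alphabetically; these choices and the function names are assumptions made for the example.

```python
from collections import defaultdict


def parent(prefix):
    """Return the prefix one component shorter, e.g. '/a/c/3' -> '/a/c'."""
    return prefix.rsplit("/", 1)[0]


def merge_longest_siblings(prefix_set):
    """Merge the longest stored prefixes that share a common prefix."""
    groups = defaultdict(list)
    for p in prefix_set:
        groups[parent(p)].append(p)
    # Only common prefixes with at least two stored successors qualify.
    candidates = {c: kids for c, kids in groups.items()
                  if c and len(kids) >= 2}
    if not candidates:
        return set(prefix_set)
    # Deepest common prefix first, so the merged prefixes are the longest.
    common = max(candidates, key=lambda c: (c.count("/"), c))
    return (set(prefix_set) - set(candidates[common])) | {common}


print(sorted(merge_longest_siblings({"/a/c/3", "/a/c/b", "/a/d"})))
```

With the prefixes from the example above, "/a/c/3" and "/a/c/b" are replaced by "/a/c", while the unrelated prefix "/a/d" is left untouched.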
It should be noted that the rollback operation may be triggered by a low prefix false positive of the prefix BF, in addition to a low string false positive of the string BF.
S704: Delete a second string prefix that meets a second length condition from the prefix BF, where the second string prefix is a string prefix stored in the prefix BF.
S705: Save the second string prefix as a string into the string BF.
S706: Store the string prefixes subordinate to the second string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
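Steps S704 to S706 can be sketched as follows, purely as an illustration. Python sets stand in for the string BF and the prefix BF, and a dictionary named `children` stands in for the recorded association relationship (the name tree); these structures, the function name, and the use of component count as the length measure are assumptions of the sketch.

```python
def rollback(string_set, prefix_set, children):
    """Move the shortest stored prefix back to the string set (S704-S706)."""
    if not prefix_set:
        return None
    # Second length condition: the shortest stored prefix
    # (ties broken alphabetically, an assumption of this sketch).
    x = min(prefix_set, key=lambda p: (p.count("/"), p))
    prefix_set.remove(x)                      # S704
    string_set.add(x)                         # S705
    prefix_set.update(children.get(x, []))    # S706
    return x


strings = {"/A/a/2"}
prefixes = {"/A/c", "/A/b/2"}
name_tree = {"/A/c": ["/A/c/1", "/A/c/2"]}
print(rollback(strings, prefixes, name_tree))
print(sorted(strings), sorted(prefixes))
```

In this example the prefix "/A/c" is returned to the string set and its two subordinate prefixes take its place in the prefix set.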
This is described below with reference to a specific NDN application scenario, as shown in FIG. 8. First, the FIB-controller sends an LGT_ADD_SET message to query the name tree for the set of BF-FIB string prefixes (x1, x2) that has the longest common prefix (x). It then deletes x1 and x2 from the BF-FIB with DEL_BF_FIB messages and adds x with ADD_BF_FIB. Finally, it sends RMV_LBL and LBL_FIB messages to update the states of x, x1, and x2 in the name tree.
Next, how the three operations described above are used in combination is further explained through a specific implementation in the NDN scenario. FIG. 9a shows the strings stored in the BF-CS and the string prefixes stored in the BF-FIB.
When the string "/A/a/2" is newly added to the BF-CS, the number of strings in the BF-CS becomes 7. At this point the string false positive is greater than the first preset threshold, and a move operation is required. The longest string in the BF-CS, "/A/b/2", is selected and stored in the BF-FIB as a string prefix. The BF-CS and BF-FIB after the move operation may be as shown in FIG. 9b.
At this point the prefix false positive of the BF-FIB is greater than the third preset threshold, so the two longest string prefixes in the BF-FIB that share a common prefix, "/A/c/1" and "/A/c/2", are selected for a merge operation. The BF-CS and BF-FIB after the merge operation may be as shown in FIG. 9c.
After the BF-CS deletes the two strings "/A/a/1" and "/A/b/1", the string false positive is smaller than the second preset threshold. At this point the string prefix "/A/c" in the BF-FIB (the shortest string prefix in the BF-FIB) is moved to the BF-CS as a string, and the two successor string prefixes of "/A/c", namely "/A/c/1" and "/A/c/2", are put back into the BF-FIB. The BF-CS and BF-FIB after the rollback operation may be as shown in FIG. 9d.
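The way the four preset thresholds select among the move, merge, and rollback operations in the walkthrough above can be summarized by the following sketch. The function name, the order in which the checks are made, and the numeric values are assumptions; only the mapping from each threshold to its operation is taken from the description.

```python
def choose_operations(string_fp, prefix_fp, t1, t2, t3, t4):
    """Pick operations from the two false-positive degrees.

    t1/t2 are the first/second preset thresholds (string BF),
    t3/t4 are the third/fourth preset thresholds (prefix BF).
    """
    ops = []
    if string_fp > t1:
        ops.append("move")        # string BF too full
    elif string_fp < t2:
        ops.append("rollback")    # string BF under-used
    if prefix_fp > t3:
        ops.append("merge")       # prefix BF too full
    elif prefix_fp < t4 and "rollback" not in ops:
        ops.append("rollback")    # prefix BF under-used
    return ops


# Hypothetical values: string BF over the first threshold only.
print(choose_operations(0.2, 0.05, 0.1, 0.01, 0.1, 0.01))
```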
FIG. 10 is a structural diagram of a character string filtering device according to an embodiment of the present invention. The device is applied to an interface filtering device of a network node, where the interface filtering device includes at least a first filtering component and a second filtering component, a character string for identifying data related to the network node is stored in the first filtering component, and a character string prefix for identifying data related to the network node is stored in the second filtering component. The character string filtering device 1000 includes a receiving unit 1001, a judging unit 1002, a first identifying unit 1003, and a sending unit 1004:
the receiving unit 1001 is configured to receive an acquisition request, where the acquisition request includes a to-be-identified character string of requested data;
the judging unit 1002 is configured to judge whether the character string to be identified matches the character string stored in the first filtering component and the character string prefix stored in the second filtering component; if the character string to be identified matches any of the character strings stored in the first filtering component or any of the character string prefixes stored in the second filtering component, the first identifying unit 1003 is triggered;
the first identifying unit 1003 is configured to identify the requested data as data related to the network node;
the sending unit 1004 is configured to upload the obtaining request to the network node.
Optionally, the character string filtering device further includes a second identifying unit and a discarding unit:
if the character string to be identified matches neither the character strings stored in the first filtering component nor the character string prefixes stored in the second filtering component, the judging unit triggers the second identifying unit;
the second identification unit is used for identifying the requested data as data irrelevant to the network node;
the discarding unit is configured to discard the acquisition request without uploading the acquisition request to the network node.
Optionally, the network node is in a named data networking (NDN) network.
It can be seen that an interface filtering device is provided for a network node, and the interface filtering device includes a first filtering component and a second filtering component that store the character strings and character string prefixes of data related to the network node. The interface filtering device can receive an acquisition request sent by a data requester before the network node does, and match the character string to be identified carried in the acquisition request against the stored character strings and character string prefixes. If the character string to be identified matches any character string or character string prefix in the interface filtering device, it can be determined that the data requested by the acquisition request is related to the network node, and the acquisition request can be uploaded to the network node for processing. The interface filtering device can therefore effectively filter out acquisition requests for data unrelated to the network node and ensure, as far as possible, that the acquisition requests processed by the network node request data related to the network node, which improves the utilization of the network node's processing resources and the processing efficiency of the network node.
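The filtering decision summarized above can be sketched as follows, for illustration only. The small `BloomFilter` class, its SHA-256 based hashing, the sizes `m_bits` and `k_hashes`, and the way every component-wise prefix of the requested name is tested against the second filtering component are all assumptions of the sketch rather than details of the embodiment.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (one byte per bit, for clarity)."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def prefixes_of(name):
    """'/A/c/2' -> ['/A', '/A/c', '/A/c/2']"""
    parts = name.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def should_upload(name, string_bf, prefix_bf):
    """True: upload the request to the network node; False: discard it."""
    if name in string_bf:                      # first filtering component
        return True
    return any(p in prefix_bf for p in prefixes_of(name))  # second component


string_bf, prefix_bf = BloomFilter(), BloomFilter()
string_bf.add("/A/a/1")
prefix_bf.add("/A/c")
for requested in ("/A/a/1", "/A/c/2", "/B/x/9"):
    print(requested, should_upload(requested, string_bf, prefix_bf))
```

A request for "/A/a/1" matches a stored string and a request for "/A/c/2" matches the stored prefix "/A/c", so both are uploaded. A request for an unrelated name such as "/B/x/9" is expected to be discarded, barring a Bloom filter false positive.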
FIG. 11 is a structural diagram of a control device for an interface filtering device. The control device is applied to the interface filtering device, and the interface filtering device includes at least a first filtering component and a second filtering component, where the first filtering component includes a string BF and the second filtering component includes a prefix BF; the string BF is used for storing a string for identifying data related to the network node; the prefix BF is used for storing a string prefix for identifying data related to the network node; the interface filtering device records an association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF. The control device 1100 includes a calculation unit 1101, a deleting unit 1102, and a storing unit 1103:
the calculation unit 1101 is configured to calculate, according to the number of strings stored in the string BF, the degree of string false positive of the string BF, where the string false positive is the probability that the string BF erroneously determines, as a match, a string to be identified that matches none of the strings stored in the string BF; if the string false positive exceeds a first preset threshold, the deleting unit 1102 is triggered;
the deleting unit 1102 is configured to delete a first character string meeting a first length condition from the character string BF, where the first character string is a character string stored in the character string BF;
the storing unit 1103 is configured to store the first character string as a character string prefix in the prefix BF.
Optionally, if the string false positive is lower than a second preset threshold, the calculation unit triggers the deleting unit, and the deleting unit is further configured to delete a first string prefix meeting a second length condition from the prefix BF, where the first string prefix is a string prefix stored in the prefix BF;
the storing unit is further configured to store the first string prefix as a string in the string BF.
Optionally, the storing unit is further configured to store the string prefixes subordinate to the first string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
Optionally, the control device further includes a determining unit and a merging unit:
the calculation unit is further configured to calculate, according to the number of string prefixes stored in the prefix BF, the degree of prefix false positive of the prefix BF, where the prefix false positive is the probability that the prefix BF erroneously determines, as a match, a string to be identified that matches none of the string prefixes stored in the prefix BF; if the prefix false positive exceeds a third preset threshold, the determining unit is triggered;
the determining unit is configured to determine a plurality of character string prefixes meeting a first length condition, where the plurality of character string prefixes belong to the same common prefix;
the merging unit is configured to merge the plurality of character string prefixes into the common prefix, and delete the plurality of character string prefixes from the prefix BF.
Optionally, if the prefix false positive is lower than a fourth preset threshold, the calculation unit triggers the deleting unit, and the deleting unit is further configured to delete a second string prefix meeting a second length condition from the prefix BF, where the second string prefix is a string prefix stored in the prefix BF;
the storing unit is further configured to store the second string prefix as a string in the string BF.
Optionally, the storing unit is further configured to store the string prefixes subordinate to the second string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
Optionally, the first length condition is having the maximum length among the strings stored in the string BF.
Optionally, the second length condition is having the minimum length among the string prefixes stored in the prefix BF.
It can be seen that the degree of string false positive of the string BF is calculated from the number of strings stored in the string BF. When the string false positive exceeds the first preset threshold, a first string meeting the first length condition is deleted from the string BF, which reduces the number of strings stored in the string BF, effectively lowers the degree of false positive of the string BF, and improves the filtering efficiency of the string BF. Meanwhile, to preserve filtering accuracy, the first string can be stored in the prefix BF as a string prefix, so that the prefix BF provides the matching function for the first string through the string prefix formed from it.
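The move operation summarized above can be sketched as follows, as an illustration under stated assumptions: Python sets stand in for the string BF and the prefix BF, length is measured in name components, ties are broken alphabetically, and the function name is hypothetical.

```python
def move_longest_string(string_set, prefix_set):
    """Move the longest stored string into the prefix set as a prefix."""
    if not string_set:
        return None
    # First length condition: the longest stored string
    # (ties broken alphabetically, an assumption of this sketch).
    first = max(string_set, key=lambda s: (s.count("/"), s))
    string_set.remove(first)   # delete the first string from the string BF
    prefix_set.add(first)      # store it in the prefix BF as a string prefix
    return first


strings = {"/A", "/A/a", "/A/b/2"}
prefixes = set()
print(move_longest_string(strings, prefixes))
print(sorted(strings), sorted(prefixes))
```

After the call, the string set holds one fewer entry, and a later request for "/A/b/2" would still be matched through the prefix set.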
The word "first" in the first filtering component, the first hash table, the first preset threshold, the first length condition, the first string, and the first string prefix mentioned in the embodiments of the present invention is used only as a name identifier and does not denote the first in a sequence. The same rule applies to "second", "third", and "fourth".
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (26)

1. A string filtering method applied to an interface filtering apparatus of a network node, the interface filtering apparatus including at least a first filtering component and a second filtering component, a string identifying data related to the network node being stored in the first filtering component, and a string prefix identifying data related to the network node being stored in the second filtering component, the method comprising:
the interface filtering device receives an acquisition request, wherein the acquisition request comprises a character string to be identified of requested data;
the interface filtering device judges whether the character string to be identified is matched with the character string stored by the first filtering component and the character string prefix stored by the second filtering component;
if the character string to be identified is matched with any one of the character strings stored in the first filtering component or any one of the character string prefixes stored in the second filtering component, the interface filtering device identifies the requested data as data related to the network node;
and the interface filtering device uploads the acquisition request to the network node.
2. The method according to claim 1, wherein the interface filtering device judges whether the character string to be identified matches the character string stored in the first filtering component and the character string prefix stored in the second filtering component, and the method further includes:
if the character string to be identified matches neither the character strings stored in the first filtering component nor the character string prefixes stored in the second filtering component, the interface filtering device identifies the requested data as data unrelated to the network node;
the interface filtering means discards the acquisition request without uploading the acquisition request to the network node.
3. The method according to claim 1 or 2, characterized in that the network node is in a named data networking (NDN) network.
4. An interface filtering apparatus for string filtering, wherein the interface filtering apparatus is configured to filter an acquisition request for a corresponding network node, and the interface filtering apparatus at least includes a first filtering component and a second filtering component, where the first filtering component includes a string bloom filter BF, and the second filtering component includes a prefix BF;
the string BF is used for storing a string for identifying data related to the network node;
the prefix BF is used for storing a string prefix for identifying data related to the network node;
the interface filtering device is used for receiving an acquisition request which can be received by the network node, wherein the acquisition request comprises a character string to be identified of the requested data;
the character string BF and the prefix BF are used for matching the character string to be identified according to the stored character string and the character string prefix, and if the character string to be identified is matched with any one of the character string stored in the character string BF or matched with any one of the character string prefix stored in the prefix BF, the interface filtering device is also used for identifying the requested data as data related to the network node; and uploading the acquisition request to the network node.
5. The interface filtering device according to claim 4, wherein if the string to be identified does not match the string stored in the string BF and the prefix stored in the prefix BF, the interface filtering device is further configured to identify the requested data as data unrelated to the network node; and discarding the acquisition request without uploading the acquisition request to the network node.
6. The interface filtering apparatus according to claim 4, wherein the first filtering component further includes a string counting Bloom filter (CBF) corresponding to the string BF, and the second filtering component further includes a prefix CBF corresponding to the prefix BF:
the character string CBF is used for recording the use condition of a first hash table, the first hash table is a hash table corresponding to the first filtering component, and the first hash table is used for recording the hash value of the character string stored in the character string BF;
the prefix CBF is used for recording the use condition of a second hash table, the second hash table is a hash table corresponding to the second filtering component, and the second hash table is used for recording the hash value of the character string prefix stored by the prefix BF.
7. The interface filtering apparatus according to claim 6, wherein when a character string stored in the character string BF is changed, the character string CBF is further configured to update the usage of the first hash table according to the change content;
when the character string prefix stored in the prefix BF is changed, the prefix CBF is also used for updating the use condition of the second hash table according to the change content.
8. A control method for an interface filtering device, characterized in that the method is applied to the interface filtering device, the interface filtering device includes at least a first filtering component and a second filtering component, the first filtering component includes a string Bloom filter (BF), and the second filtering component includes a prefix BF; the string BF is used for storing a string for identifying data related to the network node; the prefix BF is used for storing a string prefix for identifying data related to the network node; the interface filtering device records the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF, and the method comprises the following steps:
calculating the degree of string false positive of the string BF according to the number of strings stored in the string BF, wherein the string false positive is the probability that the string BF erroneously determines, as a match, a string to be identified that matches none of the strings stored in the string BF;
if the false positive of the character string exceeds a first preset threshold value, deleting a first character string meeting a first length condition from the character string BF, wherein the first character string is a character string stored in the character string BF;
and saving the first character string as a character string prefix into the prefix BF.
9. The control method according to claim 8, wherein said calculating the degree of string false positive of the string BF according to the number of strings stored in the string BF comprises:
if the false positive of the character string is lower than a second preset threshold value, deleting a first character string prefix meeting a second length condition from the prefix BF, wherein the first character string prefix is a character string prefix stored in the prefix BF;
and saving the first character string prefix as a character string into the character string BF.
10. The control method according to claim 9, further comprising, after the removing the first string prefix meeting the second length condition from the prefix BF:
and storing the string prefixes subordinate to the first string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
11. The control method according to claim 8, further comprising:
calculating the degree of prefix false positive of the prefix BF according to the number of string prefixes stored in the prefix BF, wherein the prefix false positive is the probability that the prefix BF erroneously determines, as a match, a string to be identified that matches none of the string prefixes stored in the prefix BF;
if the prefix false positive exceeds a third preset threshold, determining a plurality of character string prefixes meeting a first length condition, wherein the character string prefixes belong to the same common prefix;
merging the plurality of character string prefixes into the common prefix, and deleting the plurality of character string prefixes from the prefix BF.
12. The method according to claim 11, wherein the calculating the degree of prefix false positive of the prefix BF according to the number of the stored string prefixes in the prefix BF comprises:
if the prefix false positive is lower than a fourth preset threshold, deleting a second character string prefix meeting a second length condition from the prefix BF, wherein the second character string prefix is a character string prefix stored in the prefix BF;
and saving the second character string prefix as a character string into the character string BF.
13. The control method according to claim 12, further comprising, after the removing the second string prefix meeting the second length condition from the prefix BF:
and storing the string prefixes subordinate to the second string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
14. The control method according to claim 8, wherein the first length condition is having the maximum length among the strings stored in the string BF.
15. The control method according to claim 9, wherein the second length condition is having the minimum length among the string prefixes stored in the prefix BF.
16. A character string filtering apparatus applied to an interface filtering apparatus of a network node, the interface filtering apparatus including at least a first filtering component and a second filtering component, a character string for identifying data related to the network node being stored in the first filtering component, a character string prefix for identifying data related to the network node being stored in the second filtering component, the character string filtering apparatus including a receiving unit, a judging unit, a first identifying unit, and a sending unit:
the receiving unit is used for receiving an acquisition request, wherein the acquisition request comprises a character string to be identified of the requested data;
the judging unit is used for judging whether the character string to be identified is matched with the character string stored by the first filtering component and the character string prefix stored by the second filtering component; if the character string to be identified is matched with any one of the character strings stored in the first filtering component or any one of the character string prefixes stored in the second filtering component, triggering the first identification unit;
the first identification unit is used for identifying the requested data as data related to the network node;
the sending unit is configured to upload the acquisition request to the network node.
17. The character string filtering apparatus according to claim 16, further comprising a second identifying unit and a discarding unit:
if the character string to be identified matches neither the character strings stored in the first filtering component nor the character string prefixes stored in the second filtering component, the judging unit triggers the second identifying unit;
the second identification unit is used for identifying the requested data as data irrelevant to the network node;
the discarding unit is configured to discard the acquisition request without uploading the acquisition request to the network node.
18. The character string filtering apparatus according to claim 16 or 17, characterized in that the network node is in a named data networking (NDN) network.
19. A control device for an interface filtering device, characterized in that the control device is applied to the interface filtering device, the interface filtering device includes at least a first filtering component and a second filtering component, the first filtering component includes a string Bloom filter (BF), and the second filtering component includes a prefix BF; the string BF is used for storing a string for identifying data related to the network node; the prefix BF is used for storing a string prefix for identifying data related to the network node; the interface filtering device records the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF, and the control device comprises a calculation unit, a deleting unit, and a storage unit:
the calculation unit is used for calculating the degree of string false positive of the string BF according to the number of strings stored in the string BF, wherein the string false positive is the probability that the string BF erroneously determines, as a match, a string to be identified that matches none of the strings stored in the string BF; if the string false positive exceeds a first preset threshold, the deleting unit is triggered;
the deleting unit is used for deleting a first character string meeting a first length condition from the character string BF, wherein the first character string is a character string stored in the character string BF;
and the storage unit is used for storing the first character string as a character string prefix into the prefix BF.
20. The control device according to claim 19, wherein if the string false positive is lower than a second preset threshold, the calculation unit triggers the deletion unit, and the deletion unit is further configured to delete a first string prefix meeting a second length condition from the prefix BF, where the first string prefix is a string prefix stored in the prefix BF;
the storage unit is further configured to store the first string prefix as a string in the string BF.
21. The control device according to claim 20, wherein the storage unit is further configured to store the string prefixes subordinate to the first string prefix into the prefix BF according to the association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
22. The control device according to claim 19, further comprising a determining unit and a merging unit:
the calculation unit is further configured to calculate, according to the number of string prefixes stored in the prefix BF, the degree of prefix false positive of the prefix BF, where the prefix false positive is the probability that the prefix BF erroneously determines, as a match, a string to be identified that matches none of the string prefixes stored in the prefix BF; if the prefix false positive exceeds a third preset threshold, the determining unit is triggered;
the determining unit is configured to determine a plurality of character string prefixes meeting a first length condition, where the plurality of character string prefixes belong to the same common prefix;
the merging unit is configured to merge the plurality of character string prefixes into the common prefix, and delete the plurality of character string prefixes from the prefix BF.
23. The control device according to claim 22, wherein if the prefix false positive is lower than a fourth preset threshold, the calculation unit triggers the deleting unit, and the deleting unit is further configured to delete a second string prefix meeting a second length condition from the prefix BF, where the second string prefix is a string prefix stored in the prefix BF;
the storage unit is further configured to store the second string prefix as a string in the string BF.
24. The control device according to claim 23, wherein the storage unit is further configured to store the string prefix subordinate to the second string prefix into the prefix BF according to an association relationship between the strings stored in the string BF and the string prefixes stored in the prefix BF.
25. The control device according to claim 19, wherein the first length condition is having the maximum length among the character strings stored in the string BF.
26. The control device according to claim 20, wherein the second length condition is having the minimum length among the string prefixes stored in the prefix BF.
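The adjustment logic recited in claims 19 to 26 can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions: the false positive degree is estimated with the standard Bloom filter formula (1 - e^(-kn/m))^k, which the claims do not specify; plain sets stand in for the contents of the string BF and the prefix BF; length is measured in characters; names are separated by "/"; and the merged common prefix is stored back into the prefix BF. The names `ControlDevice`, `rebalance`, `merge`, `common_prefix`, `high` and `low` are hypothetical.

```python
import math


def false_positive_rate(n_items, m_bits, k_hashes):
    """Standard Bloom filter estimate (assumed here): (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def common_prefix(prefixes):
    """Longest shared leading run of '/'-separated components."""
    shared = []
    for parts in zip(*(p.split("/") for p in prefixes)):
        if len(set(parts)) != 1:
            break
        shared.append(parts[0])
    return "/".join(shared)


class ControlDevice:
    """Hypothetical control-plane view of the two filters.

    The sets mirror what is stored in each BF; a real device would also
    have to update the filters themselves.
    """

    def __init__(self, m_bits, k_hashes, high, low):
        self.m_bits = m_bits
        self.k_hashes = k_hashes
        self.high = high  # plays the role of the first preset threshold
        self.low = low    # plays the role of the second preset threshold
        self.string_bf = set()
        self.prefix_bf = set()

    def string_fp(self):
        """False positive estimate from the number of stored strings."""
        return false_positive_rate(len(self.string_bf), self.m_bits, self.k_hashes)

    def rebalance(self):
        """Perform at most one move and report what was done."""
        fp = self.string_fp()
        if fp > self.high and self.string_bf:
            # First length condition: a longest stored string (ties by name).
            longest = max(self.string_bf, key=lambda s: (len(s), s))
            self.string_bf.remove(longest)
            self.prefix_bf.add(longest)
            return "to_prefix", longest
        if fp < self.low and self.prefix_bf:
            # Second length condition: a shortest stored prefix (ties by name).
            shortest = min(self.prefix_bf, key=lambda s: (len(s), s))
            self.prefix_bf.remove(shortest)
            self.string_bf.add(shortest)
            return "to_string", shortest
        return "unchanged", None

    def merge(self, group):
        """Replace several stored prefixes by their common prefix."""
        merged = common_prefix(group)
        self.prefix_bf.difference_update(group)
        self.prefix_bf.add(merged)  # assumption: the common prefix is stored
        return merged
```

In this sketch a call to `rebalance()` moves at most one entry, so a control device would call it repeatedly until the estimate falls between the two thresholds. Keeping the thresholds apart avoids moving the same entry back and forth.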
CN201610850786.5A 2016-09-26 2016-09-26 Character string filtering method and related device Active CN107870925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610850786.5A CN107870925B (en) 2016-09-26 2016-09-26 Character string filtering method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610850786.5A CN107870925B (en) 2016-09-26 2016-09-26 Character string filtering method and related device

Publications (2)

Publication Number Publication Date
CN107870925A (en) 2018-04-03
CN107870925B (en) 2021-08-20

Family

ID=61751828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610850786.5A Active CN107870925B (en) 2016-09-26 2016-09-26 Character string filtering method and related device

Country Status (1)

Country Link
CN (1) CN107870925B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471635B (en) * 2018-09-03 2021-09-17 中新网络信息安全股份有限公司 Algorithm optimization method based on Java Set implementation
CN110502629B (en) * 2019-08-27 2020-09-11 桂林电子科技大学 LSH-based connection method for filtering and verifying similarity of character strings

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6920477B2 (en) * 2001-04-06 2005-07-19 President And Fellows Of Harvard College Distributed, compressed Bloom filter Web cache server
US7673041B2 (en) * 2005-11-01 2010-03-02 Intel Corporation Method to perform exact string match in the data plane of a network processor
CN100511229C (en) * 2006-04-13 2009-07-08 华为技术有限公司 Domain name information storage and inquiring method and system
US20080022403A1 (en) * 2006-07-22 2008-01-24 Tien-Fu Chen Method and apparatus for a pattern matcher using a multiple skip structure
CN101359325B (en) * 2007-08-01 2010-06-16 北京启明星辰信息技术股份有限公司 Multi-key-word matching method for rapidly analyzing content
CN101149739A (en) * 2007-08-24 2008-03-26 中国科学院计算技术研究所 Internet faced sensing string digging method and system
CN101398820B (en) * 2007-09-24 2010-11-17 北京启明星辰信息技术股份有限公司 Large scale key word matching method
CN101383034B (en) * 2008-09-18 2016-05-18 腾讯科技(深圳)有限公司 The method and system of a kind of advertistics and input
CN101686146B (en) * 2008-09-28 2013-01-30 华为技术有限公司 Method and equipment for fuzzy query, query result processing and filtering condition processing
CN101605129B (en) * 2009-06-23 2012-02-01 北京理工大学 URL lookup method for URL filtering system
CN101901257B (en) * 2010-07-21 2012-07-04 北京理工大学 Multi-string matching method in a search engine
CN103078854B (en) * 2012-12-28 2016-04-13 北京亿赞普网络技术有限公司 Message filtering method and device
CN103294822B (en) * 2013-06-17 2016-08-10 北京航空航天大学 A kind of based on active Hash with the high-efficiency caching method of Bloom filter
CN103428093B (en) * 2013-07-03 2017-02-08 北京邮电大学 Route prefix storing, matching and updating method and device based on names
CN103544316B (en) * 2013-11-06 2017-02-08 苏州大拿信息技术有限公司 Uniform resource locator (URL) filtering system and achieving method thereof
CN104320451A (en) * 2014-10-21 2015-01-28 北京邮电大学 Content-centric networking supporting web server cache system and processing method
CN104468349B (en) * 2014-11-27 2017-11-14 中国科学院计算机网络信息中心 A kind of BGP routing authentication methods based on hop-by-hop supervision
KR101587756B1 (en) * 2015-02-17 2016-01-21 이화여자대학교 산학협력단 Apparatus and method for searching string data using bloom filter pre-searching

Also Published As

Publication number Publication date
CN107870925A (en) 2018-04-03

Similar Documents

Publication Publication Date Title
US11811660B2 (en) Flow classification apparatus, methods, and systems
US9798774B1 (en) Graph data search method and apparatus
US9742667B2 (en) Packet processing method, device and system
CN107122130B (en) Data deduplication method and device
US9787585B2 (en) Distributed storage system, control apparatus, client terminal, load balancing method and program
US10771358B2 (en) Data acquisition device, data acquisition method and storage medium
KR101577926B1 (en) Communication node, packet processing method and program
CN107305570B (en) Data retrieval method and system
CN107870925B (en) Character string filtering method and related device
CN110784336A (en) Multi-device intelligent timing delay scene setting method and system based on Internet of things
CN107547400B (en) Virtual machine migration method and device
CN109067744B (en) ACL rule processing method, device and communication equipment
CN112165505B (en) Decentralized data processing method, electronic device and storage medium
CN106789695B (en) Message processing method and device
CN108259340B (en) Topology information transmission method and device
CN112698783A (en) Object storage method, device and system
CN110413617B (en) Method for dynamically adjusting hash table group according to size of data volume
CN115904211A (en) Storage system, data processing method and related equipment
CN108173689B (en) Output system of load balancing data
CN109547389B (en) Code stream file recombination method and device
CN113132261A (en) Traffic data packet classification method and device and electronic equipment
CN112148802A (en) Graph partitioning method, device, equipment and computer readable storage medium
US10146820B2 (en) Systems and methods to access memory locations in exact match keyed lookup tables using auxiliary keys
CN111506658B (en) Data processing method and device, first equipment and storage medium
JP2015053673A (en) Packet relay device and packet relay method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant