WO2021208403A1 - Data flow table, processing method and device therefor, and storage medium - Google Patents
- Publication number
- WO2021208403A1 (application PCT/CN2020/124355)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data flow
- bucket
- data stream
- record
- fingerprint
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/54—Organization of routing tables
- H04L45/74—Address processing for routing
- H04L45/742—Route cache; Operation thereof
- H04L45/745—Address table lookup; Address filtering
- H04L45/7453—Address table lookup; Address filtering using hashing
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Definitions
- This application relates to the field of network communication technology, and in particular to a data flow table for high-speed, large-scale concurrent data flows, together with a processing method, device, and storage medium.
- The data flow table used in current network communication solutions mainly stores the state information of each data flow in the network. It is one of the basic components required for network functions such as packet forwarding, traffic classification, network intrusion detection, network flow analysis of user behavior, network address translation, and traffic accounting.
- The processing efficiency of the data flow table therefore has a significant influence on the actual performance of the above network functions.
- High-performance network packet processing software accordingly adopts hash algorithms with higher space efficiency and more stable lookup performance.
- DPDK (Data Plane Development Kit), an open-source packet processing software library widely used in industry, uses cuckoo hash tables to implement data flow tables.
- Although the cuckoo hash table has high space efficiency and stable update performance, inserting a new flow record may require moving multiple records already in the table, which lengthens insertion time. As a result, packet processing delay increases and packets may even be lost.
- The exemplary embodiments of the present application provide a method and apparatus for processing a data flow table for high-speed, large-scale concurrent data flows, which avoid the uncertainty in data flow table operation time in high-speed networks, and thereby avoid the increased packet processing delay and the packet loss that this uncertainty causes.
- Embodiments of the present application provide a data flow table, including: a fingerprint table, which stores the data flow fingerprints of data flows to be inserted and is implemented as a d-left hash table divided into two blocks, each block including at least two buckets, where the buckets store basic units; a record table, which stores the data flow records of data flows to be inserted and is likewise implemented as a d-left hash table divided into two blocks, each block including at least two buckets, where the basic units in the record table correspond one-to-one to the basic units in the fingerprint table; and an overflow table, which stores the data flow record of a data flow to be inserted when that record cannot be inserted into the record table.
- the number of the basic units in the bucket does not exceed the depth of the bucket where the basic units are located.
- the depths of the buckets of the fingerprint table and the record table are equal.
- the basic unit of the fingerprint table stores data stream fingerprints
- the basic unit in the record table stores data stream records
- the data flow record is a quadruple, including a data flow identifier, a data flow state, a flow counter of the data flow, and metadata used to characterize a format.
- the data stream identifier is a five-tuple, including a source IP address, a destination IP address, a protocol number, a source port number, and a destination port number.
- the data flow identifier is used to input a hash function to output the data flow fingerprint.
- the overflow table is a single function hash table.
- the most frequently accessed basic unit among the buckets of the record table is located at the bottom of the corresponding bucket.
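The structure described above can be sketched as follows. This is an illustrative model only, not the patented implementation: the class name and bucket count are hypothetical, while the block count (2), the bucket depth (8), the idle-unit marker (fingerprint 0), and the one-to-one fingerprint/record correspondence follow the text.

```python
BLOCKS = 2              # d-left hash table divided into two blocks
BUCKETS_PER_BLOCK = 4   # each block includes at least two buckets
BUCKET_DEPTH = 8        # equal depth for fingerprint and record buckets

class DataFlowTable:
    """Illustrative container for the three tables described above."""
    def __init__(self):
        # Fingerprint table: 0 marks an idle unit, legal fingerprints are > 0.
        self.fingerprints = [[[0] * BUCKET_DEPTH
                              for _ in range(BUCKETS_PER_BLOCK)]
                             for _ in range(BLOCKS)]
        # Record table: one record slot per fingerprint unit (one-to-one).
        self.records = [[[None] * BUCKET_DEPTH
                         for _ in range(BUCKETS_PER_BLOCK)]
                        for _ in range(BLOCKS)]
        # Overflow table: a plain dict stands in for the single-function
        # hash table used for records that do not fit in the record table.
        self.overflow = {}

table = DataFlowTable()
assert len(table.fingerprints) == BLOCKS
assert all(len(b) == BUCKET_DEPTH for blk in table.records for b in blk)
```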
- various embodiments of the present application provide a method for processing a data flow table, including:
- The address of the candidate bucket in each block of the fingerprint table of the data flow table, and the data flow fingerprint corresponding to the data flow identifier, are obtained according to the data flow identifier of the data flow to be inserted, where the address of the candidate bucket in the fingerprint table is the same as the address of the candidate bucket in the record table;
- Idle units are searched sequentially from the bottom unit of the target bucket toward the top unit; when an idle unit in the target bucket is found, the data flow fingerprint of the data flow to be inserted is written into the found idle unit, and the data flow record of the data flow to be inserted is written into the record bucket corresponding to the target bucket in the record table.
- The step of detecting whether an idle unit exists in the candidate buckets of the fingerprint table further includes: if no idle unit exists in the candidate buckets, inserting the data flow record of the data flow to be inserted into the overflow table and ending the insertion process.
- The step of detecting whether idle units exist in the candidate buckets of the fingerprint table further includes: if idle units exist and the numbers of idle units in the candidate buckets differ, selecting the candidate bucket with more idle units as the target bucket; if the numbers of idle units are the same, selecting the candidate bucket in the first of the two blocks as the target bucket.
- The step of obtaining the data flow fingerprint corresponding to the data flow identifier according to the data flow identifier of the data flow to be inserted includes: inputting the data flow identifier into a cyclic redundancy check algorithm to output the data flow fingerprint.
- Before the step of acquiring the data flow table, the method further includes: performing cache alignment on the data flow table.
- the method further includes: placing the most frequently accessed basic unit in each bucket of the record table at the bottom of the corresponding bucket.
- The method further includes: calculating the address of the candidate bucket and the data flow fingerprint corresponding to the data flow to be searched, according to the data flow identifier of the data flow to be searched, and searching for the data flow record corresponding to that data flow identifier.
- The method further includes: searching for the corresponding data flow record in the data flow table according to the data flow identifier of the data flow used for updating; if the corresponding data flow record is found, updating the fields in the found record to the corresponding fields of the data flow used for the update.
- The method further includes: searching for the corresponding data flow record in the data flow table according to the data flow identifier of the data flow to be deleted; if the corresponding data flow record is found, clearing the found record.
- each embodiment of the present application provides a data flow table processing device, including:
- The data flow insertion module is used to obtain, according to the data flow identifier of the data flow to be inserted, the address of the candidate bucket in each block of the fingerprint table of the data flow table and the data flow fingerprint corresponding to the data flow identifier, where the address of the candidate bucket in the fingerprint table is the same as the address of the candidate bucket in the record table;
- The processing module is configured to detect whether an idle unit exists in the candidate buckets of the fingerprint table; if so, to select a candidate bucket with an idle unit as the target bucket; to search for an idle unit sequentially from the bottom unit toward the top unit; and, when an idle unit is found, to write the data flow fingerprint of the data flow to be inserted into the found idle unit and write the flow record of the data flow to be inserted into the record bucket corresponding to the target bucket in the record table.
- The processing module is further configured to: if no idle unit exists, insert the flow record of the data flow to be inserted into the overflow table and end the insertion process.
- The data flow table is composed of a fingerprint table, a record table, and an overflow table. The fingerprint table and the record table each use a d-left hash table divided into two blocks; each block of the fingerprint table and the record table consists of at least two buckets; each bucket stores basic units; the number of basic units in each bucket does not exceed the depth of the bucket; and the bucket depths of the fingerprint table and the record table are equal.
- the basic unit in the bucket of the fingerprint table is a data flow fingerprint
- the field content of the data flow identification of each data flow includes: source IP address, destination IP address, protocol number, source port number, and destination port number
- obtaining the data stream fingerprint of the fingerprint table includes: inputting the data stream identifier into a cyclic redundancy check algorithm, and outputting the data stream fingerprint.
- The device further includes a search module, configured to calculate the address of the candidate bucket and the data flow fingerprint of the data flow to be searched; to use the calculated bucket address to locate the units in the fingerprint table and search them for a match with the calculated data flow fingerprint; and, if no match is found in the candidate buckets of the fingerprint table, to continue searching the overflow table for the data flow record whose identifier matches the calculated data flow fingerprint.
- Embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the processing method of the foregoing embodiments.
- In the data flow table for high-speed, large-scale concurrent data flows and its processing method, device, and storage medium, the data flow table is divided into a fingerprint table, a record table, and an overflow table.
- the fingerprint table and the record table are designed based on the d-left hash table, so that each contains two blocks, each block contains several buckets, and each bucket stores multiple basic units.
- the basic unit stored in the fingerprint table is a data stream fingerprint
- the basic unit stored in the record table is a complete data stream record.
- the basic unit of the fingerprint table and the basic unit of the record table have a one-to-one correspondence.
- the overflow table is a classic single-function hash table, which is used to store data flow records that cannot be inserted into the record table.
- the design of the data structure of the flow table makes full use of cache alignment technology to obtain higher cache utilization.
- An in-bucket replacement mechanism is also designed to place frequently accessed flow records at the bottom of each bucket in the record table, so that while the flow fingerprint is being read, the bottom record of the bucket is read into the cache at the same time, yielding higher performance.
- This application not only has high space efficiency and can support massive concurrent data flows, but also avoids the uncertainty in data flow table operation time in high-speed networks that leads to increased packet processing delay and even packet loss.
- FIG. 1 is a schematic diagram of the logical structure of a data flow table provided by an embodiment of the application.
- FIG. 2 is a schematic diagram of a physical storage structure of a data flow table provided by an embodiment of the application.
- FIG. 3 is a schematic diagram of the logical structure of an IPv6 data flow table provided by an embodiment of the application.
- FIG. 4 is a schematic flowchart of the insertion process of the method for processing a data flow table provided by an embodiment of the application.
- FIG. 5 is a schematic flowchart of a search process of a method for processing a data flow table provided by an embodiment of the application.
- FIG. 6 is a schematic flowchart of an update process of a method for processing a data flow table provided by an embodiment of the application.
- FIG. 7 is a schematic flowchart of a deletion process of a method for processing a data flow table provided by an embodiment of the application.
- FIG. 8 is an internal structure diagram of a computer device provided by an embodiment of the application.
- the function of the data flow table mentioned in the embodiment of this application is to store the state information of each data flow in the network.
- Each entry of the data flow table is actually a data flow record.
- A data flow record can use a <Flow_ID, State, Meter, Metadata> quadruple, where Flow_ID is the data flow identifier (usually represented by a five-tuple), State is the data flow state, Meter is the flow counter, and Metadata is the metadata used to characterize the specific format of the table entry.
- the four-tuple or five-tuple mentioned in this application refers to a set with multiple fields.
- a four-tuple refers to a set with four fields
- a five-tuple refers to a set with five fields.
- The general operation of the data flow table is as follows: each time a packet is received, the packet is first parsed and the flow identifier (Flow_ID) is extracted from the header fields. Then, using the flow identifier as the lookup key, the flow record corresponding to the flow identifier is obtained by looking up the data flow table. Next, according to the metadata in the flow record, the record is parsed to obtain the flow state (State) and the flow counter (Meter). Finally, the flow counter is updated, and the flow state may also need to be updated.
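The per-packet procedure just described (parse, extract Flow_ID, look up, update counters) can be sketched as follows. The flow table is simplified to a Python dict keyed by the five-tuple, and the field names follow the <Flow_ID, State, Meter, Metadata> quadruple; the packet representation and the "NEW" state value are illustrative assumptions.

```python
def extract_flow_id(packet):
    # In a real system these fields come from the parsed packet headers.
    return (packet["src_ip"], packet["dst_ip"], packet["proto"],
            packet["src_port"], packet["dst_port"])

def process_packet(flow_table, packet):
    flow_id = extract_flow_id(packet)          # parse and extract Flow_ID
    record = flow_table.get(flow_id)           # table lookup with the key
    if record is None:                         # first packet of this flow
        record = {"State": "NEW", "Meter": 0, "Metadata": 0}
        flow_table[flow_id] = record
    record["Meter"] += 1                       # update the flow counter
    return record

flows = {}
pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "proto": 6,
       "src_port": 1234, "dst_port": 80}
process_packet(flows, pkt)
process_packet(flows, pkt)
assert flows[extract_flow_id(pkt)]["Meter"] == 2
```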
- data flow tables are also used in data flow processing scenarios.
- a traditional cuckoo hash table can be used to implement data flow tables.
- The exemplary embodiments provided in this application are improvements to the scenarios in which such data flow tables process data flows, including but not limited to improvements to the structure of the data flow table and to processes such as writing, searching, updating, and deleting entries in the data flow table.
- The solutions provided by the exemplary embodiments of the present application can be applied to most data flow table processing scenarios. Whether in the cloud, on servers, in network transceivers, or in other network communication hardware environments, as long as the normal data flow processing procedure is followed and the equipment and scenario conditions for processing data flow tables exist, the design ideas of the embodiments of this application can be applied. The embodiments may be implemented in hardware, software, or a combination thereof, and are not limited in this respect.
- the embodiment of the present application provides a method for processing a data flow table of a high-speed, large-scale concurrent data flow. As shown in Figure 4, the method includes the following steps.
- Step S1: Obtain the address of the candidate bucket in the fingerprint table and the data flow fingerprint corresponding to the data flow identifier, according to the data flow identifier of the data flow to be inserted.
- the data flow table includes a fingerprint table (Fingerprints Table), a record table (Records Table), and an overflow table (Overflow Table).
- the fingerprint table and the record table are designed based on the d-left hash table, so that each contains two blocks, each block contains several buckets, and each bucket stores multiple basic units.
- The basic unit in the fingerprint table is the data flow fingerprint.
- the basic unit in the record table is Data Flow Record.
- the basic units in the fingerprint table correspond to the basic units in the record table one-to-one.
- The optional bucket to be inserted into is called the "candidate bucket". Because the buckets of the fingerprint table and the record table correspond one-to-one, the address of the candidate bucket in the fingerprint table is the same as the address of the candidate bucket in the record table; that is, once the position of the candidate bucket in the fingerprint table is determined, the position of the corresponding candidate bucket in the record table is also determined.
- the address of the candidate bucket needs to be determined. Specifically, the data stream identifier of the data stream to be inserted is extracted.
- the data flow identifier can be expressed as Flow_ID.
- According to Flow_ID, the addresses of the candidate buckets in the two blocks of the fingerprint table are calculated through the hash functions h1 and h2, respectively.
- The addresses of the candidate buckets in the two blocks of the fingerprint table are i and j respectively, where i is the address of the candidate bucket of the first block and j is the address of the candidate bucket of the second block; that is, i = h1(Flow_ID) and j = h2(Flow_ID).
- the fingerprint table and the buckets in the record table also have a one-to-one correspondence. Therefore, the addresses of the candidate buckets of the two blocks in the record table also correspond to i and j.
- a 32-bit data flow fingerprint is calculated according to the data flow identifier Flow_ID through a hash function (for example, a 32-bit cyclic redundancy check algorithm hash function, CRC-32), and the data flow fingerprint is represented as ff.
- the legal data stream fingerprint ff must be greater than 0.
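The address and fingerprint computation can be sketched as follows. `zlib.crc32` stands in for both CRC-32 and the hash functions h1 and h2, whose exact definitions the text does not give; the prefix salting and the remapping of a zero CRC result to 1 (so that the fingerprint stays legal, i.e. greater than 0) are illustrative assumptions.

```python
import zlib

def flow_fingerprint(flow_id_bytes):
    """32-bit data flow fingerprint via CRC-32. A legal fingerprint must be
    > 0 because 0 marks an idle unit; remapping a zero result to 1 is an
    assumption, as the text does not say how a zero CRC is handled."""
    ff = zlib.crc32(flow_id_bytes) & 0xFFFFFFFF
    return ff if ff > 0 else 1

def candidate_buckets(flow_id_bytes, buckets_per_block):
    """Addresses i and j of the candidate buckets in the two blocks.
    Salting CRC-32 with two prefixes stands in for h1 and h2."""
    i = zlib.crc32(b"h1" + flow_id_bytes) % buckets_per_block
    j = zlib.crc32(b"h2" + flow_id_bytes) % buckets_per_block
    return i, j

# 13-byte IPv4 five-tuple: 10.0.0.1 -> 10.0.0.2, TCP (6), ports 1234 -> 80
flow_id = bytes([10, 0, 0, 1, 10, 0, 0, 2, 6, 4, 210, 0, 80])
i, j = candidate_buckets(flow_id, 1024)
assert 0 <= i < 1024 and 0 <= j < 1024
assert flow_fingerprint(flow_id) > 0
```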
- Step S2: Detect whether idle units exist in the candidate buckets of the fingerprint table, and if so, select a candidate bucket with idle units as the target bucket.
- When the data flow fingerprint in a basic unit of the fingerprint table is equal to 0, that unit is an idle unit; when the fingerprint is greater than 0, the unit is a non-empty unit.
- The process of determining the target bucket includes: checking the number of free units in the candidate buckets FBi and FBj of the fingerprint table. If every basic unit in FBi and FBj is non-empty, both candidate buckets are full; in this case, the flow record of the data flow to be inserted is inserted into the overflow table, and the insertion process ends.
- If the candidate buckets FBi and FBj are not both full, the one with more free units is selected as the target bucket. If the numbers of free units are equal, the bucket in the first block is preferentially selected. In this embodiment, the target bucket is FBi.
- A legal data flow fingerprint must be greater than 0, and the fingerprint of a free unit is represented by 0; the number of free units can therefore be counted as the number of basic units with value 0. A bucket with more free units is a bucket with more zero-valued basic units.
- Alternatively, "a bucket with more free units" can also mean a bucket in which the number of zero-valued basic units exceeds a certain threshold (for example, more than 5 basic units with value 0 means the bucket has more than 5 free units).
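The target-bucket selection just described can be sketched as follows; the function names are hypothetical, but the logic (count zero-valued units, prefer the bucket with more free units, break ties toward the first block, fall back to the overflow table when both are full) follows the text.

```python
def free_units(bucket):
    # Idle units carry fingerprint 0, so counting zeros counts free units.
    return sum(1 for unit in bucket if unit == 0)

def select_target(fb_i, fb_j):
    """Return 0 to select FBi (first block), 1 to select FBj, or None when
    both candidate buckets are full (caller then uses the overflow table)."""
    free_i, free_j = free_units(fb_i), free_units(fb_j)
    if free_i == 0 and free_j == 0:
        return None                     # both full: go to overflow table
    return 0 if free_i >= free_j else 1  # ties prefer the first block

assert select_target([0]*8, [7, 0, 0, 0, 0, 0, 0, 0]) == 0  # FBi freer
assert select_target([5]*8, [5]*7 + [0]) == 1               # only FBj has room
assert select_target([1]*8, [2]*8) is None                  # both full
```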
- Step S3: Search for idle units sequentially from the bottom unit toward the top unit of the target bucket; when an idle unit in the target bucket is found, write the data flow fingerprint of the data flow to be inserted into the found idle unit, and write the flow record of the data flow to be inserted into the record bucket corresponding to the target bucket in the record table.
- The target bucket FBi may, for example, include 8 units, where the bottom unit is FBi[0] and the top unit is FBi[7]. The search starts upward from the bottom cell FBi[0] to find the first free cell (whose value is 0). In this embodiment, the first free unit found (i.e., the target unit) is FBi[k].
- The fingerprint of the data flow to be inserted is written into the target unit FBi[k], and the flow record of the data flow to be inserted is written into the corresponding record unit RBi[k] of the record bucket RBi; the insertion process then ends.
- step S2 also includes: if there is no free unit in the candidate bucket, inserting the stream record of the data stream to be inserted into the overflow table, and ending the insertion process.
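Putting steps S1 to S3 together, the insertion path can be sketched as a single routine. The function and return values are hypothetical; the steps (select the candidate bucket with more free units, tie-break toward the first block, search bottom-up for the first zero unit, write fingerprint and record at the same index, overflow when both candidates are full) follow the text.

```python
def insert_flow(fb_pair, rb_pair, overflow, ff, record):
    """fb_pair/rb_pair: the two candidate fingerprint/record buckets.
    Sketch of the described insertion steps, not the patented code."""
    free = [sum(1 for u in b if u == 0) for b in fb_pair]
    if free[0] == 0 and free[1] == 0:
        overflow[record["Flow_ID"]] = record   # both candidates full
        return "overflow"
    t = 0 if free[0] >= free[1] else 1         # ties prefer the first block
    fb, rb = fb_pair[t], rb_pair[t]
    for k in range(len(fb)):                   # bottom unit toward top unit
        if fb[k] == 0:                         # first idle unit = target unit
            fb[k] = ff                         # write the flow fingerprint
            rb[k] = record                     # write record at same index k
            return "inserted"

fb_i, fb_j = [0]*8, [0]*8
rb_i, rb_j = [None]*8, [None]*8
rec = {"Flow_ID": "f1", "State": 0, "Meter": 0, "Metadata": 0}
assert insert_flow((fb_i, fb_j), (rb_i, rb_j), {}, 0xABCD, rec) == "inserted"
assert fb_i[0] == 0xABCD and rb_i[0] is rec    # written at the bucket bottom
```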
- the solution provided in this embodiment is mainly oriented to application scenarios of high-speed networks and large-scale concurrent streams.
- the data flow table adopts the d-left hash table structure to improve the performance of the flow table.
- the data flow table can support insertion, search, update, and deletion operations of flow records.
- the flow table can support both IPv4 and IPv6.
- the data flow table proposed in this embodiment has high space efficiency and stable update performance.
- the insertion process includes the following steps.
- The address of the candidate bucket is obtained by hashing the data flow identifier of the data flow to be inserted. Through the hash functions h1 and h2, the addresses of the candidate buckets corresponding to the data flow identifier Flow_ID in the two blocks are respectively calculated; for example, the address of the candidate bucket in the first block is i and the address in the second block is j.
- the data stream fingerprint corresponding to the data stream identifier is calculated by hashing the data stream identifier.
- the 32-bit flow fingerprint of the data flow identifier Flow_ID is calculated.
- The 32-bit flow fingerprint is denoted ff, where a legal flow fingerprint must be greater than 0, and the flow fingerprint of an idle unit is represented by the value 0.
- The target bucket FBi contains 8 units, where the bottom unit is FBi[0] and the top unit is FBi[7]. Starting from the bottom cell FBi[0] and searching upward, the first free cell (whose value is 0) is found as the target unit; for example, the target unit is FBi[k].
- the data flow table logically includes a fingerprint table, a record table, and an overflow table.
- both the fingerprint table and the record table use a d-left hash table.
- the fingerprint table and the record table each include two blocks, each block includes at least two buckets, and each bucket is used to store basic units. The number of basic units in each bucket does not exceed the depth of the bucket.
- the bucket depths of the corresponding buckets of the fingerprint table and the record table are equal. It is understandable that the depth of the bucket refers to the maximum capacity of the bucket.
- the data flow table is logically divided into a fingerprint table, a record table, and an overflow table.
- the two blocks are called the first block and the second block, respectively.
- Each block of the fingerprint table and the record table includes several buckets, and each bucket can store several basic units, and the number of basic units in each bucket does not exceed the depth of the bucket.
- the bucket depths of the buckets of the fingerprint table and the record table are equal to each other, and the basic units of the two correspond to each other one-to-one.
- the units in the fingerprint table and the record table are in one-to-one correspondence, and each table is divided into two blocks, each block includes multiple buckets, and the depth of each bucket is 8.
- the basic unit in the bucket of the fingerprint table is the data stream fingerprint.
- the field content of the data flow identifier of each data flow includes: source IP address, destination IP address, protocol number, source port number, and destination port number.
- The data flow fingerprint to be inserted into the fingerprint table can be obtained by inputting the data flow identifier into a cyclic redundancy check (CRC) algorithm, which outputs the data flow fingerprint.
- the basic unit stored in the bucket in the fingerprint table is the data stream fingerprint.
- the length of each data stream fingerprint is 32 bits.
- Each data stream has a unique data stream identifier.
- the data stream identifier can be represented by a five-tuple formed by the ⁇ source IP address, destination IP address, protocol number, source port number, destination port number> field in the header of the message.
- the data flow fingerprint is obtained by hashing the data flow identification. This application does not limit the specific hash calculation method. For example, a 32-bit cyclic redundancy check algorithm (CRC-32) can be used to calculate the data stream fingerprint, where the input of the calculation process is the data stream identifier, and the output is 32-bit data stream fingerprint.
- the basic unit stored in the bucket in the record table is the data flow record.
- The data flow record can be represented by the <Flow_ID, State, Meter, Metadata> quadruple, where Flow_ID is the data flow identifier, State is the data flow state, Meter is the flow counter of the data flow, and Metadata is the metadata used to characterize the specific format of the table entry. The record length of the IPv4 flow table is 32 bytes, and the record length of the IPv6 flow table is 64 bytes, as shown in Figure 3.
- the overflow table is implemented by a classic single-function hash table, which handles hash conflicts through open addressing.
- Stored in the overflow table are the data stream records to be inserted that cannot be inserted into the record table.
- the basic unit stored in each bucket of the fingerprint table is a data stream fingerprint.
- the data stream fingerprint is calculated by hashing the data stream identifier.
- the length of each data stream fingerprint is 32 bits.
- the basic unit stored in each bucket of the record table is the data flow record.
- the flow record is represented by a four-tuple of ⁇ Flow_ID, State, Meter, Metadata>, where the data flow identifier (Flow_ID) is the key to look up the table.
- the data stream identifier is usually composed of a 5-tuple of ⁇ source IP, destination IP, protocol number, source port number, and destination port number>.
- The length of each flow record is fixed at 32 bytes, in which the flow identifier occupies 13 bytes and the other fields occupy 19 bytes.
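The 13-byte flow identifier follows directly from the IPv4 five-tuple: 4 bytes each for the source and destination IP addresses, 1 byte for the protocol number, and 2 bytes each for the source and destination ports. A minimal packing sketch, with an assumed 19-byte split across State, Meter, and Metadata (the text only gives the 19-byte total):

```python
import struct

# IPv4 Flow_ID: src IP (4) + dst IP (4) + protocol (1) + src port (2)
# + dst port (2) = 13 bytes, matching the text.
FLOW_ID_FMT = "!4s4sBHH"          # network byte order, no padding
assert struct.calcsize(FLOW_ID_FMT) == 13

flow_id = struct.pack(FLOW_ID_FMT,
                      bytes([10, 0, 0, 1]),   # source IP 10.0.0.1
                      bytes([10, 0, 0, 2]),   # destination IP 10.0.0.2
                      6,                      # protocol number (TCP)
                      1234, 80)               # source / destination ports

RECORD_LEN = 32                   # fixed record length for IPv4 flows
OTHER_FIELDS = 19                 # State + Meter + Metadata, total only
assert len(flow_id) + OTHER_FIELDS == RECORD_LEN
```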
- the overflow table is a classic single-function hash table, which uses open addressing or linked lists to handle hash conflicts.
- the basic unit stored in the overflow table is the same as the record table, and is also a data flow record.
- The hash functions h1 and h2 are used to hash the data flow identifier to obtain the addresses of the candidate buckets for the new record in the two blocks of the fingerprint table. Then, from the two candidate buckets, the bucket with more free units is selected as the target bucket. Next, the flow fingerprint of the data flow to be inserted is written into the target bucket of the fingerprint table, and the flow record of the data flow to be inserted is written into the record bucket of the record table corresponding to the address of the target bucket of the fingerprint table.
- the basic unit in the record table and the basic unit in the fingerprint table have a one-to-one correspondence.
- the physical storage structure of the data flow table in the memory is shown in Figure 2.
- the bucket of the fingerprint table and the bucket of the record table are alternately stored in the physical structure of the memory.
- The cache line size of mainstream server processors is usually 64 bytes, so the entire data flow table needs to be cache-aligned when stored, for example 64-byte aligned, to improve cache read efficiency.
- one memory read can obtain all the basic units in the target bucket of the fingerprint table and the bottom unit in the corresponding record bucket of the record table (the record shown by the dark grid line in FIG. 2).
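The arithmetic behind this single-read property is worth spelling out. With the sizes given in this document (32-bit fingerprints, depth-8 buckets, 32-byte IPv4 records), a whole fingerprint bucket plus the bottom record of the corresponding record bucket sum to exactly one 64-byte cache line:

```python
FINGERPRINT_BYTES = 4       # 32-bit data flow fingerprint
BUCKET_DEPTH = 8            # units per bucket
RECORD_BYTES = 32           # IPv4 flow record length
CACHE_LINE = 64             # typical server cache line size

fingerprint_bucket = FINGERPRINT_BYTES * BUCKET_DEPTH   # 32 bytes
# One 64-byte line holds the entire fingerprint bucket plus the bottom
# record of the corresponding record bucket, so one memory read fetches both.
assert fingerprint_bucket + RECORD_BYTES == CACHE_LINE
```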
- the method for processing a data flow table for high-speed and large-scale concurrent data flow proposed in the present application may further include: placing the most frequently accessed basic unit in each bucket of the record table at the bottom of the bucket.
- The data flow table uses this "in-bucket replacement" mechanism so that a single memory read brings the entire target bucket of the fingerprint table and the bottom unit of the corresponding record bucket into the cache at the same time, further improving cache read efficiency.
- the implementation details of the in-bucket replacement mechanism will be described in detail below.
- the overflow table is stored separately in memory, and cache alignment can also be performed.
- the data flow table is divided into a fingerprint table, a record table, and an overflow table.
- the fingerprint table and the record table are designed based on the d-left hash table, each containing two blocks, each block contains several buckets, and each bucket stores multiple basic units. There is a one-to-one correspondence between the basic units of the fingerprint table and the record table.
- the basic unit stored in the fingerprint table is the data stream fingerprint
- the basic unit stored in the record table is the data stream record.
- the overflow table is a classic single-function hash table, which is used to store data flow records that cannot be inserted into the record table.
- the data structure design of the data flow table makes full use of cache alignment technology to obtain higher cache utilization.
- An embodiment of the present application also provides an in-bucket replacement mechanism, which places frequently accessed flow records at the bottom of each bucket of the record table, so that the bottom record is read into the cache together with the flow fingerprints, yielding higher cache read performance.
- the method for processing the data flow table provided by the embodiments of the present application may also include processes such as searching, updating, and deleting.
- the processing method may further include a search process, and the search process includes the following steps.
- Step S41: Calculate, from the data stream identifier of the data stream to be found, the address of the corresponding candidate bucket in the fingerprint table and the data stream fingerprint corresponding to the data stream to be found.
- Step S42 Search for a data flow record corresponding to the data flow identifier of the data flow to be found in the data flow table.
- Specifically, each basic unit in the candidate buckets of the fingerprint table is examined, and among the examined units a data stream identifier matching the calculated data stream fingerprint is sought. If no data stream identifier matching the calculated fingerprint is found in the candidate buckets of the fingerprint table, the search continues in the overflow table.
- the purpose of the search process is to find the corresponding flow record in the data flow table according to the data flow identifier Flow_ID of the data flow to be searched.
- the search process includes the following steps.
- the addresses of the candidate buckets corresponding to the data flow identifier Flow_ID in the two blocks are respectively calculated.
- the address of the bucket corresponding to the data flow identifier Flow_ID in the first block is i
- the address of the bucket corresponding to the data flow identifier Flow_ID in the second block is j.
- the data stream fingerprint is obtained by hash calculation on the data stream identifier.
- the 32-bit flow fingerprint of the data flow identifier Flow_ID is calculated.
- the data stream fingerprint is ff.
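The fingerprint step can be sketched with Python's `zlib.crc32` standing in for the CRC-32 hash the text mentions. The byte layout of the 5-tuple (13 bytes, matching the flow identifier size given elsewhere in the description) and the remapping of a zero result to 1 are our assumptions; the text only requires that a legal fingerprint be greater than 0.

```python
import struct
import zlib

def flow_fingerprint(src_ip, dst_ip, proto, sport, dport):
    """32-bit fingerprint of the 5-tuple flow identifier.
    The 5-tuple packs into 13 bytes: two 4-byte IPs, a 1-byte
    protocol number, and two 2-byte port numbers."""
    flow_id = struct.pack('>IIBHH', src_ip, dst_ip, proto, sport, dport)
    ff = zlib.crc32(flow_id) & 0xFFFFFFFF
    return ff if ff != 0 else 1  # 0 is reserved for free units
```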
- the processing method provided by the embodiment of the present application may also include an update process.
- the purpose of the update process is to find the corresponding data flow record in the data flow table according to the data flow identifier Flow_ID of the data flow to be updated, and to update the State and Meter fields of the data flow record.
- the update process includes the following steps.
- Step S51 Search for a corresponding data flow record in the data flow table according to the data flow identifier of the data flow to be updated.
- Step S52 If the corresponding data stream record is found, the fields in the data stream record are updated, the update success is returned, and the update process ends. For example, if the corresponding data flow record is found, the State and Meter fields of the flow record in the target unit RB i [k] are updated.
- Step S54 If the corresponding data stream record is not found, the update failure is returned, and the update process ends.
- the update process may also include step S53.
- Step S53 Perform in-bucket replacement. If the target unit where the corresponding data flow record is found is not the bottom unit of the bucket where it is located, the meter of the flow record in the target unit is compared with the meter of the bottom unit of the bucket where the target unit is located. If the meter of the flow record in the target unit is greater than the meter of the bottom unit, or the meter of the bottom unit is empty, the content in the flow record of the target unit is exchanged with the content in the bottom unit.
- Specifically, if the target unit RB_i[k] is not the bottom unit of its bucket (i.e., k > 0), the Meter field of the flow record in RB_i[k] is compared with the Meter field of the flow record in the bottom unit RB_i[0]. If RB_i[k].Meter > RB_i[0].Meter, or RB_i[0] is empty, the contents of RB_i[k] and RB_i[0] are exchanged, and likewise the contents of FB_i[k] and FB_i[0], namely: temp ← RB_i[0]; RB_i[0] ← RB_i[k]; RB_i[k] ← temp; temp ← FB_i[0]; FB_i[0] ← FB_i[k]; FB_i[k] ← temp.
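The in-bucket replacement step can be sketched as follows; the function and accessor names are ours, and the record and fingerprint buckets are modeled as plain lists whose paired entries must be swapped together, as the text requires.

```python
def promote_to_bottom(fb, rb, k, meter_of):
    """Swap unit k with the bottom unit 0 when unit k's record is
    hotter (larger Meter) or the bottom unit is empty, keeping the
    fingerprint bucket fb and record bucket rb in step."""
    if k > 0 and (rb[0] is None or meter_of(rb[k]) > meter_of(rb[0])):
        rb[0], rb[k] = rb[k], rb[0]
        fb[0], fb[k] = fb[k], fb[0]
```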
- the processing method may also include a deletion process.
- the purpose of the deletion process is to find the corresponding flow record in the data flow table according to the data flow identifier Flow_ID of the data flow to be deleted, and to clear the corresponding flow record.
- the deletion process includes the following steps.
- If a corresponding data stream record is found, the found record is cleared, deletion success is returned, and the deletion process ends. Specifically, the data stream record in the found target unit (for example, RB_i[k]) is cleared, i.e., FB_i[k] ← 0 and RB_i[k] ← <0, 0, 0, 0>.
- The deletion process can also include the following step: if no corresponding data stream record is found, deletion failure is returned and the deletion process ends.
- In the actual operation of the data flow table, two types of exceptions usually exist.
- the first type of exception is the bucket overflow exception.
- the bucket overflow exception means that when a new data stream fingerprint is inserted into the fingerprint table, it is found that both candidate buckets are full, which causes the stream fingerprint to be unable to be inserted into the fingerprint table, and correspondingly, the stream record cannot be inserted into the record table.
- the second type of exception is the flow fingerprint conflict exception.
- the flow fingerprint conflict exception refers to that in the search process, it is found that there are multiple flow fingerprints that are the same as the searched flow fingerprint in the two candidate buckets of the fingerprint table.
- For the first type of exception, first analyze the probability of a bucket overflow.
- When the data flow table needs to hold N entries and the record table (or fingerprint table) contains M basic units, M should be suitably larger than N to ensure a low bucket overflow probability.
- the value of M can be determined according to the required bucket overflow probability.
- Table 1 The distribution probability of the number of records in the bucket and the calculation result of the bucket overflow probability
- From Table 1, when the bucket depth is 8 and the average number of records per bucket is 6, the bucket overflow probability is 3.17e-6, small enough that the number of records in the overflow table is negligible compared with N, and the data flow table achieves a space utilization of 6/8 = 75%. When the table must hold one hundred million entries, i.e., N = 10^8, with these parameters, the required memory is (N × 8/6) × (4 + 32) = 4.8 × 10^9 bytes, i.e., about 4.8 GB.
- The expected number of records in the overflow table is 3.17e-6 × N / 6 ≈ 52.8, which is negligible compared with the memory size, so the overflow table can essentially be ignored when computing the total memory overhead.
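The sizing arithmetic quoted in the description (N = 10^8 entries, bucket depth 8, an average of 6 records per bucket, a 4-byte fingerprint plus a 32-byte record per basic unit, and a 3.17e-6 bucket overflow probability) can be reproduced directly:

```python
N = 10**8                             # required flow table entries
units = N * 8 // 6                    # basic units M per table (k = 8N/M = 6)
memory_bytes = units * (4 + 32)       # fingerprint + record per unit
expected_overflow = 3.17e-6 * N / 6   # expected records in the overflow table
```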
- For the second type of exception, analyze the probability of a fingerprint conflict. Each data stream fingerprint is 32 bits long, and a fingerprint whose 32 bits are all 0 indicates that the basic unit is a free unit. The probability of a fingerprint conflict when searching the two candidate buckets of the fingerprint table is therefore extremely low, and in practice the impact of fingerprint conflicts can essentially be ignored.
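The conflict-probability figure itself did not survive extraction; it can be reconstructed, under the assumption of uniformly distributed 32-bit fingerprints, as roughly (2 buckets × 8 units) / 2^32 fingerprints compared per lookup. This reconstruction is ours, but it is consistent with the IPv6 figure of 7.46 × 10^-9 (stated later to be twice the IPv4 value, since the IPv6 bucket depth doubles to 16):

```python
ipv4_conflict = 16 / 2**32   # two candidate buckets of depth 8
ipv6_conflict = 32 / 2**32   # bucket depth doubles to 16
```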
- For the IPv6 flow table shown in FIG. 3, with a bucket depth of 16 and an average of 14 records per bucket, the probability of a bucket overflow exception is 2.39e-06, and the space utilization is 14/16 = 87.5%, higher than the 75% of the IPv4 flow table. Because the bucket depth doubles, the probability of a fingerprint conflict is twice that of the IPv4 flow table, i.e., 7.46 × 10^-9, which is still low enough to be negligible.
- the embodiment of the present application also provides a processing device for a data flow table of a high-speed and large-scale concurrent data flow, including a data flow insertion module and a processing module.
- The data stream insertion module is used to obtain, according to the data stream identifier of the data stream to be inserted, the address of the candidate bucket in the fingerprint table and the data stream fingerprint corresponding to the identifier, wherein the units of the fingerprint table and of the record table correspond one-to-one, and the address of the candidate bucket in the fingerprint table is the same as the address of the candidate bucket in the record table.
- the processing module is used to detect whether there is an idle unit in the candidate bucket of the fingerprint table. If there is an idle unit in one of the buckets, the candidate bucket with the idle unit is selected as the target bucket. Search for idle cells in sequence from the bottom unit to the top unit of the target bucket. When an idle unit in the target bucket is searched, the data stream fingerprint of the data stream to be inserted is written into the searched idle unit, and the stream record of the data stream to be inserted is written to the target bucket. The corresponding record bucket.
- the processing module is further configured to: if there are no free units in the two candidate buckets, insert the stream record of the data stream to be inserted into the overflow table, and end the insertion process.
- the data flow table of each data flow includes a fingerprint table, a record table, and an overflow table.
- Both the fingerprint table and the record table use a d-left hash table, each comprising two blocks.
- Each block of the fingerprint table and the record table includes at least 2 buckets. Each bucket is used to store basic units. The number of basic units in each bucket does not exceed the depth of the bucket, and the bucket depths of the fingerprint table and the record table are equal to each other.
- the basic unit in the bucket of the fingerprint table is the data stream fingerprint.
- the field content of the data flow identifier of each data flow includes: source IP address, destination IP address, protocol number, source port number, and destination port number.
- the step of obtaining the data stream fingerprint of the fingerprint table further includes: inputting the data stream identifier into a cyclic redundancy check algorithm to output the data stream fingerprint.
- The processing module may further include a search module, which is configured to search the data flow table, according to the data flow identifier of the data flow to be found, for the corresponding data flow record.
- Specifically, using the computed bucket addresses, each unit of the candidate buckets is determined from the fingerprint table, and among the determined units a data stream identifier matching the computed data stream fingerprint is sought. If no matching data stream identifier is found in the candidate buckets of the fingerprint table, the overflow table is then searched for a data stream identifier matching the computed fingerprint.
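The two-stage check described above (match the short fingerprint first, then confirm the full flow identifier in the paired record unit, falling back to the overflow table on a miss) can be sketched as follows; all names and the dict-of-lists representation are ours.

```python
def lookup(fp_buckets, rec_buckets, i, j, ff, flow_id, overflow):
    """Search candidate buckets i and j for fingerprint ff, confirm the
    full flow identifier in the paired record unit, and fall back to the
    overflow table (here a plain dict) when both buckets miss."""
    for b in (i, j):
        for y, unit in enumerate(fp_buckets[b]):
            if unit == ff and rec_buckets[b][y]['flow_id'] == flow_id:
                return rec_buckets[b][y]
    return overflow.get(flow_id)  # None signals lookup failure
```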
- the processing module may also include an update module, the update module is used to find the data flow record corresponding to the data flow table in the data flow table according to the data flow identifier of the data flow to be updated, and record the data flow record The State and Meter fields are updated.
- the corresponding data flow record is searched. If the corresponding data stream record is found, the fields in the data stream record are updated, the update success is returned, and the update process ends. For example, if the corresponding data flow record is found, the State and Meter fields of the flow record in the target unit RB i [k] are updated. If the corresponding data stream record is not found, the update failure is returned, and the update process ends.
- The processing module may further include a deletion module, which is configured to find, according to the data flow identifier of the data flow to be deleted, the corresponding data flow record in the data flow table and clear it to zero.
- the corresponding data flow record is searched. If the corresponding data stream record is found, the found data stream record will be cleared, and the deletion success will be returned, and the deletion process will end. If the corresponding data stream record is not found, the deletion failure is returned, and the deletion process ends.
- Each module in the above-mentioned processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
- the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
- the computer equipment includes a processor, a memory, and a network interface connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, a computer program, and a database.
- the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
- the database of the computer equipment is used to store the data flow table.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- the steps of the method or algorithm described in combination with the disclosure of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
- Software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
- the storage medium may also be an integral part of the processor.
- the processor and the storage medium may be located in the ASIC.
- the ASIC may be located in the core network interface device.
- the processor and the storage medium may also exist as discrete components in the core network interface device.
- Computer-readable media include computer storage media and communication media, where communication media includes any media that facilitates the transfer of computer programs from one place to another.
- the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
Abstract
The embodiments of the present application disclose a data flow table for high-speed, large-scale concurrent data flows, and a processing method, device, and storage medium therefor. The method and device avoid the increased packet processing latency, and even packet loss, caused in high-speed networks by unstable data flow table operation times. According to the data flow identifier of the data flow to be inserted, the addresses of the candidate buckets in the fingerprint table and the data flow fingerprint are obtained; whether the candidate buckets of the fingerprint table contain a free unit is detected, and if so, a candidate bucket with a free unit is selected as the target bucket; free units are searched sequentially from the bottom unit of the target bucket toward the top unit. When a free unit in the target bucket is found, the data flow fingerprint of the data flow to be inserted is written into the found free unit, and the flow record of the data flow to be inserted is written into the record bucket corresponding to the candidate bucket.
Description
Related Application
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 17, 2020, with application number 202010305885.1 and invention title "A data flow table processing method and device for high-speed, large-scale concurrent data flows", the entire contents of which are incorporated herein by reference.
This application relates to the field of network communication technology, and in particular provides a data flow table for high-speed, large-scale concurrent data flows, and a processing method, device, and storage medium therefor.
The data flow table used in current network communication solutions stores the state information of each data flow in the network. It is one of the basic components needed to implement network functions such as packet forwarding, traffic classification, network intrusion detection, analysis of network traffic and user behavior, network address translation, and traffic accounting. The processing efficiency of the data flow table has a very important impact on the actual performance of these network functions.
Currently, with the rapid development of cloud computing and Network Functions Virtualization (NFV), implementing high-performance network functions in software on general-purpose server platforms has become a focus of industry attention. As network link rates soar, the number of concurrent data flows keeps growing, so the processing time allowed per packet keeps shrinking. Implementing a large-scale data flow table with a classic hash table, as is currently done, brings two problems. First, a classic hash table needs a low load factor to reduce the collision probability, resulting in poor space efficiency; when the flow table is large, storage space is wasted badly. Second, because a classic hash table handles collisions with open addressing or chaining, the worst-case lookup time can be long, causing system performance fluctuations and, in high-speed network environments, packet loss.
In current improvement schemes, high-performance packet processing software therefore adopts hash algorithms with higher space efficiency and more stable lookup performance. For example, the widely used open-source packet processing library Data Plane Development Kit (DPDK) implements its data flow table with a cuckoo hash table. Although a cuckoo hash table has high space efficiency and stable update performance, inserting a new flow record may require moving several existing records in the table, so insertion can take a long time. Especially in scenarios handling massive concurrent data flows, the uncertain operation time of the flow table in a high-speed network increases packet processing latency and can even cause packet loss.
Summary
The exemplary embodiments of this application provide a method and device for processing a data flow table for high-speed, large-scale concurrent data flows, which can avoid the increased packet processing latency, and the resulting packet loss, caused in high-speed networks by uncertain data flow table operation times.
In one aspect, the embodiments of this application provide a data flow table, comprising: a fingerprint table for storing the data flow fingerprint of the data flow to be inserted, implemented as a d-left hash table divided into two blocks, each block comprising at least two buckets, the buckets storing basic units; a record table for storing the data flow record of the data flow to be inserted, implemented as a d-left hash table divided into two blocks, each block comprising at least two buckets, the buckets storing basic units, wherein the basic units of the record table correspond one-to-one to the basic units of the fingerprint table; and an overflow table for storing the data flow record of the data flow to be inserted when it cannot be inserted into the record table.
In one embodiment, the number of basic units in a bucket does not exceed the depth of the bucket in which the basic units are located.
In one embodiment, the bucket depths of the fingerprint table and the record table are equal.
In one embodiment, the basic units of the fingerprint table store data flow fingerprints, and the basic units of the record table store data flow records.
In one embodiment, the data flow record is a 4-tuple comprising a data flow identifier, a data flow state, a flow counter of the data flow, and metadata characterizing the format.
In one embodiment, the data flow identifier is a 5-tuple comprising a source IP address, a destination IP address, a protocol number, a source port number, and a destination port number.
In one embodiment, the data flow identifier is input to a hash function to output the data flow fingerprint.
In one embodiment, the overflow table is a single-function hash table.
In one embodiment, the most frequently accessed basic unit in each bucket of the record table is located at the bottom of the corresponding bucket.
In another aspect, the embodiments of this application provide a method for processing the data flow table, comprising:
obtaining, according to the data flow identifier of the data flow to be inserted, the address of a candidate bucket in each block of the fingerprint table of the data flow table and the data flow fingerprint corresponding to the identifier, wherein the address of a candidate bucket in the fingerprint table is the same as the address of the corresponding candidate bucket in the record table;
detecting whether the candidate buckets of the fingerprint table contain a free unit, and if so, selecting a candidate bucket with a free unit as the target bucket;
searching for free units sequentially from the bottom unit of the target bucket toward the top unit, and, when a free unit in the target bucket is found, writing the data flow fingerprint of the data flow to be inserted into the found free unit and writing the data flow record of the data flow to be inserted into the record bucket of the record table corresponding to the target bucket.
In one embodiment, the step of detecting whether the candidate buckets of the fingerprint table contain a free unit further comprises: if neither candidate bucket contains a free unit, inserting the data flow record of the data flow to be inserted into the overflow table and ending the insertion process.
In one embodiment, the detecting step further comprises: if both candidate buckets contain free units and their numbers of free units differ, selecting the candidate bucket with more free units as the target bucket; and/or, if both candidate buckets contain free units and their numbers of free units are equal, selecting the candidate bucket of the first of the two blocks as the target bucket.
In one embodiment, the step of obtaining the data flow fingerprint corresponding to the data flow identifier comprises: inputting the data flow identifier into a cyclic redundancy check algorithm to output the data flow fingerprint.
In one embodiment, before the obtaining step, the method further comprises: performing cache alignment on the data flow table.
In one embodiment, the method further comprises: placing the most frequently accessed basic unit in each bucket of the record table at the bottom of the corresponding bucket.
In one embodiment, the method further comprises: computing, from the data flow identifier of the data flow to be found, the addresses of the candidate buckets and the data flow fingerprint corresponding to the data flow to be found; and searching the data flow table for the data flow record corresponding to the data flow identifier of the data flow to be found.
In one embodiment, the method further comprises: looking up, according to the data flow identifier of the data flow used for updating, the corresponding data flow record in the data flow table; and, if the corresponding record is found, updating its fields to the corresponding fields of the updating data flow.
In one embodiment, the method further comprises: looking up, according to the data flow identifier of the data flow to be deleted, the corresponding data flow record in the data flow table; and, if the corresponding record is found, clearing it to zero.
In yet another aspect, the embodiments of this application provide a processing device for a data flow table, comprising:
a data flow insertion module for obtaining, according to the data flow identifier of the data flow to be inserted, the address of a candidate bucket in each block of the fingerprint table of the data flow table and the data flow fingerprint of the identifier, wherein the address of a candidate bucket in the fingerprint table is the same as the address of the corresponding candidate bucket in the record table;
a processing module for detecting whether the candidate buckets of the fingerprint table contain a free unit, selecting a candidate bucket with a free unit as the target bucket if one exists, searching for free units sequentially from the bottom unit of the target bucket toward the top unit, and, when a free unit in the target bucket is found, writing the data flow fingerprint of the data flow to be inserted into the found free unit and writing the flow record of the data flow to be inserted into the record bucket of the record table corresponding to the target bucket.
In one embodiment, the processing module is further configured to: if no free unit exists, insert the flow record of the data flow to be inserted into the overflow table and end the insertion process.
In one embodiment, the data flow table of each data flow consists of a fingerprint table, a record table, and an overflow table; in each data flow table, the fingerprint table and the record table are both d-left hash tables of two blocks each, every block consists of at least 2 buckets, each bucket stores basic units, the number of basic units in a bucket does not exceed its depth, and the bucket depths of the fingerprint table and the record table are equal.
In one embodiment, the basic units in the buckets of the fingerprint table are data flow fingerprints; the fields of the data flow identifier of each data flow comprise the source IP address, destination IP address, protocol number, source port number, and destination port number; and obtaining the fingerprint from the data flow identifier of the data flow to be inserted comprises inputting the identifier into a cyclic redundancy check algorithm and outputting the fingerprint.
In one embodiment, the device further comprises a search module for computing the bucket addresses and the data flow fingerprint of the data flow to be found; determining, using the computed bucket addresses, the units of the buckets in the fingerprint table and searching the determined units for a data flow identifier matching the computed fingerprint; and, if no match is found in the candidate buckets of the fingerprint table, continuing to search the overflow table for a data flow identifier matching the computed fingerprint.
In still another aspect, the embodiments of this application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the processing methods of the above embodiments.
The data flow table for high-speed, large-scale concurrent data flows, and the processing method, device, and storage medium provided by the embodiments of this application, divide the data flow table into a fingerprint table, a record table, and an overflow table. The fingerprint table and the record table are designed on a d-left hash table, each containing two blocks, each block containing several buckets, and each bucket holding multiple basic units. The basic unit stored in the fingerprint table is the data flow fingerprint, and the basic unit stored in the record table is the complete data flow record; the basic units of the two tables correspond one-to-one. The overflow table is a classic single-function hash table, storing data flow records that cannot be inserted into the record table. The data structure design of the flow table makes full use of cache alignment technology to obtain high cache utilization. In addition, an in-bucket replacement mechanism is designed to place frequently accessed flow records at the bottom of each bucket of the record table, so that the bottom record is read into the cache together with the flow fingerprints, yielding higher performance. This application not only achieves high space efficiency and supports massive concurrent data flows, but also avoids the increased packet processing latency, and even packet loss, caused in high-speed networks by uncertain flow table operation times.
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the logical structure of the data flow table provided by an embodiment of this application.
FIG. 2 is a schematic diagram of the physical storage structure of the data flow table provided by an embodiment of this application.
FIG. 3 is a schematic diagram of the logical structure of the IPv6 data flow table provided by an embodiment of this application.
FIG. 4 is a schematic flowchart of the insertion process of the method for processing a data flow table provided by an embodiment of this application.
FIG. 5 is a schematic flowchart of the search process of the method for processing a data flow table provided by an embodiment of this application.
FIG. 6 is a schematic flowchart of the update process of the method for processing a data flow table provided by an embodiment of this application.
FIG. 7 is a schematic flowchart of the deletion process of the method for processing a data flow table provided by an embodiment of this application.
FIG. 8 is an internal structure diagram of the computer device provided by an embodiment of this application.
To help those skilled in the art better understand the technical solutions of this application, the application is described in further detail below with reference to the drawings and specific implementations. The implementations of this application are described in detail below, examples of which are shown in the drawings, where identical reference numbers denote identical or similar elements or elements with identical or similar functions.
The implementations described below with reference to the drawings are exemplary, intended only to explain this application, and are not intended to limit it.
Those skilled in the art will understand that, unless specifically stated, the singular forms "a" and "an" used here may also include plural forms.
It should further be understood that the word "comprise" used in the specification of this application means that the stated features, integers, steps, operations, elements, and/or components are present, without excluding the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used here includes any and all combinations of one or more of the associated listed items.
It should be understood that, when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may exist. Furthermore, "connected" or "coupled" as used here may include a wireless connection or coupling.
Those skilled in the art will understand that, unless otherwise defined, all terms used here (including technical and scientific terms) have the same meanings as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood as having meanings consistent with their meanings in the context of the prior art.
The data flow table mentioned in the embodiments of this application stores the state information of each data flow in the network; each entry is actually a data flow record. For example, a data flow record may be represented by the 4-tuple <Flow_ID, State, Meter, Metadata>, where Flow_ID is the data flow identifier (usually represented by a 5-tuple), State is the data flow state, Meter is the flow counter, and Metadata is metadata characterizing the specific format of the entry. Note that the 4-tuples and 5-tuples referred to in this application are sets of multiple fields: a 4-tuple is a set of four fields, and a 5-tuple is a set of five fields. The typical operation on a data flow table is: whenever a packet is received, the packet is first parsed and the flow identifier (Flow_ID) is extracted from its header fields; then, using the flow identifier as the lookup key, the flow table is searched to obtain the corresponding flow record; next, the flow record is parsed according to its metadata (Metadata) to obtain the flow state (State) and the flow counter (Meter); finally, the flow counter is updated, and the flow state may also need updating.
Data flow tables are also used in current data flow processing scenarios, for example in schemes implementing the flow table with a traditional cuckoo hash table. The exemplary embodiments provided by this application are improvements to such scenarios in which a data flow table is used to process data flows, including but not limited to improvements to the structure of the data flow table and to its write, search, update, and delete operations. The solutions provided by the exemplary embodiments of this application can be applied to most flow table processing scenarios: whether in the cloud, on servers, at network transceivers, or in other network communication hardware environments, wherever the usual data flow processing applies and devices and conditions for processing flow tables exist, improvements in hardware, software, or a combination thereof can follow the design ideas of these embodiments, without limitation.
An embodiment of this application provides a method for processing a data flow table for high-speed, large-scale concurrent data flows. As shown in FIG. 4, the method includes the following steps.
Step S1: according to the data flow identifier of the data flow to be inserted, obtain the addresses of the candidate buckets in the fingerprint table and the data flow fingerprint corresponding to the identifier.
As described above, the data flow table comprises a fingerprint table (Fingerprints Table), a record table (Records Table), and an overflow table (Overflow Table). The fingerprint table and the record table are designed on a d-left hash table, each containing two blocks (Block), each block containing several buckets (Bucket), and each bucket storing multiple basic units. The basic unit of the fingerprint table is the data flow fingerprint, and the basic unit of the record table is the data flow record; the basic units of the fingerprint table correspond one-to-one to those of the record table.
Note that a complete insertion visits the fingerprint table and the record table in turn; the optional buckets available for insertion are called "candidate buckets". Because the buckets of the fingerprint table and the record table correspond one-to-one, the address of a candidate bucket in the fingerprint table equals the address of the candidate bucket in the record table: once the candidate bucket in the fingerprint table is located, the position of the corresponding candidate bucket in the record table is also determined.
In this embodiment, the insertion process must determine the candidate bucket addresses. Specifically, the data flow identifier of the data flow to be inserted is extracted and denoted Flow_ID. Then the addresses of the candidate buckets in the two blocks of the fingerprint table are computed from Flow_ID with hash functions h1 and h2, where h1 and h2 both refer to the hash functions used in the computation; those skilled in the art know how to choose h1 and h2. The addresses of the candidate buckets in the two blocks of the fingerprint table are i and j respectively, where i is the address of the candidate bucket in the first block and j the address in the second block:
i = h1(Flow_ID),
j = h2(Flow_ID).
Because the units of the fingerprint table and the record table correspond one-to-one, so do their buckets; hence the candidate bucket addresses in the two blocks of the record table are correspondingly also i and j.
Next, the data flow fingerprint corresponding to the identifier is computed. Specifically, a 32-bit data flow fingerprint, denoted ff, is computed from Flow_ID with a hash function (for example, the 32-bit cyclic redundancy check hash function, CRC-32). A legal data flow fingerprint ff must be greater than 0.
Step S2: detect whether the candidate buckets of the fingerprint table contain a free unit; if a free unit exists, select a candidate bucket with a free unit as the target bucket.
Here, when the data flow fingerprint in a basic unit of the fingerprint table equals 0, that basic unit is a free unit; when the fingerprint is greater than 0, the basic unit is a non-empty unit. Determining the target bucket includes: inspecting the numbers of free units in the candidate buckets FB_i and FB_j of the fingerprint table. If every basic unit of FB_i and FB_j is non-empty, both candidate buckets are full; in that case the flow record of the data flow to be inserted is inserted into the overflow table, and the insertion process ends. If neither FB_i nor FB_j is full, the one of FB_i and FB_j with more free units is selected as the target bucket; if the two have equal numbers of free units, the bucket in the first block is preferred. In this embodiment, the target bucket is the candidate bucket FB_i.
Note that, as described above, a legal data flow fingerprint must be greater than 0 while a free unit's fingerprint is represented by 0, so the number of free units can be counted as the number of basic units holding the value 0. Thus, the bucket "with more free units" is the bucket with more zero-valued basic units. In other embodiments, however, a bucket "with more free units" may also mean a bucket whose number of zero-valued basic units exceeds some threshold (for example, more than 5 zero-valued basic units, meaning the bucket has more than 5 free units).
Step S3: search for free units sequentially from the bottom unit of the target bucket toward the top unit; when a free unit in the target bucket is found, write the data flow fingerprint of the data flow to be inserted into the found free unit, and write the flow record of the data flow to be inserted into the record bucket of the record table corresponding to the target bucket.
It can be understood that, when determining the insertion position, the target bucket FB_i may, for example, contain 8 units, with bottom unit FB_i[0] and top unit FB_i[7]. The search proceeds upward from the bottom unit FB_i[0] for the first free unit (a free unit's value is 0). In this embodiment, the first free unit found (i.e., the target unit) is FB_i[k]. When writing the target unit, the data flow fingerprint to be inserted is written into the target unit FB_i[k], and the flow record of the data flow to be inserted is written into the record unit RB_i[k] of the corresponding record bucket RB_i, at which point the insertion process ends.
Further, step S2 also includes: if the candidate buckets contain no free unit, insert the flow record of the data flow to be inserted into the overflow table and end the insertion process.
The solution of this embodiment mainly targets application scenarios of high-speed networks and large-scale concurrent flows. The data flow table improves flow table performance by adopting a d-left hash table construction. In this embodiment, the data flow table supports insertion, search, update, and deletion of flow records, and the flow table supports both IPv4 and IPv6. The data flow table of this embodiment has high space efficiency and stable update performance; moreover, when a new data flow is inserted into the flow table, no existing records need to be moved, giving stable insertion performance.
For example, take the process of inserting a new data flow <Flow_ID, State, Meter, Metadata> into the data flow table; the insertion process includes the following steps.
1) Hash the data flow identifier of the data flow to be inserted to obtain the candidate bucket addresses. With hash functions h1 and h2, compute the addresses of the candidate buckets corresponding to the identifier Flow_ID in the two blocks, e.g., address i in the first block and address j in the second block.
2) Hash the data flow identifier to compute the corresponding data flow fingerprint. With hash function f, compute the 32-bit flow fingerprint of Flow_ID, e.g., ff, where a legal flow fingerprint must be greater than 0 and a free unit's fingerprint is represented by the value 0.
3) Determine the target bucket in the fingerprint table from the numbers of free units in the candidate buckets. Inspect the numbers of free units in the candidate buckets FB_i and FB_j of the fingerprint table. If FB_i and FB_j are both full, i.e., neither contains a free unit, insert the flow record to be inserted into the overflow table and end the insertion process. If neither FB_i nor FB_j is full, select the candidate bucket with more free units as the target bucket; if the two have equal numbers of free units, prefer the candidate bucket FB_i of the first block. In one embodiment, the candidate bucket FB_i is selected as the target bucket.
4) Determine the insertion position of the data flow to be inserted. For example, the target bucket FB_i contains 8 units, with bottom unit FB_i[0] and top unit FB_i[7]. Search upward from the bottom unit FB_i[0] for the first free unit (value 0) as the target unit, e.g., FB_i[k].
5) Write the flow fingerprint of the data flow to be inserted into the target unit, and write its flow record into the corresponding basic unit of the record table. For example, write the fingerprint into the target unit FB_i[k] and the flow record into the record unit RB_i[k] of the corresponding record bucket RB_i, i.e.,
FB_i[k] ← ff,
RB_i[k] ← <Flow_ID, State, Meter, Metadata>,
and the insertion process then ends.
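Steps 4) and 5) above, scanning the target bucket bottom-up for the first free unit and writing the paired fingerprint and record, can be sketched as follows; the function name is ours, and buckets are modeled as lists indexed from the bottom unit at 0.

```python
def insert_into_bucket(fb, rb, ff, record):
    """Write fingerprint ff and its flow record into the first free
    unit of the target bucket, searching upward from the bottom unit.
    Returns the unit index k, or None if the bucket has no free unit."""
    for k in range(len(fb)):   # index 0 is the bottom unit
        if fb[k] == 0:         # 0 marks a free unit
            fb[k] = ff
            rb[k] = record
            return k
    return None
```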
The embodiments of this application also provide an improved data flow table whose logical structure comprises a fingerprint table, a record table, and an overflow table. In each data flow table, the fingerprint table and the record table are both d-left hash tables; each comprises two blocks, each block comprises at least two buckets, and each bucket stores basic units. The number of basic units in a bucket does not exceed the depth of the bucket, and corresponding buckets of the fingerprint table and the record table have equal depth. It can be understood that the depth of a bucket is its maximum capacity.
Specifically, the data flow table is logically divided into a fingerprint table, a record table, and an overflow table. The fingerprint table and the record table are both implemented as d-left hash tables with d = 2, i.e., divided into two blocks, called the first block and the second block. Each block of the fingerprint table and the record table comprises several buckets, each bucket can store several basic units, and the number of basic units in a bucket does not exceed its depth. The bucket depths of the fingerprint table and the record table are equal, and their basic units correspond one-to-one. For example, as shown in FIG. 1, the units of the fingerprint table and the record table correspond one-to-one, each table is divided into two blocks, each block comprises multiple buckets, and every bucket has depth 8.
Further, the basic units in the buckets of the fingerprint table are data flow fingerprints. The fields of the data flow identifier of each data flow comprise the source IP address, destination IP address, protocol number, source port number, and destination port number. From the data flow identifier of the data flow to be inserted, the fingerprint to be inserted into the fingerprint table can be obtained, including: inputting the identifier into a cyclic redundancy check (CRC) algorithm to output the fingerprint.
Specifically, the buckets of the fingerprint table store data flow fingerprints, each 32 bits long. Each data flow has a unique identifier, which may be represented by the 5-tuple of the packet header fields <source IP address, destination IP address, protocol number, source port number, destination port number>. The fingerprint is obtained by hashing the identifier; this application does not limit the specific hash, e.g., the 32-bit cyclic redundancy check algorithm (CRC-32) may be used, with the identifier as input and the 32-bit fingerprint as output.
The buckets of the record table store data flow records, represented by the 4-tuple <Flow_ID, State, Meter, Metadata>, where Flow_ID is the data flow identifier, State the flow state, Meter the flow counter of the data flow, and Metadata metadata characterizing the specific format of the entry. An IPv4 flow table record is 32 bytes long and an IPv6 record 64 bytes, as shown in FIG. 3.
The overflow table is implemented as a classic single-function hash table, handling hash collisions by open addressing; it stores the data flow records to be inserted that could not be inserted into the record table. For example, in the embodiment of FIG. 1, the buckets of the fingerprint table store data flow fingerprints, each obtained by hashing the identifier and 32 bits long. The buckets of the record table store data flow records, represented as the 4-tuple <Flow_ID, State, Meter, Metadata>, where the data flow identifier (Flow_ID) is the lookup key, usually composed of the 5-tuple <source IP, destination IP, protocol number, source port number, destination port number>. Each flow record has a fixed length of 32 bytes, with the flow identifier occupying 13 bytes and the other fields 19 bytes. The overflow table is a classic single-function hash table handling collisions by open addressing or chaining; its basic unit is the same as the record table's: a data flow record.
When a new data flow needs to be inserted into the flow table: first, the identifier is hashed with h1 and h2 to obtain the addresses of the new record's candidate buckets in the two blocks of the fingerprint table; then, of the two candidate buckets, the one with more free units is selected as the target bucket; next, the flow fingerprint of the data flow to be inserted is written into the target bucket of the fingerprint table, and its flow record into the record bucket of the record table whose address corresponds to that target bucket. The basic units of the record table correspond one-to-one to those of the fingerprint table.
Note that, when selecting the target bucket from the two candidate buckets of the fingerprint table, if both candidate buckets are already full, the flow record of the data flow to be inserted is inserted into the overflow table.
In practice, the physical storage structure of the data flow table in memory is as shown in FIG. 2: the buckets of the fingerprint table and the buckets of the record table are stored alternately in the physical structure of memory. Since the cache line (Cache Line) of mainstream server processors is usually 64 bytes, the whole flow table is cache-aligned when stored, e.g., 64-byte aligned, to improve cache read efficiency. After memory alignment, one memory read can fetch all the basic units of a target bucket of the fingerprint table together with the bottom unit of the corresponding record bucket of the record table (the record shown with dark grid lines in FIG. 2).
The method proposed by this application for processing a data flow table for high-speed, large-scale concurrent data flows may further include: placing the most frequently accessed basic unit of each bucket of the record table at the bottom of the bucket. Through this "in-bucket replacement" mechanism, the data flow table ensures that a single memory read brings the entire target bucket of the fingerprint table and the bottom unit of the corresponding record bucket into the cache together, further improving cache read efficiency. The implementation details of the in-bucket replacement mechanism are described below.
In this embodiment, the overflow table is stored separately in memory, and may likewise be cache-aligned.
In this embodiment, the data flow table is divided into a fingerprint table, a record table, and an overflow table. The fingerprint table and the record table are designed on a d-left hash table, each containing two blocks, each block containing several buckets, and each bucket storing multiple basic units; the basic units of the fingerprint table and the record table correspond one-to-one. The fingerprint table stores data flow fingerprints and the record table stores data flow records. The overflow table is a classic single-function hash table, storing data flow records that cannot be inserted into the record table. The data structure design of the flow table makes full use of cache alignment technology for high cache utilization. Further, an embodiment of this application also provides the in-bucket replacement mechanism, which places frequently accessed flow records at the bottom of each bucket of the record table, so that the bottom record is read into the cache together with the flow fingerprints, yielding higher cache read performance. This application not only achieves high space efficiency and supports massive concurrent data flows, but also avoids the increased packet processing latency, and even packet loss, caused in high-speed networks by unstable flow table operation times.
Based on the data flow table provided in this embodiment, the processing method provided by the embodiments of this application may further include search, update, and deletion processes.
Specifically, as shown in FIG. 5, the processing method may further include a search process comprising the following steps.
Step S41: from the data flow identifier of the data flow to be found, compute the addresses of the corresponding candidate buckets in the fingerprint table and the data flow fingerprint corresponding to the data flow to be found.
Step S42: search the data flow table for the data flow record corresponding to the identifier of the data flow to be found.
Specifically, after the candidate bucket addresses are obtained by hashing, the basic units of the candidate buckets of the fingerprint table are determined, and among the determined basic units a data flow identifier matching the computed fingerprint is sought. If no data flow identifier matching the computed fingerprint is found in the candidate buckets of the fingerprint table, the search continues in the overflow table for a data flow identifier matching the computed fingerprint.
The purpose of the search process is to find, from the identifier Flow_ID of the data flow to be found, its corresponding flow record in the data flow table. Specifically, the search process includes the following steps.
1.1) Compute the corresponding candidate bucket addresses from the identifier. Specifically, with hash functions h1 and h2, compute the addresses of the candidate buckets corresponding to Flow_ID in the two blocks, e.g., address i of the bucket corresponding to Flow_ID in the first block and address j in the second block.
1.2) Hash the identifier to compute the data flow fingerprint. With hash function f, compute the 32-bit flow fingerprint of Flow_ID, e.g., the data flow fingerprint ff.
1.3) Search for the data flow record corresponding to the identifier of the data flow to be found. Examine the basic units of the two candidate buckets FB_i and FB_j of the fingerprint table one by one. If a basic unit of a candidate bucket equals the fingerprint ff, i.e., FB_x[y] (x = i or j, y = 0…7) equals ff, read the identifier from the corresponding record unit RB_x[y], denoted RB_x[y].flow_id. If RB_x[y].flow_id == flow_id, the search succeeds and the search process ends. If RB_x[y].flow_id does not equal flow_id, continue with the next unit of the candidate buckets of the fingerprint table. Repeat this step until all units of both candidate buckets have been examined.
1.4) If the contents of all units of the two candidate buckets FB_i and FB_j differ from ff, search the overflow table; if a record matching the identifier is found there, return search success and end the search process.
If no data flow record matching the identifier is found, return search failure and end the search process.
The processing method provided by the embodiments of this application may also include an update process. The purpose of the update process is to find, from the identifier Flow_ID of the data flow used for updating, the corresponding data flow record in the data flow table and update its State and Meter fields.
Specifically, as shown in FIG. 6, the update process includes the following steps.
Step S51: from the data flow identifier of the data flow used for updating, look up the corresponding data flow record in the data flow table.
Step S52: if the corresponding record is found, update the fields of the record, return update success, and end the update process. For example, if the corresponding record is found, update the State and Meter fields of the flow record in the target unit RB_i[k].
Step S54: if the corresponding record is not found, return update failure and end the update process.
The update process may also include step S53.
Step S53: perform in-bucket replacement. If the target unit holding the found record is not the bottom unit of its bucket, compare the Meter of the flow record in the target unit with the Meter of the bottom unit of the target unit's bucket. If the target unit's Meter is greater than the bottom unit's Meter, or the bottom unit's Meter is empty, exchange the contents of the target unit's flow record with the contents of the bottom unit. Specifically, if the target unit RB_i[k] is not the bottom unit (i.e., k > 0), compare the Meter field of the flow record in RB_i[k] with the Meter field of the flow record in the bottom unit RB_i[0] of the target unit's bucket. If RB_i[k].Meter > RB_i[0].Meter, or RB_i[0] is empty, exchange the contents of RB_i[k] and RB_i[0], and likewise exchange the contents of FB_i[k] and FB_i[0], i.e.:
temp ← RB_i[0],
RB_i[0] ← RB_i[k],
RB_i[k] ← temp,
temp ← FB_i[0],
FB_i[0] ← FB_i[k],
FB_i[k] ← temp.
The processing method may also include a deletion process. The purpose of the deletion process is to find, from the identifier Flow_ID of the data flow to be deleted, its corresponding flow record in the data flow table and clear the corresponding record to zero.
Specifically, as shown in FIG. 7, the deletion process includes the following steps.
S61: from the data flow identifier of the data flow to be deleted, look up the corresponding data flow record in the data flow table.
S62: if the corresponding record is found, clear the found record, return deletion success, and end the deletion process. Specifically, the data flow record in the found target unit (for example, RB_i[k]) is cleared, i.e.,
FB_i[k] ← 0,
RB_i[k] ← <0, 0, 0, 0>.
The deletion process may also include the following step.
S63: if the corresponding record is not found, return deletion failure and end the deletion process.
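The clearing assignments in step S62 can be sketched as follows; the function name is ours, and the four zeros stand for the cleared <Flow_ID, State, Meter, Metadata> fields.

```python
def clear_unit(fb, rb, k):
    """Delete the flow at unit k: zero its fingerprint and all four
    record fields, i.e. FB_i[k] <- 0 and RB_i[k] <- <0, 0, 0, 0>."""
    fb[k] = 0
    rb[k] = (0, 0, 0, 0)
```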
It should be understood that, although the steps in the flowcharts of FIGS. 4-7 are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, their execution is not strictly ordered and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 4-7 may comprise multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The practical effect of the embodiments of this application is illustrated below with test data.
Note that two types of exceptions usually exist during the actual operation of a data flow table.
The first type is the bucket overflow exception. A bucket overflow exception means that, when a new data flow fingerprint is inserted into the fingerprint table, both candidate buckets are found full, so the flow fingerprint cannot be inserted into the fingerprint table and, correspondingly, the flow record cannot be inserted into the record table.
The second type is the flow fingerprint conflict exception. A flow fingerprint conflict exception means that, during a search, multiple flow fingerprints identical to the searched fingerprint are found in the two candidate buckets of the fingerprint table.
For the first type of exception, first analyze the probability of a bucket overflow. When the data flow table must hold N entries and the record table (or fingerprint table) contains M basic units, M should be suitably larger than N to ensure a low bucket overflow probability; the value of M can be determined from the required overflow probability. The papers "How Asymmetry Helps Load Balancing" (B. Vocking, Journal of the ACM, 2003, 50(4):568-589) and "The asymptotics of selecting the shortest of two, improved" (M. Mitzenmacher and B. Vocking, in Analytic Methods in Applied Probability: In Memory of Fridrikh Karpelevich, edited by Y. Suhov, American Mathematical Society) give a method of computing the bucket overflow probability of a d-left hash table by solving differential equations, which is not repeated here. Table 1 below gives, for a d-left hash table with d = 2 hash functions, bucket depth 8, and average number of records per bucket (k) equal to 1-7, the distribution probability of the number of records in a bucket (x) and the bucket overflow probability. Since the bucket depth is 8, the relation between the average number of records per bucket (k) and M, N is k = 8*N/M.
Table 1. Distribution probability of the number of records in a bucket and the computed bucket overflow probability
As Table 1 shows, with bucket depth 8 and an average of 6 records per bucket, the bucket overflow probability is 3.17e-6. This overflow probability is small enough that the number of records in the overflow table is negligible compared with N. The data flow table thereby achieves a space utilization of 6/8 = 75%. When the flow table must hold one hundred million entries, i.e., N = 10^8, with bucket depth 8 and an average of 6 records per bucket, the required memory is:
(N*8/6)*(4+32) = 4.8*10^9 bytes,
i.e., about 4.8 GB. The expected number of records in the overflow table is then:
3.17e-6 * N / 6 = 52.8.
The number of records in the overflow table is thus negligible compared with the memory size, so the impact of the overflow table can essentially be ignored when computing the total memory overhead.
For the second type of exception, first analyze the probability of a data flow fingerprint conflict. Each fingerprint is 32 bits long, and a fingerprint whose 32 bits are all 0 indicates that the basic unit is a free unit. Hence, when searching the two buckets of the fingerprint table, the probability of a fingerprint conflict is extremely low, and in this embodiment the impact of fingerprint conflicts can essentially be ignored in practice.
For the IPv6 flow table shown in FIG. 3, with bucket depth 16 and an average of 14 records per bucket, the probability of a bucket overflow exception is 2.39e-06, and the space utilization of the flow table is 14/16 = 87.5%, higher than the 75% of the IPv4 flow table. Because the bucket depth doubles, the probability of a fingerprint conflict exception is twice that of the IPv4 flow table, i.e., 7.46*10^-9, which is still low enough to be negligible.
The embodiments of this application also provide a processing device for a data flow table for high-speed, large-scale concurrent data flows, comprising a data flow insertion module and a processing module.
The data flow insertion module obtains, according to the data flow identifier of the data flow to be inserted, the addresses of the candidate buckets in the fingerprint table and the data flow fingerprint corresponding to the identifier, where the units of the fingerprint table and the record table correspond one-to-one and the address of a candidate bucket in the fingerprint table is the same as the address of the candidate bucket in the record table.
The processing module detects whether the candidate buckets of the fingerprint table contain a free unit. If one of the buckets contains a free unit, a candidate bucket with a free unit is selected as the target bucket. Free units are searched sequentially from the bottom unit of the target bucket toward the top unit; when a free unit in the target bucket is found, the fingerprint of the data flow to be inserted is written into the found free unit, and the flow record of the data flow to be inserted is written into the record bucket corresponding to the target bucket.
The processing module is further configured to: if neither candidate bucket contains a free unit, insert the flow record of the data flow to be inserted into the overflow table and end the insertion process.
Specifically, the data flow table of each data flow comprises a fingerprint table, a record table, and an overflow table. In each data flow table, the fingerprint table and the record table are both d-left hash tables of two blocks each. Every block of the fingerprint table and the record table comprises at least 2 buckets; each bucket stores basic units, the number of basic units in a bucket does not exceed the depth of the bucket, and the bucket depths of the fingerprint table and the record table are equal.
The basic units in the buckets of the fingerprint table are data flow fingerprints. The fields of the data flow identifier of each data flow comprise the source IP address, destination IP address, protocol number, source port number, and destination port number. Obtaining the fingerprint from the data flow identifier of the data flow to be inserted further comprises: inputting the identifier into a cyclic redundancy check algorithm to output the fingerprint.
Further, the processing module may also include a search module, which looks up, from the data flow identifier of the data flow to be found, its corresponding data flow record in the data flow table. Specifically, using the computed bucket addresses, the units of the buckets are determined from the fingerprint table and searched for a data flow identifier matching the computed fingerprint; if no match is found in the candidate buckets of the fingerprint table, the overflow table is then searched for a data flow identifier matching the computed fingerprint.
Further, the processing module may also include an update module, which looks up, from the data flow identifier of the data flow used for updating, the corresponding record in the data flow table and updates its State and Meter fields. Specifically, the corresponding record is looked up by Flow_ID; if found, the fields of the record are updated (for example, the State and Meter fields of the flow record in the target unit RB_i[k]), update success is returned, and the update process ends; if not found, update failure is returned and the update process ends.
Further, the processing module may also include a deletion module, which looks up, from the data flow identifier of the data flow to be deleted, the corresponding record in the data flow table and clears it to zero. Specifically, the corresponding record is looked up by Flow_ID; if found, the found record is cleared, deletion success is returned, and the deletion process ends; if not found, deletion failure is returned and the deletion process ends.
关于处理装置的具体限定可以参见上文中对于处理方法的限定,在此不再赘述。上述处理装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Figure 8. The computer device comprises a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database; the internal memory provides a runtime environment for the operating system and the computer program on the non-volatile storage medium. The database stores the data flow table. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements the data flow table processing method of the embodiments above.

Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of the parts relevant to the solution of this application and does not limit the computer devices to which the solution may be applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The steps of the methods or algorithms described in this disclosure may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, which may in turn reside in a core network interface device; alternatively, the processor and the storage medium may exist as discrete components in a core network interface device.

Those skilled in the art will appreciate that, in the one or more embodiments above, the functions described in this application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code over a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general-purpose or special-purpose computer.

The specific embodiments described above further detail the purpose, technical solutions, and beneficial effects of this application. It should be understood that they are merely specific embodiments of this application and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this application shall fall within the scope of protection of this application.
Claims (20)
- A data flow table, comprising: a fingerprint table for storing flow fingerprints of data flows to be inserted, implemented as a d-left hash table divided into two blocks, each block comprising at least two buckets, the buckets storing basic units; a record table for storing flow records of the data flows to be inserted, implemented as a d-left hash table divided into two blocks, each block comprising at least two buckets, the buckets storing basic units, wherein the basic units of the record table correspond one-to-one to the basic units of the fingerprint table; and an overflow table for storing the flow record of a data flow to be inserted when that flow record cannot be inserted into the record table.
- The data flow table of claim 1, wherein the number of basic units in a bucket does not exceed the depth of the bucket in which the basic units reside.
- The data flow table of claim 1, wherein the bucket depths of the fingerprint table and the record table are equal.
- The data flow table of any one of claims 1 to 3, wherein the basic units of the fingerprint table store flow fingerprints and the basic units of the record table store flow records.
- The data flow table of claim 4, wherein each flow record is a 4-tuple comprising a flow identifier, a flow state, a flow traffic counter, and metadata characterizing the format.
- The data flow table of claim 5, wherein the flow identifier is a 5-tuple comprising a source IP address, a destination IP address, a protocol number, a source port number, and a destination port number.
- The data flow table of claim 5, wherein the flow identifier is input to a hash function to output the flow fingerprint.
- The data flow table of claim 1, wherein the overflow table is a single-hash-function hash table.
- The data flow table of claim 1, wherein, in each bucket of the record table, the most frequently accessed basic unit is located at the bottom of the corresponding bucket.
- A method for processing the data flow table of claim 1, comprising: obtaining, from the flow identifier of a data flow to be inserted, the address of a candidate bucket in each block of the fingerprint table of the data flow table and the flow fingerprint corresponding to the flow identifier, wherein the candidate-bucket addresses in the fingerprint table are the same as the candidate-bucket addresses in the record table; detecting whether the candidate buckets of the fingerprint table contain a free unit and, if a free unit exists, selecting a candidate bucket having the free unit as a target bucket; and searching the target bucket for the free unit from its bottom unit toward its top unit and, when a free unit is found in the target bucket, writing the flow fingerprint of the data flow to be inserted into the found free unit and writing the flow record of the data flow to be inserted into the record bucket of the record table corresponding to the target bucket.
- The method of claim 10, wherein the step of detecting whether the candidate buckets of the fingerprint table contain a free unit further comprises: if neither candidate bucket contains a free unit, inserting the flow record of the data flow to be inserted into the overflow table and ending the insertion process.
- The method of claim 10, wherein the step of detecting whether the candidate buckets of the fingerprint table contain a free unit further comprises: if both candidate buckets contain free units and the numbers of free units in the candidate buckets differ, selecting the candidate bucket having more free units as the target bucket; and/or, if both candidate buckets contain free units and the numbers of free units are the same, selecting the candidate bucket of the first of the two blocks as the target bucket.
- The method of claim 10, wherein the step of obtaining, from the flow identifier of the data flow to be inserted, the flow fingerprint corresponding to the flow identifier comprises: inputting the flow identifier into a cyclic redundancy check algorithm to output the flow fingerprint.
- The method of claim 10, wherein, before the step of obtaining the data flow table, the method further comprises: performing cache alignment on the data flow table.
- The method of claim 10, further comprising: placing, in each bucket of the record table, the most frequently accessed basic unit at the bottom of the corresponding bucket.
- The method of claim 10, further comprising: computing, from the flow identifier of a data flow to be looked up, the candidate-bucket addresses and the flow fingerprint corresponding to the data flow to be looked up; and searching the data flow table for the flow record corresponding to the flow identifier of the data flow to be looked up.
- The method of claim 10, further comprising: looking up, by the flow identifier of a data flow used for an update, the corresponding flow record in the data flow table; and, if the corresponding flow record is found, updating the fields of the found corresponding flow record with the corresponding fields of the data flow used for the update.
- The method of claim 10, further comprising: looking up, by the flow identifier of a data flow to be deleted, the corresponding flow record in the data flow table; and, if the corresponding flow record is found, zeroing out the found corresponding flow record.
- An apparatus for processing the data flow table of claim 1, comprising: a data flow insertion module configured to obtain, from the flow identifier of a data flow to be inserted, the address of a candidate bucket in each block of the fingerprint table of the data flow table and the flow fingerprint corresponding to the flow identifier, wherein the candidate-bucket addresses in the fingerprint table are the same as the candidate-bucket addresses in the record table; and a processing module configured to detect whether the candidate buckets of the fingerprint table contain a free unit and, if a free unit exists, select a candidate bucket having the free unit as a target bucket, and configured to search the target bucket for the free unit from its bottom unit toward its top unit and, when a free unit is found in the target bucket, write the flow fingerprint of the data flow to be inserted into the found free unit and write the flow record of the data flow to be inserted into the record bucket of the record table corresponding to the target bucket.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 10 to 18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/996,378 US20230231808A1 (en) | 2020-04-17 | 2020-10-28 | Data flow table, method and device for processing data flow table, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010305885.1A CN111541617B (zh) | 2020-04-17 | 2020-04-17 | 一种用于高速大规模并发数据流的数据流表处理方法及装置 |
CN202010305885.1 | 2020-04-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021208403A1 (zh) | 2021-10-21 |
Family
ID=71976853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/124355 WO2021208403A1 (zh) | 2020-04-17 | 2020-10-28 | 数据流表及其处理方法、装置、存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230231808A1 (zh) |
CN (1) | CN111541617B (zh) |
WO (1) | WO2021208403A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111541617B (zh) * | 2020-04-17 | 2021-11-02 | 网络通信与安全紫金山实验室 | 一种用于高速大规模并发数据流的数据流表处理方法及装置 |
CN116991855B (zh) * | 2023-09-27 | 2024-01-12 | 深圳大普微电子股份有限公司 | 哈希表处理方法、装置、设备、介质、控制器及固态硬盘 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130003555A1 (en) * | 2010-03-22 | 2013-01-03 | Freescale Semiconductor, Inc | Token bucket management apparatus and method of managing a token bucket |
CN104753726A (zh) * | 2013-12-25 | 2015-07-01 | 任子行网络技术股份有限公司 | 一种串行数据流的审计控制方法及系统 |
CN109358987A (zh) * | 2018-10-26 | 2019-02-19 | 黄淮学院 | 一种基于两级数据去重的备份集群 |
CN110019250A (zh) * | 2019-03-06 | 2019-07-16 | 清华大学 | 基于哈希函数的网络测量方法和计算机可读存储介质 |
CN111541617A (zh) * | 2020-04-17 | 2020-08-14 | 网络通信与安全紫金山实验室 | 一种用于高速大规模并发数据流的数据流表处理方法及装置 |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7325059B2 (en) * | 2003-05-15 | 2008-01-29 | Cisco Technology, Inc. | Bounded index extensible hash-based IPv6 address lookup method |
US8000244B1 (en) * | 2007-08-03 | 2011-08-16 | Hewlett-Packard Development Company, L.P. | Shared rate limiters using floating buckets |
CN101540723B (zh) * | 2009-04-20 | 2011-07-06 | 杭州华三通信技术有限公司 | 一种流表查找方法和装置 |
US8880554B2 (en) * | 2010-12-03 | 2014-11-04 | Futurewei Technologies, Inc. | Method and apparatus for high performance, updatable, and deterministic hash table for network equipment |
CN102200906B (zh) * | 2011-05-25 | 2013-12-25 | 上海理工大学 | 大规模并发数据流处理系统及其处理方法 |
CN102833134A (zh) * | 2012-09-04 | 2012-12-19 | 中国人民解放军理工大学 | 负载自适应的网络数据流流量测量方法 |
US10218647B2 (en) * | 2015-12-07 | 2019-02-26 | Intel Corporation | Mechanism to support multiple-writer/multiple-reader concurrency for software flow/packet classification on general purpose multi-core systems |
US10198291B2 (en) * | 2017-03-07 | 2019-02-05 | International Business Machines Corporation | Runtime piggybacking of concurrent jobs in task-parallel machine learning programs |
US11310158B2 (en) * | 2017-12-08 | 2022-04-19 | Corsa Technology Inc. | Packet classification using fingerprint hash table |
CN108337172B (zh) * | 2018-01-30 | 2020-09-29 | 长沙理工大学 | 大规模OpenFlow流表加速查找方法 |
CN110768856B (zh) * | 2018-07-27 | 2022-01-14 | 华为技术有限公司 | 网络流测量的方法、网络测量设备以及控制面设备 |
CN110808910B (zh) * | 2019-10-29 | 2021-09-21 | 长沙理工大学 | 一种支持QoS的OpenFlow流表节能存储架构及其方法 |
2020
- 2020-04-17 CN CN202010305885.1A patent/CN111541617B/zh active Active
- 2020-10-28 US US17/996,378 patent/US20230231808A1/en active Pending
- 2020-10-28 WO PCT/CN2020/124355 patent/WO2021208403A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130003555A1 (en) * | 2010-03-22 | 2013-01-03 | Freescale Semiconductor, Inc | Token bucket management apparatus and method of managing a token bucket |
CN104753726A (zh) * | 2013-12-25 | 2015-07-01 | 任子行网络技术股份有限公司 | 一种串行数据流的审计控制方法及系统 |
CN109358987A (zh) * | 2018-10-26 | 2019-02-19 | 黄淮学院 | 一种基于两级数据去重的备份集群 |
CN110019250A (zh) * | 2019-03-06 | 2019-07-16 | 清华大学 | 基于哈希函数的网络测量方法和计算机可读存储介质 |
CN111541617A (zh) * | 2020-04-17 | 2020-08-14 | 网络通信与安全紫金山实验室 | 一种用于高速大规模并发数据流的数据流表处理方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN111541617A (zh) | 2020-08-14 |
US20230231808A1 (en) | 2023-07-20 |
CN111541617B (zh) | 2021-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8924687B1 (en) | Scalable hash tables | |
US8255398B2 (en) | Compression of sorted value indexes using common prefixes | |
WO2018099107A1 (zh) | 一种哈希表管理的方法和装置、计算机存储介质 | |
WO2022143540A1 (zh) | 区块链索引的存储方法、装置、计算机设备及介质 | |
CN109446362B (zh) | 基于外存的图数据库结构、图数据存储方法、装置 | |
US9871727B2 (en) | Routing lookup method and device and method for constructing B-tree structure | |
CN106874348B (zh) | 文件存储和索引方法、装置及读取文件的方法 | |
US20130080485A1 (en) | Quick filename lookup using name hash | |
US11269956B2 (en) | Systems and methods of managing an index | |
US9244857B2 (en) | Systems and methods for implementing low-latency lookup circuits using multiple hash functions | |
WO2022048284A1 (zh) | 一种基因对比的哈希查表方法、装置、设备及存储介质 | |
CN112667636B (zh) | 索引建立方法、装置及存储介质 | |
WO2021208403A1 (zh) | 数据流表及其处理方法、装置、存储介质 | |
US11461239B2 (en) | Method and apparatus for buffering data blocks, computer device, and computer-readable storage medium | |
WO2020207248A1 (zh) | 一种流分类方法及装置 | |
WO2013075306A1 (zh) | 数据访问方法和装置 | |
WO2020082597A1 (zh) | 一种b+树节点的批量插入和删除方法及装置 | |
EP2972911B1 (en) | Apparatus and methods for a distributed memory system including memory nodes | |
US20200133787A1 (en) | Method, electronic device and computer readable medium of file management | |
CN113867627A (zh) | 一种存储系统性能优化方法及系统 | |
JP2023534123A (ja) | パケットマッチング方法、装置、ネットワークデバイスおよび媒体 | |
WO2024119797A1 (zh) | 一种数据处理方法、系统、设备以及存储介质 | |
CN117435912A (zh) | 基于网络数据包属性值长短特征的数据包索引与检索方法 | |
CN113392039B (zh) | 一种数据存储、查找方法及装置 | |
CN114398373A (zh) | 应用于数据库存储的文件数据存储读取方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20931188 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20931188 Country of ref document: EP Kind code of ref document: A1 |