CN107248939B

CN107248939B - Network flow high-speed correlation method based on hash memory

Info

Publication number: CN107248939B
Application number: CN201710384744.1A
Authority: CN
Inventors: 王海; 董超; 牛大伟; 于卫波; 米志超; 郭晓; 李艾静
Original assignee: PLA University of Science and Technology
Current assignee: PLA University of Science and Technology
Priority date: 2017-05-26
Filing date: 2017-05-26
Publication date: 2020-07-31
Anticipated expiration: 2037-05-26
Also published as: CN107248939A

Abstract

The invention relates to a high-speed network flow correlation method based on a hash memory, comprising the following steps: a characteristic field of an IP packet is used as the input of a hash table, and the generated hash bits are used as the high-order address of a storage area; packets matching the characteristic field are stored in sequence in the address range indicated by the high-order address, and the flow information corresponding to the packets is recorded at the tail of the storage area; when network flow association is needed, a packet characteristic field determined at any node is used as the hash table input at the other nodes to determine the storage area where it resides, and the target packet is searched for and located within that storage area. The invention offers high-speed correlation of network flows across multiple network nodes, a guaranteed upper bound on correlation delay, and representative correlated data flows, and has good prospects for wider application.

Description

Network flow high-speed correlation method based on hash memory

Technical Field

The invention belongs to the field of network communication, and particularly relates to a high-speed network flow association method based on a hash memory.

Background

Data acquisition and analysis in high-speed networks is one of the main means of analyzing data transmission performance, diagnosing network faults and assessing packet transmission service quality. Because packets traverse multiple nodes in a network, an important way to analyze network transmission performance is to correlate the same data flow across those nodes: for example, observing when a packet is received and sent by node A, whether and when it is received and sent by node B, whether and when it is received by node C, and so on. Analyzing the receive and send logs of the same flow at several nodes reveals network transmission delay, packet loss rate, path distribution, path changes and similar information. However, as network bandwidths keep rising, the storage and processing capacity of each node has grown greatly. For example, a node may store 1 TB of data, so retrieving a given packet from one or more other nodes takes a very long time. Especially when there are many network nodes and the stored data volume is very large, correlating the same flow across different nodes is extremely slow and falls far short of users' need to observe changes in flow delay and packet loss rate in real time. Such massive data at different nodes usually has to be processed offline over a long period, or can be processed only when the amount of data is very small.

In addition, existing data acquisition and analysis often captures a random packet for association analysis. Because networks contain pronounced elephant flows (flows with very long duration and very many packets) and mouse flows (short flows with very short duration and very few packets), the captured packets usually belong to elephant flows while short-flow performance is frequently ignored. The resulting delay and loss statistics therefore lack representativeness and cannot reflect the general performance of the network.

Invention patent CN104396216A discloses a method, non-transitory computer-readable medium and apparatus for identifying network traffic characteristics in order to associate and manage one or more subsequent flows. The method comprises sending to a monitoring server a monitoring request that includes a timestamp and one or more attributes extracted from an HTTP request received from a client computing device, so as to associate one or more subsequent flows related to the HTTP request; after a confirmation response to the monitoring request is received from the monitoring server, sending the HTTP request to an application server; receiving an HTTP response to the HTTP request from the application server; and performing an operation on the HTTP response. However, this management and association of a particular flow depends on a dedicated monitoring server.

Disclosure of Invention

The invention aims to provide a hash memory-based high-speed network flow association method that solves two problems: performing high-speed, real-time association search for the same flow across multiple nodes in a high-bandwidth multi-node network, and selecting representative sample packets when evaluating transmission performance such as delay and packet loss rate between network nodes.

The technical scheme for realizing the purpose of the invention is as follows: a high-speed network flow association method based on a hash memory comprises the following steps:

1) taking a characteristic field of an IP packet as the input of a hash table, and taking the generated hash bits as the high-order address of a storage area;

2) storing the packets matching the characteristic field in sequence in the address range indicated by the high-order address, and recording the flow information corresponding to the packets at the tail of the storage area;

3) when network flow association is needed, using a packet characteristic field determined at any node as the hash table input at the other nodes, determining the storage area where it resides, and searching for and locating the target packet within that storage area.
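The three steps can be illustrated with a deliberately small, hypothetical model of one node's store. This is a sketch under stated assumptions, not the patented implementation: `zlib.crc32` stands in for the hash function, a Python list stands in for each fixed-size page, and the flow record and collision chaining are omitted. The names `page_index` and `ToyHashMemory` are illustrative only.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

HASH_BITS = 20  # assumed width of the high-order (page) address


def page_index(fields: tuple) -> int:
    """Step 1: hash the characteristic field; the top HASH_BITS bits select a page.

    CRC-32 is only a stand-in hash; every node must use the same function.
    """
    return zlib.crc32(repr(fields).encode()) >> (32 - HASH_BITS)


class ToyHashMemory:
    """One node's store: packets grouped by page in arrival order."""

    def __init__(self) -> None:
        # page index -> [(characteristic field, Sequence-id, receive time)]
        self.pages: Dict[int, List[Tuple[tuple, int, float]]] = defaultdict(list)

    def store(self, fields: tuple, seq: int, timestamp: float) -> None:
        """Step 2: append the packet to the page selected by its hash bits."""
        self.pages[page_index(fields)].append((fields, seq, timestamp))

    def find(self, fields: tuple, seq: int) -> Optional[float]:
        """Step 3: hash the same field at another node and scan only that page."""
        for f, s, ts in self.pages.get(page_index(fields), []):
            if f == fields and s == seq:
                return ts
        return None
```

For example, after `node_b.store(flow, 7, 1.5)`, the call `node_b.find(flow, 7)` returns `1.5` by scanning a single page rather than the whole store.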

Compared with the prior art, the invention has the following notable advantages: (1) packet association is fast; especially when associating across many nodes and massive data, the association time is exponentially reduced compared with traditional association search methods; (2) the high-speed flow association method is implemented locally at the network node (router), without a server; (3) since the size of each storage area is fixed, the search time for a particular packet on a node has a definite upper bound, which facilitates software and hardware implementation.

Drawings

Fig. 1 is a schematic diagram of the relationship between the system HASH information table and the HASH memory logical page.

Fig. 2 is a schematic diagram of the association analysis of the network flow of the present invention.

Detailed Description

With reference to fig. 1 and fig. 2, a method for high-speed association of a network flow based on a hash memory according to the present invention includes the following steps:

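1) a characteristic field of an IP packet is used as the input of a hash table, and the generated hash bits are used as the high-order address of a storage area;

2) the packets matching the characteristic field are stored in sequence in the address range indicated by the high-order address, and the flow information corresponding to the packets is recorded at the tail of the storage area;

3) when network flow association is needed, a packet characteristic field determined at any node is used as the hash table input at the other nodes to determine the storage area where it resides, and the target packet is searched for and located within that storage area.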

Further, the flow information corresponding to the packet includes the characteristic field, an address pointer, a next-5-tuple pointer, the corresponding read and write pointers, the last write time, and the number of bytes.

Further, the characteristic field of the IP packet includes 5-tuple of IP source address, destination address, source port, destination port, transport layer protocol.

Further, the characteristic field of the IP packet includes the 5-tuple of IP source address, destination address, source port, destination port and type of service (TOS) field.

Further, the characteristic field of the IP packet includes 4-tuple of source IP address, destination IP address, source port and destination port.

Further, the characteristic field of the IP packet includes the 7-tuple of source IP address, destination IP address, source port, destination port, transport layer protocol, type of service (TOS) field and interface index.

Further, the IP packet is an IPv6 packet, and its characteristic fields include 3-tuple of IP source address, destination address, and flow ID.
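Before hashing, a characteristic field has to be turned into a fixed byte string. The sketch below shows one possible encoding, using network byte order, for the IPv4 5-tuple and the IPv6 3-tuple. The byte layout and the function names are assumptions, not taken from the source. It treats the IPv6 "flow ID" as the 20-bit Flow Label field of the IPv6 header. The 4-tuple, TOS and 7-tuple variants could be packed in the same way.

```python
import ipaddress
import struct


def key_ipv4_5tuple(src: str, dst: str, sport: int, dport: int, proto: int) -> bytes:
    """13-byte key: 4+4 address bytes, 2+2 port bytes, 1 protocol byte (hypothetical layout)."""
    return (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!HHB", sport, dport, proto)  # '!' = network order, no padding
    )


def key_ipv6_3tuple(src: str, dst: str, flow_label: int) -> bytes:
    """36-byte key: 16+16 address bytes plus the flow label in a 4-byte field."""
    if not 0 <= flow_label < (1 << 20):
        raise ValueError("the IPv6 flow label is a 20-bit field")
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I", flow_label)
    )
```

A fixed-width, network-order encoding makes every node compute the same hash input for the same flow, which cross-node lookup requires.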

The invention uses the hash memory to store different network flows by class, dividing them into many logical pages according to the size of the stored data, and then quickly associates a specific packet flow by table lookup when needed. To find the associated packet on other nodes, only a hash-table lookup is needed to locate the specific logical page; that page is then searched for the packet to be associated. Since the page size is fixed, the search time for a particular packet on a node has a definite upper bound.

The current mainstream flow classification pattern is the 5-tuple, i.e. the source address, destination address, source port, destination port and transport-layer protocol fields of the IP packet; the invention is also applicable to other flow classification patterns. Taking the IPv4 5-tuple as an example, when a packet arrives at the router, the router extracts the packet's 5-tuple in hardware or software and feeds it into the hash table. The input of the hash table is the 5-tuple, and the output width can be set according to the node's storage size, e.g., 32 bits. A mainstream hash function such as murmurhash() can be used.
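The text names murmurhash() but does not specify a variant or seed. As a hedged sketch, the widely used 32-bit variant MurmurHash3_x86_32 (Austin Appleby's public-domain algorithm) is written out below in pure Python. Choosing this variant and a default seed of 0 is an assumption. A production node would normally use a native or hardware implementation.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32: 32-bit hash of `data` (pure-Python port for illustration)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    # Body: mix each 4-byte little-endian block into h.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl((k * c1) & mask, 15)
        k = (k * c2) & mask
        h = rotl(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: fold in the remaining 0-3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = rotl((k * c1) & mask, 15)
        h ^= (k * c2) & mask

    # Finalization: mix in the length, then avalanche the bits (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h
```

Hashing the packed 5-tuple bytes, e.g. `murmur3_32(key)`, yields the 32-bit value that the next paragraph shortens to the page index.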

For each 5-tuple, murmurhash() produces a 32-bit hash value, i.e., a space of 2³² (about 4G) values. If storage space is limited, a folding method can be used to shorten the hash, for example down to 20 bits starting from the high-order end. This yields 2²⁰ hash values. Of course, shrinking the hash value space increases the probability of 5-tuple collisions. One memory page is allocated for each hash value, with the page size depending on the router's memory; with 64 KB pages, the total hash memory is 2²⁰ × 64 KB = 64 GB. In addition, the system reserves extra memory pages for storing data of different 5-tuples that share the same hash value. If more memory is available, larger pages can be used, or the hash value space can be enlarged.
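The text does not say exactly how the 32-bit hash is folded to 20 bits. The sketch below shows two plausible readings. The first keeps the 20 high-order bits, which matches "from the high-order end". The second is the classic XOR fold, shown only as an alternative. It also checks the 64 GB figure, assuming 64 KB pages as in the embodiment.

```python
HASH_BITS = 20
PAGE_BYTES = 64 * 1024  # assumed 64 KB page per hash value


def take_high_bits(h32: int, bits: int = HASH_BITS) -> int:
    """Keep the `bits` most significant bits of a 32-bit hash."""
    return (h32 & 0xFFFFFFFF) >> (32 - bits)


def xor_fold(h32: int, bits: int = HASH_BITS) -> int:
    """Alternative folding: XOR successive `bits`-wide slices of the hash together."""
    h32 &= 0xFFFFFFFF
    mask = (1 << bits) - 1
    folded = 0
    while h32:
        folded ^= h32 & mask
        h32 >>= bits
    return folded


# 2**20 pages, one per hash value, each 64 KB: 2**36 bytes = 64 GiB.
TOTAL_BYTES = (1 << HASH_BITS) * PAGE_BYTES
```

Either function maps every 32-bit hash into the 2²⁰ page indices. XOR folding lets all 32 input bits influence the index, while truncation is cheaper in hardware.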

The 5-tuple of a new packet generates a hash value, which is used as the high-order address that defines a logical page. For each logical page, flow information is created in memory, recording the 5-tuple, the read and write pointers for the corresponding hash address, and the last modification time, as shown in Table 1.

TABLE 1
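The fields named for the flow information (characteristic field, address pointer, next-5-tuple pointer, read and write pointers, last write time, byte count) can be sketched as a record. The Python types and field names are assumptions; a router would use fixed-width fields inside the 32-byte page trailer or the flow table area. The helper `chain` is hypothetical and walks the next-5-tuple pointers of colliding flows.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class FlowInfo:
    """Flow-information record for one logical page (illustrative field names)."""
    feature: Tuple                            # characteristic field, e.g. the 5-tuple
    address: int                              # address pointer to the flow's storage page
    next_entry: Optional["FlowInfo"] = None   # next-5-tuple pointer for colliding flows
    read_ptr: int = 0                         # read pointer within the page
    write_ptr: int = 0                        # write pointer within the page
    last_write_time: float = 0.0              # time of the most recent write
    byte_count: int = 0                       # packet bytes written for this flow


def chain(head: Optional[FlowInfo]) -> Iterator[FlowInfo]:
    """Yield each record reachable through the next-5-tuple pointers."""
    node = head
    while node is not None:
        yield node
        node = node.next_entry
```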

If multiple 5-tuples map to the same hash value, the system allocates a spare page outside the 20-bit hash address space and appends the new 5-tuple's information to the table as a linked list, also recording the number of packet bytes stored under each 5-tuple. The packet storage area of each page is the page size minus 32 bytes; the last 32 bytes store the corresponding flow table pointer. The packet area is used as a ring buffer: when more bytes are written to a logical page than it can hold, writing wraps around and overwrites content from the start of the page, so the total number of stored packet bytes stays at its maximum. The read and write pointers are updated accordingly. All nodes do this for every newly arriving flow. Since the hash memory is mainly used for packet post-processing, particularly network flow processing, an excessively large elephant flow does not need to be saved in its entirety; saving part of the flow is enough to analyze network performance.
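A minimal sketch of the ring-buffer packet area follows, assuming a 64 KB page whose last 32 bytes are reserved. The source does not say whether packets may straddle the wrap point. This sketch wraps at byte granularity and rejects packets larger than the packet area. Both choices are assumptions, and `RingPage` is a hypothetical name.

```python
PAGE_SIZE = 1 << 16            # assumed 64 KB logical page
TRAILER = 32                   # last 32 bytes: flow-table pointer and other uses
PAYLOAD = PAGE_SIZE - TRAILER  # 65504 bytes of packet storage


class RingPage:
    """Packet area of one logical page, written as a byte-level ring buffer."""

    def __init__(self, payload: int = PAYLOAD) -> None:
        self.buf = bytearray(payload)
        self.write_ptr = 0       # offset where the next byte goes
        self.total_written = 0   # bytes ever written to this page

    def write(self, data: bytes) -> None:
        n = len(self.buf)
        if len(data) > n:
            raise ValueError("packet larger than the page's packet area")
        first = min(len(data), n - self.write_ptr)   # bytes that fit before the page end
        self.buf[self.write_ptr:self.write_ptr + first] = data[:first]
        rest = data[first:]                           # remainder overwrites the page head
        self.buf[:len(rest)] = rest
        self.write_ptr = (self.write_ptr + len(data)) % n
        self.total_written += len(data)

    @property
    def stored_bytes(self) -> int:
        """Grows until the page is full, then stays at the maximum."""
        return min(self.total_written, len(self.buf))
```

With an 8-byte area, writing `b"abcdef"` and then `b"XYZ"` leaves `b"ZbcdefXY"`: the oldest byte was overwritten and the write pointer is back at offset 1.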

When flows are to be associated, the system finds all established flow tables in the memory of the starting node. For each flow, if several 5-tuples map to one hash value, the 5-tuple with the largest number of stored bytes is selected from the linked list for that hash value. One or more packets are then selected on the page corresponding to that 5-tuple. At each subsequent node, it is only necessary to hash the 5-tuple to quickly locate the logical page of the queried packet and then search for the packet within that page. Because the page size is limited, the packet can be located very quickly.
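The selection rule at the starting node can be sketched as follows. For simplicity, the colliding 5-tuples of one hash value are given as a list of records rather than a linked list. Each record is assumed to carry the byte count and the Sequence-ids of the packets stored on its page. The helper names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One record per colliding 5-tuple: (5-tuple, bytes written, Sequence-ids on its page).
Record = Tuple[tuple, int, List[int]]


def heaviest(chain: Sequence[Record]) -> Record:
    """Among the 5-tuples sharing one hash value, pick the one with the most bytes."""
    return max(chain, key=lambda rec: rec[1])


def pick_samples(chain: Sequence[Record], k: int = 1,
                 rng: Optional[random.Random] = None) -> Tuple[tuple, List[int]]:
    """Return the chosen 5-tuple and up to k randomly chosen Sequence-ids from its page."""
    rng = rng or random.Random()
    five_tuple, _, seq_ids = heaviest(chain)
    return five_tuple, rng.sample(seq_ids, min(k, len(seq_ids)))
```

Because one flow is chosen per hash value, rather than one packet from the whole store, short flows on other pages still get sampled.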

The present invention will be described in detail with reference to specific examples.

Examples

With reference to fig. 1, a method for high-speed association of network streams based on a hash memory includes the following steps:

The first step: assume that every network node is equipped with a hash memory, with 20 bits for the high-order address lines and 16 bits for the low-order address lines, so that each logical page can store 65504 bytes of packets (the remaining 32 bytes point to the hash flow table and serve other uses). The primary storage capacity of the system is 64 GB. In addition, to store data of different 5-tuples sharing the same hash index, a further 2 GB is added to the 64 GB, providing 32K extra pages (if the hash collision rate is too high, a hash with more output bits should be considered). The total hash memory space is 66 GB and the total address length is 37 bits (part of the high address space is unused). Apart from the most significant bit, the 20 bits below it are the hash index bits, and the 16 low-order bits address bytes within a logical page.

The second step: when a packet arrives at a network node, its 5-tuple is extracted and fed into the hash table, producing a 20-bit index value.

The third step: locate the storage area corresponding to the index from the hash index and the low-order address, and check whether a flow table pointer exists at the tail of the storage area. If not, the packet belongs to a new flow: create a new flow table in the flow table area of memory and write its pointer to the tail of the storage area. If a flow table exists, go to the fifth step.

The fourth step: for a new flow table, record the corresponding 5-tuple, initialize the address pointer, the next-5-tuple pointer and the read and write pointers, and write the packet into the storage area starting at its write pointer. Update the number of bytes written and the write time. Go to the sixth step.

The fifth step: for an existing flow table, check whether its 5-tuple matches the 5-tuple of the current packet. If it matches, write the packet at the write pointer and update the number of bytes written and the write time. If it does not match, the packet collides with the existing 5-tuple on this index: apply for a new storage area (taken from the 2 GB space whose most significant address bit is 1), create a new flow table, initialize its contents, and write the packet into the new storage area.

The sixth step: wait for the next packet to arrive; if required, store a backup of the flow table in another area of the node.
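The six storage steps, together with the 37-bit address layout of the first step, can be sketched as below. Several assumptions apply: `zlib.crc32` stands in for MurmurHash; pages are represented only by their base addresses and counters, not their contents; and when a collision occurs, the sketch follows the whole next-5-tuple chain before allocating a spare page. The source describes checking only "the old flow table". `NodeStore`, `FlowTable` and `page_address` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Address layout from the first step: 1 spare-area bit | 20-bit hash index | 16-bit offset.
HASH_BITS, OFFSET_BITS = 20, 16
PAGE_BYTES = 1 << OFFSET_BITS              # 65536-byte logical page
PAGE_PAYLOAD = PAGE_BYTES - 32             # 65504 bytes for packets
OVERFLOW_PAGES = (2 << 30) // PAGE_BYTES   # 2 GB spare area -> 32768 pages


def page_address(page_no: int, overflow: bool = False) -> int:
    """Base address of a page: the MSB selects the spare area, the next 20 bits the page."""
    return (int(overflow) << (HASH_BITS + OFFSET_BITS)) | (page_no << OFFSET_BITS)


@dataclass
class FlowTable:
    five_tuple: Tuple
    address: int                               # base address of the flow's page
    next_entry: Optional["FlowTable"] = None   # next-5-tuple pointer
    write_ptr: int = 0                         # offset of the next write (wraps like a ring)
    byte_count: int = 0
    last_write_time: float = 0.0


class NodeStore:
    def __init__(self) -> None:
        self.tail: Dict[int, FlowTable] = {}   # hash index -> pointer at the page tail
        self.next_spare = 0                    # next free page in the spare area

    def index(self, five_tuple: Tuple) -> int:
        """Second step: 20-bit index (CRC-32 used here as a stand-in hash)."""
        return zlib.crc32(repr(five_tuple).encode()) >> (32 - HASH_BITS)

    def store(self, five_tuple: Tuple, packet: bytes, now: float) -> FlowTable:
        idx = self.index(five_tuple)
        entry = self.tail.get(idx)
        if entry is None:
            # Third and fourth steps: no pointer at the page tail, so this is a new flow.
            entry = FlowTable(five_tuple, page_address(idx))
            self.tail[idx] = entry
        else:
            # Fifth step: look for a matching 5-tuple along the chain.
            while entry.five_tuple != five_tuple and entry.next_entry is not None:
                entry = entry.next_entry
            if entry.five_tuple != five_tuple:
                # Collision: take a page from the spare area and chain a new flow table.
                if self.next_spare >= OVERFLOW_PAGES:
                    raise MemoryError("spare page area exhausted")
                new = FlowTable(five_tuple, page_address(self.next_spare, overflow=True))
                self.next_spare += 1
                entry.next_entry = new
                entry = new
        # Write the packet (contents omitted) and update the counters.
        entry.write_ptr = (entry.write_ptr + len(packet)) % PAGE_PAYLOAD
        entry.byte_count += len(packet)
        entry.last_write_time = now
        return entry
```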

With reference to fig. 2, when performing network flow packet association analysis on each node of the network, each network node works according to the following procedures:

The first step: the system determines the starting node of the association and the sequence of intermediate nodes the flow passes through; let the start of the association analysis be A, the end C, and the intermediate node B.

The second step: starting from A, node A retrieves the flows that have data in its flow table area. If a 5-tuple in the flow table has a next entry, the entries are checked one by one to find the 5-tuple with the largest byte count; if it has no next entry, the current 5-tuple is used. The chosen 5-tuple is used as the index to find the corresponding logical page, one or more packets are selected at random from that page, and their Sequence-id numbers are recorded.

The third step: node A sends the flow information of the packets to be associated to nodes B and C. Node B computes the hash of the 5-tuple sent by A, locates the corresponding logical page, and searches that page for the packet with the current Sequence-id. Node C repeats this step. The search ends when the packet with the current Sequence-id is found or when the whole page has been searched without finding it.

The fourth step: determine the packet transmission delay and packet loss rate among A, B and C from the times at which the stored packets were received.

The fifth step: check whether the number of flows analyzed meets the requirement; if not, select the next flow and return to the second step.
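The second to fourth association steps can be sketched for one sampled flow along A, B, C. This sketch makes several assumptions. Each node's pages are modeled as a dict from page index to `(5-tuple, Sequence-id, receive time)` entries. `zlib.crc32` stands in for the shared hash function. Clocks are assumed synchronized, and times are integers in microseconds. Loss is counted end to end at the last node. All function names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Sequence, Tuple

HASH_BITS = 20
Pages = Dict[int, List[Tuple[tuple, int, int]]]   # page -> [(5-tuple, Sequence-id, time)]


def page_of(five_tuple: tuple) -> int:
    """Stand-in hash; every node on the path must use the same one."""
    return zlib.crc32(repr(five_tuple).encode()) >> (32 - HASH_BITS)


def find(pages: Pages, five_tuple: tuple, seq: int) -> Optional[int]:
    """Third step: hash the 5-tuple, then scan only that one logical page."""
    for ft, s, t in pages.get(page_of(five_tuple), []):
        if ft == five_tuple and s == seq:
            return t
    return None


def associate(path: Sequence[Pages], five_tuple: tuple, seqs: Sequence[int]):
    """Fourth step: mean per-hop delay and end-to-end loss rate for the sampled packets."""
    hop_delays: List[List[int]] = [[] for _ in range(len(path) - 1)]
    lost = 0
    for seq in seqs:
        times = [find(node, five_tuple, seq) for node in path]
        if times[-1] is None:
            lost += 1
        for i in range(len(path) - 1):
            if times[i] is not None and times[i + 1] is not None:
                hop_delays[i].append(times[i + 1] - times[i])
    means = [sum(d) / len(d) if d else None for d in hop_delays]
    return means, (lost / len(seqs) if seqs else 0.0)
```

For example, if packets 1 and 2 reach B 4000 µs and 5000 µs after leaving A, and only packet 1 reaches C, 6000 µs after B, the result is mean hop delays of 4500 µs and 6000 µs with a loss rate of 0.5.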

Note that the page size should be chosen appropriately: within the packet retrieval period T, packets already stored at a node should not be overwritten because too many packets of the flow arrive at that node. If such overwriting occurs, the logical page should be enlarged. The packet retrieval period T is set as needed, for example to within 10 minutes. For a backbone network node, the memory may therefore need to be sized according to the number of bytes arriving within T.
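The sizing rule can be checked with simple arithmetic. This is a hedged sketch: it assumes a flow arrives at a constant rate and uses the 65504-byte packet area from the embodiment. It estimates how long a page holds a flow before the ring buffer wraps, and how much memory one link needs to retain period T. The example rates are illustrative, not from the source.

```python
def seconds_until_overwrite(flow_rate_bps: float, page_payload: int = 65504) -> float:
    """Time before a flow's page wraps and overwrites its oldest stored packet."""
    return page_payload * 8 / flow_rate_bps


def page_survives(flow_rate_bps: float, period_s: float, page_payload: int = 65504) -> bool:
    """True if packets stored now are still on the page after the retrieval period."""
    return seconds_until_overwrite(flow_rate_bps, page_payload) >= period_s


def node_memory_for_period(link_rate_bps: float, period_s: float) -> float:
    """Bytes needed to keep everything one link delivers during the retrieval period T."""
    return link_rate_bps / 8 * period_s
```

A 1 Mbit/s flow wraps a 65504-byte page in about 0.52 s. A fully loaded 10 Gbit/s link delivers 7.5 × 10¹¹ bytes (750 GB) in T = 600 s. Both figures suggest why T and the page size have to be chosen together.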

Fig. 2 illustrates the association analysis of network flows. At each node a flow passes through, the corresponding packets are located by hashing and then searching a single logical page, which guarantees an upper bound on search delay: the time needed to search one entire logical page.

Claims

1. A high-speed network flow association method based on a hash memory is characterized by comprising the following steps:

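1) taking a characteristic field of an IP packet as the input of a hash table, and taking the generated hash bits as the high-order address of a storage area;

2) storing the packets matching the characteristic field in sequence in the address range indicated by the high-order address, and recording the flow information corresponding to the packets at the tail of the storage area; the flow information corresponding to the packet comprises the characteristic field, an address pointer, a next-5-tuple pointer, the corresponding read and write pointers, the last write time and the number of bytes;

3) when network flow association is needed, using a packet characteristic field determined at any node as the hash table input at the other nodes, determining the storage area where it resides, and searching for and locating the target packet within that storage area.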

2. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic fields of the IP packet comprise 5-tuple of IP source address, destination address, source port, destination port, transport layer protocol.

3. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic field of the IP packet comprises 5-tuple of IP source address, destination address, source port, destination port, packet type TOS.

4. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic field of the IP packet comprises 4-tuple of source IP address, destination IP address, source port and destination port.

5. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic field of the IP packet comprises the 7-tuple of source IP address, destination IP address, source port, destination port, transport layer protocol, packet type TOS and interface index.

6. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the IP packet is an IPv6 packet, and its characteristic fields include 3-tuple of IP source address, destination address, and flow ID.