CN107248939B - Network flow high-speed correlation method based on hash memory - Google Patents

Network flow high-speed correlation method based on hash memory

Info

Publication number
CN107248939B
Authority
CN
China
Prior art keywords
packet
hash
address
network flow
storage area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710384744.1A
Other languages
Chinese (zh)
Other versions
CN107248939A (en)
Inventor
王海
董超
牛大伟
于卫波
米志超
郭晓
李艾静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA University of Science and Technology
Priority to CN201710384744.1A
Publication of CN107248939A
Application granted
Publication of CN107248939B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/255 Maintenance or indexing of mapping tables
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/564 Enhancement of application control based on intercepted application data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a high-speed network flow correlation method based on a hash memory, comprising the following steps: a characteristic field of an IP packet is used as the input of a hash table, and the generated hash bits are used as the high-order address of a storage area; packets matching the characteristic field are stored in sequence in the address range indicated by the high-order address, and the flow information corresponding to the packets is recorded at the tail of the storage area; when network flow correlation is needed, the packet characteristic field determined at any node is used as the input of the hash table at the other nodes to determine the storage area in which the packet is located, and the packet is then searched for and located within that storage area. The invention offers fast correlation of network flows across multiple network nodes, a guaranteed upper bound on correlation delay, and representative sampling of the correlated data flows, and has good prospects for popularization and application.

Description

Network flow high-speed correlation method based on hash memory
Technical Field
The invention belongs to the field of network communication, and particularly relates to a high-speed network flow association method based on a hash memory.
Background
Data acquisition and analysis on high-speed networks is an important means of analyzing data transmission performance, diagnosing network faults, and judging the quality of service of packet transmission. Because packets traverse multiple nodes in a network, an important way to analyze transmission performance is to correlate the same data flow across those nodes: for example, observing when a packet is received and sent out by node A, whether node B receives it and, if so, when it is received and sent out, whether node C receives it, and so on. By analyzing the receive and send logs of the same flow at multiple nodes, information such as network transmission delay, packet loss rate, path distribution, and path changes can be obtained. However, as network transmission bandwidth keeps increasing, the storage and processing capacity of each node has grown greatly. For example, each node may store 1 TB of data, and taking a packet from one node and retrieving it at one or more other nodes takes a very long time. Especially when the number of network nodes is large and the volume of stored data is very large, correlated retrieval of the same network flow on different nodes is extremely slow and falls far short of users' need to observe changes in the delay and packet loss rate of network flows in real time. Such massive data at different nodes usually has to be processed offline over a long period, or can only be handled when the amount of data is very small.
In addition, existing data acquisition and analysis often captures a packet at random for correlation analysis. Because networks contain pronounced elephant flows (flows with very long duration and very many packets) and mouse flows (short flows with very short duration and very few packets), the captured packets usually belong to elephant flows, while the performance of short flows is frequently ignored. The resulting delay and loss rate statistics therefore lack representativeness and cannot represent the general performance of the network.
Patent CN104396216A discloses a method, non-transitory computer-readable medium, and apparatus for identifying network traffic characteristics to associate and manage one or more subsequent flows. It comprises: sending a monitoring request, comprising a timestamp and one or more attributes extracted from an HTTP request received from a client computing device, to a monitoring server to associate one or more subsequent flows related to the HTTP request; after receiving a confirmation response to the monitoring request from the monitoring server, sending the HTTP request to an application server; receiving an HTTP response to the HTTP request from the application server; and performing an operation with respect to the HTTP response. However, this management and association of a particular flow relies on a dedicated monitoring server.
Disclosure of Invention
The invention aims to provide a hash-memory-based high-speed network flow association method that addresses two problems in evaluating transmission performance between network nodes, such as delay and packet loss rate: performing high-speed, real-time association search for the same flow across multiple nodes in a high-bandwidth multi-node network, and selecting representative sample packets.
The technical scheme for realizing the purpose of the invention is as follows: a high-speed network flow association method based on a hash memory comprises the following steps:
1) taking a characteristic field of an IP packet as the input of a hash table, and taking the generated hash bits as the high-order address of a storage area;
2) storing the packets matching the characteristic field in sequence in the address range indicated by the high-order address, and recording the flow information corresponding to the packets at the tail of the storage area;
3) when network flow association is needed, using the packet characteristic field determined at any node as the input of the hash table at the other nodes, determining the storage area in which the packet is located, and searching for and locating the packet within that storage area.
Compared with the prior art, the invention has the following notable advantages: (1) packet association is fast, and particularly when multiple nodes and massive data are involved, the association time is exponentially reduced compared with traditional association search methods; (2) the high-speed flow association method is implemented locally at the network node (router), without a server; (3) because the size of the storage area is fixed, the upper bound on the time needed to search for a particular packet on a node is fixed, which facilitates both software and hardware implementation.
Drawings
Fig. 1 is a schematic diagram of the relationship between the system HASH information table and the HASH memory logical page.
Fig. 2 is a schematic diagram of the association analysis of the network flow of the present invention.
Detailed Description
With reference to fig. 1 and fig. 2, the hash-memory-based high-speed network flow association method of the present invention includes the following steps:
1) taking a characteristic field of an IP packet as the input of a hash table, and taking the generated hash bits as the high-order address of a storage area;
2) storing the packets matching the characteristic field in sequence in the address range indicated by the high-order address, and recording the flow information corresponding to the packets at the tail of the storage area;
3) when network flow association is needed, using the packet characteristic field determined at any node as the input of the hash table at the other nodes, determining the storage area in which the packet is located, and searching for and locating the packet within that storage area.
Furthermore, the stream information corresponding to the packet includes a feature field, an address pointer, a next quintuple pointer, a corresponding read-write pointer, a last write time, and a number of bytes.
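The flow information record described above can be pictured as a small structure. The following is only an illustrative sketch: the class name `FlowInfo`, the field names, and the field types are assumptions, since the text names the fields but does not specify their layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed representation of a 5-tuple: src IP, dst IP, src port, dst port, protocol.
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class FlowInfo:
    """One flow information entry (hypothetical layout)."""
    feature_field: FiveTuple                 # characteristic field of the flow
    address_pointer: int                     # base address of the flow's storage area
    next_entry: Optional["FlowInfo"] = None  # next 5-tuple sharing the same hash value
    read_ptr: int = 0                        # read pointer within the storage area
    write_ptr: int = 0                       # write pointer within the storage area
    last_write_time: float = 0.0             # time of the last write
    byte_count: int = 0                      # number of packet bytes stored


# Two flows that share a hash value are chained through next_entry.
first = FlowInfo(("10.0.0.1", "10.0.0.2", 40000, 443, 6), address_pointer=0x10000)
second = FlowInfo(("10.0.0.3", "10.0.0.4", 40001, 80, 6), address_pointer=1 << 36)
first.next_entry = second
```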
Further, the characteristic field of the IP packet comprises the 5-tuple of IP source address, destination address, source port, destination port, and transport layer protocol.
Further, the characteristic field of the IP packet comprises the 5-tuple of IP source address, destination address, source port, destination port, and packet type (TOS).
Further, the characteristic field of the IP packet comprises the 4-tuple of source IP address, destination IP address, source port, and destination port.
Further, the characteristic field of the IP packet comprises the 7-tuple of source IP address, destination IP address, source port, destination port, transport layer protocol, packet type (TOS), and interface index.
Further, the IP packet is an IPv6 packet, and its characteristic field comprises the 3-tuple of IP source address, destination address, and flow ID.
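The alternative characteristic fields listed above differ only in which header fields are selected. A minimal sketch follows, assuming a hypothetical parsed header type `PacketHeader` and hypothetical mode names; none of these identifiers come from the patent.

```python
from typing import NamedTuple, Optional


class PacketHeader(NamedTuple):
    """Parsed header fields of one packet (hypothetical container)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    if_index: int
    flow_id: Optional[int] = None  # IPv6 flow ID, if present


# Field selections corresponding to the variants named in the text.
FIELD_SETS = {
    "5-tuple": ("src_ip", "dst_ip", "src_port", "dst_port", "protocol"),
    "5-tuple-tos": ("src_ip", "dst_ip", "src_port", "dst_port", "tos"),
    "4-tuple": ("src_ip", "dst_ip", "src_port", "dst_port"),
    "7-tuple": ("src_ip", "dst_ip", "src_port", "dst_port",
                "protocol", "tos", "if_index"),
    "ipv6-3-tuple": ("src_ip", "dst_ip", "flow_id"),
}


def feature_field(header: PacketHeader, mode: str = "5-tuple") -> tuple:
    """Return the characteristic field of a packet for the chosen mode."""
    return tuple(getattr(header, name) for name in FIELD_SETS[mode])


hdr = PacketHeader("10.0.0.1", "10.0.0.2", 40000, 443, 6, tos=0, if_index=3)
key = feature_field(hdr, "4-tuple")
```

Whatever variant is chosen, every node must use the same one, since the lookup on other nodes depends on reproducing the same hash input.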
The invention uses the hash memory to store different network flows by class, dividing them among a number of logical pages sized according to the amount of data to be stored, and then quickly associates a specific packet flow by table lookup when needed. To find the associated packet on another node, it is only necessary to locate the specific logical page through the hash table; the packet to be associated is then searched for within that page. Since the page size is fixed, the upper bound on the time needed to search for a particular packet on a node is fixed.
The mainstream flow classification scheme at present is the 5-tuple, i.e. the source address, destination address, source port, destination port, and transport layer protocol fields of the IP packet; the invention is also applicable to other flow classification schemes. Taking the IPv4 5-tuple as an example, when a packet arrives at the router, the router extracts the packet's 5-tuple in hardware or software and feeds it into a hash table. The input of the hash table is the 5-tuple, and the output width can be set according to the storage size of the node, e.g. 32 bits. The hash function may be chosen from mainstream hash functions such as MurmurHash.
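As a rough illustration of turning an IPv4 5-tuple into a 32-bit hash value, the sketch below packs the five fields into 13 bytes and hashes them. It uses `zlib.crc32` from the Python standard library purely as a stand-in for MurmurHash, which the standard library does not provide; the byte layout and the function names are assumptions.

```python
import socket
import struct
import zlib


def pack_five_tuple(src_ip, dst_ip, src_port, dst_port, protocol):
    """Serialize an IPv4 5-tuple into 13 bytes in network byte order."""
    return struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),   # 4-byte source address
        socket.inet_aton(dst_ip),   # 4-byte destination address
        src_port,                   # 2-byte source port
        dst_port,                   # 2-byte destination port
        protocol,                   # 1-byte transport protocol number
    )


def hash32(key: bytes) -> int:
    """32-bit hash of the key (CRC-32 here, standing in for MurmurHash)."""
    return zlib.crc32(key) & 0xFFFFFFFF


key = pack_five_tuple("10.0.0.1", "10.0.0.2", 40000, 443, 6)
h = hash32(key)  # same value on every node that sees this 5-tuple
```

The property that matters for the method is determinism: every node must compute the same value for the same 5-tuple, so all nodes need the same function and the same serialization.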
For each 5-tuple, the MurmurHash function produces a 32-bit hash value, giving a value space of about 4G (2^32). If storage space is limited, a folding method can be used to shorten the hash, for example reducing it to 20 bits, which yields 2^20 hash values. Of course, shrinking the hash value space increases the probability of 5-tuple collisions. One memory page is allocated for each hash value, with the page size depending on the router's memory size. With 64 KB pages (a 16-bit in-page address, as in the embodiment), the total hash memory is 2^20 × 64 KB = 64 GB. In addition, the system reserves some extra memory pages for storing data of different 5-tuples that share the same hash value. If more memory is available, a larger page can be set or the hash value space can be enlarged.
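The folding step and the memory arithmetic can be checked with a few lines. The text does not specify which folding method is meant, so the XOR folding below is an assumption, as are the function and constant names; the 64 KB page size is the one implied by the 16-bit in-page address used in the embodiment.

```python
def fold_hash(h32: int, bits: int = 20) -> int:
    """Fold a 32-bit hash down to `bits` bits by XOR-ing successive chunks."""
    mask = (1 << bits) - 1
    folded = 0
    while h32:
        folded ^= h32 & mask  # mix in the lowest `bits` bits
        h32 >>= bits          # move on to the next chunk
    return folded


HASH_BITS = 20   # width of the folded hash index
PAGE_BITS = 16   # in-page address width, i.e. 64 KB pages

num_pages = 1 << HASH_BITS            # 2^20 logical pages
page_bytes = 1 << PAGE_BITS           # 65536 bytes per page
total_bytes = num_pages * page_bytes  # 2^36 bytes = 64 GB
```

Simply truncating the hash to its top or bottom 20 bits would also work; folding just lets every input bit influence the index.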
The 5-tuple of a new packet generates a hash value, which is used as the high-order address that defines a logical page. For each logical page, a flow information entry is created in memory, recording the 5-tuple, the read and write pointers corresponding to the hash address, and the last modification time, as shown in Table 1.
TABLE 1
[Table 1 appears in the original only as an image (Figure BDA0001306053000000041) and is not reproduced here.]
If multiple 5-tuples map to the same hash value, the system allocates a spare page outside the 20-bit hash address range and appends an entry for the new 5-tuple to the table in the form of a linked list. The number of packet bytes stored under each 5-tuple index is also recorded. The packet storage area of each page is the page size minus 32 bytes; the last 32 bytes hold the pointer to the corresponding flow table entry. The storage area is used as a ring buffer: once more than this many bytes have been written to the logical page, writing wraps around and overwrites the content from the start of the page. From then on the total number of stored packet bytes stays at its maximum, and the read and write pointers are updated accordingly. Every node does this for every newly arriving flow. Because the hash memory is mainly used for packet post-processing, in particular network flow processing, a very large elephant flow does not need to be saved in its entirety; saving part of the flow is enough to analyze network performance.
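The ring-buffer behaviour of a logical page can be sketched as follows. This is a simplified model under stated assumptions: the class name `LogicalPage` and its attributes are hypothetical, packets are written as raw bytes without any per-packet framing, and the 32-byte trailer is only accounted for in the capacity rather than stored.

```python
class LogicalPage:
    """One logical page: a ring buffer of (page size - 32) bytes."""

    TRAILER_BYTES = 32  # reserved at the end of the page for the flow table pointer

    def __init__(self, page_size: int = 65536):
        self.capacity = page_size - self.TRAILER_BYTES
        self.data = bytearray(self.capacity)
        self.write_ptr = 0   # next position to write
        self.stored = 0      # bytes currently held, capped at capacity

    def write(self, packet: bytes) -> None:
        """Append a packet, wrapping to the page head when the end is reached."""
        if len(packet) > self.capacity:
            raise ValueError("packet larger than the page storage area")
        first = min(len(packet), self.capacity - self.write_ptr)
        self.data[self.write_ptr:self.write_ptr + first] = packet[:first]
        rest = len(packet) - first
        if rest:
            # Wrap around and overwrite the oldest content from the page head.
            self.data[0:rest] = packet[first:]
        self.write_ptr = (self.write_ptr + len(packet)) % self.capacity
        self.stored = min(self.capacity, self.stored + len(packet))


# Tiny page (40 bytes, so 8 bytes of storage) to make the wrap-around visible.
page = LogicalPage(page_size=40)
page.write(b"abcdef")
page.write(b"WXYZ")   # 2 bytes fit at the end, 2 wrap to the head
```

A real implementation would also need to keep packet boundaries (for example a length prefix) so that a search can walk the page packet by packet; that detail is not covered in the text.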
When network flows are selected for association, the system reads all established flow table entries from the memory of the starting node. For each flow, if several 5-tuples map to one hash value, the 5-tuple with the largest number of stored bytes is chosen from the linked list of entries sharing that hash value. One or more packets are then selected from the page corresponding to that 5-tuple. On each subsequent node, it is only necessary to hash the 5-tuple to quickly locate the logical page of the packet being queried and then search for the packet within that page. Because the page size is limited, the packet can be located very quickly.
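Choosing the 5-tuple with the most stored bytes from a collision chain amounts to a single pass over the linked list. A minimal sketch, with a hypothetical `FlowEntry` type that keeps only the fields needed for the selection:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowEntry:
    """Reduced flow table entry (hypothetical): 5-tuple, byte count, chain link."""
    five_tuple: tuple
    byte_count: int
    next_entry: Optional["FlowEntry"] = None


def largest_flow(head: FlowEntry) -> FlowEntry:
    """Walk the chain of entries sharing one hash value; return the largest."""
    best = head
    node = head.next_entry
    while node is not None:
        if node.byte_count > best.byte_count:
            best = node
        node = node.next_entry
    return best


# Three 5-tuples that collided on the same hash value.
c = FlowEntry(("10.0.0.5", "10.0.0.6", 5000, 53, 17), byte_count=300)
b = FlowEntry(("10.0.0.3", "10.0.0.4", 4000, 80, 6), byte_count=9000, next_entry=c)
a = FlowEntry(("10.0.0.1", "10.0.0.2", 3000, 443, 6), byte_count=1200, next_entry=b)
chosen = largest_flow(a)
```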
The present invention will be described in detail with reference to specific examples.
Examples
With reference to fig. 1, a method for high-speed association of network streams based on a hash memory includes the following steps:
The first step: assume that all network nodes are configured with hash memory, with 20 high-order address lines and 16 low-order address lines, so that each logical page can store 65504 bytes of packet data (the remaining 32 bytes hold the pointer to the hash flow table and serve other uses). The total storage capacity of the system is then 64 GB. In addition, to store data of different 5-tuples that share the same hash index, a further 2 GB is added to the 64 GB (enough for 32K extra pages; if the hash collision rate is too high, a hash table with more output bits should be considered). The hash memory address space thus totals 66 GB, with a total address length of 37 bits (part of the upper address space is unused). Excluding the most significant bit, the next 20 bits are the hash index bits, and the low 16 bits address locations within the logical page.
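The 37-bit address layout in this example can be written out explicitly. The sketch below assumes that the most significant bit selects the 2 GB spare region and that spare pages are numbered from zero within it; the function names are hypothetical.

```python
HASH_BITS = 20     # hash index bits
OFFSET_BITS = 16   # in-page address bits (64 KB pages)
SPARE_FLAG = 1 << (HASH_BITS + OFFSET_BITS)           # bit 36 marks the spare region
SPARE_PAGES = (2 * 2 ** 30) >> OFFSET_BITS            # 2 GB / 64 KB = 32K pages


def page_base(hash_index: int) -> int:
    """Base address of the primary logical page for a 20-bit hash index."""
    if not 0 <= hash_index < (1 << HASH_BITS):
        raise ValueError("hash index out of range")
    return hash_index << OFFSET_BITS


def spare_page_base(spare_no: int) -> int:
    """Base address of a spare page used when two 5-tuples share an index."""
    if not 0 <= spare_no < SPARE_PAGES:
        raise ValueError("spare page number out of range")
    return SPARE_FLAG | (spare_no << OFFSET_BITS)


def split_address(address: int):
    """Split a 37-bit address into (spare flag, page number, in-page offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    page = (address >> OFFSET_BITS) & ((1 << HASH_BITS) - 1)
    flag = address >> (HASH_BITS + OFFSET_BITS)
    return flag, page, offset


highest = spare_page_base(SPARE_PAGES - 1) + (1 << OFFSET_BITS) - 1
total_bytes = (1 << (HASH_BITS + OFFSET_BITS)) + SPARE_PAGES * (1 << OFFSET_BITS)
```

Only 32K of the 2^20 page numbers are used when the top bit is set, which is why part of the upper address space remains unused.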
The second step: when a packet arrives at a network node, its 5-tuple is extracted and input into the hash table, producing a 20-bit index value.
The third step: the storage area corresponding to the index is located from the hash table index and the low-order address, and the tail of the storage area is checked for a flow table pointer. If there is none, the packet belongs to a new flow: a new flow table entry is created in the flow table area in memory and its pointer is written to the tail of the storage area. If a flow table entry already exists, jump to the fifth step.
The fourth step: for a new flow table entry, record the corresponding 5-tuple, initialize the address pointer, the next-5-tuple pointer, and the read and write pointers, and write the packet into the storage area starting at the position given by the write pointer. Update the number of bytes written and the write time, then jump to the sixth step.
The fifth step: for an existing flow table entry, check whether its 5-tuple matches that of the current packet. If it matches, write the packet at the write pointer and update the number of bytes written and the write time. If it does not match, the packet collides with the existing 5-tuple index; in that case a new storage area is allocated (taken from the 2 GB space whose most significant address bit is 1), a new flow table entry is created and initialized, and the packet is written into the new storage area.
The sixth step: wait for the next packet to arrive; the flow table may be backed up to any other area of the node as required.
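The per-packet storage procedure in the steps above can be modelled compactly. This is a behavioural sketch under several assumptions, not the patented memory layout: pages are Python lists rather than fixed-size memory, the collision chain is a list per hash index, `zlib.crc32` masked to the index width stands in for the hash table, and the class and field names are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Flow:
    five_tuple: tuple
    on_spare_page: bool                      # True if stored in the spare region
    packets: list = field(default_factory=list)
    byte_count: int = 0
    last_write: float = 0.0


class HashMemoryNode:
    def __init__(self, hash_bits: int = 20):
        self.mask = (1 << hash_bits) - 1
        self.chains = {}                     # hash index -> flows sharing it

    def index_of(self, five_tuple: tuple) -> int:
        # Second step: hash the 5-tuple to an index (CRC-32 as a stand-in).
        return zlib.crc32(repr(five_tuple).encode()) & self.mask

    def store(self, five_tuple, seq_id, payload: bytes, now: float) -> Flow:
        chain = self.chains.setdefault(self.index_of(five_tuple), [])
        for flow in chain:
            if flow.five_tuple == five_tuple:
                break                        # fifth step: existing, matching flow
        else:
            # Third/fourth step for an empty chain; collision branch of the
            # fifth step otherwise (the new flow goes to a spare page).
            flow = Flow(five_tuple, on_spare_page=bool(chain))
            chain.append(flow)
        flow.packets.append((seq_id, now, payload))
        flow.byte_count += len(payload)      # update bytes written
        flow.last_write = now                # update write time
        return flow


# hash_bits=0 forces every 5-tuple onto index 0, to exercise the collision path.
node = HashMemoryNode(hash_bits=0)
ft1 = ("10.0.0.1", "10.0.0.2", 40000, 443, 6)
ft2 = ("10.0.0.3", "10.0.0.4", 40001, 80, 6)
f1 = node.store(ft1, 1, b"abc", now=1.0)
f2 = node.store(ft2, 1, b"de", now=2.0)
f1_again = node.store(ft1, 2, b"xy", now=3.0)
```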
With reference to fig. 2, when network flow packet association analysis is performed across the nodes of the network, each network node works according to the following procedure:
The first step: the system determines the starting node and the sequence of intermediate nodes for the network flow packet association; let the starting point of the association analysis be A, the end point be C, and the intermediate node be B;
The second step: starting from A, node A retrieves the flows that hold data in its flow table area. If a 5-tuple in the flow table has a next entry, the entries are traversed one by one to find the 5-tuple with the largest byte count. If the 5-tuple has no next entry, the current 5-tuple is used as the index. The corresponding logical page is located, one or more packets are selected at random from it, and their sequence-id numbers are recorded;
The third step: node A sends the information on the packet flow to be associated to nodes B and C. Node B performs the hash calculation on the 5-tuple information sent by A, locates the specific logical page, and then searches that page for the packet with the current sequence-id. Node C repeats this step. The search at each node ends when the packet with the current sequence-id is found, or when the whole page has been searched without finding it;
The fourth step: the packet transmission delay and packet loss rate among A, B, and C are determined from the times at which the stored packet was received;
The fifth step: check whether the number of packet flows counted so far meets the requirement; if not, select the next packet flow and return to the second step.
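The cross-node lookup and the delay and loss calculation can be illustrated with a toy model. Here each node is a plain dictionary from page index to a list of `(5-tuple, sequence-id, receive time)` records; this representation, the function names, and the use of `zlib.crc32` as the hash are all assumptions made for the sketch.

```python
import zlib


def page_index(five_tuple: tuple, bits: int = 20) -> int:
    """Hash the 5-tuple to a logical page index (CRC-32 as a stand-in)."""
    return zlib.crc32(repr(five_tuple).encode()) & ((1 << bits) - 1)


def record(node: dict, five_tuple: tuple, seq_id: int, t: float) -> None:
    """Store one packet observation in the page selected by the hash."""
    node.setdefault(page_index(five_tuple), []).append((five_tuple, seq_id, t))


def find(node: dict, five_tuple: tuple, seq_id: int):
    """Search only the one logical page the 5-tuple hashes to."""
    for ft, sid, t in node.get(page_index(five_tuple), []):
        if ft == five_tuple and sid == seq_id:
            return t
    return None  # whole page searched, packet not found


def correlate(first: dict, last: dict, five_tuple: tuple, seq_ids):
    """Return (mean delay, loss rate) between two nodes for sampled packets."""
    delays, sampled, lost = [], 0, 0
    for sid in seq_ids:
        t_first = find(first, five_tuple, sid)
        if t_first is None:
            continue                      # not present at the starting node
        sampled += 1
        t_last = find(last, five_tuple, sid)
        if t_last is None:
            lost += 1
        else:
            delays.append(t_last - t_first)
    mean_delay = sum(delays) / len(delays) if delays else None
    loss_rate = lost / sampled if sampled else None
    return mean_delay, loss_rate


flow = ("10.0.0.1", "10.0.0.2", 40000, 443, 6)
A, B, C = {}, {}, {}
record(A, flow, 1, 0.0)
record(A, flow, 2, 1.0)
record(B, flow, 1, 0.125)
record(B, flow, 2, 1.125)
record(C, flow, 1, 0.25)      # packet 2 never reaches C
a_to_b = correlate(A, B, flow, [1, 2])
a_to_c = correlate(A, C, flow, [1, 2])
```

The cost of each `find` is bounded by the size of one page, regardless of how much data the node stores in total, which is the upper bound on search delay that the text refers to.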
It should be noted that the page size should be chosen appropriately: within the packet retrieval period T, a packet must not be overwritten at one node, because of the volume of the flow's packets stored there, while it has still not arrived at another node. If this occurs, the logical page should be enlarged. The packet retrieval period T is set as needed, for example to within 10 minutes. For a backbone network node, the memory space may therefore need to be measured in terabytes.
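A rough way to size a page is to require that it hold everything one flow can write during the retrieval period T. The sketch below assumes a flow that sends at a constant rate for the whole period, which is a worst-case simplification not stated in the text; the function names and the example rate are illustrative only.

```python
import math

TRAILER_BYTES = 32  # bytes reserved at the page tail for the flow table pointer


def min_storage_bytes(flow_rate_bps: float, retrieval_period_s: float) -> int:
    """Bytes one flow writes during T at a constant rate (assumed worst case)."""
    return math.ceil(flow_rate_bps / 8 * retrieval_period_s)


def in_page_address_bits(storage_bytes: int) -> int:
    """Smallest in-page address width whose page holds the data plus trailer."""
    return (storage_bytes + TRAILER_BYTES - 1).bit_length()


# Example: a flow averaging 8 kbit/s with T = 10 minutes (600 s).
need = min_storage_bytes(8_000, 600)
bits = in_page_address_bits(need)
```

With the 16-bit pages of the embodiment, only flows that write under 65504 bytes during T are fully retained; faster flows keep just their most recent bytes, which the text treats as acceptable for elephant flows.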
Fig. 2 is a schematic diagram of the association analysis of network flows. At each node the network flow passes through, the corresponding packet is hashed and then searched for within a single logical page, which guarantees an upper bound on the search delay: the time needed to search the whole logical page.

Claims (6)

1. A high-speed network flow association method based on a hash memory is characterized by comprising the following steps:
1) taking a characteristic field of an IP packet as the input of a hash table, and taking the generated hash bits as the high-order address of a storage area;
2) storing the packets meeting the characteristic fields in an address range indicated by a high-order address in sequence, and recording stream information corresponding to the packets at the tail of a storage area; the stream information corresponding to the packet comprises a characteristic field, an address pointer, a next quintuple pointer, a corresponding read-write pointer, the last write-in time and the number of bytes;
3) when network flow association is needed, a packet characteristic field determined at any node is used as input of a HASH table at other nodes, a storage area where the packet characteristic field is located is determined, and the packet to be searched is searched and located in the storage area.
2. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic fields of the IP packet comprise 5-tuple of IP source address, destination address, source port, destination port, transport layer protocol.
3. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic field of the IP packet comprises 5-tuple of IP source address, destination address, source port, destination port, packet type TOS.
4. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the characteristic field of the IP packet comprises 4-tuple of source IP address, destination IP address, source port and destination port.
5. The hash-memory-based network flow high-speed correlation method of claim 1, wherein the characteristic field of the IP packet comprises 7 tuples of source IP address, destination IP address, source port, destination port, transport layer protocol, packet type TOS and interface index.
6. The hash-memory-based network flow high-speed correlation method according to claim 1, wherein the IP packet is an IPv6 packet, and its characteristic fields include 3-tuple of IP source address, destination address, and flow ID.
CN201710384744.1A 2017-05-26 2017-05-26 Network flow high-speed correlation method based on hash memory Active CN107248939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710384744.1A CN107248939B (en) 2017-05-26 2017-05-26 Network flow high-speed correlation method based on hash memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710384744.1A CN107248939B (en) 2017-05-26 2017-05-26 Network flow high-speed correlation method based on hash memory

Publications (2)

Publication Number Publication Date
CN107248939A CN107248939A (en) 2017-10-13
CN107248939B true CN107248939B (en) 2020-07-31

Family

ID=60017120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710384744.1A Active CN107248939B (en) 2017-05-26 2017-05-26 Network flow high-speed correlation method based on hash memory

Country Status (1)

Country Link
CN (1) CN107248939B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829814A (en) * 2018-06-10 2018-11-16 张超 A kind of the knowledge learning Asymptotical Method and device of speech interactive robot
JP7396469B2 (en) * 2020-04-14 2023-12-12 日本電信電話株式会社 Traffic monitoring devices, methods and programs
CN112511450B (en) * 2020-11-02 2022-05-31 杭州迪普信息技术有限公司 Flow control equipment and method
CN115914102B (en) * 2023-02-08 2023-05-23 阿里巴巴(中国)有限公司 Data forwarding method, flow table processing method, equipment and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567907A (en) * 2003-06-14 2005-01-19 华为技术有限公司 A method for utilizing network address resource
CN101140592A (en) * 2007-09-30 2008-03-12 华为技术有限公司 Keywords storing and researching method and apparatus
CN101753445A (en) * 2009-12-23 2010-06-23 重庆邮电大学 Fast flow classification method based on keyword decomposition hash algorithm
CN103546307A (en) * 2012-07-16 2014-01-29 清华大学 Network flow storage method
CN103581007A (en) * 2013-10-28 2014-02-12 汉柏科技有限公司 Message classifying and looking-up method
CN104618361A (en) * 2015-01-22 2015-05-13 中国科学院计算技术研究所 Network stream data reordering method
CN105227348A (en) * 2015-08-25 2016-01-06 广东睿江科技有限公司 A kind of Hash storage means based on IP five-tuple
CN105515919A (en) * 2016-01-20 2016-04-20 中国电子科技集团公司第五十四研究所 Network flow monitoring method based on Hash compression algorithm
CN106027427A (en) * 2016-05-27 2016-10-12 深圳市风云实业有限公司 HASH average distribution method and device based on FPGA
CN106789733A (en) * 2016-12-01 2017-05-31 北京锐安科技有限公司 A kind of device and method for improving large scale network flow stream searching efficiency

Also Published As

Publication number Publication date
CN107248939A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN107248939B (en) Network flow high-speed correlation method based on hash memory
CN109921996B (en) High-performance OpenFlow virtual flow table searching method
CN110808910B (en) OpenFlow flow table energy-saving storage framework supporting QoS and method thereof
US7787442B2 (en) Communication statistic information collection apparatus
US7706302B2 (en) Optimization of routing forwarding database in a network processor
EP1788490B1 (en) Method and apparatus for monitoring traffic in a network
US7281085B1 (en) Method and device for virtualization of multiple data sets on same associative memory
CN107528783B (en) IP route caching with two search phases for prefix length
CN103428093A (en) Route prefix storing, matching and updating method and device based on names
CN110912826B (en) Method and device for expanding IPFIX table items by using ACL
CN106713144B (en) Reading and writing method of message outlet information and forwarding engine
CN105591914B (en) Openflow flow table lookup method and device
CN111984835B (en) IPv4 mask quintuple rule storage compression method and device
CN111988231B (en) Mask quintuple rule matching method and device
CN111597142A (en) Network security acceleration card based on FPGA and acceleration method
CN111240599B (en) Data stream storage method and device
CN109150962B (en) Method for rapidly identifying HTTP request header through keywords
US6687715B2 (en) Parallel lookups that keep order
CN111200542B (en) Network flow management method and system based on deterministic replacement strategy
CN110851672A (en) Method for realizing multi-hit based on TCAM
Xie et al. Index–Trie: Efficient archival and retrieval of network traffic
CN115834478A (en) Method for realizing PBR high-speed forwarding by using TCAM
CN114710444A (en) Data center flow statistical method and system based on tower abstract and evictable flow table
KR101467942B1 (en) Fast Application Recognition System and Processing Method Therof
JPWO2004054186A1 (en) Data relay apparatus, associative memory device, and associative memory device utilization information retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant