WO2021113904A1 - Network traffic identification device - Google Patents
Network traffic identification device
- Publication number
- WO2021113904A1 (application PCT/AU2020/051339)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- packet
- packets
- sample
- flow
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/022—Capturing of monitoring data by sampling
- H04L43/024—Capturing of monitoring data by sampling by adaptive sampling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/022—Capturing of monitoring data by sampling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/026—Capturing of monitoring data using flow identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/062—Generation of reports related to network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/18—Protocol analysers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1023—Server selection for load balancing based on a hash applied to IP addresses or costs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/12—Network monitoring probes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/14—Arrangements for monitoring or testing data switching networks using software, i.e. software packages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Definitions
- the present invention relates to the identification of traffic on a network and in particular to processing of data at line-rate speeds.
- Various networking protocols divide a message, or data, into small packets, which are each transmitted to a destination address.
- the path each packet takes to the destination address may not be the same; nevertheless, when the packets arrive they are reassembled to recreate the original message.
- Each packet will have a header and optionally a payload.
- the payload can contain the application specific data which is being sent.
- the header includes several identifying details such as the source address of the packet, the destination address the packet is going to, and the protocol and port.
- the header can also include other attributes such as the packet length and protocol metadata, among other details.
- the multiple packets of the data to be sent for an application are considered a stream of packets and form a flow.
- the payload of a flow is the combined payload of all the packets, and it is this combined payload that once reconstituted reflects the original data that was transmitted.
- Each of the packets in the flow will have the same source address, destination address and protocol. Other attributes may also be available, such as, for example the port.
- TCP Transmission Control Protocol
- SMTP Simple Mail Transfer Protocol
- HTTP Hypertext Transfer Protocol
- Packets are routed across a network by different devices, including address routing switches which look at the destination address of the packet, and flow routing switches that look at the source/destination address and/or protocol and port.
- These routers and switches are specialised in performing specific tasks (routing and switching) with very high performance.
- a small (1 rack unit) data centre switch may have 32 ports each capable of 100Gbps line-rate wire speeds, giving a bidirectional bandwidth and throughput of 6.4Tbps with a fixed port-to-port latency. This is achievable because the switch hardware uses an Application Specific Integrated Circuit (ASIC) dedicated to, and specialised for, the switching task.
- ASIC Application Specific Integrated Circuit
- next generation firewall occupies five rack units, has ten ports capable of 100Gbps speeds but only 28Gbps of NGFW (Next Generation Firewall) throughput with an enterprise traffic mix.
- NGFW Next Generation Firewall
- DPI Deep Packet Inspection
- network switches either side of the DPI appliance only route certain traffic through the appliance, and bypass the rest around it.
- the idea is to reduce the amount of network traffic flowing through the DPI system, which has limited throughput.
- However, the problem with this approach is that the network switches can only route traffic based on the information contained in packet headers, and this level of granularity isn’t sufficient.
- HTTPS Hypertext Transfer Protocol Secure
- QUIC Quick User Datagram Protocol Internet Connections
- Streaming video is delivered over HTTPS and QUIC, general browsing over HTTPS and HTTP, and downloads over HTTPS and HTTP, with other traffic being a mix.
- the network switches can only differentiate traffic by the protocol port (HTTPS/HTTP/QUIC) and source/destination IP address ranges. To have the most benefit from the DPI solution, streaming video would be bypassed, however it is not possible to reliably separate streaming video out from other HTTPS and QUIC traffic without inspecting the packet payloads - something that traditional network switches cannot do.
- a DPI solution deployed in a non-corporate environment can only extract information from the non-encrypted headers of the HTTPS and QUIC stream.
- TLS Transport Layer Security
- SNI Server Name Indication
- Elephant flows such as downloads or streaming video comprise hundreds or thousands of packets over an extended period of time, and take up disproportionately more bandwidth than short-lived streams (“mice flows”) such as general web browsing.
- Another problem for network management is attempting to identify certain usage. For example, consider the data flows set out in Fig 1. Set out are four data flows relating to traffic from Netflix 80, Facebook 79, YouTube 77 and general emails 78 that are transmitted across a network link 82. While not realistic, for illustrative purposes assume that each data flow has only five packets. Each application has its identifying packet 32 and 35 as well as non-identifying packets 31. If the object is to identify usage of Facebook 35, then the obvious option is to adopt the solution of Fig 2 and send every packet for analysis 83, that is, applying DPI and looking at the full packet payload of every packet transmitted between source 10 and destination 14. This would be an expensive option and impractical for any network of significant size.
- An alternative to the brute force option of Fig 2 is to use address routing as shown in Fig 3, where packets are filtered by network. In the example shown, all the traffic from the 10.0.0.0/8 network is sent for analysis 11. This can significantly reduce the amount of data to be analysed, but there is the obvious risk that the usage of interest is on a different network.
- A further alternative is flow routing as shown in Fig 4, where packets are filtered by service. In this case only web service packets are sent for analysis 11. In practice this is only a marginal improvement on the option of Fig 2, given that the vast majority (perhaps 90%) of overall Internet traffic is web service related.
- Another problem faced by network management is traffic analysis at scale. In datacentres and corporate networks it is desired to manage the data traffic on the network to reduce congestion and latency. In other words, to ensure the data is free flowing and the user experience is not negatively impacted. This could for example be a problem if a small number of elephant flows consume the available bandwidth, negatively impacting a large number of mice flows.
- the difficulty is that in order to manage network traffic, it is necessary to analyse the network traffic to determine any bottlenecks and network congestion.
- TAP test access point
- the probe device 41 could be manually attached to a specific port 84 of the data centre network as shown in Fig 5. The entire traffic on the port can then be analysed. The problem is that only a single port can be looked at, and it can be time consuming to change the probe to a different port 40.
- Fig 6 shows an alternative approach of a TAP switch 86 where each port is attached to the dedicated TAP switch 86 and a probe device 41 is also attached to the switch 86. While it can be quicker to change links and which port is being analysed, the same problem exists that only a single port at a time can be reviewed 85.
- Fig 7 replaces the TAP switch 86 with a TAP network 89, and rather than send entire traffic from a port or link, sends selected flows from multiple ports 87, 88 to the probe device.
- the TAP network also has limitations of identifying which flow to transmit and the practicalities of being able to analyse the amount of data.
- a network device able to filter out identifying packets from network traffic, and also sample a predetermined proportion of packets from the network traffic.
- a network traffic device comprising: at least one network device adapted to receive network data packets; wherein the at least one network device filters the network data packets to locate at least one identifying packet, and samples the network data packets to select at least one sample packet.
- the identifying packets and sample packets can then be sent to an analyser, which may carry out deep packet inspection (DPI) on the received packets.
- the network device will include a programmable application specific integrated circuit (ASIC), and operate exclusively in the data plane.
- the number of sample packets selected by the network device can be based on a predetermined sample rate.
- the sample packet may be selected at random, or alternatively may be selected by selecting each Nth network data packet, where N is a predetermined number. N can be selected having regard to the desired sample rate.
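The two selection strategies mentioned above can be sketched as follows. This is a minimal illustration in Python rather than anything that would run on a switch ASIC; the function names and the choice of N as the rounded reciprocal of the sample rate are assumptions made for the example.

```python
import random


def n_for_rate(sample_rate):
    """Pick N so that taking every Nth packet approximates sample_rate."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return max(1, round(1 / sample_rate))


def sample_every_nth(packets, n):
    """Yield the Nth, 2Nth, 3Nth ... packet of the stream."""
    for index, packet in enumerate(packets, start=1):
        if index % n == 0:
            yield packet


def sample_at_random(packets, sample_rate, rng):
    """Yield each packet independently with probability sample_rate."""
    for packet in packets:
        if rng.random() < sample_rate:
            yield packet


packets = list(range(1, 101))          # stand-in for 100 packets
n = n_for_rate(0.05)                   # 5% sample rate
systematic = list(sample_every_nth(packets, n))
randomised = list(sample_at_random(packets, 0.05, random.Random(0)))
```

Every-Nth selection gives an exact packet count per window, while random selection avoids locking onto periodic traffic patterns; which is preferable would depend on the deployment.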
- the analyser will estimate (or substantially reconstruct) the flow information of the network data packets from the received packets and the predetermined sample rate, which may be around 4 or 5%.
- the device may also include a load balancer to determine which analyser each received packet is to be transferred to.
- the present invention provides a network traffic device comprising: at least one network device operating exclusively in the data plane, the at least one network device adapted to receive data packets from data streams forming network traffic; the at least one network device adapted to filter the data packets to locate each identifying packet, and sample the data packets to select a predetermined number of sample packets; and at least one analyser, adapted to perform deep packet inspection on received packets, the received packets comprising the at least one identifying packet and the at least one sample packet.
- Figure 1 shows an example scenario.
- Figure 2 shows one possible solution to the scenario of Figure 1 using brute force.
- Figure 3 shows another possible solution to the scenario of Figure 1 using address routing.
- Figure 4 shows another possible solution to the scenario of Figure 1 using flow routing.
- Figure 5 shows a network diagnostic approach using a manual probe.
- Figure 6 shows an alternative approach to Figure 5 using a TAP switch.
- Figure 7 shows another alternative approach to Figure 5 using a TAP network.
- Figure 8 demonstrates a possible hardware setup using the present invention.
- Figure 9 shows an alternative view to Figure 8, and exemplifies the ability of the present invention to scale up vertically.
- Figure 10 exemplifies the ability of the present invention to scale out horizontally.
- Figure 11 demonstrates a possible system set up using the present invention.
- Figure 12 shows a screenshot of a HTTP packet.
- Figure 13 shows a screenshot of a HTTPS packet.
- Figure 14 shows a flow diagram of one approach to configure a programmable network switch in accordance with the present invention.
- Figure 15 exemplifies the approach of the present invention.
- Figure 16 shows a possible solution to the scenario of Figure 1 using the present invention.
- Figure 17 shows an alternative approach to Figure 5 using the present invention.
- Figure 18 shows a possible traffic management approach using the present invention.
- the invention describes a novel approach to identifying traffic on a network.
- the system is able to operate at line rate speeds, and is thus able to avoid the bottlenecks faced by current techniques relying on known network appliances.
- the present invention sends a copy of network traffic through a device that filters out the identifying packets and sends only those identifying packets to the DPI appliance.
- the device will also sample the non-identifying packets and send a predetermined proportion, such as 1 in every 1,000, through to the DPI for counting or analysis.
- the device can run at the speed of a network switch and send only the useful packets through to a DPI for processing.
- the invention can utilise a programmable ASIC chip such as the Tofino available from Barefoot Networks.
- the ASIC chip can be programmed using a Software Development Kit (SDK) with the P4 programming language, and may run at line-rate speeds.
- SDK Software Development Kit
- the chip can function as a powered network switch, and can be configured to send a copy of packets that match specific packet header conditions and/or specific packet payload conditions to another device for processing/inspection.
- Another embodiment of the invention is a server (x86 or similar) that processes and inspects the packets extracted by the chip.
- the output of this analysis process can be fed to other devices/systems, shown to operators and/or collected for later use.
- an embodiment includes a hardware network device 11, including a fully programmable switching ASIC chip that operates at line-rate with multiple high-speed ports (e.g. 32x 100Gbps ports).
- the network device 11 will receive input(s) from a source network 10, for example either via SPAN (Switch Port Analyser) ports or optical TAPs.
- the output of the network device 11 passes to an analysis device(s) 12 to process the identifying packet, and preferably the sampled packet stream to produce the desired analytical output 13.
- the source network 10 provides a copy of the network traffic into the network device 11. While this is not essential, as the network device 11 can pass traffic through when operating inline, it is applicable when operating out-of-band.
- the network device 11 could also be placed within the source network 10 itself and be performing traditional network switching functions in addition to the specific functionality of the present invention. In any event, the programmable network device 11 is receiving inputs from the source network 10.
- the analysis device 12 is connected to the network device 11 to receive outputs post-filtering and after any packet encapsulation.
- everything from the source network 10 through the network device 11 and up to but excluding the analysis device 12 operates in the data plane at line-rate. There is no control plane involvement required, removing a source of latency and a potential bottleneck. That is, the throughput equals the available bandwidth: there are no throughput limitations, and the network device 11 operates at line-rate. Accordingly, the analysis device(s) 12 should be scaled sufficiently to handle the configured amount of network traffic output by the network device 11 post-filtering. In this regard, the configuration could be altered by the operator as desired, and could allow anything from all packets to no packets to a proportion of packets to be passed through to the analysis device 12. The Applicant considers that for realistic scenarios a sampling rate of around 4% to 5% is sufficient to identify network flow applications and be able to accurately estimate network flow statistical information.
- the network device 11 can be configured to act as a filter.
- the network device 11 can be configured to receive a copy of every packet flowing through the source network 10 and only send the packets that match the filter criteria through to the analysis device 12.
- the solution of the present invention can also scale by deploying more hardware and adding an intermediate tier. This is not an option with conventional systems.
- multiple network devices 11 and 16 interconnect to receive and filter data from the source network 10 and distribute the filtered output across multiple analysis devices 12. This can scale horizontally as needed, until the aggregating network device 16 runs out of capacity. According to an embodiment, at this point the entire solution can be replicated to scale-out indefinitely, with flow data 13 records being deduplicated and merged as required.
- In the example of Fig 11, assume there is a network with 21 Tbps of traffic 90 flowing between users 15 and the Internet 17. This represents nearly double the Australian National Broadband Network’s residential Internet connection capacity as of September 2019.
- the network has unrealistically high utilisation of 80% and the system is sampling at a rate of 4%.
- a copy 91 of the 21 Tbps of traffic flowing between users and the Internet is fed into 18x network devices 11 operating at the edge.
- Each 100Gbps bidirectional link 93 translates into a pair of 100Gbps unidirectional link inputs 92.
- Each of the edge programmable network switches filters and samples the data down 97 to approximately 96Gbps of output 96. This is fed to the analysis devices 12 via an intermediary network device 16 performing an aggregation function.
- the requirement for 45x analysis devices assumes that each is only capable of processing 40Gbps of network traffic, although typically they would have much greater capacity.
- a network packet broker arrangement would require the addition of 1,350x servers (each processing 40G of input) or dedicated ‘service appliances’. Currently available network packet broker solutions can therefore lead to a bottleneck: sampling all traffic at 4% requires, in addition to the network packet broker switches themselves, sufficient servers/service appliances to handle all of the network traffic flowing through the network.
- the present invention is able to sample the packets, both randomly and picking out identifying packets, exclusively in the data plane at line-rate speeds while still performing cut-through processing.
- the analysis device 12 of embodiments of the present invention receives a different data makeup. It receives the identifying packets as well as randomly sampled other packets, de-encapsulates them and performs analysis. Advantageously, this can produce flow data as well as other outputs.
- the analysis device 12 can be implemented in a general purpose computer (x86), FPGA (field-programmable gate array) or dedicated hardware.
- the network devices 11 are configurable (e.g. via “tables”) to enable identification of packets of interest and to determine how identified packets are to be handled.
- a matching table will be configured that will match packets with specific headers of interest, and also ideally at least the first six bytes of the payload.
- the matching should be ternary/wildcard based for efficiency, although exact matching would also work.
- Ternary matching allows operators to easily configure “don’t care” values rather than having to exhaustively list all possible matching values.
- Adopting the above matching table, matching rules to detect identifying packets for HTTP and HTTPS as well as sampling 4% of HTTP and HTTPS traffic (ignoring all other traffic) could be created as follows:
- HTTP GET: Select TCP packets with either source or destination port of 80 (HTTP) that have a payload starting with “GET /” (HTTP GET) and sample at 100% with rule priority 1
- HTTP: Select TCP packets with either source or destination port of 80 (HTTP) and sample at 4% with rule priority 2
- HTTPS ClientHello: Select TCP packets with either source or destination port of 443 (HTTPS) that have a payload first byte of 0x16 hexadecimal and a payload sixth byte of 0x01 hexadecimal (HTTPS ClientHello) and sample at 100% with rule priority 1
- HTTPS: Select TCP packets with either source or destination port of 443 (HTTPS) and sample at 4% with rule priority 2
- the first rule in each set will extract all of the identifying packets, and the second rule in each set samples 4% of the non-identifying packets.
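The four rules can be modelled in software as a small ternary (value/mask) table. The sketch below is a Python illustration only, not the switch configuration: the key layout (protocol, source port, destination port, first six payload bytes), the `Rule` class and the helper names are all assumptions made for the example. Each "either source or destination port" rule is expanded into two ternary entries.

```python
from dataclasses import dataclass

TCP = 6  # IP protocol number for TCP


def make_key(protocol, src_port, dst_port, payload):
    """Pack matched fields: protocol (1) + src port (2) + dst port (2) + payload[0:6]."""
    first6 = payload[:6].ljust(6, b"\x00")
    return (bytes([protocol]) + src_port.to_bytes(2, "big")
            + dst_port.to_bytes(2, "big") + first6)


@dataclass(frozen=True)
class Rule:
    name: str
    value: bytes
    mask: bytes        # 0xFF bytes must match, 0x00 bytes are "don't care"
    sample_rate: float
    priority: int      # lower number wins

    def matches(self, key):
        return all((k & m) == (v & m)
                   for k, v, m in zip(key, self.value, self.mask))


def rules_for_port(name, port, payload_value, payload_mask, rate, priority):
    """Expand 'either source or destination port' into two ternary entries."""
    p = port.to_bytes(2, "big")
    wild, care = b"\x00\x00", b"\xff\xff"
    as_src = Rule(name, bytes([TCP]) + p + wild + payload_value,
                  b"\xff" + care + wild + payload_mask, rate, priority)
    as_dst = Rule(name, bytes([TCP]) + wild + p + payload_value,
                  b"\xff" + wild + care + payload_mask, rate, priority)
    return [as_src, as_dst]


ANY = (bytes(6), bytes(6))  # payload fully wildcarded
TABLE = (
    rules_for_port("HTTP GET", 80, b"GET /\x00", b"\xff" * 5 + b"\x00", 1.0, 1)
    + rules_for_port("HTTP", 80, *ANY, 0.04, 2)
    + rules_for_port("HTTPS ClientHello", 443,
                     b"\x16\x00\x00\x00\x00\x01",
                     b"\xff\x00\x00\x00\x00\xff", 1.0, 1)
    + rules_for_port("HTTPS", 443, *ANY, 0.04, 2)
)


def lookup(key):
    """Return the highest-priority matching rule, or None to ignore the packet."""
    hits = [rule for rule in TABLE if rule.matches(key)]
    return min(hits, key=lambda rule: rule.priority) if hits else None
```

For instance, `lookup(make_key(TCP, 51000, 80, b"GET /index.html"))` selects the "HTTP GET" rule at 100%, while any other port-80 TCP packet falls through to the 4% "HTTP" rule.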
- a HTTP request starts by issuing a packet from client to server that contains either a GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE or PATCH request method. These appear at the start of the packet payload, so the system can match - with wildcards - such that packets that start with these terms are selected. It is possible that other packets may randomly start with these characters, but they are very infrequent and could be treated by the analysis servers as a random sample.
- the screenshot of Fig 12 shows a HTTP packet from a GET request.
- the packet protocol 20 is TCP
- the destination port 21 is 80
- the first few bytes of the packet payload 22 are GET /.
- the filter could be applied in the network device 11 to send a copy of this packet to the analysis device 12.
- the analysis device 12 can then look deeper into the packet payload to find the Host: 23 to determine the name of the server the HTTP GET request was sent to.
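As a rough sketch of what "looking deeper" might involve on the analysis device, the following Python function pulls the Host header out of an HTTP request payload. The function name is hypothetical and the parsing is deliberately simple; it assumes the whole header block is present in the one packet.

```python
def http_host(payload):
    """Return the Host header value of an HTTP request payload, or None."""
    head = payload.split(b"\r\n\r\n", 1)[0]        # header block only
    for line in head.split(b"\r\n")[1:]:           # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


request = (b"GET / HTTP/1.1\r\n"
           b"Host: www.example.com\r\n"
           b"User-Agent: demo\r\n\r\n")
host = http_host(request)
```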
- the network device 11 is able to send a small representative portion of the HTTP packets to the analysis devices. This allows the analysis devices to estimate the number of packets in the flow, the size of the flow, when the flow started and when the flow finished. By knowing the sample rate applied to select a given packet, the analysis devices could estimate the number of packets in the flow and the size of the flow by simple extrapolation. For example if a packet is received with a 1 in 20 chance of being sampled, the system can add 20 to the number of packets in the flow and 20x the size of the packet to the size of the flow.
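The extrapolation described above can be sketched in a few lines of Python. The `FlowEstimate` class and its field names are assumptions for illustration; each sampled packet is weighted by the reciprocal of the rate at which it was sampled, so a 1-in-20 sample counts as 20 packets and 20 times its size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowEstimate:
    """Running estimate of one flow, built only from the packets received."""
    packets: float = 0.0
    size_bytes: float = 0.0
    first_seen: Optional[float] = None
    last_seen: Optional[float] = None

    def add_sample(self, packet_size, sample_rate, timestamp):
        weight = 1.0 / sample_rate          # packets represented by this sample
        self.packets += weight
        self.size_bytes += weight * packet_size
        if self.first_seen is None or timestamp < self.first_seen:
            self.first_seen = timestamp
        if self.last_seen is None or timestamp > self.last_seen:
            self.last_seen = timestamp


flow = FlowEstimate()
flow.add_sample(packet_size=200, sample_rate=1.0, timestamp=0.0)    # identifying packet
flow.add_sample(packet_size=1500, sample_rate=0.05, timestamp=1.2)  # 1-in-20 sample
flow.add_sample(packet_size=1500, sample_rate=0.05, timestamp=3.4)  # 1-in-20 sample
```

Here the estimate comes to roughly 41 packets and 60,200 bytes from only three received packets. The estimate is statistical, so short flows will carry a larger relative error than long ones.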
- the system is interested in the Server Name Indication field 24 present within the Client Hello 25 handshake packet, and/or the Common Name field present within the certificate that follows the Server Hello handshake packet.
- HTTPS packets can be identified by the TCP protocol 20 and port 21 being 443.
- the Client Hello handshake packet is sent from client to server, so the destination port will be 443.
- the Server Hello handshake packet is sent from server to client, so the source port will be 443.
- the TLS handshake packets have a first payload byte of 0x16 hexadecimal (handshake) and a sixth payload byte of 0x01 hexadecimal (Client Hello) or 0x02 hexadecimal (Server Hello).
- Fig 13 shows a screenshot showing a HTTPS network packet containing a Client Hello. Deeper within the payload of the packet the Server Name Indication 24 extension shows the name of the server that the request was sent to.
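The byte tests for TLS handshake packets are simple enough to express directly. The Python sketch below mirrors the conditions stated above (first payload byte 0x16, sixth payload byte 0x01 or 0x02, and port 443 on the appropriate side); the function names are hypothetical.

```python
HTTPS_PORT = 443
HANDSHAKE = 0x16
HANDSHAKE_TYPES = {0x01: "ClientHello", 0x02: "ServerHello"}


def tls_handshake_type(payload):
    """Classify a TCP payload by its first and sixth bytes, or return None."""
    if len(payload) < 6 or payload[0] != HANDSHAKE:
        return None
    return HANDSHAKE_TYPES.get(payload[5])


def is_identifying_https(src_port, dst_port, payload):
    """ClientHello travels to port 443; ServerHello travels from port 443."""
    kind = tls_handshake_type(payload)
    if kind == "ClientHello":
        return dst_port == HTTPS_PORT
    if kind == "ServerHello":
        return src_port == HTTPS_PORT
    return False


client_hello = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01]) + bytes(10)
server_hello = bytes([0x16, 0x03, 0x03, 0x00, 0x5A, 0x02]) + bytes(10)
app_data = bytes([0x17, 0x03, 0x03, 0x00, 0x20, 0xAA])
```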
- Once a packet has been selected by the matching table (noting that if a rule has a sampling rate less than 100% there is a chance it won’t be sent through), a decision is made whether to send the packet to the analysis device 12. While a single analysis device 12 may be sufficient for a small network, in order for the present invention to scale horizontally the system will ideally balance the load across multiple analysis devices, in which case the system determines which analysis device 12 to send the packet to.
- the preferred arrangement of the present invention computes a flow-hash of the packet.
- Each flow can be uniquely identified by the protocol, source/destination addresses and source/destination ports, although depending on the network additional packet headers, such as VLAN or MPLS tags, may also be required.
- Flows operate in both the upload and download direction (from client to server, and from server to client). Technically each is a separate flow. However, the system may prefer the packets from both the upload and download direction to be sent to the same analysis device 12. This is to more easily correlate the upload and download packets, and when an identifying packet is detected on, say, an upload flow the system can apply that information to the corresponding download flow at the same time.
- the flow-hash may be computed by taking a one-way hash of the flow as follows:
- 1. The system computes the flow-hash of the packet from the EtherType and Payload bytes 1 to 6. This will result in the packet being randomly distributed between the analysis devices.
- 2. The system determines if the flow is an “upload” or “download” flow. This can be done in any deterministic manner. For TCP and UDP flows one option is to use the source port and destination port, and consider a flow to be an “upload” if the source port is higher than or equal to the destination port; otherwise it is a “download” flow.
- 3. If it is an “upload” flow the system computes the flow-hash from the Protocol, Source Address, Destination Address, Source Port and Destination Port.
- the order of the flow hash is reversed for each of the upload and download flows to ensure that the flow hash for the download flow will be identical to the corresponding upload flow.
- the actual flow hash value is not critical and alternative flow hashes could be used if desired. What is important is to ensure that the same hash value is outputted for a given flow regardless of which direction the packet is travelling in.
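A direction-independent flow hash for TCP and UDP packets might look like the Python sketch below. CRC32 truncated to 16 bits stands in for whatever hash the switch hardware provides, and the function name is hypothetical. The port comparison follows the "upload if source port is higher than or equal to destination port" rule; the additional tie-break on address when both ports are equal is an assumption added here so that the two directions still agree in that case.

```python
import ipaddress
import zlib


def flow_hash(protocol, src_addr, src_port, dst_addr, dst_port):
    """Return a 16-bit hash that is identical for both directions of a flow."""
    src = (src_port, ipaddress.ip_address(src_addr).packed)
    dst = (dst_port, ipaddress.ip_address(dst_addr).packed)
    # "Download" direction: swap the interchangeable fields so that the
    # hash input is always in "upload" order. Ports are compared first;
    # the address comparison is an assumed tie-break for equal ports.
    if src < dst:
        src, dst = dst, src
    data = (bytes([protocol]) + src[1] + dst[1]
            + src[0].to_bytes(2, "big") + dst[0].to_bytes(2, "big"))
    return zlib.crc32(data) & 0xFFFF


upload = flow_hash(6, "10.0.0.1", 51000, "10.0.0.2", 80)
download = flow_hash(6, "10.0.0.2", 80, "10.0.0.1", 51000)
```

Both calls hash the same byte string, so `upload` and `download` are equal and the two halves of the connection would be steered to the same analysis device.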
- the output table can be used to determine which analysis device 12 a given packet should be sent to for processing.
- the range of possible flow-hash values (e.g. 0 to 65535) could be entered, and a portion of this range associated with each available analysis device 12. Overlapping ranges can be permitted and ties can be broken using the rule priority, allowing removal of one analysis device 12 with a fallback to an alternative or default to ensure uninterrupted operation.
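One way to picture the output table is as a list of hash ranges with priorities. In the Python sketch below, the entry layout, the device names and the convention that a lower priority number wins are all assumptions for illustration. A catch-all entry with a lower-ranked priority provides the fallback when a device is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputEntry:
    low: int         # inclusive start of the flow-hash range
    high: int        # inclusive end of the flow-hash range
    priority: int    # lower number wins when ranges overlap
    device: str


def select_device(table, flow_hash):
    """Return the device for the best-priority range containing flow_hash."""
    hits = [entry for entry in table if entry.low <= flow_hash <= entry.high]
    return min(hits, key=lambda entry: entry.priority).device if hits else None


table = [
    OutputEntry(0, 32767, 1, "analyser-a"),
    OutputEntry(32768, 65535, 1, "analyser-b"),
    OutputEntry(0, 65535, 2, "analyser-default"),   # overlapping fallback
]

# Removing analyser-b leaves its range covered by the fallback entry.
degraded = [entry for entry in table if entry.device != "analyser-b"]
```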
- the network device 11 receives 20 the input packet, and then parses 21 the packet headers and packet payload, or preferably at least the first six bytes of the payload. Ideally, at least the headers Ethernet, IPv4/IPv6, TCP/UDP, and the payload will be parsed.
- the next step in the flow chart of Fig 14 is to compute 22 the flow hash to assist with load balancing, although this step is optional or could instead be performed later if preferred.
- To compute the flow hash, the fields that uniquely identify a flow should first be decided. For example, as indicated above, these could be Protocol, Source Address, Destination Address, Source Port and Destination Port.
- Those fields that are interchanged when the packets are flowing in the opposite direction should be identified, for example, the Source Address and Destination Address would be swapped, as would the Source Port and Destination Port.
- An arbitrary decision can be made whether to swap the interchangeable fields for the “upload” direction or the “download” direction, and then the flow hash can be computed by applying a one-way hash function to the fields that uniquely identify a flow, swapping the interchangeable fields of the packet in the one direction. The result should be the same for any packet in the flow traveling in either direction. For example, if there is a TCP connection between host A address 10.0.0.1 port
- a random number is optionally generated 23 which can be used in association with the sample rate.
- the random number could be generated at a different step if preferred.
- the network device 11 compares 24 the parsed data from the packet headers and packet payload, against a matching table.
- the matching table should be defined to match the packets of interest, and should at least include the identifying packet.
- the sample rate could also be configured in the matching table, or it could be configured elsewhere.
- the matching table could be split up into several different tables, possibly with varying fields to match against. In each scenario the matching table determines if a packet is of interest. Fields can be tested for matching using exact, ternary, range or other matching methods.
- the random number could be used with the sample rate to determine if the packet will be sampled 25 and analysed. This can be done in a number of ways. For example, if the random number is given as a number between 0 and 1024, and the sample rate is 50%, then if the random number is below 512 the packet is sampled, and if the random number is 512 or greater it is not sampled.
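The threshold comparison can be sketched as follows in Python. The 10-bit width (random numbers from 0 to 1023 inclusive) and the function names are assumptions for the example; the comparison itself is the one described above.

```python
RANDOM_BITS = 10                     # assumed width of the random number
RANDOM_RANGE = 1 << RANDOM_BITS      # 1024 possible values: 0..1023


def threshold_for(sample_rate):
    """Convert a sample rate in [0, 1] into an integer threshold."""
    return round(sample_rate * RANDOM_RANGE)


def is_sampled(random_number, sample_rate):
    """Sample the packet when the random number falls below the threshold."""
    return random_number < threshold_for(sample_rate)


half = threshold_for(0.5)
four_percent = threshold_for(0.04)
```

With a 10-bit random number the rate can only be set in steps of 1/1024, so a nominal 4% rate becomes a threshold of 41, which is an effective rate of about 4.0%. If the analyser extrapolates from the sample rate, it would be sensible for it to use the effective rate rather than the nominal one.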
- the flow hash is then optionally matched 26 against the output table to determine which analysis device 12 the packet is to go to, and then the packet is transferred to that analysis device 12.
- This specific method of load balancing is optional.
- An alternative is that a single analysis device 12 is used, or a separate load balancing mechanism is used.
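A direction-independent flow hash and its use for choosing an analysis device might look like the sketch below. It is only an illustration: instead of deciding which direction is "upload", it orders the two endpoints canonically before hashing (a substituted technique that gives the same symmetry), it uses SHA-256 as the one-way hash, and it stands in for the output table with a simple modulo over a list of device names. The addresses are documentation-range placeholders.

```python
import hashlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    protocol: int
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


def canonical(ft: FiveTuple) -> FiveTuple:
    # Order the two endpoints so both directions produce the same tuple.
    a = (ft.src_addr, ft.src_port)
    b = (ft.dst_addr, ft.dst_port)
    first, second = (a, b) if a <= b else (b, a)
    return FiveTuple(ft.protocol, first[0], second[0], first[1], second[1])


def flow_hash(ft: FiveTuple) -> int:
    c = canonical(ft)
    key = f"{c.protocol}|{c.src_addr}|{c.src_port}|{c.dst_addr}|{c.dst_port}"
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")  # keep 64 bits of the digest


def pick_device(ft: FiveTuple, devices: List[str]) -> str:
    # Every packet of a flow, in either direction, lands on the same device.
    return devices[flow_hash(ft) % len(devices)]
```

Because the hash is symmetric, `pick_device` returns the same device for `FiveTuple(6, "192.0.2.1", "198.51.100.7", 40000, 80)` and for its reverse, which lets one analysis device see both halves of a conversation.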
- the analysis device 12 can be configured to perform a range of tasks. According to an embodiment, the primary use discussed above is to handle identifying packets and reconstruct flow information from sampled data.
- the analysis device 12 should have enough information to build up the data needed. If the matching table is not available then preferably the packet will have been encapsulated with the applied sampling rate information. The analysis device 12 can determine what sampling rate was applied to the packet and use that to extrapolate the number of packets in the flow and/or the size of the flow. The analysis device 12 can also try to read the contents of any identifying packets to identify more information about the flow.
- the metadata about the flow itself can be held in a cache so that it can be updated as more flow packets arrive, and an expiry mechanism on the cache could be used to detect a flow terminating.
- the analysis device functions could be split into and performed by separate components.
- the flow information may not be stored in a cache and could be sent to a data store for correlation by another process.
- flow termination could also be detected by looking more deeply at the packet contents, for example for TCP FIN packets.
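A flow cache with an idle-timeout expiry and FIN-based termination could be sketched as below. The class and field names and the 30-second timeout used in the usage note are hypothetical; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List

TCP_FIN = 0x01  # FIN bit in the TCP flags field


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    fin_seen: bool = False


class FlowCache:
    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.flows: Dict[int, FlowRecord] = {}

    def observe(self, flow_hash: int, now: float, tcp_flags: int = 0) -> None:
        # Create the record on first sight, then refresh it on every packet.
        rec = self.flows.get(flow_hash)
        if rec is None:
            rec = self.flows[flow_hash] = FlowRecord(first_seen=now, last_seen=now)
        rec.last_seen = now
        if tcp_flags & TCP_FIN:
            rec.fin_seen = True

    def expire(self, now: float) -> List[int]:
        # A flow is treated as finished after a FIN or after being idle too long.
        done = [h for h, r in self.flows.items()
                if r.fin_seen or now - r.last_seen >= self.idle_timeout]
        for h in done:
            del self.flows[h]
        return done
```

With `FlowCache(idle_timeout=30.0)`, a flow whose sampled packet carried a FIN is reported on the next `expire` call, while a silent flow is reported once 30 seconds have passed since its last packet. Since only some packets are sampled, a FIN may never be seen, which is why the timeout remains necessary.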
- the analysis device 12 is configured to estimate the size of the flow (packet count and total size in bytes) from the sampled packets it receives. This estimation can be performed in several different ways. Preferably, the analysis device 12 will have at a minimum the sampled packets delivered, and also knowledge of the probability/sample rate with which each packet was extracted.
- the analysis device 12 may also examine the identifying packets to pull out that information and add that to the metadata of the flow.
- the analysis device 12 will receive a sampled packet and determine the sample rate/probability with which it was extracted. Once it has this data the analysis device 12 will update the flow metadata based on the packet contents (e.g. identifying packets), together with the estimates of the flow packet count and total flow size. For example, in a simple approach, with a 20% sampling rate applied, the analysis device 12 could add 5 to the packet count and 5x the packet size to the total size. That is, the data analysed by the analysis device 12 is extrapolated to estimate the result had the sample rate been 100%.
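The simple extrapolation approach amounts to weighting each sampled packet by the inverse of its sample rate. A minimal sketch, with hypothetical names, is:

```python
from dataclasses import dataclass


@dataclass
class FlowEstimate:
    packets: float = 0.0
    total_bytes: float = 0.0

    def add_sample(self, packet_size: int, sample_rate: float) -> None:
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        # Each sampled packet stands in for 1 / sample_rate packets of the flow.
        weight = 1.0 / sample_rate
        self.packets += weight
        self.total_bytes += weight * packet_size
```

A 1500-byte packet sampled at 20% adds 5 packets and 7500 bytes to the estimate, whereas an identifying packet extracted with certainty (rate 1.0) adds exactly itself.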
- the analysis device 12 can determine if there is any additional useful information that can be extracted. For example, for an HTTP packet the analysis device 12 can check if it is a GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD or TRACE request and extract the Host accordingly. Similarly, for an HTTPS packet the analysis device 12 could check if it is a handshake packet with Client Hello or Server Hello and extract the Server Name Indication or Certificate Common Name.
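The HTTP case can be illustrated with a short sketch that checks the request method and then reads the Host header. It assumes the whole request head is present in a single packet payload, and the function name is hypothetical; the TLS case (Server Name Indication) requires parsing the handshake record structure and is not shown.

```python
from typing import Optional

HTTP_METHODS = (b"GET", b"POST", b"PUT", b"PATCH",
                b"DELETE", b"OPTIONS", b"HEAD", b"TRACE")


def extract_http_host(payload: bytes) -> Optional[str]:
    request_line, _, rest = payload.partition(b"\r\n")
    method = request_line.split(b" ", 1)[0]
    if method not in HTTP_METHODS:
        return None                      # not an HTTP request we recognise
    for line in rest.split(b"\r\n"):
        if not line:
            break                        # blank line marks the end of the headers
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None
```

Header names are compared case-insensitively, and a payload that does not start with one of the listed methods (for example a TLS record) simply yields `None`.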
- a low sampling rate (for example 4%) is applied to all packets, and in addition all identifying packets are sampled. Provided the sampling rate is sufficient for the network traffic profile (typically 4% to 5%), the sample will yield a good estimate of the true packet count and size of each flow. Estimates for brief, small flows will be less accurate than those for long-lived larger flows. However, network operators are generally more interested in the long-lived larger flows as they have the most impact on the network. Evidence of the smaller flows will still be seen by the extraction of their identifying packets.
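Why small flows are estimated less accurately can be seen from a simple statistical model. Assuming each packet is sampled independently with probability p (an assumption of this sketch, which also ignores the always-sampled identifying packets), the number of sampled packets from a flow of N packets is binomial, and the extrapolated count n/p has a relative standard error of sqrt((1 - p) / (N p)).

```python
import math


def relative_std_error(true_packets: int, sample_rate: float) -> float:
    # Standard deviation of the extrapolated packet count, as a fraction of
    # the true count, under independent per-packet sampling.
    return math.sqrt((1.0 - sample_rate) / (true_packets * sample_rate))


# At a 4% rate, a 100-packet flow has roughly 49% relative error,
# while a 10,000-packet flow has roughly 4.9%.
small_flow_error = relative_std_error(100, 0.04)
large_flow_error = relative_std_error(10_000, 0.04)
```

The error shrinks with the square root of the flow length, which is consistent with long-lived larger flows being the ones estimated well at low sampling rates.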
- the present invention can extract out the packets of interest from a flow without having to process the entire flow.
- FIG 15 shows a source 10 sending packets to a destination 14.
- the network device 11 of the present invention sits in the middle, and selectively extracts packets.
- each flow may have any number of packets.
- the flows are numbered 1, 2, 3 and 4.
- Flows 1, 2 and 4 contain identifying packets 32. Note that only one packet (or relatively few in proportion to the number of other packets in the flow) is an identifying packet, and the rest of the flow is not of interest in this application.
- the present invention can achieve a similar result without having to process all 20 packets.
- the network device 11 still inspects all 20 packets; however, in the example given only 5 packets end up with the analysis device 12, being all of the identifying packets 98 and some randomly sampled packets 99.
- the present invention provides an ideal tool. As shown in Fig 16, the present invention receives the source 10 data, detects each of the identifying packets 32 for analysis, and is then able to locate the Facebook identifying packet 35. In this example, 3 packets were sent for analysis which compares very favourably with the 20 packets that needed to be analysed in the example of Fig 2.
- the present invention may provide significant improvement in analysing traffic flow.
- the examples of Figs 5 to 7 show the limitations faced by network operators or, more to the point, their inability to monitor all of a network's internal traffic. Adopting the approach exemplified in Fig 15 of selecting identifying packets 98 and random packets 99, an improved approach to network monitoring can be seen in Fig 17.
- all the ports 40 are attached to a network device 11 of the present invention, which in turn can be connected to a diagnostic probe device 41.
- This approach allows the network operator to identify all of the internal traffic on the network.
- the system can identify the applications running on the network.
- the system can also provide data feeds for intrusion detection, application monitoring, traffic analysis and network diagnostics.
- This data can be of certain assistance in traffic management.
- a general goal of traffic management is to improve the customer experience, particularly during peak times.
- Large elephant flows such as software updates, which are generally not critical, can consume much of the available bandwidth for extended periods of time. This can have the effect of blocking mice flows such as web browsing traffic. For the ultimate consumer this can mean delays in web pages loading, and blame is commonly attributed to the service provider.
- QoS Quality of Service
- game play and game downloads of a popular online game. Once the distinction is made, the game downloads, and not game play, could be rate limited so as to achieve bandwidth savings and allow other data to travel more freely across the network. Similarly, other software and operating system updates could be limited to ensure mice flows are not negatively impacted.
- the optical TAP 45 can take an out-of-line copy of all the upload and download traffic 46, and feed this to the network device 11 of the present invention.
- the network device 11 detects all the identifying packets and forwards these to an analysis device 12 for packet inspection. Samples of a percentage of remaining packets can also be sent.
- the analysis device could be an x86-based machine that performs packet inspection on the received packets to fingerprint applications. It can also track flow counters and detect elephant flow start and finish events. The elephant flows can be matched against a table of undesirable applications, and if a match is detected an undesirable application flow notice 49 can be sent to network policy enforcement 50. The network policy enforcement 50 can then rate limit the flows of the offending application, or take any other remedial action that may be selected. For example, elephant flows could be marked to be placed in low priority queues.
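The elephant-flow check against a table of undesirable applications could be sketched as follows. The byte threshold, the application labels and the shape of the notice are all hypothetical; the specification does not define them.

```python
from typing import Dict, Optional, Set

ELEPHANT_BYTES = 100 * 1024 * 1024                              # hypothetical threshold
UNDESIRABLE_APPS: Set[str] = {"software-update", "game-download"}  # hypothetical labels


def undesirable_flow_notice(app: str, estimated_bytes: float) -> Optional[Dict[str, object]]:
    # Only large flows belonging to a listed application trigger a notice.
    if estimated_bytes < ELEPHANT_BYTES:
        return None
    if app not in UNDESIRABLE_APPS:
        return None
    return {"app": app, "estimated_bytes": estimated_bytes, "action": "rate-limit"}
```

In such a sketch the application label would come from fingerprinting the identifying packets, and the byte count from the extrapolated flow size, with the returned notice handed to whatever policy enforcement component is in use.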
- the present invention advantageously may address a scale problem. That is, it may overcome the technical limitations of current technology which is not able to view all the data of a network or expand to encompass the whole of a network.
- the present invention can filter network packets at line rate, at relatively low cost and at high scale, something existing systems are not able to achieve.
- the present invention provides a unique combination of features including random sampling of packets from flows that match criteria and extraction of packets from flows that match criteria, where the criteria include packet headers as well as part of the packet payload.
- Existing network packet brokers cannot perform both these functions unless they are connected to some other device. Relying on such a connection would mean that they could not operate at the data centre network switch speeds.
- the present invention is able to operate both of these features at line-rate speeds.
- the present invention could be deployed at-scale across an entire network as a network packet broker with the added benefits of providing analysis of network traffic at scale. It can therefore be used to proactively identify network problems, gather flow metadata records for analysis and feeding into security systems, provide protocol (DHCP, DNS) data extraction in real-time and provide network visibility at a new level of detail.
- the present invention does not rely upon receiving a complete copy of a given packet stream, but rather only extracts out the packets of interest and a random sample of other packets of interest.
- the present invention can see every packet but does not have the processing overhead requirement of the DPI appliance.
- the present invention is capable of counting streams/flows in terms of the number of packets and byte sizes.
- flow-based network switches are limited in the number of concurrent flows they can handle, typically only a few million. Once a flow-based network switch exhausts its limited flow table memory, the switch will evict other active streams from its flow table, leading to churn, which places additional load on the SDN (software-defined networking) controllers of the switch. In short, flow-based switches do not work at scale. However, the present invention is able to scale up as needed.
- the core of the invention is the extraction of identifying packets from a data stream. This, coupled with the extraction of samples of other packets from the stream, enables an analyser to derive the data required for a particular implementation.
- the application may be to monitor traffic flowing through a network so as to proactively manage the available bandwidth.
- An alternative may be to monitor traffic from a source or to a destination, or the effect of a particular application on the network.
- it could be used to limit the dissemination of undesirable information, such as, for example, that from known terrorist groups. These applications cannot currently be undertaken except in a largely token effort.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
- Circuits Of Receivers In General (AREA)
- Devices For Checking Fares Or Tickets At Control Points (AREA)
- Small-Scale Networks (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20898456.7A EP4073981A4 (en) | 2019-12-11 | 2020-12-09 | Network traffic identification device |
AU2020400165A AU2020400165A1 (en) | 2019-12-11 | 2020-12-09 | Network traffic identification device |
JP2022536489A JP2023505720A (en) | 2019-12-11 | 2020-12-09 | network traffic identification device |
US17/784,442 US11894994B2 (en) | 2019-12-11 | 2020-12-09 | Network traffic identification device |
CA3161543A CA3161543A1 (en) | 2019-12-11 | 2020-12-09 | Network traffic identification device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2019904689 | 2019-12-11 | ||
AU2019904689A AU2019904689A0 (en) | 2019-12-11 | Network Traffic Identification Device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021113904A1 true WO2021113904A1 (en) | 2021-06-17 |
Family
ID=76328766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2020/051339 WO2021113904A1 (en) | 2019-12-11 | 2020-12-09 | Network traffic identification device |
Country Status (6)
Country | Link |
---|---|
US (1) | US11894994B2 (en) |
EP (1) | EP4073981A4 (en) |
JP (1) | JP2023505720A (en) |
AU (1) | AU2020400165A1 (en) |
CA (1) | CA3161543A1 (en) |
WO (1) | WO2021113904A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113904952A (en) * | 2021-10-08 | 2022-01-07 | 深圳依时货拉拉科技有限公司 | Network flow sampling method and device, computer equipment and readable storage medium |
US20230052712A1 (en) * | 2019-12-11 | 2023-02-16 | Redfig Consulting Pty Ltd | Network Traffic Identification Device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220400070A1 (en) * | 2021-06-15 | 2022-12-15 | Vmware, Inc. | Smart sampling and reporting of stateful flow attributes using port mask based scanner |
US20230082780A1 (en) * | 2021-08-05 | 2023-03-16 | Intel Corporation | Packet processing load balancer |
US12063161B1 (en) * | 2023-05-31 | 2024-08-13 | Cisco Technology, Inc. | Discovering multi-application workflows to identify potential qoe- impacting issues |
CN118101357B (en) * | 2024-04-29 | 2024-08-06 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Network flow classification method combining data packet semantics |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050190695A1 (en) * | 1999-11-12 | 2005-09-01 | Inmon Corporation | Intelligent collaboration across network systems |
US20070160073A1 (en) * | 2006-01-10 | 2007-07-12 | Kunihiko Toumura | Packet communications unit |
US20090116398A1 (en) * | 2007-11-07 | 2009-05-07 | Juniper Networks, Inc. | Systems and methods for flow monitoring |
US20110242994A1 (en) * | 2010-03-30 | 2011-10-06 | Allwyn Carvalho | Flow sampling with top talkers |
EP2632083A1 (en) | 2012-02-21 | 2013-08-28 | Tektronix, Inc. | Intelligent and scalable network monitoring using a hierarchy of devices |
WO2016122708A1 (en) | 2015-01-28 | 2016-08-04 | Hewlett Packard Enterprise Development Lp | Determining a sampling rate for data traffic |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7111163B1 (en) * | 2000-07-10 | 2006-09-19 | Alterwan, Inc. | Wide area network using internet with quality of service |
US7719966B2 (en) * | 2005-04-13 | 2010-05-18 | Zeugma Systems Inc. | Network element architecture for deep packet inspection |
US8151019B1 (en) * | 2008-04-22 | 2012-04-03 | Lockheed Martin Corporation | Adaptive network traffic shaper |
US8792491B2 (en) * | 2010-08-12 | 2014-07-29 | Citrix Systems, Inc. | Systems and methods for multi-level quality of service classification in an intermediary device |
EP4073981A4 (en) * | 2019-12-11 | 2023-01-18 | Redfig Consulting Pty Ltd | Network traffic identification device |
-
2020
- 2020-12-09 EP EP20898456.7A patent/EP4073981A4/en active Pending
- 2020-12-09 CA CA3161543A patent/CA3161543A1/en active Pending
- 2020-12-09 JP JP2022536489A patent/JP2023505720A/en active Pending
- 2020-12-09 AU AU2020400165A patent/AU2020400165A1/en active Pending
- 2020-12-09 WO PCT/AU2020/051339 patent/WO2021113904A1/en unknown
- 2020-12-09 US US17/784,442 patent/US11894994B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP4073981A4 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230052712A1 (en) * | 2019-12-11 | 2023-02-16 | Redfig Consulting Pty Ltd | Network Traffic Identification Device |
US11894994B2 (en) * | 2019-12-11 | 2024-02-06 | Redfig Consulting Pty Ltd | Network traffic identification device |
CN113904952A (en) * | 2021-10-08 | 2022-01-07 | 深圳依时货拉拉科技有限公司 | Network flow sampling method and device, computer equipment and readable storage medium |
CN113904952B (en) * | 2021-10-08 | 2023-04-25 | 深圳依时货拉拉科技有限公司 | Network traffic sampling method and device, computer equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
US11894994B2 (en) | 2024-02-06 |
EP4073981A4 (en) | 2023-01-18 |
EP4073981A1 (en) | 2022-10-19 |
CA3161543A1 (en) | 2021-06-17 |
US20230052712A1 (en) | 2023-02-16 |
JP2023505720A (en) | 2023-02-10 |
AU2020400165A1 (en) | 2022-07-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20898456 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022536489 Country of ref document: JP Kind code of ref document: A Ref document number: 3161543 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 2020400165 Country of ref document: AU Date of ref document: 20201209 Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2020898456 Country of ref document: EP Effective date: 20220711 |