WO2017046617A1 - Methods and apparatus for detecting patterns in data packets in a network

Publication number
WO2017046617A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
packets
packet filter
rule
expression
Prior art date
Application number
PCT/GB2016/052919
Other languages
French (fr)
Inventor
Mark Field
Original Assignee
Telesoft Technologies Ltd
Priority date
Filing date
Publication date
Application filed by Telesoft Technologies Ltd filed Critical Telesoft Technologies Ltd
Publication of WO2017046617A1 publication Critical patent/WO2017046617A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227Filtering policies
    • H04L63/0263Rule management

Definitions

  • the present invention relates to a packet filter, a method of filtering packets, a compiler and a syntax, in particular arranged for detecting patterns in data packets in a network.
  • weak selectors which examine more generic attributes of a packet or packets such as length and checksum and how these attributes change over time. These attributes can be used to detect packets of interest that might be missed by detection using strong selectors.
  • implementing these techniques in a way that can cope with today's data rates is technically challenging. Small savings for each filtering operation, when multiplied by the number of packets being processed by a system, can quickly add up to a significant difference.
  • the present application aims to address the problems of monitoring, filtering or processing traffic at the high data rates encountered in today's networks using matching techniques based around weak selectors.
  • a packet filter comprising:
  • a packet processor arranged to match a stored rule against the packets, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern,
  • the packet processor comprising a regular expression engine arranged to match the regular expression pattern against a target sequence of received packets,
  • the regular expression engine is arranged to invoke a packet filter engine to evaluate the packet filter expression of that element against the bytes of that packet, wherein the regular expression engine is arranged to determine that the rule matches the sequence of packets if all the elements have been matched according to the regular expression pattern;
  • an actioning module to take a predetermined action in response to the rule being determined to match the packets.
  • a novel architecture comprising a regular expression engine invoking packet filter engines for evaluating rules against packets that can be highly efficiently produced in hardware.
  • regular expressions for matching character patterns in a string of data e.g. a search function
  • elements of the regular expression are byte matches against the packets in the sequence.
  • These byte matches are implemented by a packet filter engine which evaluates the packet filter expressions nested within the regular expression pattern.
  • regular expression engine moves through the packets in the sequence, invoking the packet filter engine to evaluate the elements of the pattern at each stage looking for a matching pattern.
  • the packet filter can detect patterns in a sequence of packets using weak selectors as selection criteria in a highly computationally efficient way.
  • Prior art packet filters are limited to using strong selectors, which match on a single byte pattern only. Generic relationships with multiple possible variations between multiple byte values in packets and multiple packets in a stream cannot be captured using the syntax of strong selectors.
  • the invention provides an efficient means of encoding and processing such weak selectors, and enables detection applications to cover a broader scope of detection possibilities, making them more effective.
  • the module can alert a user that a sequence of packets has matched a rule or filter, route or otherwise process the matching packets in a different way to non-matching packets.
  • Detection patterns encoded and processed using the novel techniques proposed here can be implemented in a way that will use very few OP codes and thus require minimal processing power. Further, the efficiency of this syntax and processing architecture will remain consistent regardless of the throughput rate of the traffic being filtered.
  • a further advantage of this architecture is that new detection patterns could be applied or modified at runtime by a user or a program using it. To implement new detection rules without this syntax would require new code to be written and compiled for each new detection rule.
  • the novel techniques can be used for network monitoring, intrusion detection, cyber defense, etc. Any suitable action can be taken in response to detecting a packet or packets of interest, such as processing matching packets differently, dropping, filtering or routing matching packets differently, alerting a user to a match, etc.
  • multiple rules can be stored and matched against the packets in this way.
  • the packet processor can be implemented in software or hardware or a combination.
  • the packet processor is implemented using a Field Programmable Gate Array to implement the virtual machine or machines implementing the packet processor.
  • the packet filter expressions and packet filter engine are based on the Berkeley Packet Filter (BPF), preferably with extensions made to BPF syntax to allow for the additional functionality described herein.
  • BPF Berkeley Packet Filter
  • the regular expressions are in the form of Perl Compatible Regular Expressions.
  • the regular expression engine comprises a state machine wherein transition to the next state depends upon an evaluation of a packet filter expression by the packet filter engine.
  • the regular expression engine is preferably arranged as a virtual machine.
  • the packet filter engine preferably comprises a virtual machine processing bytecodes, e.g. interpreting or using just in time compilation, derived from compiling the packet filter expression syntax.
  • the target sequence of received packets against which the regular expression pattern is matched is a stream (also interchangeably referred to as a flow) between two nodes in the network.
  • the stream is identified by the packets' 5-tuple, i.e. by source IP address and port, destination IP address and port, and protocol fields within the packet header.
  • the packet filter comprising a context store in communication with the packet processor, wherein in response to instructions in the rule, the packet processor is arranged to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in the context store and to retrieve the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
  • the regex engine retrieves and examines this information and determines the next subexpression to match based on the state retrieved from the context store. This means that the regex engine is not tied to a particular flow and therefore is not idle in between packets in that flow arriving. Instead, by saving the state of the expression matching in a context store, the regex engine hardware can be used to evaluate packets in different flows in different states of evaluation. This makes the system more efficient and flexible.
  • the context store can also be used to hold values resulting from evaluation of a sub-expression that are to be used in later subexpressions to allow patterns to be matched across packets in a flow. For example, if a packet stream is to be filtered when app[3] of packet 1 is equal to app[8] of packet 2, the value of app[3] in packet 1 will need to be stored for comparison with app[8] of packet 2. For this reason the BPF syntax is in preferred embodiments extended to allow information stored here to be saved and retrieved in a context store.
  • the results are stored in the context store according to the stream, e.g. two communicating nodes in the network. This can be achieved by storing the information about the result of a match according to the packets' 5-tuple, i.e. source IP address and port, destination IP address and port, and protocol.
  • This allows bytes in one packet to be compared with bytes in another packet as part of the matching process. This is not possible with existing packet filter technology. If more than one rule is being evaluated, separate regex engines can be implemented for each rule, each engine having its own context store to be used to store state when evaluating the rules.
  • the packet filter comprises a packet classifier module arranged to calculate metadata for the packets which is appended to the packets, wherein in response to instructions in the packet filter expression, the packet filter engine is arranged to evaluate the metadata against the packet filter expression.
  • the packet is written into working memory in the packet filter and the metadata is appended to the packet in the memory.
  • the metadata comprises one or more of an offset of one or more protocol headers in the packet, a length of one or more protocol layers in the packet, a size of one or more protocol layers in the packet, or a checksum of one or more protocol layers in the packet.
  • information about the protocol layers in the packet e.g. datalink, network, transport and application layers, is pre-calculated and made available to the packet filter engine when evaluating the packet filter expression. This calculation can be done efficiently by hardware. Thus, by relieving this processing burden from the packet processor, the processing can be made faster and also this allows more functionality for the packet filter.
  • the metadata can be appended to the packet in such a way that the metadata elements are at known positions relative to the packet.
  • the compiler when compiling the packet filter expression that includes a reference to a metadata element in the syntax of the rule being compiled, can include suitable opcodes for loading that element from memory at the appropriate offset into a register of the virtual machine for further processing and matching against the packet data.
  • a packet filter expression can be written that compares the value in the checksum field of the network layer of the packet with the actual calculated checksum value appended to the packet as metadata.
  • the compiler knows where these values are in memory in terms of an offset relative to the packet and so can load the values to perform a comparison. This allows functionality that is not possible with existing packet filters.
  • the metadata comprises information about the application layer allowing the packet filter expression to be evaluated against bytes in the application layer of the packets. This is something that is not possible with existing packet filter technology, where the byte values that can be matched are limited to datalink and internet layers.
  • the packet filter engine in response to instructions in the packet filter expression, is arranged to evaluate application layer data in the packet against the packet filter expression.
  • the packet filter comprises a compiler arranged to compile the rules into byte codes for execution in a virtual machine provided by the packet processor to implement the regular expression engine and/or the packet filter engine.
  • the compiler recognises instructions to store the results of evaluating a packet filter expression to a context store and to retrieve the results from the context store to a register or memory for use in evaluating a further packet filter expression.
  • the packet filter comprises a console by which a new user rule can be received and compiled by the compiler, the compiled rule being supplied to the virtual machine in the packet processor whilst the packet processor is running so as to dynamically update the rules being matched against the packets at runtime.
  • new rules can be implemented at run time under control of a software program.
  • the rules being matched against packets can be changed "on the fly" by a user of the system or another program. This is a considerable technical advantage over comparative existing techniques for applying weak selector rules to packet data, which necessitate writing custom programs, e.g. in C++, that are compiled into an executable file which then has to be executed before any matching can occur. This gives no possibility of changing the rules on the fly or adding new rules.
  • the method comprises in response to instructions in the rule, storing the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
  • the method comprises calculating metadata for the packets and appending the metadata to the packets; and in response to instructions in the packet filter expression, evaluating the metadata against the packet filter expression.
  • the method comprises, in response to instructions in the packet filter expression, evaluating application layer data in the packet.
  • the method comprises compiling the rules into byte codes for execution in a virtual machine. In an embodiment, the method comprises dynamically updating the rules being matched against the packets in a virtual machine at runtime.
  • a compiler arranged to compile a rule whose syntax comprises packet filter expressions defining matches on packet bytes nested in a regular expression, the compiler outputting bytecodes suitable for execution in a virtual machine for evaluating the rules against packets of data in a sequence.
  • a compiler arranged to recognise syntax in the rule in which the packet filter expression addresses metadata associated with each packet, and to output bytecodes to cause the virtual machine to load the metadata to a register for evaluation.
  • the compiler is arranged to recognise syntax in the rule in which the packet filter expression addresses a context store, and to output bytecodes to cause the virtual machine to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
  • a syntax for encoding rules for packet detection comprising packet filter expressions defining matches on packet bytes nested in a regular expression.
  • the novel syntax is preferably based on a combination of the Berkeley Packet Filter (BPF) and Regular Expressions (Regex) with extensions to implement one or more of the novel techniques described herein.
  • Regex Regular Expressions
  • the syntax encodes rules for packet detection in a way that is flexible enough to cover a broad range of weak selector packet detection rules whilst also being simple enough to be efficiently implemented in hardware or software.
  • apparatus arranged to match packets of data received on a network by processing a rule whose syntax comprises packet filter expressions defining matches on packet bytes nested in a regular expression.
  • Figure 1 shows schematically an example of a system for detecting packets in a network according to an embodiment of the present invention
  • Figure 2 shows an example of the architecture of a Field Programmable Gate Array based packet filter engine which may form part of the system of Figure 1;
  • Figure 3 shows an example of steps required in software to configure the packet filter engine of Figure 2;
  • Figure 4 shows a flow diagram of a series of operations performed by the packet filter engine of Figure 2.
  • Figure 1 shows schematically an example of a system 10 for detecting packets in a network 100.
  • the system 10 includes a host computer 12 having a processor 14, memory 16, storage 18 and user input/output peripherals 20,22.
  • the processor is operable to execute program instructions stored in the storage 18.
  • the system 10 comprises an interface 26 for connecting to a communication network 100, which may comprise one or more network ports or network taps for making connections to interfaces of the network 100 and receiving packets 25 of data transmitted over the network interface.
  • the host computer 12 also comprises a daughter card 30 which connects to a suitable interface of the host computer 12 and includes a Field Programmable Gate Array 32, which implements a packet filter engine (as described below in relation to Figure 3) for detecting packets received over the network interface 26.
  • the processor 14 is arranged to configure the FPGA 32 with rules for detecting packets, as will be described in more detail below.
  • Figure 2 shows in more detail the way in which the FPGA packet filter engine 50 is configured with rules for detecting packets.
  • Weak selector detection rules 40 are provided to the system, e.g. entered by a user, encoded in a novel combined BPF and Regex syntax, as described in more detail below.
  • the rules 40 are passed to bespoke BPF and regex compilers 42,43 that convert the rules into configuration instructions (referred to as op codes herein) 44 for the target hardware.
  • the BPF compiler 42 used in this invention will be based on the standard BPF compiler but is extended to implement the additional functionality described herein.
  • the compiler will need to access bytes in the application layer and metadata (metadata is added by the Packet Classifier - see below), as well as dealing with accessing the context store. These extensions have been described elsewhere in more detail.
  • the Regex compiler 43 will be based on a standard Regex compiler but will initiate evaluation of BPF expressions in place of byte expressions. This allows packets to be treated like bytes are in standard Regex processors and means patterns can be detected across multiple packets in a stream.
  • opcodes 44 are generated for processing by a virtual machine implemented in the FPGA packet filter engine 50.
  • the hostware module 46 of the architecture interacts directly with the target hardware, in this example a FPGA based card 30,32.
  • the pseudo-machine code instructions generated by the compilers 42,43 are passed down by the hostware 46 to the FPGA card 30 and stored as instructions in FPGA accessible memory. These instructions are executed by the Packet Processor implemented within the FPGA 32 for each packet 25 that is received.
  • the FPGA 32 packet filter engine has a MAC 50 (media access controller) which is in communication with the network tap 26 and receives the packet data 25 from the network 100.
  • the MAC 50 implements the layer 2 sublayer media access control (MAC) data communication protocol and interfaces the physical layer with the logical link layer.
  • the MAC sublayer provides addressing and channel access control mechanisms that make it possible for several terminals or network nodes to communicate within a multiple access network that incorporates a shared medium, e.g. an Ethernet network.
  • the packet classifier 52 detects protocol layers in packets received from the MAC 50 and their respective lengths, sizes, offsets and checksums and adds this to each packet as metadata.
  • the metadata can be included at the beginning or end of the packet data, i.e. at a fixed position relative to the packet data when stored in FPGA memory.
  • the metadata is accessible by the packet filter 54 in response to extended BPF syntax and usable in evaluating packet filter expressions. Inclusion of this metadata enables the modified BPF to perform the following operations that the original BPF cannot:
  • the Packet Classifier 52 also retrieves context data, if appropriate, from the Context Store 56.
  • the classifier 52 calculates the tuple for incoming packets and so can select the required context which can then be buffered into FIFO along with the associated packet. This may or may not be useful depending on the type of flows that are being inspected.
  • a FIFO 58 temporarily stores packets in situations where there is no Packet Processor 54 currently available. A typical example of when this would be needed is when a burst of short length packets 25 are received.
  • the packet processor module 54 contains the Regex 54a and BPF engines 54b that apply the encoded rules to each packet received. Packets 25 can be checked against multiple rules and one packet may match multiple rules.
  • the BPF engine 54b creates BPF virtual machines to implement its filtering capabilities.
  • the BPF virtual machines interpret pseudo-machine language programs according to BPF syntax, which define operations allowing it to fetch data from the packet, perform arithmetic operations on data from the packet, and compare the results against constants or against data in the packet or test bits in the results, accepting or rejecting the packet based on the results of those tests.
  • just-in-time compilation is used to convert virtual machine instructions into native code in order to further avoid overhead.
  • the BPF engine 54b in Figure 3 accesses byte values accessible in standard BPF but has been extended to access byte values in the application layer and metadata layer (metadata is added to the packet by the Packet Classifier).
  • the BPF engine 54b has also been extended to handle the exclusive-or (XOR) and context assignment operators.
  • Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, the regular expression is said to match the string.
  • the regex engine 54a can be thought of as a finite state machine. The decision equates to a branch between states. Each state machine is highly optimised for the job due to the small op code set.
  • regex engine 54a treats packets 25 in a packet sequence the same way an ordinary regex processor treats bytes in a byte sequence.
  • the regex engine 54a acts as a manager invoking new packet filter virtual machines 54b to evaluate the packet filter subexpressions in the regular expression each time a new packet is received.
  • the regex engine 54a proceeds through the subexpressions in the regular expression using the packet filter 54b to match each subexpression against the packets of data in the sequence. If all subexpressions are matched, the regular expression is found to match the sequence of packets 25. This means patterns can be identified across numerous packets 25 in a stream.
  • a stream of packets (also interchangeably called a "flow" herein) is a sequence of packets 25 transmitted between two particular endpoints in the network 100 and can be identified for instance by looking at the 5-tuple of the packet data, i.e. the source and destination IP addresses and ports and the network protocol fields.
  • Every flow has a unique context store 56 associated with it.
  • the flow is identified from the 5-tuple and the context associated with that flow is loaded. Thus the state of the machine is maintained between packets.
  • the regex engine 54a determines that the first subexpression in the rule has been matched.
  • the virtual machine is then primed to check whether Packet 2 matches the next subexpression in the rule.
  • Packet 2 i.e. the next packet in the stream
  • the context for this stream will be checked and if bytes 2 to 3 of Packet 2 are equal to 53 the packets will be matched against the rule and the User Module 60 will be alerted.
  • the User Module 60 then takes a predetermined action for matching packets, which may be to alert the user, store matching packets, filter, route or otherwise process matching packets differently from other packets.
  • the context store 56 allows the retrieval of data using 5-Tuples, the update of data following retrieval, the automatic addition of 5-Tuple filters and initial data and the removal of 5-Tuple filters.
  • the BPF provides access to bytes of data within packets flowing through a network. Packets 25 may be filtered by the BPF according to configurable rules that are applied to these bytes in order to identify packets that match.
  • the standard syntax for these rules is clearly defined.
  • Standard BPF syntax provides access to byte values within the data link, network and transport layers only.
  • the syntax for byte n of the network layer of a packet for example, would be ip[n]. Or, for a range of bytes, ip[n:m] where n is the byte offset and m is the number of bytes the range will cover. Packet filtering using weak selectors, as is described in this patent application, will require access to layers of packets not originally needed in BPF and so the syntax will need to be extended to include them.
  • n is the offset from the start of the application layer.
  • n is the offset from the start of the application layer and m is the number of bytes from this offset the range will cover.
  • the compiler 42 creates suitable opcodes for accessing the fields in the application layer by generally calculating the offset of the field from the start of the packet.
  • the application layer is encapsulated in various layers of the protocol stack.
  • the compiler 42 generates pseudo instructions to jump to the correct field in the application header by techniques such as finding the size of the header of the protocol of a particular type, or accessing a field known to hold the size of the header or data portion offset for protocol of a particular type.
  • the implementation of some filtering rules may require additional information about a packet e.g. the actual checksum values for comparison with those reported in the packet. This information will be included with the packet as metadata.
  • the syntax proposed for metadata access in the extended BPF is: meta[key]
  • n is the offset of the byte of data to be saved/retrieved from context.
  • n is the offset of the byte of data to be saved/retrieved from context and m is the number of bytes from this offset the range will cover.
  • When the packet processor 54 receives a packet (packet 1) with an invalid TCP checksum (by checking that the protocol is TCP [ip proto 6], and checking the declared checksum [tcp[16:2]] against the calculated checksum in the metadata [meta[transport-checksum]]), it will store the value of the difference
  • the regex engine 54a determines from the stored context 56 that the first subexpression is matched, and primes the BPF engine 54b to evaluate the second subexpression, which proceeds as before. If packet 2 is found to have an invalid checksum, the difference is again stored in the context store 56.
  • the regex engine 54a determines from the stored context that the first and second subexpressions are matched, and primes the BPF engine 54b to evaluate the third subexpression, which proceeds as before. If packet 3 is found to have an invalid checksum, the difference is again stored in the context store 56.
  • the three stored checksum differences are compared. If the checksum differences are the same across the three
  • a Regular Expression is a string of characters using standard syntax and operators that can describe a range of strings of characters.
  • regular expressions use meta-characters which have special meanings.
  • the strings 'grey' and 'gray' could both be described with the regular expression 'gr(a
  • the 'ip[2:2]' in these expressions refers to the two bytes in the IP header that contain the length of the IP packet.
  • the first '2' is the byte offset from the start of the IP layer and the second is the number of bytes from this offset the range will cover.
  • the subpattern contained within the '()' parentheses says to look for IP packets where the first packet has a length of 50 and the next packet, as indicated by the '+' metacharacter, in the same flow has a length of 53. If this subpattern is seen 5 times consecutively, as indicated by '{5}', the packets will match the pattern in this example.
  • Example 2
  • Detection patterns encoded using the packet filter engine 32 architecture and syntax proposed here can be implemented in a way that will use very few OP codes and thus require minimal processing power. Further, the efficiency of this syntax will remain consistent regardless of the throughput rate of the traffic being filtered.
  • a further advantage of this architecture and syntax is that new detection patterns could be applied or modified at runtime by a program using it. To implement new detection rules without this syntax would require new code to be written and compiled for each new detection rule.
  • An example is now described of a method of implementing a packet filter using weak selectors using the novel techniques described herein in relation to the flow diagram of Figure 4.
  • Rules are encoded in BPF/Regex Syntax,
  • step 402 the modified Regex and BPF compilers convert rules into opcodes.
  • step 403 the hostware module 46 passes down machine code instructions to FPGA 32 where they are stored in FPGA accessible memory.
  • step 404 Packet 1 is received by FPGA MAC 50.
  • step 405 packet classifier 52 FPGA module identifies protocol layers in packet 1 and adds this information to the packet as metadata.
  • step 406 if no packet processors 54 are currently available, the FIFO 58 temporarily stores packet 1 until a packet processor is made available.
  • step 407 the packet processor 54 applies the BPF/Regex rules to each packet received.
  • the virtual machine checks if the full rule is matched. If the full rule is matched, the User Module 408 is alerted. If not, this information is stored in Context Store 56 in step 409 and the system waits for the next packet in the flow to arrive (packet 2). Thus, the context store holds information indicating a partial match of rule against the flow. Alternatively, if there is no match the context store is reinitialised so that matching begins at the beginning of the rule when the next packet arrives.
  • Steps 404 to 409 are repeated for packet 2.
  • At step 407, the packet processor 54 applies the next set of rules to packet 2.
  • At step 408, the packets are matched and the User Module 60 is alerted.
  • The user module 60 takes the desired action in response to finding a match. For instance, the user can be alerted that the matching packets have been found in the network, or the packets concerned can be given different treatment, e.g. dropped, or routed differently from other packets.

Abstract

The application relates to methods and apparatus for detecting patterns in data packets in a network. In an aspect, a packet filter is provided comprising an interface (50) for receiving packets of data from a network and a packet processor (54). The packet processor matches a stored rule against the packets, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern. A regular expression engine (54a) is arranged to match the regular expression pattern against a target sequence of received packets. To match an element of the regular expression pattern with a packet in the sequence, a packet filter engine (54b) is used to evaluate the packet filter expression of that element against the bytes of that packet. The rule is determined to match the sequence of packets if all the elements have been matched according to the regular expression pattern. An actioning module (60) takes a predetermined action in response to the rule being determined to match the packets.

Description

Methods and Apparatus for Detecting Patterns in Data Packets in a Network
The present invention relates to a packet filter, a method of filtering packets, a compiler and a syntax, in particular arranged for detecting patterns in data packets in a network.
Various applications, such as detecting malevolent network traffic, network monitoring, packet filtering, etc. require the identification of certain packets or types of packets from within IP-based packet streams. Traditionally this has been achieved by detecting the presence of keywords or byte patterns, otherwise called 'signatures' or 'strong selectors', within the packets. These matching schemes can be implemented relatively efficiently using known hardware and software techniques. However, strong selectors like these are in many cases not sufficient for the purposes of the applications that use them, as a number of packets of potential interest cannot be detected through the use of strong selectors. For example, a packet that has been modified during transmission will have a checksum that is inconsistent with the rest of the packet. However, because the correct checksum will vary from packet to packet, it is impossible to use strong selectors to detect modified packets.
Accordingly for some applications it is desirable to use weak selectors, which examine more generic attributes of a packet or packets such as length and checksum and how these attributes change over time. These attributes can be used to detect packets of interest that might be missed by detection using strong selectors. However, implementing these techniques in a way that can cope with today's data rates is technically challenging. Small savings for each filtering operation, when multiplied by the number of packets being processed by a system, can quickly add up to a significant difference.
Furthermore, some indications of a flow of packets of interest only become apparent by looking at several packets in the flow over time. Currently the only way to monitor for such packets is to write and compile a custom program. This is likely to be complex, inefficient and incapable of being changed at runtime to implement new rules.
It is also particularly difficult to monitor encrypted packets using existing techniques. Clues may exist in the patterns seen in the flow of packets, but these are missed by packages like "Snort" which rely on hard signature matching.
Accordingly there is a need for improved systems where existing techniques are becoming increasingly inadequate. The present application aims to address the problems of monitoring, filtering or processing traffic at the high speed data rates encountered in today's networks using matching techniques based around weak selectors.
According to a first aspect of the present invention, there is provided a packet filter comprising:
an interface for receiving packets of data from a network;
a packet processor arranged to match a stored rule against the packets, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern,
the packet processor comprising a regular expression engine arranged to match the regular expression pattern against a target sequence of received packets,
wherein to match an element of the regular expression pattern with a packet in the sequence, the regular expression engine is arranged to invoke a packet filter engine to evaluate the packet filter expression of that element against the bytes of that packet, wherein the regular expression engine is arranged to determine that the rule matches the sequence of packets if all the elements have been matched according to the regular expression pattern;
an actioning module to take a predetermined action in response to the rule being determined to match the packets.
In this arrangement, a novel architecture is provided comprising a regular expression engine invoking packet filter engines for evaluating rules against packets that can be highly efficiently produced in hardware. The normal application of regular expressions for matching character patterns in a string of data, e.g. a search function, has been used to match patterns across a sequence of packets, where each packet in the sequence can be thought of as being equivalent to a character in the target string. Thus, instead of matching literal character elements in the regular expression against the target string, in the present arrangement, elements of the regular expression are byte matches against the packets in the sequence. These byte matches are implemented by a packet filter engine which evaluates the packet filter expressions nested within the regular expression pattern. Thus, the regular expression engine moves through the packets in the sequence, invoking the packet filter engine to evaluate the elements of the pattern at each stage, looking for a matching pattern. This allows the invention to implement complex matching rules in a highly efficient way which is simply not possible with existing techniques.
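The idea of treating each packet as a "character" can be illustrated with a minimal Python sketch. It is not the claimed hardware implementation: the names `match_from`, `Element` and the quantifier strings are hypothetical, each element is simply a per-packet predicate standing in for a compiled packet filter expression, and a small backtracking matcher stands in for the regular expression engine.

```python
from typing import Callable, Sequence, Tuple

Packet = bytes
Predicate = Callable[[Packet], bool]
# Hypothetical element type: a per-packet predicate plus a quantifier,
# one of "1" (exactly one), "?" (optional), "*" (zero or more), "+" (one or more).
Element = Tuple[Predicate, str]


def match_from(elements: Sequence[Element], packets: Sequence[Packet],
               e: int = 0, p: int = 0) -> bool:
    """Return True if elements[e:] match a prefix of packets[p:]."""
    if e == len(elements):
        return True  # every element of the pattern has been matched
    pred, quant = elements[e]
    # "here" plays the role of the packet filter engine's verdict on packet p.
    here = p < len(packets) and pred(packets[p])
    if quant == "1":
        return here and match_from(elements, packets, e + 1, p + 1)
    if quant == "?":
        return ((here and match_from(elements, packets, e + 1, p + 1))
                or match_from(elements, packets, e + 1, p))
    if quant == "*":
        return ((here and match_from(elements, packets, e, p + 1))
                or match_from(elements, packets, e + 1, p))
    if quant == "+":
        # One mandatory match, then either repeat this element or move on.
        return here and (match_from(elements, packets, e, p + 1)
                         or match_from(elements, packets, e + 1, p + 1))
    raise ValueError(f"unknown quantifier {quant!r}")


def is_small(pkt: Packet) -> bool:
    return len(pkt) < 64


def is_large(pkt: Packet) -> bool:
    return len(pkt) >= 1000


# One or more small packets immediately followed by a large packet.
RULE = [(is_small, "+"), (is_large, "1")]
```

For example, `match_from(RULE, [bytes(40), bytes(60), bytes(1200)])` evaluates the predicates packet by packet in the same way an ordinary regex engine would consume characters.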
In particular, the packet filter can detect patterns in a sequence of packets using weak selectors as selection criteria in a highly computationally efficient way. Prior art packet filters are limited to using strong selectors, which match on a single byte pattern only. Generic relationships with multiple possible variations between multiple byte values in packets and multiple packets in a stream cannot be captured using the syntax of strong selectors. However, many applications would benefit from the ability to match using weak selectors. The invention provides an efficient means of encoding and processing such weak selectors, and enables detection applications to cover a broader scope of detection possibilities, making them more effective.
In response to a match, the module can alert a user that a sequence of packets has matched a rule, or can filter, route or otherwise process the matching packets differently from non-matching packets. Detection patterns encoded and processed using the novel techniques proposed here can be implemented in a way that will use very few OP codes and thus require minimal processing power. Further, the efficiency of this syntax and processing architecture will remain consistent regardless of the throughput rate of the traffic being filtered. A further advantage of this architecture is that new detection patterns could be applied or modified at runtime by a user or a program using it. To implement new detection rules without this syntax would require new code to be written and compiled for each new detection rule.
The novel techniques can be used for network monitoring, intrusion detection, cyber defense, etc. Any suitable action can be taken in response to detecting a packet or packets of interest, such as processing matching packets differently, dropping, filtering or routing matching packets differently, alerting a user to a match, etc.
In embodiments, multiple rules can be stored and matched against the packets in this way.
The packet processor can be implemented in software or hardware or a combination. In a preferred embodiment, the packet processor is implemented using a Field Programmable Gate Array to implement the virtual machine or machines implementing the packet processor.
In an embodiment, the packet filter expressions and packet filter engine are based on the Berkeley Packet Filter (BPF), preferably with extensions made to BPF syntax to allow for the additional functionality described herein. In embodiments, the regular expressions are in the form of Perl Compatible Regular Expressions.
In an embodiment, the regular expression engine comprises a state machine wherein transition to the next state depends upon an evaluation of a packet filter expression by the packet filter engine. The regular expression engine is preferably arranged as a virtual machine. The packet filter engine preferably comprises a virtual machine processing bytecodes, e.g. interpreting or using just in time compilation, derived from compiling the packet filter expression syntax. In an embodiment, the target sequence of received packets against which the regular expression pattern is matched is a stream (also interchangeably referred to as a flow) between two nodes in the network. Preferably the stream is identified by the packets' 5-tuple, i.e. by source IP address and port, destination IP address and port, and protocol fields within the packet header.
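As an illustration of identifying a stream by its 5-tuple, the following Python sketch extracts the five fields from a raw IPv4 packet carrying TCP or UDP. The names `FlowKey` and `five_tuple` are hypothetical, and the sketch assumes the buffer starts at the IPv4 header (no datalink header) and ignores IPv6 and fragmentation.

```python
import socket
import struct
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical container for the 5-tuple that identifies a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


def five_tuple(ip_packet: bytes) -> FlowKey:
    """Extract the 5-tuple from a raw IPv4 packet carrying TCP or UDP."""
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("only IPv4 is handled in this sketch")
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = ip_packet[9]
    if protocol not in (6, 17):  # 6 = TCP, 17 = UDP
        raise ValueError("only TCP and UDP carry ports in this sketch")
    src_ip = socket.inet_ntoa(ip_packet[12:16])
    dst_ip = socket.inet_ntoa(ip_packet[16:20])
    # TCP and UDP headers both begin with 16-bit source and destination ports.
    src_port, dst_port = struct.unpack_from("!HH", ip_packet, header_len)
    return FlowKey(src_ip, src_port, dst_ip, dst_port, protocol)
```

Because `FlowKey` is a tuple, it is hashable and can be used directly as a dictionary key when storing per-flow state.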
In an embodiment, the packet filter comprises a context store in communication with the packet processor, wherein in response to instructions in the rule, the packet processor is arranged to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in the context store and to retrieve the result for use in evaluating a packet filter expression against a second packet in the sequence of packets. When identifying packets using weak selectors it may be necessary to retain information from packet to packet to verify whether a match has occurred. Because of the large number of packets being processed, and potentially the large number of rules being evaluated, it is not practical for the regex engine to hold state for matching the pattern. The context store is used instead to hold state for evaluating the expression. In other words, when the packet filter matches a subexpression in the rule, this information is stored in the context store for this particular flow. When the next packet in the flow is received, the regex engine retrieves and examines this information and determines the next subexpression to match based on the state retrieved from the context store. This means that the regex engine is not tied to a particular flow and therefore is not idle in between packets in that flow arriving. Instead, by saving the state of the expression matching in a context store, the regex engine hardware can be used to evaluate packets in different flows in different states of evaluation. This makes the system more efficient and flexible.
The context store can also be used to hold values resulting from evaluation of a sub-expression that are to be used in later subexpressions to allow patterns to be matched across packets in a flow. For example, if a packet stream is to be filtered when app[3] of packet 1 is equal to app[8] of packet 2, the value of app[3] in packet 1 will need to be stored for comparison with app[8] of packet 2. For this reason the BPF syntax is in preferred embodiments extended to allow such information to be saved to and retrieved from a context store. Preferably the results are stored in the context store according to the stream, e.g. two communicating nodes in the network. This can be achieved by storing the information about the result of a match according to the packets' 5-tuple, i.e. source IP address and port and destination IP address and port and protocol. This allows bytes in one packet to be compared with bytes in another packet as part of the matching process. This is not possible with existing packet filter technology. If more than one rule is being evaluated, separate regex engines can be implemented for each rule, each engine having its own context store to be used to store state when evaluating the rules.
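The app[3] / app[8] example can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ContextStore` class, its `save`/`load`/`clear` methods and the slot name `"app3"` are hypothetical, the flow key is assumed to have been computed elsewhere (for instance from the 5-tuple), and `app` is assumed to be the application-layer payload already extracted from the packet.

```python
from typing import Dict, Hashable, Optional


class ContextStore:
    """Hypothetical per-flow value store, keyed by a flow identifier."""

    def __init__(self) -> None:
        self._flows: Dict[Hashable, Dict[str, int]] = {}

    def save(self, flow: Hashable, name: str, value: int) -> None:
        self._flows.setdefault(flow, {})[name] = value

    def load(self, flow: Hashable, name: str) -> Optional[int]:
        return self._flows.get(flow, {}).get(name)

    def clear(self, flow: Hashable) -> None:
        self._flows.pop(flow, None)


def check_packet(store: ContextStore, flow: Hashable, app: bytes) -> bool:
    """Return True when app[8] of this packet equals app[3] of the
    previous packet seen on the same flow."""
    saved = store.load(flow, "app3")
    matched = saved is not None and len(app) > 8 and app[8] == saved
    # Remember this packet's app[3] so the next packet can be compared with it.
    if len(app) > 3:
        store.save(flow, "app3", app[3])
    else:
        store.clear(flow)
    return matched
```

Keeping the saved byte in the store rather than in the matcher means the same matcher code can be reused for any flow, which is the property the context store is intended to provide.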
In an embodiment, the packet filter comprises a packet classifier module arranged to calculate metadata for the packets which is appended to the packets, wherein in response to instructions in the packet filter expression, the packet filter engine is arranged to evaluate the metadata against the packet filter expression.
Preferably, the packet is written into working memory in the packet filter and the metadata is appended to the packet in the memory.
In an embodiment, the metadata comprises one or more of an offset of one or more protocol headers in the packet, a length of one or more protocol layers in the packet, a size of one or more protocol layers in the packet, or a checksum of one or more protocol layers in the packet. Thus, information about the protocol layers in the packet, e.g. datalink, network, transport and application layers, is pre-calculated and made available to the packet filter engine when evaluating the packet filter expression. This calculation can be done efficiently by hardware. Thus, by relieving this processing burden from the packet processor, the processing can be made faster and also this allows more functionality for the packet filter.
The metadata can be appended to the packet in such a way that the metadata elements are at known positions relative to the packet. Thus, the compiler, when compiling the packet filter expression that includes a reference to a metadata element in the syntax of the rule being compiled, can include suitable opcodes for loading that element from memory at the appropriate offset into a register of the virtual machine for further processing and matching against the packet data. Thus, for instance, a packet filter expression can be written that compares the value in the checksum field of the network layer of the packet with the actual calculated checksum value appended to the packet as metadata. The compiler knows where these values are in memory in terms of an offset relative to the packet and so can load the values to perform a comparison. This allows functionality that is not possible with existing packet filters.
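The checksum comparison described above can be sketched in Python. The sketch assumes an IPv4 packet with no datalink header and a hypothetical layout in which the classifier appends the calculated network-layer checksum as the last two bytes of the buffer; the function names are illustrative only. The checksum itself is the standard one's-complement sum over the IPv4 header with the checksum field taken as zero.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement checksum of an IPv4 header, checksum field zeroed."""
    data = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def classify(packet: bytes) -> bytes:
    """Append the calculated checksum as 2 bytes of metadata after the packet.
    (Assumed layout for this sketch: metadata at a fixed offset from the end.)"""
    ihl = (packet[0] & 0x0F) * 4
    return packet + struct.pack("!H", ipv4_header_checksum(packet[:ihl]))


def checksum_mismatch(buffer: bytes) -> bool:
    """Compare the checksum reported in the header (bytes 10-11) with the
    calculated checksum held in the appended metadata."""
    (reported,) = struct.unpack_from("!H", buffer, 10)
    (computed,) = struct.unpack_from("!H", buffer, len(buffer) - 2)
    return reported != computed
```

Since both values sit at offsets known in advance, the comparison needs only two loads and one compare, which is the kind of short opcode sequence a compiler could emit for a rule that compares a header field with a metadata element.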
Preferably the metadata comprises information about the application layer allowing the packet filter expression to be evaluated against bytes in the application layer of the packets. This is something that is not possible with existing packet filter technology, where the byte values that can be matched are limited to datalink and internet layers.
In an embodiment, in response to instructions in the packet filter expression, the packet filter engine is arranged to evaluate application layer data in the packet against the packet filter expression.
In an embodiment, the packet filter comprises a compiler arranged to compile the rules into byte codes for execution in a virtual machine provided by the packet processor to implement the regular expression engine and/or the packet filter engine.
In an embodiment the compiler recognises instructions to store the results of evaluating a packet filter expression to a context store and to retrieve the results from the context store to a register or memory for use in evaluating a further packet filter expression.
In an embodiment, the packet filter comprises a console by which a new user rule can be received and compiled by the compiler, the compiled rule being supplied to the virtual machine in the packet processor whilst the packet processor is running so as to dynamically update the rules being matched against the packets at runtime. Alternatively, new rules can be implemented at run time under control of a software program. Thus, the rules being matched against packets can be changed "on the fly" by a user of the system or another program. This is a considerable technical advantage over comparative existing techniques for applying weak selector rules to packet data which necessitate writing custom programs, e.g. in C++, which are compiled into an executable file which then has to be executed before any matching can occur. This gives no possibility of changing the rules on the fly or adding new rules.
According to a second aspect of the present invention, there is provided a method of filtering packets of data from a network according to a rule, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern, the method comprising:
matching the regular expression pattern against a target sequence of received packets;
evaluating the packet filter expressions against the bytes of a packet in the sequence in order to match elements of the regular expression pattern with the packets in the sequence;
determining that the rule matches the sequence of packets if all the elements have been matched according to the regular expression pattern;
taking a predetermined action in response to the rule being determined to match the packets.
In an embodiment, the method comprises in response to instructions in the rule, storing the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
In an embodiment, the method comprises calculating metadata for the packets and appending the metadata to the packets; and in response to instructions in the packet filter expression, evaluating the metadata against the packet filter expression.
In an embodiment, the method comprises, in response to instructions in the packet filter expression, evaluating application layer data in the packet.
In an embodiment, the method comprises compiling the rules into byte codes for execution in a virtual machine. In an embodiment, the method comprises dynamically updating the rules being matched against the packets in a virtual machine at runtime.
According to a third aspect of the present invention, there is provided a compiler arranged to compile a rule whose syntax comprises packet filter expressions defining matches on packet bytes nested in a regular expression, the compiler outputting bytecodes suitable for execution in a virtual machine for evaluating the rules against packets of data in a sequence.
According to a fourth aspect of the present invention, there is provided a compiler arranged to recognise syntax in the rule in which the packet filter expression addresses metadata associated with each packet, and to output bytecodes to cause the virtual machine to load the metadata to a register for evaluation.
In an embodiment, the compiler is arranged to recognise syntax in the rule in which the packet filter expression addresses a context store, and to output bytecodes to cause the virtual machine to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
According to a further aspect of the present invention, there is provided a syntax for encoding rules for packet detection comprising packet filter expressions defining matches on packet bytes nested in a regular expression. The novel syntax is preferably based on a combination of the Berkeley Packet Filter (BPF) and Regular Expressions (Regex) with extensions to implement one or more of the novel techniques described herein. The syntax encodes rules for packet detection in a way that is flexible enough to cover a broad range of weak selector packet detection rules whilst also being simple enough to be efficiently implemented in hardware or software.
According to a further aspect of the present invention, there is provided apparatus arranged to match packets of data received on a network by processing a rule whose syntax comprises packet filter expressions defining matches on packet bytes nested in a regular expression.
It will be appreciated that any features expressed herein as being provided "in one example" or "in an embodiment" or as being "preferable" may be provided in combination with any one or more other such features together with any one or more of the aspects of the present invention.
Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:
Figure 1 shows schematically an example of a system for detecting packets in a network according to an embodiment of the present invention;
Figure 2 shows an example of the architecture of a Field Programmable Gate Array based packet filter engine which may form part of the system of Figure 1;
Figure 3 shows an example of steps required in software to configure the packet filter engine of Figure 2; and
Figure 4 shows a flow diagram of a series of operations performed by the packet filter engine of Figure 2.
Figure 1 shows schematically an example of a system 10 for detecting packets in a network 100. The system 10 includes a host computer 12 having a processor 14, memory 16, storage 18 and user input/output peripherals 20,22. The processor is operable to execute program instructions stored in the storage 18.
The system 10 comprises an interface 26 for connecting to a communication network 100, which may comprise one or more network ports or network taps for making connections to interfaces of the network 100 and receiving packets 25 of data transmitted over the network interface.
The host computer 12 also comprises a daughter card 30 which connects to a suitable interface of the host computer 12 and includes a Field Programmable Gate Array 32, which implements a packet filter engine (as described below in relation to Figure 3) for detecting packets received over the network interface 26. The processor 14 is arranged to configure the FPGA 32 with rules for detecting packets, as will be described in more detail below.
Figure 2 shows in more detail the way in which the FPGA packet filter engine 50 is configured with rules for detecting packets. Weak selector detection rules 40 are provided to the system, e.g. entered by a user, encoded in a novel combined BPF and Regex syntax, as described in more detail below. The rules 40 are passed to bespoke BPF and regex compilers 42,43 that convert the rules into configuration instructions (referred to as op codes herein) 44 for the target hardware.
The BPF compiler 42 used in this invention will be based on the standard BPF compiler but is extended to implement the additional functionality described herein. In particular, the compiler will need to access bytes in the application layer and metadata (metadata is added by the Packet Classifier - see below), as well as dealing with accessing the context store. These extensions have been described elsewhere in more detail.
The Regex compiler 43 will be based on a standard Regex compiler but will initiate evaluation of BPF expressions in place of byte expressions. This allows packets to be treated like bytes are in standard Regex processors and means patterns can be detected across multiple packets in a stream.
As a result of the compilation process, opcodes 44 (also known as bytecodes or pseudo-machine code; the terms are used interchangeably herein) are generated for processing by a virtual machine implemented in the FPGA packet filter engine 50.
The hostware module 46 of the architecture interacts directly with the target hardware, in this example an FPGA based card 30,32. The pseudo-machine code instructions generated by the compilers 42,43 are passed down by the hostware 46 to the FPGA card 30 and stored as instructions in FPGA accessible memory. These instructions are executed by the Packet Processor implemented within the FPGA 32 for each packet 25 that is received.
FPGA Architecture:
The FPGA 32 packet filter engine has a MAC 50 (media access controller) which is in communication with the network tap 26 and receives the packet data 25 from the network 100. The MAC 50 implements the layer 2 sublayer media access control (MAC) data communication protocol and interfaces the physical layer with the logical link layer. The MAC sublayer provides addressing and channel access control mechanisms that make it possible for several terminals or network nodes to communicate within a multiple access network that incorporates a shared medium, e.g. an Ethernet network.
The packet classifier 52 detects protocol layers in packets received from the MAC 50 and their respective lengths, sizes, offsets and checksums and adds this to each packet as metadata. For instance, the metadata can be included at the beginning or end of the packet data, i.e. at a fixed position relative to the packet data when stored in FPGA memory.
The metadata is accessible by the packet filter 54 in response to extended BPF syntax and usable in evaluating packet filter expressions. Inclusion of this metadata enables the modified BPF to perform the following operations that the original BPF cannot:
- Access byte values in the application layer
- Access calculated checksum for comparison with the checksum reported in the packet
The Packet Classifier 52 also retrieves context data, if appropriate, from the Context Store 56. The classifier 52 calculates the tuple for incoming packets and so can select the required context which can then be buffered into FIFO along with the associated packet. This may or may not be useful depending on the type of flows that are being inspected.
Once the metadata is appended to the packets 25, the packets are passed to a packet processor module 54 for processing. A FIFO 58 temporarily stores packets in situations where there is no Packet Processor 54 currently available. A typical example of when this would be needed is when a burst of short length packets 25 are received. The packet processor module 54 contains the Regex 54a and BPF engines 54b that apply the encoded rules to each packet received. Packets 25 can be checked against multiple rules and one packet may match multiple rules.
The BPF engine 54b creates BPF virtual machines to implement its filtering capabilities. The BPF virtual machines interpret pseudo-machine language programs according to BPF syntax, which define operations allowing it to fetch data from the packet, perform arithmetic operations on data from the packet, and compare the results against constants or against data in the packet or test bits in the results, accepting or rejecting the packet based on the results of those tests. On some platforms, just-in-time compilation is used to convert virtual machine instructions into native code in order to further avoid overhead.
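The fetch, arithmetic, compare and accept/reject cycle of such a virtual machine can be illustrated with a minimal Python interpreter. The instruction names and the `(opcode, k, jt, jf)` tuple format below are a simplification loosely modelled on classic BPF and are assumptions made for this sketch; they are not the real BPF encoding nor the opcode set of the described FPGA engine.

```python
import struct
from typing import List, Tuple

# Hypothetical instruction format: (opcode, k, jump-if-true, jump-if-false).
Instr = Tuple[str, int, int, int]


def run_filter(program: List[Instr], packet: bytes) -> int:
    """Interpret program against packet; a non-zero result means accept."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, k, jt, jf = program[pc]
        pc += 1
        if op == "LDB":      # load one byte at absolute offset k
            if k >= len(packet):
                return 0     # out-of-range loads reject the packet
            acc = packet[k]
        elif op == "LDH":    # load a big-endian 16-bit value at offset k
            if k + 2 > len(packet):
                return 0
            (acc,) = struct.unpack_from("!H", packet, k)
        elif op == "AND":    # bitwise AND of the accumulator with constant k
            acc &= k
        elif op == "JEQ":    # relative jump depending on acc == k
            pc += jt if acc == k else jf
        elif op == "RET":    # terminate with verdict k
            return k
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return 0


# Accept IPv4 packets whose total-length field (bytes 2-3) equals 50.
PROGRAM: List[Instr] = [
    ("LDB", 0, 0, 0),
    ("AND", 0xF0, 0, 0),
    ("JEQ", 0x40, 0, 3),   # not IPv4: skip ahead to the final RET 0
    ("LDH", 2, 0, 0),
    ("JEQ", 50, 0, 1),     # length differs: skip RET 1
    ("RET", 1, 0, 0),
    ("RET", 0, 0, 0),
]
```

The small instruction set is deliberate: with only loads, a mask, a conditional jump and a return, each instruction maps onto very little logic, which is in keeping with the document's emphasis on a small opcode set.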
The BPF engine 54b in Figure 3 accesses byte values accessible in standard BPF but has been extended to access byte values in the application layer and metadata layer (metadata is added to the packet by the Packet Classifier). The BPF engine 54b has also been extended to handle the exclusive-or (XOR) and context assignment operators.
Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, the regular expression is said to match the string.
The regex engine 54a can be thought of as a finite state machine. The decision equates to a branch between states. Each state machine is highly optimised for the job due to the small op code set.
Normally regex matches byte patterns in data (e.g. characters in a string). In the present system, the regex engine 54a treats packets 25 in a packet sequence the same way an ordinary regex processor treats bytes in a byte sequence. The regex engine 54a acts as a manager invoking new packet filter virtual machines 54b to evaluate the packet filter subexpressions in the regular expression each time a new packet is received. The regex engine 54a proceeds through the subexpressions in the regular expression using the packet filter 54b to match each subexpression against the packets of data in the sequence. If all subexpressions are matched, the regular expression is found to match the sequence of packets 25. This means patterns can be identified across numerous packets 25 in a stream. A stream of packets (also interchangeably called a "flow" herein) is a sequence of packets 25 transmitted between two particular endpoints in the network 100 and can be identified for instance by looking at the 5-tuple of the packet data, i.e. the source and destination IP addresses and ports and the network protocol fields.
For example:
(<"ip[2:2] = 50">+<"ip[2:2] = 53">)
requires matching BPF expressions across two packets in a sequence, looking for a packet having a length of 50 bytes immediately followed by a packet having a length of 53 bytes in the same flow. When a packet, Packet 1, is received the first BPF sub-expression is checked. Thus, the Packet Processor will check the value of bytes 2 to 3 in the IP header. If these byte values are equal to 50 this information will be stored against the packet's 5-Tuple in the Context Store (see below). The result of evaluating the first subexpression is stored because it is difficult to predict when the next packet in a flow will arrive and the large number of rules and packets processed means that it would be difficult in practice for the regex engine to hold the states of all matched subexpressions across all packets. Therefore information indicating that the first subexpression has been matched is stored in the Context Store according to the packet's 5-Tuple, so it can be retrieved when the next packet in the flow arrives.
This is a departure from an ordinary regex engine. Every flow has a unique context store 56 associated with it. When a packet 25 is received the flow is identified from the 5-tuple and the context associated with that flow is loaded. Thus the state of the machine is maintained between packets. Thus, in this example, by reading the context when the next packet in the flow (Packet 2) is received, the regex engine 54a determines that the first subexpression in the rule has been matched.
The virtual machine is then primed to check whether Packet 2 matches the next subexpression in the rule. When Packet 2 (i.e. the next packet in the stream) is received the context for this stream will be checked and if bytes 2 to 3 of Packet 2 are equal to 53 the packets will be matched against the rule and the User Module 60 will be alerted. The User Module 60 then takes a predetermined action for matching packets, which may be to alert the user, store matching packets, filter, route or otherwise process matching packets differently from other packets.
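The two-packet example can be traced with a short Python sketch of the per-packet processing. It is a simplified model under stated assumptions: the rule is reduced to an ordered list of expected ip[2:2] values, the context is a plain dictionary mapping a flow key to the index of the next sub-expression, the flow key is assumed to be supplied by the classifier, and the names `process_packet` and `on_match` are hypothetical. On a non-match the context is reinitialised, following the behaviour described for the flow diagram.

```python
import struct
from typing import Callable, Dict, Hashable, List

# ip[2:2] = 50 immediately followed by ip[2:2] = 53 in the same flow.
RULE_LENGTHS: List[int] = [50, 53]


def ip_total_length(ip_packet: bytes) -> int:
    """Value of ip[2:2]: the 16-bit total-length field of the IPv4 header."""
    return struct.unpack_from("!H", ip_packet, 2)[0]


def process_packet(context: Dict[Hashable, int], flow: Hashable,
                   ip_packet: bytes,
                   on_match: Callable[[Hashable], None]) -> None:
    """Advance the rule for one flow by one packet."""
    state = context.get(flow, 0)  # index of the next sub-expression to match
    if ip_total_length(ip_packet) == RULE_LENGTHS[state]:
        state += 1
        if state == len(RULE_LENGTHS):
            on_match(flow)            # full rule matched: alert the user module
            context.pop(flow, None)
        else:
            context[flow] = state     # partial match: remember progress
    else:
        context.pop(flow, None)       # no match: start again on the next packet
```

Because all progress lives in `context`, packets from many flows can be interleaved through the same function without the matcher itself holding any per-flow state.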
The context store 56 allows the retrieval of data using 5-Tuples, the update of data following retrieval, the automatic addition of 5-Tuple filters and initial data and the removal of 5-Tuple filters.
In order to take advantage of the new functionality provided by the system in applying new rules against packet data, a new syntax is proposed for efficiently encoding the rules which will be compiled into opcodes for the virtual machines in the packet processor.
BPF Syntax
The BPF provides access to bytes of data within packets flowing through a network. Packets 25 may be filtered by the BPF according to configurable rules that are applied to these bytes in order to identify packets that match. The standard syntax for these rules is clearly defined. Standard BPF syntax provides access to byte values within the data link, network and transport layers only. The syntax for byte n of the network layer of a packet, for example, would be ip[n]. Or, for a range of bytes, ip[n:m] where n is the byte offset and m is the number of bytes the range will cover. Packet filtering using weak selectors, as is described in this patent application, will require access to layers of packets not originally needed in BPF and so the syntax will need to be extended to include them.
The syntax proposed for application layer access is:
app[n]
where n is the offset from the start of the application layer.
For a range of bytes within the application layer the proposed syntax is:
app[n:m]
where n is the offset from the start of the application layer and m is the number of bytes from this offset the range will cover.
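As a sketch of the offset-and-length semantics just described, the following Python function reads m bytes starting at offset n and returns them as a single unsigned value. The function name read_range and the sample payload are hypothetical; interpreting the bytes in network (big-endian) order follows standard BPF behaviour for ip[n:m] and is assumed here to carry over to app[n:m].

```python
def read_range(layer: bytes, n: int, m: int = 1) -> int:
    """Return the m bytes starting at offset n as a big-endian unsigned integer."""
    if n < 0 or m < 1 or n + m > len(layer):
        raise IndexError("range falls outside the layer")
    # Note the second number is a byte count, so app[3:2] covers layer[3:5].
    return int.from_bytes(layer[n:n + m], "big")


app = bytes([0x16, 0x03, 0x01, 0x00, 0x2A])  # made-up application payload
first_byte = read_range(app, 0)      # app[0]
two_bytes = read_range(app, 3, 2)    # app[3:2]
```

The point to note is that app[3:2] is not a Python-style slice: the second number is a count, not an end index.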
The compiler 42 creates suitable opcodes for accessing the fields in the application layer by generally calculating the offset of the field from the start of the packet. As will be appreciated, the application layer is encapsulated in various layers of the protocol stack. The compiler 42 generates pseudo instructions to jump to the correct field in the application header by techniques such as finding the size of the header of the protocol of a particular type, or accessing a field known to hold the size of the header or data portion offset for a protocol of a particular type.
The implementation of some filtering rules may require additional information about a packet, e.g. the actual checksum values for comparison with those reported in the packet. This information will be included with the packet as metadata. The syntax proposed for metadata access in the extended BPF is:
meta[key]
Where key can be:
meta[datalink-offset]
meta[datalink-length]
meta[network-offset]
meta[network-length]
meta[network-size]
meta[network-checksum]
meta[transport-offset]
meta[transport-length]
meta[transport-size]
meta[transport-checksum]
meta[application-offset]
meta[application-length]
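How some of these metadata values might be derived can be sketched in Python for one simple case: an untagged Ethernet II frame carrying IPv4 and TCP. The function name classify is hypothetical, and the reading of each "-length" key as a header length (with application-length as the payload length) is an assumption, since the keys are not defined further here. The header arithmetic itself (IPv4 IHL and TCP data offset, both counted in 32-bit words) is standard.

```python
def classify(frame: bytes) -> dict:
    """Locate the layers of an untagged Ethernet II / IPv4 / TCP frame."""
    meta = {"datalink-offset": 0, "datalink-length": 14}
    net = 14                                   # IPv4 header follows Ethernet II
    ihl = (frame[net] & 0x0F) * 4              # IPv4 header length in bytes
    total = int.from_bytes(frame[net + 2:net + 4], "big")  # IPv4 total length
    tra = net + ihl
    data_off = (frame[tra + 12] >> 4) * 4      # TCP header length in bytes
    meta["network-offset"] = net
    meta["network-length"] = ihl
    meta["transport-offset"] = tra
    meta["transport-length"] = data_off
    meta["application-offset"] = tra + data_off
    meta["application-length"] = net + total - (tra + data_off)
    return meta


# A made-up frame: 14-byte Ethernet, 20-byte IPv4, 20-byte TCP, 4-byte payload.
eth = bytes(12) + b"\x08\x00"
ip = bytes([0x45, 0, 0, 44]) + bytes(5) + bytes([6]) + bytes(10)
tcp = bytes(12) + bytes([0x50]) + bytes(7)
frame = eth + ip + tcp + b"abcd"
meta = classify(frame)
```

With application-offset available as metadata, app[n] reduces to reading the byte at application-offset + n from the start of the packet.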
When identifying packets using weak selectors it may be necessary to retain information from packet to packet to verify whether a match has occurred. For example, if a packet stream is to be filtered when app[3] of packet 1 is equal to app[8] of packet 2, the value of app[3] in packet 1 will need to be stored for comparison with app[8] of packet 2. For this reason the BPF syntax will need to be extended to allow such information to be saved to and retrieved from a context store 56.
The syntax for this is:
context[n]
where n is the offset of the byte of data to be saved/retrieved from context. When a number of bytes need to be saved/retrieved from the context store the proposed syntax for this is:
context[n,m]
where n is the offset of the byte of data to be saved/retrieved from context and m is the number of bytes from this offset the range will cover.
When values are assigned to context the proposed notation for this is:
:=
which will represent the assignment operator.
An example of when context[n,m] would be used is when a rule is filtering for bad checksums that are incorrect by the same value across multiple packets 25. Say, for example, the filter is for a sequence of three packets where all three packets have a TCP checksum that is invalid by the same amount.
An example of an expression in combined BPF/Regex syntax for this rule would be:
<"ip proto 6 and (meta[transport-checksum] != tcp[16:2])
and ((meta[transport-checksum] - tcp[16:2]) := context[0:2])">+
<"ip proto 6 and (meta[transport-checksum] != tcp[16:2])
and ((meta[transport-checksum] - tcp[16:2]) := context[1:2])">+
<"ip proto 6 and (meta[transport-checksum] != tcp[16:2])
and ((meta[transport-checksum] - tcp[16:2]) := context[2:2])
and context[0:2] = context[1:2] and context[1:2] = context[2:2]">
When the packet processor 54 receives a packet (packet 1) with an invalid TCP checksum (by checking that the protocol is TCP [ip proto 6], and checking the declared checksum [tcp[16:2]] against the calculated checksum in the metadata [meta[transport-checksum]]), it will store the value of the difference [(meta[transport-checksum] - tcp[16:2])] between the calculated (correct) checksum and the one reported in the packet in the context store 56. When the next packet (packet 2) in this flow is received, the regex engine 54a determines from the stored context 56 that the first subexpression is matched, and primes the BPF engine 54b to evaluate the second subexpression, which proceeds as before. If packet 2 is found to have an invalid checksum, the difference is again stored in the context store 56. When the next packet (packet 3) in this flow is received, the regex engine 54a determines from the stored context that the first and second subexpressions are matched, and primes the BPF engine 54b to evaluate the third subexpression, which proceeds as before. If packet 3 is found to have an invalid checksum, the difference is again stored in the context store 56. As an additional part of the third subexpression, the three stored checksum differences are compared. If the checksum differences are the same across the three evaluated packets, then the packets will match the filter.
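The three-packet checksum rule walked through above can be approximated by the following Python sketch. It is not the opcode sequence produced by the compiler 42: the function name same_error_rule is hypothetical, a plain list stands in for the context slots of one flow, and reducing the difference modulo 2^16 is an assumption about how a two-byte context value would wrap.

```python
from typing import Iterable, List, Tuple


def same_error_rule(packets: Iterable[Tuple[int, int]]) -> bool:
    """packets: (calculated, reported) TCP checksum pairs from a single flow.

    Returns True once three consecutive packets carry checksums that are
    wrong by the same amount.
    """
    context: List[int] = []                    # stand-in for the context slots
    for calculated, reported in packets:
        if calculated == reported:             # meta[...] != tcp[16:2] fails
            context.clear()                    # restart from the first subexpression
            continue
        context.append((calculated - reported) & 0xFFFF)   # the := step
        if len(context) == 3:
            if context[0] == context[1] == context[2]:
                return True
            context.clear()                    # differences disagree: reinitialise
    return False


bad_by_four = [(0x1234, 0x1230), (0xABCD, 0xABC9), (0x0010, 0x000C)]
matched = same_error_rule(bad_by_four)
```

Clearing the context on any failure mirrors the reinitialisation behaviour described for the context store; a sliding-window variant would be a different design choice.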
A further proposed extension to the syntax of the BPF is the inclusion of the exclusive or (XOR) operation. The proposed notation for this is:
^
Regex syntax
A Regular Expression is a string of characters using standard syntax and operators that can describe a range of strings of characters. As well as literal characters regular expressions use meta-characters which have special meanings. The strings 'grey' and 'gray', for example, could both be described with the regular expression 'gr(a|e)y' where the meta-character '|' means 'or' and the parentheses indicate that what is contained within them constitutes a subpattern.
There is no universal standard for regex so the meanings of meta-characters may differ slightly from application to application. To avoid ambiguity the precise syntax for regex used here will be that of Perl Compatible Regular Expressions (PCRE).
The main meta-characters used for this invention and already defined in PCRE are:
()
To indicate a subpattern.
+
To indicate one or more of the preceding pattern.
|
To indicate 'or'.
{n}
To indicate n instances of a pattern.
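The behaviour of these four meta-characters on ordinary character strings can be checked with a short Python sketch. Python's re module is used here purely for convenience; it is not PCRE, but these particular meta-characters behave the same way in both. The pattern and variable names are illustrative only.

```python
import re

# Subpattern with alternation: matches both spellings.
colour = re.compile(r"gr(a|e)y")

# One or more of the preceding pattern.
one_or_more = re.compile(r"(ab)+")

# Exactly n instances of a pattern.
exactly_three = re.compile(r"(ab){3}")

grey_ok = colour.fullmatch("grey") is not None
gray_ok = colour.fullmatch("gray") is not None
groy_ok = colour.fullmatch("groy") is not None
```

In the combined syntax described in this application the same operators are applied to packets in a flow rather than to characters in a string.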
Combining BPF and Regex to Encode Detection Patterns
The combination of the modified BPF syntax described above and regex syntax can be used to encode complex and sophisticated detection patterns (weak selectors) in a single syntax that can be efficiently implemented in hardware or software.
To illustrate how this syntax will be used in practice some simple examples are included below:
Example 1:
(<"ip[2:2] = 50">+<"ip[2:2] = 53">){5}
The 'ip[2:2]' in these expressions refers to the two bytes in the IP header that contain the length of the IP packet. The first '2' is the byte offset from the start of the IP layer and the second is the number of bytes from this offset the range will cover.
The subpattern contained within the '()' parentheses says to look for IP packets where the first packet has a length of 50 and the next packet, as indicated by the '+' metacharacter, in the same flow has a length of 53. If this subpattern is seen 5 times consecutively, as indicated by '{5}', the packets will match the pattern in this example.
Example 2:
((<"app[4] ^ app[10] = app[0]">)|(<"app[3] ^ app[11] = app[1]">)){3}
This example looks for packets where the result of an XOR operation, as indicated by the '^' symbol, on byte 4 of the application layer with byte 10 of the application layer is equal to the value of byte 0 of the application layer or (indicated by the '|' symbol) if the result of an XOR operation on byte 3 of the application layer with byte 11 of the application layer is equal to the value of byte 1 of the application layer. If 3 consecutive packets, as indicated by '{3}', in the same flow match either of these descriptions the packets will match the pattern.
The patterns described in these two examples are examples of weak selectors. In the second one, for instance, many different combinations of values for app[4], app[10] and app[0] could result in a match for this expression. Generic relationships with multiple possible variations like these cannot be captured using the syntax of strong selectors, which match on a single byte pattern only, but many applications would benefit from the ability to do so. The novel syntax proposed in this patent application, by providing an efficient means of encoding weak selectors, enables detection applications to cover a broader scope of detection possibilities, making them more effective.
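To make the weak-selector idea concrete, the XOR test of Example 2 can be written as a Python sketch. The function names and the sample payloads are hypothetical, and treating a payload shorter than 12 bytes as a non-match is an assumption, since out-of-range accesses are not discussed here.

```python
from typing import Iterable


def example2_predicate(app: bytes) -> bool:
    """True if either XOR relationship of Example 2 holds for this payload."""
    if len(app) < 12:                          # app[11] must exist (assumption)
        return False
    return (app[4] ^ app[10]) == app[0] or (app[3] ^ app[11]) == app[1]


def matches_example2(flow_payloads: Iterable[bytes], needed: int = 3) -> bool:
    """True once `needed` consecutive payloads in one flow satisfy the predicate."""
    run = 0
    for app in flow_payloads:
        run = run + 1 if example2_predicate(app) else 0
        if run == needed:
            return True
    return False


first = bytearray(12)
first[0], first[4], first[10] = 0xFF, 0x0F, 0xF0   # 0x0F ^ 0xF0 == 0xFF
second = bytearray(12)
second[0], second[1], second[3], second[11] = 9, 2, 1, 3   # 1 ^ 3 == 2
miss = bytearray(12)
miss[0], miss[1] = 1, 1                            # neither relationship holds
```

The two matching payloads share no fixed byte values, which is exactly what a strong selector matching on a single byte pattern could not express.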
Detection patterns encoded using the packet filter engine 32 architecture and syntax proposed here can be implemented in a way that will use very few opcodes and thus require minimal processing power. Further, the efficiency of this syntax will remain consistent regardless of the throughput rate of the traffic being filtered.
A further advantage of this architecture and syntax is that new detection patterns could be applied or modified at runtime by a program using it. To implement new detection rules without this syntax would require new code to be written and compiled for each new detection rule.
An example is now described of a method of implementing a packet filter using weak selectors using the novel techniques described herein in relation to the flow diagram of Figure 4.
In step 401, rules are encoded in BPF/Regex syntax,
e.g. (<"ip[2:2] = 50">+<"ip[2:2] = 53">)
In step 402 the modified Regex and BPF compilers convert rules into opcodes. In step 403 the hostware module 46 passes down machine code instructions to FPGA 32 where they are stored in FPGA accessible memory.
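As a rough illustration of what a first compiler pass might do before opcode generation, the Python sketch below separates a rule into its regex skeleton and its nested packet filter expressions. This is an assumption about one possible front end, not the compiler described in this application: the function name split_rule, the single-letter placeholder symbols and the limit of 26 elements are all hypothetical.

```python
import re
from typing import List, Tuple

# A packet filter expression is delimited by <" and "> in the combined syntax.
ELEMENT = re.compile(r'<"([^"]*)">')


def split_rule(rule: str) -> Tuple[str, List[str]]:
    """Return (skeleton, filters).

    Each <"..."> element is replaced by a one-letter symbol (A, B, ...), so the
    skeleton is an ordinary regular expression over symbols and filters[i] is
    the packet filter expression behind the i-th symbol.
    """
    filters: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        if len(filters) >= 26:
            raise ValueError("this sketch supports at most 26 elements")
        filters.append(match.group(1))
        return chr(ord("A") + len(filters) - 1)

    skeleton = ELEMENT.sub(replace, rule)
    return skeleton, filters


skeleton, filters = split_rule('(<"ip[2:2] = 50">+<"ip[2:2] = 53">){5}')
```

After this split, the skeleton could be handed to a regex compiler and each filter expression to a BPF compiler, with the regex state machine invoking the corresponding filter whenever it needs to test a symbol against a packet.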
In step 404 Packet 1 is received by FPGA MAC 50. In step 405 packet classifier 52 FPGA module identifies protocol layers in packet 1 and adds this information to the packet as metadata.
In step 406, if no packet processors 54 are currently available, the FIFO 58 temporarily stores packet 1 until a packet processor is made available.
In step 407, the packet processor 54 applies the BPF/Regex rules to each packet received.
Based on the context for the flow (i.e. the current position in evaluating the rule against packets in the current flow), the next subexpression in the rule is evaluated by the virtual machine:
e.g. packet 1 ip[2:2] = 50
The virtual machine checks if the full rule is matched. If the full rule is matched, the User Module 60 is alerted in step 408. If not, this information is stored in Context Store 56 in step 409 and the system waits for the next packet in the flow to arrive (packet 2). Thus, the context store holds information indicating a partial match of the rule against the flow. Alternatively, if there is no match the context store is reinitialised so that matching begins at the beginning of the rule when the next packet arrives.
Steps 404 to 409 are repeated for packet 2.
In step 407, the packet processor 54 applies the next subexpression in the rule to packet 2.
e.g. packet 2 ip[2:2] = 53
In step 408, the packets are matched and User Module 60 is alerted.
The user module 60 takes the desired action in response to finding a match. For instance, the user can be alerted that the matching packets have been found in the network, or the packets concerned can be given different treatment, e.g. dropped, or differently routed from other packets.
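Steps 404 to 409 can be summarised, for the two-element rule used in this walk-through, by the following Python sketch. It models only the per-flow bookkeeping: the names process, positions and ip_header are hypothetical, a dictionary stands in for the context store 56, and a return value of True stands in for alerting the User Module 60. One simplification is labelled in the code: a packet that fails the current subexpression is not re-tried against the first subexpression.

```python
from typing import Callable, Dict, List, Tuple

FiveTuple = Tuple[str, str, int, int, int]
Predicate = Callable[[bytes], bool]


def ip_len_is(value: int) -> Predicate:
    # ip[2:2]: the two bytes at offset 2 of the IP header, big-endian.
    return lambda ip: int.from_bytes(ip[2:4], "big") == value


RULE: List[Predicate] = [ip_len_is(50), ip_len_is(53)]
positions: Dict[FiveTuple, int] = {}   # context: next subexpression per flow


def process(flow: FiveTuple, ip_packet: bytes) -> bool:
    """Return True when this packet completes the rule for its flow."""
    pos = positions.get(flow, 0)           # load the context for this flow
    if RULE[pos](ip_packet):
        pos += 1
        if pos == len(RULE):
            positions.pop(flow, None)      # full match: alert, then start over
            return True
        positions[flow] = pos              # partial match: remember position
    else:
        # Simplification: the failing packet is not re-tested from the start.
        positions.pop(flow, None)          # no match: reinitialise the context
    return False


def ip_header(total_length: int) -> bytes:
    """Made-up 20-byte IPv4 header carrying only a total-length field."""
    return bytes([0x45, 0]) + total_length.to_bytes(2, "big") + bytes(16)
```

Because the position is stored per 5-tuple, packets from other flows can be interleaved between packet 1 and packet 2 without disturbing the partial match.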
Embodiments of the present invention have been described with particular reference to the example illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the present invention.

Claims

1. A packet filter comprising:
an interface for receiving packets of data from a network;
a packet processor arranged to match a stored rule against the packets, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern,
the packet processor comprising a regular expression engine arranged to match the regular expression pattern against a target sequence of received packets,
wherein to match an element of the regular expression pattern with a packet in the sequence, the regular expression engine is arranged to invoke a packet filter engine to evaluate the packet filter expression of that element against the bytes of that packet, wherein the regular expression engine is arranged to determine that the rule matches the sequence of packets if all the elements have been matched according to the regular expression pattern;
an actioning module to take a predetermined action in response to the rule being determined to match the packets.
2. A packet filter according to claim 1, wherein the regular expression engine comprises a state machine wherein transition to the next state depends upon an evaluation of a packet filter expression element by the packet filter engine.
3. A packet filter according to claim 1 or claim 2, wherein the target sequence of received packets against which the regular expression pattern is matched is a stream between two nodes in the network.
4. A packet filter according to any of claims 1 to 3, comprising a context store in communication with the packet processor, wherein in response to instructions in the rule, the packet processor is arranged to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in the context store and to retrieve the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
5. A packet filter according to any of claims 1 to 4, comprising a packet classifier module arranged to calculate metadata for the packets which is appended to the packets, wherein in response to instructions in the packet filter expression, the packet filter engine is arranged to evaluate the metadata against the packet filter expression.
6. A packet filter according to any of claims 1 to 5, wherein the metadata comprises an offset of one or more protocol headers in the packet, a length of one or more protocol layers in the packet, a size of one or more protocol layers in the packet, or a checksum of one or more protocol layers in the packet.
7. A packet filter according to any of claims 1 to 6, wherein in response to instructions in the packet filter expression, the packet filter engine is arranged to evaluate application layer data in the packet against the packet filter expression.
8. A packet filter according to any of claims 1 to 7, comprising a compiler arranged to compile the rules into byte codes for execution in a virtual machine provided by the packet processor to implement the regular expression engine and/or the packet filter engine.
9. A packet filter according to claim 8, wherein the compiler recognises instructions to store context and to retrieve context to register or memory.
10. A packet filter according to claim 7 or claim 8, comprising a console by which a new user rule can be received and compiled by the compiler, the compiled rule being supplied to the virtual machine in the packet processor whilst the packet processor is running so as to dynamically update the rules being matched against the packets at runtime. Alternatively, new rules can be implemented at run time under control of a software program.
11. A method of filtering packets of data from a network according to a rule, wherein the rule comprises packet filter expressions nested as elements within a regular expression pattern, the method comprising:
matching the regular expression pattern against a target sequence of received packets;
evaluating the packet filter expressions against the bytes of a packet in the sequence in order to match elements of the regular expression pattern with the packets in the sequence;
determining that the rule matches the sequence of packets if all the elements have been matched according to the regular expression pattern;
taking a predetermined action in response to the rule being determined to match the packets.
12. A method according to claim 11, comprising, in response to instructions in the rule, storing the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
13. A method according to any of claims 11 to 12, comprising calculating metadata for the packets and appending the metadata to the packets; and in response to instructions in the packet filter expression, evaluating the metadata against the packet filter expression.
14. A method according to any of claims 11 to 13, comprising, in response to instructions in the packet filter expression, evaluating application layer data in the packet.
15. A method according to any of claims 11 to 14, comprising compiling the rules into byte codes for execution in a virtual machine.
16. A method according to any of claims 11 to 15, comprising dynamically updating the rules being matched against the packets in a virtual machine at runtime.
17. A compiler arranged to compile a rule whose syntax comprises packet filter expressions defining matches on packet bytes nested in a regular expression, the compiler outputting bytecodes suitable for execution in a virtual machine for evaluating the rules against packets of data in a sequence.
18. A compiler arranged to recognise syntax in the rule in which the packet filter expression addresses metadata associated with each packet, and to output bytecodes to cause the virtual machine to load the metadata to a register for evaluation.
19. A compiler arranged to recognise syntax in the rule in which the packet filter expression addresses a context store, and to output bytecodes to cause the virtual machine to store the result of evaluating a packet filter expression against a first packet in the sequence of packets in a context store and retrieving the result for use in evaluating a packet filter expression against a second packet in the sequence of packets.
20. A syntax for encoding rules for packet detection comprises packet filter expressions defining matches on packet bytes nested in a regular expression.
PCT/GB2016/052919 2015-09-18 2016-09-19 Methods and apparatus for detecting patterns in data packets in a network WO2017046617A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1516541.8 2015-09-18
GB1516541.8A GB2542396A (en) 2015-09-18 2015-09-18 Methods and Apparatus for Detecting Patterns in Data Packets in a Network

Publications (1)

Publication Number Publication Date
WO2017046617A1 (en) 2017-03-23

Family

ID=54544442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2016/052919 WO2017046617A1 (en) 2015-09-18 2016-09-19 Methods and apparatus for detecting patterns in data packets in a network

Country Status (2)

Country Link
GB (1) GB2542396A (en)
WO (1) WO2017046617A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100217936A1 (en) * 2007-02-02 2010-08-26 Jeff Carmichael Systems and methods for processing access control lists (acls) in network switches using regular expression matching logic
US20130132961A1 (en) * 2011-11-21 2013-05-23 David Lehavi Mapping tasks to execution threads
US20130136127A1 (en) * 2011-11-30 2013-05-30 Broadcom Corporation System and Method for Efficient Matching of Regular Expression Patterns Across Multiple Packets

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20002377A (en) * 2000-10-27 2002-04-28 Ssh Comm Security Corp A method for managing a reverse filter code
US7180895B2 (en) * 2001-12-31 2007-02-20 3Com Corporation System and method for classifying network packets with packet content
CN101296114B (en) * 2007-04-29 2011-04-20 国际商业机器公司 Parallel pattern matching method and system based on stream

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUSS COX: "Regular Expression Matching: the Virtual Machine Approach", 31 December 2009 (2009-12-31), XP055324539, Retrieved from the Internet <URL:https://swtch.com/~rsc/regexp/regexp2.html> [retrieved on 20161130] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022267815A1 (en) * 2021-06-21 2022-12-29 中兴通讯股份有限公司 Data packet filtering method and apparatus, and electronic device and computer-readable storage medium
US11743108B1 (en) 2022-03-15 2023-08-29 Cisco Technology, Inc. Dynamic customization of network controller data path based on controller internal state awareness
US20230300019A1 (en) * 2022-03-15 2023-09-21 Cisco Technology, Inc. Dynamic customization of network controller data path based on controller internal state awareness

Also Published As

Publication number Publication date
GB201516541D0 (en) 2015-11-04
GB2542396A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
US9392004B2 (en) Method and system for dynamic protocol decoding and analysis
US9563399B2 (en) Generating a non-deterministic finite automata (NFA) graph for regular expression patterns with advanced features
US9762544B2 (en) Reverse NFA generation and processing
US7596809B2 (en) System security approaches using multiple processing units
US9419943B2 (en) Method and apparatus for processing of finite automata
US9602532B2 (en) Method and apparatus for optimizing finite automata processing
US8681819B2 (en) Programmable multifield parser packet
US8086609B2 (en) Graph caching
KR100641279B1 (en) Method and arrangement for implementing IPSEC policy management using filter code
US8176300B2 (en) Method and apparatus for content based searching
US20140324900A1 (en) Intelligent Graph Walking
WO2009070191A1 (en) Deterministic finite automata (dfa) graph compression
US8484147B2 (en) Pattern matching
EP2215563A1 (en) Method and apparatus for traversing a deterministic finite automata (dfa) graph compression
WO2004019587A1 (en) Hardware-based packet filtering accelerator
US20120078832A1 (en) Exploitation of transition rule sharing based on short state tags to improve the storage efficiency
WO2017046617A1 (en) Methods and apparatus for detecting patterns in data packets in a network
US20160261723A1 (en) Optimized message processing
US20190306118A1 (en) Accelerating computer network policy search
KR20190028597A (en) Matching method of high speed snort rule and yara rule based on fpga
CN112994931B (en) Rule matching method and equipment
US11323372B2 (en) Flexible steering
EP3935540A1 (en) Return-oriented programming protection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16770342

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16770342

Country of ref document: EP

Kind code of ref document: A1