WO2021158171A1

WO2021158171A1 - System and method for detecting and identifying individual attack-stages in internet-of-things (IoT) networks

Info

Publication number: WO2021158171A1
Application number: PCT/SG2021/050037
Authority: WO
Inventors: Dinil Mon DIVAKARAN; Kushan Sudheera KALUPAHANA LIYANAGE; Rhishi Pratap Singh; Mohan GURUSAMY
Original assignee: Singtel Cyber Security (Singapore) Pte Ltd
Priority date: 2020-02-04
Filing date: 2021-01-28
Publication date: 2021-08-12

Abstract

This document describes a system and method for detecting and identifying correlated attack-stages in Internet of Things (IoT) networks. In particular, this invention relates to a system and method for using a computer server to detect and identify multiple attack-stages in IoT networks based on alerts selectively generated by gateways of the IoT network. This is done by extracting patterns from the alerts generated by the gateways and classifying these patterns into individual attack-stages using a supervised machine learning technique.

Description

SYSTEM AND METHOD FOR DETECTING AND IDENTIFYING INDIVIDUAL ATTACK-STAGES IN INTERNET-OF-THINGS (IOT) NETWORKS

Field of the Invention

This invention relates to a system and method for detecting and identifying correlated attack-stages in Internet of Things (IoT) networks. In particular, this invention relates to a system and method for using a computer server to detect and identify multiple attack-stages in IoT networks based on alerts selectively generated by gateways of the IoT network. This is done by extracting patterns from the alerts generated by the gateways and classifying these patterns into individual attack-stages using a supervised machine learning technique.

Summary of Prior Art

The Internet of Things (IoT) market is witnessing rapid growth in various application domains such as healthcare, manufacturing, transportation, etc. As a result of this increased rate of adoption, IoT devices are increasingly being deployed in homes, enterprises, and different industries, enabling automation and digital transformation. This unprecedented growth in the number and type of devices, as well as device-centric applications, makes IoT devices an attractive target for malicious third parties, thereby introducing new challenges to cybersecurity and privacy providers.

Presently, existing IoT devices may be compromised and may be exploited to launch large-scale attacks causing huge losses. For example, a commonly known attack that affects IoT devices is the Mirai attack, which simultaneously targets large numbers of vulnerable IoT devices to cause disruption to external services such as PayPal, Amazon, etc. through attacks originating from compromised IoT devices. In general, a cyberattack comprises multiple stages (henceforth referred to as attack-stages) such as social engineering campaigns, reconnaissance, intrusion, malware injections, command and control (C&C) communications, and the launch of targeted attacks (e.g., TCP SYN flooding, DNS-based DDoS attacks, reflective DDoS attacks, data exfiltration, etc.).

Additionally, through the widespread use of bots in the space of providing crime-as-a-service, different attacks can be launched via the compromised bots. To deal with this rapidly-evolving threat landscape, it is important to detect and identify the different stages of large-scale attacks as early as possible. Such a solution is useful in many ways, providing early warning of malware spread, extraction of signatures, timely prediction of attack stages, quick mitigation and even determination of kill switches. Those skilled in the art have proposed security solutions that can be developed at the device-level and also at the network-level. Any device-centric solution typically involves device vendors. There are far too many vendors in this vibrant yet volatile market, and it is not certain that the vendors will continue to exist during the life cycle of devices. Besides, unlike traditional computers and mobiles that are typically attended by humans, IoT devices are left unattended. Indirectly, this results in a great number of vulnerable devices being unpatched, thereby increasing the risk of these devices being exploited. A network-centric solution, however, is not just complementary, but can also work independently of the device-centric solution. It also allows multiple devices to be analysed together, exploiting the potential correlation among them.

A simple approach to detect and identify attack-stages is to aggregate network traffic from different networks at a central (cloud-based) server and to search for specific patterns of attacks. But such an approach has multiple issues. The first issue is that the volume of data that needs to be sent to and processed at the centralized server will be extremely large. This can quickly deplete the computational, storage, and bandwidth resources of the IoT network. Even Security Information and Event Management (SIEM) managers deployed in cloud servers would not be able to process all alerts arising from enterprises due to scalability issues. The second issue is that, as IoT network traffic contains private and confidential data about its users, there exists the possibility that behavioural information of these users may be leaked even though the network traffic is encrypted. Therefore, most users would be unwilling to store device traffic in the cloud, where there exists a real risk of their information being leaked to malicious parties.

Another common approach adopted by those skilled in the art is to model the attack-stage detection problem as a Hidden Markov Model (HMM). In this approach, alerts generated by the system are taken as observable states, whereas the corresponding actual attack-stages that alerts relate to are represented by hidden states. The objective would be for the model to determine the most-likely attack type corresponding to a sequence of alerts. HMM facilitates the modelling of attack stage sequences through the transition of states; however, in modern botnets such as Mirai and Hajime, the order and timing of stages can differ from one attack to another depending on the actions of the perpetrator, botnet variants, malware mutation, etc. In the case of Advanced Persistent Threats (APT), the stages can span a prolonged time period, making them difficult to model effectively and thereby rendering this approach ineffective. For the above reasons, those skilled in the art are constantly striving to come up with a system and method that is capable of using spatial and temporal correlation between the selectively generated alerts to detect and identify attack-stages.

Summary of the Invention

The above and other problems are solved and an advance in the art is made by systems and methods provided by embodiments in accordance with the invention.

A first advantage of embodiments of systems and methods in accordance with the invention is that the invention correlates suspicious activities across space and time of IoT networks to detect patterns and to classify the patterns into their possible attack stages.

A second advantage of embodiments of systems and methods in accordance with the invention is that the gateways of the IoT network are configured to selectively send alerts only for traffic that does not match the pre-generated profile of IoT devices in the network, thereby reducing the amount of bandwidth required and, more importantly, minimizing the possibility of private information of IoT devices being leaked to malicious third parties.

A third advantage of embodiments of systems and methods in accordance with the invention is that the invention reduces the number of alerts to be analysed by a security analyst, as the invention is able to filter out false alerts, leaving behind only relevant and contextual alerts to be analysed by the analyst.

The above advantages are provided by embodiments of a method in accordance with the invention operating in the following manner.

According to a first aspect of the invention, a computer server in an Internet-of-Things (IoT) network for identifying individual attack-stages in network traffic of the IoT network is disclosed, the computer server comprising: a security module being configured to: receive alerts selectively generated by gateways provided within the IoT network; extract, using a data mining module, patterns from the received alerts, wherein the data mining module is configured to extract the patterns by processing and correlating the received alerts in a spatial and temporal manner; obtain alert-level and pattern-level features associated with the extracted patterns; train a supervised machine learning model using the obtained alert-level and pattern-level features; and identify, using the trained supervised machine learning model, individual attack stages from the received alerts.

With reference to the first aspect of the invention, the selective generating of the alerts by the gateways comprises: each gateway within the IoT network being configured to: create and store at least one device profile for each IoT device communicatively connected to the gateway; identify, based on the profiles of each of the IoT devices, anomalies in network traffic exchanged between each of the IoT devices and the gateway; and generate alerts based on the identified anomalies in the network traffic exchanged between each of the IoT devices and the gateway.

With reference to the first aspect of the invention, the creating and storing of the at least one device profile for each of the IoT devices comprises: the gateway being configured to: process network traffic flows at each of the IoT devices to generate a profile table P^d for each of the IoT devices, where d is defined as an identifier of an IoT device; and store the generated profile tables in a Cuckoo Hash Table (CHT) P, whereby the CHT P is indexed using a session identifier, sid, comprising source and destination Internet Protocol (IP) addresses, destination ports, protocols and directions of connections in a session.

With reference to the first aspect of the invention, the identifying of the anomalies in the network traffic exchanged between each of the IoT devices and the gateway comprises: the gateway being configured to: for each of the IoT devices, identify a connection at the IoT device as a connection anomaly, and add the connection anomaly to a Cuckoo Hash Table (CHT) P when it is determined that the connection is not indexed in the device profile of the IoT device, whereby the CHT P is indexed using a 5-tuple flow identifier, fid, comprising source and destination IP addresses, source and destination ports and protocols; and identify a connection at the IoT device as a behavioural anomaly, and add the behavioural anomaly to a CHT P when it is determined that the connection is indexed in the device profile of the IoT device and has flow features that deviate from flow features defined in the device profile of the IoT device.

With reference to the first aspect of the invention, the extracting of the patterns from the received alerts comprises: the data mining module being configured to: aggregate, during a pre-defined time period, the received alerts from different gateways into a first group; and extract from the first group, using a data mining technique with a first detection threshold, first patterns of fields of alerts.

With reference to the first aspect of the invention, the extracting of the patterns from the received alerts further comprises: the data mining module being further configured to: aggregate, during a subsequent pre-defined time period, the received alerts from the different gateways into a second group; extract from the second group, using the data mining technique with a second detection threshold, subsequent patterns of fields of alerts; retrieve items associated with the extracted subsequent patterns of fields of alerts; filter out alerts in the first group that are not associated with the retrieved items; and extract from the filtered first group, using the data mining technique with a third detection threshold, additional first patterns of fields of alerts, wherein the third detection threshold has a lower value than the first detection threshold.

With reference to the first aspect of the invention, the data mining technique comprises a Frequent Itemset Mining (FIM) technique and the first, second and third detection thresholds comprise minimum support values.

With reference to the first aspect of the invention, the obtaining the alert-level features comprises: the security module being configured to: expand the extracted patterns into their associated alerts; generate alert-level features based on the source or destination Internet Protocol (IP) addresses associated with the alerts, the direction of flow associated with the alerts and protocols associated with the alerts.

With reference to the first aspect of the invention, the pattern-level features comprise: Internet Protocol (IP) and port orientations of alerts associated with the extracted patterns; average packet sizes of alerts associated with the extracted patterns; a support value used to indicate a number of alerts in each extracted pattern; source-to-destination and destination-to-source ratios of alerts associated with the extracted patterns; average number of unique ports accessed by alerts associated with the extracted patterns; unique attributes of alerts associated with the extracted patterns; and unique entities of alerts associated with the extracted patterns.

According to a second aspect of the invention, a method for identifying individual attack-stages in network traffic of an Internet-of-Things (IoT) network using a computer server in the IoT network is disclosed, the method comprising the steps of: receiving alerts selectively generated by gateways provided within the IoT network; extracting, using a data mining module provided within the computer server, patterns from the received alerts, wherein the data mining module extracts the patterns by processing and correlating the received alerts in a spatial and temporal manner; obtaining alert-level and pattern-level features associated with the extracted patterns; training a supervised machine learning model using the obtained alert-level and pattern-level features; and identifying, using the trained supervised machine learning model, individual attack stages from the received alerts.

With reference to the second aspect of the invention, the selective generating of the alerts by the gateways comprises the steps of: for each gateway within the IoT network, creating and storing at least one device profile for each IoT device communicatively connected to the gateway; identifying, based on the profiles of each of the IoT devices, anomalies in network traffic exchanged between each of the IoT devices and the gateway; and generating alerts based on the identified anomalies in the network traffic exchanged between each of the IoT devices and the gateway.

With reference to the second aspect of the invention, the creating and storing of the at least one device profile for each of the IoT devices comprises: for each of the gateways, processing network traffic flows at each of the IoT devices to generate a profile table P^d for each of the IoT devices, where d is defined as an identifier of an IoT device; and storing the generated profile tables in a Cuckoo Hash Table (CHT) P, whereby the CHT P is indexed using a session identifier, sid, comprising source and destination Internet Protocol (IP) addresses, destination ports, protocols and directions of connections in a session.

With reference to the second aspect of the invention, the identifying of the anomalies in the network traffic exchanged between each of the IoT devices and the gateway comprises: for each of the IoT devices in each of the gateways, identifying a connection at the IoT device as a connection anomaly, and adding the connection anomaly to a Cuckoo Hash Table (CHT) P when it is determined that the connection is not indexed in the device profile of the IoT device, whereby the CHT P is indexed using a 5-tuple flow identifier, fid, comprising source and destination IP addresses, source and destination ports and protocols; and identifying a connection at the IoT device as a behavioural anomaly, and adding the behavioural anomaly to a CHT P when it is determined that the connection is indexed in the device profile of the IoT device and has flow features that deviate from flow features defined in the device profile of the IoT device.

With reference to the second aspect of the invention, the extracting of the patterns from the received alerts comprises: aggregating, using the data mining module, during a pre-defined time period, the received alerts from different gateways into a first group; and extracting from the first group, using a data mining technique with a first detection threshold, first patterns of fields of alerts.

With reference to the second aspect of the invention, the extracting of the patterns from the received alerts further comprises: aggregating, using the data mining module, during a subsequent pre-defined time period, the received alerts from the different gateways into a second group; extracting from the second group, using the data mining technique with a second detection threshold, subsequent patterns of fields of alerts; retrieving items associated with the extracted subsequent patterns of fields of alerts; filtering out alerts in the first group that are not associated with the retrieved items; and extracting from the filtered first group, using the data mining technique with a third detection threshold, additional first patterns of fields of alerts, wherein the third detection threshold has a lower value than the first detection threshold.

With reference to the second aspect of the invention, the obtaining of the alert-level features comprises: expanding the extracted patterns into their associated alerts; and generating alert-level features based on the source or destination Internet Protocol (IP) addresses associated with the alerts, the direction of flow associated with the alerts and protocols associated with the alerts.

With reference to the second aspect of the invention, the pattern-level features comprise: Internet Protocol (IP) and port orientations of alerts associated with the extracted patterns; average packet sizes of alerts associated with the extracted patterns; a support value used to indicate a number of alerts in each extracted pattern; source-to-destination and destination-to-source ratios of alerts associated with the extracted patterns; average number of unique ports accessed by alerts associated with the extracted patterns; unique attributes of alerts associated with the extracted patterns; and unique entities of alerts associated with the extracted patterns.

Brief Description of the Drawings

The above and other problems are solved by features and advantages of a system and method in accordance with the present invention described in the detailed description and shown in the following drawings.

Figure 1 illustrating a distributed network architecture of the modules in an IoT network in accordance with embodiments of the invention;

Figure 2 illustrating a functional block diagram of the modules in an IoT network in accordance with embodiments of the invention;

Figure 3 illustrating a block diagram representative of processing systems providing embodiments in accordance with embodiments of the invention;

Figure 4 illustrating exemplary alerts received by a computer server of the IoT network and maximal frequent itemsets (MFI) mined from the alerts in accordance with embodiments of the invention; and

Figure 5 illustrating orientation of patterns originating from a compromised IoT device and patterns directed to an attack victim.

Detailed Description

This invention relates to a system and method for detecting and identifying correlated attack-stages in Internet of Things (IoT) networks. In particular, this invention relates to a system and method for using a computer server to detect and identify multiple attack-stages in IoT networks based on alerts selectively generated by gateways of the IoT network. This is done by extracting patterns from the alerts generated by the gateways and classifying these patterns into individual attack-stages using a trained supervised machine learning model. The attack-stages that may be detected include, but are not limited to, attacks generated by malicious botnets such as Mirai and Hajime (each of which comprises multiple stages of attacks that manifest over the network). Typical attacks include port and network scans, brute force login attempts, Command and Control (C&C) communications, malware loaders, and the launching of targeted attacks such as HTTP DDoS, reflective DNS DDoS attacks, etc.

The present invention will now be described in detail with reference to several embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific features are set forth in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments may be realised without some or all of the specific features. Such embodiments should also fall within the scope of the current invention. Further, certain process steps and/or structures in the following may not be described in detail and the reader will be referred to a corresponding citation so as to not obscure the present invention unnecessarily.

Further, one skilled in the art will recognize that many functional units in this description have been labelled as modules throughout the specification. The person skilled in the art will also recognize that a module may be implemented as circuits, logic chips or any sort of discrete component. Still further, one skilled in the art will also recognize that a module may be implemented in software which may then be executed by a variety of processors. In embodiments of the invention, a module may also comprise computer instructions or executable code that may instruct a computer processor to carry out a sequence of events based on instructions received. The choice of the implementation of the modules is left as a design choice to a person skilled in the art and does not limit the scope of this invention in any way.

Figure 1 illustrates distributed network architecture 100 of an IoT network comprising gateways 104a, 104b and 104c and computer server 102. One skilled in the art will recognize that gateways 104a, 104b and 104c comprise a networking module/hardware used to allow data to flow from one discrete network to another and that computer server 102 may comprise a server configured to operate as the security manager of the IoT network and may reside in the cloud or a datacentre. In embodiments of the invention, server 102 may be deployed as a standalone software entity, or as an application that is integrated with a Security Information and Event Management (SIEM) engine, or as a Software Defined Networking (SDN) controller in order to actively assist in mitigating attacks in real-time.

A gateway is typically configured to reside at the perimeter of a local network, and all the IoT devices in the network will connect to it using one or more technologies such as Bluetooth, ZigBee, WiFi, etc. Such gateways may be integrated with local routers (such as home routers) and are typically configured to passively monitor network traffic, both incoming and outgoing, of the connected IoT devices, and should have reasonable computational and storage resources comparable to those of a standard mini-PC of the day.

Distributed IoT network architecture 100 is arranged in a hierarchical manner whereby IoT devices in each network are connected to a centralized computer server 102 through gateways hosted locally. The connections may take place via wired and/or wireless means and this is left as a design choice to one skilled in the art. As illustrated, each local network comprises a plurality of IoT devices connected to a gateway, that is, IoT devices 106a are connected to gateway 104a, IoT devices 106b are connected to gateway 104b, and IoT devices 106c are connected to gateway 104c, whereby gateways 104a-c are all in turn connected to computer server 102.

In accordance with embodiments of the invention, IoT traffic in each local network will then be processed locally at each of the gateways. At this stage, anomalous activities will be detected with respect to the normal profiles of the respective IoT devices. This means that alerts are generated only for those traffic flows that do not match the profile of the IoT device, and subsequently, the generated alerts are sent to computer server 102. This is advantageous as not all anomalous activities and/or network traffic will be sent to a central entity, thereby reducing the bandwidth required by the network and, more importantly, minimizing the leakage of private information as only alerts relating to suspicious flows are sent to computer server 102 by the gateways.

At computer server 102, a data mining technique, such as, but not limited to, Frequent Itemset Mining (FIM) is utilized to exploit the spatial and temporal characteristics of the alerts received from gateways 104a-c and efficiently extract patterns corresponding to attack-stages. It should be noted that at this stage, knowledge of specific attack-stages is not a prerequisite, and therefore, this step has the potential to detect new malicious activities from the received alerts. In further embodiments of the invention, a sliding window-based mining algorithm that discovers missed out patterns in previous time windows based on the new patterns found in the current time window may be applied to the data mining technique to identify and detect new malicious activities in the received alerts.
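The sliding-window idea can be illustrated with a small sketch. The following Python is a minimal illustration under stated assumptions, not the implementation of the invention: alerts are modelled as sets of `field=value` strings, frequent itemsets are found by brute-force enumeration (a stand-in for any FIM algorithm, and it returns all frequent itemsets rather than only maximal ones), and the names `mine` and `look_back` as well as the threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine(alerts, min_support, max_len=3):
    """Return {itemset: support} for itemsets found in at least min_support alerts."""
    counts = Counter()
    for alert in alerts:
        items = sorted(alert)
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def look_back(prev_window, curr_window, first_thr, second_thr, third_thr):
    """Re-mine the previous window at a lower threshold, guided by the current one."""
    assert third_thr < first_thr  # the third threshold is the lower one
    first_patterns = mine(prev_window, first_thr)
    subsequent = mine(curr_window, second_thr)
    # Items (field=value pairs) that occur in any pattern of the current window.
    items = set().union(*subsequent) if subsequent else set()
    # Keep only previous-window alerts that share at least one such item.
    filtered = [a for a in prev_window if a & items]
    additional = {s: c for s, c in mine(filtered, third_thr).items()
                  if s not in first_patterns}
    return first_patterns, subsequent, additional


# Toy data (invented for illustration): two Telnet alerts in the previous
# window fall below the first threshold, but are recovered once the same
# pattern becomes dominant in the current window.
prev_window = [
    frozenset({"dstPort=23", "proto=TCP", "src=192.0.2.5"}),
    frozenset({"dstPort=23", "proto=TCP", "src=192.0.2.7"}),
    frozenset({"dstPort=53", "proto=UDP", "src=192.0.2.9"}),
]
curr_window = [
    frozenset({"dstPort=23", "proto=TCP", "src=198.51.100.%d" % i})
    for i in range(4)
]
first, subsequent, additional = look_back(prev_window, curr_window, 3, 3, 2)
```

In this toy run the previous window yields no pattern at a minimum support of 3, while the look-back step recovers the `dstPort=23`, `proto=TCP` pattern at a minimum support of 2 and leaves the unrelated DNS alert out.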

The outcome of the data mining technique, which comprises both alert-level and pattern-level information, may then be provided to machine learning techniques such as, but not limited to, k-Nearest Neighbour (k-NN), Random Forest (RF), and Support Vector Machine (SVM) to classify the malicious activities into probable stages of attacks.
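As a rough illustration of the classification step, the sketch below implements a plain k-NN vote in Python using only the standard library. It is a sketch under assumptions: the three-element feature vectors, their scaling to the range 0 to 1, the labels `scan` and `ddos`, and the function name `knn_predict` are all invented for the example and are not taken from the specification.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled feature vectors:
# [support, average packet size, number of unique ports]
train = [
    ([0.9, 0.1, 0.9], "scan"),
    ([0.8, 0.1, 0.8], "scan"),
    ([0.7, 0.2, 0.9], "scan"),
    ([0.9, 0.8, 0.1], "ddos"),
    ([0.8, 0.9, 0.1], "ddos"),
    ([0.9, 0.9, 0.2], "ddos"),
]

stage = knn_predict(train, [0.85, 0.15, 0.85])
```

Here the query lies close to the three `scan` rows, so the vote returns `scan`. In practice the features would need consistent scaling, since Euclidean distance is sensitive to the range of each feature.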

Figure 2 illustrates a functional block diagram of the modules in IoT network 100 in accordance with embodiments of the invention whereby the sequential steps taking place in gateways 104 and computer server 102 of network 100 are depicted as steps (1)-(6).

Initially, each of gateways 104 will operate in its initialization stage when new IoT devices are connected to its local network, e.g. when each of IoT devices 106a is connected to gateway 104a. During this stage, each of gateways 104 will build profiles of the IoT devices connected to it and this is done at step (1). As illustrated in Figure 2, the profiles of the IoT devices may be stored as a Cuckoo Hash Table (CHT). Once this is done, each of gateways 104 will then continuously monitor the network behaviour of IoT devices on its local network, extract features (step 2), and cross check with benign profiles to detect anomalies (step 3). When such anomalies are detected, gateways 104 will subsequently encapsulate meta-information of the traffic anomalies as alerts and send them to computer server 102 at step (4).
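Steps (2) to (4) at a gateway can be sketched as follows. This is a minimal Python illustration under assumptions: a plain dictionary stands in for the profile table, the only flow feature is a hypothetical average packet size, and the deviation test (a relative tolerance) is invented for the example. The distinction between a connection anomaly (session not in the profile) and a behavioural anomaly (session in the profile but with deviating flow features) follows the summary above.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    avg_pkt_size: float  # hypothetical flow feature


def sid(flow):
    """Session identifier: the 5-tuple without the source port."""
    return (flow.src_ip, flow.dst_ip, flow.dst_port, flow.proto)


def check_flow(profile, flow, tolerance=0.5):
    """Return an alert dict for an anomalous flow, or None for a benign one."""
    expected = profile.get(sid(flow))
    if expected is None:
        kind = "connection"   # session not indexed in the device profile
    elif abs(flow.avg_pkt_size - expected) > tolerance * expected:
        kind = "behavioural"  # session known, flow feature deviates
    else:
        return None           # matches the profile: nothing is sent
    # Only meta-information of the flow is encapsulated in the alert.
    return {"type": kind, "fid": flow[:5], "avg_pkt_size": flow.avg_pkt_size}


# Hypothetical profile: one known session with an expected average packet size.
profile = {("192.0.2.10", "198.51.100.1", 443, "TCP"): 500.0}
```

A flow to the profiled server with a typical packet size produces no alert, so nothing about it leaves the local network; only the two anomaly types result in an alert being sent to the server.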

Upon receiving the alerts, the computer server 102 then processes and correlates the alerts from various gateways 104 using a data mining module provided within server 102 to extract patterns from these received alerts and this takes place at step (5). In embodiments of the invention, the patterns may be extracted from the aggregated alerts using a data mining technique known as Frequent Itemset Mining (FIM). The FIM data mining technique is configured to generate patterns based on dominant alerts found across multiple networks and time, effectively utilizing spatial and temporal correlation. Additionally, during the generation of the patterns, the data mining module may be configured to carry out a “look back” step to identify missed patterns that may be related to attack-stages while keeping patterns generated by “noise alerts” to a minimum. This will be described in greater detail in the subsequent sections. Once the patterns have been generated, a set of features that utilizes both alert-level and pattern-level information will be extracted from the generated patterns and this will be used to train supervised machine learning models. The trained supervised machine learning models are then used to classify incoming alerts into corresponding individual attack-stages in the IoT network. This takes place at step (6). One skilled in the art will recognize that the supervised machine learning models may include, but are not limited to, k-Nearest Neighbour (k-NN), Random Forest (RF), and Support Vector Machine (SVM), which, once trained, may be used to classify the alerts into probable individual stages of attacks.
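The feature-extraction step can be sketched as below. This Python is illustrative only: alerts are modelled as dictionaries with hypothetical field names (`proto`, `dst_ip`, `dst_port`, `pkt_size`, `direction`), a pattern is a dictionary of field values that an alert must match, and only a few of the pattern-level features named in the summary are computed. In particular, `outbound_ratio` is one assumed reading of the source-to-destination ratio, and `unique_dst_ports` is a simple count rather than an average.

```python
def expand(pattern, alerts):
    """Return the alerts that contain every field=value pair of the pattern."""
    return [a for a in alerts if all(a.get(k) == v for k, v in pattern.items())]


def pattern_features(pattern, alerts):
    """Compute a few pattern-level features from the alerts behind a pattern."""
    matched = expand(pattern, alerts)
    support = len(matched)  # number of alerts in the pattern
    if support == 0:
        return {"support": 0}
    outbound = sum(1 for a in matched if a["direction"] == "out")
    return {
        "support": support,
        "avg_pkt_size": sum(a["pkt_size"] for a in matched) / support,
        "unique_dst_ports": len({a["dst_port"] for a in matched}),
        "outbound_ratio": outbound / support,
    }


# Toy alerts (invented for illustration).
alerts = [
    {"proto": "TCP", "dst_ip": "198.51.100.7", "dst_port": 23,
     "pkt_size": 60, "direction": "out"},
    {"proto": "TCP", "dst_ip": "198.51.100.7", "dst_port": 2323,
     "pkt_size": 80, "direction": "out"},
    {"proto": "UDP", "dst_ip": "203.0.113.9", "dst_port": 53,
     "pkt_size": 120, "direction": "in"},
]
features = pattern_features({"proto": "TCP", "dst_ip": "198.51.100.7"}, alerts)
```

The resulting dictionary could be flattened into a feature vector and passed to whichever supervised model is chosen.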

In embodiments of the invention, gateways 104 and computer server 102 may comprise controller 301 and user interface 302. User interface 302 is arranged to enable manual interactions between a user and gateways 104 and/or server 102 as required and for this purpose includes the input/output components required for the user to enter instructions to provide updates to gateways 104 and/or server 102. A person skilled in the art will recognize that components of user interface 302 may vary from embodiment to embodiment but will typically include one or more of display 340, keyboard 335 and track-pad 336.

Controller 301 is in data communication with user interface 302 via bus 315 and includes memory 320, processor 305 mounted on a circuit board that processes instructions and data for performing the method of this embodiment, an operating system 306, an input/output (I/O) interface 330 for communicating with user interface 302 and a communications interface, in this embodiment in the form of a network card 350. Network card 350 may, for example, be utilized to send data from these modules via a wired or wireless network to other processing devices or to receive data via the wired or wireless network. Wireless networks that may be utilized by network card 350 include, but are not limited to, Wireless-Fidelity (Wi-Fi), Bluetooth, Near Field Communication (NFC), cellular networks, satellite networks, telecommunication networks, Wide Area Networks (WAN), etc.

Memory 320 and operating system 306 are in data communication with processor 305 via bus 310. The memory components include both volatile and non-volatile memory and more than one of each type of memory, including Random Access Memory (RAM) 320, Read Only Memory (ROM) 325 and a mass storage device 345, the last comprising one or more solid-state drives (SSDs). Memory 320 also includes secure storage 346 for securely storing secret keys, or private keys. One skilled in the art will recognize that the memory components described above comprise non-transitory computer-readable media and shall be taken to comprise all computer-readable media except for a transitory, propagating signal. Typically, the instructions are stored as program code in the memory components but can also be hardwired. Memory 320 may include a kernel and/or programming modules such as a software application that may be stored in either volatile or non-volatile memory.

Herein the term “processor” is used to refer generically to any device or component that can process such instructions and may include: a microprocessor, microcontroller, programmable logic device or other computational device. That is, processor 305 may be provided by any suitable logic circuitry for receiving inputs, processing them in accordance with instructions stored in memory and generating outputs (for example to the memory components or on display 340). In this embodiment, processor 305 may be a single core or multi-core processor with memory addressable space. In one example, processor 305 may be multi-core, comprising, for example, an 8-core CPU.

Profiling of IoT Devices

With reference to Figure 2, during the initialization stage of gateways 104, each gateway builds and stores profiles of the IoT devices connected to it. An IoT device usually serves a specific functionality and therefore, unlike an end-user computer, uses a specific set of protocols. The set of protocols and applications a device uses is highly dependent on the device’s functionality, yet the mappings of devices to protocols and servers are deterministic in the sense that they are not expected to change drastically over a reasonable time duration (such as days) unless firmware or software updates are carried out. Based on the above, a device profile for an IoT device may be defined as a concise representation of the network traffic characteristics of that particular IoT device.

Once the device profiles for the IoT devices have been generated, these device profiles are then stored in each of the gateways. In embodiments of the invention, the device profiles of the IoT devices may be stored in a Cuckoo Hash Table (CHT). Hash tables are efficient data structures for storing such information as they offer fast operations. Once a profile table is built, lookup operations on the CHT may be used for real-time detection of anomalies in IoT communications because the CHT guarantees a constant lookup time. For example, in a CHT with two hash functions, at most two bucket locations need to be accessed to perform a lookup operation, even in the worst case. The CHT design makes this possible at the cost of a slower insert operation; however, as the insert operation is only required for constructing a profile, and is therefore only used offline, this overhead is tolerable. The detailed workings of a CHT are omitted for brevity as they are known to one skilled in the art.

A device profile for an IoT device d may be created as follows. The network traffic flows of IoT device d are first processed during a predefined interval to generate profile table P^d. A traffic flow is identified by the common 5-tuple of source and destination IP addresses, source and destination ports and protocol, such that two flows with the same 5-tuple are separated in time by a threshold (e.g. five seconds). The 5-tuple flow identifier is denoted fid. Although the network traffic is processed in 5-tuple flows, one skilled in the art will recognize that the table of the device profile only stores session-level information, whereby a session is defined as an aggregation of 5-tuple flows, localized in time, and having the same 4-tuple of srcIP, dstIP, dstPort, Protocol. In other words, multiple connections between two end-points for the same service (e.g., HTTP) will be aggregated into one session by dropping the source port (which changes randomly with every new connection). In embodiments of the invention, incoming and outgoing connections may also be separated by using two tables or by adding to the 4-tuple key another attribute, Dir, which indicates the direction of the connections in a session.
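The constant-time lookup property described above can be illustrated with a minimal sketch. This is a toy cuckoo hash table written for illustration only, not the gateway implementation; the class name `CuckooHashTable`, the salted SHA-256 hash functions, the capacity and the eviction limit are all assumptions.

```python
import hashlib


class CuckooHashTable:
    """Toy cuckoo hash with two tables; a lookup probes at most two slots."""

    def __init__(self, capacity=64, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, key, which):
        # Two hash functions obtained by salting one digest (an assumption).
        digest = hashlib.sha256(f"{which}:{key!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.capacity

    def lookup(self, key):
        # Worst case: one probe in each table, independent of table load.
        for which in (0, 1):
            entry = self.tables[which][self._slot(key, which)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        # Overwrite the value if the key is already stored.
        for which in (0, 1):
            idx = self._slot(key, which)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        # Otherwise place the item, evicting and re-homing occupants as needed.
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(item[0], which)
            evicted = self.tables[which][idx]
            self.tables[which][idx] = item
            if evicted is None:
                return
            item, which = evicted, 1 - which
        # A production table would rehash or grow here instead of failing.
        raise RuntimeError("insert failed after max_kicks evictions")


# Hypothetical session key and statistics for one profiled session.
profile = CuckooHashTable()
sid = ("192.168.1.10", "52.1.2.3", 443, "TCP", "Out")
profile.insert(sid, {"count_mean": 12.0})
```

The insert path may relocate several entries, which is the cost the text accepts because profile construction happens offline, while `lookup` never touches more than two slots.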

In summary, the table of a device profile may be indexed using the session identifier sid = {srcIP, dstIP, dstPort, Protocol, Dir}. In the indexed slot corresponding to a session in the CHT P^d, statistical information such as the mean and standard deviation of the different features may also be stored, whereby the features may comprise flow-size in packets (count), flow-size in bytes (size), etc. Exemplary profile tables of an IoT smart plug and an IoT camera are illustrated in Tables 1 and 2 below respectively.
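As a rough illustration of how 5-tuple flows might be folded into sessions keyed by sid, the following sketch drops the source port and stores the mean and standard deviation of the count and size features per session. The function name `build_profile`, the dictionary field names and the sample addresses are hypothetical; a plain `dict` stands in for the CHT, and the time-localization of sessions is ignored for brevity.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profile(flows):
    """Aggregate 5-tuple flows into sessions keyed by sid (source port dropped)."""
    per_session = defaultdict(list)
    for flow in flows:
        sid = (flow["srcIP"], flow["dstIP"], flow["dstPort"],
               flow["protocol"], flow["dir"])
        per_session[sid].append(flow)

    profile = {}
    for sid, session_flows in per_session.items():
        counts = [f["packets"] for f in session_flows]
        sizes = [f["bytes"] for f in session_flows]
        profile[sid] = {
            "count_mean": mean(counts), "count_std": pstdev(counts),
            "size_mean": mean(sizes), "size_std": pstdev(sizes),
        }
    return profile


# Two HTTPS connections that differ only in source port form one session.
flows = [
    {"srcIP": "10.0.0.5", "dstIP": "52.1.2.3", "srcPort": 40001, "dstPort": 443,
     "protocol": "TCP", "dir": "Out", "packets": 10, "bytes": 1000},
    {"srcIP": "10.0.0.5", "dstIP": "52.1.2.3", "srcPort": 40002, "dstPort": 443,
     "protocol": "TCP", "dir": "Out", "packets": 14, "bytes": 1400},
]
profile_table = build_profile(flows)
```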

TABLE 1

TABLE 2

Anomaly Detection at Gateways

Once the device profiles for the IoT devices have been generated and stored in CHTs, the gateways are then configured to detect connection and behavioural anomalies occurring in the network traffic.

In embodiments of the invention, a connection to a new external destination or the use of a new application (port) not found in the profile table would be treated as a connection anomaly. Specifically, such a new connection will not have an index in the profile table P^d for the specific IoT device d. In order for the gateway to detect such anomalies, for each packet corresponding to device d, the gateway will extract the values for the sid fields (srcIP, dstIP, dstPort, and Protocol from the header; Dir from the direction of the first packet of the flow), and perform a “lookup” on the profile table P^d with sid as the key. If the lookup fails, the connection is marked as a connection anomaly. All such connection anomalies for the device d are then stored in a CHT C^d; unlike the CHT P^d, the CHT C^d is indexed by the 5-tuple fid. These connection anomalies are used to generate alerts, whereby each alert corresponds to a flow. For completeness, the attributes of an alert may comprise srcIP, dstIP, Protocol, srcPort, dstPort, Dir, ingress and egress packet counts and connection sizes (in bytes). It is useful to note that the packet count and connection size attributes may form a new feature called sizeBin, which discretizes the average size of the connection. Some of the attributes, such as packet count and connection size, can then be used to extract behavioural information of users of the IoT device.
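The lookup-and-record step can be sketched as follows. This is an illustrative reading, not the gateway code: plain dictionaries stand in for the CHTs P^d and C^d, packets are assumed to be already oriented to their flow, ingress and egress counters are merged into single totals, and the `size_bin` thresholds (including the "Large" category) are assumed values chosen only for the example.

```python
def size_bin(total_bytes, total_packets, small_max=100, medium_max=1000):
    """Discretize average bytes per packet; both thresholds are assumptions."""
    avg = total_bytes / max(total_packets, 1)
    if avg <= small_max:
        return "Small"
    if avg <= medium_max:
        return "Medium"
    return "Large"


def observe_packet(pkt, profile, anomalies):
    """Return True and record the flow if pkt matches no profiled session."""
    sid = (pkt["srcIP"], pkt["dstIP"], pkt["dstPort"], pkt["protocol"], pkt["dir"])
    if sid in profile:  # lookup on P^d succeeded: no connection anomaly
        return False
    fid = (pkt["srcIP"], pkt["dstIP"], pkt["srcPort"], pkt["dstPort"], pkt["protocol"])
    # C^d is keyed by the 5-tuple fid; Dir is fixed by the flow's first packet.
    flow = anomalies.setdefault(fid, {"dir": pkt["dir"], "packets": 0, "bytes": 0})
    flow["packets"] += 1
    flow["bytes"] += pkt["length"]
    return True


def make_alert(fid, flow):
    """Build one alert per anomalous flow: 5-tuple fields, Dir, then sizeBin."""
    src_ip, dst_ip, src_port, dst_port, protocol = fid
    return (src_ip, dst_ip, protocol, src_port, dst_port, flow["dir"],
            size_bin(flow["bytes"], flow["packets"]))
```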

Conversely, a connection may find a matching key in the profile table CHT P^d (implying there is no connection anomaly) and may instead comprise a statistical anomaly. For example, an attacker might have compromised a device and caused the data exfiltration service to be hosted on the same cloud platform which hosts the device’s application. Another example is that of a compromised device not sending critical readings. This category of anomalies consists of flows that already map to a valid key in the profile table CHT P^d, yet deviate from their “normal” behaviour. To detect such statistical or behavioural anomalies, each gateway is configured to maintain a CHT B^d for an IoT device d, whereby this CHT B^d includes all the active 5-tuple flows that had at least one packet in the last t seconds (t is a configurable parameter). Once a flow completes or becomes inactive (meaning no packet was transmitted in the last t seconds), the gateway uses a z-score to compare the flow features (sizes of the flow in bytes and packets as well as flow duration) to the corresponding session features in CHT P^d. If the z-score is found to be high, the flow is deemed a statistical anomaly, and an alert will be generated. Thus, the flows (and the corresponding anomalies) are processed immediately after they complete or become inactive, thereby making it easy to maintain the CHT B^d.

Each gateway is configured to temporarily store anomalies in tables B and C (the superscripts are dropped for readability). A gateway g is then configured to process alerts stored in both tables at the end of a predefined interval (the interval length can be different for different gateways). Subsequently, the corresponding entries in both hash tables are cleared.
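A minimal sketch of the z-score comparison is given below. The text only says the z-score must be "high", so the threshold of 3.0, the feature names and the layout of `session_stats` are assumptions made for illustration.

```python
FEATURES = ("bytes", "packets", "duration")


def z_score(value, mean, std, eps=1e-9):
    """Absolute z-score; eps guards against a zero standard deviation."""
    return abs(value - mean) / max(std, eps)


def is_statistical_anomaly(flow, session_stats, threshold=3.0):
    """Flag a completed flow if any feature deviates strongly from its session."""
    return any(
        z_score(flow[f], session_stats[f]["mean"], session_stats[f]["std"]) > threshold
        for f in FEATURES
    )


# Hypothetical session statistics as they might be stored in the profile table.
session_stats = {
    "bytes": {"mean": 1200.0, "std": 200.0},
    "packets": {"mean": 12.0, "std": 2.0},
    "duration": {"mean": 1.5, "std": 0.5},
}
normal_flow = {"bytes": 1300, "packets": 13, "duration": 1.4}
exfil_flow = {"bytes": 50000, "packets": 40, "duration": 9.0}
```

Under these assumed statistics, `normal_flow` stays within half a standard deviation on every feature, while `exfil_flow` exceeds the threshold on all three.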

Examples of alerts generated by the gateways are illustrated in Figure 4, where the first six fields of alerts 400 correspond to the 5-tuple fid and direction Dir, while the last column of alerts 400 sets out the size of the flow in discrete categories.

Pattern Extraction

The alerts are then communicated, via secure web sockets, from each of the gateways to computer server 102. In embodiments of the invention, server 102 is configured to receive alerts from a gateway at every interval of length L_g, where g defines a gateway (the interval can be different for different gateways).

It is useful to note at this stage that not all the alerts received at server 102 are caused by attack stages. This is because most of the gateways are configured to generate alerts when a deviation from normal network activity is captured. In other words, there exists the possibility that the gateways may generate alerts for some benign transactions. This can happen for a number of reasons, including network errors and false positives resulting from changes in the behaviour of the IoT devices (e.g. due to firmware updates), applications or even users. Alerts that are not related to attack-stages are referred to in this document as false alerts or noise.

In general, the number of false alerts received by server 102 is expected to be low, and false alerts should not occur frequently across multiple networks (no spatial correlation). Unlike false alerts, alerts relating to actual attacks would be persistent across both time and space (networks). Therefore, server 102 exploits this persistency to mine for attack patterns while automatically filtering out false alerts or noise.

In embodiments of the invention, a data mining module provided within server 102 is configured to utilize the Frequent Itemset Mining (FIM) data mining technique to extract recurring patterns across a given set of alerts. In FIM, each field of an alert is identified as an item, and a set of k items is called a k-itemset, where k also represents the length of the itemset.

For example, as illustrated in Figure 4, each alert comprises seven items, whereas the initial five mined itemsets (in extracted itemsets 410) have five items (k = 5) and the last itemset (in extracted itemsets 410) has six items (k = 6). Hence, given a list of n alerts, an itemset (i.e. a pattern) is called a frequent itemset if it appears in at least θ × n alerts, where θ, with 0 ≤ θ ≤ 1, is defined as the “minimum support”, i.e. the detection threshold. Therefore, the primary function of FIM is to mine frequent itemsets in an alert database. It should be noted that the terms “itemset” and “pattern” may be used interchangeably in the FIM data mining technique.
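The frequent-itemset condition can be checked directly, as in the sketch below. The alert values are made up for illustration (they are not the contents of Figure 4), and an item is represented as a (field, value) pair so that, for example, a source port and a destination port with the same number remain distinct items; that representation is an assumption.

```python
FIELDS = ("srcIP", "dstIP", "protocol", "srcPort", "dstPort", "dir", "sizeBin")

# Illustrative alerts only; each alert has seven items, one per field.
alerts = [dict(zip(FIELDS, row)) for row in [
    ("203.0.113.4", "10.6.2.7", "TCP", 51234, 22, "In", "Small"),
    ("198.51.100.8", "10.6.2.7", "TCP", 40211, 22, "In", "Small"),
    ("10.6.2.9", "192.0.2.50", "UDP", 5353, 123, "Out", "Medium"),
    ("10.6.3.1", "192.0.2.77", "TCP", 44100, 443, "Out", "Large"),
]]


def support_count(itemset, alerts):
    """Number of alerts that contain every (field, value) item of the itemset."""
    return sum(1 for a in alerts if all(a[f] == v for f, v in itemset))


def is_frequent(itemset, alerts, theta):
    """An itemset is frequent if it appears in at least theta * n alerts."""
    return support_count(itemset, alerts) >= theta * len(alerts)


# A 5-itemset (k = 5): source IP and source port are left as wildcards.
ssh_scan = {("dstIP", "10.6.2.7"), ("protocol", "TCP"), ("dstPort", 22),
            ("dir", "In"), ("sizeBin", "Small")}
```

With these four alerts, `ssh_scan` has a support count of 2, so it is frequent for θ = 0.5 (threshold 2 alerts) but not for θ = 0.75 (threshold 3 alerts).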

Various algorithms exist in the art that may be used to execute the FIM technique, including, but not limited to, Apriori, Charm, and FPMax. The complete set of Frequent Itemsets (FI), which is also called the lattice, contains all the possible patterns that exist in a transaction database, and may be mined using a fundamental FIM algorithm such as Apriori, which uses a bottom-up approach where the generation of the lattice starts from length 1 and proceeds to expand to higher lengths under the minimum support constraint. Though the complete lattice provides a comprehensive overview of all the patterns, the number of patterns can be very high and, more importantly, lower length itemsets are usually subsets of higher length itemsets and are thus redundant. For instance, in the case of the lattice, there would be redundant lower length patterns such as «^*, 10.6.2.7, ^*, ^*, 22, ^*, Small» and «^*, ^*, TCP, ^*, 22, In, ^*» (where ^* is a wildcard, i.e. a collection of multiple values) in addition to the higher length pattern «^*, 10.6.2.7, TCP, ^*, 22, In, Small», among the mined patterns listed in extracted itemsets 410.

Moreover, the complexity associated with the generation of the lattice can be high, and may increase to O(η × 2^{η−1}), where η is the total number of itemsets. As an alternative, subsets of the lattice may be generated in the form of Closed Frequent Itemsets (CFI) and Maximal Frequent Itemsets (MFI), where the itemsets in the former do not have supersets with the same support, while the itemsets in the latter do not have supersets which are frequent. CFIs can be mined using algorithms such as Charm, whereas FPMax can be used to mine MFIs. Both CFI and MFI have significantly fewer itemsets than the lattice, while MFI is itself a subset of CFI. In terms of information content, the MFI is more informative per pattern, as its patterns are generally of higher length while being fewer in number. Moreover, in general, noise alerts such as random scans may also form lower length patterns, whereas alerts relating to attacks tend to be more consistent and as such have a tendency to form higher length itemsets.
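The relationship between the lattice, the CFI and the MFI can be made concrete with a small brute-force sketch. It enumerates every subset of every transaction, so it is only suitable for toy inputs and is not a substitute for Apriori, Charm or FPMax; the function names and the four transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force lattice: count every non-empty subset of every transaction."""
    counts = {}
    for items in transactions:
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of the same support."""
    return {s for s, c in freq.items()
            if not any(s < t and freq[t] == c for t in freq)}


def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in freq if not any(s < t for t in freq)}


port22 = ("dstPort", "22")
tcp = ("protocol", "TCP")
victim = ("dstIP", "10.6.2.7")
udp = ("protocol", "UDP")

transactions = [
    frozenset({port22, tcp, victim}),
    frozenset({port22, tcp, victim}),
    frozenset({port22, tcp}),
    frozenset({udp}),
]
lattice = frequent_itemsets(transactions, min_count=2)
cfi = closed_itemsets(lattice)
mfi = maximal_itemsets(lattice)
```

For this input the lattice holds seven frequent itemsets, the CFI keeps two of them ({port 22, TCP} with support 3 and the full three-item set with support 2), and the MFI keeps only the three-item set, which illustrates that the MFI is a subset of the CFI.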

This is illustrated in Figure 4, which shows alert database 400 comprising incoming alerts that were mined for MFI, where the results are shown as extracted itemsets 410. As shown, the alerts relating to attack-stages, i.e. alerts 402, tend to form meaningful, higher length patterns when mined for MFI, while false alerts, i.e. alerts 404, tend not to exhibit multiple common items and thus are not extracted as patterns with MFI. For example, when mined with a minimum support count of 2, alerts #1 and #2 (in alert database 400) form a pattern «^*, 10.6.2.7, TCP, ^*, 22, In, Small», which can be read as host 10.6.2.7 being scanned by multiple sources over the SSH port (22). Similarly, alerts #5 and #6 form a pattern «^*, cnc.com, TCP, ^*, 48000, Out, Medium», which can be interpreted as multiple IoT devices attempting to connect with “cnc.com”. However, in the case of false alerts, i.e. alerts 404, a clear pattern is not visible when mined. As can be seen, Protocol and Dir are the only frequently occurring items; but this itemset is of a very small length and will be a subset of higher length itemsets. Moreover, when the value of the minimum support θ is increased, minimally correlated false alerts that have the potential to be patterns will be filtered out.

In order for coordinated attacks to be detected as early as possible, the aggregation of alerts and attack detection are divided into multiple time-windows, rather than aggregating alerts over a long duration and carrying out detection in a passive and network forensic manner. When the attack detection is executed across multiple time-windows, it enables server 102 to take temporal correlation into account as well. The length of the time-window will influence the amount of time required to detect attacks and as such, is kept as a configurable and/or optimizable parameter. As mentioned in the description above, selecting an appropriate value for the minimum support θ is important as it allows noise to be removed while ensuring that attack patterns are extracted from the received alerts. If the value for the minimum support θ is set too low, i.e. when the detection threshold is set too low, false alerts may be extracted as patterns (false positives); and if it is set to a value that is too high, some of the attack-patterns which are not too frequent, such as patterns related to the malware loader stage, may not be extracted.

To address this problem, a detection algorithm (which is set out below as Algorithm 1) is utilized by server 102.

Algorithm 1 Pattern search at time-slot τ with look-back

In operation, Algorithm 1 dynamically adapts the minimum support θ depending on the intensity of the attack across various intervals or time-slots, while minimizing the number of false positives. This implies that the value of minimum support θ varies from one time-slot to the next. The algorithm is based on two main aspects, namely the careful handling of minimum support θ to minimize the mining of false alerts, and the scrutiny of temporal correlation in attacks to search for less frequent patterns. In order to balance the trade-off in minimum support θ, the algorithm starts by initializing the minimum support θ to its upper bound (line 2), and lowers it iteratively until any pattern is detected (line 4). To detect any other correlated attack-stages, the algorithm then explores more patterns with a lower minimum support θ. The challenge here is that lowering the minimum support θ will result in the extraction of false alerts as patterns, as discussed above. To address this issue, the alert database is filtered based on the items of the detected patterns in the current time-slot (line 8). In embodiments of the invention, the items of the detected patterns may include, but are not limited to, the srcIP, dstIP, Protocol, srcPort, dstPort, Dir and sizeBin associated with the alert. These exemplary items are illustrated in table 410 of Figure 4. In other words, the alert database of previous time-slots is initially pruned by filtering out alerts that do not have one or more of the items found in the detected patterns in the current time-slot. For example, alerts of the previous time-slot that do not have similar IP addresses of the detected patterns of the current time-slot will be filtered out.

Note that this step is advantageous as it makes the mining step more computationally efficient. This conditional mining (line 9) is carried out on pruned alerts of not just the current time-slot, but also of the previous Tw time-slots (lines 5-12), thereby considering correlation across time. In other words, the number of time-slots “looked back” by the algorithm is defined by the Tw parameter and the value accorded to this parameter may be defined by one skilled in the art as required.

Essentially, the algorithm does a “look-back in time” (i.e. over previous time-slots) for more patterns conditioned on the patterns (i.e. based on the items associated with the patterns) detected in the current time-slot. This mechanism enables server 102 to detect even less frequently occurring attack-stages such as malware loader, which usually gets mixed with noise due to its limited occurrence. Moreover, to avoid the minimum support θ being lowered to a very small value (and thereby increasing the false alert patterns), it is increased by a constant (line 13) when shifted through to the next time-slot.
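The behaviour described for Algorithm 1 can be approximated by the sketch below. It is one possible reading of the prose, not a reproduction of the algorithm listing: the brute-force `mine_maximal` stands in for an MFI miner such as FPMax, pruning is restricted to IP address items (following the IP address example in the text), a pattern must be backed by at least two alerts, and every parameter name and default value (`theta_min`, `step`, `theta_low`, `t_w`, `bump`, `theta_max`, `link_fields`) is an assumption. Each alert is a frozenset of (field, value) items.

```python
import math
from itertools import combinations


def mine_maximal(alerts, min_count):
    """Brute-force maximal frequent itemsets (toy stand-in for FPMax)."""
    counts = {}
    for alert in alerts:
        for k in range(1, len(alert) + 1):
            for combo in combinations(alert, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= min_count}
    return {s for s in frequent if not any(s < t for t in frequent)}


def min_count(theta, n):
    """Support count for theta over n alerts; at least two alerts per pattern."""
    return max(2, math.ceil(theta * n - 1e-9))


def pattern_search(slots, tau, theta, theta_min=0.3, step=0.1, theta_low=0.5,
                   t_w=1, bump=0.1, theta_max=0.9,
                   link_fields=("srcIP", "dstIP")):
    """Search time-slot tau, then look back over t_w earlier slots."""
    current = slots[tau]
    patterns = set()
    # Lower theta step by step until the current slot yields a pattern.
    while current and theta >= theta_min - 1e-9:
        patterns = mine_maximal(current, min_count(theta, len(current)))
        if patterns:
            break
        theta = round(theta - step, 6)
    if patterns:
        # Items used to prune the alert databases (IP addresses only here).
        link_items = {item for p in patterns for item in p
                      if item[0] in link_fields}
        for s in range(max(0, tau - t_w), tau + 1):
            pruned = [a for a in slots[s] if a & link_items]
            if pruned:
                # Conditional mining with a lower support on the pruned alerts.
                patterns |= mine_maximal(pruned, min_count(theta_low, len(pruned)))
    # Raise theta by a constant before moving to the next time-slot.
    next_theta = min(theta_max, round(max(theta, theta_min) + bump, 6))
    return patterns, next_theta
```

In this reading, an infrequent loader-like pattern in an earlier slot can surface only after a dominant pattern in the current slot supplies the IP addresses used for pruning, which mirrors the look-back idea in the text.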

Classification Model

Once the aggregated alerts have been mined for patterns as described above, the final step is to attribute these patterns to the respective attack stages. Unfortunately, as the number of patterns (data samples) is typically quite low (especially due to the mining of MFIs), it is difficult to train an effective machine learning model to perform attack-stage classification at the pattern-level. Therefore, the extracted patterns are subsequently expanded by server 102 into their associated original raw alerts, and these alerts are then used by a pre-trained classifier to classify the attack stages. In embodiments of the invention, the raw alerts and their associated patterns may be used to extract alert-level and pattern-level features.

Algorithm 2 Classification steps

Alert-Level Features

The alert-level features may be divided into the following features.

1. Source and Destination IP Addresses: IP addresses on their own do not provide useful information to a trained model when the model is applied in a different domain. Hence, server 102 converts these IP addresses into categorical features indicating whether an IP address is internal or external. If neither of the IP addresses in an alert belongs to the IoT network, it is likely that the affected IP address may be linked to a spoofing situation, and therefore, possibly related to reflective type attacks.

2. Direction: This is a binary feature representing the direction of the flow. It is used to differentiate attacks coming to the IoT network (e.g., dictionary attacks) and attacks originating from the IoT network (e.g., C&C activities).

3. Protocol: This is again a categorical feature and represents the underlying protocol used in the flow, primarily UDP or TCP. It can be useful in differentiating UDP-based volumetric attacks from TCP-based flooding attacks.
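The three alert-level features listed above might be encoded as in the following sketch. The function name, the output feature names and the example network prefix are hypothetical, and treating a hostname (such as the “cnc.com” destination seen in the example patterns) as external is an assumption.

```python
import ipaddress


def alert_level_features(alert, iot_networks):
    """Encode IP locality, direction and protocol of one alert as 0/1 features."""
    networks = [ipaddress.ip_network(n) for n in iot_networks]

    def is_internal(value):
        try:
            address = ipaddress.ip_address(value)
        except ValueError:
            return False  # hostnames are treated as external (assumption)
        return any(address in net for net in networks)

    src_internal = is_internal(alert["srcIP"])
    dst_internal = is_internal(alert["dstIP"])
    return {
        "src_internal": int(src_internal),
        "dst_internal": int(dst_internal),
        # Neither endpoint inside the IoT network hints at spoofing.
        "both_external": int(not src_internal and not dst_internal),
        "dir_in": int(alert["dir"] == "In"),
        "proto_tcp": int(alert["protocol"] == "TCP"),
        "proto_udp": int(alert["protocol"] == "UDP"),
    }


# Hypothetical inbound SSH alert against a device in 10.6.0.0/16.
features = alert_level_features(
    {"srcIP": "203.0.113.7", "dstIP": "10.6.2.7", "protocol": "TCP", "dir": "In"},
    iot_networks=["10.6.0.0/16"],
)
```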

Pattern-Level Features

In general, a pattern comprises a collection of alerts and contains additional information and may be categorized as follows:

1. IP and Port orientations: These are categorical features and represent the relationship between the source and the destination in terms of connection size. There are four levels. When a single source entity 502 (IP address or port number) connects with multiple destination entities 501, it is defined as a source oriented connection as shown in Figure 5. In contrast, when multiple source entities 524 connect with a single destination entity 522, it is defined as a destination oriented connection as shown in Figure 5. The two other orientations are the “one to one” connection and the “many to many” connections as in bipartite graph relationships. These features play a crucial role in capturing spatial dispersion and identifying attack-stages such as scans, login attempts, C&C communications and DDoS attacks. For instance, in a scan out phase, a compromised IoT device will scan multiple other IoT devices in the same network and on the Internet, and thus, a source-oriented pattern will emerge. In contrast, during a login stage, multiple attempts will be made on the telnet/SSH ports by multiple random source ports, generating a destination-oriented pattern.

2. Average packet size (inward and outward): This feature is developed by dividing the total connection size in the respective pattern by the corresponding total packet count. The size bin may be expanded into original packet counts and sizes. If the connection size and packet count are used as separate features, their values will depend on the number of alerts that formed the pattern and can vary according to network size and attack intensity, resulting in overfitting of the model. Therefore, in order to minimize these effects, the total connection size is normalized by dividing it by the total packet count. It is useful to note that the average packet size is an indicator of the size of an attack-stage and is very useful in isolating attack-stages such as malware upload. Moreover, it is useful in differentiating volumetric attacks such as DDoS from scans, in which the former exhibits a higher average packet size, while packet sizes of the latter tend to be small.

3. Support: This is a numerical feature and indicates the number of alerts in the respective pattern. When there is a surge of similar alerts in a limited time span, as in the case of a DDoS attack, it may generate dominant patterns with a high support value. On the other hand, during the malware upload stage, the number of alerts generated by the gateway is usually very small, and thus the pattern has a lower support. Having a lower support is in fact a challenge at the FIM phase, because a higher minimum support might filter out the loader phase.

4. Source to Destination and Destination to Source ratios: These are also numerical features; the former indicates the average number of destinations a source node connects to, while the latter represents the average number of source nodes that connect to a single destination, in the corresponding pattern. In the scan in phase, a bot generally scans multiple IoT devices and thus, the value of the former will be high in the corresponding patterns. In attack-stages such as DDoS, as multiple compromised IoT devices attack a single victim node, the value of the latter will be high.

5. Port per IP: This feature captures the average number of ports a source IP address interacts with. It is useful in identifying the port scan stage, as in this stage, bots generally scan a wide range of ports for vulnerabilities. In “scan in” and “login” stages, which are also towards the IoT network, the interaction is with a particular port number.

6. Unique attributes in pattern: A pattern can be formed with unique attribute values and also with wild cards. For instance, a scan in pattern with 5-tuples would look like «^*, IP, TCP, ^*, 22, In, Small», where both source IP address and port number are wildcards. For port scan, a pattern similar to «^*, IP, TCP, ^*, ^*, In, Small» can be expected. In the above two instances, the numbers of unique attributes are five and four, respectively, and can be a useful indicator to differentiate the two corresponding stages.

7. Unique entities in wild cards: As mentioned above, a pattern may consist of definite values and wildcards. Again, a wild card is composed of multiple unique entities from multiple alerts and it provides useful insights in differentiating attack-stages. For instance, a port scan pattern may consist of multiple unique destination ports. Similarly, scan in, scan out, and DDoS attacks may form wildcards in the source IP address with multiple unique IP addresses, yet different in values.
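The pattern-level features enumerated above could be derived from a pattern and the raw alerts behind it roughly as follows. This is a simplified sketch: inward and outward packet sizes are merged into one average, the alert list is assumed to be non-empty, wildcards are represented by `None`, and all function and key names are hypothetical.

```python
from collections import defaultdict


def orientation(n_sources, n_destinations):
    """Classify the source/destination relationship into one of four levels."""
    if n_sources == 1 and n_destinations == 1:
        return "one-to-one"
    if n_sources == 1:
        return "source-oriented"
    if n_destinations == 1:
        return "destination-oriented"
    return "many-to-many"


def avg_fanout(alerts, key_field, value_field):
    """Average number of distinct value_field entries per key_field entry."""
    groups = defaultdict(set)
    for alert in alerts:
        groups[alert[key_field]].add(alert[value_field])
    return sum(len(values) for values in groups.values()) / len(groups)


def distinct(alerts, field):
    return len({alert[field] for alert in alerts})


def pattern_level_features(pattern, alerts):
    """pattern maps each field to a value, or to None for a wildcard."""
    wildcards = [field for field, value in pattern.items() if value is None]
    total_bytes = sum(a["bytes"] for a in alerts)
    total_packets = sum(a["packets"] for a in alerts)
    return {
        "ip_orientation": orientation(distinct(alerts, "srcIP"),
                                      distinct(alerts, "dstIP")),
        "port_orientation": orientation(distinct(alerts, "srcPort"),
                                        distinct(alerts, "dstPort")),
        "avg_packet_size": total_bytes / max(total_packets, 1),
        "support": len(alerts),
        "src_to_dst": avg_fanout(alerts, "srcIP", "dstIP"),
        "dst_to_src": avg_fanout(alerts, "dstIP", "srcIP"),
        "ports_per_ip": avg_fanout(alerts, "srcIP", "dstPort"),
        "unique_attributes": len(pattern) - len(wildcards),
        "wildcard_entities": {f: distinct(alerts, f) for f in wildcards},
    }
```

For a scan-in style pattern such as «^*, IP, TCP, ^*, 22, In, Small», this sketch would report five unique attributes and a destination-oriented IP orientation whenever several sources target one device.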

Numerous other changes, substitutions, variations and modifications may be ascertained by those skilled in the art and it is intended that the present invention encompass all such changes, substitutions, variations and modifications as falling within the scope of the appended claims.

Claims

CLAIMS:

1. A computer server in an Internet-of-Things (IoT) network for identifying individual attack-stages in network traffic of the IoT network, the computer server comprising: a security module being configured to: receive alerts selectively generated by gateways provided within the IoT network; extract, using a data mining module, patterns from the received alerts, wherein the data mining module is configured to extract the patterns by processing and correlating the received alerts in a spatial and temporal manner; obtain alert-level and pattern-level features associated with the extracted patterns; train a supervised machine learning model using the obtained alert-level and pattern-level features; and identify, using the trained supervised machine learning model, individual attack stages from the received alerts.

2. The computer server according to claim 1 wherein the selective generating of the alerts by the gateways comprises: each gateway within the IoT network being configured to: create and store at least one device profile for each IoT device communicatively connected to the gateway; identify, based on the profiles of each of the IoT devices, anomalies in network traffic exchanged between each of the IoT devices and the gateway; and generate alerts based on the identified anomalies in the network traffic exchanged between each of the IoT devices and the gateway.

3. The computer server according to claim 2 wherein the creating and storing of the at least one device profile for each of the IoT devices comprises: the gateway being configured to: process network traffic flows at each of the IoT devices to generate a profile table P^d for each of the IoT devices where d is defined as an identifier of an IoT device; store the generated profile tables in a Cuckoo Hash Table (CHT) P^d whereby the CHT P^d is indexed using a session identifier, sid, comprising source and destination Internet Protocol (IP) addresses, destination ports, protocols and directions of connections in a session.

4. The computer server according to any one of claims 2 or 3 wherein the identifying the anomalies in the network traffic exchanged between each of the IoT devices and the gateway comprises: the gateway being configured to: for each of the IoT devices, identify a connection at the IoT device as a connection anomaly, and add the connection anomaly to a Cuckoo Hash Table (CHT) C^d when it is determined that the connection is not indexed in the device profile of the IoT device whereby the CHT C^d is indexed using a 5-tuple flow identifier, fid, comprising source and destination IP addresses, source and destination ports and protocols; and identify a connection at the IoT device as a behavioural anomaly, and add the behavioural anomaly to a CHT B^d when it is determined that the connection is indexed in the device profile of the IoT device and has flow features that deviate from flow features defined in the device profile of the IoT device.

5. The computer server according to claim 1 wherein the extracting of the patterns from the received alerts comprises: the data mining module being configured to: aggregate, during a pre-defined time period, the received alerts from different gateways into a first group; and extract from the first group, using a data mining technique with a first detection threshold, a first patterns of fields of alerts.

6. The computer server according to claim 5 wherein the extracting of the patterns from the received alerts further comprises: the data mining module being further configured to: aggregate, during a subsequent pre-defined time period, the received alerts from the different gateways into a second group; extract from the second group, using the data mining technique with a second detection threshold, subsequent patterns of fields of alerts; retrieve items associated with the extracted subsequent patterns of fields of alerts; filter out alerts in the first group that are not associated with the retrieved items; extract from the filtered first group patterns of fields of alerts, using the data mining technique with a third detection threshold, additional first patterns of fields of alerts, wherein the third detection threshold has a lower value than the first detection threshold.

7. The computer server according to any one of claims 5 or 6 whereby the data mining technique comprises a Frequent Itemset Mining (FIM) technique and the first, second and third detection thresholds comprise minimum support values.

8. The computer server according to claim 1 wherein the obtaining the alert-level features comprises: the security module being configured to: expand the extracted patterns into their associated alerts; generate alert-level features based on the source or destination Internet Protocol (IP) addresses associated with the alerts, the direction of flow associated with the alerts and protocols associated with the alerts.

9. The computer server according to claim 1 wherein the pattern-level features comprise:

Internet Protocol (IP) and port orientations of alerts associated with the extracted patterns; average packet sizes of alerts associated with the extracted patterns; a support value used to indicate a number of alerts in each extracted pattern; source-to-destination and destination-to-source ratios of alerts associated with the extracted patterns; average number of unique ports accessed by alerts associated with the extracted patterns; unique attributes of alerts associated with the extracted patterns; and unique entities of alerts associated with the extracted patterns.

10. A method for identifying individual attack-stages in network traffic of an Internet-of-Things (IoT) network using a computer server in the IoT network comprises: receiving alerts selectively generated by gateways provided within the IoT network; extracting, using a data mining module provided within the computer server, patterns from the received alerts, wherein the data mining module extracts the patterns by processing and correlating the received alerts in a spatial and temporal manner; obtaining alert-level and pattern-level features associated with the extracted patterns; training a supervised machine learning model using the obtained alert-level and pattern-level features; and identifying, using the trained supervised machine learning model, individual attack stages from the received alerts.

11. The method according to claim 10 wherein the selective generating of the alerts by the gateways comprises the steps of: for each gateway within the IoT network, creating and storing at least one device profile for each IoT device communicatively connected to the gateway; identifying, based on the profiles of each of the IoT devices, anomalies in network traffic exchanged between each of the IoT devices and the gateway; and generating alerts based on the identified anomalies in the network traffic exchanged between each of the IoT devices and the gateway.

12. The method according to claim 11 wherein the creating and storing of the at least one device profile for each of the IoT devices comprises: for each of the gateways, processing network traffic flows at each of the IoT devices to generate a profile table P^d for each of the IoT devices where d is defined as an identifier of an IoT device; storing the generated profile tables in a Cuckoo Hash Table (CHT) P^d whereby the CHT P^d is indexed using a session identifier, sid, comprising source and destination Internet Protocol (IP) addresses, destination ports, protocols and directions of connections in a session.

13. The method according to any one of claims 11 or 12 wherein the identifying of the anomalies in the network traffic exchanged between each of the IoT devices and the gateway comprises: for each of the IoT devices in each of the gateways, identifying a connection at the IoT device as a connection anomaly, and adding the connection anomaly to a Cuckoo Hash Table (CHT) C when it is determined that the connection is not indexed in the device profile of the IoT device, whereby the CHT P is indexed using a 5-tuple flow identifier, fid, comprising source and destination IP addresses, source and destination ports and protocols; and identifying a connection at the IoT device as a behavioural anomaly, and adding the behavioural anomaly to a CHT P when it is determined that the connection is indexed in the device profile of the IoT device and has flow features that deviate from flow features defined in the device profile of the IoT device.
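
By way of illustration only, the gateway-side check of claims 12 and 13 can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: plain Python dictionaries stand in for the Cuckoo Hash Tables, the only flow feature is an average packet size, the deviation test is a simple range check, and the behavioural-anomaly table is keyed by fid. The names `Flow`, `sid_of`, `fid_of` and `classify` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    direction: str        # assumed encoding: "out" or "in" relative to the device
    avg_pkt_size: float   # assumed single flow feature


Sid = Tuple[str, str, int, str, str]
Fid = Tuple[str, str, int, int, str]


def sid_of(f: Flow) -> Sid:
    # Session identifier: IP addresses, destination port, protocol, direction.
    return (f.src_ip, f.dst_ip, f.dst_port, f.proto, f.direction)


def fid_of(f: Flow) -> Fid:
    # 5-tuple flow identifier.
    return (f.src_ip, f.dst_ip, f.src_port, f.dst_port, f.proto)


def classify(
    flow: Flow,
    profile: Dict[Sid, Tuple[float, float]],  # sid -> assumed (min, max) packet size
    connection_anomalies: Dict[Fid, Flow],    # dict standing in for a CHT
    behavioural_anomalies: Dict[Fid, Flow],   # dict standing in for a CHT
) -> Optional[str]:
    sid = sid_of(flow)
    if sid not in profile:
        # Connection is not indexed in the device profile.
        connection_anomalies[fid_of(flow)] = flow
        return "connection"
    low, high = profile[sid]
    if not (low <= flow.avg_pkt_size <= high):
        # Connection is profiled, but its features deviate from the profile.
        behavioural_anomalies[fid_of(flow)] = flow
        return "behavioural"
    return None


# Usage with made-up addresses: one profiled session for a single device.
profile = {("10.0.0.5", "52.1.1.1", 443, "tcp", "out"): (100.0, 600.0)}
conn, behav = {}, {}
normal = Flow("10.0.0.5", "52.1.1.1", 50000, 443, "tcp", "out", 300.0)
large = Flow("10.0.0.5", "52.1.1.1", 50001, 443, "tcp", "out", 1400.0)
telnet = Flow("10.0.0.5", "10.0.0.9", 50002, 23, "tcp", "out", 60.0)
results = [classify(f, profile, conn, behav) for f in (normal, large, telnet)]
```

Keying the profile by sid rather than fid means that a change of ephemeral source port alone does not make a profiled session look new, while each anomalous flow is still recorded individually under its full 5-tuple.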

14. The method according to claim 10 wherein the extracting of the patterns from the received alerts comprises: aggregating, using the data mining module, during a pre-defined time period, the received alerts from different gateways into a first group; and extracting from the first group, using a data mining technique with a first detection threshold, first patterns of fields of alerts.

15. The method according to claim 14 wherein the extracting of the patterns from the received alerts further comprises: aggregating, using the data mining module, during a subsequent pre-defined time period, the received alerts from the different gateways into a second group; extracting from the second group, using the data mining technique with a second detection threshold, subsequent patterns of fields of alerts; retrieving items associated with the extracted subsequent patterns of fields of alerts; filtering out alerts in the first group that are not associated with the retrieved items; and extracting from the filtered first group, using the data mining technique with a third detection threshold, additional first patterns of fields of alerts, wherein the third detection threshold has a lower value than the first detection threshold.

16. The method according to any one of claims 14 or 15 whereby the data mining technique comprises a Frequent Itemset Mining (FIM) technique and the first, second and third detection thresholds comprise minimum support values.
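
By way of illustration only, the two-window extraction of claims 14 to 16 can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: each alert is a dictionary of field names to values, an item is a (field, value) pair, frequent itemsets are found by brute-force enumeration (practical only for alerts with a handful of fields), and an alert is treated as "associated with" the retrieved items if it contains at least one of them. The function names and the sample alerts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(alerts, min_support):
    """Return {itemset: support} for itemsets seen in at least min_support alerts."""
    counts = Counter()
    for alert in alerts:
        items = list(alert.items())
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def mine_with_feedback(first_group, second_group, t1, t2, t3):
    """Mine both groups, then re-mine the first group at a lower threshold."""
    if not t3 < t1:
        raise ValueError("third threshold must be lower than the first")
    first = frequent_itemsets(first_group, t1)
    subsequent = frequent_itemsets(second_group, t2)
    # Items that occur in any pattern found in the later time period.
    retrieved = set().union(*subsequent)
    # Keep only first-group alerts that share an item with those patterns.
    filtered = [a for a in first_group if retrieved & set(a.items())]
    additional = {
        s: c for s, c in frequent_itemsets(filtered, t3).items() if s not in first
    }
    return first, subsequent, additional


# Usage with made-up alerts: source "B" is too quiet in the first period to
# reach support 3, but becomes frequent in the second period.
first_group = [{"src": "A", "dst_port": 23}] * 3 + [{"src": "B", "dst_port": 80}] * 2
second_group = [{"src": "B", "dst_port": 80}] * 3
first, subsequent, additional = mine_with_feedback(
    first_group, second_group, t1=3, t2=3, t3=2
)
early_b = frozenset({("src", "B"), ("dst_port", 80)})
```

In this sketch the lower third threshold is applied only to the filtered alerts, so low-support activity in the earlier period is surfaced only when a later pattern points back to it.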

17. The method according to claim 10 wherein the obtaining of the alert-level features comprises: expanding the extracted patterns into their associated alerts; and generating alert-level features based on the source or destination Internet Protocol (IP) addresses associated with the alerts, the direction of flow associated with the alerts and protocols associated with the alerts.

18. The method according to claim 10 wherein the pattern-level features comprise: