WO2020022953A1 - System and method for identifying an internet of things (iot) device based on a distributed fingerprinting solution

Info

Publication number: WO2020022953A1
Application number: PCT/SG2018/050373
Authority: WO
Inventors: Vijayanand THANGAVELU; Dinil Mon DIVAKARAN; Mohan GURUSAMY
Original assignee: Singapore Telecommunications Limited
Priority date: 2018-07-26
Filing date: 2018-07-26
Publication date: 2020-01-30

Abstract

This document describes a system and method for identifying Internet of Things (IoT) devices communicatively connected to a plurality of gateways. The gateways are in turn all connected to a controller module which is configured to train and maintain classifier models for fingerprinting the IoT devices. The gateways themselves are configured to utilize the updated classifier models generated by the controller module to classify IoT devices connected to them.

Description

SYSTEM AND METHOD FOR IDENTIFYING AN INTERNET OF THINGS (IOT) DEVICE BASED ON A DISTRIBUTED FINGERPRINTING SOLUTION

Field of the Invention

This invention relates to a system and method for identifying Internet of Things (IoT) devices communicatively connected to a plurality of gateways. The gateways are in turn all connected to a controller module which is configured to train and maintain classifier models for fingerprinting the IoT devices. The gateways themselves are configured to utilize the updated classifier models generated by the controller module to classify IoT devices connected to them.

Summary of Prior Art

The security and well-being of personal computers (PCs) and their related networks have become essential to the normal operation of everyday businesses. Security concerns associated with IoT devices differ from the concerns and issues faced by PC users. A primary factor is the scale of deployment, as IoT devices are expected to grow by a few tens of thousands year on year. In addition, the IoT market is much more heterogeneous, as IoT devices rely heavily on applications and protocols to communicate with networks.

These factors raise multiple issues. As new vulnerabilities get discovered, it is likely that many of these vulnerabilities will be left unpatched and hence open to exploitation by malicious parties. Even in the PC market, where there is often an administrator tasked with maintaining the computer(s) in a network, it is not uncommon for most of the vulnerabilities to remain unattended. This trend will be worse for IoT devices, as most such devices will not have a dedicated administrator. When such vulnerabilities remain unpatched, the IoT devices can be easily exploited by attackers. This may in turn result in private information being compromised, large-scale attacks being launched on networks, failure of critical network infrastructures and so on. IoT devices raise not only security concerns, but also privacy risks. It has been demonstrated how sensitive private information can be inferred by analysing network traffic from smart homes, even when the device traffic is encrypted. It has also been shown that user activities can be inferred from traffic rates of a few IoT devices.

Before an IoT device may be patched or before its network vulnerabilities may be remedied to address the issues raised above, the IoT device has to be first identified or fingerprinted. The IoT device’s unique characteristics are usually used to form the device’s fingerprint, which is then in turn utilized to identify and distinguish the IoT device from other devices. Once a vulnerability of a device type (for example camera of vendor X and model Y) is known, security patches or mitigation solutions can then be applied on these types of IoT devices. In addition, device identification helps an organization to continuously maintain its asset list, and to quarantine and isolate misbehaving or vulnerable devices.

It is useful at this stage to define fingerprinting as a process that is performed to identify specific characteristics associated with an IoT device which in turn can be used to distinguish the IoT device from other types of devices. For example, network traffic can be used to fingerprint entities such as wireless devices (e.g., access points), operating systems, applications, etc. Depending on the goal, the features extracted from the network traffic will differ from one device type to another. For example, a sequence of inter-arrival times between packets flowing through an access point (AP) may be used as features for fingerprinting APs whereas features extracted from TCP/IP (such as IP TTL and TCP header values), TLS and HTTP protocols, may be used as features for passive OS fingerprinting in obfuscation strategies. Passive fingerprinting refers to the approach in which traffic is only monitored by the solution, as opposed to active approaches that send probes into the network for fingerprinting purposes.

As fingerprints of devices are unique to a device’s type, they can be used for authentication purposes. Motivated by this objective, it has been proposed by those skilled in the art that a multivariate Gaussian distribution be used to model a device’s fingerprint whereby a supervised approach is used to learn the model. While this is an interesting proposal, due to the lack of experimentation using real IoT devices, it is too early to comment on the effectiveness of such an approach. Further, it would be challenging to extend the model to identify unknown devices.

Another approach that was proposed by those skilled in the art involves the use of a meta-classifier that has been trained using supervised machine learning. This approach was developed with the aim of identifying IoT devices in two stages. In the first stage, IoT devices are differentiated from other devices; and in the second stage, identification of the IoT devices is carried out. Besides extracting features from the network, transport and application layers, features are also extracted from data gathered from external sources (e.g. websites and geo-location of IP addresses). In another related work, a system for automatic identification of IoT devices and security enforcement of IoT devices was presented whereby 23 features were extracted from a fixed number of packets for identification purposes. A twofold classification method is then applied. For the first step, a supervised binary classifier is trained for each device. If the classifier generates positive results for a device, then an edit distance based comparison is made in the second step to decide on the final device type. Each classifier is trained using ‘one v/s rest’ data. However, such a system does not use important features from protocols such as Domain Name System (DNS), transport layer security (TLS), Hypertext Transfer Protocol (HTTP), Simple Service Discovery Protocol (SSDP), Session Traversal Utilities for Network Address Translator (STUN), etc. As a result, the classification results show that more than a third of the devices had a classification accuracy of only around 50%. It should be noted that both the above mentioned solutions utilize a centralized and supervised learning approach, which neither scales well nor identifies new device types efficiently and effectively.

For the above reasons, those skilled in the art are constantly striving to come up with a system and method that is able to accurately and efficiently identify existing IoT devices and newly added IoT devices. The system and method also have to be easily scalable as they would have to handle large numbers of IoT devices.

Summary of the Invention

The above and other problems are solved and an advance in the art is made by systems and methods provided by embodiments in accordance with the invention. A first advantage of embodiments of systems and methods in accordance with the invention is that known IoT devices are identified locally by a gateway while new IoT devices are classified by a separate controller module. Once the new IoT devices have been classified, an updated classifier model containing the new IoT device’s classification will be communicated to the gateway to update the classifier model at the gateway.

A second advantage of embodiments of systems and methods in accordance with the invention is that IoT devices are able to be accurately classified by the gateway as the classification of the IoT devices is performed based on the traffic sessions generated by each IoT device.

A third advantage of embodiments of systems and methods in accordance with the invention is that only a minimal number of seed devices are required to train the classifier model at the controller. Once the classifier model has been trained, it is able to accurately form clusters for the seed devices and other newly added devices. The above advantages are provided by embodiments of a method in accordance with the invention operating in the following manner.

In accordance with a first aspect of the invention, a system for identifying an Internet of Things (IoT) device communicatively connected to a gateway that is communicatively connected to a controller module is disclosed, the system comprising the gateway being configured to: classify the IoT device, using a classifier model, based on traffic sessions collected from the IoT device; generate a feature aggregate (collection of feature vectors) from the collected traffic sessions and communicate the feature aggregate to the controller module when it is determined that a classification of the IoT device is not contained in the gateway; the controller module being configured to: cluster, using semi-supervised machine learning algorithms, the received feature aggregate and other feature aggregates provided at the controller into groups; update a set of labelled clusters based on the groups of feature aggregates; train a classifier model using supervised machine learning algorithms and the updated set of labelled clusters; communicate the trained classifier model to the gateway whereby upon receiving the trained classifier model, the gateway updates the classifier model at the gateway with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device.

With reference to the first aspect of the invention, each of the traffic sessions comprises an aggregation of traffic connections localized in fixed-size time intervals.

With reference to the first aspect of the invention, the feature vector comprises features selected from a Domain Name System (DNS) protocol, a Multicast DNS protocol, session statistics, a transport layer security (TLS) protocol, a Hypertext Transfer Protocol (HTTP), a Simple Service Discovery Protocol (SSDP), a Quick User Datagram Internet Connections (QUIC) protocol, a Message Queuing Telemetry Transport (MQTT) protocol, a Session Traversal Utilities for Network Address Translator (STUN) protocol, a Network Time Protocol (NTP) and a Bootstrap Protocol (BOOTP).

With reference to the first aspect of the invention, the determination of the classification of the IoT device is performed using a supervised classification module provided within the existing classifier model at the gateway.

With reference to the first aspect of the invention, the semi-supervised machine learning algorithms used to cluster the received feature aggregate and the other feature aggregates at the controller comprises modified K-means clustering algorithms whereby labels of known data are used to estimate mean and standard deviation of inter-cluster distances.

With reference to the first aspect of the invention, the supervised machine learning algorithms used to train the classifier model by the controller module comprises a Random Forests algorithm, a k-nearest neighbour algorithm, or a Gaussian and Bernoulli Naive Bayes algorithm.

With reference to the first aspect of the invention, the system further comprises another gateway being configured to: receive the trained classifier model from the controller; update an existing classifier model provided at the another gateway using the received trained classifier model; collect traffic sessions from another IoT device and classifying the another IoT device with the identity of the IoT device when it is determined that the collected traffic sessions from the another IoT device matches the traffic sessions of the IoT device.

With reference to the first aspect of the invention, the system further comprises another gateway being configured to: collect traffic sessions from another IoT device and classifying the another IoT device based on the collected traffic sessions; generate another feature aggregate from the traffic sessions collected from the another IoT device and communicate the another feature aggregate to the controller module when it is determined that a classification of the another IoT device is not contained in the another gateway; the controller module being configured to: cluster into groups, using semi-supervised machine learning algorithms, the another received feature vector, the received feature vector of the IoT device and other feature vectors provided at the controller; update the set of labelled clusters based on the groups of feature aggregates; train the classifier model using supervised machine learning algorithms and the updated set of labelled clusters; communicate the trained classifier model to the gateway and the another gateway whereby upon receiving the trained classifier model, the gateway and the another gateway updates the existing classifier model with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device and the previously another unclassified IoT device.

In accordance with a second aspect of the invention, a method for identifying an Internet of Things (IoT) device communicatively connected to a gateway that is communicatively connected to a controller module is disclosed, the method comprising: classifying, using a classifier model in the gateway, the IoT device, based on traffic sessions collected from the IoT device and generating a feature aggregate from the collected traffic sessions and communicate the feature aggregate to the controller module when it is determined that a classification of the IoT device is not contained in the gateway; clustering, using semi-supervised machine learning algorithms provided at the controller module, the received feature aggregate and other feature aggregates provided at the controller into groups; updating a set of labelled clusters based on the groups of feature aggregates; training a classifier model using supervised machine learning algorithms and the updated set of labelled clusters; and communicating the trained classifier model to the gateway whereby upon receiving the trained classifier model, the gateway updates the classifier model at the gateway with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device.

With reference to the second aspect of the invention, each of the traffic sessions comprises an aggregation of traffic connections localized in fixed-size time intervals.

With reference to the second aspect of the invention, the feature vector comprises features selected from a Domain Name System (DNS) protocol, a Multicast DNS protocol, session statistics, a transport layer security (TLS) protocol, a Hypertext Transfer Protocol (HTTP), a Simple Service Discovery Protocol (SSDP), a Quick User Datagram Internet Connections (QUIC) protocol, a Message Queuing Telemetry Transport (MQTT) protocol, a Session Traversal Utilities for Network Address Translator (STUN) protocol, a Network Time Protocol (NTP) and a Bootstrap Protocol (BOOTP).

With reference to the second aspect of the invention, the determination of the classification of the IoT device is performed using a supervised classification module provided within the existing classifier model at the gateway.

With reference to the second aspect of the invention, the semi-supervised machine learning algorithms used to cluster the received feature vector and the other feature vectors at the controller comprises modified K-means clustering algorithms whereby labels of known data are used to estimate mean and standard deviation of inter-cluster distances.

With reference to the second aspect of the invention, the supervised machine learning algorithms used to train the classifier model by the controller module comprises a Random Forests algorithm, a k-nearest neighbour algorithm, or a Gaussian and Bernoulli Naive Bayes algorithm.

With reference to the second aspect of the invention, the method further comprises the steps of: receiving from the controller, using another gateway, the trained classifier model; updating an existing classifier model provided at the another gateway using the received trained classifier model; collecting traffic sessions from another IoT device and classifying the another IoT device with the identity of the IoT device when it is determined that the collected traffic sessions from the another IoT device matches the traffic sessions of the IoT device.

With reference to the second aspect of the invention, the method further comprises collecting, using another gateway, traffic sessions from another IoT device and classifying another IoT device based on the collected traffic sessions; generating another feature aggregate from the traffic sessions collected from the another IoT device and communicate the another feature aggregate to the controller module when it is determined that a classification of the another IoT device is not contained in the another gateway; clustering into groups, using semi-supervised machine learning algorithms provided at the controller module, the another received feature aggregate, the received feature aggregate of the IoT device and other feature aggregates provided at the controller; updating the set of labelled clusters based on the groups of feature aggregates; training the classifier model using supervised machine learning algorithms and the updated set of labelled clusters; communicating the trained classifier model to the gateway and the another gateway whereby upon receiving the trained classifier model, the gateway and the another gateway updates the existing classifier model with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device and the previously another unclassified IoT device.

Brief Description of the Drawings

The above and other problems are solved by features and advantages of a system and method in accordance with the present invention described in the detailed description and shown in the following drawings.

Figure 1 illustrating a network diagram of a system for identifying IoT devices in a hierarchical manner in accordance with embodiments of the invention;

Figure 2 illustrating a block diagram representative of processing systems providing embodiments in accordance with embodiments of the invention;

Figure 3 illustrating components provided within a system for identifying IoT devices in a hierarchical manner in accordance with embodiments of the invention;

Figure 4 illustrating a flow diagram of a process for identifying IoT devices in a hierarchical manner in accordance with embodiments of the invention;

Figures 5a-5d illustrating graphs showing the principal component analysis results for various devices;

Figure 6 illustrating a graph showing the accuracy of the clustering results in relation to a value of a variable Q;

Figures 7a-7b illustrating box charts showing the accuracy of the intra-cluster and inter-cluster distances for the various devices;

Figure 8 illustrating a box chart showing the clustering accuracy as a function of the z-score b;

Figure 9 illustrating a box chart showing the overall accuracy of the different classes of feature vectors defined in Table 1;

Figure 10 illustrating a flow diagram of a process for classifying IoT devices connected to a gateway in accordance with embodiments of the invention; and

Figure 11 illustrating a flow diagram of a process for clustering feature vectors into groups and training a classifier model in accordance with embodiments of the invention.

Detailed Description

This invention relates to a system and method for identifying Internet of Things (IoT) devices that are communicatively connected to a plurality of gateways. The plurality of gateways is in turn connected to a controller module which is configured to train and maintain classifier models for fingerprinting the IoT devices. The gateways themselves are configured to utilize the updated classifier models generated by the controller module to classify IoT devices connected to them. As for the controller module, this module is configured to cluster feature aggregates (collection of feature vectors) from unclassified IoT devices into groups whereby these groups are then used to train the classifier model. The trained classifier model is then communicated to all the gateways to update their respective classifier models.

Figure 1 illustrates a network diagram of a system 100 for identifying IoT devices in a hierarchical manner in accordance with embodiments of the invention. In this “hierarchical structure”, controller module 105 is provided at the top of the structure and all the gateways in system 100 such as gateways 114, 119 and 124 are connected to module 105. Each of the gateways is then in turn connected to a plurality of IoT devices. For example, gateway 114 is connected to IoT devices 115a, 115b, 115c, gateway 119 is connected to IoT devices 120a, 120b, 120c and gateway 124 is connected to IoT devices 125a, 125b, 125c. One skilled in the art will realize that although Figure 1 only illustrates three gateways being connected to controller module 105, any number of gateways may be utilized without departing from this invention. Similarly, one skilled in the art will also recognize that any number of IoT devices may be connected to a single gateway and that the IoT devices connected to the gateways may refer to IoT devices of a similar type, although a different label has been applied. For example, IoT device 115a may be a similar type of IoT device as device 120c or may be a different type and such a choice is left to one skilled in the art. IoT devices 115a-c, 120a-c, and 125a-c may comprise IoT enabled devices such as printers, display devices, home appliances, lighting appliances, or any device that is able to carry out wireless communicative functions such as smart watches, smart plugs, or any other similar smart devices (i.e. electronic devices that are generally connected to other devices or networks via different protocols such as Bluetooth, NFC, WiFi, 3G, etc.).

As for gateways 114, 119 and 124, these gateways may comprise a server in a smart home, an enterprise server or any wireless device that acts as a central connection point indirectly connecting wireless devices and/or IoT devices to each other or to internal or external (i.e. the Internet) wireless networks. One skilled in the art will recognize that a gateway is typically provided with communicative means to connect various IoT devices to a controller provided at a main server or a cloud server, where collected data may be stored, processed and accessed at a later stage by an authorized user. The communicative means may be through direct networking means including, but not limited to, Wi-Fi networks, Bluetooth networks or Near Field Communication (NFC). Such gateways are also typically provided at the IoT devices’ location such as smart homes 110a, 110b or 110c and not at a remote location away from the IoT devices themselves.

Figure 2 illustrates a block diagram representative of components of processing system 200 that may be provided within the IoT devices 115a-c, 120a-c, 125a-c, gateways 114, 119, 124 and controller module 105 for implementing embodiments in accordance with embodiments of the invention. One skilled in the art will recognize that the exact configuration of each processing system provided within these modules and servers may be different and the exact configuration of processing system 200 may vary and Figure 2 is provided by way of example only.

In embodiments of the invention, module 200 comprises controller 201 and user interface 202. User interface 202 is arranged to enable manual interactions between a user and module 200 and for this purpose includes the input/output components required for the user to enter instructions to control module 200. A person skilled in the art will recognize that components of user interface 202 may vary from embodiment to embodiment but will typically include one or more of display 240, keyboard 235 and track-pad 236.

Controller 201 is in data communication with user interface 202 via bus 215 and includes memory 220, processor 205 mounted on a circuit board that processes instructions and data for performing the method of this embodiment, an operating system 206, an input/output (I/O) interface 230 for communicating with user interface 202 and a communications interface, in this embodiment in the form of a network card 250. Network card 250 may, for example, be utilized to send data from electronic device 200 via a wired or wireless network to other processing devices or to receive data via the wired or wireless network. Wireless networks that may be utilized by network card 250 include, but are not limited to, Wireless-Fidelity (Wi-Fi), Bluetooth, Near Field Communication (NFC), cellular networks, satellite networks, telecommunication networks, Wide Area Networks (WAN), etc.

Memory 220 and operating system 206 are in data communication with CPU 205 via bus 210. The memory components include both volatile and non-volatile memory and more than one of each type of memory, including Random Access Memory (RAM) 220, Read Only Memory (ROM) 225 and a mass storage device 245, the last comprising one or more solid-state drives (SSDs). Memory 220 also includes secure storage 246 for securely storing secret keys, or private keys. It should be noted that the contents within secure storage 246 are only accessible by a super-user or administrator of module 200 and may not be accessed by any user of module 200. One skilled in the art will recognize that the memory components described above comprise non-transitory computer-readable media and shall be taken to comprise all computer-readable media except for a transitory, propagating signal. Typically, the instructions are stored as program code in the memory components but can also be hardwired. Memory 220 may include a kernel and/or programming modules such as a software application that may be stored in either volatile or non-volatile memory.

Herein the term “processor” is used to refer generically to any device or component that can process such instructions and may include: a microprocessor, microcontroller, programmable logic device or other computational device. That is, processor 205 may be provided by any suitable logic circuitry for receiving inputs, processing them in accordance with instructions stored in memory and generating outputs (for example to the memory components or on display 240). In this embodiment, processor 205 may be a single core or multi-core processor with memory addressable space. In one example, processor 205 may be multi-core, comprising, for example, an 8-core CPU.

Figure 3 illustrates an overview 300 of building blocks or modules contained within system 100, in particular the modules contained within a gateway 301 and a controller 302 and the processes performed by each of these modules. One skilled in the art will recognize that many functional units in this description have been labelled as modules throughout the specification. The person skilled in the art will recognize that a module may be implemented as electronic circuits, logic chips or any combination of electrical and/or electronic discrete components. Further, one skilled in the art will also recognize that a module may be implemented in software which may then be executed by a variety of processors. In embodiments of the invention, a module may also comprise computer instructions or executable code that may instruct a computer processor to carry out a sequence of events based on instructions received. The choice of the implementation of the modules is left as a design choice to a person skilled in the art and does not limit the scope of this invention in any way.

Returning to Figure 3, it is shown that the upper part of the figure consists of processes that may be executed by the various modules provided at the gateways, while the lower part illustrates processes that may be executed by the various modules provided at a controller of system 100 in accordance with embodiments of the invention. When an IoT device is connected to a gateway 301, the IoT device will generate traffic or traffic sessions. This traffic 305 flows from the IoT device, arrives at the gateway 301 and is provided to module 310, which then proceeds to extract features from the collected traffic flows. The extracted features are subsequently utilized by module 315 to classify the IoT device. Module 315 does this by classifying the device based on its traffic sessions into a particular type (or class). It should be noted that the classifier model utilized by module 315 of the gateways is obtained from the controller and is continually updated by the controller. The gateways do not perform any training of the classifier model themselves; instead, the training of the classifiers is performed by the controller.

After the classification process has been completed, the gateway then proceeds to categorize the sessions that were classified with low probability as diffident sessions 325, whereby such sessions are later sent to the controller. This is done by module 320. Traffic sessions are normally classified with a low probability when the associated IoT device is new, or when an existing device’s traffic pattern has changed. It is useful to note that a change in the firmware of an existing IoT device might also lead to a low classification probability.
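The confidence evaluation performed by module 320 can be illustrated with a minimal sketch. It assumes that the classifier returns a probability per device type for each session and that a session is diffident when its highest probability falls below a cut-off. The function name `split_by_confidence`, the value 0.8 and the device labels are hypothetical and are not taken from this specification.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off; the specification does not state a probability value.
CONFIDENCE_THRESHOLD = 0.8


def split_by_confidence(
    session_probs: List[Dict[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Separate confidently classified sessions from diffident ones.

    Returns (confident, diffident), where confident holds
    (session index, predicted device type) pairs and diffident holds the
    indices of sessions whose best class probability is below the threshold.
    """
    confident: List[Tuple[int, str]] = []
    diffident: List[int] = []
    for idx, probs in enumerate(session_probs):
        # Pick the device type with the highest predicted probability.
        label, best = max(probs.items(), key=lambda kv: kv[1])
        if best >= threshold:
            confident.append((idx, label))
        else:
            diffident.append(idx)
    return confident, diffident


# Illustrative per-session class probabilities (made-up values).
sessions = [
    {"camera": 0.95, "smart_plug": 0.05},
    {"camera": 0.40, "smart_plug": 0.35, "printer": 0.25},
]
confident, diffident = split_by_confidence(sessions)
```

In this sketch only the second session would be forwarded to the controller, since no device type reaches the assumed cut-off.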

In embodiments of the invention, the three modules 310 (for feature extraction), 315 (for device classification) and 320 (for confidence evaluation) may be parts of a single fingerprint classification application that can be developed as a virtual network function (VNF) and deployed at any one of the gateways. Furthermore, these three individual modules may be developed as microservices, thereby allowing them to be modified and deployed independently, while adhering to standard interfaces for communications between themselves.

The controller module 302 then gathers diffident sessions 325 from the different gateways it controls using module 330, and proceeds to cluster them using module 335. For the clustering process, module 335 will use a set of labelled data as seed clusters. Controller 302 updates the labelled dataset 340 continuously and actively based on its algorithm; and each time the dataset is updated, the controller retrains the classifier, generating a new updated model 350. This is done by module 345. This updated model 350 is then sent to all the gateways under the purview of controller 302. Upon receiving the updated model 350, the gateways will update their existing model using the updated model. From then on, all extracted traffic sessions will be classified by module 315 using this new model.
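The retrain-and-distribute cycle described above may be sketched as follows. This is an illustration only: the clustering step is replaced by directly supplying an already labelled group of feature vectors, and a nearest-centroid rule stands in for the supervised classifier, neither of which is the method of the embodiments. The class names `Gateway` and `Controller`, their methods and the sample vectors are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence

Vector = Sequence[float]


class Gateway:
    """Holds a copy of the classifier model; never trains it locally."""

    def __init__(self) -> None:
        self.model: Dict[str, List[float]] = {}
        self.model_version = 0

    def update_model(self, model: Dict[str, List[float]], version: int) -> None:
        self.model = dict(model)
        self.model_version = version

    def classify(self, vec: Vector) -> str:
        # Stand-in classifier: the label of the nearest centroid.
        return min(self.model, key=lambda label: math.dist(self.model[label], vec))


class Controller:
    """Maintains the labelled dataset and pushes retrained models."""

    def __init__(self, gateways: List[Gateway], labelled: Dict[str, List[Vector]]) -> None:
        self.gateways = gateways
        self.labelled = {label: list(vecs) for label, vecs in labelled.items()}
        self.version = 0

    def add_cluster(self, label: str, vectors: List[Vector]) -> None:
        # Assumption: clustering has already grouped and labelled these vectors.
        self.labelled.setdefault(label, []).extend(vectors)
        self.retrain_and_push()

    def retrain_and_push(self) -> None:
        # "Training" here is just computing one centroid per labelled cluster.
        model = {
            label: [mean(col) for col in zip(*vecs)]
            for label, vecs in self.labelled.items()
        }
        self.version += 1
        for gateway in self.gateways:
            gateway.update_model(model, self.version)


gateways = [Gateway(), Gateway()]
controller = Controller(gateways, {"camera": [[1.0, 1.0], [1.2, 0.8]]})
controller.retrain_and_push()
controller.add_cluster("smart_plug", [[5.0, 5.0], [5.2, 4.8]])
predicted = gateways[1].classify([5.1, 5.0])
```

The point of the sketch is the data flow: every update of the labelled dataset triggers a retrain, and every gateway, including ones that never saw the new device, receives the same model version.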

It is important to note at this stage that gateway 301 only sends feature vectors of diffident sessions to its controller, and not the actual traffic sessions received. The size of a feature vector is typically small. Thus, even when traffic sessions may comprise hundreds of packets resulting in traffic session sizes between hundreds of KBs (kilobytes) to thousands of MBs (megabytes), the size of the corresponding feature vector is only between 1-2 KB. It is also worth noting that system 100 is designed to be scalable. This means that when the number of gateways increases, more instances of the controller can be spawned to balance the load; and the different controllers may be configured to synchronize the classifier model at each of the gateways via a controller that has been provided in a higher layer in the hierarchical architecture of system 100.

Figure 4 illustrates a flow diagram of a process for identifying IoT devices in a hierarchical manner in accordance with embodiments of the invention. In particular, when IoT device 115a first connects to gateway 114, traffic sessions from this IoT device will be collected and analysed by gateway 114. This occurs at step 402.

Traffic sessions generated by the IoT devices contain important information that can be extracted as features for different kinds of analysis, including fingerprinting. These features may be extracted both from the packet headers as well as the payloads themselves. Extracting features from the payloads is an expensive process, though such features may provide useful information for fingerprinting. Payload analysis is also often considered intrusive to user privacy. Depending on the type of analysis performed, the traffic sessions may be represented by feature vectors having varying levels of detail. At the finest level of granularity, a feature vector of a traffic session may represent per-packet information; examples being application protocol (port number) used, packet size, destination IP address, IP options, etc. This representation is useful if the goal is to classify each packet. However, there is no strong motivation to carry out fingerprinting of devices at the packet level, in particular considering the fact that per-packet classification is costly. Instead, it should be sufficient to classify a traffic session as an aggregation of traffic connections localized in time.

Multiple sessions from a device can be differentiated using two approaches. In a dynamic approach, the session’s inactive period may be identified and, based on this, the traffic connections may be grouped into one session as long as the inactivity period is less than a predetermined time (say, one minute). In a static approach, time is split into fixed-size intervals (say, 15 minutes), and all traffic connections from a device within one interval are considered as a single session. The length of dynamic sessions could be quite arbitrary, particularly because devices may maintain keep-alive probes. In this embodiment of the invention, the traffic sessions are defined based on fixed-size time intervals.
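By way of a non-limiting illustration, the two approaches to delimiting sessions may be sketched as follows. The sketch assumes that each traffic connection is available as a (timestamp in seconds, record) pair; the function names, the record representation and the default parameter values are illustrative assumptions and do not form part of the described embodiments.

```python
from collections import defaultdict


def static_sessions(connections, interval_s=15 * 60):
    """Static approach: bucket connections into fixed-size time intervals."""
    buckets = defaultdict(list)
    for ts, record in connections:
        # All connections falling in the same interval form one session.
        buckets[int(ts // interval_s)].append(record)
    return [buckets[k] for k in sorted(buckets)]


def dynamic_sessions(connections, max_gap_s=60):
    """Dynamic approach: start a new session once the inactivity
    period reaches max_gap_s."""
    sessions = []
    last_ts = None
    for ts, record in sorted(connections, key=lambda c: c[0]):
        if last_ts is None or ts - last_ts >= max_gap_s:
            sessions.append([])
        sessions[-1].append(record)
        last_ts = ts
    return sessions
```

With keep-alive probes arriving more often than the inactivity limit, the dynamic variant would keep extending a single session, which is consistent with the preference for fixed-size intervals stated above.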

Some of the features that may be used to form a feature vector having a dimension of size “111” are illustrated in Table 1 below. In summary, these features comprise parameters that may be extracted from the collected traffic sessions. One skilled in the art will recognize that other types of protocols and their corresponding features may also be utilized and that the invention is not limited to only the protocols set out in Table 1 below. In particular, the protocols may comprise any protocol from the layers of the TCP/IP (transmission control protocol/internet protocol) model.

Table 1

In summary, the protocols in Table 1 comprise a Domain Name System (DNS) protocol, a Multicast DNS protocol, session statistics, a transport layer security (TLS) protocol, a Hypertext Transfer Protocol (HTTP), a Simple Service Discovery Protocol (SSDP), a Quick User Datagram Internet Connections (QUIC) protocol, a Message Queuing Telemetry Transport (MQTT) protocol, a Session Traversal Utilities for Network Address Translator (STUN) protocol, a Network Time Protocol (NTP) and a Bootstrap Protocol (BOOTP). The protocols listed in Table 1 may be broadly classified as follows:

• V₁: DNS is one of the fundamental protocols of the Internet. Features extracted from the DNS protocol could be valuable for fingerprinting IoT devices. Features relating to both DNS and mDNS (multicast DNS) are considered. The features considered are the number of DNS queries, the DNS packet count, the most frequently queried domain name, the number of DNS errors (i.e., response code ≠ NOERROR), the number of INTERNET class queries, statistics of DNS packet length and DNS query response time, etc. All these features are collected over each of the sessions defined based on time.

• V₂: Two session level features that are protocol agnostic are also extracted: (i) number of packets sent during the session; and (ii) activity period of the session. Although a session might be defined for, say, 15 minutes, packets might be sent only in the first ten minutes; in this case, the activity period is 10 minutes. It is useful to note that this set of features is the least privacy-intrusive, as they do not require packet headers to be read.

• V₃: Most communications between IoT devices and the cloud are encrypted. As the adoption rate of TLS is increasing, TLS-related features are extracted for fingerprinting. The features extracted include the minimum, maximum and mean of TLS packet length, the flow duration, and the number of TCP keep-alive probes used in a TLS session. In addition, HTTP features are also considered in this class, as listed in the table.

• V₄: In this feature class, a number of protocols are considered, as given in Table 1. STUN (Session Traversal of UDP through NAT) is used to establish bidirectional communication between an IoT device and its cloud server, in the presence of a NAT server. SSDP (Simple Service Discovery Protocol) is a server-less discovery protocol, forming the basis of the UPnP (Universal Plug and Play) architecture; it is adopted by many IoT devices. MQTT (message queue telemetry transport) is a publish-subscribe based light-weight messaging protocol used to collect and transfer data from devices to their servers. Due to its popular features (small footprint, adaptability to constrained networks, simplicity of implementation, etc.), MQTT is expected to be widely adopted by the IoT market. QUIC (Quick UDP Internet Connection) is a recently developed transport protocol, supported by Google servers and Google Chrome, that aims to perform better than the widely used TCP.

In order to evaluate the effectiveness of utilizing the feature vectors in Table 1 to distinguish the IoT devices, a preliminary analysis was carried out using PCA (Principal Component Analysis). PCA is often used for dimension reduction, by explaining the variance in the data using a small number of orthogonal components. It is also useful for visualizing high-dimensional data in fewer (two or three) dimensions. Feature vectors were extracted from traffic sessions of 16 devices and these feature vectors were analysed using PCA with two components. The corresponding 2D planes are plotted in Figures 5a-d. For clarity, the devices are separated into smaller groups and presented in different plots. It can be seen that there is a clear separation among feature vectors of different devices. It should be noted that the groups overlapped when the components of all devices were plotted in a single 2D plane; however, the overlap reduced significantly when another component was added to the PCA.

Returning to Figure 4, after the traffic sessions from IoT device 115a have been collected and analysed by gateway 114 at step 402, various features will then be extracted from the collected traffic session to form a feature vector. Algorithm 1 below sets out the steps performed by gateway 114 after it has captured the traffic sessions.

Algorithm 1: Fingerprinting at GW(S, X)

Input: S: Traffic session; X: device address

Variables: c: device class; p: prob. of classification; diffident_sessions[X]: list of sessions via interface X

1: if ExistsNewModel == TRUE then

2:     M ← From_Controller() ▷ Update model

3: end if

4: F = ExtractFeature(S)

5: [c, p] ← Classify_GW(F, M)

6: if p < threshold then ▷ If confidence is low

7:     diffident_sessions[X].append(F)

8:     if len(diffident_sessions[X]) == Q then

9:         SendToController(diffident_sessions[X])

10:         diffident_sessions[X] = [] ▷ Empty list

11:     end if

12:     return NULL ▷ No classification done

13: end if

14: return c as class of S

The first if statement checks if a controller connected to this gateway has sent a new model back to this gateway, and if so, the gateway updates its model accordingly. In practice, this would be implemented separately and independent of the classification at gateways, using a publish-subscribe model, so that gateways receive the latest model at the earliest time possible.

Line no. 4 in Algorithm 1 extracts features from traffic sessions. In embodiments of the invention, the feature extraction may be performed by a function inherent in the access point or it may be performed using functions preloaded in typical packet capture libraries. Supervised classification is performed in line no. 5, where M denotes the model used at gateways for classification. It is assumed that the first such model is trained using a set of known devices, which are referred to as seed devices. The model itself is obtained from the controller, which retrains the model as and when necessary (explained later in Algorithm 2). All gateways use the same model M for classification.

The Classify_GW function runs a supervised classifier trained by the controller. The supervised learning algorithms used for device classification are discussed in greater detail in the following sections; exemplary machine learning algorithms that may be utilized are Random Forests, k-NN, and Gaussian and Bernoulli Naive Bayes. The Classify_GW function returns the device class c of the session, as well as a measure of confidence p for the class assigned. The second if statement assesses this confidence; if the confidence is low, the corresponding session’s feature vector F is appended to the list diffident_sessions, which is a list of sessions predicted with low confidence.

The diffident sessions are indexed by X, where X is the unique address of an IoT device, for example, its MAC address. This information allows the gateway to segregate sessions based on the device addresses, even while it is not certain of the device type. Note that, while a device can fake the MAC address it presents to the gateway, this does not pose a problem as long as the faked MAC address is not the same as that of another device connected to the gateway. Duplicate MAC addresses can be easily detected by a gateway. This segregation based on addresses is used only to aggregate feature vectors from the same device. When the number of low-confidence sessions observed from a particular device (i.e., the length of list diffident_sessions[X]) is Q, the list is sent over to the controller (line no. 9). Henceforth, diffident_sessions[X] is referred to as feature aggregate F (that is, F is a set of feature vectors coming from an unknown device).
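As a non-limiting sketch, lines 5 to 14 of Algorithm 1 may be expressed as follows. The sketch assumes that feature extraction has already produced the feature vector and that the model update is handled separately (as noted above for a publish-subscribe arrangement). The class name, the callable model interface returning a (class, probability) pair, and the callback used to reach the controller are hypothetical and are introduced only for illustration.

```python
class Gateway:
    """Illustrative gateway-side fingerprinting state (names are assumptions)."""

    def __init__(self, model, threshold, q, send_to_controller):
        self.model = model                    # callable: features -> (class, probability)
        self.threshold = threshold            # minimum confidence to accept a class
        self.q = q                            # size Q of a feature aggregate
        self.send_to_controller = send_to_controller
        self.diffident_sessions = {}          # device address X -> list of feature vectors

    def fingerprint(self, features, address):
        device_class, prob = self.model(features)
        if prob < self.threshold:
            # Low confidence: buffer the feature vector under the device address.
            pending = self.diffident_sessions.setdefault(address, [])
            pending.append(features)
            if len(pending) == self.q:
                # Q diffident sessions collected: forward the feature aggregate.
                self.send_to_controller(address, list(pending))
                self.diffident_sessions[address] = []
            return None                       # no classification done
        return device_class
```

Only feature vectors leave the gateway in this sketch; the raw traffic sessions are never passed to the callback.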

With reference to Figure 4, once the feature aggregates (collections of feature vectors) have been extracted from the IoT device’s traffic sessions, and if they are determined to have come from an unknown device (i.e. identified as diffident sessions), these feature vectors are sent to the controller 105 at step 404. Step 404 corresponds to the SendToController function in Algorithm 1 above.

It is useful at this stage to define certain parameters that are utilized at the controller. A list of feature aggregates received by the controller from some of its gateways is defined as P. Another important data structure is defined as R, which is a list of clusters from seed devices. A cluster here is defined as a labelled feature aggregate which has at least Q’ feature vectors. A data structure similar to P (a list of feature aggregates) is also maintained as T, where T is used to reduce the feature aggregates in P such that each feature aggregate in T corresponds to a unique device type. In other words, the controller merges ‘similar’ feature aggregates (those potentially belonging to the same device type), as detailed in the algorithms below. Initially, T is bootstrapped with clusters from R. The logic of having two related lists R and T is that the controller uses R to maintain the final clusters corresponding to device types.

The resulting clustering algorithm is inspired by the seeded k-means algorithm, a semi-supervised clustering technique which is known to those skilled in the art. In that algorithm, the labels of the known data are not changed but are instead used to estimate the centroids of the clusters. The clustering algorithm is based on this concept of using data with known labels. It should be noted that the clustering algorithm does not fix the number of clusters and that, instead of performing clustering on individual points, it clusters sets of points, namely the feature aggregates.

The clustering algorithm performed by the controller is defined in Algorithm 2 below. The basic idea adopted by Algorithm 2 is to check if the feature aggregate obtained from a gateway is close to any of the known feature aggregates in T. Each iteration in the for loop operates on one feature aggregate F. The cluster in T that is nearest to F is first found, using the function nearestCluster (line no. 2; see Algorithm 3). The “if” statement then checks whether the distance to this nearest cluster G is within an acceptable range, using the function withinICR (described below as Algorithm 4, which uses a z-score for this purpose). If so, the feature vectors in F are added to G, and this new merged feature aggregate (F ∪ G) replaces the old feature aggregate G in T. If the nearest feature aggregate is not within the acceptable range, F is considered as a new feature aggregate and added to T (line no. 6). Finally, both the model M and the list of clusters in R are updated by invoking the function updateModelClusters (i.e. Algorithm 6 below).

Algorithm 2: Clustering at Controller(P)

Input: P: List of feature aggregates

Variables: T: List of labelled feature aggregates

1: for each set F ∈ P do

2:     G = nearestCluster(T, F)

3:     if withinICR(F, G) == TRUE then

4:         T.replace(G, F ∪ G)

5:     else

6:         T.add(F) ▷ add new cluster

7:     end if

8: end for

9: updateModelClusters(T)

The functions utilized by Algorithm 2 are defined below. In particular, Algorithm 3 defines the nearestCluster function, which computes the distances from the centroid of the given feature aggregate F to the centroid of each feature aggregate in T, and returns the feature aggregate to which the distance is minimum.

Algorithm 3: nearestCluster(T, F)

Input: T: List of labelled feature aggregates

F: Set of feature vectors (points)

1: C = z(F) ▷ computed using Eq. 1

2: G = arg min_{c ∈ T} ‖C − z(c)‖ ▷ find nearest based on centroid

3: return G

Equation 1, as referred to in Algorithm 3 above for obtaining the centroid z of set F, is defined as:

$$z(F) = \frac{1}{|F|} \sum_{f \in F} f \qquad \text{...equation (1)}$$

where F is a set of m-dimensional vectors, m is the feature dimension, |F| denotes the number of vectors in F, and where the addition and average are computed over the vector components.
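For illustration only, the centroid computation and the nearestCluster function of Algorithm 3 may be sketched as follows. The sketch assumes that a feature aggregate is represented as a list of equal-length lists of numbers and that the distance between centroids is Euclidean; the function names are illustrative assumptions.

```python
import math


def centroid(aggregate):
    """Component-wise mean of the feature vectors in a feature aggregate."""
    n = len(aggregate)
    return [sum(column) / n for column in zip(*aggregate)]


def euclidean(u, v):
    """Euclidean distance between two vectors of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_cluster(clusters, aggregate):
    """Return the cluster whose centroid is closest to that of the aggregate."""
    c = centroid(aggregate)
    return min(clusters, key=lambda g: euclidean(c, centroid(g)))
```

Recomputing each cluster centroid on every call keeps the sketch short; an implementation handling many clusters would more likely cache the centroids.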

Algorithm 4: withinICR(F₀, F₁)

Input: F₀, F₁: Feature aggregates

Variables: T: List of sets of labelled feature vectors

1: Compute centroids z_F₀ and z_F₁ using Eq. (1)

2: D(F₀, F₁) = dist(z_F₀, z_F₁) ▷ computed using Eq. (2)

3: if |zScore(D(F₀, F₁), m, s)| < b then

4:     return TRUE

5: end if

6: return FALSE

The function withinICR, defined in Algorithm 4, checks if the two given feature aggregates are within the range of the inter-cluster distance measure. Given two feature aggregates, the algorithm computes the centroid of these feature aggregates (line no. 1 ) and the distance between the centroids (line no. 2). To decide whether these two feature aggregates can be merged or not, the following steps are performed. Let m and s denote the mean and standard deviation of the inter-cluster distances— distances between the clusters in R.

The Euclidean distance referred to in Algorithm 4 is computed as:

$$\text{dist}(u, v) = \sqrt{\sum_{i=1}^{m} (u_i - v_i)^2} \qquad \text{...equation (2)}$$

where the Euclidean distance is used to compute the distance between two sets of points (through their centroids), where u and v are vectors having the same dimension m, and where u_i and v_i denote the i-th component of vectors u and v respectively.

Estimation of m and s is carried out in Algorithm 5.

Algorithm 5: estimateClusterDist(R)

Input: R: List of clusters used for training the classifier

1: l = len(R)

2: for i in [1 .. l − 1] do

3:     for j in [i + 1 .. l] do

4:         D[R[i], R[j]] = dist(R[i], R[j])

5:     end for

6: end for

7: m, s = mean_std(D)

Given the estimated mean and standard deviation, and using the computed distance between the feature aggregates in line no. 2, the z-score may be computed using Eq. (3) as defined below. If the absolute value of z-score is less than a threshold, b, the controller considers the feature aggregates to be close enough to be merged.

In particular, the z-score may be computed as:

$$z = \frac{x - m}{s} \qquad \text{...equation (3)}$$

where m is the mean, s is the standard deviation and x is the observation.

Given an observation x, the z-score gives the number of standard deviations the observed value is from the population mean. The z-score is based on the mean and standard deviation, which have a breakdown point of 0%. Therefore, in scenarios where the data may have anomalies, it is common to replace these estimators (m and s) with their robust counterparts, namely the median and the median of all absolute deviations from the median (MAD). Both the median and the MAD have a breakdown point of 50%.
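The z-score test of Algorithms 4 and 5 may be sketched, purely by way of example, as follows. The sketch assumes feature aggregates represented as lists of numeric vectors, at least two clusters in R, and the population form of the standard deviation (the description does not state which form is used). All function names are illustrative assumptions, and the robust variant simply returns the median and MAD mentioned above.

```python
import math
import statistics
from itertools import combinations


def centroid(aggregate):
    n = len(aggregate)
    return [sum(column) / n for column in zip(*aggregate)]


def dist(f0, f1):
    """Euclidean distance between the centroids of two feature aggregates."""
    z0, z1 = centroid(f0), centroid(f1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z0, z1)))


def estimate_cluster_dist(clusters):
    """Mean and standard deviation of all pairwise inter-cluster distances."""
    distances = [dist(a, b) for a, b in combinations(clusters, 2)]
    # Assumption: population standard deviation.
    return statistics.mean(distances), statistics.pstdev(distances)


def z_score(x, m, s):
    return (x - m) / s


def within_icr(f0, f1, m, s, b):
    """True when |z-score| of the centroid distance is below threshold b."""
    return abs(z_score(dist(f0, f1), m, s)) < b


def robust_estimates(distances):
    """Median and MAD, the robust counterparts of mean and standard deviation."""
    med = statistics.median(distances)
    mad = statistics.median([abs(d - med) for d in distances])
    return med, mad
```

If the robust estimators are substituted for m and s, a zero MAD has to be guarded against before dividing; the sketch leaves that choice to the caller.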

It is also useful at this stage to define k-means clustering. This algorithm is typically used to partition a given set of n data points into k clusters, where k is provided as input. If μ₁, μ₂, ..., μ_k represent the centroids of the k clusters F₁, F₂, ..., F_k, then the partition is achieved by minimizing the following objective function:

$$\sum_{i=1}^{k} \sum_{f \in F_i} \lVert f - \mu_i \rVert^2$$

Seeded k-means is a semi-supervised clustering algorithm based on k-means, in which labelled data sets are used for seeding. These labelled data sets are used for initializing k-means in computing the centroids (instead of choosing random points). Thereafter, each observation f is assigned to the nearest cluster (i.e. the cluster with the nearest centroid to the observation), and the centroid of that cluster is re-computed.

Algorithm 6 updates the cluster list R, such that R ⊆ T and each F ∈ R has more than Q’ feature vectors. These clusters are then used to update the model used for classification at the gateways; this is achieved using the train function (line no. 7). This occurs as step 410 in Figure 4. The train function uses a supervised learning algorithm for training a classifier model. In particular, the following machine learning algorithms were used to train the classifier model: Random Forests, k-NN, and Gaussian and Bernoulli Naive Bayes. Of all these, Random Forests performed the best with 98% accuracy while Naive Bayes achieved 85% accuracy.

Algorithm 6: updateModelClusters(T)

Input: T: List of feature aggregates

Variables: R: List of clusters used for training the classifier; M: Classifier model

1: Re-initialize R as an empty list

2: for each F ∈ T do

3:     if len(F) > Q’ then

4:         R.add(F)

5:     end if

6: end for

7: M = train(R) ▷ Retrain and obtain new model

8: estimateClusterDist(R)

9: SendToGWs(M) ▷ Send new model to all gateways
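The merge loop of Algorithm 2 and the cluster selection of Algorithm 6 may be sketched as below, as an illustration rather than a definitive implementation. The sketch assumes clusters are plain lists of feature vectors (labels are omitted for brevity) and takes the nearestCluster and withinICR tests as caller-supplied functions; the function and parameter names are hypothetical. Training and distribution of the model are outside the sketch.

```python
def cluster_at_controller(aggregates, clusters, nearest, within):
    """Merge each received feature aggregate into its nearest cluster when the
    'within' test passes; otherwise add it as a new cluster."""
    for f in aggregates:
        g = nearest(clusters, f) if clusters else None
        if g is not None and within(f, g):
            g.extend(f)                 # F ∪ G replaces G in T (in place)
        else:
            clusters.append(list(f))    # add new cluster
    return clusters


def select_training_clusters(clusters, q_prime):
    """Keep only clusters with more than q_prime feature vectors."""
    return [c for c in clusters if len(c) > q_prime]
```

Passing the distance tests in as functions keeps the loop independent of whether the mean and standard deviation or their robust counterparts are used for the z-score.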

Precision, recall and F₁ score are the commonly used metrics for multi-class classification. For a given class, precision and recall are defined as:

$$\text{precision} = \frac{\#\,\text{True Positive}}{\#\,(\text{True Positive} + \text{False Positive})}$$

$$\text{recall} = \frac{\#\,\text{True Positive}}{\#\,(\text{True Positive} + \text{False Negative})}$$

“Precision” gives the fraction of correctly predicted instances of all those predicted for (and as) a particular class, while “recall” is the fraction of correctly predicted instances of the true instances of a class. Based on precision and recall, the F₁ score for a class is defined as:

$$F_1\ \text{score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

The overall accuracy is the ratio of the sum of correctly predicted points to the total number of points.
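As a brief illustration, these metrics may be computed from lists of true and predicted class labels as sketched below. The function names are illustrative assumptions, and returning 0.0 when a denominator is zero is a convention chosen for the sketch rather than something stated in this description.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 score for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def overall_accuracy(y_true, y_pred):
    """Ratio of correctly predicted points to the total number of points."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```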

Based on the above, the clustering accuracy of the models, the inter-cluster and intra-cluster distances, the z-score analysis and the controller’s classification accuracy are all discussed as follows:

1) Clustering accuracy: The accuracy of clustering the data corresponding to test devices is first analysed. Recall that a gateway sends a feature aggregate (set of points) to the controller when in doubt. The size of this set is controlled by a parameter Q (refer to line no. 8 in Algorithm 1), and the accuracy of clustering as a function of this parameter Q is illustrated in Figure 6. All the features listed in Table 1 are utilized for this analysis.

If P denotes the list of feature aggregates received at the controller, let v denote the number of merges required to reduce P to a set of clusters such that each cluster maps uniquely to its corresponding device. Let h denote the number of feature aggregates correctly merged to form the right clusters corresponding to the devices. The clustering accuracy is then:

$$\text{Clustering accuracy} = \frac{h}{v}$$

It is of interest to note that the accuracy is high even for small values of Q, and close to 100% accuracy is achieved with a feature aggregate of size 100. Note that each point in the plot is the mean of clustering accuracies from five runs.

2) Inter-cluster and intra-cluster distances: Furthermore, the inter-cluster distances and intra-cluster distances are analysed as follows. These distances are essentially the Euclidean distances between the centroids of the corresponding clusters. Hence, inter-cluster distances for a device are obtained from the distances between its cluster and the clusters of the remaining devices. For intra-cluster distances of a device, the corresponding device cluster is randomly partitioned into 10 equal parts, and the distances between every pair of parts are calculated. Figures 7a and 7b illustrate the box plots for these distances. Almost an order of magnitude of difference is observed between the intra-cluster distances and the inter-cluster distances.

3) Analysis based on z-score: The clustering accuracy is analysed as a function of z-score in Figure 8. The dimension of the feature vector is 100 and the cluster size is set to 100. It is useful to highlight that the parameters for computing the z-score, namely m and s, are initially estimated from the data of seed devices. Subsequently, these parameters are continuously re-estimated as and when the clusters in R are updated. There is a range of values for the threshold b, approximately [2.2, 3.0], for which the clustering accuracy peaks, and the accuracy decreases on both sides of this range. This is expected, as a low z-score threshold would create a larger number of clusters than necessary, and a high z-score threshold may inadvertently omit the creation of new clusters for unknown test devices.

4) Classifier model classification accuracy: Next, the accuracy of the classifier models is analysed. The first five devices given in Table 2 below were used as seed devices. More specifically, the list R was initialized with five seed clusters containing 50% of the total number of sessions collected for these devices. This also means that no initial labelled dataset was used for the remaining 11 devices in the classifier model. Therefore, clusters corresponding to these 11 devices were formed automatically and dynamically by the classifier model during this analysis. The performance metrics presented are for all the devices, including the seed devices. It should be pointed out that the classifier model performs similarly when initialized with other (randomly selected) sets of seed clusters.

Though fingerprint classification is performed at the gateway, its accuracy is dependent on the accuracy of the clustering performed at the controller. For this analysis, the cluster size Q was set to 105. The value of b for the z-score test was set to three. An accuracy of approximately 97% was achieved for fingerprinting using the classifier model. This high accuracy was obtained when all features listed in Table 1 were used. Table 3 gives the precision, recall and F₁ score of all devices. Observe that the classifier model performs well even in the presence of similar devices. For instance, the last four devices are from one vendor, and in particular two of them have the same functionality but are different models. In this case, the minimum F₁ score is still above 75%.

Table 2

Table 3

5) Analysis of feature classes: In Figure 9, the overall accuracy for the different classes of feature vectors defined in Table 1 is plotted. Observing accuracies with the feature class V₂ and its combination with other classes (V₁₂, V₂₃, V₂₄), it is noted that this class of session-related features (which are protocol agnostic) contributes more to accurate fingerprinting than the rest. Though limited to the devices tested, it is observed that for systems where only such aggregate and minimally privacy-intrusive features are available, fingerprinting using the classifier models described in accordance with embodiments of this invention could still achieve high accuracy.

6) Scalability: To quantify the bandwidth saved due to the distributed approach of system 100 (see Figure 1), the sum of the sizes of sessions gathered from all 16 IoT devices is computed. The average session size was approximately 1.6 MB, whereas the average size of feature vectors (corresponding to these same sessions) was less than 550 bytes. This illustrates how system 100 is advantageous over a reinforcing classifier model which would need to analyse traffic continuously. In such a centralized solution, as only the controller is provided with “intelligence” and the gateways are not, all traffic would have to be sent to the controller. System 100, however, does not even send all feature vectors to the controller. Even if a conservative assumption were to be made that the gateways in system 100 send only one in ten feature vectors to the controller, it can be noted that there is at least four orders of magnitude of savings brought about by embodiments of this invention. This illustrates the scalability of system 100 as compared to a centralized approach.
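The order-of-magnitude figure quoted above can be checked with a short calculation. The sketch below assumes that 1.6 MB denotes 1.6 × 10⁶ bytes and takes 550 bytes as the feature vector size (an upper bound according to the text); the function name is an illustrative assumption.

```python
import math


def bandwidth_saving(session_bytes, vector_bytes, sent_fraction):
    """Ratio of raw session traffic to feature-vector traffic actually sent,
    together with its base-10 logarithm (orders of magnitude)."""
    ratio = session_bytes / (vector_bytes * sent_fraction)
    return ratio, math.log10(ratio)


# Assumed inputs: 1.6 MB sessions, 550-byte vectors, one in ten vectors sent.
ratio, orders = bandwidth_saving(1.6e6, 550, 0.1)
```

Under these assumptions the ratio is roughly 2.9 × 10⁴, that is, between four and five orders of magnitude; without the one-in-ten assumption the ratio is roughly 2.9 × 10³.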

Returning to Figure 4, this means that when a similar IoT device 115a is now connected to gateway 124, after traffic sessions from this device are sent to gateway 124 at step 406, device 115a can now be classified by the classifier model that has been updated within gateway 124. For completeness, one skilled in the art will understand that when a new device 120b is connected to any one of the gateways in system 100, a corresponding diffident session or associated feature vector will be extracted and sent to controller 105 as step 408. The processes of clustering, updating of labelled clusters, training of the classifier model and communication of the trained classifier models are then repeated as discussed above.

In accordance with embodiments of the invention, a method for identifying an IoT device that is connected to a gateway which is in turn connected to a controller comprises the following steps:

Step 1, classifying, using a classifier model at the gateway, based on traffic sessions collected from the IoT device;

Step 2, generating, using the gateway, a feature aggregate from the collected traffic sessions and communicating the feature aggregate to the controller module when it is determined that a classification of the IoT device is not within the gateway or the classifier model in the gateway;

Step 3, clustering, using the controller module which is configured to use semi-supervised machine learning algorithms, the received feature aggregate and other feature aggregates provided at the controller into groups;

Step 4, updating a set of labelled clusters based on the groups of feature aggregates;

Step 5, training a classifier model using supervised machine learning algorithms and the updated set of labelled clusters; and

Step 6, communicating the trained classifier model to the gateway, whereby upon receiving the trained classifier model, the gateway updates the classifier model at the gateway with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device.

In embodiments of the invention, a process is needed for identifying IoT devices in a system comprising gateways to which the IoT devices are connected and a controller to which the gateways are connected. The following description and Figures 10 and 11 describe embodiments of such processes in accordance with this invention.

Figure 10 illustrates process 1000 that is performed by a gateway or by modules contained within a gateway to classify an IoT device in accordance with embodiments of the invention. Process 1000 begins at step 1005 by collecting traffic sessions from a newly connected IoT device. Process 1000 then determines at step 1010 whether the collected traffic sessions are classifiable using an existing classifier model. When process 1000 determines that there is a low probability of classifying the traffic sessions generated by the IoT device, process 1000 will then proceed to step 1100 (which comprises the steps performed at a controller). Prior to step 1100, process 1000 will first extract a set of feature vectors from the collected traffic sessions, and these feature aggregates (collections of feature vectors) are sent to the controller by process 1000.

Alternatively, if process 1000 determines at step 1010 that the traffic sessions generated by the IoT device are classifiable, process 1000 proceeds to step 1015 instead. At this step, process 1000 classifies the IoT device based on its supervised classifier model. Process 1000 then ends.

Figure 11 illustrates process 1100 that is performed by a controller that is connected to a plurality of gateways. Process 1100 begins at step 1105 when process 1100 receives feature aggregates from a gateway. At this step, process 1100 will proceed to cluster the received feature aggregate and existing feature aggregates into distinct clusters. Process 1100 then proceeds to update the labelled dataset corresponding to these clusters at step 1110 and retrains a classifier model at step 1115 based on the output of step 1110. The trained classifier model is then communicated from the controller to the gateways at step 1120 and process 1100 then ends.

Experimental Setup

The experimental setup consists of 16 IoT devices relevant to smart homes and may comprise the devices listed in Table 2. They are connected to the Internet via a gateway over Ethernet or WiFi. A Raspberry Pi 3 was used as a gateway to capture traffic generated by the devices. An interval length of 15 minutes was used to define sessions. In total, 7594 sessions were generated and captured over a period of seven days. In this controlled environment, MAC addresses in the packet headers were used to label the traffic sessions from different devices.

The scenario considered for evaluation consists of one controller and five gateways. Each gateway is assumed to be located at a smart home, to which five devices are connected. Each home has 1 -2 devices that are used in other homes as well. At the start of the experiment, it is assumed there are five known devices, each located at different homes; that is, the traffic of these devices are available for training. Therefore, when the system initializes, there are five seed clusters whose labels are known. These known devices are referred to as seed devices and the remaining ones as test devices. The initial model is trained using feature vectors from these seed devices.

The labels of the test devices are not known, and their traffic would be new to the gateway(s) and the controller. As described in the algorithms, when a gateway observes a low classification probability for traffic session(s) from the new device, it sends the corresponding set of feature vectors to the controller. This scenario is executed offline, where traffic is captured and sent as input to the gateway (Algorithm 1).

The controller is then configured to cluster the feature vector extracted from the new device by the gateway. A set of labelled clusters are then updated based on the groups of feature vectors and a classifier model is trained using supervised machine learning algorithms. The trained classifier model is then sent to all the gateways to replace the classifier models at these gateways.

The above is a description of embodiments of a system and process in accordance with the present invention as set forth in the following claims. It is envisioned that others may and will design alternatives that fall within the scope of the following claims.

Claims

CLAIMS:

1. A system for identifying an Internet of Things (IoT) device communicatively connected to a gateway that is communicatively connected to a controller module, the system comprising:

the gateway being configured to:

classify the IoT device, using a classifier model, based on traffic sessions collected from the IoT device;

generate a feature aggregate from the collected traffic sessions and communicate the feature aggregate to the controller module when it is determined that a classification of the IoT device is not contained in the gateway;

the controller module being configured to:

cluster, using semi-supervised machine learning algorithms, the received feature aggregate and other feature aggregates provided at the controller into groups; update a set of labelled clusters based on the groups of feature aggregates; train a classifier model using supervised machine learning algorithms and the updated set of labelled clusters;

communicate the trained classifier model to the gateway whereby upon receiving the trained classifier model, the gateway updates the classifier model at the gateway with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device.

2. The system according to claim 1 whereby each of the traffic sessions comprises an aggregation of traffic connections localized in fixed-size time intervals.

3. The system according to any one of claims 1 or 2 whereby the feature vector comprises features selected from a Domain Name System (DNS) protocol, a Multicast DNS protocol, session statistics, a transport layer security (TLS) protocol, a Hypertext Transfer Protocol (HTTP), a Simple Service Discovery Protocol (SSDP), a Quick User Datagram Internet Connections (QUIC) protocol, a Message Queuing Telemetry Transport (MQTT) protocol, a Session Traversal Utilities for Network Address Translator (STUN) protocol, a Network Time Protocol (NTP) and a Bootstrap Protocol (BOOTP).

4. The system according to any one of claims 1 to 3 whereby the determination of the classification of the IoT device is performed using a supervised classification module provided within the existing classifier model at the gateway.

5. The system according to any one of claims 1 to 4 whereby the semi-supervised machine learning algorithms used to cluster the received feature aggregate and the other feature aggregates at the controller comprises modified K-means clustering algorithms whereby labels of known data are used to estimate mean and standard deviation of inter-cluster distances.

6. The system according to any one of claims 1 to 5 whereby the supervised machine learning algorithms used to train the classifier model by the controller module comprises a Random Forests algorithm, a k-nearest neighbour algorithm, or a Gaussian and Bernoulli Naive Bayes algorithm.

7. The system according to claim 1 further comprising:

another gateway being configured to:

receive the trained classifier model from the controller;

update an existing classifier model provided at the another gateway using the received trained classifier model;

collect traffic sessions from another IoT device and classify the another IoT device with the identity of the IoT device when it is determined that the collected traffic sessions from the another IoT device match the traffic sessions of the IoT device.

8. The system according to claim 1 further comprising:

another gateway being configured to:

collect traffic sessions from another IoT device and classify the another IoT device based on the collected traffic sessions;

generate another feature aggregate from the traffic sessions collected from the another IoT device and communicate the another feature aggregate to the controller module when it is determined that a classification of the another IoT device is not contained in the another gateway;

the controller module being configured to:

cluster into groups, using semi-supervised machine learning algorithms, the another received feature aggregate, the received feature aggregate of the IoT device and other feature aggregates provided at the controller; update the set of labelled clusters based on the groups of feature vectors; train the classifier model using supervised machine learning algorithms and the updated set of labelled clusters;

communicate the trained classifier model to the gateway and the another gateway whereby upon receiving the trained classifier model, the gateway and the another gateway updates the existing classifier model with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device and the previously another unclassified IoT device.

9. A method for identifying an Internet of Things (IoT) device communicatively connected to a gateway that is communicatively connected to a controller module, the method comprising:

classifying, using a classifier model in the gateway, the IoT device, based on traffic sessions collected from the IoT device and generating a feature aggregate from the collected traffic sessions and communicating the feature aggregate to the controller module when it is determined that a classification of the IoT device is not contained in the gateway;

clustering, using semi-supervised machine learning algorithms provided at the controller module, the received feature aggregate and other feature aggregates provided at the controller into groups;

updating a set of labelled clusters based on the groups of feature aggregates;

training a classifier model using supervised machine learning algorithms and the updated set of labelled clusters; and

communicating the trained classifier model to the gateway whereby upon receiving the trained classifier model, the gateway updates the classifier model at the gateway with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device.

10. The method according to claim 9 whereby each of the traffic sessions comprises an aggregation of traffic connections localized in fixed-size time intervals.

11. The method according to any one of claims 9 or 10 whereby the feature vector comprises features selected from a Domain Name System (DNS) protocol, a Multicast DNS protocol, session statistics, a transport layer security (TLS) protocol, a Hypertext Transfer Protocol (HTTP), a Simple Service Discovery Protocol (SSDP), a Quick User Datagram Internet Connections (QUIC) protocol, a Message Queuing Telemetry Transport (MQTT) protocol, a Session Traversal Utilities for Network Address Translator (STUN) protocol, a Network Time Protocol (NTP) and a Bootstrap Protocol (BOOTP).

12. The method according to any one of claims 9 to 11 whereby the determination of the classification of the IoT device is performed using a supervised classification module provided within the existing classifier model at the gateway.

13. The method according to any one of claims 9 to 12 whereby the semi-supervised machine learning algorithms used to cluster the received feature vector and the other feature vectors at the controller comprises modified K-means clustering algorithms whereby labels of known data are used to estimate mean and standard deviation of inter-cluster distances.

14. The method according to any one of claims 9 to 13 whereby the supervised machine learning algorithms used to train the classifier model by the controller module comprises a Random Forests algorithm, a k-nearest neighbour algorithm, or a Gaussian and Bernoulli Naive Bayes algorithm.

15. The method according to claim 9 further comprising the steps of:

receiving from the controller, using another gateway, the trained classifier model; updating an existing classifier model provided at the another gateway using the received trained classifier model;

collecting traffic sessions from another IoT device and classifying the another IoT device with the identity of the IoT device when it is determined that the collected traffic sessions from the another IoT device match the traffic sessions of the IoT device.

16. The method according to claim 9 further comprising:

collecting, using another gateway, traffic sessions from another IoT device and classifying the another IoT device based on the collected traffic sessions;

generating another feature aggregate from the traffic sessions collected from the another IoT device and communicating the another feature aggregate to the controller module when it is determined that a classification of the another IoT device is not contained in the another gateway;

clustering into groups, using semi-supervised machine learning algorithms provided at the controller module, the another received feature aggregate, the received feature aggregate of the IoT device and other feature aggregates provided at the controller;

updating the set of labelled clusters based on the groups of feature aggregates; training the classifier model using supervised machine learning algorithms and the updated set of labelled clusters; communicating the trained classifier model to the gateway and the another gateway whereby upon receiving the trained classifier model, the gateway and the another gateway updates the existing classifier model with the received trained classifier model whereby the updated classifier model is configured to identify the previously unclassified IoT device and the previously another unclassified IoT device.