WO2022171380A1

WO2022171380A1 - Anomaly detection

Info

Publication number: WO2022171380A1
Application number: PCT/EP2022/050845
Authority: WO
Inventors: Michael Gibson
Original assignee: British Telecommunications Public Limited Company
Priority date: 2021-02-15
Filing date: 2022-01-17
Publication date: 2022-08-18
Also published as: GB202102085D0

Abstract

Anomaly Detection The present invention provides a computer implemented method for detecting anomalous behaviour of a computer system in a set of computer systems. The method obtains connection data representing communications occurring in each direction between the computer system and other computer systems in the set during a period of time. The method classifies the communications in each direction between the computer system and the other computer systems based on the connection data. The method aggregates the communications in each direction relative to the computer system based on the classifications to produce at least one respective aggregate property for the communications in each direction. The method determines a classification for the computer system based, at least in part, on the at least one respective aggregate property in each direction, wherein at least one of the classes into which the computer system may be classified indicates anomalous behaviour of the computer system. (Figure 3)

Description

Anomaly Detection

Field of the invention

The present invention relates to detecting anomalies in a computer network. In particular, the present invention identifies anomalous behaviour of a computer system in the network.

Background to the invention

Network connected computer systems, whether physical and/or virtual computer systems connected via one or more physical and/or virtual network communication mechanisms, can be susceptible to malicious attack. For example, one or more computer systems can become infected with malicious software such as botnet agents or the like, and such infected systems can instigate malicious communication with other systems. Such malicious communications can, for example, be intended to propagate infections and/or affect the operation of target computer systems (e.g. denial of service attacks, hijacking or the like).

Systems are commonly deployed which attempt to detect such attacks and either notify a network administrator that an attack is taking place or automatically take steps to prevent or mitigate the attack (or both). Such a system may be referred to as an Intrusion Detection System (IDS) where attacks are simply detected or as an Intrusion Prevention System (IPS) in the case where the system is arranged to automatically takes preventative or mitigating actions against detected attacks.

In order to detect attacks, an IDS (or IPS) can monitor communications occurring within the network and attempt to spot malicious communications that are associated with (i.e. result from) an attack. One approach that may be used to spot malicious communications is so-called signature-based detection. In this approach, communications are compared to the signatures of known attacks to see if they match, in which case the attack is detected. However, this approach can only detect attacks that are already known and so can be vulnerable to novel attacks. Accordingly, another approach that is utilised is so called anomaly-based detection. This approach seeks to determine a model of normal behaviour or characteristics of the traffic in a network and then use that model to identify traffic which behaves differently or has different characteristics from the normal traffic (i.e. anomalous traffic). Since anomaly-based detection relies on knowledge of a system’s normal behaviour, as opposed to knowledge of the characteristics of a specific attack (as in the case of signature-based detection), anomaly-based detection can detect new attacks that have not been previously seen. Summary of the invention

Although such anomaly-based intrusion detection systems provide a means for detecting new attacks, they commonly do not provide the full context of an attack. That is to say, whilst they can identify an anomaly occurring in the traffic between particular computer systems, they do not identify the roles that each of the computer systems play in association with the attack. Without such context, it can be difficult to determine a preventative measure that may be applied to respond to the attack. Furthermore, it can also lead to less optimal preventative measures being taken. Accordingly, it would be beneficial to mitigate these disadvantages.

The present invention accordingly provides, in a first aspect, a computer implemented method for detecting anomalous behaviour of a computer system in a set of computer systems. The method obtains connection data representing communications occurring in each direction between the computer system and other computer systems in the set during a period of time. The method classifies the communications in each direction between the computer system and the other computer systems based on the connection data. The method aggregates the communications in each direction relative to the computer system based on the classifications to produce at least one respective aggregate property for the communications in each direction. The method determines a classification for the computer system based, at least in part, on the at least one respective aggregate property in each direction, wherein at least one of the classes into which the computer system may be classified indicates anomalous behaviour of the computer system.

The connection data may comprise network traffic data, the network traffic data indicating one or more properties of flows of network traffic between the computer systems in the set during the period of time. The connection data may comprise process data obtained from one or more respective processes running on at least one of the computer systems in the set. The process data may comprise authentication data derived from one or more authentication services running on the at least one of the computer systems in the set.

At least one of the classes into which the computer system may be classified may indicate that the computer system is attacking one or more of the other computer systems in the set. At least one of the classes into which the computer system may be classified may indicate that the computer system is the victim of an attack from one or more of the other computer systems in the set. At least one of the classes into which the computer system may be classified may comprise a plurality of classes, each of the plurality of classes further indicating a respective type of attack involving the computer system. The classification of the computer system may be further based on one or more attributes of the computer system. The classification of the computer system may be further based on one or more other attributes that are not directly associated with a specific computer system or the communications between computer systems in the set.

The method may further implement protective measures in respect of the computer system in response to the classification of the computer system indicating anomalous behaviour of the computer system. The protective measures may include one or more of: preventing network communication to and/or from the computer system; performing an anti malware task on the computer system; disconnecting the computer system; increasing a level of monitoring of network communications involving the computer system; and issuing an alert.

The at least one respective aggregate property for the communications in each direction may comprise a respective aggregate classification of the communications in each direction.

The present invention accordingly provides, in a second aspect, a computer system comprising a processor and a memory storing computer program code for performing a method according to the first aspect.

The present invention accordingly provides, in a third aspect, a computer program which, when executed by one or more processors, causes the one or more processors to carry out a method according to the first aspect.

Brief Description of the Figures

In order that the present invention may be better understood, embodiments thereof will now be described, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a computer system suitable for the operation of embodiments of the present invention;

Figure 2A is an exemplary graph-based representation of a set of intercommunicating computer systems operating normally;

Figure 2B is an exemplary graph-based representation of a set of intercommunicating computer systems operation abnormally; Figure 3 is a schematic diagram of a component for detecting anomalous behaviour of one or more computer systems in a set of intercommunicating computer systems; and

Figure 4 is a flowchart illustrating a method for detecting anomalous behaviour of one or more computer systems in a set of intercommunicating computer systems.

Detailed Description of Embodiments

Figure 1 is a block diagram of a computer system 100 suitable for the operation of embodiments of the present invention. The system 100 comprises: a storage 102, a processor 104 and an input/output (I/O) interface 106, which are all communicatively linked over one or more communication buses 108.

The storage (or storage medium or memory) 102 can be any volatile read/write storage device such as a random access memory (RAM) or a non-volatile storage device such as a hard disk drive, magnetic disc, optical disc, ROM and so on. The storage 102 can be formed as a hierarchy of a plurality of different storage devices, including both volatile and non volatile storage devices, with the different storage devices in the hierarchy providing differing capacities and response times, as is well known in the art.

The processor 104 may be any processing unit, such as a central processing unit (CPU), which is suitable for executing one or more computer programs (or software or instructions or code). These computer programs may be stored in the storage 102. During operation of the system, the computer programs may be provided from the storage 102 to the processor 104 via the one or more buses 108 for execution. One or more of the stored computer programs which, when executed by the processor 104, cause the processor 104 to carry out a method according to an embodiment of the invention, as discussed below (and accordingly configure the system 100 to be a system 100 according to an embodiment of the invention).

The input/output (I/O) interface 106 provides interfaces to devices 110 for the input or output of data, or for both the input and output of data. The devices 110 may include user input interfaces, such as a keyboard 110a or mouse 110b as well as user output interfaces such as a display 110c. Other devices, such a touch screen monitor (not shown) may provide means for both inputting and outputting data. The input/output (I/O) interface 106 may additionally or alternatively enable the computer system 100 to communicate with other computer systems via one or more networks 112. It will be appreciated that there are many different types of I/O interface that may be used with computer system 100 and that, in some cases, computer system 100, may include more than one I/O interface. Furthermore, there are many different types of device 100 that may be used with computer system 100. The devices 110 that interface with the computer system 100 may vary considerably depending on the nature of the computer system 100 and may include devices not explicitly mentioned above, as would be apparent to the skilled person. For example, in some cases, computer system 100 may be a server without any connected user input/output devices. Such a server may receive data via a network 112, carry out processing according to the received data and provide the results of the processing via a network 112.

It will be appreciated that the architecture of the system 100 illustrated in figure 1 and described above is merely exemplary and that other computer systems 100 with different architectures (such as those having fewer components, additional components and/or alternative components to those shown in figure 1) may be used in embodiments of the invention. As examples, the computer system 100 could comprise one or more of: a personal computer; a laptop; a tablet; a mobile telephone (or smartphone); a television set (or set top box); a games console; an augmented/virtual reality headset; a server; or indeed any other computing device with sufficient computing resources to carry out a method according to embodiments of this invention.

Figure 2A is an exemplary graph 200A representing a set of intercommunicating computer systems 210(1 )-210(6) operating normally. Each of the computer systems 210(1 )- 210(6) may be a computer system 100 as discussed above in relation to Figure 1 .

As will be known to the skilled person, a graph (which may also be referred to as a network) comprises a plurality of vertices (which may also be referred to as nodes) and one or more edges (which may also be referred to as connections). Each vertex can be connected to one or more other vertices by a respective edge - in some cases, more than one edge may connect two vertices. The edges may be directed from one vertex to another, in which case the graph may be referred to as a directed graph. Graphs have been used to model a wide range of systems in order to solve a multitude of problems by representing entities as the vertices of the graph and the relationships between those entities as edges of the graph. Attributes can be associated with the vertices and edges representing properties of the associated entities and relationships that are being modelled by the graph.

This invention makes use of graphs to represent the communications occurring between computer systems in a set of computer systems, such as the computer systems contained in a network or a subnetwork, during a period of time. For example, in the exemplary graph 200A illustrated in figure 2A, the communications occurring between a set of six different computer systems 210(1 )-210(6) during a particular period of time are represented. Where one computer system in the set of computer systems has sent network traffic to another computer system during the particular period of time, those two computer systems are considered to have interacted. This interaction is represented in the graph through the inclusion of an edge between the vertices representing those two computer systems. The edge is directed, representing only the network traffic flowing from one computer system to the other computer system and not any network traffic flowing in the other direction. That is to say, the edge represents network traffic transmitted by the computer system to the other computer system (i.e. where the computer system is the source of the network traffic), but not network traffic received from the other computer system (i.e. where the computer system is the destination of the network traffic). Where communications occur in both directions between two computer systems during the period of time, at least two directed edges are included in the graph between the vertices representing those computer systems with at least one edge in each direction. However, such bi-directional communication is not represented in the exemplary graph illustrated in figure 2A.

Each of the edges of the graph 200A is associated with the respective values of one or more attributes that indicate the properties of the communications between the two computer systems represented by the two vertices connected by that edge during the period of time represented by the graph 200A. For example, as shown in figure 2A, each edge has an attribute representing the total amount of traffic in bytes that has been sent from each computer system to each other computer system during the period of time. As can be seen, in this example, a first computer system 210(1) interacted with two other computer systems, 210(4) and 210(5), in the set during the time period represented by the graph 200A. Specifically, the first computer system 210(1) received 121 bytes from computer system 210(4) and sent 81 bytes to computer system 210(5). The second computer system 210(2) interacted with two of the computer systems, 210(3) and 210(5), in the set. Specifically, the second computer system 210(2) received 100 bytes from computer system 210(3) and sent 57 bytes to computer system 210(5). The third computer system 210(3) only interacted with the second computer system 210(2) by sending 100 bytes to the second computer system 210(2), and did not receive any data from any computer system in the set during the time period represented by the graph 200A. The fourth computer system 210(4) interacted with two computer systems, 210(1) and 210(5), in the set by sending those computer systems 121 bytes and 64 bytes respectively during the time period represented by the graph 200A. The fourth computer system 210(4) also did not receive any data from any computer system in the set during the time period represented by the graph 200A. The fifth computer system 210(5) interacted with three other computer systems 210(2), 210(4) and 210(6). Specifically, the fifth computer system 210(5) received 81 bytes from the first computer system 210(1), 57 bytes from the second computer system 210(2) and 64 bytes from the fourth computer system 210(4). The fifth computer system 210(5) did not send any data to any computer system in the set during this time period. Finally, the sixth computer system 210(6) did not interact with any other computer system in the set, either by sending or receiving data, during the period of time represented by the graph 200A. Accordingly, there are no edges associated with the sixth computer system 210(6) in the graph 200A.

Figure 2B is an exemplary graph 200B representing the set of intercommunicating computer systems 210(1 )-210(6) operating abnormally. That is to say, where one or more of the computer systems 210(1 )-210(6) is behaving anomalously. In this example, the graph 200B represents the interactions between the set of computer systems 210(1 )-210(6) during a different period of time, such as a subsequent period of time, from the graph 200A illustrated in figure 2A. Again, each of the edges of the graph 200B is associated with the respective values of the one or more attributes that were used to label the graph 200A. Flowever, since the graph 200B represents a different period of time from the graph 200A illustrated in figure 200A, the edges that are present in the graph 200B and their associated values for the one or more attributes are different (although it is of course possible, albeit unlikely, that they could remain the same). As can be seen, in this example, the first computer system 210(1) interacted with all the other computer systems in the set during the period of time covered by this graph 200B. Specifically, the first computer system 210(1) transmitted 133 bytes to the fourth computer system 210(4), 87 bytes to the fifth computer system 210(5) and 92 bytes to the sixth computer system 210(6). The first computer system 210(1) also received 5714 bytes from the second computer system 210(2), 4523 bytes from the third computer system 210(3) and 6841 bytes from the fifth computer system 210(5).

The second computer system 210(2) only interacted with the first computer system 210(1) during this time period by sending 5714 bytes to the first computer system 210(1). The third computer system 210(3) also only interacted with the first computer system 210(1) during this time period by sending 4523 bytes to the first computer system 210(1). The fourth computer system 210(4) interacted with two computer systems, 210(1) and 210(4), during the time period covered by this graph 200B. Specifically, the fourth computer system 210(4) received 133 bytes from the first computer system 210(1) and sent 81 bytes to the fifth computer system 210(5). The fifth computer system 210(5) interacted with three computer systems 210(1), 210(4) and 210(6), in the set of computer systems during this time period. Specifically, the fifth computer system 210(5) received 87 bytes from the first computer system 210(1) and 81 bytes from the fourth computer system 210(4). The fifth computer system 210(5) also sent 6841 bytes to the first computer system 210(1) and 59 bytes to the sixth computer system 210(6). Finally, the sixth computer system 210(6) interacted with two computer systems 210(1) and 210(5). Specifically, the sixth computer system 210(6) received 92 bytes from the first computer system 210(1) and 59 bytes from the fifth computer system 210(5).

As will be appreciated, a graph 200, such as the graphs 200A and 200B can be generated from any data that indicates interaction (i.e. communication) occurring between different computer systems from which the connections (or edges) of the graph can be determined. The data from which the graph is generated may therefore be generally referred to herein as connection data.

As an example, network traffic data, such as NetFlow data, can be used to generate graphs representing the interactions between a set of computer systems during a respective period of time. As will be appreciated, NetFlow data provides a continuous stream of traffic information from a network and can be collected from a router within a network over which the set of computer systems communicates. NetFlow data provides various features of the communications occurring between computer systems in a network. It typically provides features such as a source IP address, source IP port, destination IP address, destination IP port, number of packets sent from the source to the destination, number of bytes sent from the source to the destination and number of flows of data between the source and destination. However, other features may also be provided with the network traffic data. A set of vertices V for the generated graph 200 can be determined by creating a vertex v_t for each unique endpoint IP address in the NetFlow data (that is, for each unique IP addresses mentioned as either source IP address and/or destination IP address for one or more flows in the NetFlow data). Alternatively, a set of computer systems of interest can be defined such that the generated graph will only include vertices representing those computer systems - the NetFlow data may then be filtered to only include flows between those computer systems. Next, a set of directed edges E for the generated graph 200 can be determined by generating, for each flow indicated in the NetFlow data, a directed edge e_{ from the vertex v_s representing the computer system having the source IP address to the vertex v_d representing the computer system having the destination IP address. The values of the one or more features provided by the NetFlow data for the flow (such as the number of bytes transmitted) can be added as an attribute value to the generated edge e_{. Where multiple flow entries exist in the NetFlow data between a particular source computer system and a particular destination computer system, a separate directed edge may be provided for each such flow in the generated graph 200. Alternatively, the features of all flows between the particular source computer system can be combined so that they are represented by a single directed edge in the generated graph 200. For example, the total number of bytes for all flows between a source computer system and a destination computer system can be summed and the sum of the number of bytes on all flows used as the attribute for a single directed edge e_t connecting the vertex v_s (representing the source computer system) and the vertex v_d (representing the destination computer system) in the generated graph 200. Of course, since the edges are directed edges, where bidirectional communication occurs (that is, where a computer system is the source for some flows of data to another computer system, but the destination for other flows of data from that other computer system) two directed edges will be present in the generated graph 200 - one going in each direction between the vertices representing the computer systems that are engaged in bidirectional communication. Alternatively, in some cases, only the directed edge in which communication is predominantly occurring (i.e. the edge associated with the larger amount of data) may be included in the generated graph 200, with the more minimal communications in the opposite direction being ignored. In other cases, a hybrid approach may be taken with two directed edges being used to represent the communications between certain pairs of computer systems and only a single directed edge in the predominant direction of communication being used between other pairs of computer systems. For example, where the data flowing in each direction is of a similar magnitude (i.e. where the difference between the amount of data flowing in one direction and the amount of data flowing in the other direction is less than a predetermined threshold), the communications in both directions may be represented by a respective directed edge in the generated graph. Otherwise, if the data flowing in one direction is more substantial than the other (i.e. where the difference between the amount of data flowing in one direction and the amount of data flowing in the other direction is greater than a predetermined threshold), only the communications in the predominant direction may be represented by a directed edge in the generated graph.

Where the NetFlow data indicates computer systems that are not to be included in the generated graph (i.e. computer systems which fall outside the monitoring performed by the invention), the NetFlow data may be filtered so that it only includes flows involving two computer systems that are to be included in the generated graph 200.

Whilst the reminder of the description of the invention will discuss the use of network traffic data, it will be appreciated that there are other sources of connection data that can be used to identify the interactions occurring between computer systems on a network that may be used instead of, or in addition to, network traffic data. As an example, various different types of process data (e.g. process log files) that are generated by processes running on one or more of the computer systems on the network may be used. For example, where a file sharing service is provided by a process on a computer system, the logs for that file sharing service may indicate computer systems that have sent data to and/or retrieved data from the computer system hosting that file sharing service. Similarly, an authentication service running on a computer system may provide data indicating which computer system has authenticated with which other computer systems, thereby providing an indication of interactions occurring between different computer systems. For example, a graph may be generated by creating directed edges between the vertices representing different computer systems when one computer system attempts to authenticate with another computer system. In this example, the attributes for the edges in the created graph may include one or more of: a number of successful login attempts, a number of unsuccessful login attempts, a number of distinct usernames for which successful login attempts have been recorded, a number of distinct usernames for which unsuccessful login attempts have been recorded and so on. Indeed, in some cases, the connection comprise data collected from multiple different types of sources.

Furthermore, although the graphs 200A and 200B illustrated in figures 2A and 2B shows the edges having an attribute representing the total amount of traffic in bytes that has been sent from one computer system to another computer system in the set during a time period covered by each of the graphs, it will be appreciated that other attributes that indicate different properties of the interactions between computer systems may be used instead, or in addition to the total amount of traffic in bytes.

Additionally, whilst the graphs 200A and 200B illustrated in figures 2A and 2B only show an attribute being associated with the edges of each graph, it will be appreciated that values of one or more attributes can also be associated with the vertices of the graph. For example, each vertex may be associated with an attribute indicating a particular type of the computer system represented by each vertex (e.g. “a server “a workstation”, etc.), or an operating system running on that computer system, or indeed any other type of attribute. Flowever, the use of attributes for the vertices of the graph 200 is not necessary and in some cases, no attributes are associated with the vertices of the graph 200 as is the case for the graphs 200A and 200B shown in figures 2A and 2B.

The graphs 200A and 200B will now be discussed further in conjunction with figure 3, which is a schematic diagram of a component 300 for detecting anomalous behaviour of one or more computer systems in a set of intercommunicating computer systems such as may be used to implement embodiments of the invention. This component 300 provides a graph network which can operate on an input graph to determine classifications (or labels) for each of the vertices and edges in that graph. A general discussion of graph networks and various implementations thereof is provided by the paper “Relational inductive biases, deep learning, and graph networks” by Peter W. Battaglia et al. The component 300 receives a set of vertices V and a set of directed edges E as input. The set of vertices V and a set of directed edges E collectively represent a graph 200, such as the graphs 200A and 200B illustrated in figures 2A and 2B. As discussed above, each vertex v_t in the set of vertices V represents a computer system 210 in a set of intercommunicating computer systems, whilst each directed edge e_t in the set of directed edges E represents the interaction between two computer systems during the period of time represented by the graph 200. Respective values of one or more attributes are associated with each edge, such as an associated value for an “amount of traffic in bytes” attribute. In some cases, respective values of one or more attributes may also be associated with each vertex, although this need not be the case. In some cases, a global attribute u may also be provided as an input. This global attribute provides one or more additional factors to be considered which are not directly associated with a specific computer system or the interactions between computer systems in the set. This global attribute u can provide additional context for the analysis that component 300 performs. For example, the global attribute u could provide a time-based context for the input graph 200, in which case the additional factors provided by the global attribute may include one or more of a time of day, day of the week, week of the year and so on. The use of a global attribute may improve the classification that is performed by the component 300, for example by enabling periodic or seasonal behaviour to be better accounted for. However, it is not necessary for a global attribute u to be provided as input to component 300, hence this input is shown using dashed lines in figure 3.

The component 300 comprises two classification blocks 310, namely an edge classification block 310(1) and a vertex classification block 310(2), and two aggregator blocks 320, namely an outbound edge aggregator block 320(1) and an inbound edge aggregator block 320(2).

The edge classification block 310(1 ) implements a classifier function <p^e, which is configured to classify the interactions between two computer systems during a period of time (i.e. the classifier function <p^e is arranged to classify the edges of the graphs illustrated in figures 2A and 2B). The edge classification block 310(1) receives, as an input, the set of edges E (including the respective values of the one or more attributes for each edge e_{ in the set) and provides, as an output E', an indication, for each of the edges e_t in the set of edges E, of a classification (or label) for that edge e_t which indicates whether that connection is considered to be “compromised’ or “normaf’ (i.e. non-compromised). A “compromised’ edge is an edge in which the interactions between the source computer system v_s and the destination computer system v_d during the period of time represented by the input graph 200 are considered to be indicative of an attack on one of the computer systems by the other computer system. Conversely, a “normal· edge is an edge in which the interactions during the period of time represented by the input graph 200 are not considered to be indicative of an attack. As will be appreciated by those skilled in the art, the output E' may provide the classification in a form that is suitable for its subsequent use, such as by encoding the classification using one-hot encoding which is usually more suitable for use with machine learning techniques. In some cases, the classifications that are provided by edge classification block 310(1) may include multiple different types of classification which all indicate an edge e_t to be “compromised’ (and/or multiple different types of classification which indicate an edge e_{ to be “normal·). That is to say, the interactions between the computer systems may be indicative of multiple different ways in which an edge may be considered “compromised’ (e.g. by different types of attack) and the classifier function <p^e may classify each edge into a class which additionally conveys a manner in which the edge is considered “compromised’ (e.g. by a particular type of attack). Therefore, the output from edge classification block 310(1) E' may result in different edges e_t being given different classifications, even though the classifications of those edges all serve to indicate that the edge is considered “compromised’ (albeit in different ways). Of course, in other cases, the classification may be a simple binary classification indicating that the edge is either “compromised’ or “normal·.

The edge classifier function <p^e that is provided by the edge classification block 310(1) classifies an edge e_{ based, at least in part, on the values of the one or more attributes that are associated with that edge. For example, when provided with the graphs 200A and 200B as input, the edge classifier function <p^e classifies each of the directed edges based, at least in part, on the amount of traffic in bytes that has been sent via that edge during the time period covered by the graph (however, in other examples, attributes other than an amount of traffic may be used additionally or instead). In other cases, the edge classifier function <p^e may further base the classification of each edge on the values of one or more attributes associated with either the source vertex for that edge, the destination vertex for that edge, or both. However, it will be appreciated that this is not necessary and that the classification may be entirely based on the values of the attributes associated with the set of edges E that is provided as input to the component 300. Hence the line connecting the input set of vertices V to the edge classification block 310(1) has been shown using a dashed line in figure 3. Similarly, in some cases, the edge classifier function <p^e may further base the classification of each edge on the values of one or more additional factors provided by the global attribute u. However, this is also not necessary, and so the line connecting the input global attribute u to the edge classification 310(1) is again shown using a dashed line. To implement the edge classifier function <p^e, a machine learning model (such as a neural network) may be trained according to a suitable machine learning algorithm based on a set of training data. The training data comprises a plurality of sample graphs representing interactions between sets of computer systems similar to those illustrated in figures 2A and 2B in which the edges are associated with values for one or more attributes that the model is to use to classify the edges. However, the edges of the sample graphs are additionally labelled as being either “compromised’ or “normaf’. As will be understood by the skilled person, the machine learning algorithm can be trained using the training data by iteratively providing sample graphs to the model being trained in order to obtain classifications of the edges in the sample graphs from the model. The classifications provided by the model are then compared to the labelled classification (i.e. “compromised’ or “normaf’) of each edge in the sample graph through a loss function. Back propagation can then be used to update the model for the next iteration of the learning algorithm. Since the skilled person would be readily familiar with such machine learning techniques, they will not be discussed in any further detail. It is also noted that any other suitable means for implementing the edge classifier function <p^e may be used instead.

As an example, a model may be trained which embodies the edge classifier function <p^e to classify the edges based on the amount of traffic in bytes that has been transferred between two computer systems in a particular period of time. It may be expected for example, that the model could learn a threshold amount of traffic which, if exceeded, indicates the communications between those computer systems have been compromised in some way. For example, with reference to the exemplary graphs 200, the model might learn a classifier function <p^e which embodies a threshold of 1000 bytes for classifying an edge as “compromised’. Accordingly, all of the communications illustrated in figure 2A might be classified as being “normaf’ (since they are all under this threshold). Meanwhile, in figure 2B, the communications from the second computer system 210(2) to the first computer system 210(1), from the third computer system 210(3) to the first computer system 210(1) and from the fifth computer system 210(5) to the first computer system 210(1 ) might be classified as being “compromised’ (since they are over this threshold). Similarly, the communications from the first computer system 210(1 ) to the fourth computer system 210(4), from the first computer system 210(1) to the fifth computer system 210(5), from the first computer system 210(1) to the sixth computer system 210(6), from the fourth computer system 210(4) to the fifth computer system 210(5) and from the fifth computer system 210(5) to the sixth computer system 210(6) may all be classified as being “normaf’ (since they are below this threshold). It will of course be appreciated that this is a simplified example aimed at facilitating discussion of the operation of this invention and that more complex classifier functions <p^e may be trained to classify the edges of a graph representing communications between computer systems in a set during a period of time in much more nuanced ways (e.g. by basing the classification on further features such as other attributes of the communications between the computer system or attributes of the source and/or destination computer systems as discussed above).

Having classified the edges of the input graph 200, the output E' from the edge classification block 310(1) is provided as an input to the aggregator blocks 320 which aggregate the edges e_{ associated with each vertex v_t in each direction to produce one or more aggregate properties that represent all the edges associated with that node in a particular direction. Specifically, the outbound edge aggregator block 320(1 ) implements an aggregation function p^_t function which aggregates all the outbound edges from each vertex to produce one or more aggregate properties of all the outbound edges from each vertex. That is to say, for a particular vertex v_t the function p^_t aggregates all edges e_{ for which that vertex is the source vertex v_s and produces one or more aggregate properties representing those edges. Meanwhile, the inbound edge aggregator block 320(2) implements an aggregation function p ^v which aggregates all the inbound edges to each vertex to produce one or more aggregate properties of all the inbound edges from each vertex. That is to say, for a particular vertex v the function p^_t aggregates all edges e_t for which that vertex is the destination vertex v_d and produces one or more aggregate properties representing those edges. In some cases the aggregator blocks 320 may each output a single aggregate property representing the edges in that direction. However, in other cases multiple aggregate properties may be output by each of the aggregator blocks.

The aggregation that is performed by the aggregator blocks 320 takes account of (i.e. is based on) the classifications of the edges, as provided by the output E' from the edge classification block 310(1).

In some cases, the aggregator blocks 320 may output an aggregate classification of the edges in each direction as an aggregate property. For example, the aggregator blocks may determine which classification is predominant amongst the edges in a particular direction (i.e. the classification which is assigned to the greatest number of edges amongst the edges associated with a particular vertex in a particular direction). This predominant classification may then be used as an aggregate classification of all the edges in that direction. This aggregate classification may therefore indicate whether the edges in a particular direction are predominantly “compromised’ or predominantly “normaf’. In cases where the output E' from the edge classification block 310(1) includes multiple different classifications which are all indications of an edge being “compromised”, all such classifications may be grouped together when determining a predominant classification. Similarly, in cases where the output E' from the edge classification block 310(1) include multiple different classifications which are all indications of an edge being “normal’, all such classifications may be grouped together when determining a predominant classification.

Although for simplicity the remainder of the description will focus on the use of an aggregate classification to represent all the edges in a particular direction, it will be appreciated that other aggregate properties can be used in addition, or as an alternative, to an aggregate classification. For example, a total number of edges having each classification can be determined. Similarly, one or more attributes of all edges having each classification can be aggregated. For example, a total or average amount of traffic in bytes for all edges in each direction having each classification may be determined and provided as an aggregate property.

Returning to the exemplary graphs 200 illustrated in figures 2A and 2B, an aggregate inbound and outbound edge classification may be determined for each of the vertices representing computer systems 210(1 )-210(6). For example, in the graph 200A illustrated in figure 2A, even though the fifth computer system 210(5) has separate communications with three different computer systems (i.e. the first computer system 210(1), second computer system 210(2) and fourth computer system 210(4)), a single aggregate classification of all three inbound communications is produced. In this case, since each of these communications is classified as being “normal’, the aggregated classification of the inbound communications for the fifth computer system 210(5) may also be “normal’. Indeed, since all the communications in the graph 200A were classified as “normal’ (in this example), each of the aggregated classifications of the inbound and outbound communications for each computer system 210(1 )-210(6) may also be classified as “normal’. Flowever, in the graph 200B, there are a total of 6 edges representing communications between the first computer system 210(1) and other computer systems in the set. There are three outbound edges representing data sent by the first computer system 210(1) to the fourth computer system 210(4), fifth computer system 210(5) and sixth computer system 210(6). There are also three inbound edges representing data received by the first computer system 210(1) from the second computer system 210(2), third computer system 210(3) and fifth computer system 210(5). The communications sent by the first computer system 210(1 ) to the fourth computer system 210(4), fifth computer system 210(5) and sixth computer system 210(6) were all classified as being “normal’. Therefore, an aggregated classification for the outbound communications from the first computer system 210(1) may also be determined to be “normal·’. However, the communications received by the first computer system 210(1) from the second computer system 210(2), third computer system 210(3) and fifth computer system 210(5) are classified as being “compromised’. Therefore an aggregated classification for the inbound communications to the first computer system 210(1) may also be determined to be “compromised’. Where a mixture of classifications, exists, such as the case for the outbound communications from the fifth computer system 210(5), the aggregated classification may reflect the predominant classification of those communications. For example, if more of the communications are classified as “compromised’ than “normaf’ then the aggregate classification may be “compromised’ and vice-versa. Where an equal number of communications have each classification, as is the case for the outbound communications from the fifth computer system 210(5) in figure 2B, either classification may be chosen. For example, it may be predetermined that in such cases, the aggregate classification is to be “compromised’ even though equal numbers of communications have each classification. Alternatively, a further type of aggregate classification could be used to identify this situation (e.g. a “mixed” aggregate classification could be used to indicate cases where equal numbers of communications have each classification).

The vertex classification block 310(2) implements a classifier function <p^v, which is configured to classify computer systems based, at least in part on their interactions with each other during a period of time (i.e. the classifier function fⁿ is arranged to classify the vertices of the graphs illustrated in figures 2A and 2B).

As an input, the vertex classification block 310(2) receives the one or more aggregate properties of the edges in each direction, as generated by the respective aggregator blocks 320. The vertex classification block 310(2) implements a classifier function <p^v, which is configured to classify the behaviour of the computer system (represented by a vertex v_t) during the period of time represented by the input graph based, at least in part, on the aggregate properties of the edges in each direction. Optionally, the classifier function fⁿ may make use of one or more attributes associated with the vertex v_t being classified. However, this is not necessary and in some cases the classifier function fⁿ solely operates based on the aggregate properties of the edges in each direction. Additionally or alternatively, the classifier function fⁿ may also make use of the global attribute u which may provide one or more additional factors (or attributes) that are not directly associated either with the specific computer system represented by the vertex or with the communications between that computer system and other computer systems in the set represented by the edges associated with that vertex. The vertex classification block 310(2) provides, as an output V , an indication, for each of the vertices v_t in the set of vertices V of a classification (or label) for that vertex v_t classifying the behaviour of the computer system represented by that vertex v_t during the time period represented by the input graph 200. In general, the classifications provided by the vertex classification block 310(2) provide an indication that a computer system’s behaviour is one of: “normal·’, “atacker” or “victim”. However, in some cases, component 300 may only be concerned with identifying one type of anomalous behaviour, such as only those computer systems that are “attackers” or only those computer systems that are “victims”, in which case the other classification may not be used. The computer system’s behaviour may be directly classified into these three classes (that is to say, there may be only three classes into which computer systems are classified). However, in other cases more than three classes may exist. For example, multiple classifications may indicate a computer system that is behaving as an “atacker”, such as by using a separate classification for an attacking computer system that further indicates a specific type of attack. Similarly, multiple classifications may indicate a computer system that is a “victim” of an attack. For example, a separate classification may be associated with a victim computer system for each of a plurality of different types of attack.

In a similar manner to the edge classifier function <p^e, the vertex classifier function fⁿ may be implemented using a machine learning model (such as a neural network) that has been trained according to a suitable machine learning algorithm based on a set of training data. In particular, the sample graphs that form the training data (as discussed earlier in relation to the edge classifier function <p^e) also includes a label for each of the vertices v_t indicating a classification of that vertex (e.g. “normal·’, “attacker” or “victim”). The machine learning algorithm can be used to train a model that embodies the vertex classifier function fⁿ. To do so, the edge labels for the sample graphs in the set of training data may be fed through the edge aggregators 320 to produce the aggregate properties for the edges in each direction (i.e. inbound or outbound) for each vertex. The aggregate properties can then be provided as input to the model being trained (in addition to any other inputs that are to be used by the model, such as specific attributes of the vertex) and a classification of the vertex obtained from the model. In this way, the training of the vertex classifier function can be performed separately from the training of the edge classifier function. The classification for each vertex as produced by the model can then be compared to the correct classification of each vertex, as indicated by the labels on the sample graph, through a loss function and the model can be updated via back propagation ready for the next iteration of training. Again, the skilled person would be familiar with such machine learning techniques, so they will not be discussed in any further detail herein. It is also noted that any other suitable means for implementing the vertex classifier function fⁿ may be used instead.

In the simplified example, that has been discussed in relation to figures 2A and 2B, the vertex classifier function may learn to classify the role of a computer system in an attack based on the aggregate classification of the communications in each direction. For example, in this simplified example, a computer system with an aggregate classification of outbound communications that is “compromised” (such as the second, third and fifth computer systems 210(2), 210(3) and 210(5) in figure 2B) may be labelled as an “attacker”, whilst a computer system with an aggregate classification of inbound communications that is “compromised” (such as the first computer system 210(1) in figure 2B) may be labelled as a victim. Of course, it will be appreciated that the complexity of the vertex classification function can be increased to improve the classification of the computer systems. For example, as discussed above, the aggregate property for the inbound and outbound communications from a computer system may include a total number of edges associated with different types of classes, whereby the different type of class further indicate a type of attack. Accordingly, the vertex classification function may classify a computer system dependent on the type of attack. For example, if the aggregate property for the inbound edges to the first computer system 210(1) in figure 2B indicates that the inbound communications are predominantly associated with a data extraction attack, the vertex classification function may instead be trained to classify the first computer system as an “attacker” in such a type of attack and the second, third and fifth computer systems 210(2),

210(3) and 210(5), from which data was received by the first computer system 210(1) as the “victims”.

The functions of the various elements of the component 300 illustrated in figure 3 will now be discussed further in relation to figure 4 which is a flowchart illustrating a method 400 for detecting anomalous behaviour of one or more computer systems in a set of intercommunicating computer systems, such as the set 200 of computer systems 210(1)- 210(6) illustrated in figures 2A and 2B.

At an operation 410, the method 400 obtains connection data. The connection data represents communications occurring in each direction between the computer system and other computer systems in the set during a period of time. As discussed above, any sort of data which represents communications occurring between the computer systems in the set can be used. In some cases, the connection data may comprise network traffic data (or otherwise be derived from network traffic data), such as NetFlow data, but in other cases may additionally or alternatively include (or be derived from) other types such as process data from one or more processes running on at least one of the computer systems in the set (e.g. authentication data). In obtaining the connection data, the method 400 may filter the various datasets from which it is obtained so that the connection data only includes data representing communication between computer systems that occurred within the period of time being considered by the current iteration of method 400 (i.e. so that the connection data does not represent communications occurring outside of the period of time currently under consideration). In some cases, it may be necessary to process the connection data to summarise the communications occurring between computer systems during the period of time. For example, the connection data may include multiple representations of communications between the two computer systems (such as via different protocols or representing distinct communication sessions). In which case those multiple representations may be summarised (or aggregated) into a single representation of the communications between the two computer system during the period of time (e.g. to give a total number of bytes transmitted from one computer system to the other regardless of the protocol or communication session under which that data was transmitted).

At an operation 420, the method 400 classifies the communications occurring in each direction between the computer system and the other computer systems based on the connection data. As discussed in relation to the component 300 illustrated in figure 3, the edges can be classified using a classifier function which indicates whether each connection (i.e. the communications occurring between the computer system and another computer system during the period of time under consideration) is considered to be “compromised’ or “normar.

At an operation 430 the method 400 aggregates the communications in each direction relative to the computer system based on the classifications to produce at least one respective aggregate property for the communications in each direction. That is to say, all the inbound communications are aggregated to produce at least one respective aggregate property for the inbound communications to the computer system during the time period and all the outbound communications are aggregated to produce at least one respective aggregate property for the outbound communications from the computer system during the time period. As discussed in relation to the component 300 illustrated in figure 3, the aggregate property for the classifications of the inbound communications to the computer system from other computer systems can be produced by the aggregation function p ^v of the inbound edge aggregator block 320(2). Meanwhile, the aggregate property for the classifications of outbound communications from the computer system to other computer systems can be produced by the aggregation function p^_t of the outbound edge aggregator block 320(1).

At an operation 440, the method 400 determines a classification for the computer system. The classification is determined based, at least in part, on the at least one respective aggregate property in each direction (i.e. on the at least one aggregate property that was determined for the inbound communications to the computer system and the at least one aggregate property that was determined for the outbound communications from the computer system). As discussed in relation to the component 300 illustrated in figure 3, the classification for the computer system can be produced by a vertex classifier function fⁿ. The classification of the computer system indicates whether the computer system is behaving anomalously (or not). That is to say, at least one of the classes into which the computer system may be classified at operation 450 indicates anomalous behaviour of the computer system. In some cases, the classification may indicate that the computer system is an “attacker”. That is to say that the computer system has been determined to be the source of attacks on one or more of the other computer systems in the set during the time period being considered. In some cases, the classification may indicate that the computer system is a “victim”. That is to say it is the target of attacks from one or more of the other computer system in the set during the time period being considered. In some cases, the classification may indicate a respective type of attack that the computer system is involved in (such as a denial of service attack or a data extraction attack). That is to say, the classification indicate that the computer system is an “ attacker ” performing that type of attack on one or more other computer systems or that the computer system is a “ victim ” of that type of attack that is being carried out by one or more of the other computer systems in the set.

At an optional operation 450, the method 400 implements any protective measures that may be needed to prevent or mitigate any threat to, or caused by, the computer system on the basis of the classification of the computer system at operation 450. In some cases, the protective measure that is selected is dependent on the classification of the computer system. For example, the protective measure may be different where the computer system is determined to be a “ victim ” of an attack, compared to the protective measure that might be taken if the computer system is determined to be the “attacker”. Similarly, where the classification of the computer system indicates whether it is the “ victim ” or “ attacker ” for a particular type of attack, the protective measures that are applied may be dependent upon the type of attack. Furthermore, where the classifications of multiple computer systems in the set have been obtained, for example through multiple iterations of the method 400, the selection of a protective measure may be based on the classification of multiple computer systems, or all of the computer systems, in the set.

As examples, the protective measures that may be applied by this operation 450 include one or more of: the deployment of firewalls; performing additional authentication or authorisation checks; preventing network communication to and/or from the computer system; performing an anti-malware task on the computer system; increasing a level of monitoring, tracing and/or logging of network communications involving the computer system; and issuing an alert. However, any other suitable protective measures may be applied instead or in addition to such measures as will be apparent to those skilled in the art.

Returning to the example illustrated in figure 2B, the first computer system 210(1) might have been classified as being a “victim of a distributed denial of service attack’, whilst the second computer system 210(2), third computer system 210(3) and fifth computer system 210(5) might each have been classified as being an “attacker in a distributed denial of service attack’. Accordingly, at operation 450, the method 400 may apply protective measures to the first computer system 210(1) by blocking any communications from computers systems that have been classified as being attackers in that kind of attack. The method 400 may therefore reconfigure various systems in the network to prevent communications from second, third and fifth computer systems 210(2), 210(3) and 210(5) reaching the first computer system 210(1), whilst enabling all other communications, such as between the first computer system 210(1) and the fourth computer system 210(4) or between the first computer system 210(1) and the sixth computer system 210(6) to continue as normal. Alternatively, the method 400 may disconnect the computer systems that were classified as being “attackers” (i.e. the second computer system 210(2), third computer system 210(3) and fifth computer system 210(5)) to remove the threat both to the first computer system 210(1 ) and any other computer systems in the network. Of course it will be appreciated that this is just a simplified example and that any combination of appropriate protective measures may be applied, as suited to the context of the attack (that is to say, the roles of various computer systems in relation to the attack, as provided by the classifications determined at operation 440).

After applying any protective measures at optional operation 450, or if operation 450 is not present in the method 400, the method 400 proceeds to an optional operation 460. However, where the selection of one or more protective measures is to be based on the classification of multiple (or all) computer systems in the set operation the operation 450 may be performed subsequent to optional operation 460 (if present). At optional operation 460, the method 400 determines whether there are further computer systems to classify for the period of time currently being analysed. If so, the method 400 returns to operation 420 and repeats operations 420, 430, 440 and 450 for the further computer systems. As will be appreciated, when focussing on a single computer system of the set of computer systems (i.e. when only attempting to monitor for anomalous behaviour of a specific computer system in the set), only the communications involving that computer system need to be classified and communications between other pairs of computer systems in the set may be ignored. However, when focussing on multiple computer systems, or even when looking for anomalous behaviour of any computer system in the set, it will be necessary to classify the communications involving each computer system being monitored. Hence, the method 400 may be performed iteratively by focussing on a different computer system with each iteration. Alternatively, the processing required to classify one or more or all of the computer systems in the set may be performed substantially in parallel by performing multiple instances of method 400. Alternatively, rather than iteratively performing each of the steps of the method 400, each operation of the method 400 may be performed in respect of all the computer systems (and connections) in the set before proceeding to the next step. For example, at operation 420, the method 400 may classify all of the communications occurring between all the computer systems in the set (i.e. it may classify all of the edges in an input graph representing the communications between the computer systems in the set) before proceeding to operation 430 and so on.

If it is determined at operation 460 that there are no further computer systems to be classified during the period of time currently being analysed, or if operation 460 is not present in the method 400, the method 400 proceeds to an optional operation 470.

At an optional operation 480, the method 400 determines whether a further period of time should be analysed. If so, the method 400 returns to operation 410 to obtain connection data for the further period of time and repeats operations 420, 430, 440, 450 and optionally operation 460 to detect anomalous behaviour during the further period of time.

As discussed above, each graph represents the interactions occurring between intercommunicating computer systems during a particular period of time. That is to say, each of the graphs represents the interactions that occur between the computer systems during a predetermined duration of time. Where it is desired to monitor a network over a longer period of time, rather than simply producing a graph representing all communications occurring during that longer period of time, a sequence of graphs may instead be obtained by partitioning the communication data into partitions, whereby each partition comprises connection data for a particular time window having the predetermined duration. A corresponding graph can then be generated for each time window. For example, if one hour of NetFlow data was captured, this can be split into one-minute time windows to produce 60 partitions. Each of these partitions can then constructed as a separate graph representing the communications occurring in a respective one of the one-minute time windows. By splitting the connection data (e.g. the NetFlow data) into such partitions and creating separate graphs each covering a smaller time window, more accurate observations can be generated. This is because the potential loss of seasonality or data spikes that could result from using a single graph to cover the same overall time period can be avoided. Again, it will be appreciated that rather than analysing each time period sequentially as illustrated in figure 4, it is also possible for other time periods to be analysed substantially in parallel by separate instances of method 400.

If it is determined at operation 480 that there are no further periods of time to be analysed, or if operation 480 is not present in method 400, the method 400 ends.

Through the use of the above-described techniques, embodiments of the invention can detect anomalous behaviour in a network and can classify computer systems in the network based on their role in a detected attack (e.g. as being either the victim or attacker in a particular attack). The insight provided by these classifications into the activity occurring in a network can help to improve the effectiveness of any response that is applied to prevent or mitigate an attack (or future attacks). Furthermore, through the use of the aggregator blocks 320, the technique can be applied in a manner which is less dependent on the underlying network topology. This means the models can be trained on data from networks having different topologies from the networks upon which they are deployed to detect anomalous behaviour.

Insofar as embodiments of the invention described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example. Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention. It will be understood by those skilled in the art that, although the present invention has been described in relation to the above described example embodiments, the invention is not limited thereto and that there are many possible variations and modifications which fall within the scope of the invention. The scope of the present invention includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.

Claims

1. A computer implemented method for detecting anomalous behaviour of a computer system in a set of computer systems, the method comprising: obtaining connection data representing communications occurring in each direction between the computer system and other computer systems in the set during a period of time; classifying the communications in each direction between the computer system and the other computer systems based on the connection data; aggregating the communications in each direction relative to the computer system based on the classifications to produce at least one respective aggregate property for the communications in each direction; and determining a classification for the computer system based, at least in part, on the at least one respective aggregate property in each direction, wherein at least one of the classes into which the computer system may be classified indicates anomalous behaviour of the computer system.

2. The method of claim 1 , wherein the connection data comprises network traffic data, the network traffic data indicating one or more properties of flows of network traffic between the computer systems in the set during the period of time.

3. The method of any one of the preceding claims, wherein the connection data comprises process data obtained from one or more respective processes running on at least one of the computer systems in the set.

4. The method of claim 3, wherein the process data comprises authentication data derived from one or more authentication services running on the at least one of the computer systems in the set.

5. The method of any one of the preceding claims, wherein at least one of the classes into which the computer system may be classified indicates that the computer system is attacking one or more of the other computer systems in the set.

6. The method of any one of the preceding claims, wherein at least one of the classes into which the computer system may be classified indicates that the computer system is the victim of an attack from one or more of the other computer systems in the set.

7. The method of any one of the preceding claims, wherein the at least one of the classes into which the computer system may be classified comprises a plurality of classes, each of the plurality of classes further indicating a respective type of attack involving the computer system.

8. The method of any one of the preceding claims, wherein the classification of the computer system is further based on one or more attributes of the computer system.

9. The method of any one of the preceding claims, wherein the classification of the computer system is further based on one or more other attributes that are not directly associated with a specific computer system or the communications between computer systems in the set.

10. The method of any one of the preceding claims further comprising implementing protective measures in respect of the computer system in response to the classification of the computer system indicating anomalous behaviour of the computer system.

11 . The method of claim 10, wherein the protective measures include one or more of: preventing network communication to and/or from the computer system; performing an anti-malware task on the computer system; disconnecting the computer system; increasing a level of monitoring of network communications involving the computer system; and issuing an alert.

12. The method of claim 1 , wherein the at least one respective aggregate property for the communications in each direction comprises a respective aggregate classification of the communications in each direction.

13. A computer system comprising a processor and a memory storing computer program code for performing a method according to any one of claims 1 to 12.

14. A computer program which, when executed by one or more processors, causes the one or more processors to carry out a method according to any one of claims 1 to 12.