WO2021262136A1 - Monitoring an embedded system - Google Patents

Monitoring an embedded system

Info

Publication number
WO2021262136A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
feature vector
clusters
time period
monitoring system
Application number
PCT/US2020/038957
Other languages
French (fr)
Inventor
Daniel Cameron ELLAM
Adrian John Baldwin
Jonathan Griffin
Stuart Lees
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2020/038957
Priority to TW110121240A
Publication of WO2021262136A1

Classifications

    • G — PHYSICS
      • G06 — COMPUTING; CALCULATING OR COUNTING
        • G06F — ELECTRIC DIGITAL DATA PROCESSING
          • G06F 11/00 — Error detection; Error correction; Monitoring
            • G06F 11/30 — Monitoring
              • G06F 11/3003 — Monitoring arrangements specially adapted to the computing system or computing system component being monitored
                • G06F 11/3006 — where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
                • G06F 11/3013 — where the computing system is an embedded system, i.e. a combination of hardware and software dedicated to perform a certain function in mobile devices, printers, automotive or aircraft systems
              • G06F 11/3065 — Monitoring arrangements determined by the means or processing involved in reporting the monitored data
              • G06F 11/34 — Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F 11/3466 — Performance evaluation by tracing or monitoring
                  • G06F 11/3476 — Data logging
          • G06F 21/00 — Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F 21/50 — Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
              • G06F 21/55 — Detecting local intrusion or implementing counter-measures
                • G06F 21/552 — Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
                • G06F 21/56 — Computer malware detection or handling, e.g. anti-virus arrangements
                  • G06F 21/566 — Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Description

  • Embedded systems are in widespread use and may be found, for example, in consumer, industrial, and commercial applications. An embedded system may be understood to be a computer system having a dedicated purpose within a larger system or device. Examples of embedded systems include controllers for domestic appliances, digital watches, printers, video-conferencing systems, and manufacturing assemblies. Embedded systems may sit within computer networks, for example networks within an enterprise, which may be referred to as enterprise networks. Such networks may include computing devices of different types, for example personal computing devices and devices including embedded systems.
  • Within an organisation, there will typically be a number of embedded systems that are used and administered by users within the organisation in a variety of ways. Such devices may be connected to an enterprise network of the organisation, and in such cases they represent a potential security risk. The embedded systems may be used or administered improperly by those within the organisation and may be at risk of external attacks. Improper actions may be the result of human error or may be indicative of an attack; even when the result of human error, they can leave the embedded system and the network open to external attack. Current approaches for monitoring risk in relation to embedded systems are based on existing systems for monitoring risk with PCs. However, many of those techniques are directed to issues that are not expected to arise with embedded systems, and a PC-based approach will not cover issues specific to embedded systems.
  • Figure 1 is a block diagram of a monitoring system according to the disclosure within an example network;
  • Figure 2 is a flow chart of a method of monitoring an embedded system according to the disclosure;
  • Figure 3 is a flow chart of a method of monitoring an embedded system according to the disclosure;
  • Figure 4(a) is a graphical representation of clusters generated from a data set according to a method of the disclosure;
  • Figure 4(b) is an alternative graphical representation of clusters generated from the data set used for Figure 4(a) according to a method of the disclosure; and
  • Figure 5 is a block diagram of an example machine-readable medium with a processor.
  • Referring now to Figure 1, there is shown a block diagram of a monitoring system 100.
  • Such a monitoring system may be used to identify anomalous behaviour involving an embedded system in a network. Such anomalous behaviour may indicate inappropriate use of the embedded system, which may represent a security risk.
  • The monitoring system 100 comprises a processor 102, memory 104, an input 106, and an output 107, and is located in a network, indicated generally by the reference numeral 110.
  • The input 106 is configured to receive log messages from an embedded system 108 in the network 110.
  • The example network of Figure 1 comprises, as well as the monitoring system, three computing devices 112a, 112b, 112c and the embedded system 108, connected together by various communication links.
  • Figure 1 shows three users 114a, 114b, 114c who can interact with the embedded system 108, either directly or via one of the computing devices, such as computing device 112c.
  • User interactions with the embedded system are referred to herein as user-interaction events of the embedded system.
  • Each user-interaction event results in the embedded system generating a message relating to that event.
  • Such messages may be referred to as user activity logging messages, or simply logging or log messages.
  • The log messages may be produced by the embedded system in response to a user input.
  • These log messages are transmitted, at M, to the input 106 of the monitoring system 100.
  • The transmission of the log messages may use standard networking protocols and may conform to the syslog standard.
  • The memory 104 stores computer-executable instructions which, when executed by the processor 102, cause the processor 102 to carry out the actions described herein.
  • For each user 114a, 114b, 114c, for a defined time period, the monitoring system 100 generates a feature vector based on the user-interaction events that occurred for that user 114 in that time period.
  • The feature vector captures how a user interacts with the embedded system and is discussed in more detail later in the description.
  • The monitoring system 100 uses a clustering method to group the feature vectors from a subset of the users 114 for that time period into clusters.
  • The clusters provide a representation of user behaviour.
  • The subset of the users may include all of the users of the network.
  • Each data point in the clusters is a feature vector and represents the relationship between one user and the embedded system 108.
  • The monitoring system 100 monitors the clusters for each time period to identify any feature vector that represents an anomaly when compared with a stored record of acceptable clusters.
  • The stored record may include acceptable clusters and cluster configurations; acceptable feature vector behaviour, such as moving between clusters; and other cluster-related metrics.
  • The stored record of acceptable clusters may be created based on certain predefined rules and comparison between clusters of previous time periods, may be based on rules derived from training data, or may be otherwise defined.
  • The monitoring system 100 may repeat the generation of the feature vectors, the use of the clustering method, and the monitoring of the clusters for subsequent time periods.
  • The output 107 is configured to generate an alert when anomalous behaviour is detected by the monitoring system.
  • The results may be output to a computing device within the network, for example a computing device of a network administrator or a computing device that is part of a security system for the network.
  • The monitoring system 100 monitors the clusters to identify feature vectors of users whose behaviour deviates from an approved behaviour.
  • In this way, the monitoring system of the disclosure recognises that the information contained in the log messages may be utilised to improve the safety of the network and the embedded systems within it.
  • The combination of log messages and clustering algorithms provides an effective and efficient system for detecting unusual user interactions with the embedded systems, including interactions that change over time. This information can be useful in identifying behaviour that represents a security issue in the network. It also provides information on the usage of the embedded system or systems in the network, which is helpful for planning and maintenance purposes.
  • The stored record of acceptable clusters may be generated through a training phase, where the feature vectors and clusters are generated as described herein and then manually analysed and annotated to identify acceptable and non-acceptable cluster assignments and arrangements.
  • The annotation may include identifying high-risk and low-risk changes in a user's behaviour, and other potentially anomalous behaviour.
  • Alternatively, the stored record may be based on a set of rules and comparison with clusters from previous time periods. Such rules may include that any movement of a feature vector out of a previous cluster is an anomaly, that any feature vector located in a cluster predominantly comprised of users of another type is an anomaly, that any feature vector indicating high volumes of user-interaction events is an anomaly, and other broad rules.
  • A stored record of acceptable clusters not initially based on specific training data may be refined over time, as the rules are updated based on use of the monitoring system.
  • While only one embedded system is shown in Figure 1, the disclosure is not limited to a single embedded system: the monitoring system 100 may receive log messages from more than one embedded system within the network and carry out the other actions described herein based on those log messages. By monitoring multiple embedded systems, it is possible to identify patterns of misuse that would be difficult to identify by monitoring a single embedded system alone.
  • The network 110 may be a network within a single organisation, such as an enterprise network.
  • Alternatively, the network may be any network comprising an embedded system.
  • Embedded systems may include a wide variety of devices or systems having a defined function, including printers and multi-function print devices, video-conferencing systems, display screens, and door lock and access devices such as card readers.
  • In industrial contexts, embedded systems may include manufacturing equipment such as machining tools, 3D printers, and quality-checking equipment.
  • The users of the embedded system may be classified according to a number of types, for example general user and administrator. Other user types may include a guest user or a supervisor user. Different user types can be expected to interact with embedded systems in different manners.
  • Typically, a general user may have permission to use the embedded system for its intended purpose.
  • A supervisor user may be defined for an embedded system within an industrial setting. The supervisor user may have the same permissions as the general user, but also some additional permissions related to the purpose of the device, such as being able to review the use of general users or adjust settings. In some instances, supervisor users may have some configuration permissions as well.
  • An administrator user may have permission to configure the embedded system and to perform other administrative functions such as adding or deleting users. In some instances, for example where the embedded system is in an industrial setting, administrator users may not have permission to use the embedded system for its intended purpose.
  • A guest user may have a very limited set of permissions.
  • The log messages may comprise an event stream, where each log message contains information such as a timestamp, an entity identifier, and an activity identifier.
  • The timestamp provides an indication of the time the event occurred.
  • The entity identifier may comprise a device ID, the username of the user performing the action, and any other entities involved in the transaction, such as servers, computers, and the like.
  • In an example where the embedded system is a printer, the device ID could be a printer ID, such as a hostname, serial number, or the like of the printer being acted upon.
  • The activity identifier provides a brief description of the event that occurred, such as "Successful login" or "Print job completed". Alternatively, this could be an event code with an alphanumerical representation, such as <45>, <4224>, or <12.AB.3F>.
  • The structure and content of the log messages may depend on the embedded system and the communications protocols in use. A simple parsing sketch is given below.
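  • As an illustration only, the following Python sketch parses a message carrying the three fields described above into a structured event. The pipe-delimited layout, field names, and example values are hypothetical assumptions rather than a format defined by the disclosure; a real embedded system may instead emit syslog-formatted text.

    ```python
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class UserInteractionEvent:
        timestamp: datetime   # time the event occurred
        device_id: str        # e.g. a printer hostname or serial number
        username: str         # the user performing the action
        activity: str         # e.g. "Successful login", or an event code such as "<4224>"

    def parse_log_message(line: str) -> UserInteractionEvent:
        # Assumed pipe-delimited layout; real formats vary by device and protocol.
        ts, device, user, activity = line.split("|")
        return UserInteractionEvent(datetime.fromisoformat(ts), device, user, activity)

    event = parse_log_message("2020-06-22T09:15:03|printer-042|alice|Print job completed")
    print(event.username, event.activity)  # alice Print job completed
    ```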
  • The feature vector is based on the user-interaction events that have occurred for one user for a particular time period.
  • In an example, the structure of the feature vector is derived by enumerating the possible user-interaction events that may occur.
  • The feature vector then has that number of elements, each representing one of the possible user-interaction events.
  • A time period of data collection for the feature vector is defined, and log messages are received and collated for that time period.
  • At the end of the time period, the number of occurrences of each type of event is counted.
  • The feature vector is constructed by assigning each count value to the relevant element of the feature vector.
  • Alternative manners of structuring and populating the elements of the feature vector may be utilised within the scope of the disclosure. The counting construction is sketched below.
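  • A minimal sketch of the count-based construction just described, assuming a fixed, hypothetical enumeration of event types:

    ```python
    from collections import Counter

    # Hypothetical enumeration of the possible user-interaction events.
    EVENT_TYPES = ["login", "logout", "print_complete", "print_fail",
                   "scan_complete", "scan_fail", "config_change"]

    def feature_vector(events_for_user: list) -> list:
        """One element per enumerated event type, holding the count of that
        event for one user in one time period."""
        counts = Counter(events_for_user)
        return [counts.get(event, 0) for event in EVENT_TYPES]

    print(feature_vector(["login", "print_complete", "print_complete", "logout"]))
    # -> [1, 1, 2, 0, 0, 0, 0]
    ```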
  • The time period may be varied as required. It will be understood that, to gather suitable levels of data, the time period may be adjusted accordingly.
  • In an example, the time period is chosen so as, on average, to include approximately one user-interaction session with the embedded system.
  • A user-interaction session may be understood as a transaction, for example logging in to a printer to print a document; logging in to a multi-function printer to print and then scan a document; or an admin user logging in to adjust configuration settings on an embedded system.
  • The clustering methods used by the monitoring system 100 may comprise centroid-based clustering. Additionally or alternatively, they may comprise graphically based clustering. Other types and examples of clustering methods may be used within the scope of the disclosure. The clustering methods may be operable in real time.
  • In an example, the clustering method used by the monitoring system comprises t-Distributed Stochastic Neighbour Embedding (t-SNE).
  • Methods such as t-SNE can project a high-dimensional feature vector into a 2-dimensional space, which facilitates visualisation of the data. The t-SNE projection attempts to preserve distances between feature vectors that are close in the high-dimensional space.
  • In another example, the monitoring system 100 uses k-modes as part of its clustering process. k-modes can be used to cluster data points in an analytical rather than graphical way. The algorithm produces cluster centroids, which can provide an interpretation of the clusters. Each data point is measured by distance from each centroid and is assigned to the cluster having the closest centroid.
  • Properties of the clusters can be measured, such as cluster density and inter-cluster distances (i.e. distances between centroids).
  • In some examples, both t-SNE and k-modes are used to generate clusters in the monitoring system, as sketched below.
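  • To make the two routes concrete, here is an illustrative sketch assuming the third-party scikit-learn and kmodes Python packages; the data, cluster count, and parameters are placeholders, not values taken from the disclosure.

    ```python
    import numpy as np
    from sklearn.manifold import TSNE          # pip install scikit-learn
    from kmodes.kmodes import KModes           # pip install kmodes

    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(60, 7))       # 60 per-user count vectors, 7 event types

    # Graphical route: project the high-dimensional vectors to 2D for visualisation.
    xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X.astype(float))

    # Analytical route: k-modes treats values as categories and yields centroids
    # that help interpret each cluster.
    km = KModes(n_clusters=4, init="Huang", n_init=5, random_state=0)
    labels = km.fit_predict(X)

    print(xy.shape)                 # (60, 2)
    print(labels[:10])              # cluster assignment for the first ten vectors
    print(km.cluster_centroids_)    # one modal "centroid" per cluster
    ```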
  • Distance metrics may be used to identify a feature vector that represents an anomaly.
  • The distance metrics allow the distance between a feature vector from the time period being monitored and the stored record of acceptable clusters to be checked.
  • The location of the feature vector in the 2D cluster space is checked against a location representative of each of the clusters of the stored record, by calculating the distance between the two locations.
  • The locations representative of the clusters of the stored record may be cluster centroids.
  • The monitoring system 100 may use distance metrics such as the Euclidean distance and the Jaccard dissimilarity index. Other suitable metrics for assessing similarity or distance may be used. A sketch of such a check follows.
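  • The following sketch, assuming SciPy, checks a new feature vector against stored centroids; the centroid values and the distance threshold are illustrative assumptions, not values from the disclosure.

    ```python
    import numpy as np
    from scipy.spatial.distance import euclidean, jaccard

    # Hypothetical stored record: one representative centroid per acceptable cluster.
    CENTROIDS = {
        "low_volume_printing": np.array([1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0]),
        "admin_actions":       np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0]),
    }

    def nearest_acceptable_cluster(vec, threshold=4.0):
        """Return the closest acceptable cluster, or None when the vector is
        further than `threshold` from every centroid (a potential outlier)."""
        name, centroid = min(CENTROIDS.items(), key=lambda kv: euclidean(vec, kv[1]))
        return name if euclidean(vec, centroid) <= threshold else None

    print(nearest_acceptable_cluster(np.array([1, 1, 3, 0, 0, 0, 0])))  # low_volume_printing
    print(nearest_acceptable_cluster(np.array([9, 9, 9, 9, 9, 9, 9])))  # None -> anomaly

    # The Jaccard dissimilarity can compare which event types occur at all,
    # ignoring volumes, by working on boolean presence vectors.
    a = np.array([1, 1, 3, 0, 0, 0, 0]) > 0
    b = CENTROIDS["admin_actions"] > 0
    print(jaccard(a, b))  # 0.0 would mean identical sets of event types
    ```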
  • Anomalous feature vectors may be any feature vectors indicating a change of user behaviour with respect to the embedded systems of the network. For example, if a feature vector is not included in the cluster in which it was included in the stored record of acceptable clusters, it may be an anomalous feature vector. A feature vector that moves out of its existing cluster may move to another cluster, including a new cluster, or may not be included in any cluster. A feature vector not included in any cluster may be referred to as an outlier.
  • Additionally, the clusters have a location in a cluster space, and a feature vector may represent an anomaly if it is located in a cluster having a changed location compared to the stored record of acceptable clusters.
  • Anomalous feature vectors identify anomalous behaviour, such as a user doing something that user has not done before, a user doing something new for their user type, or a user doing something associated with another user type.
  • An alert generated by the monitoring system may indicate the user and embedded system that gave rise to the anomalous feature vector.
  • The alert may further provide information including the user-interaction event or events of the feature vector, and may include the log messages associated with those user-interaction events. Additional context may also be included.
  • Referring now to Figure 2, there is shown a flow chart of a method 200 for generating a record of acceptable clusters based on the usage of an embedded system within a network.
  • The method 200 may be implemented by the monitoring system 100 of Figure 1 and may be considered a training phase therefor.
  • The method 200 of Figure 2 may be carried out in real time, or on a block of data gathered over a training period and then processed according to the method.
  • The training phase may run over a period of days, and may extend up to four weeks or more, depending on the size and complexity of the network in question.
  • At block 202, log messages are received from the embedded system or systems.
  • At 204, the messages are processed to identify the user-interaction events to which they relate. These initial blocks are repeated throughout a predefined time period.
  • An example time period is two hours; however, the time period may be varied.
  • At the end of the time period, at 206, a feature vector is generated for each user based on their user-interaction events from that time period.
  • In a network with more than one embedded system, a feature vector may be generated for each user-embedded-system pairing. If the training phase is conducted on a block of data covering the full training period rather than in real time, the data may simply be processed in sub-blocks of a suitable time period, such that a feature vector is generated for each sub-block for each user. A sketch of this grouping is given below.
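  • A minimal sketch of grouping events into per-(user, device, window) count vectors as described above; the event tuples and the 2-hour bucketing are illustrative assumptions.

    ```python
    from collections import Counter, defaultdict
    from datetime import datetime

    def window_start(ts, hours=2):
        # Bucket a timestamp into a fixed window, e.g. 2-hour sub-blocks.
        return ts.replace(minute=0, second=0, microsecond=0,
                          hour=(ts.hour // hours) * hours)

    def vectors_per_pairing(events, hours=2):
        """events: iterable of (timestamp, user, device, activity) tuples.
        Returns an event-count Counter per (user, device, window) key."""
        buckets = defaultdict(Counter)
        for ts, user, device, activity in events:
            buckets[(user, device, window_start(ts, hours))][activity] += 1
        return buckets

    events = [
        (datetime(2020, 6, 22, 9, 5),  "alice", "printer-042", "print_complete"),
        (datetime(2020, 6, 22, 9, 40), "alice", "printer-042", "print_complete"),
        (datetime(2020, 6, 22, 14, 0), "alice", "printer-007", "scan_complete"),
    ]
    for key, counts in vectors_per_pairing(events).items():
        print(key, dict(counts))
    ```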
  • At 208, clustering algorithms are applied to the feature vectors for the time period to group the feature vectors into clusters.
  • Suitable clustering methods may include, as discussed herein, t-SNE and k-modes.
  • At 210, the generated clusters are annotated to mark acceptable and unacceptable behaviours.
  • The annotation may include identifying high-risk and low-risk changes in a user's behaviour, such as the movement of data points in the cluster space between time periods, and other potentially anomalous behaviour.
  • High-risk behaviours may include a user's feature vector being located in a cluster predominantly formed of feature vectors of users of another type; other changes may be marked as low risk, and so on.
  • Blocks 202 to 210 are repeated for a number of time periods.
  • The accumulated annotated cluster arrangements are then analysed to identify a baseline pattern of usage of the embedded system or systems.
  • The baseline pattern of usage may comprise a stored record of acceptable clusters.
  • The training data may be annotated manually by a user familiar with the monitoring system.
  • The training phase may comprise cross-validation of the training data to improve its reliability.
  • In an example, a training phase is conducted to allow the monitoring system 100 to operate in an office network comprising general users and admin users, where the embedded systems predominantly comprise printers. In such a network, the following eight clusters can be expected to appear:
    1. low volume login and low volume printing;
    2. low volume failed login via the administrative interface;
    3. high volume password changes, and high volume login via the administrative interface;
    4. medium volume printing;
    5. high volume login via the control panel;
    6. high volume printing;
    7. low volume printing; and
    8. low volume login via the control panel.
  • It will be understood that these represent one possible set of clusters, and do not limit the clusters that may appear from a particular network.
  • This list of clusters can be used to assist in annotating the training data. For example, if one of these clusters were missing, the training data may need to be examined to identify why. The data may be annotated so that, if such a cluster were to appear while the monitoring system was in use, it would or would not be flagged as anomalous. Similarly, if an additional cluster or clusters were present in the training data, the reasons for their presence should be ascertained and annotated accordingly.
  • Annotating training clusters of this structure may include specifying that a feature vector movement between cluster 1 and cluster 4, between cluster 1 and cluster 7, or between cluster 4 and cluster 6 should not be marked as anomalous, while movement between all other pairs of clusters should be marked as anomalous. A sketch of such a rule is given below.
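  • The annotated movement rule above can be expressed directly as data; this minimal sketch encodes the three allowed pairs and flags every other movement.

    ```python
    # Movement between an allowed pair of clusters is acceptable;
    # movement between any other pair is an anomaly.
    ALLOWED_MOVES = {frozenset({1, 4}), frozenset({1, 7}), frozenset({4, 6})}

    def movement_is_anomalous(previous_cluster: int, current_cluster: int) -> bool:
        if previous_cluster == current_cluster:
            return False  # staying in the same cluster is acceptable
        return frozenset({previous_cluster, current_cluster}) not in ALLOWED_MOVES

    print(movement_is_anomalous(1, 4))  # False: an allowed movement
    print(movement_is_anomalous(1, 3))  # True: e.g. into high volume password changes
    ```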
  • The stored record of acceptable clusters and cluster behaviour generated by the method 200 of Figure 2 may be updated periodically to adapt the rules used therein.
  • Referring now to Figure 3, there is shown a flow chart of a method 300 according to the disclosure for monitoring an embedded system.
  • The method 300 of Figure 3 may follow on from the method 200 of Figure 2.
  • The method 300 may be implemented by the monitoring system described in relation to Figure 1.
  • At 302, feature vectors are generated as described in relation to blocks 202 to 206 of the method 200 of Figure 2.
  • The method may comprise receiving, from the embedded system, the log messages related to user-interaction events, wherein each log message indicates that one of a set of defined user-interaction events has occurred. Then, for each user, for a defined time period, the method may comprise populating a feature vector using the user-interaction events that occurred for that user in that time period.
  • The clusters are then generated, as described in relation to block 208.
  • The method may comprise assembling the clusters of the feature vectors from a subset of the plurality of users for that time period using a clustering method.
  • Anomalous user-interaction events are then identified by comparing the clusters with a stored record of acceptable clusters.
  • The method may comprise monitoring the clusters to determine an anomalous feature vector based on an analysis of a stored record of acceptable cluster arrangements.
  • The acceptable cluster arrangements may be determined according to the method 200 of Figure 2.
  • An alert is generated for any anomalous feature vectors.
  • The method may comprise outputting an alert upon determination of an anomalous feature vector.
  • Referring now to Figures 4(a) and 4(b), there are shown representations of user-interaction event data processed according to the disclosure.
  • The data shown here was generated using a 2-hour window and grouped syslog messages; the window size will depend on the type of embedded systems and their use patterns.
  • The same data set is used to create both representations.
  • In Figure 4(a), the data points are annotated according to the class of the user-interaction events in the feature vector.
  • In Figure 4(b), the data points are annotated according to whether the user is an admin user or not.
  • Each data point in Figures 4(a) and 4(b) represents a feature vector projected onto a 2-dimensional space.
  • The feature vector is itself a representation of an interaction, defined by a window of time, between a user and a printer. As such, each user may have produced multiple data points in the figures, since the user may have interacted with multiple printers or with the same printer multiple times.
  • In Figure 4(a), different marker shapes distinguish the interaction classes: print jobs, printing from the control panel, logins, admin tasks, configuration changes, and security configuration changes.
  • In Figure 4(b), the markers distinguish non-admin users from admin users.
  • Clusters can be defined by drawing bounding boxes (not shown) covering segments of the 2-dimensional space. Concretely, each bounding box can be defined by specifying four (x, y) coordinates (or possibly more), which together define a rectangle (or, in general, an n-gon). Any feature vector that maps to a point (x, y) lying within a given bounding box is said to lie in the cluster defined by that bounding box, as in the sketch below.
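  • A minimal axis-aligned version of that membership test; the cluster names and box coordinates are illustrative assumptions that loosely echo the clusters discussed next.

    ```python
    # Hypothetical axis-aligned boxes around clusters near (2.5, 10) and (15, -5).
    BOXES = {
        "low_volume_printing": (0.0, 5.0, 5.0, 15.0),   # (x_min, x_max, y_min, y_max)
        "admin_actions":       (12.0, 18.0, -8.0, -2.0),
    }

    def cluster_for(x: float, y: float):
        """Return the cluster whose bounding box contains (x, y), or None
        when the point lies in no box (a potential outlier)."""
        for name, (x0, x1, y0, y1) in BOXES.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return name
        return None

    print(cluster_for(2.5, 10.0))   # low_volume_printing
    print(cluster_for(15.0, -5.0))  # admin_actions
    print(cluster_for(40.0, 40.0))  # None
    ```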
  • Some clusters are readily identifiable, such as the large circle centred at approximately (2.5, 10), representing any print job (by substantially non-admin users). This represents low-level printing: many user-interaction events fall into this cluster, resulting in the defined circle seen, and these are users who have performed a small number of print jobs within the time window. The circle towards the top of the graph also represents print jobs, but in higher volumes. The cluster around (15, -5) comprises administrative actions; any apparent non-admin users showing activity there would be of interest. These clusters suggest that, for each type of high-level interaction between a user and a printer (i.e. any-print-job, print-from-control-panel, etc.), there are only a small number of variations in how the interaction happens in practice.
  • Figure 4(b) illustrates a distinction between admin and non-admin activities.
  • Anomalies can also be spotted: a closer inspection shows that there is a single non-admin data point (not shown) within the administrative cluster.
  • Thus, certain anomalous behaviour can be identified purely from the generation of clusters, without reference to the established record of acceptable behaviour.
  • The log messages may be used as a base to create a log message data unit.
  • The log message data unit includes the log message and at least one additional piece of information related to the user-interaction event, and the feature vector is generated based on the log message data unit. Additional pieces of information that may be added to the log message data unit include an indication as to whether the event was a success or a failure.
  • There now follows a brief example of how a feature vector may be generated. In this example, the embedded system is a multi-function printer with two user types: a general user and an admin user.
  • The example embedded system has four potential user-interaction events for general users, namely Print complete, Print fail, Scan complete, and Scan fail.
  • It further has five potential admin user-interaction events, namely Login, Logout, Change config param X, Change config param Y, and Change config param Z.
  • The feature vectors can be normalised, or further features may be generated. For example, the proportion of each activity from a given user can be computed by normalisation. Further, certain features could be represented by concepts such as high, medium, low, and none for the number of documents printed, where each value has a predefined range. A sketch follows.
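  • The following sketch builds the nine-element vector for the multi-function printer example, normalises it to activity proportions, and bands a count into none/low/medium/high; the band thresholds are hypothetical, as the disclosure leaves the ranges to be predefined.

    ```python
    # The nine enumerated events from the example above, in a fixed order.
    EVENTS = ["print_complete", "print_fail", "scan_complete", "scan_fail",
              "login", "logout", "config_param_x", "config_param_y", "config_param_z"]

    def build_vector(counts: dict) -> list:
        return [counts.get(event, 0) for event in EVENTS]

    def normalise(vec: list) -> list:
        """Convert raw counts to activity proportions."""
        total = sum(vec)
        return [v / total if total else 0.0 for v in vec]

    def band(count: int) -> str:
        # Hypothetical predefined ranges for the none/low/medium/high encoding.
        if count == 0:
            return "none"
        if count <= 5:
            return "low"
        if count <= 20:
            return "medium"
        return "high"

    vec = build_vector({"login": 1, "print_complete": 3, "logout": 1})
    print(vec)             # [3, 0, 0, 0, 1, 1, 0, 0, 0]
    print(normalise(vec))  # [0.6, 0.0, 0.0, 0.0, 0.2, 0.2, 0.0, 0.0, 0.0]
    print(band(vec[0]))    # low
    ```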
  • The feature vector may also include elements not related to a count of user-interaction events.
  • Examples of additional data that may be included in the feature vector are contextual information in relation to location or timing.
  • Location contextual information may comprise an indicator that a print job was completed on the local office printer for the user in question.
  • Timing contextual information may comprise an indicator that a user-interaction event occurred at the weekend or outside office hours. This information is potentially relevant, as a user engaging with the embedded system outside normal working hours may be trying to hide their actions from colleagues or superiors.
  • If a feature vector includes information indicating that a user-interaction event took place out of hours, then that feature vector may be flagged as anomalous, as in the sketch below.
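  • A minimal sketch of the timing context flag; the 08:00-18:00 office hours are an assumed example, not hours specified by the disclosure.

    ```python
    from datetime import datetime

    def out_of_hours(ts: datetime) -> bool:
        """Flag events at the weekend or outside assumed 08:00-18:00 office hours."""
        weekend = ts.weekday() >= 5          # Saturday = 5, Sunday = 6
        outside = not (8 <= ts.hour < 18)
        return weekend or outside

    print(out_of_hours(datetime(2020, 6, 22, 9, 15)))   # False: a Monday morning
    print(out_of_hours(datetime(2020, 6, 21, 9, 15)))   # True: a Sunday
    print(out_of_hours(datetime(2020, 6, 22, 23, 30)))  # True: late at night
    ```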
  • The monitoring system may use a number of different feature vector structures.
  • For example, overview-type feature vectors may be defined, having a longer time period than the basic feature vector.
  • The time period for a basic feature vector may be one or two hours, while the time period for an overview feature vector may be a day or a number of days.
  • The structures of the different feature vectors may also differ, by omitting certain elements or including additional elements.
  • The log messages coming from the embedded system may be quite granular, such that a number of log messages may be generated to indicate what would be perceived as a single task.
  • In such cases, the log messages may be assigned to higher-level descriptions.
  • The higher-level descriptions may be login, print job, configuration change, and so on. Identifying suitable higher-level descriptions may involve a message training phase, where the log messages are analysed by an administrator of the monitoring system and manually tagged with a particular class.
  • In an example, the embedded system is a printer or multi-function printer, which can produce syslog messages when users interact with it.
  • These messages may relate to a user printing or scanning a document. Equally, administrators changing the settings on the printer would also produce syslog messages.
  • These messages may be grouped together and labelled according to the class of activity. For example, possible classes include Remote Logins, Control Panel Logins, Print Jobs, Scan Jobs, Security Configuration Changes, Network Configuration Changes, Print Setting Changes, Other Configuration Changes, Plug-in Installations, and the like. A grouping sketch is given below.
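  • An illustrative sketch of mapping granular messages to the higher-level classes listed above; the substring rules are hypothetical stand-ins for the output of the manual tagging phase.

    ```python
    # Hypothetical substring rules standing in for manually tagged classes.
    CLASS_RULES = [
        ("remote login",        "Remote Logins"),
        ("control panel login", "Control Panel Logins"),
        ("print job",           "Print Jobs"),
        ("scan job",            "Scan Jobs"),
        ("security config",     "Security Configuration Changes"),
        ("network config",      "Network Configuration Changes"),
        ("plug-in",             "Plug-in Installations"),
    ]

    def classify(message: str) -> str:
        text = message.lower()
        for needle, label in CLASS_RULES:
            if needle in text:
                return label
        return "Unclassified"  # would be reviewed during the manual tagging phase

    print(classify("Print job completed for user alice"))  # Print Jobs
    print(classify("Security config parameter changed"))   # Security Configuration Changes
    ```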
  • This disclosure describes a method to enable enterprise IT and security administrators to track how users interact with the embedded systems found within an organisation.
  • The disclosure teaches identifying and monitoring how a fleet of embedded systems within a network, such as an enterprise network, is used and administered. This allows detection of changes in behaviour that may represent risks.
  • The systems and methods disclosed herein allow the identification of users whose behaviours differ significantly from those of normal users, and of users whose behaviour changes to high-risk behaviour, including different but known behaviour or outlying behaviour. These systems and methods further allow detection of users gaining privileged administrative roles, and detection of unusual patterns of administration or changes in patterns of administration. Such behaviour changes are not necessarily an indicator of an attack or improper behaviour, but detecting them can help administrators identify and investigate changes.
  • Figure 5 is a schematic of an example machine-readable medium 502 with a processor 504.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to perform the methods described herein.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to receive log messages related to user-interaction events from the embedded system, wherein the log messages indicate that at least one of a set of defined user-interaction events has occurred.
  • The machine-readable medium 502 may comprise log message receiving instructions 506 to perform the receiving.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to, for each user, for a defined time period, produce a feature vector corresponding to the user-interaction events that occurred for that user in that time period.
  • The machine-readable instructions may comprise feature vector producing instructions 508 to produce the feature vector.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to perform a clustering method on the feature vectors for that time period from a subset of the plurality of users to create clusters of feature vectors.
  • The machine-readable medium 502 may comprise cluster creating instructions to create the clusters.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to, for each time period, monitor the clusters to identify an anomalous feature vector in relation to a stored record of acceptable cluster patterns.
  • The machine-readable medium 502 may comprise cluster monitoring instructions 512 to monitor the clusters.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to generate an alert upon identification of an anomalous feature vector.
  • The machine-readable medium 502 may comprise alert generation instructions 514 to generate the alerts.
  • The machine-readable medium 502 may comprise instructions which, when executed by the processor 504, cause the processor to repeat the actions for subsequent time periods.
  • The machine-readable medium 502 may comprise repeating instructions 516 to perform the repetition.
  • The machine-readable medium 502 may comprise additional instructions which, when executed by the processor 504, cause the processor to perform further actions in line with the methods and examples described herein.
  • Examples in the present disclosure can be provided as methods, systems, or machine-readable instructions, such as any combination of computer program code, hardware, or the like.
  • Such machine-readable instructions may be included on a machine-readable medium having computer-readable program code therein or thereon.
  • The machine-readable medium can be realised using any type of volatile or non-volatile (non-transitory) storage, such as memory, ROM, RAM, EEPROM, optical storage, and the like.
  • The machine-readable medium may be a non-transitory machine-readable medium.
  • The machine-readable medium may also be referred to as a computer-readable storage medium.
  • The machine-readable instructions may, for example, be executed by processing circuitry.
  • The processing circuitry, for example the processor 102 of Figure 1 or the processor 504 referred to in relation to Figure 5, may be in the form of, or comprised within, a computing device.
  • A computing device may include a general-purpose computer, a special-purpose computer, an embedded processor or processors, or other programmable data processing devices to realise the functions described in the description and diagrams.
  • A processor or processing apparatus may execute the machine-readable instructions.
  • Functional modules of the apparatus and devices may be implemented by a processor executing machine-readable instructions stored in a memory, or by a processor operating in accordance with instructions embedded in logic circuitry.
  • The term 'processor' is to be interpreted broadly to include a CPU, processing unit, ASIC, logic unit, programmable gate array, etc.
  • The methods and functional modules may all be performed by a single processor or divided amongst several processors.
  • Such machine-readable instructions may also be stored in a computer-readable storage that can guide a computer or other programmable data processing devices to operate in a specific mode.
  • Such machine-readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or devices perform a series of operations to produce computer-implemented processing; the instructions executed on the computer or other programmable devices thereby realise functions specified by the flow(s) in the flow charts and/or the block(s) in the block diagrams.
  • The teachings herein may be implemented in the form of a computer software product, stored in a storage medium and comprising a plurality of instructions for making a computer device implement the methods recited in the examples of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method is disclosed for monitoring an embedded system within a network, where the network has a plurality of users and the embedded system is configured to generate log messages related to user-interaction events. The log messages are received from the embedded system, wherein the log messages indicate that one of a set of defined user-interaction events has occurred. For each user, for a defined time period, a feature vector is populated using the user-interaction events that occurred for that user in that time period. Clusters of the feature vectors from a subset of the plurality of users for that time period are assembled using a clustering method. The clusters are monitored to determine an anomalous feature vector based on an analysis of a stored record of acceptable cluster arrangements. An alert is output upon determination of an anomalous feature vector.

Description

MONITORING AN EMBEDDED SYSTEM
[0001] Embedded systems are in widespread use and may be found, for example, in consumer, industrial, and commercial applications. Embedded systems may be understood to refer to a computer system having a dedicated purpose within a larger system or device. Examples of embedded systems include controllers for domestic appliances, digital watches, printers, video-conferencing systems, and manufacturing assemblies. Embedded systems may be within computer networks, for example those within an enterprise which may be referred to as enterprise networks. Such networks may include different computing devices of different types, for example personal computing devices and devices including embedded systems.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Examples of the disclosure are further described hereinafter with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a monitoring system according to the disclosure within an example network;
Figure 2 is a flow chart of a method of monitoring an embedded system according to the disclosure;
Figure 3 is a flow chart of a method of monitoring an embedded system according to the disclosure;
Figure 4(a) is a graphical representation of clusters generated from a data set according to a method of the disclosure;
Figure 4(b) is an alternative graphical representation of clusters generated from the data set used for Figure 4(a) according to method of the disclosure;
Figure 5 is a block diagram of an example machine-readable medium with a processor.
DETAILED DESCRIPTION
[0003] Within an organisation, there will typically be a number of embedded systems that are used and administered by users within the organisation in a variety of ways. Such devices may be connected to an enterprise network of the organisation, and in such cases, they represent a potential security risk. The embedded systems may be used or administered improperly by those within the organisation and may be at risk of external attacks. Improper actions may be the result of human error or may be indicative of an attack. Even if the result of human error, this can leave the embedded system and the network open to external attack. Current approaches for monitoring risk in relation to embedded systems are based on existing systems for monitoring risk with PCs. However, many of these techniques are directed to issues that are not expected to arise with embedded systems. Additionally, a PC-risk based approach will not cover issues specific to embedded systems.
[0004] Referring now to Figure 1, there is shown a block diagram of a monitoring system 100. Such a monitoring system may be used to identify anomalous behaviour involving an embedded system in a network. Such anomalous behaviour may indicate inappropriate use of the embedded system, which may represent a security risk. The monitoring system 100 comprises a processor 102, memory 104, an input 106, and output 107, and is located in a network, indicated generally by the reference numeral 110. The input 106 is configured to receive log messages from an embedded system 108 in the network 110.
The example network of Figure 1 comprises, as well as the monitoring system, three computing devices 112a, 112b, 112c and the embedded system 108, connected together by various communication links. Figure 1 shows three users 114a, 114b, 114c who can interact with the embedded system 108, either directly or via one of the computing devices 112c. User interactions with the embedded system are referred to herein a user-interaction events of the embedded system. Each user-interaction event results in the embedded system generating a message relating to the user-interaction event. Such messages may be referred to as user activity logging messages, or simply logging or log messages. The log messages may be produced by the embedded system in response to a user input. In Figure 1 , these log messages are transmitted, at M, to the input 106 of the monitoring system 100. The transmission of the log messages may use standard networking protocols and may conform to the syslog standard.
[0005] The memory 104 stores computer executable instructions therein which, when executed by the processor 102, cause the processor 102 to carry out the actions described herein.
[0006] For each user 114a, 114b, 114c, for a defined time period, the monitoring system 100 generates a feature vector based on the user-interaction events that occurred for that user 114 in that time period. The feature vector captures how a user interacts with the embedded system and is discussed in more detail later in the description.
[0007] The monitoring system 100 uses a clustering method to group the feature vectors from a subset of the users 114 from that time period into clusters. The clusters provide a representation of user behaviour. The subset of the users may include all of the users of the network. Each data point in the clusters is a feature vector and represents the relationship between one user and the embedded system 108.
[0008] The monitoring system 100 monitors the clusters for each time period to identify a feature vector that represents an anomaly when compared with a stored record of acceptable clusters. The stored record may include acceptable clusters and cluster configurations; acceptable feature vector behaviour, such as moving between clusters; and other cluster-related metrics. The stored record of acceptable clusters may be created based on certain predefined rules and comparison between clusters of previous time periods, may be based on rules derived from training data; or may be otherwise defined.
[0009] The monitoring system 100 may repeat the generation of the feature vector, use of the clustering method and monitoring of the clusters for subsequent time periods.
[0010] The output 107 is configured to generate an alert when anomalous behaviour is detected by the monitoring system. The results may be output to a computing device within the network. For example, a computing device of a network administrator or a computing device that is part of a security system for the network.
[0011] The monitoring system 100 monitors the clusters to identify feature vectors of users whose behaviour deviates from an approved behaviour. In this way, the monitoring system of the disclosure recognises that the information contained in the log messages may be utilised to improve the safety of the network and embedded systems within it. The combination of the log messages and clustering algorithms of the monitoring system of the disclosure provides an effective and efficient system for detecting unusual user interactions with the embedded systems, including interactions that change over time. This information can be useful if identifying behaviour that represents a security issue in the network. It also provides useful information on the usage of the embedded system or systems in the network, which is useful for planning and maintenance purposes.
[0012] The stored record of acceptable clusters may be generated through a training phase, where the features vectors and clusters are generated as described herein, and then manually analysed and annotated to identify acceptable and non-acceptable cluster assignment and arrangements. The annotation may include identifying high risk and low risk changes in a user’s behaviour, and other potentially anomalous behaviour. Alternatively, the stored record may be based on a set of rules, and comparison with clusters from previous time periods. Such rules may include that any movement of a feature vector out of a previous cluster is an anomaly, that any feature vector located in a cluster predominantly comprised of users of another type is an anomaly, that any feature vector indicating high volumes of a user-interaction events is an anomaly, and other broad rules. A stored record of acceptable clusters not initially based on specific training data may be trained over time, as the rules are updated based on the use of the monitoring system over time.
[0013] While only one embedded system has been shown in Figure 1, it will be understood that the disclosure is not limited to the presence of only one embedded system in the network with monitoring system 100. The monitoring system 100 may receive log messages from more than one embedded system within the network and carry out the other actions described herein based on those log messages. By monitoring multiple embedded systems, it is possible to identify patterns of misuse that would be difficult to identify by monitoring a single embedded system alone.
[0014] The network 110 may be a network within a single organisation such as an enterprise network. Alternatively, the network may be any network comprising an embedded system.
[0015] Embedded systems may include a wide variety of devices or systems having a defined function, and including printers and multi-function print devices, video conferencing systems, display screens, door lock and access devices such as card readers, and so on. In industrial contexts embedded systems may include manufacturing equipment such as machining tools, 3D printers, and quality checking equipment.
[0016] The users of the embedded system may be classified according to a number of types for example, general user and administrator. Other user types may include a guest user or a supervisor user. Different user types can be expected to interact with embedded systems in different manners. Typically, a general user may have permission to use the embedded system for its intended purpose. A supervisor user may be defined for a user of an embedded system within an industrial setting. The supervisor user may have the same permissions as the general user, but also have some additional permissions related to the purpose of the device, such as being able to review the use of general users or adjust settings. In some instances, supervisor users may have some configuration permissions as well. An administrator user may have permission to configure the embedded system, and other administrative functions such as adding or deleting users. In some instances, for example, where the embedded system is in an industrial setting, the administrative users may not have permission to use the embedded system for its intended purpose. A guest user may have a very limited set of permissions.
[0017] The log messages may comprise an event stream consisting of events, where the log messages contain information such as a timestamp, entity identifier, and an activity identifier. The timestamp provides an indication of the time the event occurred. The entity identifier may comprise a device ID, username of the user doing the action, and any other entities involved in the transaction such as servers, computers and the like. In an example where the embedded system is a printer, the device ID could be a printer ID, such as a hostname, serial number, or the like of the printer being acted upon. The activity identifier provides a brief description of the event that occurred, such as “Successful login”, or “Print job completed”, etc. Alternatively, this could be an event code with an alphanumerical representation, such as <45>, <4224>, <12.AB.3F>. The structure and content of the log messages may depend on the embedded system and the communications protocols in use.
[0018] The feature vector is based on the user interaction events that have occurred for one user for a particular time period. In an example, the structure of the feature vector is derived by enumerating the possible user interaction events that may occur. The feature vector will then have that number of elements, each representing one of the possible user interaction events. A time period of data collection for this feature vector is defined and log messages are received and collated for that time period. At the end of the time period, the number of each type of event that has occurred is counted. The feature vector is constructed by assigning the count value to the relevant element of the feature vector. Alternative manners of structuring and populating the elements of the feature vector may be utilised within the scope of the disclosure.
[0019] The time period for may be varied as required. It will be understood that to gather suitable levels of data, the time period may be adjusted accordingly. In an example, the time period is chosen to, on average, include approximately one user interaction session with the embedded system. A user interaction session may be understood as a transaction, for example, logging in to a printer to print a document; logging into a multi function printer to print, and then scan a document; or an admin user logging in to adjust configuration settings on an embedded system.
[0020] The clustering methods used by the monitoring system 100 may comprise centroid-based clustering. Additionally or alternatively, the clustering methods used by the monitoring system 100 may comprise graphical based clustering. Other types and examples of clustering methods may be used within the scope of the disclosure. The clustering methods may be operable in real time.
[0021] In an example, the monitoring system the clustering method comprises t- Distributed Stochastic Neighbour Embedding (t-SNE). Clustering methods such as t-SNE can project a high-dimensional feature vector to 2-dimensional space. In this way, the use of t-SNE facilitates visualisation of the data. The t-SNE projection attempts to preserve distances between feature vectors which are close in the high dimensional space. [0022] In another example, the monitoring system 100 uses k-modes as part of its clustering process. k-Modes can be used to cluster data points in an analytical rather than graphical way. Cluster centroids are produced by the algorithm which can provide an interpretation of the clusters. Each data point is measured by distance from each centroid and is assigned to the cluster having the centroid closest thereto.
[0023] Properties of the clusters can be measured, such as cluster density, and intra cluster distances, i.e. distances between centroids). In some examples, both t-SNE and k- Modes are used to generate clusters in the monitoring system.
[0024] Distance metrics may be used to identify a feature vector that represents an anomaly. The distance metrics allow the distance between a feature vector of the time period being monitored to be checked compared with the stored record of acceptable clusters. The location of the feature vector in the 2D cluster space is checked against the locations representative of each of the clusters of the stored record, by calculating the distance between the two locations. The locations representative of each of the clusters of the stored record may be cluster centroids. The monitoring system 100 may use distance metrics such as Euclidian distance metric and the Jaccard dissimilarity index. Other suitable metrics for assessing similarity or distance may be used.
[0025] Anomalous feature vectors may be any feature vector indicating a change of user behaviour with respect to the embedded systems of the network. For example, if a feature vector is not included in the cluster in which it was included in the stored record of acceptable clusters, it may be an anomalous feature vector. Such a feature vector that moves out of its existing cluster may move to another cluster, including a new cluster, or may not be included in any cluster. Such a feature vector not be included in any cluster may be referred to as outlier.
[0026] Additionally, the clusters have a location in a cluster space and a feature vector represents an anomaly if it is located in a cluster having a changed location compared to the stored record of acceptable clusters.
[0027] Anomalous feature vectors identify anomalous behaviour, such as a user doing something that user has not done before, a user doing something new for their user type, or a user doing something associated with another user type.
[0028] An alert generated by the monitoring system may indicate the user and embedded system that gave rise to the anomalous feature vector. The alert may further provide information including the user-interaction event or events of the feature vector and may include the log messages associated with the user-interaction event. Additional context may also be included. [0029] Referring now to Figure 2, there is shown a flow chart of a method 200 for generating a record of acceptable clusters based on the usage of an embedded system within a network. The method 200 may be implemented by the monitoring system 100 of Figure 1 and may be considered a training phase therefor. The method 200 of Figure 2 may be carried out in real time, or on a block of data gathered over a training period and then processed according to this method. The training phase may run over a period of days, and may extend up to four weeks or more, depending on the size and complexity of the network in question.
[0030] At block 202, log messages are received from the embedded system or systems. The messages are processed, at 204, to identify the user interaction events to which they relate. These initial blocks are repeated throughout a predefined time period. An example time period is two hours, however, the time period may be varied.
[0031] At the end of the time period, at 206, a feature vector is generated for each user based on their user-interaction events from that time period. In a network with more than one embedded system, a feature vector may be generated for each user-embedded system pairing. If the training phase is being conducted on a block of data covering the full training period, and not in real time, the data may simply be processed in sub-blocks of a suitable time period, such that a feature vector is generated for each sub-block for each user.
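A minimal sketch of such sub-block processing is given below; the tuple layout, names and two-hour window are illustrative assumptions.

from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)

def window_events(events, start):
    # Group (timestamp, user, device, event) tuples into per-window buckets,
    # one bucket per (sub-block, user, embedded system) pairing.
    buckets = defaultdict(list)
    for ts, user, device, event in events:
        index = (ts - start) // WINDOW  # which two-hour sub-block
        buckets[(index, user, device)].append(event)
    return buckets  # each bucket yields one feature vector

start = datetime(2020, 6, 22, 9, 0)
events = [(start + timedelta(minutes=m), "user1", "printer1", "print")
          for m in (10, 50, 130)]
print(window_events(events, start))  # minutes 10 and 50 share a bucket; 130 starts another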
[0032] At 208, clustering algorithms are applied to the feature vectors for the time period to group the feature vectors into clusters. Suitable clustering methods may include, as discussed herein, t-SNE and k-modes.
[0033] At 210, the generated clusters are annotated to mark acceptable and unacceptable behaviours. The annotation may include identifying high-risk and low-risk changes in a user’s behaviour, such as the movement of data points in the cluster space between time periods and other potentially anomalous behaviour. High-risk behaviours include, for example, a user’s feature vector being located in a cluster predominantly formed of feature vectors of users of another type; lower-risk behaviours are annotated accordingly, and so on.
[0034] Blocks 202 to 210 are repeated for a number of time periods. The accumulated annotated cluster arrangements are analysed to identify a baseline pattern of usage of the embedded system or systems. The baseline pattern of usage may comprise a stored record of acceptable clusters. The training data may be annotated manually by a user familiar with the monitoring system.
[0035] The training phase may comprise cross-validation of the training data to improve its reliability.

[0036] In an example, a training phase is conducted to allow the monitoring system 100 to operate in an office network comprising general users and admin users, where the embedded systems predominantly comprise printers. In such a network, the following eight clusters can be expected to appear:
1. low volume login and low volume printing;
2. low volume failed login via administrative interface;
3. high volume password changes, and high-volume login via administrative interface;
4. medium volume printing;
5. high volume login via control panel;
6. high volume printing;
7. low volume printing; and
8. low volume login via control panel.
[0037] It will be understood that these represent a possible set of clusters, and are not limiting on the clusters that may appear from a particular network. However, this list of clusters can be used to assist in annotating the training data. For example, if one of these clusters were missing, the training data may need to be examined to identify why. The data may be annotated such that, if such a cluster were to appear while the monitoring system was in use, it should or should not be flagged as anomalous. Similarly, if an additional cluster or additional clusters were present in the training data, the reasons for their presence should be ascertained and annotated accordingly.
[0038] Annotating training clusters of this structure may include specifying that a feature vector movement between cluster 1 and cluster 4, between cluster 1 and cluster 7, or between cluster 4 and cluster 6 should not be marked as anomalous, whereas movement between all other pairs of clusters should be marked as anomalous.
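Such an annotation may, for example, be encoded as a set of acceptable cluster pairs. The sketch below assumes the eight-cluster numbering above.

ALLOWED_MOVES = {frozenset({1, 4}), frozenset({1, 7}), frozenset({4, 6})}

def movement_is_anomalous(old_cluster: int, new_cluster: int) -> bool:
    # Staying in the same cluster is never anomalous; any other movement is
    # anomalous unless the pair was annotated as acceptable.
    if old_cluster == new_cluster:
        return False
    return frozenset({old_cluster, new_cluster}) not in ALLOWED_MOVES

print(movement_is_anomalous(1, 4))  # False: annotated as acceptable
print(movement_is_anomalous(2, 3))  # True: all other pairs are anomalous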
[0039] The stored record of acceptable clusters and cluster behaviour that is generated by the method 200 of Figure 2 may be updated periodically to adapt the rules used therein.
[0040] Referring now to Figure 3, there is shown a flow chart of a method 300 according to the disclosure for monitoring an embedded system. The method 300 of Figure 3 may follow on from the method 200 of Figure 2. The method 300 may be implemented by the monitoring system described in relation to Figure 1.
[0041] In the method 300 of Figure 3, feature vectors 302 are generated as described in relation to blocks 202 to 206 of the method 200 of Figure 2. The method may comprise receiving, from the embedded system, the log messages related to user-interaction events, wherein each log message indicates that one of a set of defined user-interaction events has occurred. Then, for each user, for a defined time period, the method may comprise populating a feature vector using the user-interaction events that occurred for that user in that time period.
[0042] Next, in a manner similar to the method 200 of Figure 2, the clusters are generated at 208. The method may comprise assembling the clusters of the feature vectors from a subset of the plurality of users from that time period using a clustering method. At 304, anomalous user-interaction events are identified by comparing the clusters with a stored record of acceptable clusters. The method may comprise monitoring the clusters to determine an anomalous feature vector based on an analysis of a stored record of acceptable cluster arrangements. The acceptable cluster arrangements may be determined according to the method 200 of Figure 2.
[0043] At 306, an alert is generated for any anomalous feature vectors. The method may comprise outputting an alert upon determination of an anomalous feature vector.
[0044] The method then repeats for subsequent time periods.
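By way of illustration only, the flow of the method 300 may be sketched as a simple loop over one time period. The helper functions, event names and baseline below are illustrative stand-ins, not an interface defined by the disclosure.

EVENT_TYPES = ["print_complete", "print_fail", "login", "config_change"]

def build_feature_vectors(events_by_user):
    # Feature vector generation at 302: one count vector per user.
    return {user: [evs.count(e) for e in EVENT_TYPES]
            for user, evs in events_by_user.items()}

def is_anomalous(vector, baseline):
    # Stand-in for block 304: flag any event type absent from the baseline.
    return any(v > 0 and b == 0 for v, b in zip(vector, baseline))

baseline = [1, 1, 1, 0]  # assumed record: config changes never seen as acceptable
window = {"user1": ["login", "print_complete"], "user2": ["config_change"]}
for user, vec in build_feature_vectors(window).items():
    if is_anomalous(vec, baseline):
        print("ALERT:", user, vec)  # block 306: an alert is output for user2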
[0045] Referring now to Figures 4(a) and 4(b), there are shown representations of user-interaction event data processed according to the disclosure. The data reported here was generated using a 2-hour window and with grouped syslog messages, but the window size will depend on the type of embedded systems and their use patterns. The same data set is used to create both representations. However, in Figure 4(a), the data points are annotated according to the class of the user-interaction events in the feature vector, whereas in Figure 4(b), the data points are annotated according to whether the user is an admin user or not. Each data point in Figures 4(a) and 4(b) represents a feature vector projected onto a 2-dimensional space. This feature vector is itself a representation of an interaction, defined by a window of time, between a user and a printer. As such, each user may have produced multiple data points in the figures, since the user may have interacted with multiple printers or with the same printer multiple times. The data points are annotated by the class of interaction between the user and printer. In Figure 4(a), the (+) data points correspond to a print job, (·) data points correspond to printing from the control panel,
() data points correspond to logins, (¨) data points correspond to an admin task, (◄) data points correspond to a configuration change, and (T) data points correspond to a security configuration change. In Figure 4(b), the (+) data points correspond to non-admin users and (·) data points correspond to admin users. Clusters can be defined by drawing bounding boxes (not shown) covering segments of the 2-dimensional space. Concretely, each bounding box can be defined by specifying four (x, y) coordinates (or possibly more), which together define a rectangle (or, in general, an n-gon). Any feature vector which maps to some (x, y) lying within a given bounding box is said to lie in the cluster defined by that bounding box.
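A minimal sketch of such a bounding-box membership test is given below; the rectangle coordinates, chosen to surround the admin cluster near (15, -5) discussed below, are illustrative assumptions.

def in_box(point, box):
    # box = (xmin, ymin, xmax, ymax); point = (x, y) in the projected 2D space.
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

admin_cluster_box = (12.0, -8.0, 18.0, -2.0)
print(in_box((14.2, -4.1), admin_cluster_box))  # True: lies in that cluster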
[0046] The representations in Figures 4(a) and 4(b) are generated based on t-SNE. Some clusters are readily identifiable, such as the large circle centred at approximately (2.5, 10), representing any print job (by substantially non-admin users). This represents low-level printing. Many user-interaction events fall into this cluster, resulting in the defined circle seen. These are users who have performed a small number of print jobs within the time window. The circle towards the top of the graph also represents print jobs, but in higher volumes. The cluster around (15, -5) represents administrative actions. Any apparent non-admin users showing activity here would be of interest. These clusters suggest that, for each type of high-level interaction between a user and a printer (i.e. any-print-job, print-from-control-panel, etc.), there are only a small number of variations in how this happens in practice. If there were a large variation in, say, security-config-change actions, happening through the user also simultaneously performing print jobs, logins, general config changes, etc., then this would manifest as distance between the security-config-change events; t-SNE would then show this by spreading the brown points across the 2D space.
[0047] Figure 4(b) illustrates a distinction between admin and non-admin activities. There is a pure admin cluster approximately centred around (15, -5). However, anomalies can also be spotted: a closer inspection shows that there is a single non-admin data point (not shown) within this cluster. This is a clear anomaly, detectable with the visualisation and/or combined with the bounding boxes; however, it is not visible in Figure 4(b), as it is located under an admin data point. It is clear from these figures that certain anomalous behaviour could be identified purely from the generation of clusters, without reference to the established record of acceptable behaviour.
[0048] In some examples, the log messages may be used as a basis to create a log message data unit. The log message data unit includes the log message and at least one additional piece of information related to that user-interaction event, and the feature vector is generated based on the log message data unit. Additional pieces of information that may be added to the log message data unit include indications as to whether the event was a success or a failure.
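One possible shape for such a log message data unit is sketched below; the field names are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class LogMessageDataUnit:
    raw_message: str  # the original log message
    user: str         # the user who triggered the event
    event: str        # the defined user-interaction event
    success: bool     # additional information: success or failure indicator

unit = LogMessageDataUnit("scan job 17 failed", "user1", "Scan fail", success=False)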
[0049] In relation to the feature vector, a brief example is now provided as to how a feature vector may be generated. In the example, there is a single embedded system, which is a multi-function printer. The multi-function printer has two user types, a general user and an admin user. This example embedded system has four potential user-interaction events for general users, namely Print complete, Print fail, Scan complete, and Scan fail. The embedded system further has five potential admin user-interaction events, namely Login, Logout, Change Config Param X, Change Config Param Y, and Change Config Param Z. Thus there are nine possible user-interaction events, so there will be nine elements in the feature vector.
[0050] If an example general user User1 has three failed scans and no other user-interaction events in the defined time period, their feature vector will be {0, 0, 0, 3, 0, 0, 0, 0, 0}. If an example general user User2 has one successful scan job, two successful print jobs, and no other user-interaction events in the defined time period, their feature vector will be {2, 0, 1, 0, 0, 0, 0, 0, 0}. As the last five elements of the feature vector represent admin tasks, we would expect these to have a zero value for general users. However, consider an example device management system, such as a device management daemon, which is an admin user. In this example, it has changed all three config parameters for three printers, giving a feature vector of {0, 0, 0, 0, 3, 3, 3, 3, 3}. It will be understood that different schemes and structures for capturing the user-interaction events in a feature vector may be used within the scope of the disclosure.
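The worked example above may be reproduced in code as follows, with one element per possible user-interaction event, in the order listed in paragraph [0049].

EVENTS = ["Print complete", "Print fail", "Scan complete", "Scan fail",
          "Login", "Logout", "Change Config Param X",
          "Change Config Param Y", "Change Config Param Z"]

def feature_vector(counts):
    # Populate one element per event from the per-user counts for the period.
    return [counts.get(e, 0) for e in EVENTS]

print(feature_vector({"Scan fail": 3}))                           # User1
print(feature_vector({"Print complete": 2, "Scan complete": 1}))  # User2
print(feature_vector(dict.fromkeys(EVENTS[4:], 3)))               # management daemon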
[0051] The feature vectors can be normalised, or derived features may be generated. For example, the proportion of each activity from a given user can be normalised. Further, certain features could be represented by concepts such as high, medium, low and none for the number of documents printed, where each value has a predefined range.
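Illustrative sketches of both transformations are given below; the band boundaries are assumed values.

import numpy as np

def normalise(vec):
    # Express each activity as a proportion of the user's total activity.
    v = np.asarray(vec, float)
    total = v.sum()
    return v / total if total else v

def band(count):
    # Map a raw count to a concept, each with an assumed predefined range.
    if count == 0:
        return "none"
    if count <= 5:
        return "low"
    if count <= 20:
        return "medium"
    return "high"

print(normalise([2, 0, 1, 0, 0, 0, 0, 0, 0]))  # User2's activity proportions
print([band(c) for c in (0, 3, 12, 40)])       # ['none', 'low', 'medium', 'high']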
[0052] The feature vector may also include elements not related to a number of occurred user-interaction events. Some examples of additional data that may be included in the feature vector are contextual information in relation to location or timing. Location contextual information may comprise an indicator that a print job was completed by the local office printer for the user in question. Timing contextual information may comprise an indicator that a user-interaction event occurred at the weekend, or outside of office hours. This information is potentially relevant as a user engaging with the embedded system outside of normal working hours may be trying to hide their actions from their colleagues or superiors. Where a feature vector includes information indicating that a user-interaction event took place out of hours, that feature vector may be flagged as anomalous.
[0053] The monitoring system may use a number of different feature vector structures. For example, overview-type feature vectors may be defined having a longer time period than the time period for the basic feature vector. For example, where the time period for a basic feature vector may be one hour or two hours, the time period for an overview feature vector may be a day or a number of days. The structures of the different feature vectors may differ by omitting certain elements or including additional elements.
[0054] In some cases, the log messages coming from the embedded system may be quite granular, such that a number of log messages may be generated to indicate what would be perceived as a single task. In such cases, the log messages may be assigned to higher-level descriptions. For the example where the embedded system is a multi-function printer, the higher-level descriptions may be login, print job, configuration change, and so on. Identifying suitable higher-level descriptions may involve a message training phase, where the log messages are analysed by an administrator of the monitoring system and manually tagged as a particular class.
[0055] In an example, the embedded system is a printer or multifunction printer, which can produce syslog messages when users interact with it. For example, these messages may relate to a user printing or scanning a document. Equally, administrators changing the settings on the printer would also produce syslog messages. These messages may be grouped together and labelled according to the class of activity. For example, possible classes would include Remote Logins, Control Panel Logins, Print Jobs, Scan Jobs, Security Configuration Changes, Network Configuration Changes, Print Setting Changes, Other Configuration Changes, Plug-in Installations and the like.
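By way of illustration only, such grouping might be sketched with simple pattern matching. The patterns below are assumptions; in practice the manually tagged message training phase described above would determine the classes.

import re

CLASS_PATTERNS = [
    (re.compile(r"remote.*login", re.I), "Remote Logins"),
    (re.compile(r"control panel.*login", re.I), "Control Panel Logins"),
    (re.compile(r"print job", re.I), "Print Jobs"),
    (re.compile(r"scan", re.I), "Scan Jobs"),
    (re.compile(r"security.*(config|setting)", re.I), "Security Configuration Changes"),
]

def classify(message: str) -> str:
    # Assign a granular syslog message to a higher-level activity class.
    for pattern, cls in CLASS_PATTERNS:
        if pattern.search(message):
            return cls
    return "Other Configuration Changes"

print(classify("Print job 42 completed for user1"))  # Print Jobs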
[0056] Throughout the description, examples have been provided relating to printers or multi-function print devices, such as those that include scanning and copying functionality. It will be understood that the disclosure is not limited to this type of embedded system but covers any type of embedded system having a defined purpose. Embedded systems within an office or home network, or within other networks, may use the systems and methods disclosed herein to improve their security and obtain more detailed knowledge about the operation of the systems within the network.
[0057] This disclosure describes a method to enable enterprise IT and security administrators to track how users interact with the embedded systems found within an organisation. The disclosure teaches identifying and monitoring how a fleet of embedded systems within a network, such as an enterprise network, is used and administered. This allows detection of changes in behaviour that may represent risks. The systems and methods disclosed herein allow the identification of users whose behaviours differ significantly from those of normal users, and of users whose behaviour changes to high-risk behaviour, including different but known behaviour, or an outlying behaviour. These systems and methods further allow detection of users gaining privileged administrative roles, and detection of unusual patterns of administration or changes in patterns of administration. Such behaviour changes are not necessarily an indicator of an attack or improper behaviour, but they can help administrators identify and investigate changes.
[0058] Figure 5 is a schematic of an example machine-readable medium 502 with a processor 504. The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to perform the methods described herein. The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to receive log messages related to user-interaction events from the embedded system, wherein the log messages indicate that at least one of a set of defined user-interaction events has occurred. The machine-readable medium 502 may comprise log message receiving instructions 506 to perform the receiving.
[0059] The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to, for each user, for a defined time period, produce a feature vector corresponding to the user-interaction events that occurred for that user in that time period. The machine-readable medium 502 may comprise feature vector producing instructions 508 to produce the feature vector.
[0060] The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to perform a clustering method on the feature vectors for that time period from a subset of the plurality of users to create clusters of feature vectors. The machine-readable medium 502 may comprise cluster creating instructions to create the clusters.
[0061] The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to, for each time period, monitor the clusters to identify an anomalous feature vector in relation to a stored record of acceptable cluster patterns. The machine-readable medium 502 may comprise cluster monitoring instructions 512 to monitor the clusters.
[0062] The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to generate an alert upon identification of an anomalous feature vector. The machine- readable medium 502 may comprise alert generation instructions 514 to generate the alerts.
[0063] The machine-readable medium 502 may comprise instructions which, when executed by a processor 504, cause the processor to repeat the actions for subsequent time periods. The machine- readable medium 502 may comprise repeating instructions 516 to perform the repetition.
[0064] In some examples, the machine-readable medium 502 may comprise additional instructions which, when executed by a processor 504, cause the processor to perform further actions in line with the methods and examples described herein.
[0065] Examples in the present disclosure can be provided as methods, systems or machine-readable instructions, such as any combination of computer programme code, hardware, or the like. Such machine-readable instructions may be included on a machine-readable medium having computer readable program codes therein or thereon. The machine-readable medium can be realised using any type of volatile or non-volatile (non-transitory) storage such as, for example, memory, a ROM, RAM, EEPROM, optical storage and the like. The machine-readable medium may be a non-transitory machine-readable medium. The machine-readable medium may also be referred to as a computer-readable storage medium.
[0066] The present disclosure is described with reference to flow charts and/or block diagrams of the method, devices and systems according to examples of the present disclosure. Although the flow diagrams described above show a specific order of execution, the order of execution may differ from that which is depicted. Blocks described in relation to one flow chart may be combined with those of another flow chart. It shall be understood that each flow and/or block in the flow charts and/or block diagrams, as well as combinations of the flows and/or diagrams in the flow charts and/or block diagrams can be realized by machine readable instructions.
[0067] The machine-readable instructions may, for example, be executed by processing circuitry. The processing circuitry, for example the processor 102 of Figure 1 or the processor 504 referred to in relation to Figure 5, may be in the form of or comprised within a computing device. Such a computing device may include a general purpose computer, a special purpose computer, an embedded processor or processors or other programmable data processing devices to realise the functions described in the description and diagrams. In particular, a processor or processing apparatus may execute the machine-readable instructions. Thus, functional modules of the apparatus and devices may be implemented by a processor executing machine-readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry. The term 'processor' is to be interpreted broadly to include a CPU, processing unit, ASIC, logic unit, programmable gate array, etc. The methods and functional modules may all be performed by a single processor or divided amongst several processors.
[0068] Such machine-readable instructions may also be stored in a computer readable storage that can guide the computer or other programmable data processing devices to operate in a specific mode.
[0069] Such machine readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operations to produce computer-implemented processing, thus the instructions executed on the computer or other programmable devices realize functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
[0070] Further, the teachings herein may be implemented in the form of a computer software product, the computer software product being stored in a storage medium and comprising a plurality of instructions for making a computer device implement the methods recited in the examples of the present disclosure.
[0071] While the method, apparatus and related aspects have been described with reference to certain examples, various modifications, changes, omissions, and substitutions can be made without departing from the scope of the present disclosure. It is intended, therefore, that the method, apparatus and related aspects be limited only by the scope of the following claims and their equivalents. It should be noted that the above- mentioned examples illustrate rather than limit what is described herein, and that those skilled in the art will be able to design many alternative implementations without departing from the scope of the appended claims. Features described in relation to one example may be combined with features of another example.
[0072] The features of any dependent claim may be combined with the features of any of the independent claims or other dependent claims.
[0073] Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to”, and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
[0074] Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not restricted to the details of any foregoing embodiments.
The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

Claims

1. A monitoring system for an embedded system within a network, the network having a plurality of users, the embedded system being configured to generate log messages related to user-interaction events; the monitoring system comprising: an input to receive, from the embedded system, the log messages related to user-interaction events, wherein each log message indicates that one of a set of defined user-interaction events has occurred; a processor; a memory storing computer executable instructions therein which, when executed by the processor, cause the processor to: for each user, for a defined time period, generate a feature vector based on the user-interaction events that occurred for that user in that time period; use a clustering method to group the feature vectors from a subset of the plurality of users from that time period into clusters, wherein the clusters provide a representation of user behaviour; and monitor the clusters for each time period to identify a feature vector that represents an anomaly when compared with a stored record of acceptable clusters; and an output to transmit an alert if an anomaly is detected.
2. The monitoring system according to claim 1 wherein a feature vector represents an anomaly if it is not included in the cluster in which it was included in the stored record of acceptable clusters.
3. The monitoring system according to claim 1 wherein a feature vector represents an anomaly if it includes a user-interaction event that is novel for that user.
4. The monitoring system according to claim 1 wherein the user-interaction events relate to a plurality of activities and the computer executable instructions include instructions to group the log messages according to the related activity, and to create the feature vector based on the groups.
5. The monitoring system according to claim 1 wherein a user’s feature vector comprises one element for each possible user-interaction event, and the elements are assigned a value based on the number of those user-interaction events that have occurred in the defined time period for that user.
6. The monitoring system according to claim 5 wherein the feature vector further comprises an element based on location or time of user-interaction events.
7. The monitoring system according to claim 1 wherein the elements included in the feature vector are variable depending on the duration of the time period.
8. The monitoring system according to claim 1 wherein the clustering method uses centroid-based clustering.
9. The monitoring system according to claim 1 wherein the clustering method uses graphical based clustering.
10. The monitoring system according to claim 1 wherein the computer executable instructions include instructions to create a log message data unit including the log message and at least one additional piece of information related to that user-interaction event, and the feature vector is generated based on the log message data unit.
11. The monitoring system according to claim 10 wherein the additional piece of information comprises a success indicator, a failure indicator, and/or an embedded system location indicator.
12. The monitoring system according to claim 1 wherein the computer executable instructions include instructions to repeat the generation of the feature vector, use of the clustering method and monitoring of the clusters for subsequent time periods.
13. A monitoring system as claimed in claim 1 wherein the embedded system comprises a printer.
14. A method for monitoring an embedded system within a network, the network having a plurality of users, the embedded system being configured to generate log messages related to user-interaction events; the method comprising: for each user, for a defined time period, populating a feature vector using the user-interaction events that occurred for that user in that time period, wherein the user-interaction events are derived from log messages received from the embedded system, wherein each log message indicates that one of a set of defined user-interaction events has occurred; assembling clusters of the feature vectors from a subset of the plurality of users from that time period using a clustering method; monitoring the clusters to determine an anomalous feature vector based on an analysis of a stored record of acceptable cluster arrangements; and outputting an alert upon determination of an anomalous feature vector.
15. A machine-readable medium comprising instructions, which when executed by a processor, cause the processor to: receive log messages related to user-interaction events from an embedded system, wherein the log messages indicate that at least one of a set of defined user-interaction events has occurred; for each of a plurality of users of the embedded system, for a defined time period, produce a feature vector corresponding to the user-interaction events that occurred for that user in that time period; perform a clustering method on the feature vectors for that time period from a subset of the plurality of users to create clusters of feature vectors; for each time period, monitor the clusters to identify an anomalous feature vector in relation to a stored record of acceptable cluster patterns; and generate an alert upon identification of an anomalous feature vector.
16. A network comprising an embedded system and the monitoring system of claim 1.