WO2022115419A1 - Method of detecting an anomaly in a system - Google Patents

Method of detecting an anomaly in a system

Info

Publication number
WO2022115419A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
anomaly
user
access
state
Application number
PCT/US2021/060491
Other languages
French (fr)
Inventor
Bruno Paes Leao
Leandro Pfleger De Aguiar
Matthew Stewart
Peter SCHERFF
Anton KOCHETUROV
Original Assignee
Siemens Energy, Inc.
Application filed by Siemens Energy, Inc.
Publication of WO2022115419A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Definitions

  • States as defined here can also be applied in alternative settings for anomaly detection.
  • One way of doing this would be to model the transition between states, for instance in the form of a Markov chain, and use information associated with this transition, e.g., the transition probability in the case of a Markov chain, for detecting anomalies.
  • a threshold could be defined such that transitions with probability lower than the threshold would be considered anomalies.
  • a data processing system in accordance with an embodiment of the present disclosure may include an operating system 1216.
  • Such an operating system may employ a command line interface (CLI) shell and/or a graphical user interface (GUI) shell.
  • the GUI shell permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application.
  • a cursor or pointer in the graphical user interface may be manipulated by a user through a pointing device such as a mouse or touch screen. The position of the cursor/pointer may be changed and/or an event, such as clicking a mouse button or touching a touch screen, may be generated to actuate a desired response.
  • Examples of operating systems that may be used in a data processing system may include Microsoft Windows, Linux, UNIX, iOS, macOS, and Android operating systems.
  • the processor described herein may correspond to a remote processor located in a data processing system such as a server that is remote from the display and input devices described herein.
  • the described display device and input device may be included in a client data processing system (which may have its own processor) that communicates with the server (which includes the remote processor) through a wired or wireless network (which may include the Internet).
  • client data processing system may execute a remote desktop application or may correspond to a portal device that carries out a remote desktop protocol with the server in order to send inputs from an input device to the server and receive visual information from the server to display through a display device.

Abstract

A method for detecting a cybersecurity event in a system includes monitoring at least one of an access monitoring system, an operational data system, and an operator activity system, detecting a first anomaly in a first system of the monitored systems, and predicting a second anomaly in a second system that, in combination with the first anomaly, is indicative of a hostile cybersecurity threat. The method also includes reviewing data collected from the second system to determine if the second anomaly is present and identifying the first anomaly as a cybersecurity threat in response to the detection of the second anomaly in the data of the second system.

Description

METHOD OF DETECTING AN ANOMALY IN A SYSTEM
BACKGROUND
[0001] The energy sector currently consists of both legacy and next generation technologies. New technologies are rapidly introducing new intelligent sensors and components to the energy infrastructure which are communicating in more advanced ways (wired and wireless communications). Typical “analog” components are replaced by digital systems, which can lead to increased exposure to cyber incidents and attacks in power plants, energy transmission infrastructures, and process technologies. Furthermore, the exponential growth of data has opened many backdoors into plant systems. As power generation facilities are now reliant on the two-way exchange of data with other networks, this provides an opportunity for unauthorized access to plant networks.
SUMMARY
[0002] In one aspect, a method for detecting a cybersecurity event in a system includes monitoring at least one of an access monitoring system, an operational data system, and an operator activity system, detecting a first anomaly in a first system of the monitored systems, and predicting a second anomaly in a second system that, in combination with the first anomaly, is indicative of a hostile cybersecurity threat. The method also includes reviewing data collected from the second system to determine if the second anomaly is present and identifying the first anomaly as a cybersecurity threat in response to the detection of the second anomaly in the data of the second system.
[0003] In one aspect, a method for detecting a cybersecurity event in a system includes integrating data received from a plurality of data collection systems, the plurality of data systems including an operational data system that collects operational data from an industrial process and an operator activity system that collects activity data generated by operator activity. The method also includes identifying a first anomaly in the data collected by the operational data system, analyzing a portion of the data collected by the operator activity system that is related to the first anomaly, and identifying the first anomaly as a cybersecurity threat in response to the analysis of the portion of the data collected by the operator activity system.
[0004] Also, before undertaking the Detailed Description below, it should be understood that various definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art will understand that such definitions apply in many, if not most, instances to prior as well as future uses of such defined words and phrases. While some terms may include a wide variety of embodiments, the appended claims may expressly limit these terms to specific embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.
[0006] FIG. 1 illustrates an aspect of the subject matter in accordance with one embodiment.
[0007] FIG. 2 illustrates an aspect of the subject matter in accordance with one embodiment.
[0008] FIG. 3 is a three-dimensional graph of a plurality of master vectors showing the clustering of those master vectors into states.
[0009] FIG. 4 illustrates an aspect of the subject matter in accordance with one embodiment.
[0010] FIG. 5 is a flow chart illustrating a portion of the anomaly detection system.
[0011] FIG. 6 is a flow chart illustrating another aspect of the anomaly detection system.
[0012] FIG. 7 schematically illustrates aspects of a sequence detection system.
[0013] FIG. 8 schematically illustrates another aspect of a sequence detection system.
[0014] FIG. 9 is a flow chart illustrating the operation of a model training pipeline suitable for use in training a classifier for the anomaly detection system.
[0015] FIG. 10 is a flow chart illustrating the anomaly detection system.
[0016] FIG. 11 illustrates a functional block diagram of an example computer system that facilitates operation of an anomaly detection system.
[0017] FIG. 12 illustrates a block diagram of a data processing system in which the anomaly detection system may be implemented.
DETAILED DESCRIPTION
[0018] Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in this description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
[0019] Various technologies that pertain to systems and methods will now be described with reference to the drawings, where like reference numerals represent like elements throughout. The drawings discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged apparatus. It is to be understood that functionality that is described as being carried out by certain system elements may be performed by multiple elements. Similarly, for instance, an element may be configured to perform functionality that is described as being carried out by multiple elements. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
[0020] Also, it should be understood that the words or phrases used herein should be construed broadly, unless expressly limited in some examples. For example, the terms “including,” “having,” and “comprising,” as well as derivatives thereof, mean inclusion without limitation. The singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term “or” is inclusive, meaning and/or, unless the context clearly indicates otherwise. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like. Furthermore, while multiple embodiments or constructions may be described herein, any features, methods, steps, components, etc. described with regard to one embodiment are equally applicable to other embodiments absent a specific statement to the contrary.
[0021] As used herein, the terms “component” and “system” are intended to encompass hardware, software, or a combination of hardware and software. Thus, for example, a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.
[0022] Further, the phrase "at least one" before an element (e.g., a processor) that is configured to carry out more than one function/process may correspond to one or more elements (e.g., processors) that each carry out the functions/processes and may also correspond to two or more of the elements (e.g., processors) that respectively carry out different ones of the one or more different functions/processes.
[0023] Also, although the terms “first”, “second”, “third” and so forth may be used herein to refer to various elements, information, functions, or acts, these elements, information, functions, or acts should not be limited by these terms. Rather these numeral adjectives are used to distinguish different elements, information, functions or acts from each other. For example, a first element, information, function, or act could be termed a second element, information, function, or act, and, similarly, a second element, information, function, or act could be termed a first element, information, function, or act, without departing from the scope of the present disclosure.
[0024] In addition, the term “adjacent to” may mean that an element is relatively near to but not in contact with a further element or that the element is in contact with the further portion, unless the context clearly indicates otherwise. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise. Terms “about” or “substantially” or like terms are intended to cover variations in a value that are within normal industry manufacturing tolerances for that dimension. If no industry standard is available, a variation of twenty percent would fall within the meaning of these terms unless otherwise stated.
[0025] Secure networks are an important part of the future of smart-grid technology to accommodate a more connected infrastructure. Power generation systems (e.g., plants, wind farms, solar farms, etc.) sometimes employ SCADA systems to control the overall operation of the power generation system. Vulnerabilities of these SCADA systems to cyberattacks need to be addressed to inhibit the unwanted theft of information, shutdown of equipment, disruption of various operating parameters, and damage to equipment.
[0026] The use of gateways, firewalls, routers and switches can be a viable solution to secure data and protect assets by limiting the attack vector, providing redundancy in protection, and containing network disruptions. As there is no one solution for each application, further security improvements and techniques can be implemented for a secure, highly reliable and cost-efficient network. For example, application-level firewalls with deeper packet investigations in both information technology and operational technologies can be an area of improvement to detect and contain malicious activity.
[0027] Detection of some cybersecurity events in an Operation Technology (OT) domain (e.g., a power plant environment) can be challenging. Existing commercial IT security tools are used to capture important cybersecurity events in most power generation plants, and it is often the case that IT security event detection leads to the discovery of OT cybersecurity events. Operation technology (OT) data are not restricted to sensor measurements collected from the processes, but comprise all information generated by industrial equipment. This includes, for instance, network traffic coming from industrial networks and logs generated by controllers. OT data may be generated by devices such as industrial process sensors and actuators, Programmable Logic Controllers (PLCs) and I/O, and Supervisory Control and Data Acquisition (SCADA) and data historian devices.
[0028] The OT security monitoring framework discussed herein is built upon behavior-based anomaly detection applied to industrial process sensors and actuators, Programmable Logic Controllers (PLCs) and I/O, and Supervisory Control and Data Acquisition (SCADA) and data historian devices.
[0029] The data sources available for OT cybersecurity event detection differ in many aspects from the data used by standard IT security tools. This fact often drives a need for developing domain-specific solutions. Data sources such as event logs, log files, industrial network data, and process data are important information sources which are used in OT security monitoring.
[0030] Industrial Control Systems (ICS) produce many types of event log files, mostly from SCADA systems but also from PLCs. ICS event log files may indicate a variety of event types occurring at different system levels. Many of these events may be related to processes, operation alarms, operator login/logout, actions performed by operators and system administrators. Other more specific events contain information such as PLC logic changes or software errors.
[0031] Log files vary widely in format and content. It should be noted that the content, format and frequency of log file generation depend on how the ICS is configured. It is helpful to configure the ICS in a way that the generated log files can be most efficiently ingested and analyzed for the data processing purposes. Logs usually contain textual descriptions of the events, such that Natural Language Processing (NLP) tools may be used for their analysis. A pre-processing step for free text information mining in logs identifies, among all the log entries, those that present similar patterns, differentiating between frequent terms occurring at certain positions, which form patterns, and infrequent ones which may correspond to parameters associated with the pattern. The result is that each entry is associated with a group (cluster).
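By way of a non-limiting illustration, the following Python sketch shows one possible realization of this pre-processing step. The log lines, the '<*>' placeholder, and the 50% frequency cutoff are hypothetical choices and are not taken from the disclosure.

```python
from collections import Counter, defaultdict

def cluster_log_entries(entries, min_freq=0.5):
    """Group log entries sharing a pattern of frequent tokens.

    Tokens appearing at a given position in at least `min_freq` of the entries
    of the same length are kept as part of the pattern; rarer tokens are treated
    as parameters and replaced by the placeholder '<*>'.
    """
    by_length = defaultdict(list)
    for entry in entries:
        tokens = entry.split()
        by_length[len(tokens)].append(tokens)

    clusters = defaultdict(list)
    for length, rows in by_length.items():
        # How often does each token occur at each position within this group?
        position_counts = [Counter(row[i] for row in rows) for i in range(length)]
        for row in rows:
            template = tuple(
                tok if position_counts[i][tok] / len(rows) >= min_freq else "<*>"
                for i, tok in enumerate(row)
            )
            clusters[template].append(" ".join(row))
    return clusters

if __name__ == "__main__":
    logs = [
        "user alice logged in from 10.0.0.5",
        "user bob logged in from 10.0.0.7",
        "user carol logged in from 10.0.0.9",
        "PLC 12 logic changed by remote engineer",
    ]
    for template, members in cluster_log_entries(logs).items():
        print(" ".join(template), "->", len(members), "entries")
```

The three login entries collapse into one pattern with the user name and address treated as parameters, while the PLC entry forms its own cluster.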
[0032] Industrial network data are generated between SCADA systems and PLCs. OT environments employ specific industrial communication protocols which are usually proprietary.
[0033] Process data include data collected directly from the processes and can be useful in the detection of OT cybersecurity events. The data comprise not only the readings from the complete set of sensors in all the processes, but all other data associated with process tags (unique process data identifiers), such as actuation commands, discrete data and status information. Process data are often easily accessible as ICSs are usually integrated with data historians which record historical process data.
[0034] Different solutions may be required for storing data from different sources. The data volume and the frequency at which data are generated usually differ among data sources. For instance, log data may be very low in volume, especially if considering log files focused on specific types of events that do not happen constantly, such as alarms. On the other side of the spectrum, industrial network data will usually be generated continuously in large volumes. Process data may also come in large volumes with high frequency updates. While regular databases are capable of handling log data, specialized databases may be required for dealing with large volume data sources collected at high sampling rates. It may also be the case that some data sources are already stored elsewhere (e.g., at the historians that record the process data), and it may be a better option to read directly from such databases than to create a duplicate data store.
[0035] In order to use data from different sources, the data should be integrated. Once integrated, the data can be accessed by a data analytics engine during a training and testing phase as well as during a production phase. A data pipeline may be created so that the data analytics engine can easily ingest data from all the data sources. One additional consideration is that timestamps should be consistent among the different data sources so that data may be properly combined.
[0036] For the purpose of detecting certain events of interest, historical data corresponding to a reasonable number of labeled occurrences of such events should be used for training and testing the data analytics engine. However, such labeled data are generally not available for IT security, because attack events are rare and there are many possible ways of carrying out attack vectors. In an OT environment (including power generation scenarios), attack events are even more rare than in the IT environment, which renders many detection methods less effective. Furthermore, industrial attack data sets are rarely publicly available. Power generation companies do not in general develop their own data analytics solutions for OT cybersecurity, but rather acquire them from equipment vendors or third parties. Therefore, data available for the development of OT cyber intrusion detection solutions are usually limited to normal operation data.
[0037] Designing appropriate anomaly detection methods in the power generation context is a challenge considering one often has access only to normal operation data. A threshold can be used to determine the boundaries between what is normal and what is an anomaly.
Such a threshold should be defined based on the tradeoff between the true positive rate (TPR) and the false alarm rate (FAR). However, if no anomaly is present in the data, it is not possible to evaluate the true positive rate directly and this can lead to largely ineffective solutions. In data sets where there are no anomalies, only the FAR can be evaluated and one can use it alone to adjust the threshold. Methods such as deep learning may be applied even when no labeled anomalies are present in the data. These methods are especially suited when a large amount of normal data is available. Unsupervised deep learning methods such as Autoencoders and Generative Adversarial Networks (GAN) can be employed in this case.
[0038] Since actual OT cybersecurity events are expected to be rare, even solutions presenting a very small FAR can potentially generate many more false alarms than true events. An excessive number of false alarms may reduce the confidence of the user in the system, but on the other hand reducing thresholds will increase the chance of missing a true event. One approach that can be used in this case is to define multiple thresholds corresponding to different criticality levels which are clear to the user. By doing so, the user will not only have the information that events were detected, but will also have enough information to reason about the actual relevance of each indication. For instance, anomaly indications associated with low confidence which turn out to be false alarms should not reduce the user's confidence in the system, but when the user finds such an indication combined with other evidence, it can still be a relevant input for the decision making.
[0039] Multiple systems can be monitored by the data analytics engine in an effort to detect anomalies. The anomalies are then further analyzed as will be described to determine if the detected anomaly is part of a cybersecurity threat or attack.
[0040] One system that may be monitored is an operator activity system that monitors the operator activity access time and focuses on the detection of operation time anomalies inside a power plant. In this case, the hypothesis proposed by the domain experts is that some types of operations requiring high access levels occurred only or mostly during regular office hours or at least during predictable time periods such that activities outside of these time periods are potential anomalies.
[0041] Another system that is monitored is an operational data system that collects data from real operations. This data consists of log files that record operator actions and corresponding timestamps. Based on this data it is possible to create a very focused anomaly detection solution which can provide informative outcomes for a security analyst. Specifically, the solution may consist of a simple non-parametric statistical model that can be visualized as a two-dimensional histogram (example illustrated in FIG. 4) with a corresponding alarm threshold.
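A minimal sketch of such a non-parametric time-of-activity model is shown below, assuming pandas and NumPy are available. The timestamps, the weekday office-hours pattern, and the alarm threshold value are hypothetical.

```python
import numpy as np
import pandas as pd

def build_activity_model(timestamps):
    """Build a normalized day-of-week x hour-of-day histogram of operator actions."""
    ts = pd.to_datetime(pd.Series(timestamps))
    hist = np.zeros((7, 24))
    for dow, hour in zip(ts.dt.dayofweek, ts.dt.hour):
        hist[dow, hour] += 1
    return hist / hist.sum()  # frequency of activity per (day, hour) cell

def is_time_anomaly(model, timestamp, threshold=0.001):
    """Flag an action whose (day, hour) cell falls below the alarm threshold."""
    t = pd.Timestamp(timestamp)
    return model[t.dayofweek, t.hour] < threshold

# Hypothetical history of operator actions during weekday office hours only.
history = pd.date_range("2021-01-04 08:00", periods=500, freq="H")
history = history[(history.dayofweek < 5) & (history.hour >= 8) & (history.hour < 18)]

model = build_activity_model(history)
print(is_time_anomaly(model, "2021-01-10 03:00"))  # True: Sunday night action never observed
print(is_time_anomaly(model, "2021-01-06 10:00"))  # False: Wednesday morning is routine
```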
[0042] A log data anomaly detection system analyzes log contents to detect potential anomalies in the data. Two types of logs can be analyzed, one related to operator actions and another related to alarms. The latter provides an operational context which improves the evaluation accuracy of the former.
[0043] FIG. 1 is a block diagram describing all the steps in a processing pipeline. In some systems, each entry in an event log list (log data 102) is associated with other events to define clusters (clustering 104), which can be used as a categorical input for data analytics algorithms. However, in the solution described here, this categorical information is not used directly but rather transformed into numerical inputs by counting the frequency of occurrence of each cluster at time windows of pre-defined length (event counting 106). The number of occurrences of each cluster in the time window corresponds to a numerical feature. Given those numerical features, a large variety of standard anomaly detection algorithms can be employed, ranging from simple multivariate distances such as Euclidean or Mahalanobis to more elaborate methods such as Isolation Forests. It is preferred that the features resulting from operator logs and alarm logs are combined and considered simultaneously by means of multivariate methods, which take into account the dependency among all the features in defining what is abnormal. Another level of clustering (clustering 108) may be performed using the numerical features as inputs. This can be based on standard clustering methods such as k-means or DBSCAN.
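The event counting and multivariate anomaly detection steps might be realized as in the following non-limiting sketch, which assumes pandas and scikit-learn. The cluster labels, the five-minute window length, and the choice of Isolation Forest are illustrative only.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical input: one row per log entry with its timestamp and the cluster
# label assigned in the first clustering step (clustering 104).
log = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-03-01 10:00:05", "2021-03-01 10:00:40", "2021-03-01 10:03:10",
         "2021-03-01 10:05:02", "2021-03-01 10:06:30", "2021-03-01 10:06:31"]),
    "cluster": ["login", "alarm_ack", "login", "setpoint_change", "login", "login"],
})

# Event counting 106: occurrences of each cluster per fixed-length time window.
counts = (log.set_index("timestamp")
             .groupby([pd.Grouper(freq="5min"), "cluster"])
             .size()
             .unstack(fill_value=0))

# Anomaly detection on the per-window count vectors; in practice the operator
# log and alarm log features would be concatenated and many windows used.
detector = IsolationForest(random_state=0).fit(counts.values)
scores = detector.decision_function(counts.values)  # lower = more anomalous
print(counts)
print(scores)
```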
[0044] Anomaly detection 110 in this case may be implemented based on unsupervised learning models, such as clustering using the sensor information directly as inputs or using dimensionality reduction methods such as Principal Component Analysis (PCA) or Autoencoder Neural Networks as a pre-processing step. Alternatively, it is also possible to use supervised learning models. In this case, regression models can be created to represent the input-output relationships in the data and the anomaly detection 110 can be applied to the residuals, i.e., the difference between the estimated outputs produced from the models and the real outputs obtained from the measurements. It should be noted that the dynamics of the process will usually be relevant, and models that take into consideration the time dependence of the variables, such as Recurrent Neural Networks, should be employed.
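As one hedged illustration of the unsupervised variant, the sketch below scores process samples by their PCA reconstruction error, with a threshold taken from normal training data only. The two synthetic process tags and the 99th-percentile choice are hypothetical and stand in for real process data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical process data: two correlated tags plus noise, normal operation only.
pressure = rng.normal(100.0, 2.0, size=(500, 1))
temperature = 0.5 * pressure + rng.normal(0.0, 0.5, size=(500, 1))
X_train = np.hstack([pressure, temperature])

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=1).fit(scaler.transform(X_train))

def reconstruction_error(x):
    """Distance between a sample and its projection onto the learned subspace."""
    z = scaler.transform(x)
    return np.linalg.norm(z - pca.inverse_transform(pca.transform(z)), axis=1)

# Threshold from normal data, e.g. the 99th percentile of training residuals.
threshold = np.percentile(reconstruction_error(X_train), 99)

sample = np.array([[100.0, 80.0]])  # temperature inconsistent with pressure
print(reconstruction_error(sample) > threshold)  # [ True]
```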
[0045] Considering process data alone, especially when performing anomaly detection, will likely produce false alarms not related to cyber events due to faults in the system. Anomaly detection based on process data will be more effective if combined with additional information, such as the outcome from other analytics solutions associated with cybersecurity in the OT or IT domains. Process anomaly detection in the power generation context preferably should take into consideration proper selection of variables that are affected by relevant cyberattack scenarios.
[0046] One very direct way of monitoring operation anomalies is to use the information of how the operator navigates through HMI (Human Machine Interface) screens (e.g., SCADA HMI) using a screen navigation monitoring system. It is known that cyberattacks will aim at providing an attacker with access to the HMI so that the attacker can impersonate the role of an operator.
It should be noted that the term “screen navigation monitoring system” should be broadly interpreted as a “user navigation monitoring system” because the system monitors much more than movements or interactions on a screen. The system may be capable of monitoring mouse movements and selections, keyboard strokes, or any other input device that may be used with a computer or a screen. In addition, the system can be extended to monitor user actuation of switches, buttons, or other controls that may or may not be associated with a computer or other control system.
[0047] When performing security monitoring, the challenge is that such detailed information about operator navigation may not be logged directly by the system. However, as illustrated in FIG. 2, industrial network traffic (industrial network data 202) may provide enough information for this purpose, as the HMI application may need to establish communication with different devices in the network as the operator navigates through different screens. Domain knowledge may be used to filter the subset of communication traffic which is associated with HMI screens (pre-filtering based on domain knowledge 204). From this subset, it is possible to identify patterns (identification of patterns 206) based on groups of packets that frequently occur within a short time range of each other, which indicate the switching to different screens. From historical data, it is also possible to infer the probability of switching from a certain pattern to another, corresponding to the probability of switching between two specific HMI screens (calculation of probabilities of transition between patterns 208). Once such information is obtained, it is possible to define thresholds to indicate screen navigation which does not correspond to usual behaviors and may be classified as an anomaly (anomaly detection 210). FIG. 2 presents the pipeline required for implementing the described analytics solution.
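A minimal sketch of the transition-probability and thresholding steps is given below, assuming the screen patterns have already been recovered from the network traffic. The screen names and the 5% threshold are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(screen_sequence):
    """Estimate P(next screen | current screen) from an observed navigation history."""
    pair_counts = Counter(zip(screen_sequence, screen_sequence[1:]))
    totals = defaultdict(int)
    for (src, _), n in pair_counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in pair_counts.items()}

def is_navigation_anomaly(probs, src, dst, threshold=0.05):
    """Flag a screen switch whose estimated probability is below the threshold."""
    return probs.get((src, dst), 0.0) < threshold

# Hypothetical history of HMI screen patterns recovered from network traffic.
history = ["overview", "turbine", "overview", "turbine", "alarms",
           "overview", "turbine", "overview", "alarms", "overview"]
probs = transition_probabilities(history)
print(is_navigation_anomaly(probs, "overview", "plc_config"))  # True: never observed
print(is_navigation_anomaly(probs, "overview", "turbine"))     # False: common switch
```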
[0048] Another way to evaluate operator action anomalies is to explicitly calculate a context representation in the form of system states. A context representation may be built from a combination of different data sources, including all different data types discussed herein. The operator action anomalies example builds its context representation based on log and network data. The core of the data analytics solution here is the application of embedding methods which originate from the NLP domain for creation of word embeddings.
[0049] Embedding methods, such as Word2vec, associate a numerical vector of a pre-defined length to each entity of interest. Entities correspond to words in the NLP domain applications, whereas in our case, entities correspond to events. The vectors are defined in such a way that the distance between them indicates the similarity among the entities; the closer the vectors, the greater the similarity. Similarity in this case corresponds to frequent occurrence in similar contexts, where the entity context is in turn defined by a set of other entities that occur in its vicinity. Cosine distance is usually employed for measuring the distance between vectors.
[0050] Embeddings obtained for individual entities can also be combined to form embeddings for sets of entities. Those sets can correspond, for instance, to sentences in the NLP domain. In the present application, the set of entities corresponds to entities occurring during a time window of fixed length. One simple way of obtaining the set embedding is to simply average the corresponding entity embeddings. Weighted averaging using, for instance, TF-IDF may improve the results compared to standard averaging. Once time window embeddings are obtained, they can be clustered to form a discrete representation of the system state. Standard clustering methods as mentioned above can be employed in this case. FIG. 3 presents sample results for clustering three-dimensional time window embeddings resulting in five clusters (cluster 302a-302e). Finally, once a discrete representation of the system state is available, historical data can be used to associate user actions with each state, or the probability of occurrence of actions at each state, and this can be applied for anomaly detection.
[0051] The data analytics-based security monitoring solutions discussed above allow security analysts to identify security issues in data points and logs observed in the OT and IT systems. In solving the complexity issue related to integrating OT monitoring solutions with an existing analyst system, engineers must come up with several layers of software and hardware stacks to host the relevant anomaly detection solutions. This step is usually accomplished by designing an on-premises Security Operation Center (SOC) which is responsible for displaying plant operation statuses, security alerts, and process conditions. In the current state of the art of Security Operation Centers, security data are processed through verticals, i.e., specific rules are created to spot incidents in some layers of the OT infrastructure. Alarms and metrics are visualized and judged by human operators who observe monitoring dashboards. Assessment of security risk is based on the expertise of the analyst, who evaluates the relevance of alarms based on contextual content of system events reported in the dashboard.
[0052] Robust security monitoring frameworks rely on ingestion of stable data sources such as system event logs, plant process data, and network data, reliable data processing hosted on on-premises or cloud servers, and an intuitive, easy-to-use user interface (UI). There are various ways of implementing this monitoring architecture.
[0053] Edge devices and low-cost sensors allow for low-cost data acquisition and pre-processing, local and cloud-based image and sound processing, as well as control operations. Applications range from purely measuring process and asset data to local anomaly detection and, in rare cases, even control functions. Some of the challenges these use cases face lie in the need for very large data reduction and batch transmissions to achieve longer service time per battery charge; streaming high resolution and high bandwidth data will become widely applicable with the advent of 5G networks.
[0054] Wireless, intelligent and cheap edge devices, in combination with cloud-based data processing, can reduce power plant erection cost by drastically reducing wiring effort and space requirements for local cabinets. Cloud-based control systems provide increased availability and reliability, as opposed to systems that need to be managed locally. Operational cost is reduced because unmanned plants come within reach. Stationary as well as maneuverable edge devices on drones and ground-based robots are already available. They sense all kinds of environmental aspects as well as visual and audible information and collect buffered information from remote edge devices.
[0055] It is important to mention that wireless edge devices and sensors need ubiquitous wireless network availability which raises serious cyber security concerns. New technologies like 5G cellular networks are considered to provide the necessary backbone for the Internet of Things (IoT). 5G will enable real time communication between devices and the cloud and will be mandatory for the expected huge number of IoT devices in the future. However, emerging platform implementations appear to be so complex that thorough cyber security assessments are hard to apply. Frequent software, firmware and hardware updates make it even harder to freeze, test and check a system's functionality to ensure the system’s aptitude as backbone for secure plant control purposes.
[0056] Adoption of the public cloud is becoming a viable and preferred approach for implementing large scale, distributed security monitoring solutions. Compared with on-premises solutions, the public cloud approach shows that properly configured, multi-tenant, logically separated environments can provide a level of security superior to dedicated private cloud deployments, while providing significant advantages in availability, scalability and lower cost than traditional on-premises solutions.
[0057] On-premises solutions often require physically dedicated environments for hosting security monitoring applications because of concerns around third-party or unauthorized access to systems, applications, or data. Cloud service providers such as AWS address physical separation concerns by providing security controls and logical separation capabilities in the cloud. The result is that the strength of logical separation capabilities offered by CSPs, combined with the automation and flexibility that they provide, is on par with or even better than the security controls seen in a traditional on-premises, physically separated environment.
[0058] Sophisticated cyber-attacks aimed at OT devices are often intentionally camouflaged under normal network traffic or hidden inside legitimate systems with methods that avoid traditional detection such as signature-based monitoring. OT focused commercial detection tools apply a combination of passive intrusion detection and deep-packet-inspection (DPI) of the industrial protocols observed at the application layer. Such methods rely on the assumption that any incoming traffic corresponds to the actual observed traffic on the network. In fact, this assumption is not always warranted. For example, “manipulation of process view” type of attacks often uses legitimate HMI workstations to display a different status of the monitored process variables (e.g., temperature, pressure, etc.) in a compromised production environment. In addition, in many other APT cases (e.g., Stuxnet), legitimate control systems have been accessed by unauthorized remote users to perform malicious actions with the system, performing silent damaging attacks that manipulate both the operator feedback (process view) and the process outputs (process control). Such actions are, however, most commonly, performed out of context, with direct commands issued without legitimate existing conditions in the process demanding such interaction.
[0059] Historically, IT systems have used two main types of intrusion detection methods: signature-based intrusion detection systems that recognize “bad patterns” that have been previously observed and studied in other networks, and behavioral (or anomaly-based) intrusion detection systems that detect deviations from a well-known baseline of preselected and monitored KPIs (e.g., traffic volume in kbps, traffic direction, and typical connection origin/destination).
[0060] Both options have been tailored to the special needs of ICS systems. Adapting existing signature-based intrusion detection options to ICS was a natural step for OT security practitioners. Mature tools with reasonable footprints in corporate IT networks are available, open-source extensible options exist, and some security vendors already offer such resources. Given the heterogeneity of the OT industry and the historical predominance of proprietary industrial protocols, however, these options had limited scalability in OT systems, and creating signatures for such closed industrial turnkey packages can be time-consuming and limit their value for security detection.
[0061] Network based behavioral (AI-based) detection has largely addressed some of the existing gaps found in signature-based intrusion detection systems both for IT and OT. However, similar to signature-based IDS systems, a significant issue with adopting either of the two existing approaches alone is that both methods are not exhaustive and might not respond well when legitimate users, machines, software, and protocols are used to perform the attack. In addition, several additional OT capabilities are typically not available in existing methods. For example, the ability to correlate relevant security monitored event sources (e.g., system configuration changes, network behavior changes) to deviations in the production process and the supporting control system itself is not available. This kind of correlation provides a comprehensive view of the impact of one event on another. This would help one to understand how the processes and systems are changing and provide an idea of potential consequences of operation. In addition, the ability to comprehensively instrument the complete control system network and provide holistic behavioral detection for network, host, and process data is not generally employed. The ability to perform consistency checks with these many sources of data collected at different network levels to detect anomalies based on the causal relationship between data generation, transmission, and transformation at the different network zones is also not generally available. For example, a command issued on the network to a Programmable Logic Controller (PLC) might have the source of its status change in an HMI interaction, which will be consistent with other recurring actions on the target system (such as generation of host-based logs, sending of certain network communication packets, display of certain feedback on the HMI, etc.). Finally, the ability to utilize process variable data to contextualize actions, commands, and system responses is not commonly available.
[0062] Many data analytics methods have been developed to detect cyberattacks, but most of them are based on information from industrial network traces, not process measurements. Some cyberattack detection methods are developed based on the application of data analytics methods to measurements from industrial processes. In such cases, models based on the process data alone are employed in cyberattack detection. There are also many systems related to monitoring of process measurements for other purposes, such as equipment failure diagnosis and prognosis.
[0063] FIG. 5 illustrates a detection system 500 that is implemented using one or more computers or computer systems and is built or trained using a combination of different data sources. In some examples, historical log data or event data 512 which may include network data, process data, log data, operating condition data, status data, or alarm conditions is available.
[0064] The event data 512 is subjected to a process of event embedding 502 such as those which originate from the Natural Language Processing (NLP) domain for transforming words into numerical vectors of a defined length. These NLP methods apply to event data 512 such as log and computer network data. In addition, NLP can be applied to process data if combined with some discretization method such as symbolic aggregate approximation (SAX). Alternatively, dimensionality reduction methods such as Autoencoders can be applied to transform time series sample data into a numerical vector of defined length. Therefore, the description of FIG. 5 will be based on event data for simplification without loss of generality.
[0065] FIG. 5 illustrates the basic data processing pipeline associated with the method and detection system 500. Event embedding 502 can employ a number of embedding methods, such as Word2vec, to associate a numerical vector of a pre-defined length to each entity of interest. In the NLP domain, entities correspond to words and in the application described here, entities correspond to events or event data 512. The vectors are defined in such a way that the distance between them is indicative of the similarity between the entities. The closer the vectors, the greater the similarity. Similarity in this case corresponds to the frequent occurrence in similar contexts, where the entity context is in turn defined by a set of other entities that occur in its vicinity. Cosine distance is usually employed for measuring the distance between vectors. Results of event embedding 502 obtained for individual entities can also be combined to form embeddings for sets of entities. Those sets can correspond, for instance, to sentences in the NLP domain.
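A minimal sketch of event embedding 502, assuming gensim 4.x is available, could look as follows. The event identifiers, the repeated toy sequences, and the model parameters are hypothetical and chosen only for illustration.

```python
from gensim.models import Word2Vec

# Hypothetical event streams: each inner list is a chronological run of event
# identifiers (log cluster labels, network message types, etc.).
event_runs = [
    ["login", "open_hmi", "ack_alarm", "logout"],
    ["login", "open_hmi", "change_setpoint", "logout"],
    ["login", "open_hmi", "ack_alarm", "change_setpoint", "logout"],
] * 50  # repeat so the toy corpus is large enough to train on

# Treat each run as a "sentence" and each event as a "word".
model = Word2Vec(sentences=event_runs, vector_size=16, window=2, min_count=1, seed=1)

vector = model.wv["ack_alarm"]                              # numerical vector for one event
print(vector.shape)                                         # (16,)
print(model.wv.similarity("ack_alarm", "change_setpoint"))  # cosine similarity of two events
```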
[0066] Once the event embedding 502 is complete, the results are analyzed to perform a process of time window embedding 504, where the time window corresponds to a set of entities, each entity corresponding to an event. The vectors from the event embedding 502 are grouped according to the time of their occurrence. Specifically, a fixed predefined time duration (e.g., five minutes or less, one minute or less, thirty seconds or less, ten seconds, etc.) is used to group the vectors. In other constructions, the predefined time duration can be fixed time windows. For example, a time window could be from 1:00 PM to 1:05 PM. Once the vectors are grouped by the predetermined time duration, they are combined into a plurality of master vectors with each master vector corresponding to the events in one of the predefined time duration windows. One process of time window embedding 504 or set embedding is to simply average the corresponding entity embeddings or vectors. Weighted averaging using, for instance, Term Frequency-Inverse Document Frequency (TF-IDF) may improve the results compared to standard averaging. However, other embedding methods employed in NLP (for embedding words of sentences directly) or other domains can be employed. For example, transformer deep neural network-based methods such as BERT could be employed.
[0067] Once the process of time window embedding 504 is complete, a process of clustering 506 can be performed on the master vectors to form a discrete representation of the system state as illustrated in FIG. 3. The process of clustering 506 may employ standard clustering methods such as k-means and DBSCAN. Of course, other clustering methods could be employed if desired.
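The time window embedding 504 and clustering 506 steps might be sketched as follows, assuming NumPy and scikit-learn. The three-dimensional event vectors, the five-minute window, and the number of clusters are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def window_embeddings(events, vectors, window_seconds=300):
    """Average the embedding vectors of all events falling in each fixed window.

    `events` is a list of (timestamp_seconds, event_id) pairs and `vectors`
    maps event_id -> numpy vector (e.g. from the Word2vec sketch above).
    """
    windows = {}
    for t, event_id in events:
        windows.setdefault(int(t // window_seconds), []).append(vectors[event_id])
    keys = sorted(windows)
    return keys, np.vstack([np.mean(windows[k], axis=0) for k in keys])

# Hypothetical three-dimensional event embeddings and a short event history.
vectors = {"login": np.array([1.0, 0.0, 0.0]),
           "ack_alarm": np.array([0.0, 1.0, 0.0]),
           "change_setpoint": np.array([0.0, 0.0, 1.0])}
events = [(10, "login"), (40, "ack_alarm"), (350, "change_setpoint"),
          (400, "change_setpoint"), (650, "login"), (700, "ack_alarm")]

keys, master_vectors = window_embeddings(events, vectors)
states = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(master_vectors)
print(dict(zip(keys, states)))  # discrete system state per five-minute window
```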
[0068] Returning to FIG. 5, after the completion of clustering 506 a discrete representation of the system state is available. Specifically, a plurality of states or clusters 302a-302e are established. At this point, an optional step of action association 508 can be performed. Historical data 514 can be used to associate operator or system actions to each state or cluster 302a-302e. The actions can include specific actions taken or can include the probability of one or more actions occurring at each state, or a combination thereof. In one example, it is possible that for a particular state or cluster 302a a first action has a first probability, a second action has a second probability, and a third action has a third probability of occurring.
[0069] Once clustering 506 is complete and, in those cases where it is used, action association 508 is completed, the detection system 500 can be employed to perform anomaly detection 510. Anomaly detection 510 can be based solely on an analysis of the real-time or current state of the system or on a comparison of the real-time actions taken by a user to those associated with the states or clusters 302a-302e. In the case of anomaly detection 510 based on the real-time actions of the user, the anomaly detection 510 may be based on a probability threshold.
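A hedged sketch of action association 508 and the probability-threshold form of anomaly detection 510 is given below; the states, actions, and threshold value are hypothetical placeholders.

```python
from collections import Counter, defaultdict

def action_probabilities(history):
    """Estimate P(action | state) from historical (state, action) pairs."""
    counts = defaultdict(Counter)
    for state, action in history:
        counts[state][action] += 1
    return {state: {a: n / sum(c.values()) for a, n in c.items()}
            for state, c in counts.items()}

def is_action_anomaly(probs, state, action, threshold=0.05):
    """Flag an action that is rare or unseen for the current system state."""
    return probs.get(state, {}).get(action, 0.0) < threshold

# Hypothetical historical data 514: states come from clustering 506.
history = [(0, "ack_alarm")] * 40 + [(0, "open_trend")] * 10 + \
          [(1, "change_setpoint")] * 30 + [(1, "ack_alarm")] * 20

probs = action_probabilities(history)
print(is_action_anomaly(probs, 0, "change_setpoint"))  # True: never seen in state 0
print(is_action_anomaly(probs, 1, "change_setpoint"))  # False: common in state 1
```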
[0070] Before proceeding, it should be noted that the term “real-time” or “real-time data” is meant to refer to new data or recent data that is being analyzed. The data could be older data that is being presented for review. Thus, unanalyzed data from virtually any time period could be considered real-time data for purposes of this description.
[0071] The following describes a configurable anomaly detection tool to support a downstream process of cyberattack detection. FIG. 6 represents the training phase where configuration information (configuration 602) is the only input required from the user, defining information for one or more models that can be used to support cyberattack detection. Those configurations may include, for instance, what time ranges of data will be used, how much data will be held out for residual statistics calculations, what the required tags 610 are, what the input/output relationships 614 among them are, and the percentiles for calculation of the anomaly thresholds (anomaly threshold percentiles 612). The diagram presents a pipeline for creation of the anomaly detection analytics. First, sufficient historical data is collected (data collection 604), which includes the required process data tags 610. Machine learning regression models (regression model training 606) are created using methods such as Random Forest or Artificial Neural Networks, which provide reliable means for modeling non-linear input-output relationships. Adequate measures should be taken to avoid or mitigate issues such as over-fitting of the training data 616, in order to make the training pipeline adequate for creation of general-purpose models. Trained models 618 are employed to calculate residuals based on hold-out data 620 (residual statistics and thresholds 608). The statistics of those residuals are combined with the configured percentiles to define thresholds. For instance, an empirical cumulative density function can be used for converting the percentiles into thresholds.
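The training pipeline of FIG. 6 might be approximated as in the following non-limiting sketch, which assumes scikit-learn and synthetic process data. The tags, the hold-out fraction, and the 99th percentile are stand-ins for the configured values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical configuration 602: input tags, output tag, hold-out fraction, percentile.
n = 2000
fuel_flow = rng.uniform(0.5, 1.0, n)
ambient_temp = rng.uniform(10.0, 35.0, n)
power_output = 150 * fuel_flow - 0.3 * ambient_temp + rng.normal(0.0, 1.0, n)
X = np.column_stack([fuel_flow, ambient_temp])
y = power_output

# Data collection 604 / regression model training 606 on part of the history,
# residual statistics and thresholds 608 on the held-out remainder.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

residuals = np.abs(y_hold - model.predict(X_hold))
threshold = np.percentile(residuals, 99)  # empirical CDF: configured percentile -> threshold

def is_process_anomaly(x_row, y_measured):
    """Flag a measurement whose residual exceeds the configured threshold."""
    return abs(y_measured - model.predict([x_row])[0]) > threshold

print(is_process_anomaly([0.8, 20.0], 150 * 0.8 - 0.3 * 20.0))         # False: consistent
print(is_process_anomaly([0.8, 20.0], 150 * 0.8 - 0.3 * 20.0 + 25.0))  # True: manipulated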
[0072] Once the analytics models are trained, they can be used in combination with other cyberattack analytics to support the detection of cyberattacks. The other cyberattack analytics can be based on other types of data, such as industrial network communication traces and log files from the process, so that the combination of the information yielded by all of them is sufficient for identification of the cyberattack. The simplest way of combining the analytics results for cyberattack detection is based on manually defined rules, but more sophisticated methods, such as machine learning-based classification models, can also be employed for that purpose.
[0073] Multiple process data anomaly detection models could be combined with different sets of analytics using specific methods to enable the detection of different types of cyberattacks. For example, the combination of analytics results based on process data with analytics results based on other types of data (e.g., control system data, process variable data, etc.) to detect cyberattacks can be employed.
[0074] User and Entity Behavior Analytics (UEBA) has seen growing attention in the IT domain for analysis of user and entity interactions within complex systems. The focus of UEBA is to construct a partial model of the normal behavior of users and entities in the system for subsequent use in such applications as knowledge discovery, anomaly detection and cyber security. A combination of Operation Technology (OT) and IT arises naturally in ICS, where a large volume of process data, events, and user-system interactions is captured in the form of logs, databases, and traffic packets.
[0075] The focus of the following is on UEBA analysis in the context of user behavior while the user monitors an ICS and how the user responds to events in the system. A spatio-temporal behavior model is constructed, which will answer when and where a particular behavior has happened rather than what has happened. The following presents an implementation method and algorithm that instantiates the idea of OT-UEBA, leveraging multi-source methods and their cascaded cause and consequence behavior throughout multiple states that can be assumed.
[0076] The following is based on the concept of an event 702. Each event 702 has attributes describing its start and end times. For an event E, E.start = timestamp1 and E.end = timestamp2 will denote the start and end times of event E. An event 702 can be instant; in this case, E.start = E.end. Events 702 may come from different data sources and be associated with other attributes as well. Events 702 form sequences 704, where the events 702 are usually placed in increasing order of their start times. For example, a sequence {A, B, C, A} means that first event A happened, then B, then C, then again A (A.start < B.start < ···).
[0077] Such events 702 and sequences 704 can be used to find Temporal Patterns (TP), or subsequences of events 706, that satisfy some search criteria or possess certain properties of interest. An example of such a property is: given a set of sequences {A, B, C, D}, {A, E, C, D}, and {F, A, C, D}, find the longest subsequence that appears in all sequences, with {A, C, D} as the answer. Without giving precise definitions, FIG. 7 illustrates an example of a more complex event sequence and a temporal pattern for which the property of interest was defined as being frequent. Sequential Pattern Mining and Frequent Temporal Pattern Mining, as subfields of Data Mining, deal with finding such complex temporal patterns. Several efficient algorithms to find such patterns under different assumptions and search criteria are known.
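The longest-common-subsequence example above can be reproduced with the brute-force sketch below. It is only an illustration of the property of interest, not one of the efficient mining algorithms referred to in the text.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """Check whether `pattern` appears in `sequence` in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def longest_common_subsequence(sequences):
    """Brute-force the longest subsequence of the first sequence present in all sequences."""
    first = sequences[0]
    for length in range(len(first), 0, -1):
        for candidate in combinations(first, length):
            if all(is_subsequence(candidate, seq) for seq in sequences[1:]):
                return list(candidate)
    return []

sequences = [["A", "B", "C", "D"], ["A", "E", "C", "D"], ["F", "A", "C", "D"]]
print(longest_common_subsequence(sequences))  # ['A', 'C', 'D']
```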
[0078] A natural extension of a temporal pattern is a temporal association rule (TAS). Such a rule has the form {A, B, ..., X} → {Y, ..., Z}, meaning that, if events A, B, ..., X happen in this order, then Y, ..., Z will happen next in this order. Several algorithms exist to find such rules.
[0079] In most ICSs there are users and roles. A user monitors the system, changes its behavior, and reacts to events arising in the system. A role defines what a user can and cannot do in the system. The system finds temporal patterns and association rules with respect to a user, a role, or a user-role pair. This brings several advantages and opens a way to new applications. Both IT and OT data sources are present, which allows user behavior to be analyzed in a much richer context than IT data alone.
[0080] The workflow depicted in FIG. 8 includes the steps of collection 802, preprocessing 804, extraction 806, and mining 808. Collection 802 includes the collection of raw data from an ICS in the form of network data, event logs, databases, user activity logs, etc.
Preprocessing 804 the raw data includes filtering unnecessary or repetitive items, merging fragmented records together, and the like to place the data in a more usable format. Extraction 806 includes the extraction of events with their timestamps and attributes from the preprocessed data. Finally, mining 808 includes mining items of interest. Items of interest can include the definition of TPs and TASs of interest and the specification of sources of data, search criteria, and inclusion/exclusion rules. Mining 808 may also include the selection of any subset of user, role, or user-role pairs to mine the items, the use of an appropriate algorithm to find the patterns/rules, and the extraction of additional features from the mined TPs and TASs suitable for further analysis of user behavior. For example, for a frequent TP {‘open window A’, ‘observe data on window A’, ‘generate report C’} extracted for users, the mean time of the subsequence ‘observe data on window A’, ‘generate report C’ may be statistically different between users and, therefore, can be used as a signature for user identification.
[0081] The system is able to mine complex temporal patterns and association rules for different users and/or roles in an ICS environment, which benefits from the existence of OT data sources. The system also analyzes many sources of uneven time series information simultaneously, without a need to align the data or utilize artificial time windows to compute statistics (which usually leads to a significant loss of meaningful data), and captures time dependencies between different events and data sources naturally, which opens a way for root-cause and “what-if” types of analysis. The system further combines a purely data-driven search for patterns of interest with domain knowledge easily. For example, if one is to find the most frequent pattern that has a critical event X as the first event in the pattern, the pattern will describe the most common behavior of the user/system when event X happens. Finally, mined TPs and TASs, together with features or statistics extracted from them, may be used as building blocks for a variety of applications including anomaly detection, user and/or role detection, user behavior prediction and forecasting, business process understanding and rule generation, and learning and teaching.
[0082] The system uses a specific approach to detect suspect interactions of a human operator with a system. Statistical models are created of the days and times when such interactions typically occur. For instance, non-parametric statistical models can be used, such as two-dimensional histograms to visualize the data. The heatmap (FIG. 3) represents a normalized view of the number of times activity was performed at a certain time on each day of the week. For other cases, other choices of days and times could be made, such as day of the month. It can be clearly noticed from the heatmap that there are certain times on certain days when no interaction of the user with the system is expected. Such a non-parametric representation can be used directly for identifying days and times when no interaction is expected based on a threshold value. Additional processing and alternative methods can also be used to improve performance or to compensate for limited data availability. For instance, smoothing of the data using some form of filter, such as a Gaussian filter, could be employed as additional processing, and kernel-based methods for probability density function estimation could be employed as an alternative to the 2D histogram. In all cases, thresholds based on frequencies or probabilities could be used to separate regions of normal operation from anomalies.
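By way of a non-limiting illustration, the day-of-week by hour-of-day model and its threshold test could be sketched as follows; the bin layout, smoothing parameter, and threshold value are illustrative assumptions rather than values taken from the disclosure.

import numpy as np
from datetime import datetime
from scipy.ndimage import gaussian_filter  # optional smoothing of sparse histograms

def build_activity_model(timestamps, sigma=0.0):
    # Normalized day-of-week (rows) by hour-of-day (columns) histogram of interactions.
    hist = np.zeros((7, 24))
    for ts in timestamps:
        hist[ts.weekday(), ts.hour] += 1
    if sigma > 0:
        hist = gaussian_filter(hist, sigma=sigma)
    return hist / hist.sum()

def is_time_anomalous(model, ts, threshold=0.001):
    # Flag an interaction occurring at a day/time where activity is essentially never observed.
    return model[ts.weekday(), ts.hour] < threshold

# Simulated history: weekday working-hours activity only.
history = [datetime(2021, 3, d, h) for d in range(1, 27) for h in range(8, 17)
           if datetime(2021, 3, d).weekday() < 5]
model = build_activity_model(history, sigma=1.0)
print(is_time_anomalous(model, datetime(2021, 3, 7, 3)))   # Sunday 03:00 -> True
print(is_time_anomalous(model, datetime(2021, 3, 3, 10)))  # Wednesday 10:00 -> False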
[0083] Additional strategies for cyberattack detection which can be employed in place of or in combination with the time-based strategy include, but are not limited to: the range of operator commands issued per time window; the range of operator visualization commands per time window (to detect a rogue operator trying to understand the system), i.e., HMI navigation patterns; consideration of different classes of process state; identification of patterns related to different days of the month or months of the year; identification of patterns which are specific to each operator user; and a specialized use case which takes into consideration the use of administrative privileges in the system.
[0084] FIG. 9 illustrates a model training pipeline 900 that can be used to train a classifier 906 that can then be used for anomaly detection 510. As illustrated in FIG. 9, logs 908 are used as event data 512. The logs 908 are simply historical data for the operation of the system (e.g., the turbogenerator system). The data in the logs 908 is provided for event embedding 502, time window embedding 504, and clustering 506 as described with regard to FIG. 5. The clustering 506 ultimately results in a plurality of system states 904.
[0085] A portion of the data contained in the logs 908 can be reused as if it were real-time data in order to develop and test the classifier 906. If used for testing, the data is again provided for event embedding 502 and time window embedding 504 such that the data is converted to a plurality of real-time master vectors.
[0086] Each of the real-time master vectors and the system states 904 are provided for classifier training 902. During training and testing, the classifier 906, which can include machine learning or other AI-based techniques, analyzes each real-time master vector and assigns a predicted system state to that real-time master vector. The assigned predicted system state can then be compared to the system state determined for that master vector through clustering 506, and the classifier 906 is adjusted until they match. Using this process yields a classifier 906 that is capable of predicting the system state for a real-time master vector without performing a clustering analysis.
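By way of a non-limiting illustration of this training process, the sketch below clusters historical master vectors into states and then fits a classifier to reproduce those state labels; the choice of KMeans and a k-nearest-neighbors classifier, the vector dimensions, and the random stand-in data are assumptions for the sketch only.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
master_vectors = rng.normal(size=(500, 16))  # stand-in for embedded time windows from the logs

# Clustering: discover system states from the historical master vectors.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
state_labels = kmeans.fit_predict(master_vectors)

# Reuse part of the historical data as if it were real-time data for training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    master_vectors, state_labels, test_size=0.2, random_state=0)

# Classifier training: learn to predict the system state without re-running the clustering.
classifier = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("agreement with clustered states:", classifier.score(X_test, y_test))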
[0087] FIG. 10 illustrates a detection system 1000 that could be used to detect anomalies in both system states and user actions. The detection system 1000 receives real-time data from new logs 1010 and that data is processed through event embedding 502 and time window embedding 504 to produce real-time master vectors for each time window in the real-time data.
[0088] The real-time master vectors are used in a state anomaly detection routine 1006 to determine if the current or real-time state of the system is itself an anomaly. The state anomaly detection routine 1006 utilizes the classifier 906 to assign a predicted state to each of the real-time master vectors generated by the time window embedding 504. The anomaly decision 1008 then compares those states to the known operating states of the system and, if there is no match, the predicted state is identified as anomalous. In other words, the detection system 1000 includes anomaly detection based on the state representation itself. If the time window embedding is not close enough to any of the clusters identified during training, it is considered to be anomalous.
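By way of a non-limiting illustration, the "not close enough to any cluster" decision can be sketched as a threshold on the distance from a real-time master vector to the nearest cluster centroid; the centroids and threshold used here are invented for the sketch.

import numpy as np

def state_anomaly(master_vector, centroids, threshold):
    # Returns (is_anomalous, predicted_state) based on the distance to the nearest centroid.
    distances = np.linalg.norm(centroids - master_vector, axis=1)
    nearest = int(np.argmin(distances))
    return distances[nearest] > threshold, nearest

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # e.g., kmeans.cluster_centers_ from training
print(state_anomaly(np.array([0.2, -0.1]), centroids, threshold=1.5))   # (False, 0)
print(state_anomaly(np.array([10.0, -8.0]), centroids, threshold=1.5))  # (True, 0)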
[0089] In some constructions, an optional second anomaly detection routine 1004 is provided as part of the detection system 1000. After the first anomaly decision 1008, the second anomaly detection routine 1004 can be initiated if the anomaly decision 1008 indicated no anomaly or it could be initiated regardless of the results of the anomaly decision 1008. In order to perform the second anomaly detection routine 1004, the real-time master vectors are passed from the time window embedding 504 to a classification routine 1002. The classification routine 1002 includes the classifier 906 and classifies the real-time master vectors to determine predicted real-time states. Alternatively, the predicted real-time states can be passed from the state anomaly detection routine 1006. In addition, the associated user or system actions for each state are passed from the action association 508 to the classification routine 1002. This information is then used by the second anomaly detection routine 1004 along with the actual or real-time actions 1012 associated with the real-time master vector. A comparison is made between the real-time actions 1012 and the associated user or system actions for the state in which the real-time master vector is classified. If the comparison shows that the actions do not match, the real-time actions are deemed anomalous. As discussed, the associated actions can be probabilities of one or more actions. In this case, a threshold value may be set to determine if the actions do not match.
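By way of a non-limiting illustration of the action comparison, the sketch below flags real-time actions whose probability in the predicted state falls below a threshold; the action names, probabilities, and threshold are invented for this sketch.

# Hypothetical per-state action probabilities produced by the action association step.
state_action_probs = {
    0: {"adjust setpoint": 0.60, "acknowledge alarm": 0.35, "open breaker": 0.05},
    1: {"start unit": 0.70, "acknowledge alarm": 0.30},
}

def action_anomalies(predicted_state, realtime_actions, threshold=0.05):
    # Flag real-time actions whose probability in the predicted state is below the threshold.
    probs = state_action_probs.get(predicted_state, {})
    return [action for action in realtime_actions if probs.get(action, 0.0) < threshold]

print(action_anomalies(1, ["acknowledge alarm", "open breaker"]))  # ['open breaker']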
[0090] In this case, the anomaly is not based on the time window of events itself being different from usual but on other information associated with the resulting discrete state. For example, if, each time the operator or user performs a certain action, the system is in a first state, and this same action is now performed when the system is in a second state, this can be considered an anomaly.
[0091] States as defined here can also be applied in alternative settings for anomaly detection. One way of doing this would be to model the transition between states, for instance in the form of a Markov chain, and use information associated with this transition, e.g., the transition probability in the case of a Markov chain, for detecting anomalies. In this case, a threshold could be defined such that transitions with probability lower than the threshold would be considered anomalies.
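By way of a non-limiting illustration of this transition-based alternative, the sketch below estimates transition probabilities from a sequence of state labels and flags transitions whose probability falls below a threshold; the state sequence and threshold are invented for the sketch.

import numpy as np

def transition_matrix(state_sequence, n_states):
    # Estimate transition probabilities from an observed sequence of discrete state labels.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(state_sequence[:-1], state_sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

history = [0, 0, 1, 1, 2, 0, 0, 1, 2, 0, 0, 1, 1, 2, 0]
P = transition_matrix(history, n_states=3)

def transition_anomalous(prev_state, new_state, threshold=0.1):
    return P[prev_state, new_state] < threshold

print(transition_anomalous(0, 1))  # False: the 0 -> 1 transition is common in the history
print(transition_anomalous(2, 1))  # True: the 2 -> 1 transition never occurs in the history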
[0092] The detection system 1000 can be used to detect both state anomalies and user action anomalies in many different systems, including the turbogenerator discussed with regard to FIG. 5. Continuing that example, the arrangement of FIG. 9 can use stored data logs 908 that include event data 512 as well as user or system action data associated with the event data 512. The data is used to train the classifier 906 for use in the detection system 1000 of FIG. 10. While actual operating data is used for the training, other similar data, from similar engines for example, may be employed for training.
[0093] During operation of the process (e.g., power plant or turbogenerator), data is constantly collected and stored in the new logs 1010. That data could be analyzed in real-time to detect anomalies or could be reviewed periodically after it is collected. The data is converted to real-time master vectors as has been described and is analyzed to determine if an anomalous state exists. For example, the process could have a normal operation state, a base load state, a load following state, and the like. In addition, the control system, which houses the detection system 1000, could have an on-line state, an off-line state, and any number of other states. If these were the only states and the real-time master vectors did not fit into any of these states, the detection system 1000 would identify that condition as an anomalous state.
[0094] In addition, the detection system 1000 can determine which state the real-time master vectors fall under, and each state can include associated user or system actions. These would be the actions normally taken by the system or an operator during operation in the particular state. As noted, the actions can be specific actions or can be probabilities of a particular action. In addition, multiple actions can be associated with a given state.
[0095] The new logs 1010 contain the actual actions taken by the system and the user, and these actual or real-time user and system actions can be compared to the associated user and system actions for the particular state. If the real-time user and system actions do not match the associated user and system actions (or do not fall within an acceptable limit), the detection system 1000 can identify the action as an anomalous action.
[0096] The detection system 1000 provides a straightforward way of combining multiple, possibly heterogeneous data sources into a standard representation which can comprehensively describe the state of the system or process of interest for proper evaluation of anomalies. This can potentially provide better performance in terms of a greater detection rate and a lower false alarm rate compared to existing solutions and provide information for better diagnosing the nature of anomalies.
[0097] With reference to FIG. 11, an example system 1100 is described that enables operation of the anomaly detection system 500, 1000 described herein. The system 1100 employs at least one data processing system 1102. A data processing system may comprise at least one processor 1116 (e.g., a microprocessor/CPU). The processor 1116 may be configured to carry out various processes and functions described herein by executing, from a memory 1126, computer/processor executable instructions 1128 corresponding to one or more applications 1130 (e.g., software and/or firmware) or portions thereof that are programmed to cause the at least one processor to carry out the various processes and functions described herein.
[0098] Such a memory 1126 may correspond to an internal or external volatile or nonvolatile processor memory 1118 (e.g., main memory, RAM, and/or CPU cache) that is included in the processor and/or in operative connection with the processor. Such a memory may also correspond to a non-transitory nonvolatile storage device 1120 (e.g., flash drive, SSD, hard drive, ROM, EPROMs, optical discs/drives, or other non-transitory computer readable media) in operative connection with the processor.
[0099] The described data processing system 1102 may optionally include at least one display device 1112 and at least one input device 1114 in operative connection with the processor 1116. The display device, for example, may include an LCD or AMOLED display screen, monitor, VR headset, projector, or any other type of display device capable of displaying outputs from the processor. The input device, for example, may include a mouse, keyboard, touch screen, touch pad, trackball, buttons, keypad, game controller, gamepad, camera, microphone, motion sensing devices that capture motion gestures, or other type of input device capable of providing user inputs or other information to the processor.
[0100] The data processing system 1102 may be configured to execute one or more applications 1130 that facilitate the features described herein. Such an application, for example, may correspond to a component included as part of the anomaly detection system 500, 1000 described previously.
[0101] For example, as illustrated in FIG. 11, the at least one processor 1116 may be configured via executable instructions 1128 (e.g., included in the one or more applications 1130) included in at least one memory or data store 1104 to operate the anomaly detection system 500, 1000, a graphical user interface (GUI), or other programs, systems, or software.
[0102] While the methodology is described as being a series of acts that are performed in a sequence, it is to be understood that the methodology may not be limited by the order of the sequence. For instance, unless stated otherwise, some acts may occur in a different order than what is described herein. In addition, in some cases, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein.
[0103] It should be appreciated that this described methodology may include additional acts and/or alternative acts corresponding to the features described previously with respect to the data processing system 1100.
[0104] It is also important to note that while the disclosure includes a description in the context of a fully functional system and/or a series of acts, those skilled in the art will appreciate that at least portions of the mechanism of the present disclosure and/or described acts may be capable of being distributed in the form of computer/processor executable instructions 1128 (e.g., software/firmware applications 1130) contained within a storage device 1120 that corresponds to a non-transitory machine-usable, computer-usable, or computer- readable medium in any of a variety of forms. The computer/processor executable instructions 1128 may include a routine, a sub-routine, programs, applications, modules, libraries, and/or the like. Further, it should be appreciated that computer/processor executable instructions may correspond to and/or may be generated from source code, byte code, runtime code, machine code, assembly language, Java, JavaScript, Python, Julia, C, C#, C++ or any other form of code that can be programmed/configured to cause at least one processor to carry out the acts and features described herein. Still further, results of the described/claimed processes or functions may be stored in a computer-readable medium, displayed on a display device, and/or the like.
[0105] It should be appreciated that acts associated with the above-described methodologies, features, and functions (other than any described manual acts) may be carried out by one or more data processing systems 1102 via operation of one or more of the processors 1116. Thus, it is to be understood that when referring to a data processing system, such a system may be implemented across several data processing systems organized in a distributed system in communication with each other directly or via a network.
[0106] As used herein, a processor corresponds to any electronic device that is configured via hardware circuits, software, and/or firmware to process data. For example, processors described herein may correspond to one or more (or a combination) of a microprocessor, CPU, GPU, or any other integrated circuit (IC) or other type of circuit that is capable of processing data in a data processing system 1102. As discussed previously, the processor 1116 that is described or claimed as being configured to carry out a particular described/claimed process or function may correspond to a CPU that executes computer/processor executable instructions 1128 stored in a memory 1126 in the form of software to carry out such a described/claimed process or function. However, it should also be appreciated that such a processor may correspond to an IC that is hardwired with processing circuitry (e.g., an FPGA or ASIC IC) to carry out such a described/claimed process or function. Also, it should be understood that reference to a processor may include multiple physical processors or cores that are configured to carry out the functions described herein. In addition, it should be appreciated that a data processing system and/or a processor may correspond to a controller that is operative to control at least one operation.
[0107] In addition, it should also be understood that a processor that is described or claimed as being configured to carry out a particular described/claimed process or function may correspond to the combination of the processor 1116 with the executable instructions 1128 (e.g., software/firmware applications 1130) loaded/installed into the described memory 1126 (volatile and/or non-volatile), which are currently being executed and/or are available to be executed by the processor to cause the processor to carry out the described/claimed process or function. Thus, a processor that is powered off or is executing other software, but has the described software loaded/stored in a storage device 1120 in operative connection therewith (such as on a hard drive or SSD) in a manner that is available to be executed by the processor (when started by a user, hardware and/or other software), may also correspond to the described/claimed processor that is configured to carry out the particular processes and functions described/claimed herein.
[0108] FIG. 12 illustrates a further example of a data processing system 1200 with which one or more embodiments of the data processing system 1102 described herein may be implemented. For example, in some embodiments, the at least one processor 1116 (e.g., a CPU/GPU) may be connected to one or more bridges/buses/controllers 1202 (e.g., a north bridge, a south bridge). One of the buses, for example, may include one or more I/O buses such as a PCI Express bus. Also connected to various buses in the depicted example are the processor memory 1118 (e.g., RAM) and a graphics controller 1204. The graphics controller 1204 may generate a video signal that drives the display device 1112. It should also be noted that the processor 1116 in the form of a CPU/GPU or other processor may include a memory therein such as a CPU cache memory. Further, in some embodiments one or more controllers (e.g., graphics, south bridge) may be integrated with the CPU (on the same chip or die). Examples of CPU architectures include IA-32, x86-64, and ARM processor architectures.
[0109] Other peripherals connected to one or more buses may include a communication controller 1214 (e.g., Ethernet, WiFi, or cellular controllers) operative to connect to a network 1222 such as a local area network (LAN), Wide Area Network (WAN), the Internet, a cellular network, and/or any other wired or wireless networks or communication equipment. The data processing system 1200 may be operative to communicate with one or more servers 1224, and/or any other type of device or other data processing system, that is connected to the network 1222. For example, in some embodiments, the data processing system 1200 may be operative to communicate with a memory 1126 that stores a database. Examples of a database may include a relational database (e.g., Oracle, Microsoft SQL Server). Also, it should be appreciated that in some embodiments, such a database may be executed by the processor 1116.
[0110] Further components connected to various busses may include one or more I/O controllers 1212 such as USB controllers, Bluetooth controllers, and/or dedicated audio controllers (connected to speakers and/or microphones). It should also be appreciated that various peripherals may be connected to the I/O controller(s) (via various ports and connections) including the input device 1114, and an output device 1206 (e.g., printers, speakers) or any other type of device that is operative to provide inputs to and/or receive outputs from the data processing system.
[0111] Also, it should be appreciated that many devices referred to as input devices or output devices may both provide inputs and receive outputs of communications with the data processing system 1200. For example, the processor 1116 may be integrated into a housing (such as a tablet) that includes a touch screen that serves as both an input and display device. Further, it should be appreciated that some input devices (such as a laptop) may include a plurality of different types of input devices (e.g., touch screen, touch pad, and keyboard). Also, it should be appreciated that other hardware 1208 connected to the I/O controllers 1212 may include any type of device, machine, sensor, or component that is configured to communicate with a data processing system. [0112] Additional components connected to various busses may include one or more storage controllers 1210 (e.g., SATA). A storage controller 1210 may be connected to a storage device 1120 such as one or more storage drives and/or any associated removable media. Also, in some examples, a storage device 1120 such as an NVMe M.2 SSD may be connected directly to a bus 1202 such as a PCI Express bus.
[0113] It should be understood that the data processing system 1200 may directly or over the network 1222 be connected with one or more other data processing systems such as a server 1224 (which may in combination correspond to a larger data processing system). For example, a larger data processing system may correspond to a plurality of smaller data processing systems implemented as part of a distributed system in which processors associated with several smaller data processing systems may be in communication by way of one or more network connections and may collectively perform tasks described as being performed by a single larger data processing system.
[0114] A data processing system in accordance with an embodiment of the present disclosure may include an operating system 1216. Such an operating system may employ a command line interface (CLI) shell and/or a graphical user interface (GUI) shell. The GUI shell permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application. A cursor or pointer in the graphical user interface may be manipulated by a user through a pointing device such as a mouse or touch screen. The position of the cursor/pointer may be changed and/or an event, such as clicking a mouse button or touching a touch screen, may be generated to actuate a desired response. Examples of operating systems that may be used in a data processing system may include Microsoft Windows, Linux, UNIX, iOS, macOS, and Android operating systems.
[0115] As used herein, the processor memory 1118, storage device 1120, and memory 1126 may all correspond to the previously described memory 1126. Also, the previously described applications 1130, operating system 1216, and data 1220 may be stored in one or more of these memories or any other type of memory or data store. Thus, the processor 1116 may be configured to manage, retrieve, generate, use, revise, and/or store applications 1130, data 1220, and/or other information described herein from/in the processor memory 1118, storage device 1120, and/or memory 1126.
[0116] In addition, it should be appreciated that data processing systems may include virtual machines in a virtual machine architecture or cloud environment that execute the executable instructions. For example, the processor and associated components may correspond to the combination of one or more virtual machine processors of a virtual machine operating in one or more physical processors of a physical data processing system 1200. Examples of virtual machine architectures include VMware ESXi, Microsoft Hyper-V, Xen, and KVM. Further, the described executable instructions 1128 may be bundled as a container that is executable in a containerization environment such as Docker executed by the processor 1116.
[0117] Also, it should be noted that the processor described herein may correspond to a remote processor located in a data processing system such as a server that is remote from the display and input devices described herein. In such an example, the described display device and input device may be included in a client data processing system (which may have its own processor) that communicates with the server (which includes the remote processor) through a wired or wireless network (which may include the Internet). In some embodiments, such a client data processing system, for example, may execute a remote desktop application or may correspond to a portal device that carries out a remote desktop protocol with the server in order to send inputs from an input device to the server and receive visual information from the server to display through a display device. Examples of such remote desktop protocols include Teradici's PCoIP, Microsoft's RDP, and the RFB protocol. In another example, such a client data processing system may execute a web browser or thin client application. Inputs from the user may be transmitted from the web browser or thin client application to be evaluated on the server, rendered by the server, and an image (or series of images) sent back to the client data processing system to be displayed by the web browser or thin client application. Also, in some examples, the remote processor described herein may correspond to a combination of a virtual processor of a virtual machine executing in a physical processor of the server.
[0118] Those of ordinary skill in the art will appreciate that the hardware and software depicted for the data processing system may vary for particular implementations. The depicted examples are provided for the purpose of explanation only and are not meant to imply architectural limitations with respect to the present disclosure. Also, those skilled in the art will recognize that, for simplicity and clarity, the full structure and operation of all data processing systems suitable for use with the present disclosure is not being depicted or described herein. Instead, only so much of a data processing system as is unique to the present disclosure or necessary for an understanding of the present disclosure is depicted and described. The remainder of the construction and operation of the data processing system 1200 may conform to any of the various current implementations and practices known in the art.
[0119] Although an exemplary embodiment of the present disclosure has been described in detail, those skilled in the art will understand that various changes, substitutions, variations, and improvements disclosed herein may be made without departing from the spirit and scope of the disclosure in its broadest form.
[0120] None of the description in the present application should be read as implying that any particular element, step, act, or function is an essential element, which must be included in the claim scope: the scope of patented subject matter is defined only by the allowed claims. Moreover, none of these claims are intended to invoke a means plus function claim construction unless the exact words "means for" are followed by a participle.

Claims

What is claimed is:
1. A method for detecting a cybersecurity event in a system, the method comprising:
collecting data from at least one of an access monitoring system, an operational data system, and a user navigation monitoring system;
determining a system state based at least in part on the collected data;
comparing the collected data to the determined state to identify a first anomaly for which the collected data is not indicative of the determined system state;
identifying a second anomaly in a second system that in combination with the first anomaly is indicative of a cyber security threat; and
identifying the first anomaly and the second anomaly as a cybersecurity threat.
2. The method of claim 1, wherein the access monitoring system collects access data, the operational data system collects operational data, and the user navigation monitoring system collects user navigation data, and wherein each of the access data, the operational data, and the user navigation data includes discrete events and each event includes a time stamp.
3. The method of claim 2, further comprising integrating the access data, the operational data, and the user navigation data, and clustering the data at least in part based on the content of the access data, the operational data, and the user navigation data.
4. The method of claim 2, wherein the user navigation monitoring system tracks operator navigation through HMI (human machine interface) screens and keystrokes through a keyboard.
5. The method of claim 1, wherein the detecting step includes operating an AI-based trained model that operates to detect the first anomaly in the first system.
6. The method of claim 1, wherein the access monitoring system performs a user behavior analytics step to determine if a user’s actions are indicative of an anomaly.
7. The method of claim 6, wherein the user behavior analytics step includes defining specific events that occur within the system, constructing a spatio-temporal behavioral model operable to determine if one of the specific events occurs in an unexpected sequence of the specific events and to identify that specific event as an anomaly.
8. A method for detecting a cybersecurity event in a system, the method comprising:
integrating data received from a plurality of data collection systems, the plurality of data collection systems including an operational data system that collects operational data from an industrial process and a user navigation monitoring system that collects activity data generated by operator activity;
determining a system state based at least in part on the integrated data;
identifying a first anomaly in the data collected by the operational data system, the first anomaly being inconsistent with the determined system state;
evaluating the data collected by the user navigation monitoring system that is related to the first anomaly; and
identifying the first anomaly as a cybersecurity threat in response to the evaluating step identifying data that is inconsistent with the determined system state.
9. The method of claim 8, wherein the plurality of data collection systems further comprises an access monitoring system that collects access data, and wherein each of the access data, the operational data, and the activity data includes discrete events and each event includes a time stamp.
10. The method of claim 9, further comprising integrating the access data, the operational data, and the activity data and clustering the data at least in part based on the content of the access data, the operational data, and the activity data.
11. The method of claim 9, wherein the user navigation monitoring system tracks operator navigation through HMI (human machine interface) screens and keystrokes through a keyboard.
12. The method of claim 9, wherein the access monitoring system performs a user and entity behavior analytics step to determine if a user’s actions are indicative of an anomaly.
13. The method of claim 12, wherein the user and entity behavior analytics step includes defining specific events that occur within the system, constructing a spatio-temporal behavioral model operable to determine if one of the specific events occurs in an unexpected sequence of the specific events and to identify that specific event as an anomaly.
14. The method of claim 8, wherein the identifying the first anomaly step includes operating an AI-based trained model that operates to detect the first anomaly in the first system.
15. A method for detecting a cybersecurity event in a system, the method comprising:
collecting data points from a plurality of monitored data points;
clustering the data points based on a characteristic of the data points;
determining a system state based on the clustered data points;
identifying at least one state anomaly from the plurality of monitored data points, the state anomaly being indicative of operation outside of the determined system state;
monitoring user actions;
associating the user actions with the determined system state;
identifying at least one user anomaly from the monitored user actions, the user anomaly indicating that at least one of the user actions is outside of an expected user action for the determined system state;
integrating the state anomaly and the user anomaly to determine if the combination of anomalies is indicative of a cybersecurity event.
16. The method of claim 15, wherein the monitored data points include a plurality of operating data points.
17. The method of claim 15, wherein the system state comprises one of a time and an operating state of the system.
18. The method of claim 15, further comprising monitoring system access data, associating the system access data with the determined state, and identifying at least one system access anomaly from the monitored system access data, the system access anomaly indicating that at least one of the system access data is outside of the expected system access data for the determined system state.
19. The method of claim 15, wherein the plurality of monitored data points include user navigation data collected from a user navigation monitoring system that tracks operator navigation through HMI (human machine interface) screens, keystrokes through a keyboard, and control inputs through any one of a plurality of system control devices.
20. The method of claim 15, wherein the plurality of monitored data points include process sensor data collected from operational data from the system.
PCT/US2021/060491 2020-11-25 2021-11-23 Method of detecting an anomaly in a system WO2022115419A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063118224P 2020-11-25 2020-11-25
US63/118,224 2020-11-25

Publications (1)

Publication Number Publication Date
WO2022115419A1 true WO2022115419A1 (en) 2022-06-02

Family

ID=79021840

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/060491 WO2022115419A1 (en) 2020-11-25 2021-11-23 Method of detecting an anomaly in a system

Country Status (1)

Country Link
WO (1) WO2022115419A1 (en)



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21830830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21830830

Country of ref document: EP

Kind code of ref document: A1