US20200026594A1 - System and method for real-time detection of anomalies in database usage - Google Patents

Info

Publication number
US20200026594A1
Authority
US
United States
Prior art keywords
events
data streams
anomaly
data
streams
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/562,950
Inventor
Donald Steiner
John Day
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northrop Grumman Systems Corp
Original Assignee
Northrop Grumman Systems Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northrop Grumman Systems Corporation
Priority to US16/562,950
Publication of US20200026594A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0772 Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/86 Event-based monitoring

Definitions

  • Embodiments are in the technical field of database or application usage. More particularly, embodiments disclosed herein relate to systems and methods for real-time detection of anomalies in database or application usage which, inter alia, foster discovery of patterns of behavior and anomalies from across a plurality of heterogeneous data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream alone.
  • Real-time Monitoring of Suspected Individuals: If an individual is suspected of malicious activity, then real-time monitoring mechanisms can be configured and installed to directly monitor that individual's activity and detect any malicious activity. These mechanisms, however, are time consuming to install and also require dedicated analysts to conduct the real-time monitoring and detection, often at great expense to the affected enterprise.
  • the insider threat remains one of the most significant problems confronting enterprises and government agencies of all sizes today.
  • the threat is multi-faceted with a high degree of variability in the perpetrator, the type of attack, the intent of the attack, and the access means.
  • No solution today adequately addresses the detection of insider threats due to the highly variable nature of the problem.
  • No existing system or solution takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across this plurality of data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream alone.
  • Embodiments are directed to a method for real-time detection of anomalies.
  • the method comprises: receiving a plurality of heterogeneous data streams, wherein the heterogeneous data streams are received from at least two of a group consisting of agents located at databases, agents located at applications, audit programs located at user workstations, and sensors located in, or at access points to, a network; correlating the heterogeneous data streams, wherein the correlation identifies corresponding events in different ones of the heterogeneous data streams; identifying patterns of events across the correlated heterogeneous data streams; building a model of normalcy from the identified pattern of events, wherein the model of normalcy is stored in an analysis database; creating rules that determine how and whether anomalies are detected, how a detected anomaly is treated and characterized, and what reaction to employ upon detection of the anomaly; receiving a plurality of additional heterogeneous data streams from the at least two of a group consisting of the agents, audit programs, and sensors; applying, using an analysis engine, the model of normalc
  • the correlating is performed using a complex event processor for correlating the heterogeneous data streams and integrating the heterogeneous data streams into a single integrated data stream.
  • the correlating may include synchronizing the time of each of the heterogeneous data streams in order to correlate events across time.
  • the correlating may be performed by application of a small-space algorithm (SSA).
  • the analysis engine may use the complex event processor for applying the model of normalcy and rules to the additional heterogeneous data streams and for analyzing data from the additional heterogeneous data streams against the model of normalcy and rules.
  • the heterogeneous data streams and/or the additional heterogeneous data streams may be processed using an automatic event tabulator and correlator (AETAC) algorithm to reduce event data complexity and facilitate search, retrieval, and correlation of the event data to thereby produce uniform event data whereby complex event processing of the heterogeneous data streams and/or the additional heterogeneous data streams by the complex event processor is simplified.
  • the identified pattern of events may be associated with ordinary, authorized, and benign database usage, workstation usage, user behavior or application usage.
  • the heterogeneous data streams may be multi-modal asynchronous signals.
  • one of the heterogeneous data streams corresponds to an ordinary, authorized, and benign database query and another of the heterogeneous data streams corresponds to an ordinary, authorized, and benign user interaction at the user workstation, and wherein the identified pattern of events is the ordinary, authorized, and benign database query and the ordinary, authorized, and benign user interaction at the user workstation.
  • the detecting step may detect the anomaly when an event within the additional heterogeneous data streams corresponding to a database query is not preceded by a user interaction at the user workstation within a predetermined period of time resulting in the event not fitting the model of normalcy.
  • the alert may comprise at least one of a group consisting of an alarm message, a communication triggering further analysis and/or action, a command instructing the restriction or shutting down of an affected workstation, database, network or network access, initiation of additional targeted monitoring, analysis, and/or applications to capture additional detailed information regarding an attack, continued monitoring of a user, placement of a flag in a file for further follow-up, restricting access to a network, alerting security, and restricting or locking down a building or a portion of a building.
  • the heterogeneous data streams may be received by the analysis engine.
  • the databases, applications, workstations, or networks may be in an enterprise environment.
  • Embodiments are also directed to a system for real-time detection of anomalies.
  • the system includes one or more computers, each computer including a processor and memory, wherein the memory includes instructions that are executed by the processor for performing the above-mentioned method.
  • FIG. 1 is a block diagram illustrating an embodiment of a system for real-time detection of anomalies in database or application usage.
  • FIG. 2 is a block diagram illustrating an example of multiple data stream event correlation and anomaly detection.
  • FIGS. 3A and 3B are flowcharts illustrating an embodiment of a method for real-time detection of anomalies in database or application usage.
  • FIG. 4 is a block diagram illustrating exemplary hardware for implementing an embodiment of a system and method for real-time detection of anomalies in database or application usage.
  • FIG. 5 is a graph illustrating space compression (SC) versus epsilon.
  • FIG. 6 is a graph illustrating false negative rate (FNR) versus epsilon.
  • Embodiments overcome the problems described above.
  • Embodiments address the above problems and improve the state of the art by providing an automated mechanism to continuously monitor ALL users and systems for abnormalities, and automatically alerting on violations and deviations from expected behaviors as they are occurring in real time without incurring heavy overhead expenses in human time and labor.
  • Embodiments provide a system and method that takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across these data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream.
  • Embodiments provide a mechanism to detect anomalies in database access and usage, such as data exfiltration attempts, first by identifying correlations (e.g., patterns or models of normalcy) in events across different relevant yet heterogeneous data streams (such as those associated with ordinary, authorized and benign database usage, workstation usage, user behavior or application usage) and second by identifying deviations from these patterns or models of normalcy across data streams in real time as data is being accessed.
  • Embodiments identify and alert in real-time (i.e., as events are occurring, not after the fact) on insider threat attacks targeting databases and systems using databases, such as file storage and sharing systems.
  • threats may involve unauthorized manipulation and falsification of data, sabotage of databases, and exfiltration of data.
  • Data exfiltration refers to users (or agents and/or software systems acting on behalf of users, possibly unknown to the user) illicitly accessing, retrieving, and downloading data that is confidential and proprietary to an enterprise, often with the malicious intent of distributing the data outside the enterprise for personal gain or simply detriment to the enterprise.
  • Embodiments of a system and method for real-time detection of anomalies in database or application usage may include a mechanism and processes that provide and analyze, in real-time, a variety of heterogeneous streams within the enterprise comprising an amalgam of relevant events pertaining to data access, user behavior, and computer and network activity. Taken together, these event streams can identify anomalous system behavior that is indicative of insider threats and data exfiltration. Embodiments can identify anomalous system behavior that cannot be identified from analyzing any single event stream on its own.
  • Such relevant enterprise event streams that may be monitored and analyzed by embodiments described herein include, but are not limited to:
  • Using complex event processing (CEP) and unsupervised or semi-supervised machine learning techniques, embodiments of the system and method develop models of normalcy correlating the events derived from the selected enterprise streams (e.g., the above-identified streams) based on typical (authorized and benign) behavior of the users, computers, databases, applications, and networks. These models are then used by embodiments of the system and method to identify anomalies in the event streams that may be predictive or indicative of insider threat attacks, including unauthorized data manipulation, falsification, sabotage and data exfiltration.
  • a key inventive feature of embodiments described herein is the application of automated machine learning concepts for correlating events and detecting anomalies across heterogeneous data streams to the specific data streams relating to database, user, application, computer, and network activity in order to detect insider threats and data exfiltration attempts taking all the available information into account while it is happening in real time. Monitoring of any individual stream alone will not detect all anomalous events.
  • Existing technologies do not provide these advantages. For example:
  • Database monitoring: Stand-alone systems such as IBM® InfoSphere Guardium monitor and analyze database activity. However, such systems do not take other user, application, or network activity into account.
  • Stand-alone systems such as Centrify® DirectAudit monitor and analyze user activity in real time on a workstation or desktop. However, such systems do not take other application, database, or network activity into account.
  • Stand-alone systems such as SNORT monitor and analyze network activity in an enterprise network. Such systems do not take user activity, application, or database activity into account.
  • No system takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across these data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream.
  • System 100 may include an analysis engine 102 and an analysis database 104 .
  • Analysis engine 102 may receive data streams, such as the data streams described above.
  • analysis engine 102 may receive data streams indicative of user, database or other data access, application, computer and network behavior and activity.
  • data streams may be generated and received from various agents, sensors and audit programs located at workstations, in networks or at network access points, data storage (e.g., database) locations, and data processing (e.g., application) locations.
  • system 100 may include monitoring agents (A), such as IBM Guardium agents, located at (operating on) databases 106, agents (A) located at applications (e.g., SharePoint) 107 , direct audit programs (DA), such as Centrify® DirectAudit, located at (operating on) user workstations 108 , and network sensors (NS), such as OpenNMS, located in, or at access points to, a network(s) 110 .
  • Each type of agent, program or sensor may produce different types of data streams.
  • IBM® Guardium agents may generate data streams indicative of database interaction on a database server including a timestamp, client machine IP, database user ID, database server IP, and the database query (SQL query).
  • Centrify® DirectAudit may generate a data stream indicative of user interaction on a user machine/workstation including a timestamp, machine user ID and user commands (e.g., as typed into TTY/Shell).
  • the output of some agents, programs and sensors may be modified to work with embodiments described herein. For example, some agents, programs and sensors produce GUI output. Scripts and other mechanisms to extract relevant data and output to, e.g., syslogger, may be used.
  • Analysis engine 102 may include a complex event processor (CEP) that correlates the multiple data streams and integrates such data streams into an integrated data stream. Such correlation may include synchronizing the time of each data stream in order to correlate events across time.
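  • The time synchronization and integration step described above can be sketched as follows. This is a minimal illustration, not the CEP platform itself: the event tuple shape, the `synchronize` and `integrate` helpers, and the per-source clock offsets are all assumptions introduced for the example.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp_seconds, source_name, payload).
Event = Tuple[float, str, str]

def synchronize(stream: Iterable[Event], offset: float) -> Iterator[Event]:
    """Shift every timestamp in one stream by that source's clock offset."""
    for ts, source, payload in stream:
        yield (ts + offset, source, payload)

def integrate(streams: Dict[str, List[Event]], offsets: Dict[str, float]) -> List[Event]:
    """Merge time-ordered per-source streams into one time-ordered stream."""
    corrected = [
        synchronize(events, offsets.get(name, 0.0))
        for name, events in streams.items()
    ]
    # heapq.merge assumes each input is already sorted by the key.
    return list(heapq.merge(*corrected, key=lambda e: e[0]))

workstation = [(100.0, "workstation", "keypress"), (130.0, "workstation", "keypress")]
database = [(98.5, "database", "SELECT 1"), (140.5, "database", "SELECT 1")]
# Assumed for illustration: the database clock runs 2 seconds behind.
merged = integrate(
    {"workstation": workstation, "database": database},
    {"database": 2.0},
)
```

  • After the offset is applied, events from both sources share one timeline, so later stages can compare timestamps across streams directly.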
  • the streams of data may include raw, meta and derived data.
  • the CEP platform may ingest multi-modal asynchronous signals from heterogeneous sources.
  • the CEP platform of analysis engine 102 may then apply the models of normalcy to the integrated data stream.
  • analysis engine 102 may receive and process such data streams to (a) determine models or patterns of normalcy and (b) to analyze real-time behavior and activity against such models or patterns of normalcy in order to detect anomalies.
  • analysis engine 102 detects events in the data streams, compares the events, or more particularly, the patterns of events in the data streams, against the models of normalcy per rules that are based on such models and designed to enable the analysis engine 102 to determine when a variance from the model of normalcy is indicative of an anomaly, and, when an anomaly is detected as a result of such comparison and application of such rules, issues an alert.
  • An alert may include an alarm message(s), communications to relevant personnel triggering further analysis and/or action, commands instructing the shutting down of the affected workstation, database, network or network access, or starting of more targeted monitoring and analysis systems, applications and/or other efforts to capture more detailed information regarding an attack.
  • Results of detection analysis may be stored in analysis database 104 .
  • analysis engine 102 receives data streams that result from controlled, known typical (authorized and benign) behavior of the users, computers, databases, applications, and networks, analyzes such data streams to determine the pattern of events resulting from such typical behavior and builds a model of the patterns of events occurring during such typical behavior.
  • Analysis engine 102 may generate the aforementioned rules based on the models of normalcy built from such patterns. Ordinarily, the greater the amount of such typical behavior that is analyzed and used to build models of normalcy, the larger the number of typical patterns of events that may be recognized and incorporated into the models.
  • system 100 may build different models of normalcy that are applicable to different operating conditions and which are applied by analysis engine 102 according to the prevalent operating condition.
  • the models of normalcy and rules may be stored in analysis database 104 .
  • embodiments may identify correlations or patterns of behavior through a variety of monitoring agents, programs and sensors and then identify anomalies by detecting deviations from the patterns in real-time.
  • Embodiments of system 100 may develop the models of normalcy and rules through machine learning techniques applied by the CEP platform. Such machine learning techniques may be unsupervised or semi-supervised. As system 100 operates, analysis engine 102 may continue to apply machine learning techniques to further update the models and rules.
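  • As a rough illustration of building a model of normalcy from known-benign activity, the sketch below learns which event kinds are reliably preceded by which other kinds within a time window. Simple counting stands in here for the unsupervised or semi-supervised learning the source mentions; the function name, the event shape, and the `min_confidence` threshold are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical event shape: (timestamp_seconds, kind).
Event = Tuple[float, str]

def learn_precedence_rules(
    training: List[Event], window: float, min_confidence: float = 1.0
) -> Dict[str, Set[str]]:
    """Map each event kind to the kinds that preceded it within `window`
    seconds in at least `min_confidence` of its occurrences."""
    events = sorted(training)
    totals = Counter()
    preceded = Counter()
    # Quadratic scan, kept simple for clarity rather than speed.
    for i, (ts, kind) in enumerate(events):
        totals[kind] += 1
        earlier = {k for t, k in events[:i] if 0 < ts - t <= window and k != kind}
        for k in earlier:
            preceded[(kind, k)] += 1
    rules: Dict[str, Set[str]] = {}
    for (kind, k), n in preceded.items():
        if n / totals[kind] >= min_confidence:
            rules.setdefault(kind, set()).add(k)
    return rules

benign = [
    (1.0, "keyboard"), (2.0, "db_query"),
    (20.0, "keyboard"), (23.0, "db_query"),
    (40.0, "keyboard"),
]
rules = learn_precedence_rules(benign, window=5.0)
```

  • With this training data, every database query follows a keyboard event within five seconds, so the learned rule set records that precedence. Lowering `min_confidence` would admit patterns that hold only most of the time.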
  • a user makes regular (authorized) queries to a database.
  • An Advanced Persistent Threat (APT) malware is installed on the user's machine.
  • the APT makes an (unauthorized) query to the database.
  • Pattern: a database query is preceded by user interaction.
  • With reference to FIG. 2, shown is an illustration of data streams received from a user X workstation audit program, e.g., Centrify® DirectAudit, and a database monitoring agent, e.g., IBM® Guardium.
  • the audit program stream shows the keyboard interactions of user X.
  • the database monitoring agent stream may include numerous data queries, some from the user X workstation and others from other workstations, etc.
  • the CEP platform of analysis engine 102 may correlate the keyboard interactions from the user X audit data stream with the data queries shown by the agent data stream.
  • analysis engine 102 may determine that one of the data queries from user X is not correlated with/preceded by a keyboard interaction on the user X machine. If the applicable model of normalcy indicates that typical data queries are always preceded by/correlated with a keyboard interaction, analysis engine 102 may characterize the uncorrelated data query as an anomaly (and, therefore, issue an alert).
  • the correlation between events in different data streams may be quite granular.
  • a model or pattern of normalcy may dictate that a certain event from one data stream is always preceded within, e.g., five (5) seconds, by a certain event from a second data stream.
  • the model of normalcy may dictate that the event from one data stream is always preceded by one or more of a variety of events from a second data stream.
  • Embodiments may apply a time window to detect correlations.
  • Embodiments may increase or decrease the time window used in order to increase or decrease the potential number of correlations.
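  • The time-window rule described above (a database query should be preceded by a keyboard interaction from the same user within a set period) can be sketched as a single pass over the integrated stream. The event shape, the `"keyboard"` and `"db_query"` kind labels, and the function name are assumptions for illustration; the five-second default follows the example in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (timestamp_seconds, kind, user).
Event = Tuple[float, str, str]

def find_unpreceded_queries(events: List[Event], window: float = 5.0) -> List[Event]:
    """Return database queries that were not preceded by a keyboard
    interaction from the same user within `window` seconds."""
    last_keyboard: Dict[str, float] = {}
    anomalies: List[Event] = []
    for ts, kind, user in sorted(events):
        if kind == "keyboard":
            last_keyboard[user] = ts
        elif kind == "db_query":
            seen = last_keyboard.get(user)
            if seen is None or ts - seen > window:
                anomalies.append((ts, kind, user))
    return anomalies

integrated = [
    (10.0, "keyboard", "X"),
    (12.0, "db_query", "X"),  # keystroke 2 s earlier: fits the pattern
    (60.0, "db_query", "X"),  # no keystroke in the previous 5 s
]
anomalies = find_unpreceded_queries(integrated)
```

  • Widening `window` makes more queries count as correlated with an earlier keystroke, and narrowing it flags more queries, which mirrors the trade-off the text describes.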
  • Embodiments of analysis engine 102 may apply a small-space algorithm (SSA) to process the data streams and correlate events.
  • SSA is a new form of stream processing over distributed massive streams. SSA estimates frequently occurring items on a logarithmic space scale (tractable) and permits online extraction of persistent objects in a streaming network.
  • SSA was developed by Prof. Srikanta Tirthapura at Iowa State University.
  • SSA identifies persistent events in data streams. For the purposes of this description, an event is time-stamped data. A persistent event is time-stamped data that appears regularly over time. Characteristics of data streams typically require that all algorithms operate on data in a single pass. Events may be sparse, occur only infrequently, or even appear in different distributed streams.
  • Embodiments may use statistical data sampling to reduce the size of the stream without overlooking persistent events.
  • SSA determines associations between events in the data stream. Such associations may be temporal, spatial or generalized associations over other metrics.
  • SSA makes use of “Frequency Moments” that estimate the total number of objects in a stream without having to search the entire stream.
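  • For reference, the source does not define frequency moments. The definition commonly used in the data-stream literature, given here as background and not as part of the disclosed method, is the following, where m_i is the number of occurrences of the i-th distinct item and n is the number of distinct items:

```latex
F_k = \sum_{i=1}^{n} m_i^{k}
```

  • Under this definition F_0 is the number of distinct items (taking each term with m_i > 0 as 1) and F_1 is the total length of the stream.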
  • SSA can learn the persistent events in a data stream without any prior knowledge and without having to track all of the events in the data stream.
  • Implementations of SSA perform association rule mining to exploit prior and collateral domain knowledge to increase the selectivity of event persistence detection. This decreases false negative errors and increases the ability to detect more transient events.
  • Embodiments identify anomalous behavior by finding associations (or correlations) between events occurring in different (distributed, heterogeneous) data streams.
  • Embodiments identify “patterns of normalcy” and monitor events for disruption from these patterns.
  • Data for making the predictions may come from sensor networks (e.g., agents, audit programs, network and other sensors) generating heterogeneous streams of observations (‘event’ loosely defined here as ‘time-stamped data’).
  • a challenge is to detect and recognize, from sensor samples, precursor events for “hidden” spatiotemporal processes.
  • a “naive” algorithm is a baseline or obvious way to perform a task, contrasted with the present embodiment's algorithm, which improves, optimizes, or otherwise enhances the naive algorithm.
  • the naive algorithm for SSA is merely to sample and count every packet in the stream. But these streams tend to be too large to be exhaustively sampled on current computing platforms, so SSA proposes a method involving subsampling to estimate the total counts, optimizing storage space at the expense of accuracy.
  • the epsilon parameter is used to tune SSA in order to achieve the optimal trade-off between accuracy and storage space.
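  • The source does not give the SSA procedure itself. To illustrate the same kind of trade-off between storage space and accuracy controlled by an epsilon parameter, the sketch below uses the well-known Misra-Gries frequent-items summary, which is a different technique and is deterministic rather than sampling-based. It keeps fewer than ceil(1/epsilon) counters instead of one counter per distinct item, and each reported count falls short of the true count by at most epsilon times the stream length.

```python
import math
from typing import Dict, Hashable, Iterable

def misra_gries(stream: Iterable[Hashable], epsilon: float) -> Dict[Hashable, int]:
    """One-pass approximate counts using at most ceil(1/epsilon) - 1 counters.

    Each returned count undercounts the true count by at most
    epsilon * N, where N is the number of items in the stream.
    """
    k = math.ceil(1.0 / epsilon)
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Illustrative stream: one frequent event among many one-off events.
stream = ["q1"] * 50 + [f"x{i}" for i in range(30)] + ["q1"] * 20
summary = misra_gries(stream, epsilon=0.1)
```

  • A smaller epsilon uses more counters and gives tighter estimates; a larger epsilon saves space at the cost of accuracy, which is the same direction of trade-off the text attributes to SSA's epsilon parameter.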
  • correlation of events received from a plurality of various heterogeneous data streams often requires certain processing, modification and manipulation of the raw stream data. This processing, etc., enables embodiments of the system and method for real-time detection of anomalies in database or application usage to correlate heterogeneous data streams, detect events, correlate events across data streams and otherwise perform real-time detection.
  • An embodiment of a system and method for processing of heterogeneous data streams may be referred to as an automatic event tabulator and correlator (AETAC).
  • Embodiments of AETAC include an algorithm that automatically tabulates and correlates event data collected by sensors and other automated data collection devices.
  • Embodiments of AETAC can process events of all types uniformly, even if the data definitions for each device are different (i.e., “heterogeneous”).
  • Embodiments of AETAC operate by imposing a mathematical structure (“homomorphism”) on each event type that makes all events look the same to the tabulating device and then further imposing a requirement that this structure be preserved through successive processing steps (“closure”), such that all outputs from the tabulator have the same mathematical structure as the input events.
  • This uniformity reduces data complexity and facilitates searches, retrievals and correlation of event data at any stage of processing, such that complex event processing (CEP) may be reduced to abstractions equivalent to evaluating simple mathematical expressions.
  • a purpose of AETAC is to improve the performance of complex event processing (CEP) systems that monitor large networks of sensors or other kinds of data collection devices.
  • Embodiments of AETAC reduce or eliminate the need to write customized code for each device that collects data.
  • Embodiments achieve this simplification by imposing several mild restrictions on the allowable formats of the data, which do not impede the functioning of the collection devices.
  • the resulting uniformity makes it easier to compare events across space and time and, consequently, increases the overall “situational awareness” of networks organized under AETAC principles.
  • AETAC reduces complexity by imposing a simple structure on events that requires the data conform to these mild restrictions:
  • Each generated list is discrete and unique, in the sense that it bears a unique identifier and timestamp. (This is easily accomplished by using one-up sequences or hash codes);
  • the lists can be decomposed into deterministic types, such that the composition of each list is fixed for each type. Note that this requirement is automatically fulfilled for most sensors, which use standardized packet protocols for defining data fields.
  • The algebraic structure (ID, TIMESTAMP, TYPE) → DETAIL can serve as an intrinsic structure for any event.
  • Additional fields generated are aggregated in such a manner that they form a DETAIL object, indexed by the ID.
  • DETAIL objects do not have to be uniquely defined, but are guaranteed to be a surjective mapping (one to many) or injective mapping (one to one) because the ID's are uniquely defined.
  • DETAIL fields may be optional (equivalent to endomorphic mapping) if the ID, TIMESTAMP, TYPE fields suffice to define an event completely.
  • closure property simply means that all transformations of events or results obtained through processing must also conform to the above restrictions. This guarantees that all operators used on the input events can also be applied recursively to outputs of transformations and other processing results.
  • A classification of an event would generate a “classification event” (ID, TIMESTAMP, CLASSIFICATION EVENT) with a DETAIL record containing the results of the classification.
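  • The event structure and closure property described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Event` class, the helper names `make_event` and `classify`, the one-up ID counter, and the contents of the DETAIL record are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable, Dict

_next_id = count(1)  # one-up sequence supplying unique IDs (assumed scheme)


@dataclass(frozen=True)
class Event:
    """Every event, whatever its source, carries ID, TIMESTAMP and TYPE."""
    id: int
    timestamp: float
    type: str
    detail: Dict[str, Any] = field(default_factory=dict)  # optional DETAIL


def make_event(timestamp: float, type_: str, detail=None) -> Event:
    return Event(next(_next_id), timestamp, type_, dict(detail or {}))


def classify(event: Event, classifier: Callable[[Event], str]) -> Event:
    """Closure: the result of processing an event is itself an event."""
    return make_event(
        event.timestamp,
        "CLASSIFICATION",
        {"source_id": event.id, "label": classifier(event)},
    )


query = make_event(10.0, "DB_QUERY", {"sql": "SELECT 1"})
first = classify(query, lambda e: "benign")
# Because outputs share the input structure, operators apply recursively.
second = classify(first, lambda e: "reviewed")
```

Because `classify` returns an `Event`, any operator written for input events can be reused unchanged on processing results.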
  • Once events in a data stream have been defined, conforming to the above restrictions, it is then possible to process events arriving through multiple, heterogeneous channels using a basic AETAC algorithm. (Note: in embodiments, events must be ordered by timestamp such that each arrival channel is an ordered time-series):
  • CEP Complex event processing
  • Embodiments of AETAC simplify this process by providing a shared mathematical structure to standardize the creation of event handlers and optimizing the reuse of code, allowing many different kinds of devices to use exactly the same code to process their data.
  • Results and outputs of processing will also share this same structure and so can allow each layer of the system to automatically feed to the next layer recursively.
  • An innovative concept of embodiments of AETAC is the application of algebraic structure on the inputs and outputs of event processing, which allows AETAC to be an automatic ‘tabulator’ device for events, which can be easily correlated with other events because of the shared structure and closure properties.
  • Embodiments of AETAC impose an algebraic structure on event processing that makes the processing independent of the type of event being processed, and ensures consistent and uniform processing of events, where outputs are new kinds of events requiring further processing.
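  • As a hedged illustration of uniform processing across channels, the sketch below merges several time-ordered channels into a single ordered series and tabulates events by TYPE with the same code for every source. It is not the AETAC algorithm itself; the tuple layout (ID, TIMESTAMP, TYPE, DETAIL), the function names, and the sample events are assumptions.

```python
import heapq
from collections import Counter


def merge_channels(*channels):
    """Merge channels that are each already ordered by timestamp.

    Each event is a tuple (id, timestamp, type, detail). heapq.merge
    streams the result lazily, so no channel has to be held in memory.
    """
    return heapq.merge(*channels, key=lambda event: event[1])


def tabulate(events):
    """Count events by TYPE, independent of which device produced them."""
    return Counter(event[2] for event in events)


database_channel = [(1, 1.0, "DB_QUERY", {}), (2, 5.0, "DB_QUERY", {})]
workstation_channel = [(3, 0.5, "KEYSTROKE", {}), (4, 4.0, "KEYSTROKE", {})]

merged = list(merge_channels(database_channel, workstation_channel))
table = tabulate(merged)
```

The merge relies on the stated requirement that each arrival channel is an ordered time-series; unordered channels would need to be sorted first.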
  • With reference now to FIGS. 3A-3B, shown are flowcharts illustrating an embodiment of a method 300 for real-time detection of anomalies in database or application usage.
  • FIG. 3A illustrates the portion of method 300 for real-time detection of anomalies in database or application usage that builds the model of normalcy and rules as described above.
  • FIG. 3B illustrates the portion of method 300 for real-time detection of anomalies in database or application usage that applies the model of normalcy and rules to detect anomalies.
  • Method 300 may be repeated to continuously update the model of normalcy and rules through, e.g., machine learning techniques, as described above. Likewise, method 300 may be performed continuously as data streams are received to continuously detect anomalies in real-time.
  • Method 300 may implement SSA as described to build the model of normalcy and rules and to detect anomalies.
  • Method 300 receives a plurality of data streams, block 302 .
  • The received data streams may be heterogeneous data streams received from a plurality of agents, programs, and/or sensors, etc.
  • The data streams are correlated, block 304 .
  • The correlation 304 may include processing, e.g., with an embodiment of AETAC.
  • The correlation 304 identifies events in the various data streams.
  • Method 300 identifies patterns of events across the various data streams, block 306 . The patterns may provide indications of relations between events in different data streams under typical operating conditions.
  • Method 300 builds/creates a model or pattern of normalcy from the identified patterns of events, block 308 .
  • Method 300 may build/create rules, block 310 , that determine how and whether anomalies are detected, how method 300 treats, characterizes and reacts to a detected anomaly, etc. For example, an event that may be characterized as an anomaly when occurring off-hours may not be an anomaly or an anomaly worth issuing an alert if occurring during normal business hours.
  • Method 300 may repeat 302 - 310 , block 312 , over time using machine learning techniques to continue to build and update 308 the model of normalcy and build and update 310 the rules.
  • Method 300 may apply the model of normalcy and rules to operational behavior to detect anomalies.
  • Method 300 receives a plurality of data streams, block 314 .
  • The received data streams, typically heterogeneous and received from a plurality of agents, programs, and/or sensors, etc., are processed (e.g., using an embodiment of AETAC) and the data from the data streams is analyzed against the model of normalcy and the rules, block 316 .
  • Method 300 determines whether events are anomalous, thereby detecting anomalies, block 318 . If an anomaly is detected 318 , method 300 may determine the characteristics of the anomalous event(s), block 320 .
  • An anomaly may be a user accessing a secured server after hours. If the user does access a secured server from time to time after hours, such an anomaly may not trigger an alert. If, however, the user is accessing the server from his office after hours but the user did not “badge in” (i.e., user's employee badge was not read by security at entrance to the building), then the anomaly would trigger an alert. Consequently, the rules and the characteristics of an anomaly may determine whether to issue an alert, block 322 . For example, an anomaly indicating improper access to a server may only trigger continued monitoring of the user. The characteristics of an anomaly may also determine what type of alert to issue.
  • Some anomalies may require a simple flag in a file for further follow-up by a human agent monitoring anomalies.
  • Other anomalies may require immediate action, such as restricting or shutting off access to a network, locking down a building or portion of a building, alerting security, etc. If method 300 determines to issue an alert and the type of alert is determined, the alert is issued, block 324 . Method 300 may continue to repeat 314 to 324 , block 326 , so long as systems, etc., are being monitored.
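  • The decision of whether to issue an alert, and of what type, may be sketched as a small rule function over the characteristics of a detected anomaly. The example below follows the after-hours and badge-in scenario described above, but the field names, the alert labels, and the specific mapping from characteristics to alert types are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anomaly:
    """Characteristics determined at block 320 (hypothetical fields)."""
    user: str
    after_hours: bool
    habitual_after_hours: bool   # user accesses the server after hours from time to time
    accessed_from_office: bool
    badged_in: bool


def choose_alert(anomaly: Anomaly) -> Optional[str]:
    """Return an alert type, or None when no alert should be issued."""
    # Office access without a badge read contradicts the physical-access stream.
    if anomaly.after_hours and anomaly.accessed_from_office and not anomaly.badged_in:
        return "alert_security"
    # Unusual but plausible: leave a flag for a human analyst.
    if anomaly.after_hours and not anomaly.habitual_after_hours:
        return "flag_for_follow_up"
    return None


routine = Anomaly("alice", after_hours=True, habitual_after_hours=True,
                  accessed_from_office=False, badged_in=True)
suspicious = Anomaly("bob", after_hours=True, habitual_after_hours=True,
                     accessed_from_office=True, badged_in=False)
unusual = Anomaly("carol", after_hours=True, habitual_after_hours=False,
                  accessed_from_office=False, badged_in=True)
```

Keeping the rules separate from detection allows the same anomaly to produce different reactions as the rules are updated.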
  • Exemplary hardware implementation of system 100 may include multiple computing devices 400 (e.g., computing system N).
  • Computing devices 400 may be, e.g., blade servers or other stack servers.
  • Each component shown in system 100 may be implemented as software running on one or more computing devices 400 .
  • Components and functionality of each may be combined and implemented as software running on a single computing device 400 .
  • Steps of method 300 may be implemented as software modules executed on one or more computing devices 400 .
  • Computing device 400 may include a memory 402 , a secondary storage device 404 , a processor 406 , and a network connection 408 .
  • Computing device 400 may be connected to a display device 410 (e.g., a terminal connected to multiple computing devices 400 ) and output device 412 .
  • Memory 402 may include RAM or similar types of memory, and it may store one or more applications (e.g., software for performing functions or including software modules described herein) for execution by processor 406 .
  • Secondary storage device 404 may include a hard disk drive, DVD-ROM drive, or other types of non-volatile data storage.
  • Processor 406 executes the applications, which are stored in memory 402 or secondary storage 404 , or received from the Internet or other network 414 .
  • Network connection 408 may include any device connecting computing device 400 to a network 414 and through which information is received and through which information (e.g., analysis results) is transmitted to other computing devices.
  • Network connection 408 may include a network connection providing a connection to an internal enterprise network, a network connection providing a connection to the Internet, or another similar connection.
  • Network connection 408 may also include bus connections providing connections to other computing devices 400 in system 100 (e.g., other servers in server stack).
  • Display device 410 may include any type of device for presenting visual information such as, for example, a computer monitor or flat-screen display.
  • Output device 412 may include any type of device for presenting a hard copy of information, such as a printer, and other types of output devices include speakers or any device for providing information in audio form.
  • Computing device 400 may also include an input device, such as a keyboard or mouse, permitting direct input into computing device 400 .
  • Computing device 400 may store a database structure in secondary storage 404 , for example, for storing and maintaining information needed or used by the software stored on computing device 400 .
  • Processor 406 may execute one or more software applications in order to provide the functions described in this specification, specifically in the methods described above, and the processing may be implemented in software, such as software modules, for execution by computers or other machines. The processing may provide and support web pages and other user interfaces.
  • Although computing device 400 is depicted with various components, one skilled in the art will appreciate that the servers can contain additional or different components.
  • Although aspects of an implementation consistent with the above are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on or read from other types of computer program products or computer-readable media.
  • The computer-readable media may include instructions for controlling a computer system, such as computing device 400 , to perform a particular method, such as method 300 .

Abstract

A system and method for real-time detection of anomalies in database or application usage is disclosed. Embodiments provide a mechanism to detect anomalies in database or application usage, such as data exfiltration attempts, first by identifying correlations (e.g., patterns of normalcy) in events across different heterogeneous data streams (such as those associated with ordinary, authorized and benign database usage, workstation usage, user behavior or application usage) and second by identifying deviations/anomalies from these patterns of normalcy across data streams in real-time as data is being accessed. An alert is issued upon detection of an anomaly, wherein a type of alert is determined based on a characteristic of the detected anomaly.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation application of U.S. patent application Ser. No. 14/732,162 filed on Jun. 5, 2015 which claims priority of U.S. Provisional Application Ser. No. 62/009,736, filed on Jun. 9, 2014, which is hereby incorporated herein by reference in its entirety.
  • FIELD
  • Embodiments are in the technical field of database or application usage. More particularly, embodiments disclosed herein relate to systems and methods for real-time detection of anomalies in database or application usage which, inter alia, foster discovery of patterns of behavior and anomalies from across a plurality of heterogeneous data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream alone.
  • BACKGROUND
  • Significant damage, both reported and unreported, has been caused to enterprises, government agencies, and national security through “insider threat” attacks, especially data exfiltration. For example, consider recent retrieval and release of secret and top-secret information from the defense and intelligence communities by Manning and Snowden. Data exfiltration attacks may include, e.g., intentional user activity (e.g., a user impermissibly downloads sensitive data and removes the data from the enterprise) or automated activity (e.g., malicious software operates on behalf of or as a user with or without the user's knowledge). Unfortunately, the current state of the art for addressing such problems is quite limited.
  • 1. Monitoring of all System Data: Due to the large volume of data that is required to monitor ALL users and systems in an enterprise, this data (often only portions of the required data) is usually stored and analyzed off-line in a database or data warehouse. Unfortunately, this analysis only reveals issues after the fact, i.e., after any actual data exfiltration attempts have occurred and at such a point in time when it may be too late to take any action or to prevent damage from the exfiltration. In a best case, the offending user or system is still there and may be expected to perform such actions again, where they can be targeted for further analysis and prosecution. However, many times, the offending user or system is no longer present, the damage has already been done or it is otherwise too late to take effective action.
  • 2. Real-time Monitoring of Suspected Individuals: If an individual is suspected of malicious activity, then real-time monitoring mechanisms can be configured and installed to directly monitor that individual's activity and detect any malicious activity. These mechanisms, however, are time consuming to install and also require dedicated analysts to conduct the real-time monitoring and detection, often at great expense to the affected enterprise.
  • As noted above, there are significant problems with the above approaches. The above approaches are slow to react, may not catch a user in time, and have a hard time detecting malicious software that is installed and operating on behalf of a user without the user's knowledge or authorization. Furthermore, the above approaches to addressing data exfiltration problems are expensive to deploy and consume human analyst time and resources.
  • The insider threat remains one of the most significant problems confronting enterprises and government agencies of all sizes today. The threat is multi-faceted with a high degree of variability in the perpetrator, the type of attack, the intent of the attack, and the access means. No solution today adequately addresses the detection of insider threats due to the highly variable nature of the problem.
  • No existing system or solution takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across this plurality of data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream alone.
  • Thus, it is desirable to provide a system and method for real-time detection of anomalies in database or application usage which are able to overcome the above disadvantages.
  • SUMMARY
  • Embodiments are directed to a method for real-time detection of anomalies. The method comprises: receiving a plurality of heterogeneous data streams, wherein the heterogeneous data streams are received from at least two of a group consisting of agents located at databases, agents located at applications, audit programs located at user workstations, and sensors located in, or at access points to, a network; correlating the heterogeneous data streams, wherein the correlation identifies corresponding events in different ones of the heterogeneous data streams; identifying patterns of events across the correlated heterogeneous data streams; building a model of normalcy from the identified pattern of events, wherein the model of normalcy is stored in an analysis database; creating rules that determine how and whether anomalies are detected, how a detected anomaly is treated and characterized, and what reaction to employ upon detection of the anomaly; receiving a plurality of additional heterogeneous data streams from the at least two of a group consisting of the agents, audit programs, and sensors; applying, using an analysis engine, the model of normalcy and rules to the additional heterogeneous data streams and analyzing data from the additional heterogeneous data streams against the model of normalcy and rules; detecting an anomaly in real-time by determining whether an anomalous event is present, by the application of the rules and whether events, in relation to other events within the additional heterogeneous data streams, fit or do not fit the model of normalcy; determining at least one characteristic of the detected anomaly; and issuing an alert upon detection of the anomaly, wherein a type of alert is determined based on the at least one determined characteristic of the detected anomaly. The detected anomaly is indicative of unauthorized manipulation or falsification of data, sabotage of a database, or exfiltration of data.
  • In an embodiment, the correlating is performed using a complex event processor for correlating the heterogeneous data streams and integrating the heterogeneous data streams into a single integrated data stream. The correlating may include synchronizing the time of each of the heterogeneous data streams in order to correlate events across time. The correlating may be performed by application of a small-space algorithm (SSA). In the applying step, the analysis engine may use the complex event processor for applying the model of normalcy and rules to the additional heterogeneous data streams and for analyzing data from the additional heterogeneous data streams against the model of normalcy and rules.
  • In an embodiment, the heterogeneous data streams and/or the additional heterogeneous data streams may be processed using an automatic event tabulator and correlator (AETAC) algorithm to reduce event data complexity and facilitate search, retrieval, and correlation of the event data to thereby produce uniform event data whereby complex event processing of the heterogeneous data streams and/or the additional heterogeneous data streams by the complex event processor is simplified.
  • In an embodiment, the identified pattern of events may be associated with ordinary, authorized, and benign database usage, workstation usage, user behavior or application usage.
  • In an embodiment, the heterogeneous data streams may be multi-modal asynchronous signals.
  • In an embodiment, one of the heterogeneous data streams corresponds to an ordinary, authorized, and benign database query and another of the heterogeneous data streams corresponds to an ordinary, authorized, and benign user interaction at the user workstation, and wherein the identified pattern of events is the ordinary, authorized, and benign database query and the ordinary, authorized, and benign user interaction at the user workstation. The detecting step may detect the anomaly when an event within the additional heterogeneous data streams corresponding to a database query is not preceded by a user interaction at the user workstation within a predetermined period of time resulting in the event not fitting the model of normalcy.
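  • The detecting step of this embodiment can be illustrated with a short sketch that reports each database query not preceded by a workstation interaction within a predetermined period. The function name, the use of plain numeric timestamps, and the sample values are assumptions for illustration; a deployed system would presumably also match events by user and machine.

```python
import bisect


def unpreceded_queries(query_times, interaction_times, window):
    """Return timestamps of queries with no interaction in [t - window, t]."""
    interactions = sorted(interaction_times)
    anomalies = []
    for t in query_times:
        # Index of the first interaction strictly after t; the one before
        # it (if any) is the latest interaction at or before the query.
        i = bisect.bisect_right(interactions, t)
        if i == 0 or t - interactions[i - 1] > window:
            anomalies.append(t)
    return anomalies


queries = [10.0, 50.0, 100.0]
interactions = [8.0, 95.0]
flagged = unpreceded_queries(queries, interactions, window=5.0)
```

Here the query at time 50.0 is flagged because the most recent workstation interaction occurred 42 time units earlier, which does not fit the model of normalcy.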
  • In an embodiment, the alert may comprise at least one of a group consisting of alarm message, a communication triggering further analysis and/or action, a command instructing the restriction or shutting down of an affected workstation, database, network or network access, initiation of additional targeted monitoring, analysis, and/or applications to capture additional detailed information regarding an attack, continued monitoring of a user, placement of a flag in a file for further follow-up, restricting access to a network, alerting security, and restricting or locking down a building or a portion of a building.
  • In an embodiment, in the receiving a plurality of heterogeneous data streams step, the heterogeneous data streams may be received by the analysis engine.
  • In an embodiment, the databases, applications, workstations, or networks may be in an enterprise environment.
  • Embodiments are also directed to a system for real-time detection of anomalies. The system includes one or more computers, each computer including a processor and memory, wherein the memory includes instructions that are executed by the processor for performing the above-mentioned method.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The detailed description will refer to the following drawings, wherein like numerals refer to like elements, and wherein:
  • FIG. 1 is a block diagram illustrating an embodiment of a system for real-time detection of anomalies in database or application usage.
  • FIG. 2 is a block diagram illustrating an example of multiple data stream event correlation and anomaly detection.
  • FIGS. 3A and 3B are flowcharts illustrating an embodiment of a method for real-time detection of anomalies in database or application usage.
  • FIG. 4 is a block diagram illustrating exemplary hardware for implementing an embodiment of a system and method for real-time detection of anomalies in database or application usage.
  • FIG. 5 is a graph illustrating space compression (SC) versus epsilon.
  • FIG. 6 is a graph illustrating false negative rate (FNR) versus epsilon.
  • DETAILED DESCRIPTION
  • It is to be understood that the figures and descriptions of the present invention may have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, other elements found in a typical database system or typical method of using a database. Those of ordinary skill in the art will recognize that other elements may be desirable and/or required in order to implement the present invention. However, because such elements are well known in the art, and because they do not facilitate a better understanding of the present invention, a discussion of such elements is not provided herein. It is also to be understood that the drawings included herewith only provide diagrammatic representations of the presently preferred structures of the present invention and that structures falling within the scope of the present invention may include structures different than those shown in the drawings. Reference will now be made to the drawings wherein like structures are provided with like reference designations.
  • Described herein are embodiments of a system and method for real-time detection of anomalies in database or application usage. Embodiments overcome the problems described above. Embodiments address the above problems and improve the state of the art by providing an automated mechanism to continuously monitor ALL users and systems for abnormalities, and automatically alerting on violations and deviations from expected behaviors as they are occurring in real time without incurring heavy overhead expenses in human time and labor. Embodiments provide a system and method that takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across these data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream.
  • Embodiments provide a mechanism to detect anomalies in database access and usage, such as data exfiltration attempts, first by identifying correlations (e.g., patterns or models of normalcy) in events across different relevant yet heterogeneous data streams (such as those associated with ordinary, authorized and benign database usage, workstation usage, user behavior or application usage) and second by identifying deviations from these patterns or models of normalcy across data streams in real time as data is being accessed.
  • Embodiments identify and alert in real-time (i.e., as events are occurring, not after the fact) insider threat attacks targeting databases and systems using databases, such as file storage and sharing systems. Such threats may involve unauthorized manipulation and falsification of data, sabotage of databases, and exfiltration of data. Data exfiltration refers to users (or agents and/or software systems acting on behalf of users, possibly unknown to the user) illicitly accessing, retrieving, and downloading data that is confidential and proprietary to an enterprise, often with the malicious intent of distributing the data outside the enterprise for personal gain or simply detriment to the enterprise.
  • Embodiments of a system and method for real-time detection of anomalies in database or application usage may include a mechanism and processes that provide and analyze, in real-time, a variety of heterogeneous streams within the enterprise comprising an amalgam of relevant events pertaining to data access, user behavior, and computer and network activity. Taken together, these event streams can identify anomalous system behavior that is indicative of insider threats and data exfiltration. Embodiments can identify anomalous system behavior that cannot be identified from analyzing any single event stream on its own.
  • Such relevant enterprise event streams that may be monitored and analyzed by embodiments described herein include, but are not limited to:
  • Data Access
      • File-system access
      • Meta-data concerning Database I/O
      • Content of database queries (such as SQL queries) and responses
  • User Behavior
      • User logon/logoff events
      • User keyboard/mouse events, including system commands and application interaction
      • Email activity (SMTP)
      • Web activity
  • Computer Activity
      • Process activity
      • Local I/O activity (e.g., USB ports)
  • Network Activity
      • packet capture (PCAP) data from user machine
      • PCAP data from servers
  • Using complex event processing (CEP) and unsupervised or semi-supervised machine learning techniques, embodiments of the system and method develop models of normalcy correlating the events derived from the selected enterprise streams (e.g., the above-identified streams) based on typical (authorized and benign) behavior of the users, computers, databases, applications, and networks. These models are then used by embodiments of the system and method to identify anomalies in the event streams that may be predictive or indicative of insider threat attacks, including unauthorized data manipulation, falsification, sabotage and data exfiltration.
  • A key inventive feature of embodiments described herein is the application of automated machine learning concepts for correlating events and detecting anomalies across heterogeneous data streams to the specific data streams relating to database, user, application, computer, and network activity in order to detect insider threats and data exfiltration attempts taking all the available information into account while it is happening in real time. Monitoring of any individual stream alone will not detect all anomalous events. Existing technologies do not provide these advantages. For example:
  • 1. Traditional insider threat detection systems such as Raytheon's InnerView and SureView™ record and store user and computer activity for off-line static analysis;
  • 2. Database monitoring: Stand-alone systems such as IBM® InfoSphere Guardium monitor and analyze database activity. However, such systems do not take other user, application, or network activity into account.
  • 3. User monitoring: Stand-alone systems such as Centrify® DirectAudit monitor and analyze user activity in real time on a workstation or desktop. However, such systems do not take other application, database, or network activity into account.
  • 4. Network activity: Stand-alone systems such as SNORT monitor and analyze network activity in an enterprise network. Such systems do not take user activity, application, or database activity into account.
  • No system takes user, database, application, and network activity all into account at the same time while using event processing techniques to discover patterns of behavior and anomalies from across these data streams in real-time in order to detect anomalies that could not have been detected by monitoring any single data stream.
  • With reference now to FIG. 1, shown is a block diagram illustrating components of an embodiment of a system 100 for real-time detection of anomalies in database or application usage. System 100 may include an analysis engine 102 and an analysis database 104. Analysis engine 102 may receive data streams, such as the data streams described above. For example, analysis engine 102 may receive data streams indicative of user, database or other data access, application, computer and network behavior and activity. Such data streams may be generated and received from various agents, sensors and audit programs located at workstations, in networks or at network access points, data storage (e.g., database) locations, and data processing (e.g., application) locations.
  • For example, system 100 may include monitoring agents (A), such as IBM Guardium agents, located at (operating on) databases 106, agents (A) located at applications (e.g., SharePoint) 107, direct audit programs (DA), such as Centrify® DirectAudit, located at (operating on) user workstations 108, and network sensors (NS), such as OpenNMS, located in, or at access points to, a network(s) 110. Each type of agent, program or sensor may produce different types of data streams. For example, IBM® Guardium agents may generate data streams indicative of database interaction on a database server including a timestamp, client machine IP, database user ID, database server IP, and the database query (SQL query). Centrify® DirectAudit may generate a data stream indicative of user interaction on a user machine/workstation including a timestamp, machine user ID and user commands (e.g., as typed into TTY/Shell). The output of some agents, programs and sensors may be modified to work with embodiments described herein. For example, some agents, programs and sensors produce GUI output. Scripts and other mechanisms to extract relevant data and output to, e.g., syslogger, may be used. Analysis engine 102 may include a complex event processor (CEP) that correlates the multiple data streams and integrates such data streams into an integrated data stream. Such correlation may include synchronizing the time of each data stream in order to correlate events across time. The streams of data may include raw, meta and derived data. The CEP platform may ingest multi-modal asynchronous signals from heterogeneous sources. The CEP platform of analysis engine 102 may then apply the models of normalcy to the integrated data stream.
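  • The integration of differently shaped data streams into a single time-synchronized stream may be sketched as follows. This is an assumption-laden illustration, not the output format of any named product: the record field names, the per-source clock offsets, and the functions `normalize` and `integrate` are hypothetical.

```python
def normalize(record, source, clock_offset=0.0):
    """Map a source-specific record to a common shape.

    The timestamp is shifted by the source's clock offset so that events
    from different machines can be compared on one timeline; all other
    fields are kept as source-specific detail.
    """
    fields = dict(record)
    timestamp = fields.pop("timestamp") + clock_offset
    return {"timestamp": timestamp, "source": source, "detail": fields}


def integrate(streams, offsets):
    """Combine named streams into one list ordered by corrected timestamp."""
    events = [
        normalize(record, name, offsets.get(name, 0.0))
        for name, records in streams.items()
        for record in records
    ]
    return sorted(events, key=lambda event: event["timestamp"])


streams = {
    "database": [
        {"timestamp": 100.0, "db_user": "alice", "query": "SELECT 1"},
    ],
    "workstation": [
        {"timestamp": 97.0, "user": "alice", "command": "psql"},
    ],
}
# Assume the workstation clock runs 2 seconds behind the database server.
integrated = integrate(streams, offsets={"workstation": 2.0})
```

After correction, the workstation command precedes the database query by one second, which is the kind of cross-stream ordering the models of normalcy rely on.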
  • With continuing reference to FIG. 1, analysis engine 102 may receive and process such data streams to (a) determine models or patterns of normalcy and (b) to analyze real-time behavior and activity against such models or patterns of normalcy in order to detect anomalies. In analyzing real-time behavior and activity, analysis engine 102 detects events in the data streams, compares the events, or more particularly, the patterns of events in the data streams, against the models of normalcy per rules that are based on such models and designed to enable the analysis engine 102 to determine when a variance from the model of normalcy is indicative of an anomaly, and, when an anomaly is detected as a result of such comparison and application of such rules, issues an alert. An alert may include an alarm message(s), communications to relevant personal triggering further analysis and/or action, commands instructing the shutting down of the affected workstation, database, network or network access, or starting of more targeted monitoring and analysis systems, applications and/or other efforts to capture more detailed information regarding an attack. Results of detection analysis may be stored in analysis database 104.
  • To build the models of normalcy, analysis engine 102 receives data streams that result from controlled, known typical (authorized and benign) behavior of the users, computers, databases, applications, and networks, analyzes such data streams to determine the pattern of events resulting from such typical behavior and builds a model of the patterns of events occurring during such typical behavior. Analysis engine 102 may generate the aforementioned rules based on the models of normalcy built from such patterns. Ordinarily, the greater the amount of such typical behavior that is analyzed and used to build models of normalcy, the larger number of typical patterns of events may be recognized and incorporated into the models. Likewise, greater refinement in the models may be achieved, for example by building models for subsets of typical behavior—e.g., typical behavior under normal operating conditions, typical behavior under emergency operating conditions, typical behavior under off-hours operating conditions, etc. In other words, embodiments of system 100 may build different models of normalcy that are applicable to different operating conditions and which are applied by analysis engine 102 according to the prevalent operating condition. The models of normalcy and rules may be stored in analysis database 104.
  • To summarize, embodiments may identify correlations or patterns of behavior through a variety of monitoring agents, programs and sensors and then identify anomalies by detecting deviations from the patterns in real-time.
  • Embodiments of system 100 may develop the models of normalcy and rules through machine learning techniques applied by the CEP platform. Such machine learning techniques may be unsupervised or semi-supervised. As system 100 operates, analysis engine 102 may continue to apply machine learning techniques to further update the models and rules.
  • Experiments demonstrate that anomalies often cannot be detected by looking at a single event. Rather, analyzing a pattern or sequence of events and comparing it to other patterns or sequences of events can detect anomalies not detectable by looking at a single event. For example, a database query by itself would not appear to be anomalous. However, when most database queries are preceded by a user interaction (e.g., input into a keyboard), a database query that is not so preceded would likely be anomalous. To illustrate, consider the following scenario:
  • A user makes regular (authorized) queries to a database.
      • Keyboard is used to enter queries
  • An Advanced Persistent Threat (APT) malware is installed on the user's machine.
  • The APT makes an (unauthorized) query to the database.
      • Keyboard is NOT used
        Detecting the threat may include the analysis engine 102 recognizing that
  • Pattern: Database query is preceded by user interaction
  • Anomaly: Database query is NOT preceded by user interaction
  • Such pattern recognition often requires a correlation between the data streams received from different agents, programs and sensors. With reference now to FIG. 2, shown is an illustration of data streams received from a user X workstation audit program, e.g., Centrify® DirectAudit, and a database monitoring agent, e.g., IBM® Guardium. As shown, the audit program stream shows the keyboard interactions of user X. The database monitoring agent stream may include numerous data queries, some from the user X workstation and others from other workstations, etc. The CEP platform of analysis engine 102 may correlate the keyboard interactions from the user X audit data stream with the data queries shown by the agent data stream. In so doing, analysis engine 102 may determine that one of the data queries from user X is not correlated with/preceded by a keyboard interaction on the user X machine. If the applicable model of normalcy indicates that typical data queries are always preceded by/correlated with a keyboard interaction, analysis engine 102 may characterize the uncorrelated data query as an anomaly (and, therefore, issue an alert).
  • The correlation between events in different data streams may be quite granular. For example, a model or pattern of normalcy may dictate that a certain event from one data stream is always preceded within, e.g., five (5) seconds, by a certain event from a second data stream. Likewise, the model of normalcy may dictate that the event from one data stream is always preceded by one or more of a variety of events from a second data stream. Embodiments may apply a time window to detect correlations. Embodiments may increase or decrease the time window used in order to increase or decrease the potential number of correlations.
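  • The time-window correlation described above can be illustrated with a minimal sketch. It assumes both streams have already been reduced to harmonized timestamps in seconds; the function name `uncorrelated_queries` and the sample values are hypothetical and are not part of the described embodiments.

```python
from bisect import bisect_left, bisect_right


def uncorrelated_queries(keyboard_times, query_times, window_seconds=5.0):
    """Return query timestamps that are NOT preceded by a keyboard event
    within `window_seconds` (the pattern of normalcy assumed here)."""
    keys = sorted(keyboard_times)
    anomalies = []
    for q in query_times:
        # Keyboard events with timestamp in [q - window, q] count as "preceding".
        lo = bisect_left(keys, q - window_seconds)
        hi = bisect_right(keys, q)
        if hi == lo:  # no keyboard event falls inside the window
            anomalies.append(q)
    return anomalies


# Hypothetical timestamps (seconds) for one user's workstation and database.
keyboard = [10.0, 42.5, 90.0]
queries = [12.0, 44.0, 70.0, 93.0]

print(uncorrelated_queries(keyboard, queries))  # [70.0]
```

  • Widening or narrowing `window_seconds` changes how many queries count as correlated, which mirrors the adjustable time window mentioned above.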
  • Embodiments of analysis engine 102 may apply a small-space algorithm (SSA) to process the data streams and correlate events. SSA is a new form of stream processing over distributed massive streams. SSA estimates frequently occurring items on a logarithmic space scale (tractable) and permits online extraction of persistent objects in a streaming network. SSA was developed by Prof. Srikanta Tirthapura at Iowa State University. SSA identifies persistent events in data streams. For the purposes of this description, an event is time-stamped data. A persistent event is time-stamped data that appears regularly over time. Characteristics of data streams typically require that all algorithms operate on data in a single pass. Events may be sparse, occur only infrequently, or even appear in different distributed streams. Embodiments may use statistical data sampling to reduce the size of a stream without overlooking persistent events. SSA determines associations between events in the data stream. Such associations may be temporal, spatial or generalized associations over other metrics. SSA makes use of “Frequency Moments” that estimate the total number of objects in a stream without having to search the entire stream. Importantly, SSA can learn the persistent events in a data stream without any prior knowledge and without having to track all of the events in the data stream. Implementations of SSA perform association rule mining to exploit prior and collateral domain knowledge to increase the selectivity of event persistence detection. This decreases false negative errors and increases the ability to detect more transient events.
  • Embodiments identify anomalous behavior by finding associations (or correlations) between events occurring in different (distributed, heterogeneous) data streams. Embodiments identify “patterns of normalcy” and monitor events for disruption from these patterns. Data for making the predictions may come from sensor networks (e.g., agents, audit programs, network and other sensors) generating heterogeneous streams of observations (‘event’ loosely defined here as ‘time-stamped data’). A challenge is to detect and recognize, from sensor samples, precursor events for “hidden” spatiotemporal processes.
  • The following describes metrics of SSA that are applicable to embodiments of the system and method for real-time detection of anomalies in database or application usage described herein. Different algorithms identify those events in a stream occurring with a given frequency value α (alpha). A naive algorithm identifies all α-persistent events given unlimited resources (memory). The SSA algorithm identifies most of the persistent events at an accuracy rate determined by a given ε value (epsilon). Space compression and false negative rate are directly proportional to the chosen epsilon value as shown below and in FIGS. 5 and 6.
  • A “naïve” algorithm is a baseline or obvious way to perform a task, contrasted with the present embodiment's algorithm, which improves, optimizes, or otherwise enhances the naive algorithm. The naive algorithm for SSA is merely to sample and count every packet in the stream. But these streams tend to be too large to be exhaustively sampled on current computing platforms, so SSA proposes a method involving subsampling to estimate the total counts, optimizing storage space at the expense of accuracy. The epsilon parameter is used to tune SSA in order to achieve the optimal trade-off between accuracy and storage space.
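  • The contrast between exhaustive counting and subsampling can be sketched as follows. This is an illustrative simplification and not the SSA itself: it assumes the stream is a sequence of (timeslot, item) pairs ordered by timeslot, counts one in-memory tuple per tracked item, and uses a hypothetical `sample_prob` parameter in place of the tuning that ε provides. With `sample_prob=1.0` it behaves as the naive algorithm.

```python
import hashlib


def _unit_hash(item, slot):
    """Deterministically map (item, slot) to a float in [0, 1)."""
    digest = hashlib.sha256(f"{item}|{slot}".encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def persistent_items(stream, num_slots, alpha, sample_prob=1.0):
    """Report items seen in at least alpha * num_slots timeslots.

    stream: (slot, item) pairs ordered by slot.
    Returns (set of reported items, number of tuples held in memory).
    An untracked item only starts being tracked when its hash for the
    current slot falls below sample_prob, so rare items are often skipped.
    """
    tracked = {}  # item -> [distinct slots counted since tracking began, last slot]
    for slot, item in stream:
        entry = tracked.get(item)
        if entry is None:
            if _unit_hash(item, slot) < sample_prob:
                tracked[item] = [1, slot]
        elif entry[1] != slot:
            entry[0] += 1
            entry[1] = slot
    reported = {i for i, (count, _) in tracked.items() if count >= alpha * num_slots}
    return reported, len(tracked)


# Hypothetical stream over 10 timeslots.
stream = [(s, "db_query") for s in range(10)] + [(0, "login"), (5, "login"), (3, "scan")]
stream.sort(key=lambda e: e[0])

naive, naive_tuples = persistent_items(stream, num_slots=10, alpha=0.2)
print(sorted(naive), naive_tuples)  # ['db_query', 'login'] 3

sampled, sampled_tuples = persistent_items(stream, num_slots=10, alpha=0.2, sample_prob=0.3)
print(sorted(sampled), sampled_tuples)
```

  • Because the sampled variant can only undercount, it never reports an item that the naive pass would not report, but it may miss some. That is the storage-versus-accuracy trade-off described above.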
  • Space Compression (SC):
  • $$\mathrm{SC} = \frac{\text{\# of tuples created by naïve algorithm}}{\text{\# of tuples created by SSA}}$$
  • False Negative Rate (FNR):
  • $$\mathrm{FNR} = \frac{\text{\# of } \alpha\text{-persistent objects reported by naïve algorithm that were not reported by SSA}}{\text{\# of } \alpha\text{-persistent objects reported by naïve algorithm}}$$
  • The ε value (epsilon) controls the trade-off between these two quantities:
      • High ε yields high compression, but also high FNR
      • Low ε yields low FNR, but also low compression
  • Caveats: Sensitive to the distribution of α-persistent items in the stream
  • α-persistent: α is the percentage of monitored timeslots in which an object occurs
  • FIGS. 5 and 6 graphically illustrate space compression (SC) and false negative rate (FNR), respectively, versus epsilon. These graphs show that SC and FNR increase roughly linearly with respect to epsilon. The graphs assume α=0.2 with insider threat events occurring in 18-minute intervals.
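  • As a worked illustration of the two metrics, the following sketch computes SC and FNR from the outputs of a naive pass and an SSA pass. The function names and the sample numbers are hypothetical and are not results taken from FIGS. 5 and 6.

```python
def space_compression(naive_tuples, ssa_tuples):
    """SC = tuples created by the naive algorithm / tuples created by SSA."""
    if ssa_tuples <= 0:
        raise ValueError("SSA must create at least one tuple")
    return naive_tuples / ssa_tuples


def false_negative_rate(naive_reported, ssa_reported):
    """FNR = fraction of naive-reported alpha-persistent objects that SSA missed."""
    if not naive_reported:
        raise ValueError("naive algorithm reported no persistent objects")
    missed = set(naive_reported) - set(ssa_reported)
    return len(missed) / len(naive_reported)


# Hypothetical run: the naive pass stores 1000 tuples, SSA stores 250.
print(space_compression(1000, 250))  # 4.0

# Hypothetical run: the naive pass reports four objects, SSA reports three of them.
print(false_negative_rate({"a", "b", "c", "d"}, {"a", "b", "c"}))  # 0.25
```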
  • In certain implementations that utilize heterogeneous data streams, correlation of events received from a plurality of various heterogeneous data streams often requires certain processing, modification and manipulation of the raw stream data. This processing, etc., enables embodiments of the system and method for real-time detection of anomalies in database or application usage to correlate heterogeneous data streams, detect events, correlate events across data streams and otherwise perform real-time detection.
  • An embodiment of a system and method for processing of heterogeneous data streams may be referred to as an automatic event tabulator and correlator (AETAC). Embodiments of AETAC include an algorithm that automatically tabulates and correlates event data collected by sensors and other automated data collection devices. Embodiments of AETAC can process events of all types uniformly, even if the data definitions for each device are different (i.e., “heterogeneous”). Embodiments of AETAC operate by imposing a mathematic structure (“homomorphism”) on each event type that makes all events look the same to the tabulating device and then further imposing a requirement that this structure be preserved through successive processing steps (“closure”), such that all outputs from the tabulator have the same mathematical structure as the input events. This uniformity reduces data complexity and facilitates searches, retrievals and correlation of event data at any stage of processing, such that complex event processing (CEP) may be reduced to abstractions equivalent to evaluating simple mathematical expressions.
  • A purpose of AETAC is to improve the performance of complex event processing (CEP) systems that monitor large networks of sensors or other kinds of data collection devices. Embodiments of AETAC reduce or eliminate the need to write customized code for each device that collects data. Embodiments achieve this simplification by imposing several mild restrictions on the allowable formats of the data, which do not impede the functioning of the collection devices. The resulting uniformity makes it easier to compare events across space and time and, consequently, increases the overall “situational awareness” of networks organized under AETAC principles.
  • AETAC reduces complexity by imposing a simple structure on events that requires the data conform to these mild restrictions:
  • 1) The events recognized by the sensor can be described as lists of names and values;
  • 2) Each generated list is discrete and unique, in the sense that it bears a unique identifier and timestamp. (This is easily accomplished by using one-up sequences or hash codes);
  • 3) The lists can be decomposed into deterministic types, such that the composition of each list is fixed for each type. Note that this requirement is automatically fulfilled for most sensors, which use standardized packet protocols for defining data fields.
  • These restrictions establish an algebraic structure (homomorphic mapping with closure property) such that ID, TIMESTAMP, TYPE can serve as an intrinsic structure for any event. Additional fields generated are aggregated in such a manner that they form a DETAIL object, indexed by the ID. These DETAIL objects do not have to be uniquely defined, but are guaranteed to be a surjective mapping (one to many) or injective mapping (one to one) because the ID's are uniquely defined. DETAIL fields may be optional (equivalent to endomorphic mapping) if the ID, TIMESTAMP, TYPE fields suffice to define an event completely.
  • The closure property simply means that all transformations of events or results obtained through processing must also conform to the above restrictions. This guarantees that all operators used on the input events can also be applied recursively to outputs of transformations and other processing results.
  • For example, a classification of an event would generate a “classification event”→(ID, TIMESTAMP, CLASSIFICATION EVENT) with a DETAIL record containing the results of the classification.
  • Once events in a data stream have been defined, conforming to the above restrictions, it is then possible to process events arriving through multiple, heterogeneous channels using a basic AETAC algorithm. (Note: in embodiments, events must be ordered by timestamp such that each arrival channel is an ordered time-series):
      • Step 1: Select earliest event from available channels using a harmonized timestamp field;
      • Step 2: Assign a unique ID (to be used as unique key to DETAIL object, if any, step 4);
      • Step 3: Assign a TYPE to event tuple, ensuring that event tuple is fixed with respect to TYPE;
      • Step 4: Aggregate any remaining fields in DETAIL object, indexed by ID (step 2); and
      • Step 5: Repeat steps 1 through 4 until all events are defined.
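  • The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AETAC implementation: each channel is assumed to deliver dictionaries already sorted by a harmonized `timestamp` field in seconds, the channel name is used as the TYPE, and the names `Event`, `tabulate` and the sample records are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Intrinsic structure shared by every event: ID, TIMESTAMP, TYPE."""
    id: int
    timestamp: float
    type: str


def _tag(type_name, records):
    """Attach the channel's TYPE to each raw record."""
    for rec in records:
        yield rec["timestamp"], type_name, rec


def tabulate(channels):
    """channels: dict of TYPE name -> list of raw records sorted by 'timestamp'.
    Returns (events, details), where details is indexed by event ID."""
    ids = itertools.count(1)  # one-up sequence for unique IDs
    schema = {}               # TYPE -> fixed set of field names
    events, details = [], {}
    # Step 1: always take the earliest event across all channels.
    merged = heapq.merge(*(_tag(t, r) for t, r in channels.items()),
                         key=lambda x: x[0])
    for timestamp, type_name, rec in merged:
        event_id = next(ids)                              # Step 2
        fields = frozenset(rec)
        if schema.setdefault(type_name, fields) != fields:  # Step 3
            raise ValueError(f"record layout changed for type {type_name}")
        events.append(Event(event_id, timestamp, type_name))
        rest = {k: v for k, v in rec.items() if k != "timestamp"}
        if rest:
            details[event_id] = rest                      # Step 4
    return events, details                                # Step 5: loop until done


# Hypothetical channels from a workstation audit program and a database agent.
channels = {
    "KEYBOARD": [
        {"timestamp": 1.0, "user": "x", "command": "psql"},
        {"timestamp": 4.0, "user": "x", "command": "ls"},
    ],
    "DB_QUERY": [
        {"timestamp": 2.0, "db_user": "x", "query": "SELECT 1"},
    ],
}

events, details = tabulate(channels)
for e in events:
    print(e, details.get(e.id))
```

  • Because every output has the same (ID, TIMESTAMP, TYPE) shape plus an optional DETAIL record, a derived result such as a classification could be appended as another `Event` and fed through the same code, which is the closure property described above.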
  • Complex event processing (CEP) generally processes many kinds of events from a variety of sensors and collection devices. Each device poses an integration problem that involves writing software to accept, validate and interpret the input data and to determine what actions need to be taken for each sensor.
  • Embodiments of AETAC simplify this process by providing a shared mathematical structure that standardizes the creation of event handlers and optimizes the reuse of code, allowing many different kinds of devices to use exactly the same code to process their data.
  • Furthermore, due to the mathematical closure property of embodiments of AETAC, the results and outputs of processing will also share this same structure and so can allow each layer of the system to automatically feed to the next layer recursively.
  • An innovative concept of embodiments of AETAC is the application of algebraic structure on the inputs and outputs of event processing, which allows AETAC to be an automatic ‘tabulator’ device for events, which can be easily correlated with other events because of the shared structure and closure properties.
  • Existing systems tend to become specialized in the symbols and formats used to characterize each system. As such, existing systems are dependent on specialized features, which may not be uniformly or consistently applied.
  • An analogous example of this kind of specialization is Roman Numerals, where for historical reasons, natural numbers can be represented and manipulated mathematically as sequential combinations of these seven symbols: I,V,X,L,C,D,M. The laws for combining these symbols are not applied consistently. For example, the immediate successor to any number is usually obtained by concatenating the symbol “I” to the number. So the successor of “I” is “II,” and the successor of “V” is “VI”. The successors for “III” and “VIII,” however, are computed by prefixing “I” to the next larger symbol, namely “IV” and “IX,” respectively. Roman Numerals are not complete. There are no Roman symbols for representing zero or negative numbers. Consequently, there is no simple or automatic way to tabulate a collection of Roman Numerals into another collection of Roman Numerals without a lot of complicated rules and processing.
  • Modern Arabic numerals, on the other hand, are written using combinations of the ten symbols 0,1,2,3,4,5,6,7,8,9. The successor to any number is simply and consistently determined by addition tables which are applied to any range of numbers using simple rules of arithmetic. Further, the Arabic numeral system is complete in the sense that any real number can be expressed, including zero and negative numbers. All outputs of calculations can be used as inputs to succeeding calculations in a completely ‘mechanical’ fashion.
  • In this same sense, embodiments of AETAC impose an algebraic structure on event processing that makes the processing independent of the type of event being processed, and ensures consistent and uniform processing of events, where outputs are new kinds of events requiring further processing.
  • With reference now to FIGS. 3A-3B, shown are flowcharts illustrating an embodiment of a method 300 for real-time detection of anomalies in database or application usage. FIG. 3A illustrates the portion of method 300 that builds the model of normalcy and rules as described above. FIG. 3B illustrates the portion of method 300 that applies the model of normalcy and rules to detect anomalies. Method 300 may be repeated to continuously update the model of normalcy and rules through, e.g., machine learning techniques, as described above. Likewise, method 300 may be performed continuously as data streams are received to continuously detect anomalies in real-time. Method 300 may implement SSA as described to build the model of normalcy and rules and to detect anomalies.
  • With continuing reference to FIG. 3A, method 300 receives a plurality of data streams, block 302. The received data streams may be heterogeneous data streams received from a plurality of agents, programs, and/or sensors, etc. The data streams are correlated, block 304. The correlation 304 may include processing, e.g., with an embodiment of AETAC. The correlation 304 identifies events in the various data streams. Method 300 identifies patterns of events across the various data streams, block 306. The patterns may provide indications of relations between events in different data streams under typical operating conditions. Method 300 builds/creates a model or pattern of normalcy from the identified patterns of events, block 308. Utilizing the model of normalcy, method 300 may build/create rules, block 310, that determine how and whether anomalies are detected, how method 300 treats, characterizes and reacts to a detected anomaly, etc. For example, an event that may be characterized as an anomaly when occurring off-hours may not be an anomaly or an anomaly worth issuing an alert if occurring during normal business hours. Method 300 may repeat 302-310, block 312, over time using machine learning techniques to continue to build and update 308 the model of normalcy and build and update 310 the rules.
  • With reference again to FIG. 3B, method 300 may apply the model of normalcy and rules to operational behavior to detect anomalies. Method 300 receives a plurality of data streams, block 314. The received data streams, typically heterogeneous from a plurality of agents, programs, and/or sensors, etc., are processed (e.g., using an embodiment of AETAC) and the data from the data streams is analyzed against the model of normalcy and the rules, block 316. Based on the rules and how events, in relation to other events, fit or do not fit the model of normalcy, method 300 determines whether events are anomalous, thereby detecting anomalies, block 318. If an anomaly is detected 318, method 300 may determine the characteristics of the anomalous event(s), block 320. For example, an anomaly may be a user accessing a secured server after hours. If the user does access a secured server from time to time after hours, such an anomaly may not trigger an alert. If, however, the user is accessing the server from his office after hours but the user did not “badge in” (i.e., user's employee badge was not read by security at entrance to the building), then the anomaly would trigger an alert. Consequently, the rules and the characteristics of an anomaly may determine whether to issue an alert, block 322. For example, an anomaly indicating improper access to a server may only trigger continued monitoring of the user. The characteristics of an anomaly may also determine what type of alert to issue. Some anomalies may require a simple flag in a file for further follow-up by a human agent monitoring anomalies. Other anomalies may require immediate action, such as restricting or shutting off access to a network, locking down a building or portion of a building, alerting security, etc. If method 300 determines to issue an alert and the type of alert is determined, the alert is issued, block 324. Method 300 may continue to repeat 314 to 324, block 326, so long as systems, etc., are being monitored.
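  • The way rules and anomaly characteristics may jointly decide whether and how to alert (blocks 320-324) can be sketched as a small rule function. All field names, rule conditions and alert labels below are hypothetical illustrations of the badge-in example and are not rules defined by the embodiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anomaly:
    """Characteristics of a detected anomaly (hypothetical fields)."""
    user: str
    after_hours: bool
    from_office: bool
    badged_in: bool
    user_sometimes_works_late: bool


def decide_alert(a: Anomaly) -> Optional[str]:
    """Return an alert type, or None when no alert should be issued."""
    # Office access after hours without a badge read: act immediately.
    if a.after_hours and a.from_office and not a.badged_in:
        return "immediate"
    # After-hours access by a user who does this from time to time: no alert.
    if a.after_hours and a.user_sometimes_works_late:
        return None
    # Anything else is flagged for follow-up by a human agent.
    return "flag_for_review"


late_worker = Anomaly("x", after_hours=True, from_office=True,
                      badged_in=True, user_sometimes_works_late=True)
no_badge = Anomaly("y", after_hours=True, from_office=True,
                   badged_in=False, user_sometimes_works_late=True)

print(decide_alert(late_worker))  # None
print(decide_alert(no_badge))     # immediate
```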
  • With reference now to FIG. 4, shown is a block diagram of exemplary hardware that may be used to provide system 100 and perform method 300 for real-time detection of anomalies in database or application usage. Exemplary hardware implementation of system 100 may include multiple computing devices 400 (e.g., computing system N). Computing devices 400 may be, e.g., blade servers or other stack servers. For example, each component shown in system 100 may be implemented as software running on one or more computing devices 400. Alternatively, components and functionality of each may be combined and implemented as software running on a single computing device 400. Furthermore, steps of method 300 may be implemented as software modules executed on one or more computing devices 400.
  • Computing device 400 may include a memory 402, a secondary storage device 404, a processor 406, and a network connection 408. Computing device 400 may be connected to a display device 410 (e.g., a terminal connected to multiple computing devices 400) and an output device 412. Memory 402 may include RAM or similar types of memory, and it may store one or more applications (e.g., software for performing functions or including software modules described herein) for execution by processor 406. Secondary storage device 404 may include a hard disk drive, DVD-ROM drive, or other types of non-volatile data storage. Processor 406 executes the applications, which are stored in memory 402 or secondary storage 404, or received from the Internet or other network 414. Network connection 408 may include any device connecting computing device 400 to a network 414 and through which information is received and through which information (e.g., analysis results) is transmitted to other computing devices. Network connection 408 may include a network connection providing connection to an internal enterprise network, a network connection providing connection to the Internet, or another similar connection. Network connection 408 may also include bus connections providing connections to other computing devices 400 in system 100 (e.g., other servers in a server stack).
  • Display device 410 may include any type of device for presenting visual information such as, for example, a computer monitor or flat-screen display. Output device 412 may include any type of device for presenting a hard copy of information, such as a printer, and other types of output devices include speakers or any device for providing information in audio form. Computing device 400 may also include an input device, such as a keyboard or mouse, permitting direct input into computing device 400.
  • Computing device 400 may store a database structure in secondary storage 404, for example, for storing and maintaining information needed or used by the software stored on computing device 400. Also, processor 406 may execute one or more software applications in order to provide the functions described in this specification, specifically in the methods described above, and the processing may be implemented in software, such as software modules, for execution by computers or other machines. The processing may provide and support web pages and other user interfaces.
  • Although computing device 400 is depicted with various components, one skilled in the art will appreciate that the servers can contain additional or different components. In addition, although aspects of an implementation consistent with the above are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on or read from other types of computer program products or computer-readable media. The computer-readable media may include instructions for controlling a computer system, such as computing device 400, to perform a particular method, such as method 300.
  • More generally, even though the present disclosure and exemplary embodiments are described above with reference to the examples according to the accompanying drawings, it is to be understood that they are not restricted thereto. Rather, it is apparent to those skilled in the art that the disclosed embodiments can be modified in many ways without departing from the scope of the disclosure herein. Moreover, the terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the disclosure as defined in the following claims, and their equivalents, in which all terms are to be understood in their broadest possible sense unless otherwise indicated.

Claims (20)

What is claimed is:
1. A method for real-time detection of anomalies occurring in an enterprise computer network, comprising:
receiving a plurality of heterogeneous data streams from sources in the network, the sources including two levels, first level sources and second level sources,
wherein the first level sources include one or more selected from a group consisting of agents located at databases, agents located at applications, audit programs located at user workstations, sensors located in the network, and sensors located at access points to the network,
wherein the second level sources include one or more selected from a group consisting of data access, user behavior, computer activity and network activity, and
wherein the first level sources monitor event streams of the second level sources and generate data streams indicative of corresponding second level source activity in a uniform format;
processing the heterogeneous data streams obtained by combining at least two of the first level sources to identify events therein, each event being identified by at least a unique ID, a timestamp, and an event type, wherein the processing of the heterogeneous data streams includes combining at least two of the first level sources into a single data stream;
correlating the processed heterogeneous data streams to form an integrated data stream comprising a plurality of identified events;
detecting the existence and at least one characteristic of an anomaly in the computer network by application of a predetermined model of normalcy and one or more anomaly rules to the integrated data stream comprising the plurality of identified events; and
issuing an alert based on the at least one characteristic of the anomaly.
2. The method of claim 1 further comprising creating the predetermined model of normalcy and the one or more anomaly rules by:
receiving additional data comprising the plurality of heterogeneous data streams, wherein the additional data corresponds to authorized and benign usage of network resources;
processing the heterogeneous data streams to identify events therein, each event being identified by at least a unique ID, a timestamp, and an event type;
correlating the processed data streams to form an integrated data stream comprising a plurality of identified events;
identifying one or more patterns from relations between identified events comprising the integrated data stream; and
creating the model of normalcy and the one or more anomaly rules based on the identified one or more patterns.
3. The method of claim 1 wherein the one or more anomaly rules relate to at least one of how and whether anomalies are detected, how a detected anomaly is treated and characterized, and what reaction to employ in response to the detected anomaly.
4. The method of claim 1 further comprising:
estimating a number or frequency of one or more event types in the processed data stream without searching the entire processed data stream; and
determining one or more temporal, spatial, or generalized associations between a plurality of events in the processed data stream.
5. The method of claim 1 wherein the detected anomaly is indicative of unauthorized manipulation or falsification of data, sabotage of a database, or exfiltration of data.
6. The method of claim 1 wherein the heterogeneous data streams comprise multi-modal asynchronous signals.
7. The method of claim 1 wherein the program code includes an algorithm that detects and extracts persistent events among the plurality of identified events in at least one of the plurality of heterogeneous data streams, and wherein the persistent events are time-stamped data that appear regularly over time.
8. The method of claim 7 wherein the persistent events appear in different distributed streams among the plurality of heterogeneous data streams.
9. The method of claim 7 wherein the at least one of the plurality of heterogeneous data streams is statistically sampled to reduce stream size of the at least one of the plurality of heterogeneous data streams, without overlooking the persistent events.
10. A method for real-time detection of anomalies occurring in an enterprise computer network, comprising:
receiving a plurality of heterogeneous data streams from sources in the network, the sources including two levels, first level sources and second level sources,
wherein the first level sources include one or more selected from a group consisting of agents located at databases; agents located at applications; audit programs located at user workstations; sensors located in the network; and sensors located at access points to the network,
wherein the second level sources include one or more selected from a group consisting of data access, user behavior, computer activity and network activity, and wherein the first level sources monitor event streams of the second level sources and generate data streams indicative of corresponding second level source activity in a uniform format;
processing the heterogeneous data streams obtained by combining at least two of the first level sources to identify events therein, each event being identified by at least a unique ID, a timestamp, and an event type, wherein the processing of the heterogeneous data streams includes:
combining at least two of the first level sources into a single data stream; and
operating on the single data stream using an algorithm that identifies spatiotemporal relationships;
correlating the processed heterogeneous data streams to form an integrated data stream comprising a plurality of identified events;
detecting the existence and at least one characteristic of an anomaly in the computer network by application of a predetermined model of normalcy and one or more anomaly rules to the integrated data stream comprising the plurality of identified events; and
issuing an alert based on the at least one characteristic of the anomaly.
11. The method of claim 10 further comprising creating the predetermined model of normalcy and the one or more anomaly rules by:
receiving additional data comprising the plurality of heterogeneous data streams, wherein the additional data corresponds to authorized and benign usage of network resources;
processing the heterogeneous data streams to identify events therein, each event being identified by at least a unique ID, a timestamp, and an event type;
correlating the processed data streams to form an integrated data stream comprising a plurality of identified events;
identifying one or more patterns from relations between identified events comprising the integrated data stream; and
creating the model of normalcy and the one or more anomaly rules based on the identified one or more patterns.
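The training steps of claim 11 can be pictured with a small sketch. It assumes, purely for illustration, that the "patterns" are counts of consecutive event-type pairs observed in benign traffic, and that an anomaly rule flags any pair seen fewer than `min_count` times. The names `learn_normalcy` and `make_rule` are hypothetical, and the claim does not limit the model to this form.

```python
from collections import Counter


def learn_normalcy(benign_events):
    """Count consecutive event-type pairs in benign data.

    benign_events -- iterable of (event_id, timestamp, event_type) tuples
    """
    ordered = sorted(benign_events, key=lambda event: event[1])
    types = [event[2] for event in ordered]
    return Counter(zip(types, types[1:]))


def make_rule(model, min_count=1):
    """Build an anomaly rule from the learned model (assumed rule form)."""
    def is_anomalous(prev_type, next_type):
        # A Counter returns 0 for pairs never observed during training.
        return model[(prev_type, next_type)] < min_count
    return is_anomalous


benign = [
    ("e1", 1, "login"), ("e2", 2, "query"), ("e3", 3, "logout"),
    ("e4", 4, "login"), ("e5", 5, "query"),
]
rule = make_rule(learn_normalcy(benign))
print(rule("login", "query"))        # pair observed in benign data
print(rule("login", "bulk_export"))  # pair never observed
```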
12. The method of claim 10 wherein the one or more anomaly rules relate to at least one of how and whether anomalies are detected, how a detected anomaly is treated and characterized, and what reaction to employ in response to the detected anomaly.
13. The method of claim 10 further comprising:
estimating a number or frequency of one or more event types in the processed data stream without searching the entire processed data stream; and
determining one or more temporal, spatial, or generalized associations between a plurality of events in the processed data stream.
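One well-known way to estimate event-type counts without rescanning a stream is a Count-Min sketch. The claims do not name a technique, so the sketch below is an assumed stand-in, and the class name and the `width` and `depth` parameters are illustrative. The estimate never undercounts and may overcount when distinct items collide in every row.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table that approximates per-item counts."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one column per row by salting the hash with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum across rows is the least-contaminated counter.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for _ in range(100):
    sketch.add("db_read")
for _ in range(5):
    sketch.add("db_write")
print(sketch.estimate("db_read"), sketch.estimate("db_write"))
```

Memory use depends only on `width` and `depth`, not on the length of the stream, which is the property that makes this kind of structure suitable for real-time use.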
14. The method of claim 10 wherein the detected anomaly is indicative of unauthorized manipulation or falsification of data, sabotage of a database, or exfiltration of data.
15. The method of claim 10 wherein the heterogeneous data streams comprise multi-modal asynchronous signals.
16. The method of claim 10 wherein the program code includes an algorithm that detects and extracts persistent events among the plurality of identified events in at least one of the plurality of heterogeneous data streams, and wherein the persistent events are time-stamped data that appear regularly over time.
17. The method of claim 16 wherein the persistent events appear in different distributed streams among the plurality of heterogeneous data streams.
18. The method of claim 16 wherein the at least one of the plurality of heterogeneous data streams is statistically sampled to reduce stream size of the at least one of the plurality of heterogeneous data streams, without overlooking the persistent events.
19. A method for real-time detection of anomalies occurring in a computer network, comprising:
receiving a plurality of heterogeneous data streams from sources in the network, the sources including first level sources and second level sources, wherein the first level sources include one or more selected from a group consisting of agents located at databases, agents located at applications, audit programs located at user workstations, sensors located in the network, and sensors located at access points to the network; wherein the second level sources include event streams to be analyzed,
wherein the first level sources monitor the event streams of the second level sources and generate data streams indicative of corresponding second level source activity in a uniform format, and
wherein each of the heterogeneous data streams is obtained by combining at least two of the first level sources into a data stream;
processing the heterogeneous data streams to identify events therein, each event being identified by at least a unique ID, a timestamp, and an event type;
correlating the processed heterogeneous data streams to form an integrated data stream comprising a plurality of identified events;
detecting the existence and at least one characteristic of an anomaly in the computer network by application of a predetermined model of normalcy and one or more anomaly rules to the integrated data stream comprising the plurality of identified events; and
issuing an alert based on the at least one characteristic of the anomaly.
20. The method of claim 19 wherein the second level sources include one or more selected from the group consisting of data access, user behavior, computer activity and network activity.
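The receive, process, correlate, detect, and alert steps recited in independent claims 10 and 19 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: each first level source is assumed to emit time-ordered `(timestamp, event_type, source)` records in a uniform format, the model of normalcy is assumed to be a per-type ceiling on events within a sliding window, and the names `Event`, `integrate`, `detect`, and `normal_rates` are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int      # unique ID
    timestamp: float
    event_type: str
    source: str


def integrate(*streams):
    """Merge time-ordered source streams into one stream of identified events."""
    ids = itertools.count(1)
    for ts, etype, src in heapq.merge(*streams, key=lambda rec: rec[0]):
        yield Event(next(ids), ts, etype, src)


def detect(events, normal_rates, window):
    """Yield an alert for each event that violates the assumed normalcy model."""
    recent = defaultdict(deque)  # event_type -> timestamps inside the window
    for ev in events:
        if ev.event_type not in normal_rates:
            yield {"event_id": ev.event_id, "characteristic": "unknown_event_type"}
            continue
        stamps = recent[ev.event_type]
        stamps.append(ev.timestamp)
        while stamps and stamps[0] <= ev.timestamp - window:
            stamps.popleft()  # discard timestamps that fell out of the window
        if len(stamps) > normal_rates[ev.event_type]:
            yield {"event_id": ev.event_id, "characteristic": "rate_exceeded"}


# Hypothetical records from a database agent and a network sensor.
db_agent = [(1.0, "query", "db_agent"), (2.0, "query", "db_agent"),
            (3.0, "query", "db_agent")]
net_sensor = [(1.5, "login", "net_sensor"), (2.5, "bulk_export", "net_sensor")]

for alert in detect(integrate(db_agent, net_sensor),
                    normal_rates={"login": 1, "query": 2}, window=10.0):
    print(alert)
```

With this input the merged stream yields two alerts: event 4 has a type absent from the model, and event 5 is the third query inside a window that allows two.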
US16/562,950 2014-06-09 2019-09-06 System and method for real-time detection of anomalies in database usage Abandoned US20200026594A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/562,950 US20200026594A1 (en) 2014-06-09 2019-09-06 System and method for real-time detection of anomalies in database usage

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201462009736P 2014-06-09 2014-06-09
US14/732,162 US10409665B2 (en) 2014-06-09 2015-06-05 System and method for real-time detection of anomalies in database usage
US16/562,950 US20200026594A1 (en) 2014-06-09 2019-09-06 System and method for real-time detection of anomalies in database usage

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/732,162 Continuation US10409665B2 (en) 2014-06-09 2015-06-05 System and method for real-time detection of anomalies in database usage

Publications (1)

Publication Number Publication Date
US20200026594A1 (en) 2020-01-23

Family

ID=54769650

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/732,162 Active 2035-08-06 US10409665B2 (en) 2014-06-09 2015-06-05 System and method for real-time detection of anomalies in database usage
US16/562,950 Abandoned US20200026594A1 (en) 2014-06-09 2019-09-06 System and method for real-time detection of anomalies in database usage

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/732,162 Active 2035-08-06 US10409665B2 (en) 2014-06-09 2015-06-05 System and method for real-time detection of anomalies in database usage

Country Status (3)

Country Link
US (2) US10409665B2 (en)
EP (1) EP3152697A4 (en)
WO (1) WO2015191394A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11288494B2 (en) 2020-01-29 2022-03-29 Bank Of America Corporation Monitoring devices at enterprise locations using machine-learning models to protect enterprise-managed information and resources

Families Citing this family (90)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10311171B2 (en) 2015-03-02 2019-06-04 Ca, Inc. Multi-component and mixed-reality simulation environments
US9639443B2 (en) * 2015-03-02 2017-05-02 Ca, Inc. Multi-component and mixed-reality simulation environments
US10320813B1 (en) * 2015-04-30 2019-06-11 Amazon Technologies, Inc. Threat detection and mitigation in a virtualized computing environment
US9665460B2 (en) * 2015-05-26 2017-05-30 Microsoft Technology Licensing, Llc Detection of abnormal resource usage in a data center
US10671470B2 (en) * 2015-06-11 2020-06-02 Instana, Inc. Application performance management system with dynamic discovery and extension
US10341355B1 (en) * 2015-06-23 2019-07-02 Amazon Technologies, Inc. Confidential malicious behavior analysis for virtual computing resources
US9699205B2 (en) 2015-08-31 2017-07-04 Splunk Inc. Network security system
US10462116B1 (en) * 2015-09-15 2019-10-29 Amazon Technologies, Inc. Detection of data exfiltration
US10708151B2 (en) * 2015-10-22 2020-07-07 Level 3 Communications, Llc System and methods for adaptive notification and ticketing
US10437831B2 (en) * 2015-10-29 2019-10-08 EMC IP Holding Company LLC Identifying insider-threat security incidents via recursive anomaly detection of user behavior
US10043026B1 (en) * 2015-11-09 2018-08-07 8X8, Inc. Restricted replication for protection of replicated databases
US10021120B1 (en) 2015-11-09 2018-07-10 8X8, Inc. Delayed replication for protection of replicated databases
US10003607B1 (en) * 2016-03-24 2018-06-19 EMC IP Holding Company LLC Automated detection of session-based access anomalies in a computer network through processing of session data
US10079842B1 (en) 2016-03-30 2018-09-18 Amazon Technologies, Inc. Transparent volume based intrusion detection
US10142290B1 (en) 2016-03-30 2018-11-27 Amazon Technologies, Inc. Host-based firewall for distributed computer systems
US10148675B1 (en) 2016-03-30 2018-12-04 Amazon Technologies, Inc. Block-level forensics for distributed computing systems
US10320750B1 (en) 2016-03-30 2019-06-11 Amazon Technologies, Inc. Source specific network scanning in a distributed environment
US10333962B1 (en) * 2016-03-30 2019-06-25 Amazon Technologies, Inc. Correlating threat information across sources of distributed computing systems
US10178119B1 (en) * 2016-03-30 2019-01-08 Amazon Technologies, Inc. Correlating threat information across multiple levels of distributed computing systems
CN107291586B (en) * 2016-04-01 2021-04-27 腾讯科技(深圳)有限公司 Application program analysis method and device
US10061925B2 (en) * 2016-06-20 2018-08-28 Sap Se Detecting attacks by matching of access frequencies and sequences in different software layers
US10063579B1 (en) * 2016-06-29 2018-08-28 EMC IP Holding Company LLC Embedding the capability to track user interactions with an application and analyzing user behavior to detect and prevent fraud
US10200262B1 (en) 2016-07-08 2019-02-05 Splunk Inc. Continuous anomaly detection service
US10146609B1 (en) 2016-07-08 2018-12-04 Splunk Inc. Configuration of continuous anomaly detection service
US10536476B2 (en) * 2016-07-21 2020-01-14 Sap Se Realtime triggering framework
US10482241B2 (en) 2016-08-24 2019-11-19 Sap Se Visualization of data distributed in multiple dimensions
US10542016B2 (en) 2016-08-31 2020-01-21 Sap Se Location enrichment in enterprise threat detection
US11348016B2 (en) * 2016-09-21 2022-05-31 Scianta Analytics, LLC Cognitive modeling apparatus for assessing values qualitatively across a multiple dimension terrain
US20180082190A1 (en) * 2016-09-21 2018-03-22 Scianta Analytics, LLC System for dispatching cognitive computing across multiple workers
US10673879B2 (en) 2016-09-23 2020-06-02 Sap Se Snapshot of a forensic investigation for enterprise threat detection
US10630705B2 (en) 2016-09-23 2020-04-21 Sap Se Real-time push API for log events in enterprise threat detection
US10534908B2 (en) 2016-12-06 2020-01-14 Sap Se Alerts based on entities in security information and event management products
US10534907B2 (en) 2016-12-15 2020-01-14 Sap Se Providing semantic connectivity between a java application server and enterprise threat detection system using a J2EE data
US10530792B2 (en) 2016-12-15 2020-01-07 Sap Se Using frequency analysis in enterprise threat detection to detect intrusions in a computer system
US11470094B2 (en) 2016-12-16 2022-10-11 Sap Se Bi-directional content replication logic for enterprise threat detection
US10552605B2 (en) 2016-12-16 2020-02-04 Sap Se Anomaly detection in enterprise threat detection
US10764306B2 (en) 2016-12-19 2020-09-01 Sap Se Distributing cloud-computing platform content to enterprise threat detection systems
US10643137B2 (en) 2016-12-23 2020-05-05 Cerner Innovation, Inc. Integrating flexible rule execution into a near real-time streaming environment
US10462199B2 (en) * 2016-12-23 2019-10-29 Cerner Innovation, Inc. Intelligent and near real-time monitoring in a streaming environment
US10205735B2 (en) 2017-01-30 2019-02-12 Splunk Inc. Graph-based network security threat detection across time and entities
US10489584B2 (en) * 2017-02-14 2019-11-26 Microsoft Technology Licensing, Llc Local and global evaluation of multi-database system
US11237939B2 (en) * 2017-03-01 2022-02-01 Visa International Service Association Predictive anomaly detection framework
US10733079B2 (en) * 2017-05-31 2020-08-04 Oracle International Corporation Systems and methods for end-to-end testing of applications using dynamically simulated data
US10530794B2 (en) 2017-06-30 2020-01-07 Sap Se Pattern creation in enterprise threat detection
US11106996B2 (en) * 2017-08-23 2021-08-31 Sap Se Machine learning based database management
US11025693B2 (en) 2017-08-28 2021-06-01 Banjo, Inc. Event detection from signal data removing private information
US10581945B2 (en) * 2017-08-28 2020-03-03 Banjo, Inc. Detecting an event from signal data
US10313413B2 (en) 2017-08-28 2019-06-04 Banjo, Inc. Detecting events from ingested communication signals
US10587484B2 (en) * 2017-09-12 2020-03-10 Cisco Technology, Inc. Anomaly detection and reporting in a network assurance appliance
US10635565B2 (en) * 2017-10-04 2020-04-28 Servicenow, Inc. Systems and methods for robust anomaly detection
US10733180B2 (en) * 2017-11-13 2020-08-04 Lendingclub Corporation Communication graph tracking of multi-system operations in heterogeneous database systems
US10986111B2 (en) 2017-12-19 2021-04-20 Sap Se Displaying a series of events along a time axis in enterprise threat detection
US10681064B2 (en) 2017-12-19 2020-06-09 Sap Se Analysis of complex relationships among information technology security-relevant entities using a network graph
US10970395B1 (en) * 2018-01-18 2021-04-06 Pure Storage, Inc Security threat monitoring for a storage system
US11144638B1 (en) * 2018-01-18 2021-10-12 Pure Storage, Inc. Method for storage system detection and alerting on potential malicious action
US11010233B1 (en) 2018-01-18 2021-05-18 Pure Storage, Inc Hardware-based system monitoring
US10585724B2 (en) 2018-04-13 2020-03-10 Banjo, Inc. Notifying entities of relevant events
US11055417B2 (en) * 2018-04-17 2021-07-06 Oracle International Corporation High granularity application and data security in cloud environments
US11146467B2 (en) 2018-08-30 2021-10-12 Streamworx.Ai Inc. Systems, methods and computer program products for scalable, low-latency processing of streaming data
FR3086407B1 (en) * 2018-09-21 2021-08-13 Continental Automotive France ANOMALY IDENTIFICATION PROCESS FOR VEHICLE
US10749768B2 (en) 2018-11-02 2020-08-18 Cisco Technology, Inc. Using a multi-network dataset to overcome anomaly detection cold starts
US10776231B2 (en) * 2018-11-29 2020-09-15 International Business Machines Corporation Adaptive window based anomaly detection
US11700270B2 (en) * 2019-02-19 2023-07-11 The Aerospace Corporation Systems and methods for detecting a communication anomaly
US11303653B2 (en) * 2019-08-12 2022-04-12 Bank Of America Corporation Network threat detection and information security using machine learning
US11687418B2 (en) 2019-11-22 2023-06-27 Pure Storage, Inc. Automatic generation of recovery plans specific to individual storage elements
US11941116B2 (en) 2019-11-22 2024-03-26 Pure Storage, Inc. Ransomware-based data protection parameter modification
US11500788B2 (en) 2019-11-22 2022-11-15 Pure Storage, Inc. Logical address based authorization of operations with respect to a storage system
US11520907B1 (en) 2019-11-22 2022-12-06 Pure Storage, Inc. Storage system snapshot retention based on encrypted data
US11720714B2 (en) 2019-11-22 2023-08-08 Pure Storage, Inc. Inter-I/O relationship based detection of a security threat to a storage system
US11341236B2 (en) 2019-11-22 2022-05-24 Pure Storage, Inc. Traffic-based detection of a security threat to a storage system
US11645162B2 (en) 2019-11-22 2023-05-09 Pure Storage, Inc. Recovery point determination for data restoration in a storage system
US11615185B2 (en) 2019-11-22 2023-03-28 Pure Storage, Inc. Multi-layer security threat detection for a storage system
US11657155B2 (en) 2019-11-22 2023-05-23 Pure Storage, Inc Snapshot delta metric based determination of a possible ransomware attack against data maintained by a storage system
US11675898B2 (en) 2019-11-22 2023-06-13 Pure Storage, Inc. Recovery dataset management for security threat monitoring
US11720692B2 (en) 2019-11-22 2023-08-08 Pure Storage, Inc. Hardware token based management of recovery datasets for a storage system
US11651075B2 (en) 2019-11-22 2023-05-16 Pure Storage, Inc. Extensible attack monitoring by a storage system
US11625481B2 (en) 2019-11-22 2023-04-11 Pure Storage, Inc. Selective throttling of operations potentially related to a security threat to a storage system
US11755751B2 (en) 2019-11-22 2023-09-12 Pure Storage, Inc. Modify access restrictions in response to a possible attack against data stored by a storage system
US11321213B2 (en) 2020-01-16 2022-05-03 Vmware, Inc. Correlation key used to correlate flow and context data
US11675896B2 (en) 2020-04-09 2023-06-13 International Business Machines Corporation Using multimodal model consistency to detect adversarial attacks
US11931127B1 (en) 2021-04-08 2024-03-19 T-Mobile Usa, Inc. Monitoring users biological indicators using a 5G telecommunication network
US11307915B1 (en) 2021-04-29 2022-04-19 International Business Machines Corporation Grouping anomalous components of a distributed application
CN113238666B (en) * 2021-05-24 2024-01-23 江苏科技大学 Prediction method of ship motion attitude of GRU (generic routing framework) optimized based on sparrow search algorithm
US20230011957A1 (en) * 2021-07-09 2023-01-12 Vmware, Inc. Detecting threats to datacenter based on analysis of anomalous events
CN113612779A (en) * 2021-08-05 2021-11-05 杭州中尔网络科技有限公司 Advanced sustainable attack behavior detection method based on flow information
US11848766B2 (en) * 2021-10-30 2023-12-19 Hewlett Packard Enterprise Development Lp Session detection and inference
US20230153420A1 (en) * 2021-11-16 2023-05-18 Saudi Arabian Oil Company Sql proxy analyzer to detect and prevent unauthorized sql queries
CN114285627B (en) * 2021-12-21 2023-12-22 安天科技集团股份有限公司 Flow detection method and device, electronic equipment and computer readable storage medium
US20240054124A1 (en) * 2022-08-15 2024-02-15 At&T Intellectual Property I, L.P. Machine learning-based database integrity verification
CN115292561B (en) * 2022-10-08 2023-02-28 国网江西省电力有限公司信息通信分公司 Power grid measurement data dynamic collection method, system and storage medium

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7225343B1 (en) * 2002-01-25 2007-05-29 The Trustees Of Columbia University In The City Of New York System and methods for adaptive model generation for detecting intrusions in computer systems
EP1661047B1 (en) * 2003-08-11 2017-06-14 Triumfant, Inc. Systems and methods for automated computer support
US7089250B2 (en) * 2003-10-08 2006-08-08 International Business Machines Corporation Method and system for associating events
US7506307B2 (en) * 2003-10-24 2009-03-17 Microsoft Corporation Rules definition language
US20050203881A1 (en) * 2004-03-09 2005-09-15 Akio Sakamoto Database user behavior monitor system and method
GB0406401D0 (en) * 2004-03-22 2004-04-21 British Telecomm Anomaly management scheme for a multi-agent system
US20060236395A1 (en) * 2004-09-30 2006-10-19 David Barker System and method for conducting surveillance on a distributed network
US7558796B1 (en) * 2005-05-19 2009-07-07 Symantec Corporation Determining origins of queries for a database intrusion detection system
US20070291118A1 (en) * 2006-06-16 2007-12-20 Shu Chiao-Fe Intelligent surveillance system and method for integrated event based surveillance
US7979744B2 (en) * 2006-12-04 2011-07-12 Electronics And Telecommunications Research Institute Fault model and rule based fault management apparatus in home network and method thereof
US8041996B2 (en) * 2008-01-11 2011-10-18 Alcatel Lucent Method and apparatus for time-based event correlation
US8358909B2 (en) * 2008-02-26 2013-01-22 Microsoft Corporation Coordinated output of messages and content
US8230269B2 (en) * 2008-06-17 2012-07-24 Microsoft Corporation Monitoring data categorization and module-based health correlations
WO2011046228A1 (en) * 2009-10-15 2011-04-21 日本電気株式会社 System operation management device, system operation management method, and program storage medium
US8838779B2 (en) * 2009-11-04 2014-09-16 International Business Machines Corporation Multi-level offload of model-based adaptive monitoring for systems management
US20120137367A1 (en) * 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
US8589475B2 (en) * 2010-01-28 2013-11-19 Hewlett-Packard Development Company, L.P. Modeling a cloud computing system
US8504876B2 (en) * 2010-04-30 2013-08-06 The Mitre Corporation Anomaly detection for database systems
CN103026345B (en) * 2010-06-02 2016-01-20 惠普发展公司,有限责任合伙企业 For the dynamic multidimensional pattern of event monitoring priority
US9507683B2 (en) * 2011-05-20 2016-11-29 International Business Machines Corporation Monitoring service in a distributed platform
US9223632B2 (en) * 2011-05-20 2015-12-29 Microsoft Technology Licensing, Llc Cross-cloud management and troubleshooting
US20130019309A1 (en) 2011-07-12 2013-01-17 Raytheon Bbn Technologies Corp. Systems and methods for detecting malicious insiders using event models
WO2013055311A1 (en) * 2011-10-10 2013-04-18 Hewlett-Packard Development Company, L.P. Methods and systems for identifying action for responding to anomaly in cloud computing system
US8793790B2 (en) 2011-10-11 2014-07-29 Honeywell International Inc. System and method for insider threat detection
US8996690B1 (en) * 2011-12-29 2015-03-31 Emc Corporation Time-based analysis of data streams
US9058263B2 (en) * 2012-04-24 2015-06-16 International Business Machines Corporation Automated fault and recovery system
US8862727B2 (en) * 2012-05-14 2014-10-14 International Business Machines Corporation Problem determination and diagnosis in shared dynamic clouds
US10237290B2 (en) 2012-06-26 2019-03-19 Aeris Communications, Inc. Methodology for intelligent pattern detection and anomaly detection in machine to machine communication network
US8914317B2 (en) * 2012-06-28 2014-12-16 International Business Machines Corporation Detecting anomalies in real-time in multiple time series data with automated thresholding
WO2014088559A1 (en) * 2012-12-04 2014-06-12 Hewlett-Packard Development Company, L.P. Determining suspected root causes of anomalous network behavior
US9449278B2 (en) * 2013-04-12 2016-09-20 Apple Inc. Cloud-based diagnostics and remediation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11288494B2 (en) 2020-01-29 2022-03-29 Bank Of America Corporation Monitoring devices at enterprise locations using machine-learning models to protect enterprise-managed information and resources
US11763548B2 (en) 2020-01-29 2023-09-19 Bank Of America Corporation Monitoring devices at enterprise locations using machine-learning models to protect enterprise-managed information and resources
US11763547B2 (en) 2020-01-29 2023-09-19 Bank Of America Corporation Monitoring devices at enterprise locations using machine-learning models to protect enterprise-managed information and resources
US11790638B2 (en) 2020-01-29 2023-10-17 Bank Of America Corporation Monitoring devices at enterprise locations using machine-learning models to protect enterprise-managed information and resources

Also Published As

Publication number Publication date
EP3152697A1 (en) 2017-04-12
US20150355957A1 (en) 2015-12-10
US10409665B2 (en) 2019-09-10
WO2015191394A1 (en) 2015-12-17
EP3152697A4 (en) 2018-04-11

Similar Documents

Publication Publication Date Title
US20200026594A1 (en) System and method for real-time detection of anomalies in database usage
Bhatt et al. The operational role of security information and event management systems
US8805995B1 (en) Capturing data relating to a threat
Kholidy Detecting impersonation attacks in cloud computing environments using a centric user profiling approach
EP3079337A1 (en) Event correlation across heterogeneous operations
JP7302019B2 (en) Hierarchical Behavior Modeling and Detection Systems and Methods for System-Level Security
US11522895B2 (en) Anomaly detection
JP7120350B2 (en) SECURITY INFORMATION ANALYSIS METHOD, SECURITY INFORMATION ANALYSIS SYSTEM AND PROGRAM
CN111726357A (en) Attack behavior detection method and device, computer equipment and storage medium
US10462170B1 (en) Systems and methods for log and snort synchronized threat detection
US20190044965A1 (en) Systems and methods for discriminating between human and non-human interactions with computing devices on a computer network
Albanese et al. Recognizing unexplained behavior in network traffic
US10972484B1 (en) Enriching malware information for use with network security analysis and malware detection
EP3660719A1 (en) Method for detecting intrusions in an audit log
US10951645B2 (en) System and method for prevention of threat
US10262133B1 (en) System and method for contextually analyzing potential cyber security threats
RU148692U1 (en) COMPUTER SECURITY EVENTS MONITORING SYSTEM
Claycomb et al. Identifying indicators of insider threats: Insider IT sabotage
Lambert II Security analytics: Using deep learning to detect Cyber Attacks
CN113901441A (en) User abnormal request detection method, device, equipment and storage medium
Elshoush et al. Intrusion alert correlation framework: An innovative approach
Mohammad et al. A novel local network intrusion detection system based on support vector machine
CN111104670B (en) APT attack identification and protection method
CN112287340B (en) Evidence obtaining and tracing method and device for terminal attack and computer equipment
WO2021170249A1 (en) Cyberattack identification in a network environment

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION