US20190130101A1 - Methods and apparatus for detecting a side channel attack using hardware performance counters - Google Patents
- Publication number
- US20190130101A1 (application US16/234,144)
- Authority
- US
- United States
- Prior art keywords
- machine learning
- learning model
- data
- hardware performance
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
- G06F21/53—Monitoring users, programs or devices to maintain the integrity of platforms during program execution by executing in a restricted environment, e.g. sandbox or secure virtual machine
- G06F2221/034—Test or assess a computer or a system
- G06N20/00—Machine learning
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06N5/045—Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
Definitions
- This disclosure relates generally to anomaly detection, and, more particularly, to methods and apparatus for detecting a side channel attack using hardware performance counters.
- micro-architectural side channel attacks have evolved from theoretical attacks on cryptographic algorithm implementations to highly practical generic attack primitives.
- attacks such as Meltdown and Spectre exploit vulnerabilities in modern processors and break memory isolation among processes or privilege layers to gain access to data belonging to other applications and/or the operating system (OS).
- data may include passwords stored in a password manager or browser, personal photos, emails, instant messages, and even business-critical documents.
- Side channel attacks exploit the fact that hardware resources are physically shared among processes running in different isolation domains.
- FIG. 1 is a block diagram of an example system constructed in accordance with teachings of this disclosure for detecting a side channel attack using hardware performance counters.
- FIG. 2 is a block diagram of an example Gated Recurrent Unit used to detect a side channel attack.
- FIG. 3 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 .
- FIG. 4 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to gather time-series Hardware Performance Counter (HPC) data.
- FIG. 5 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to train a machine learning model on time-series HPC data.
- FIG. 6 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to determine an anomaly detection threshold.
- FIG. 7 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to perform anomaly detection using the machine learning model and anomaly detection threshold against time-series HPC data.
- FIG. 8 is a block diagram of an example processing platform structured to execute the instructions of FIGS. 3, 4, 5, 6 , and/or 7 to implement the example side channel anomaly detector of FIG. 1 .
- a cache of the central processing unit (CPU) is one of the most dangerous shared resources since the CPU cache is shared by all of the cores in a CPU package.
- the CPU cache represents a possible attack vector to perform fine-grained, high-bandwidth, low-noise cross-core attacks.
- Example approaches disclosed herein utilize a lightweight anomaly detection framework for detection of cache side channel attacks.
- Example approaches disclosed herein utilize a machine learning algorithm to perform time-series analysis of Hardware Performance Counter (HPC) data, and develop an anomaly detection model using stacked gated recurrent units (GRUs) to detect cache side channel attacks.
- the stacked GRUs are built on the multivariate time-series of the hardware performance counters rather than on a single HPC time-series.
- attack data is not used for training of such anomaly detection models (but may be used for determination of anomaly detection thresholds).
- the anomaly detection approach is generalizable to detect newly evolved unseen attacks.
- the example machine-learning framework utilized herein is based on anomaly detection of time-series hardware performance counter data, and can be used for runtime detection of cache side channel attacks.
- the example framework utilizes four main activities: (1) collect hardware performance counters, (2) train a machine learning model, (3) determine an anomaly detection threshold, and (4) detect an anomaly in time-series data using the trained model and anomaly detection threshold.
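The four activities above can be sketched as a minimal end-to-end pipeline. All function names, data shapes, and the stand-in "model" below are illustrative assumptions, not the patent's implementation:

```python
# Minimal sketch of the four-activity detection framework described above.
import numpy as np

def collect_hpc_data(n_samples=100, n_counters=4, seed=0):
    """(1) Gather multivariate time-series HPC data (simulated here)."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=50.0, size=(n_samples, n_counters)).astype(float)

def train_model(benign_data):
    """(2) Stand-in model that predicts the per-counter benign mean;
    a real system would train stacked GRUs on this data."""
    mean = benign_data.mean(axis=0)
    return lambda sample: mean

def determine_threshold(model, benign_data, k=3.0):
    """(3) Set the anomaly threshold from benign prediction errors."""
    errors = np.abs(benign_data - model(benign_data)).sum(axis=1)
    return errors.mean() + k * errors.std()

def detect_anomaly(model, threshold, sample):
    """(4) Flag a sample whose prediction error exceeds the threshold."""
    return np.abs(sample - model(sample)).sum() > threshold

benign = collect_hpc_data()
model = train_model(benign)
threshold = determine_threshold(model, benign)
print(detect_anomaly(model, threshold, benign.mean(axis=0)))        # False
print(detect_anomaly(model, threshold, benign.mean(axis=0) + 500))  # True
```

Training on benign data only (activity 2) is what makes the scheme a one-class detector: no attack traces are needed to fit the model itself.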
- the machine learning model uses one-class anomaly detection, which can effectively detect attacks not seen before. As a result, the system possesses a degree of resiliency against newly evolved attacks.
- example approaches disclosed herein utilize multivariate time-series processing and prediction, which does not require one model per hardware performance counter time-series. As a result, all of the time series can be processed at once, instead of processing each separate time series with a separate model.
- the machine learning model is implemented as a stacked GRU.
- other types of machine learning models such as, for example, a long short-term memory (LSTM) recurrent neural network (RNN) may additionally or alternatively be used.
- a stacked GRU implementation is more resource efficient and faster than approaches that utilize an LSTM-based machine learning model. In some examples, such increased resource efficiency comes at the cost of decreased accuracy: the LSTM-based architecture sometimes produces higher accuracy than the GRU-based architecture, but the GRU-based architecture includes fewer gates and, as a result, can be executed more quickly.
- the stacked GRU-based architecture may be used first to predict a probability of observing the error; when that probability is near (e.g., slightly above) the detection threshold, example approaches may then utilize the LSTM-based architecture for further analysis of whether an anomaly has been detected.
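That two-stage scheme can be sketched as follows. The scoring functions are trivial stand-ins (identity lambdas) for what the patent describes as GRU- and LSTM-based models; the `margin` parameter is an illustrative assumption for "slightly above the threshold":

```python
# Two-stage detection: a cheap first-stage score screens samples, and only
# borderline scores are escalated to a slower, more accurate second stage.

def two_stage_detect(fast_score, slow_score, sample, threshold, margin=0.1):
    """Return (is_anomaly, used_slow_stage)."""
    score = fast_score(sample)
    if score < threshold - margin:   # clearly benign: stop early
        return False, False
    if score > threshold + margin:   # clearly anomalous: stop early
        return True, False
    # Borderline: run the slower, higher-accuracy second stage.
    return slow_score(sample) > threshold, True

fast = lambda s: s          # stand-ins for the GRU-based screen
slow = lambda s: s * 1.05   # and the LSTM-based confirmation
print(two_stage_detect(fast, slow, 0.2, threshold=0.5))   # (False, False)
print(two_stage_detect(fast, slow, 0.52, threshold=0.5))  # borderline case
```

The design choice is that the expensive model only runs on the small fraction of samples the fast model cannot confidently classify.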
- FIG. 1 is a block diagram of an example system constructed in accordance with teachings of this disclosure for detecting a side channel attack.
- a machine-learning based detection system is used to detect speculative and traditional cache side channel attacks based on changes in values of hardware performance counters of a computing system.
- the example system 100 of FIG. 1 includes a side channel anomaly detector 102 , a processor 105 , and an operating system/virtual machine manager (VMM) 110 .
- the example processor 105 includes one or more hardware performance counter(s) 108 that are utilized by processes executing on the processor 105 .
- the example system 100 of the illustrated example of FIG. 1 shows a benign process 112 , an attack process 114 and an unknown process 116 . Such processes 112 , 114 , 116 may be executed at the direction of the OS/VMM 110 .
- the example processor 105 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor.
- any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), digital signal processor(s) (DSP(s)), etc.
- hardware performance counter(s) 108 included in the processor 105 include one or more registers of the processor 105 that store counts of hardware-related activities of the processor.
- a set of hardware performance counters is maintained by each core of the processor.
- the counter(s) of the hardware performance counter(s) 108 respectively store a value corresponding to a particular type of hardware value and/or event that has occurred at the processor 105 .
- the hardware performance counter(s) 108 may include a counter that identifies a number of cache misses, a counter that identifies a number of branch mis-predictions, etc.
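A periodic counter-sampling loop can be sketched as below. `read_counters()` is a simulated stand-in: on a real Linux system the cumulative values could come from the perf subsystem (e.g., `perf_event_open`), which is an assumption, not the patent's API. Cumulative counts are converted to per-interval deltas, since it is the rate of events (e.g., cache misses per interval) that forms the time series:

```python
# Sketch of periodic HPC sampling with delta computation.
import itertools
import time

_tick = itertools.count(1)

def read_counters():
    """Simulated cumulative counters that grow with each read."""
    t = next(_tick)
    return {"cache_misses": 120 * t, "branch_mispredictions": 35 * t}

def sample_counters(n_samples, interval_s=0.0):
    """Snapshot the counters n_samples times; return per-interval deltas."""
    samples, prev = [], read_counters()
    for _ in range(n_samples):
        if interval_s:
            time.sleep(interval_s)
        cur = read_counters()
        samples.append({k: cur[k] - prev[k] for k in cur})
        prev = cur
    return samples

series = sample_counters(3)
print(series[0])  # {'cache_misses': 120, 'branch_mispredictions': 35}
```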
- the hardware performance counter(s) 108 offered by processor 105 may depend on the manufacturer, model, type, etc. of the processor 105 .
- the example OS/VMM 110 of the illustrated example of FIG. 1 represents at least one of the operating system and/or virtual machine manager of the computing system 100 .
- the OS/VMM 110 manages execution of processes by the processor 105 .
- the OS/VMM 110 controls isolation of the processes executed by the processor by, for example, instructing the processor to physically separate the process domains of various processes.
- the processor 105 may, at the direction of the OS/VMM 110 , physically separate (e.g., on two or more separate cores, on two or more separate CPUs, etc.) the execution space and/or memory accessible to various processes.
- Such separation reduces (e.g., minimizes) the shared hardware resources between the domains (process, VM, etc.) and thereby reduces (e.g., minimizes) a risk that sensitive data may be exposed.
- the example benign process 112 of the illustrated example of FIG. 1 is a process that stores sensitive information (e.g., passwords, images, documents, etc.) in a cache of the processor 105 .
- the example attack process 114 of the illustrated example of FIG. 1 is a process that seeks to perform a side channel attack to gain access to sensitive information stored by the benign process 112 .
- the example attack process 114 is not a malicious process, in that it does not actually share the sensitive information outside of the computing system.
- An attack pattern may be simulated by such a non-malicious attack process without actually exposing any sensitive user information (e.g., passwords, images, documents, etc.).
- the attack process 114 is a malicious process and may attempt to share the sensitive information outside of the computing system 100 .
- additional safeguards may be put in place to stop the actual sharing of sensitive information such as, for example, a firewall that prevents communications including the sensitive information from reaching their destination.
- the example unknown process 116 of the illustrated example of FIG. 1 represents a process that is not known to be a benign process or an attack (malicious or non-malicious) process.
- the side channel anomaly detector 102 monitors hardware performance counter values (e.g., hardware performance counter values associated with the unknown process 116 ), and processes such hardware performance counter values to attempt to determine whether the unknown process 116 is performing an attack.
- the example side channel anomaly detector 102 of the illustrated example of FIG. 1 includes an anomaly detection controller 120 , an HPC interface 125 , an HPC data organizer 126 , an HPC data datastore 127 , a machine learning model processor 145 , a machine learning model datastore 150 , a machine learning model trainer 155 , an error vector generator 160 , an error vector analyzer 165 , and a threshold determiner 170 .
- the example anomaly detection controller 120 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor.
- any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the anomaly detection controller 120 implements means for causing performance of a responsive action to mitigate a side channel attack.
- the means for causing may additionally or alternatively be implemented by a processor executing, for example, blocks 370 , 390 , 395 , 510 , and/or 690 of FIGS. 3, 5, and/or 6.
- the example anomaly detection controller 120 controls operation of the side channel anomaly detector 102 and interfaces with the OS/VMM 110 to identify the potential occurrence of an anomalous behavior (e.g., a side channel attack).
- the example anomaly detector 102 interfaces with the OS/VMM 110 to instruct the OS/VMM to execute one or more of the benign process 112 and/or the attack process 114 .
- the anomaly detection controller 120 compares a returned probability value to a threshold value to determine whether an anomaly (e.g., an attack) has been detected.
- the example HPC interface 125 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the example HPC interface 125 retrieves hardware performance counter values from the hardware performance counters 108 .
- the example HPC interface 125 provides the retrieved HPC counter values to the HPC data organizer 126 to enable organization of the retrieved HPC data. In examples disclosed herein, retrieval of HPC values is performed at periodic monitoring intervals for a threshold amount of time (e.g., once per minute for ten minutes).
- the example HPC data organizer 126 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor.
- any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the HPC data organizer 126 implements means for collecting a hardware performance counter value(s).
- the means for collecting may additionally or alternatively be implemented by a processor executing, for example, blocks 405 , 410 , 415 , 420 , 430 , 450 , and/or 460 of FIG. 4 .
- the example HPC data organizer 126 identifies one or more type(s) of HPC data to be collected and a length of a time period for which such data is to be collected. In some examples, the HPC data organizer 126 identifies a frequency at which such data is to be collected (e.g., once every minute, once every ten seconds, etc.). The example HPC data organizer 126 then collects the HPC data for each HPC type. Upon completion of the HPC data collection for each of the HPC data types, the example HPC data organizer 126 analyzes the returned data to determine whether any values are missing.
- Data may be missing when, for example, values for a first data type are collected at a first frequency (e.g., once every minute) while values for a second data type are collected at a second frequency different from the first frequency (e.g., once every ten seconds).
- Data may be considered missing when, for example, a value having a first timestamp appears in connection with a first data type, but no value having the first timestamp (or a timestamp within a threshold amount of time from the first timestamp) appears within a second data type.
- the example HPC data organizer 126 imputes missing values to fill in those data points missing from the HPC data. In examples disclosed herein, the example HPC data organizer 126 imputes the missing values using, for example, average values, median values, etc. In some examples, if the time-series data is of different lengths, padding can be used to achieve equal time length.
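The imputation and padding steps above can be sketched as follows. Using the series mean for imputation and zero-padding on the right are the illustrative choices here; the patent also mentions median imputation as an alternative:

```python
# Sketch of aligning multivariate HPC time series: impute missing points
# with the series mean, then pad shorter series to a common length.
import numpy as np

def impute_missing(series):
    """Replace NaNs (missing samples) with the mean of observed values."""
    arr = np.asarray(series, dtype=float)
    arr[np.isnan(arr)] = np.nanmean(arr)
    return arr

def pad_to_length(series, length, value=0.0):
    """Right-pad a series with a constant so all series share one length."""
    arr = np.asarray(series, dtype=float)
    return np.concatenate([arr, np.full(length - len(arr), value)])

cache_misses = impute_missing([10.0, np.nan, 14.0])  # mean of {10, 14} is 12
branch_miss = pad_to_length([3.0, 5.0], length=3)
aligned = np.stack([cache_misses, branch_miss])      # shape (counters, time)
print(aligned.shape)  # (2, 3)
```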
- the example HPC data datastore 127 of the illustrated example of FIG. 1 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, solid state memory, hard drive(s), thumb drive(s), etc.
- the data stored in the example HPC data datastore 127 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc.
- While the HPC data datastore 127 is illustrated as a single device, the example HPC data datastore 127 and/or any other data storage devices described herein may be implemented by any number and/or type(s) of memories.
- In the illustrated example of FIG. 1 , the example HPC data datastore 127 stores HPC data organized by the HPC data organizer 126 .
- the HPC data datastore 127 may store HPC data created by HPC data organizer(s) of another side channel anomaly detector 102 . That is, HPC data may be generated by one computing system and supplied to another computing system to facilitate operation thereof.
- HPC data in the HPC data datastore 127 is labeled according to whether the HPC data represents benign activity, attack activity, and/or other types of activities.
- the example machine learning model processor 145 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the machine learning model processor 145 implements means for predicting a value using a machine learning model.
- the means for predicting may additionally or alternatively be implemented by a processor executing, for example, blocks 530 , 610 , 670 , and/or 710 of FIGS. 5, 6 , and/or 7 .
- the example machine learning model processor 145 implements a machine learning model (e.g., a neural network) according to the model information stored in the model datastore 150 .
- the example machine learning model implements one or more stacked GRU(s).
- any other past, present, and/or future machine learning topology(ies) and/or architecture(s) may additionally or alternatively be used such as, for example, a deep neural network (DNN), a convolutional neural network (CNN), a feed-forward neural network, or a long short-term memory (LSTM) recurrent neural network (RNN).
- the example machine learning model datastore 150 of the illustrated example of FIG. 1 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, solid state memory, hard drive(s), thumb drive(s), etc.
- the data stored in the example model data store 150 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc.
- While the model data store 150 is illustrated as a single device, the example model data store 150 and/or any other data storage devices described herein may be implemented by any number and/or type(s) of memories.
- In the illustrated example of FIG. 1 , the example model data store 150 stores machine learning models trained by the machine learning model trainer 155 .
- the model(s) stored in the example model data store 150 may be retrieved from another computing system (e.g., a server that provides the model(s) to the side channel anomaly detector 102 ).
- the example machine learning model trainer 155 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the example machine learning model trainer 155 implements means for training a machine learning model.
- the means for training may additionally or alternatively be implemented by a processor executing, for example, block 520 of FIG. 5 .
- the example machine learning model trainer 155 performs training of the model stored in the model data store 150 . In examples disclosed herein, training is performed using Stochastic Gradient Descent. However, any other approach to training a machine learning model may additionally or alternatively be used.
- the example error vector generator 160 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the error vector generator 160 implements means for generating an error vector.
- the means for generating may additionally or alternatively be implemented by a processor executing, for example, blocks 560 , 615 , 635 , 675 , and/or 720 of FIGS. 5, 6 , and/or 7 .
- the example error vector generator 160 generates an error vector e_t.
- the error vector e_t represents a difference between the predicted time-series HPC data and actual time-series HPC data.
- the error vector e_t is calculated as the element-wise difference between the actual and predicted values, e.g., e_t = |x_t − x̂_t|, where x_t is the observed HPC sample at time t and x̂_t is the value predicted by the machine learning model.
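As a minimal sketch (the names `x_t` and `xhat_t` for the actual and predicted samples are illustrative, not from the patent), the per-counter absolute error can be computed as:

```python
# Error vector between actual and predicted multivariate HPC samples:
# e_t[i] = |x_t[i] - xhat_t[i]| for each counter i.
import numpy as np

def error_vector(actual, predicted):
    return np.abs(np.asarray(actual, dtype=float)
                  - np.asarray(predicted, dtype=float))

x_t = np.array([120.0, 35.0, 80.0])     # actual counter deltas at time t
xhat_t = np.array([118.0, 40.0, 79.0])  # model's prediction for time t
e_t = error_vector(x_t, xhat_t)
print(e_t)  # [2. 5. 1.]
```

Because the time series is multivariate, e_t has one component per monitored hardware performance counter.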
- the example error vector analyzer 165 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the example error vector analyzer 165 implements means for determining a probability.
- the means for determining may additionally or alternatively be implemented by a processor executing, for example, blocks 570 , 620 , 640 , 680 , and/or 730 of FIGS. 5, 6 , and/or 7 .
- the example error vector analyzer 165 creates an error model representing the error vector e_t.
- the error vector is modeled as a multivariate Gaussian distribution parameterized by N(μ, Σ).
- the error model parameters are determined by the error vector analyzer 165 using a multivariate Gaussian distribution via maximum likelihood estimation (MLE).
- the parameter μ represents a d-dimensional mean, and the parameter Σ represents a covariance matrix.
- Such parameters can later be used to determine a probability of observing a particular error vector. That is, using a later-computed error vector and the error model parameters, the example error vector analyzer 165 can generate a probability of whether an anomaly (e.g., an attack) has been detected.
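The MLE fit and the scoring of a new error vector can be sketched with plain NumPy (the synthetic benign error vectors below are illustrative; for a Gaussian, the MLE estimates are simply the sample mean and the biased sample covariance):

```python
# Fit N(mu, Sigma) to benign error vectors by MLE, then score a new error
# vector by its log-density under the model (lower = more anomalous).
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of x under a multivariate Gaussian N(mu, sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.inv(sigma) @ diff)

rng = np.random.default_rng(1)
benign_errors = rng.normal(loc=2.0, scale=0.5, size=(500, 3))  # training e_t's

mu = benign_errors.mean(axis=0)                         # MLE mean
sigma = np.cov(benign_errors, rowvar=False, bias=True)  # MLE covariance

typical = gaussian_logpdf(mu, mu, sigma)         # error near the benign profile
unusual = gaussian_logpdf(mu + 10.0, mu, sigma)  # error far from the profile
print(typical > unusual)  # True
```

An error vector that lands in a low-density region of this model is the signal that the observed counter behavior deviates from the benign baseline.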
- the example threshold determiner 170 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc.
- the threshold determiner 170 implements means for selecting.
- the means for selecting may additionally or alternatively be implemented by any of the structure identified above for implementing the example threshold determiner 170 .
- the example threshold determiner 170 selects a threshold τ that is used to determine whether the probability value computed by the error vector analyzer 165 represents an anomaly or not.
- the threshold determiner 170 selects the threshold based on a first probability associated with benign HPC data and a second probability associated with attack HPC data to reduce false positives and false negatives.
- the threshold is selected such that it is intermediate the first probability and the second probability (e.g., the mean of the first probability and the second probability).
- any other approach for selecting the threshold may additionally or alternatively be used.
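The midpoint rule described above can be sketched as follows. The log-probability values are illustrative placeholders for scores produced by the error model on labeled benign and attack traces:

```python
# Choose the detection threshold tau between the typical log-probability of
# benign error vectors and that of attack error vectors (midpoint rule).
import numpy as np

def select_threshold(benign_logprobs, attack_logprobs):
    """Midpoint between the mean benign and mean attack log-probability."""
    return 0.5 * (np.mean(benign_logprobs) + np.mean(attack_logprobs))

benign_lp = [-2.1, -1.8, -2.4]   # high density under the benign error model
attack_lp = [-9.5, -11.0, -8.7]  # low density: errors far from benign profile
tau = select_threshold(benign_lp, attack_lp)
print(tau)  # samples scoring below tau are flagged as anomalous
```

Placing τ between the two clusters is what trades off false positives (benign samples below τ) against false negatives (attack samples above τ).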
- FIG. 2 is a block diagram of an example Gated Recurrent Unit (GRU) 201 used to detect a side channel attack.
- the example GRU 201 of FIG. 2 accepts inputs x_t 201 and h_t-1 202 , and outputs h_t 204 and y_t 205 .
- the input x_t 201 represents the current state (e.g., a value from the HPC data), and h_t-1 203 represents a hidden state extracted from a previous cell (e.g., another GRU in a multi-GRU stack).
- the example GRU 201 includes an r function 210 (e.g., a reset gate vector), a z function 220 (e.g., an update gate vector), and an h_t function 230 .
- the example GRU includes a first Hadamard product function 240 , a second Hadamard product function 250 , a third Hadamard product function 260 , and a pairwise matrix addition function 270 .
- h_t represents the hidden state.
- z ⊙ h_t-1 represents the forgetting of hidden state information.
- (1 − z) ⊙ h′ represents the remembrance of information from the current node.
- Wz and Wr represent weighting values that are selected via training.
- h_t forgets some information from the previous h_t-1 state and includes information from the current node.
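The gate equations described above can be sketched as a single-cell forward pass. This follows the standard GRU formulation with the mixing convention stated above (h_t = z ⊙ h_t-1 + (1 − z) ⊙ h′); the weights are random, untrained values for illustration only:

```python
# Forward pass of one GRU cell:
#   r  = sigmoid(Wr @ [h_prev, x])    reset gate vector
#   z  = sigmoid(Wz @ [h_prev, x])    update gate vector
#   h' = tanh(Wh @ [r * h_prev, x])   candidate state
#   h  = z * h_prev + (1 - z) * h'    forget old info, remember new info
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell(x, h_prev, Wr, Wz, Wh):
    concat = np.concatenate([h_prev, x])
    r = sigmoid(Wr @ concat)                     # reset gate
    z = sigmoid(Wz @ concat)                     # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))
    return z * h_prev + (1.0 - z) * h_cand      # new hidden state h_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
Wr, Wz, Wh = (rng.normal(size=(hidden, hidden + inputs)) for _ in range(3))

h = np.zeros(hidden)
for x in np.eye(inputs):   # feed a short input sequence
    h = gru_cell(x, h, Wr, Wz, Wh)
print(h.shape)  # (4,)
```

Stacking, as described below, amounts to feeding each cell's output h_t in as the next cell's h_t-1.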
- a single GRU is shown, multiple GRUs may be stacked together to provide a corresponding number of forward-looking predicted values.
- hidden values are passed from one GRU to the next. That is, the output h_t of a first GRU is used as the input h_t-1 of a second GRU.
- the stacked GRUs are connected via a fully connected hidden layer through feedforward connections.
- While an example manner of implementing the side channel anomaly detector 102 is illustrated in FIG. 1 , one or more of the elements, processes and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example anomaly detection controller 120 , the example HPC interface 125 , the example HPC data organizer 126 , the example machine learning model processor 145 , the example machine learning model trainer 155 , the example error vector generator 160 , the example error vector analyzer 165 , the example threshold determiner 170 , and/or, more generally, the example side channel anomaly detector 102 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware.
- For example, any of the example components of FIG. 1 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)).
- At least one of the example anomaly detection controller 120 , the example HPC interface 125 , the example HPC data organizer 126 , the example machine learning model processor 145 , the example machine learning model trainer 155 , the example error vector generator 160 , the example error vector analyzer 165 , the example threshold determiner 170 , and/or, more generally, the example side channel anomaly detector 102 of FIG. 1 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware.
- the example side channel anomaly detector 102 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 1 , and/or may include more than one of any or all of the illustrated elements, processes and devices.
- As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
- Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example side channel anomaly detector 102 of FIG. 1 are shown in FIGS. 3, 4, 5, 6 , and/or 7 .
- the machine readable instructions may be an executable program or portion of an executable program for execution by a computer processor such as the processor 812 shown in the example processor platform 800 discussed below in connection with FIG. 8 .
- the program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 812 , but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 812 and/or embodied in firmware or dedicated hardware.
- Although the example program is described with reference to the flowchart(s) illustrated in FIGS. 3, 4, 5, 6 , and/or 7 , many other methods of implementing the example side channel anomaly detector 102 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined.
- any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware.
- FIGS. 3, 4, 5, 6 , and/or 7 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
- a non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.
- A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C.
- the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
- the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
- FIG. 3 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector 102 of FIG. 1 .
- the example process 300 of FIG. 3 includes an initialization phase 305 and an inference phase 350 .
- the example process 300 of FIG. 3 begins when the anomaly detection controller 120 is initialized. Such initialization may occur, for example, upon startup of the example computing system 100 of FIG. 1 , at the direction of a user, etc.
- the example anomaly detection controller 120 enters the training phase 305 , where the example anomaly detection controller 120 gathers time-series HPC data for benign activity. (Block 310 a ). That is, time-series HPC data is collected while an attack is not being performed.
- Prior to collecting the benign HPC data, the example anomaly detection controller 120 causes the OS/VMM 110 to execute the benign process 112 , and causes the execution of the benign process 112 to be terminated upon completion of the collection of the HPC data.
- An example process for collecting the time-series HPC data is described below in connection with FIG. 4 .
- the time-series HPC data for benign operation is stored in the HPC data datastore 127 .
- the example anomaly detection controller 120 gathers time-series HPC data for attack activity. (Block 310 b ). That is, time-series HPC data is collected while an attack is being performed (e.g., by running a non-malicious attack process 114 ). For example, prior to collecting the attack HPC data, the example anomaly detection controller 120 causes the OS/VMM 110 to execute the attack process 114 , and causes the execution of the attack process 114 to be terminated upon completion of the collection of the attack HPC data. In some examples, an attack process 114 is executed to simulate an attack (e.g., a side channel attack).
- the attack process 114 is not a malicious process, in that the attack process 114 does not actually share the sensitive information outside of the computing system. In this manner, the attack may be simulated without actually exposing any sensitive user information (e.g., passwords, images, documents, etc.). However, in some examples, the example attack process 114 may be a malicious process and may attempt to share the sensitive information outside of the computing system 100 . In such examples, additional safeguards may be put in place to prevent the actual sharing of sensitive information such as, for example, a firewall that prevents communications including the sensitive information from reaching their destination.
- prior time-series HPC data may be gathered (e.g., time-series HPC data identified in connection with a prior attack).
- Such prior time-series HPC attack data may be retrieved from, for example, the HPC data datastore 127 , from an external resource (e.g., a remote side channel anomaly detector), etc.
- the time-series HPC data for the attack is not used for training the machine learning model but is, instead, used for selecting anomaly detection thresholds (e.g., to reduce the number of false positives). That is, a machine learning model is trained without utilizing attack time-series HPC data.
- anomaly detection may be performed without use of the attack HPC data.
- detection of an anomaly may use a threshold (e.g., a pre-determined threshold) which, in some examples, may be more prone to false positive and/or false negatives than an anomaly detection threshold based on time-series attack HPC data.
- the benign HPCs are considered to be normal operation.
- The collected time-series HPC data (e.g., the benign HPC data and the attack HPC data) is then split into smaller data sets.
- the benign HPCs of the illustrated example are split into four sets including a benign training set, two benign validation sets, and one benign test set.
- the attack HPCs of the illustrated example are divided into two sets, including an attack validation set and an attack test set.
- the time-series HPC data may be split into any number of smaller data sets for training and/or validation purposes.
- the attack data is not used for training of the machine learning model, but is instead used for determination of an anomaly detection threshold.
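The partitioning described above can be sketched as follows. The split ratios here are illustrative assumptions; the examples above specify only the number of sets (four benign, two attack), not their relative sizes.

```python
def split_series(series, fractions):
    """Split a time-series (list of samples) into consecutive chunks
    whose sizes follow `fractions` (assumed to sum to 1.0)."""
    bounds, acc = [0], 0.0
    for frac in fractions:
        acc += frac
        bounds.append(int(round(acc * len(series))))
    bounds[-1] = len(series)  # absorb any rounding error in the last chunk
    return [series[bounds[i]:bounds[i + 1]] for i in range(len(fractions))]

benign = list(range(100))   # stand-in for the benign time-series HPC samples
attack = list(range(40))    # stand-in for the attack time-series HPC samples

# Four benign sets: one training set, two validation sets, one test set.
b_train, b_val1, b_val2, b_test = split_series(benign, [0.5, 0.2, 0.2, 0.1])
# Two attack sets: one validation set, one test set.
a_val, a_test = split_series(attack, [0.5, 0.5])
```

The attack sets are kept out of training entirely, consistent with the note above that attack data is used only for threshold determination.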
- the example side channel anomaly detector 102 trains a machine learning model using the benign time-series HPC data. (Block 320 ).
- the trained model is stored in the machine learning model datastore 150 for future use.
- An example process for training the machine learning model and the anomaly detection thresholds is described below in connection with the illustrated example of FIG. 5 .
- a machine learning model implemented using a stacked gated recurrent unit (GRU) is trained to create predictions of forward-looking HPC counter values. Those predictions are used to calculate an error vector representing deviations in the predicted values from the actual values (e.g., actual values included in the time-series HPC data).
- the error vector is then used to determine parameters including, for example, a d-dimensional mean and a covariance matrix. Such parameters are used to determine subsequent probabilities of observing a particular error vector.
- the example side channel anomaly detector 102 determines an anomaly detection threshold τ. (Block 330 ).
- the anomaly detection threshold represents a threshold probability that collected HPC data represents benign data.
- An example process for determining the anomaly detection threshold is described below in connection with the illustrated example of FIG. 6 .
- one or more of the benign data sets (e.g., the first benign validation data set, the second benign validation data set, the benign test set, etc.) and the attack validation set are used to determine a value for the anomaly detection threshold that reduces a number of false positives in the benign test set.
- the attack test set is used to determine a false positive rate.
- the example side channel anomaly detector 102 enters the operational phase 350 .
- the example side channel anomaly detector 102 gathers time-series HPC data.
- the gathered time-series HPC data represents live operations of the computing system and can be used in connection with the trained machine learning model and determined anomaly detection threshold to determine whether an anomaly is detected.
- the example side channel anomaly detector 102 performs anomaly detection using the trained machine learning model and, using a result of the machine learning model, determines a probability (referred to herein as a p-value) of the time-series HPC data being benign. (Block 360 ).
- An example approach to performing such anomaly detection is described in further detail in connection with FIG. 7 , below.
- the example p-value produced by the side channel anomaly detector 102 represents a similarity of the collected time-series HPC data and benign time-series HPC data that can be used to determine if the collected HPC data is more similar to an attack operation or benign operation.
- p-values and their corresponding thresholds are created on a scale of zero to one. However, any other scale or nomenclature for representing a similarity may additionally or alternatively be used.
- the example anomaly detection controller 120 determines whether an anomaly has been detected. (Block 370 ).
- the anomaly is detected when the p-value is less than the anomaly detection threshold τ.
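A minimal sketch of this decision rule, assuming a two-dimensional error model N(μ, Σ): the p-value of an observed error vector is computed as the multivariate Gaussian density and compared against τ. The identity covariance and threshold value below are purely illustrative.

```python
import math

def mvn_pdf(x, mu, cov):
    """Density of a 2-D Gaussian N(mu, cov) at x (pure-Python sketch)."""
    dx = [x[0] - mu[0], x[1] - mu[1]]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[ cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det,  cov[0][0] / det]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def anomaly_detected(error_vector, mu, cov, tau):
    """Flag an anomaly when the p-value falls below the threshold tau."""
    return mvn_pdf(error_vector, mu, cov) < tau

mu = [0.0, 0.0]
cov = [[1.0, 0.0], [0.0, 1.0]]
tau = 0.01
```

A small prediction error yields a high p-value (benign); a large prediction error yields a p-value below τ and triggers the responsive actions described next.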
- the example anomaly detection controller 120 implements one or more responsive actions (e.g., error handling techniques) to further analyze and/or mitigate such side channel attacks. (Block 390 ).
- the anomaly detection controller 120 may inform the corresponding system software (OS/VMM) 110 of the detected anomaly through available inter-process communication and/or other communication approaches (e.g., flags, interrupts, etc.).
- additional information such as, for example, attacker and/or victim domain identifiers (e.g., process identifiers and/or virtual machine identifiers of the process suspected to be under attack, process identifiers and/or virtual machine identifiers of the process suspected to be performing an attack) are identified in the HPC data and, as such, the OS/VMM 110 is notified of that information as well.
- such information is obtained by a runtime environment and/or scheduler of the OS/VMM 110 .
- Such information enables the domains (e.g., an attack domain and a victim domain) to be physically separated (e.g., on two separate cores, on two separate CPUs) by the scheduler of the OS/VMM 110 .
- Such separation reduces (e.g., minimizes) the shared hardware resources between the two domains (process, VM, etc.) and thereby reduces (e.g., minimizes) a risk that sensitive data may be exposed.
- the anomaly detection controller 120 informs the OS/VMM 110 about the potential onset of the side channel attack.
- the OS/VMM 110 can enable one or more architectural feature(s) that defend against cache side channel attacks.
- Such architectural features may be disabled by default to avoid performance costs, but may be enabled in situations where the potential onset of such an attack is detected.
- Such architectural features may include, for example, cache partitioning through cache allocation technology in a last level cache (LLC) of that CPU, activating memory tagging based capabilities for Level 1-Instruction (L1-I) and/or Level 1-Data (L1-D) caches, limiting speculation of memory accesses across domains, activating flushing of at least the L1-I/D caches across context switches, etc.
- the performance of the responsive action involves further analysis to determine whether a side channel attack (or a particular phase thereof) is being performed. That is, the detection/identification disclosed above in connection with FIG. 3 may be used as a first level of screening. For example, more resource-intensive analysis of the histogram(s), statistics of the histogram(s), etc. may additionally be performed. For example, further processing of the time-series HPC data may be performed using more computationally intensive techniques such as, for example, using a machine learning model implemented using a long short-term memory (LSTM) recurrent neural network (RNN). As such, further responsive actions may be performed based on a result of the more computationally intensive techniques.
- the potential attacker process is sandboxed (through methods guaranteed to be side channel attack safe) by the OS/VMM 110 and more extensive monitoring is applied to the activities performed by the process such as, for example, trace-based profiling, dynamic binary-instrumentation based checks, etc.
- the example anomaly detection controller 120 determines whether any re-training is to occur. (Block 395 ). In some examples, such re-training may occur in parallel with ongoing monitoring. That is, training may occur in an online fashion. In some examples, regularization is imposed to penalize false positives through, for example, a feedback loop. For example, as the anomaly detection controller 120 produces anomaly predictions, subsequent training can be performed using information identifying whether the detected anomaly was truly an anomaly.
- control may return to block 320 for further training utilizing additional information concerning the false positives.
- further training serves to reduce the number of false positives.
- false negatives may also be reduced. If no retraining is to be performed (e.g., block 395 returns a result of NO), control proceeds to block 310 c , where further monitoring is performed.
- While in examples disclosed herein a single threshold is used to determine whether an anomaly has been detected, multiple thresholds may alternatively be used. For example, if the p-value is less than or equal to a first threshold (e.g., indicating an anomaly), a responsive action may be performed; if the p-value is greater than the first threshold and less than or equal to a second threshold (e.g., indicating a potential anomaly), further analysis may be performed (and a responsive action may be performed if the further analysis identifies that an anomaly has actually occurred); and finally, if the p-value is greater than the second threshold (e.g., indicating no anomaly), no action is taken.
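The two-threshold scheme described above might be sketched as follows; the threshold values used here are illustrative.

```python
def classify_p_value(p, tau1, tau2):
    """Map a p-value to an action using two thresholds (tau1 < tau2):
    at or below tau1 -> respond immediately (anomaly);
    between tau1 and tau2 -> perform further analysis (potential anomaly);
    above tau2 -> take no action (no anomaly)."""
    if p <= tau1:
        return "respond"
    if p <= tau2:
        return "analyze"
    return "no_action"
```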
- FIG. 4 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to gather time-series Hardware Performance Counter (HPC) data.
- the collection of the time-series HPC data is considered a D-dimensional tensor (a tensor, as used herein, is defined to be a multi-dimensional array), where D is the number of HPCs to be collected.
- Each HPC produces a data matrix of size M n ⁇ d , where n is the number of time-series samples, and d is the number of values collected during the time period. Take, for example, a cache miss value.
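The data layout described above can be sketched as a nested-list tensor: one n × d matrix per counter. The counter count and matrix dimensions below are illustrative.

```python
# D counters (e.g., cache misses, branch misses, ...), each producing an
# n-by-d matrix: n sampling periods, d values collected per period.
D, n, d = 3, 4, 2

# The full collection is a D-dimensional tensor: one matrix per counter.
tensor = [[[0 for _ in range(d)] for _ in range(n)] for _ in range(D)]

# Record an example cache-miss value for counter 0, sample period 1, slot 0.
tensor[0][1][0] = 1234
```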
- the example process of FIG. 4 begins when the example HPC data organizer 126 identifies types of HPC data to be collected, and a length of a time period for which such data is to be collected. (Block 405 ). In some examples, the HPC data organizer 126 identifies a frequency at which such data is to be collected. The example HPC data organizer 126 then collects the HPC data for each HPC type at the corresponding rate. (Block 410 ). In the illustrated example of FIG. 4 , the example HPC data organizer 126 collects HPC data for different types of HPC data in parallel (represented by blocks 411 and 412 ). However, in some examples, the collection of the HPC data for the other types of HPC data may be performed serially.
- the example HPC data organizer 126 requests the HPC value from the processor using a type index (e.g., an index identifying the type of data to be retrieved) via the HPC interface 125 . (Block 415 ).
- the example HPC data organizer 126 adds a timestamp to the retrieved HPC data and stores the retrieved HPC data (and timestamp) in the example HPC data datastore 127 . (Block 420 ). In some examples, the timestamp value may be omitted.
- the example HPC data organizer 126 then waits an amount of time according to the rate at which the HPC data is to be collected. (Block 425 ). The example HPC data organizer 126 determines whether collection of the HPC data is complete. (Block 430 ). The example HPC data organizer 126 may determine that collection of the HPC data is complete when the length of time to collect HPC data has elapsed (e.g., from the execution of the first iteration of block 415 ). In some examples, the data collection is considered complete when a threshold number of samples (e.g., a number of samples based on the length of time to collect HPC data and the sampling frequency) has been reached. If data collection is not complete (e.g., block 430 returns a result of NO), control returns to block 415 , where the process of blocks 415 through 430 is repeated until block 430 determines that data collection is complete.
- Upon completion of the collection of the HPC data for each of the HPC data types (e.g., upon completion of blocks 410 , 411 , 412 , etc.), the example HPC data organizer 126 analyzes the returned data to determine whether any values are missing. (Block 450 ). Data may be missing when, for example, values for a first data type are collected at a first frequency while values for a second data type are collected at a second frequency different from the first frequency. Data may be considered missing when, for example, a value having a first timestamp appears in connection with a first data type, but no value having the first timestamp (or a timestamp within a threshold amount of time from the first timestamp) is present within a second data type.
- the example HPC data organizer 126 imputes missing values to fill in those data points missing from the HPC data. (Block 460 ). In examples disclosed herein, the example HPC data organizer 126 imputes the missing values using, for example, average values, median values, etc. In some examples, if the time-series data is of different lengths, padding can be used to achieve equal time length.
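A minimal sketch of the imputation and padding described above, assuming mean imputation for missing samples and zero padding to equalize series lengths:

```python
def impute_missing(series):
    """Replace missing samples (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def pad_to_length(series, length, value=0):
    """Right-pad a time-series so all series reach an equal time length."""
    return series + [value] * (length - len(series))
```

Median imputation, mentioned as an alternative above, would substitute the median of the observed values for the mean.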
- the example process of FIG. 4 terminates. The example process 310 of FIG. 4 may be re-executed in response to, for example, a request to collect further HPC data.
- FIG. 5 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to train a machine learning model on time-series HPC data.
- the example process 320 of FIG. 5 begins when the example anomaly detection controller 120 identifies a number of forward-looking values to be predicted by the trained machine learning model. (Block 510 ).
- the example machine learning model trainer 155 trains one or more models based on the benign training set to produce the identified number of forward-looking values. (Block 520 ).
- In examples disclosed herein, the machine learning model is implemented using stacked gated recurrent units (GRUs).
- any other type of machine learning model may additionally or alternatively be used such as, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) neural network, etc.
- the example machine learning model trainer 155 updates the model(s) stored in the model datastore 150 to reduce an amount of error generated by the example machine learning model processor 145 when using input HPC data to attempt to predict the number of forward-looking values.
- training is performed using Stochastic Gradient Descent.
- any other approach to training a machine learning model may additionally or alternatively be used.
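Training a forward-looking predictor with Stochastic Gradient Descent, as described above, might be sketched as follows. For brevity, a single linear autoregressive model stands in for the stacked-GRU model; the window size, learning rate, iteration count, and synthetic benign series are all illustrative assumptions.

```python
import random

random.seed(0)
series = [float(i % 10) for i in range(200)]   # stand-in benign HPC series
window, lr = 3, 0.001                          # assumed hyperparameters

def mse(w, b):
    """Mean squared one-step-ahead prediction error over the series."""
    errs = []
    for i in range(window, len(series)):
        x = series[i - window:i]
        pred = sum(wj * xj for wj, xj in zip(w, x)) + b
        errs.append((pred - series[i]) ** 2)
    return sum(errs) / len(errs)

w, b = [0.0] * window, 0.0
before = mse(w, b)

# Stochastic gradient descent: sample a window at random and step
# against the gradient of the squared prediction error.
for _ in range(2000):
    i = random.randrange(window, len(series))
    x, y = series[i - window:i], series[i]
    err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
    w = [wj - lr * err * xj for wj, xj in zip(w, x)]
    b -= lr * err

after = mse(w, b)
```

The same loop structure applies to a GRU; only the forward pass and gradient computation change.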
- the example machine learning model processor 145 tests the machine learning model using the first benign validation set (e.g., the first benign validation set created at block 315 ). (Block 530 ). To perform the testing, the final l values are omitted from the first benign validation set and are used to determine whether the machine learning model processor 145 properly predicted the final l values.
- the example machine learning model trainer 155 calculates an accuracy between the predicted l values and the actual l values (e.g., the values omitted from the first benign validation set). The example machine learning model trainer 155 compares the calculated accuracy to an accuracy threshold. (Block 540 ).
- Using the model trained at block 520 , the example machine learning model processor 145 processes a first portion of the second benign validation set data to predict l next values appearing in a second portion of the second benign validation set. (Block 550 ).
- the example error vector generator 160 generates an error vector e t . (Block 560 ).
- the error vector e t represents the difference between the predicted time-series HPC data and the second portion of the captured time-series HPC data. In examples disclosed herein, the error vector e t is calculated using the following equation: e t = x t − x̂ t , where x t represents the captured time-series HPC data values and x̂ t represents the corresponding values predicted by the machine learning model.
- the example error vector analyzer 165 then creates an error model representing the error vector e t . (Block 570 ).
- the error vector is modeled as a multivariate Gaussian distribution parameterized by N( μ , Σ ).
- the error model parameters are determined using a multivariate Gaussian distribution via maximum likelihood estimation (MLE).
- the parameter μ represents a d-dimensional mean, and the parameter Σ represents a covariance matrix. Such parameters can later be used to determine a probability of observing a particular error vector (e.g., during the testing described below in connection with FIG. 7 ).
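The maximum likelihood estimates of the mean and covariance from a collection of d-dimensional error vectors can be sketched as follows (the error vectors here are illustrative, with d = 2):

```python
errors = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.0], [0.2, 0.1]]
n, d = len(errors), len(errors[0])

# MLE mean: component-wise average of the error vectors.
mu = [sum(e[j] for e in errors) / n for j in range(d)]

# MLE covariance: average outer product of the centered error vectors
# (note the 1/n normalizer, not 1/(n-1), for the maximum likelihood estimate).
cov = [[sum((e[j] - mu[j]) * (e[k] - mu[k]) for e in errors) / n
        for k in range(d)] for j in range(d)]
```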
- the example process 320 of FIG. 5 then terminates.
- monitoring and detection described in connection with the inference phase 350 of FIG. 3 can be performed (e.g., without having determined the anomaly detection threshold in connection with block 330 of FIG. 3 ).
- Such processing would produce a probability of observing an error, which is compared against an error threshold.
- the anomaly detection threshold has not yet been determined.
- the example threshold determiner 170 determines an anomaly detection threshold that can be used in connection with returned probability values to determine whether an anomaly has been detected.
- FIG. 6 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector 102 of FIG. 1 to determine an anomaly detection threshold.
- the processing of FIG. 6 utilizes the attack validation set, the benign test set, and the attack test set determined in connection with block 315 of FIG. 3 .
- the example process 330 of FIG. 6 begins when the example machine learning model processor 145 processes a first portion of the attack validation set using the trained machine learning model (e.g., the model trained at block 520 ) to generate a forward-looking prediction. (Block 610 ).
- the error vector generator 160 compares the forward-looking prediction to a second portion of the attack validation set to generate an attack validation error vector. (Block 615 ).
- the example error vector analyzer 165 calculates a first probability of detecting an anomaly (e.g., an attack). (Block 620 ).
- the example machine learning model processor 145 processes a first portion of the benign test set using the trained machine learning model (e.g., the model trained at block 520 ) to generate a forward-looking prediction. (Block 630 ).
- the error vector generator 160 compares the forward-looking prediction to a second portion of the benign test set to generate a benign error vector. (Block 635 ).
- Using the benign error vector and the error model parameters N( μ , Σ ) determined in connection with block 570 of FIG. 5 , the example error vector analyzer 165 calculates a second probability of detecting an anomaly (e.g., an attack). (Block 640 ).
- the first probability (based on the attack validation set) is expected to be less than the second probability (based on the benign test set).
- the example threshold determiner 170 selects a threshold τ based on the first probability and the second probability to reduce false positives and false negatives. (Block 660 ).
- the threshold is selected such that it is intermediate the first probability and the second probability (e.g., the mean of the first probability and the second probability).
- any other approach for selecting the threshold may additionally or alternatively be used.
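The midpoint selection described above amounts to the following one-liner; the probability values in the usage example are illustrative.

```python
def select_threshold(p_attack, p_benign):
    """Pick tau midway between the attack-set probability and the
    benign-set probability (one simple choice; other selection rules
    could additionally or alternatively be used)."""
    return (p_attack + p_benign) / 2.0

tau = select_threshold(0.01, 0.2)
```

Because the attack-set probability is expected to be less than the benign-set probability, the midpoint lies strictly between them.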
- the process 330 continues to determine a false positive rate (blocks 670 through 690 ). However, in some examples, the process 330 of FIG. 6 terminates after selection of the threshold τ.
- the example machine learning model processor 145 processes a first portion of the attack test set using the trained machine learning model (e.g., the model trained at block 520 ) to generate a forward-looking prediction. (Block 670 ).
- the example error vector generator 160 compares the forward-looking prediction to a second portion of the attack test set to generate an attack test error vector. (Block 675 ).
- Using the attack test error vector and the error model parameters N( μ , Σ ) determined in connection with block 570 of FIG. 5 , the example error vector analyzer 165 calculates a third probability of detecting an anomaly (e.g., an attack). (Block 680 ).
- the example anomaly detection controller 120 compares the third probability to the threshold τ to determine the false positive rate.
- the threshold τ may be adjusted to a new value if, for example, the false positive rate is greater than an acceptable rate of false positives. In some other examples, the false positive rate is reported to a user and/or administrator of the computer system 100 .
- FIG. 7 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to perform anomaly detection using the machine learning model and anomaly detection threshold against time-series HPC data.
- the example process 360 of FIG. 7 is performed using captured HPC data gathered in connection with block 310 c of FIG. 3 .
- Using the captured HPC data and the machine learning model trained in connection with block 320 of FIG. 3 , the example machine learning model processor 145 processes a first portion of the HPC data to predict l next values appearing in the second portion of the HPC data. (Block 710 ).
- the example error vector generator 160 generates an error vector e t . (Block 720 ).
- the error vector e t represents the difference between the predicted time-series HPC data and the second portion of the captured time-series HPC data.
- the example error vector analyzer 165 uses the error model parameters N( μ , Σ ) determined in connection with block 570 of FIG. 5 to calculate a probability of observing the error vector. (Block 730 ).
- the example error vector analyzer 165 returns the probability of having detected an anomaly to the anomaly detection controller 120 (Block 740 ), enabling the anomaly detection controller 120 at block 370 of FIG. 3 to determine whether an anomaly has been detected.
- FIG. 8 is a block diagram of an example processor platform 800 structured to execute the instructions of FIGS. 3, 4, 5, 6 , and/or 7 to implement the example side channel anomaly detector 102 of FIG. 1 .
- the processor platform 800 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a gaming console, a personal video recorder, a set top box, a headset or other wearable device, or any other type of computing device.
- the processor platform 800 of the illustrated example includes a processor 812 .
- the processor 812 of the illustrated example is hardware.
- the processor 812 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer.
- the hardware processor may be a semiconductor based (e.g., silicon based) device.
- the processor implements the example anomaly detection controller 120 , the example HPC interface 125 , the example HPC data organizer 126 , the example machine learning model processor 145 , the example machine learning model trainer 155 , the example error vector generator 160 , the example error vector analyzer 165 , and the example threshold determiner 170 .
- the processor 812 of the illustrated example includes a local memory 813 (e.g., a cache).
- the processor 812 of the illustrated example is in communication with a main memory including a volatile memory 814 and a non-volatile memory 816 via a bus 818 .
- the volatile memory 814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device.
- the non-volatile memory 816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 814 , 816 is controlled by a memory controller.
- the processor platform 800 of the illustrated example also includes an interface circuit 820 .
- the interface circuit 820 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
- one or more input devices 822 are connected to the interface circuit 820 .
- the input device(s) 822 permit(s) a user to enter data and/or commands into the processor 812 .
- the input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
- One or more output devices 824 are also connected to the interface circuit 820 of the illustrated example.
- the output devices 824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker.
- the interface circuit 820 of the illustrated example thus typically includes a graphics driver card, a graphics driver chip, and/or a graphics driver processor.
- the interface circuit 820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 826 .
- the communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
- the processor platform 800 of the illustrated example also includes one or more mass storage devices 828 for storing software and/or data.
- mass storage devices 828 include floppy disk drives, hard disk drives, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
- the machine executable instructions 832 of FIGS. 3, 4, 5, 6 , and/or 7 may be stored in the mass storage device 828 , in the volatile memory 814 , in the non-volatile memory 816 , and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.
- the example mass storage device 828 implements the example HPC data datastore 127 and the example machine learning model datastore 150 .
- Example 1 includes an apparatus for detecting side channel anomalies, the apparatus comprising a hardware performance counter data organizer to collect a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, a machine learning model processor to apply a machine learning model to predict a third value corresponding to the second time, an error vector generator to generate an error vector representing a difference between the second value and the third value, an error vector analyzer to determine a probability of the error vector indicating an anomaly, and an anomaly detection orchestrator to, in response to the probability satisfying a threshold, cause the performance of a responsive action to mitigate a side channel anomaly.
- Example 2 includes the apparatus of example 1, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 3 includes the apparatus of example 1, further including a machine learning model trainer to train the machine learning model based on benign hardware performance counter data.
- Example 4 includes the apparatus of example 3, wherein the machine learning model trainer does not train the machine learning model based on attack hardware performance counter data.
- Example 5 includes the apparatus of example 1, further including a threshold determiner to determine the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 6 includes the apparatus of example 1, wherein the hardware performance counter data organizer is further to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 7 includes the apparatus of example 1, wherein the machine learning model is a first machine learning model, and the responsive action includes utilization of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 8 includes at least one non-transitory computer-readable medium comprising instructions that, when executed, cause at least one processor to at least collect a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, apply a machine learning model to predict a third value corresponding to the second time, generate an error vector representing a difference between the second value and the third value, determine a probability of the error vector indicating an anomaly, and cause, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 9 includes the at least one non-transitory computer-readable medium of example 8, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 10 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to train the machine learning model based on benign hardware performance counter data.
- Example 11 includes the at least one non-transitory computer-readable medium of example 10, wherein the instructions, when executed, further cause the at least one processor to train the machine learning model without using attack hardware performance counter data.
- Example 12 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to determine the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 13 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 14 includes the at least one non-transitory computer-readable medium of example 8, wherein the machine learning model is a first machine learning model, and the responsive action includes utilization of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 15 includes an apparatus for detecting side channel anomalies, the apparatus comprising means for collecting a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, means for predicting a third value corresponding to the second time using a machine learning model, means for generating an error vector representing a difference between the second value and the third value, means for determining a probability of the error vector indicating an anomaly, and means for causing, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 16 includes the apparatus of example 15, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 17 includes the apparatus of example 15, further including means for training the machine learning model based on benign hardware performance counter data.
- Example 18 includes the apparatus of example 17, wherein the means for training is not to train the machine learning model based on attack hardware performance counter data.
- Example 19 includes the apparatus of example 15, further including means for selecting the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 20 includes the apparatus of example 15, wherein the means for collecting is further to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 21 includes the apparatus of example 15, wherein the machine learning model is a first machine learning model, and the means for causing is further to cause the use of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 22 includes a method for detecting side channel anomalies, the method comprising collecting a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, applying, by executing an instruction with a processor, a machine learning model to predict a third value corresponding to the second time, generating, by executing an instruction with the processor, an error vector representing a difference between the second value and the third value, determining, by executing an instruction with the processor, a probability of the error vector indicating an anomaly, and causing, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 23 includes the method of example 22, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 24 includes the method of example 22, further including training the machine learning model based on benign hardware performance counter data.
- Example 25 includes the method of example 24, wherein the training of the machine learning model does not utilize attack hardware performance counter data.
- Example 26 includes the method of example 22, further including determining the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 27 includes the method of example 22, further including imputing a fourth value having a timestamp intermediate the first time and the second time.
- Example 28 includes the method of example 22, wherein the machine learning model is a first machine learning model, and further including utilizing a long short-term memory recurrent neural network to determine whether an anomaly is detected.
Description
- This disclosure relates generally to anomaly detection, and, more particularly, to methods and apparatus for detecting a side channel attack using hardware performance counters.
- Over the past few years, micro-architectural side channel attacks have evolved from theoretical attacks on cryptographic algorithm implementations to highly practical generic attack primitives. For example, attacks such as Meltdown and Spectre exploit vulnerabilities in modern processors and break memory isolation among processes or privilege layers to gain access to data from other applications and/or the operating system (OS). Such data may include passwords stored in a password manager or browser, personal photos, emails, instant messages, and even business-critical documents. Side channel attacks exploit the fact that hardware resources are physically shared among processes running in different isolation domains.
-
FIG. 1 is a block diagram of an example system constructed in accordance with teachings of this disclosure for detecting a side channel attack using hardware performance counters. -
FIG. 2 is a block diagram of an example Gated Recurrent Unit used to detect a side channel attack. -
FIG. 3 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1. -
FIG. 4 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to gather time-series Hardware Performance Counter (HPC) data. -
FIG. 5 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to train a machine learning model on time-series HPC data. -
FIG. 6 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to determine an anomaly detection threshold. -
FIG. 7 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to perform anomaly detection using the machine learning model and anomaly detection threshold against time-series HPC data. -
FIG. 8 is a block diagram of an example processing platform structured to execute the instructions of FIGS. 3, 4, 5, 6, and/or 7 to implement the example side channel anomaly detector of FIG. 1. - The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
- Side channel attacks exploit the fact that hardware resources of a computing system, such as a cache, a branch predictor, a branch target buffer, an execution unit, etc., are physically shared among processes running on the computing system. Mitigations against side channel attacks have mainly focused on patching and proposing new architecture designs. However, not all systems can be patched. Even where possible, patching can be difficult. Moreover, patching sometimes introduces a large amount of operational overhead including, for example, physically replacing hardware components. Example approaches disclosed herein seek to mitigate side channel attacks by early detection of such attacks, enabling responsive actions to be taken to avoid the impact(s) of a side channel attack.
- Cache Side Channel Attacks (SCA) are serious threats to information security where multiple processes/virtual machines (VMs) execute on the same physical machine (e.g., share hardware resources of the physical machine). A cache of the central processing unit (CPU) is one of the most dangerous shared resources since the CPU cache is shared by all of the cores in a CPU package. As a result, the CPU cache represents a possible attack vector to perform fine-grained, high-bandwidth, low-noise cross-core attacks.
- Example approaches disclosed herein utilize a lightweight anomaly detection framework for detection of cache side channel attacks. Example approaches disclosed herein utilize a machine learning algorithm to perform time-series analysis of Hardware Performance Counter (HPC) data and develop an anomaly detection model using stacked gated recurrent units (GRUs) to detect cache side channel attacks. The stacked GRUs are built on the multivariate time-series of the hardware performance counters rather than on a single HPC time-series. In examples disclosed herein, attack data is not used for training of such anomaly detection models (but may be used for determination of anomaly detection thresholds). As a result, the anomaly detection approach is generalizable to detect newly evolved, unseen attacks.
- The example machine-learning framework utilized herein is based on anomaly detection of time-series hardware performance counter data, and can be used for runtime detection of cache side channel attacks. The example framework utilizes four main activities: (1) collect hardware performance counters, (2) train a machine learning model, (3) determine an anomaly detection threshold, and (4) detect an anomaly in time-series data using the trained model and anomaly detection threshold. In examples disclosed herein, the machine learning model uses one-class anomaly detection, which can effectively detect attacks not seen before. As a result, the system possesses a degree of resiliency against newly evolved attacks. Moreover, example approaches disclosed herein utilize multivariate time-series processing and prediction, which does not require the use of one model per time-series of hardware performance counters. As a result, such processing can all be performed at once, instead of having to perform each separate time series using separate models.
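- The four activities above can be sketched as a minimal end-to-end pipeline. The sketch below is an illustrative assumption rather than the patented implementation: a trivial previous-value predictor stands in for the stacked GRU, the maximum benign error (plus a margin) stands in for the probability-based threshold, and all names are hypothetical.

```python
# Hypothetical sketch of the four activities: collect, train, threshold, detect.
# A previous-value predictor stands in for the stacked GRU model.

def collect_hpc_samples(read_counter, n_samples):
    """Activity 1: gather a time series of HPC values."""
    return [read_counter(t) for t in range(n_samples)]

def train_model(benign_series):
    """Activity 2: fit a predictor on benign data only.
    Stand-in model: predict that the next value equals the previous one."""
    return lambda prev: prev

def determine_threshold(model, benign_series):
    """Activity 3: derive a detection threshold from benign prediction errors."""
    errors = [abs(model(benign_series[t - 1]) - benign_series[t])
              for t in range(1, len(benign_series))]
    return 1.5 * max(errors)  # margin above the worst benign error

def detect(model, threshold, prev_value, new_value):
    """Activity 4: flag an anomaly when the prediction error exceeds the threshold."""
    return abs(model(prev_value) - new_value) > threshold

benign = collect_hpc_samples(lambda t: [10, 11, 10, 12, 11, 10][t], 6)
model = train_model(benign)
threshold = determine_threshold(model, benign)            # 1.5 * 2 = 3.0
print(detect(model, threshold, prev_value=11, new_value=95))  # large spike -> True
```

In the patented approach, the detector instead compares the probability of the error vector against the threshold, but the structure of the four stages is the same.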
- In example approaches disclosed herein, the machine learning model is implemented as a stacked GRU. However, other types of machine learning models such as, for example, a long short-term memory (LSTM) recurrent neural network (RNN) may additionally or alternatively be used. In examples disclosed herein, a stacked GRU implementation is more resource efficient and faster than approaches that utilize an LSTM-based machine learning model. In some examples, such increased resource efficiency comes at the cost of decreased accuracy: the LSTM-based architecture sometimes produces higher accuracy than the GRU-based architecture, but the GRU-based architecture includes fewer gates and, as a result, can be executed more quickly. The stacked GRU-based architecture may be used to first predict a probability of observing the error being slightly above the detection threshold. Example approaches may then utilize the LSTM-based architecture for further analysis of whether an anomaly has been detected.
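- For reference, a single GRU step can be written out from its standard gate equations (update gate, reset gate, candidate state). The scalar weights below are arbitrary assumptions chosen only so the example runs; a real stacked GRU uses learned weight matrices and feeds each layer's hidden state into the next layer as its input.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, W):
    """One forward step of a GRU cell (scalar case for brevity)."""
    z = sigmoid(W["wz"] * x + W["uz"] * h_prev + W["bz"])      # update gate
    r = sigmoid(W["wr"] * x + W["ur"] * h_prev + W["br"])      # reset gate
    h_cand = math.tanh(W["wh"] * x + W["uh"] * (r * h_prev) + W["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                     # new hidden state

# Arbitrary illustrative weights (assumptions, not trained values).
W = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
     "wr": 0.5, "ur": 0.1, "br": 0.0,
     "wh": 1.0, "uh": 0.5, "bh": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.1]:   # a short normalized HPC time series
    h = gru_cell(x, h, W)
print(h)
```

The GRU's two gates, versus the LSTM's three, are what make the GRU-based architecture cheaper to evaluate, as noted above.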
-
FIG. 1 is a block diagram of an example system constructed in accordance with teachings of this disclosure for detecting a side channel attack. In examples disclosed herein, a machine-learning based detection system is used to detect speculative and traditional cache side channel attacks based on changes in values of hardware performance counters of a computing system. The example system 100 of FIG. 1 includes a side channel anomaly detector 102, a processor 105, and an operating system/virtual machine manager (VMM) 110. The example processor 105 includes one or more hardware performance counter(s) 108 that are utilized by processes executing on the processor 105. The example system 100 of the illustrated example of FIG. 1 shows a benign process 112, an attack process 114, and an unknown process 116. Such processes may be executed by the example processor 105. - The
example processor 105 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), digital signal processor(s) (DSP(s)), etc. In examples disclosed herein, hardware performance counter(s) 108 included in the processor 105 include one or more registers of the processor 105 that store counts of hardware-related activities of the processor. In some examples, a set of hardware performance counters is maintained by each core of the processor. Thus, in examples where the processor 105 includes multiple cores, there may be multiple sets of hardware performance counter(s) 108. The counter(s) of the hardware performance counter(s) 108 respectively store a value corresponding to a particular type of hardware value and/or event that has occurred at the processor 105. For example, the hardware performance counter(s) 108 may include a counter to identify a number of cache misses, a counter to identify a number of branch mis-predictions, etc. In some examples, the hardware performance counter(s) 108 offered by the processor 105 may depend on the manufacturer, model, type, etc. of the processor 105. - The example OS/VMM 110 of the illustrated example of
FIG. 1 represents at least one of the operating system and/or virtual machine manager of the computing system 100. In examples disclosed herein, the OS/VMM 110 manages execution of processes by the processor 105. In some examples, the OS/VMM 110 controls isolation of the processes executed by the processor by, for example, instructing the processor to physically separate the process domains of various processes. For example, the processor 105 may, at the direction of the OS/VMM 110, physically separate (e.g., on two or more separate cores, on two or more separate CPUs, etc.) the execution space and/or memory accessible to various processes. Such separation reduces (e.g., minimizes) the shared hardware resources between the domains (process, VM, etc.) and thereby reduces (e.g., minimizes) a risk that sensitive data may be exposed. - The example
benign process 112 of the illustrated example of FIG. 1 is a process that stores sensitive information (e.g., passwords, images, documents, etc.) in a cache of the processor 105. The example attack process 114 of the illustrated example of FIG. 1 is a process that seeks to perform a side channel attack to gain access to sensitive information stored by the benign process 112. In some examples, the example attack process 114 is not a malicious process, in that the attack process 114 does not actually share the sensitive information outside of the computing system. An attack pattern may be simulated by such a non-malicious attack process without actually exposing any sensitive user information (e.g., passwords, images, documents, etc.). However, in some examples, the attack process 114 is a malicious process and may attempt to share the sensitive information outside of the computing system 100. In such examples, additional safeguards may be put in place to stop the actual sharing of sensitive information such as, for example, a firewall that prevents communications including the sensitive information from reaching their destination. - The example
unknown process 116 of the illustrated example of FIG. 1 represents a process that is not known to be a benign process or an attack (malicious or non-malicious) process. As a result, the side channel anomaly detector 102 monitors hardware performance counter values (e.g., hardware performance counter values associated with the unknown process 116), and processes such hardware performance counter values to attempt to determine whether the unknown process 116 is performing an attack. - The example side
channel anomaly detector 102 of the illustrated example of FIG. 1 includes an anomaly detection controller 120, an HPC interface 125, an HPC data organizer 126, an HPC data datastore 127, a machine learning model processor 145, a machine learning model datastore 150, a machine learning model trainer 155, an error vector generator 160, an error vector analyzer 165, and a threshold determiner 170. - The example
anomaly detection controller 120 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the anomaly detection controller 120 implements means for causing performance of a responsive action to mitigate a side channel attack. The means for causing may additionally or alternatively be implemented by a processor executing, for example, blocks 370, 390, 395, 510, and/or 690 of FIGS. 3, 5, and/or 6. The example anomaly detection controller 120 controls operation of the side channel anomaly detector 102 and interfaces with the OS/VMM 110 to identify the potential occurrence of an anomalous behavior (e.g., a side channel attack). In some examples, to facilitate training, the example anomaly detector 102 interfaces with the OS/VMM 110 to instruct the OS/VMM to execute one or more of the benign process 112 and/or the attack process 114. In some examples, the anomaly detection controller 120 compares a returned probability value to a threshold value to determine whether an anomaly (e.g., an attack) has been detected. - The
example HPC interface 125 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. The example HPC interface 125 retrieves hardware performance counter values from the hardware performance counters 108. The example HPC interface 125 provides the retrieved HPC counter values to the HPC data organizer to enable organization of the retrieved HPC data. In examples disclosed herein, retrieval of HPC values is performed at periodic monitoring intervals for a threshold amount of time (e.g., once per minute for ten minutes). - The example
HPC data organizer 126 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the HPC data organizer 126 implements means for collecting a hardware performance counter value(s). The means for collecting may additionally or alternatively be implemented by a processor executing, for example, blocks 405, 410, 415, 420, 430, 450, and/or 460 of FIG. 4. The example HPC data organizer 126 identifies one or more type(s) of HPC data to be collected and a length of a time period for which such data is to be collected. In some examples, the HPC data organizer 126 identifies a frequency at which such data is to be collected (e.g., once every minute, once every ten seconds, etc.). The example HPC data organizer 126 then collects the HPC data for each HPC type. Upon completion of the HPC data collection for each of the HPC data types, the example HPC data organizer 126 analyzes the returned data to determine whether any values are missing. Data may be missing when, for example, values for a first data type are collected at a first frequency (e.g., once every minute) while values for a second data type are collected at a second frequency different from the first frequency (e.g., once every ten seconds). Data may be considered missing when, for example, a value having a first timestamp appears in connection with a first data type, but no value having the first timestamp (or a timestamp within a threshold amount of time from the first timestamp) appears within a second data type. If any data points are missing, the example HPC data organizer 126 imputes missing values to fill in those data points missing from the HPC data.
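- As a concrete illustration of this gap-filling step, the short sketch below imputes missing readings with the mean of the observed values (median filling would be analogous) and pads shorter series to a common length. The counter names and values are hypothetical.

```python
def impute_missing(series):
    """Replace missing readings (None) with the mean of observed values."""
    observed = [v for v in series if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]

def pad_to_length(series, length, value=0.0):
    """Pad a shorter time series so all series share an equal time length."""
    return series + [value] * (length - len(series))

# Hypothetical counters: cache misses sampled every 10 s, branch
# mispredictions sampled every 30 s, so two of every three slots are missing.
cache_misses = [40.0, 42.0, 41.0, 90.0, 43.0, 40.0]
branch_misses = [7.0, None, None, 8.0, None, None]

branch_misses = impute_missing(branch_misses)
print(branch_misses)                 # -> [7.0, 7.5, 7.5, 8.0, 7.5, 7.5]
print(pad_to_length([1.0, 2.0], 4))  # -> [1.0, 2.0, 0.0, 0.0]
```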
In examples disclosed herein, the example HPC data organizer 126 imputes the missing values using, for example, average values, median values, etc. In some examples, if the time-series data is of different lengths, padding can be used to achieve equal time length. - The example HPC data datastore 127 of the illustrated example of
FIG. 1 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, solid state memory, hard drive(s), thumb drive(s), etc. Furthermore, the data stored in the example HPC data datastore 127 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc. While, in the illustrated example, the HPC data datastore 127 is illustrated as a single device, the example HPC data datastore 127 and/or any other data storage devices described herein may be implemented by any number and/or type(s) of memories. In the illustrated example of FIG. 1, the example HPC data datastore 127 stores HPC data organized by the HPC data organizer 126. In some examples, the HPC data datastore 127 may store HPC data created by HPC data organizer(s) of another side channel anomaly detector 102. That is, HPC data may be generated by one computing system and supplied to another computing system to facilitate operation thereof. In examples disclosed herein, HPC data in the HPC data datastore 127 is labeled according to whether the HPC data represents benign activity, attack activity, and/or other types of activities. - The example machine
learning model processor 145 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the machine learning model processor 145 implements means for predicting a value using a machine learning model. The means for predicting may additionally or alternatively be implemented by a processor executing, for example, blocks 530, 610, 670, and/or 710 of FIGS. 5, 6, and/or 7. The example machine learning model processor 145 implements a machine learning model (e.g., a neural network) according to the model information stored in the model datastore 150. The example machine learning model implements one or more stacked GRU(s). However, any other past, present, and/or future machine learning topology(ies) and/or architecture(s) may additionally or alternatively be used such as, for example, a deep neural network (DNN), a convolutional neural network (CNN), a feed-forward neural network, a long short-term memory (LSTM) recurrent neural network (RNN), etc. - The example machine learning model datastore 150 of the illustrated example of
FIG. 1 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, solid state memory, hard drive(s), thumb drive(s), etc. Furthermore, the data stored in the example model data store 150 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, etc. While, in the illustrated example, the model data store 150 is illustrated as a single device, the example model data store 150 and/or any other data storage devices described herein may be implemented by any number and/or type(s) of memories. In the illustrated example of FIG. 1, the example model data store 150 stores machine learning models trained by the machine learning model trainer 155. In some examples, the model(s) stored in the example model data store 150 may be retrieved from another computing system (e.g., a server that provides the model(s) to the side channel anomaly detector 102). - The example machine
learning model trainer 155 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the example machine learning model trainer 155 implements means for training a machine learning model. The means for training may additionally or alternatively be implemented by a processor executing, for example, block 520 of FIG. 5. The example machine learning model trainer 155 performs training of the model stored in the model data store 150. In examples disclosed herein, training is performed using Stochastic Gradient Descent. However, any other approach to training a machine learning model may additionally or alternatively be used. - The example
error vector generator 160 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the error vector generator 160 implements means for generating an error vector. The means for generating may additionally or alternatively be implemented by a processor executing, for example, blocks 560, 615, 635, 675, and/or 720 of FIGS. 5, 6, and/or 7. The example error vector generator 160 generates an error vector et. The error vector et represents a difference between the predicted time-series HPC data and actual time-series HPC data. In examples disclosed herein, the error vector et is calculated using the following equation: -
et = (et1, . . . , etl) = |predicted(x) − actual(x)|   (Equation 1) - However, any other approach to computing an error vector may additionally or alternatively be used.
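The error-vector pipeline described here (the element-wise absolute difference of Equation 1, the multivariate Gaussian error model fit by maximum likelihood estimation, and a detection threshold placed between typical benign and attack probabilities) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; the function names and the use of log-densities in place of raw probabilities are assumptions.

```python
import numpy as np

def error_vector(predicted, actual):
    """Equation 1: element-wise absolute difference between the l
    predicted forward-looking HPC values and the l observed values."""
    return np.abs(np.asarray(predicted, float) - np.asarray(actual, float))

def fit_error_model(errors):
    """MLE fit of a multivariate Gaussian N(mu, Sigma) over rows of
    training-time error vectors: mu is the d-dimensional mean and
    Sigma the covariance matrix."""
    errors = np.asarray(errors, float)
    mu = errors.mean(axis=0)
    sigma = np.cov(errors, rowvar=False, bias=True)  # bias=True -> MLE estimate
    return mu, sigma

def log_likelihood(e, mu, sigma):
    """Log-density of an error vector under N(mu, Sigma); unusually low
    values suggest an anomalous (attack-like) observation."""
    d = len(mu)
    diff = np.asarray(e, float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.inv(sigma) @ diff)

def select_threshold(p_benign, p_attack):
    """Pick tau intermediate a typical benign probability and a typical
    attack probability (here the midpoint, as in the example)."""
    return (p_benign + p_attack) / 2.0
```

In use, `fit_error_model` would be run once on error vectors collected during benign operation, and `log_likelihood` evaluated on each new error vector at inference time.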
- The example
error vector analyzer 165 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the example error vector analyzer 165 implements means for determining a probability. The means for determining may additionally or alternatively be implemented by a processor executing, for example, blocks 570, 620, 640, 680, and/or 730 of FIGS. 5, 6, and/or 7. The example error vector analyzer 165 creates an error model representing the error vector et. In examples disclosed herein, the error vector is modeled as a multivariate Gaussian distribution parameterized by N(μ,Σ). In examples disclosed herein, the error model parameters are determined by the error vector analyzer 165 using a multivariate Gaussian distribution via maximum likelihood estimation (MLE). However, any other approach to selecting the error model parameters may additionally or alternatively be used. The parameter μ represents a d-dimensional mean, and the parameter Σ represents a covariance matrix. Such parameters can later be used to determine a probability of observing a particular error vector. That is, using a later-computed error vector and the error model parameters, the example error vector analyzer 165 can generate a probability of whether an anomaly (e.g., an attack) has been detected. - The
example threshold determiner 170 of the illustrated example of FIG. 1 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), ASIC(s), PLD(s), FPLD(s), programmable controller(s), GPU(s), DSP(s), etc. In this example, the threshold determiner 170 implements means for selecting. The means for selecting may additionally or alternatively be implemented by any of the structure identified above for implementing the example threshold determiner 170. The example threshold determiner 170 selects a threshold τ that is used to determine whether the probability value computed by the error vector analyzer 165 represents an anomaly or not. In examples disclosed herein, the threshold determiner 170 selects the threshold based on a first probability associated with benign HPC data and a second probability associated with attack HPC data to reduce false positives and false negatives. In examples disclosed herein, the threshold is selected such that it is intermediate the first probability and the second probability (e.g., the mean of the first probability and the second probability). However, any other approach for selecting the threshold may additionally or alternatively be used. -
FIG. 2 is a block diagram of an example Gated Recurrent Unit (GRU) 201 used to detect a side channel attack. The example GRU 201 of FIG. 2 accepts inputs xt 202 and ht-1 203, and outputs ht 204 and yt 205. The input xt 202 represents the current state (e.g., a value from the HPC data), and ht-1 203 represents a hidden state extracted from a previous cell (e.g., another GRU in a multi-GRU stack). The example GRU 201 includes an r function 210 (e.g., a reset gate vector), a z function 220 (e.g., an update gate vector), and an ht function 230. The example GRU includes a first Hadamard product function 240, a second Hadamard product function 250, a third Hadamard product function 260, and a pairwise matrix addition function 270. For the GRU 201, assuming the input at time t is xt, then the following equations hold: -
- z = σ(Wz·[ht-1, xt])
r = σ(Wr·[ht-1, xt])
h′ = tanh(W·[r⊙ht-1, xt])
ht = z⊙ht-1 + (1−z)⊙h′   (Equation 2)
- In equation 2, ht represents the hidden state. In equation 2, z⊙ht-1 represents the forgetting of hidden state information, while (1−z)⊙h′ represents the remembrance of information from current nodes. In equation 2, Wz and Wr represent weighting values that are selected via training. Thus, ht forgets some information from the previous state ht-1 and includes information from the current node. While in the illustrated example of
FIG. 2, a single GRU is shown, multiple GRUs may be stacked together to provide a corresponding number of forward-looking predicted values. In such an example stacking architecture, hidden values are passed from one GRU to the next. That is, the output ht of a first GRU is used as the input ht-1 of a second GRU. In examples disclosed herein, the stacked GRUs are connected via a fully connected hidden layer through feedforward connections. - While an example manner of implementing the side
channel anomaly detector 102 is illustrated in FIG. 1, one or more of the elements, processes and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example anomaly detection controller 120, the example HPC interface 125, the example HPC data organizer 126, the example machine learning model processor 145, the example machine learning model trainer 155, the example error vector generator 160, the example error vector analyzer 165, the example threshold determiner 170, and/or, more generally, the example side channel anomaly detector 102 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example anomaly detection controller 120, the example HPC interface 125, the example HPC data organizer 126, the example machine learning model processor 145, the example machine learning model trainer 155, the example error vector generator 160, the example error vector analyzer 165, the example threshold determiner 170, and/or, more generally, the example side channel anomaly detector 102 of FIG. 1 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)).
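Returning to the GRU of FIG. 2, the gate computations of Equation 2 can be sketched numerically. This is a minimal NumPy sketch under stated assumptions: gates act on the concatenation [ht-1, xt], biases are omitted, and the weight shapes are illustrative rather than taken from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step per Equation 2: h_t = z*h_prev + (1 - z)*h'.
    W_z / W_r / W_h are the update-gate, reset-gate, and candidate
    weight matrices (illustrative shapes; biases omitted)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                      # update gate vector (z function 220)
    r = sigmoid(W_r @ hx)                      # reset gate vector (r function 210)
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    return z * h_prev + (1.0 - z) * h_cand     # Equation 2

# Stacking: the hidden output of one GRU feeds the next as h_{t-1}.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W_z, W_r, W_h = (rng.standard_normal((d_h, d_h + d_in)) for _ in range(3))
h = gru_cell(rng.standard_normal(d_in), np.zeros(d_h), W_z, W_r, W_h)
```

Because ht is a convex combination of ht-1 and a tanh-bounded candidate, the hidden state stays bounded, which is the "forget some, remember some" behavior described above.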
When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example anomaly detection controller 120, the example HPC interface 125, the example HPC data organizer 126, the example machine learning model processor 145, the example machine learning model trainer 155, the example error vector generator 160, the example error vector analyzer 165, the example threshold determiner 170, and/or, more generally, the example side channel anomaly detector 102 of FIG. 1 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware. Further still, the example side channel anomaly detector 102 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 1, and/or may include more than one of any or all of the illustrated elements, processes and devices. As used herein, the phrase "in communication," including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events. - Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example side
channel anomaly detector 102 of FIG. 1 are shown in FIGS. 3, 4, 5, 6, and/or 7. The machine readable instructions may be an executable program or portion of an executable program for execution by a computer processor such as the processor 812 shown in the example processor platform 800 discussed below in connection with FIG. 8. The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 812, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 812 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowchart(s) illustrated in FIGS. 3, 4, 5, 6, and/or 7, many other methods of implementing the example side channel anomaly detector 102 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware. - As mentioned above, the example processes of
FIGS. 3, 4, 5, 6 , and/or 7 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. - “Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
-
FIG. 3 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector 102 of FIG. 1. The example process 300 of FIG. 3 includes an initialization phase 305 and an inference phase 350. The example process 300 of FIG. 3 begins when the anomaly detection controller 120 is initialized. Such initialization may occur, for example, upon startup of the example computing system 100 of FIG. 1, at the direction of a user, etc. The example anomaly detection controller 120 enters the training phase 305, where the example anomaly detection controller 120 gathers time-series HPC data for benign activity. (Block 310 a). That is, time-series HPC data is collected while an attack is not being performed. In some examples, prior to collecting the benign HPC data, the example anomaly detection controller 120 causes the OS/VMM 110 to execute the benign process 112, and causes the execution of the benign process 112 to be terminated upon completion of the collection of the HPC data. An example process for collecting the time-series HPC data is described below in connection with FIG. 4. The time-series HPC data for benign operation is stored in the HPC data datastore 127. - The example
anomaly detection controller 120 gathers time-series HPC data for attack activity. (Block 310 b). That is, time-series HPC data is collected while an attack is being performed (e.g., by running a non-malicious attack process 114). For example, prior to collecting the attack HPC data, the example anomaly detection controller 120 causes the OS/VMM 110 to execute the attack process 114, and causes the execution of the attack process 114 to be terminated upon completion of the collection of the attack HPC data. In some examples, an attack process 114 is executed to simulate an attack (e.g., a side channel attack). In such an example, the attack process 114 is not a malicious process, in that the attack process 114 does not actually share the sensitive information outside of the computing system. In this manner, the attack may be simulated without actually exposing any sensitive user information (e.g., passwords, images, documents, etc.). However, in some examples, the example attack process 114 may be a malicious process and may attempt to share the sensitive information outside of the computing system 100. In such examples, additional safeguards may be put in place to prevent the actual sharing of sensitive information such as, for example, a firewall that prevents communications including the sensitive information from reaching their destination. - In some examples, instead of collecting the time-series HPC data for an attack, prior time-series HPC data may be gathered (e.g., time-series HPC data identified in connection with a prior attack). Such prior time-series HPC attack data may be retrieved from, for example, the HPC data datastore 127, from an external resource (e.g., a remote side channel anomaly detector), etc. In examples disclosed herein, the time-series HPC data for the attack is not used for training the machine learning model but is, instead, used for selecting anomaly detection thresholds (e.g., to reduce the number of false positives).
That is, a machine learning model is trained without utilizing attack time-series HPC data. As a result, anomaly detection may be performed without use of the attack HPC data. In such an example, detection of an anomaly may use a threshold (e.g., a pre-determined threshold) which, in some examples, may be more prone to false positives and/or false negatives than an anomaly detection threshold based on time-series attack HPC data.
- In examples disclosed herein, the benign HPCs are considered to be normal operation. During training, collected time-series HPC data (e.g., the benign HPC data and the attack HPC data) are split into smaller data sets. (Block 315). The benign HPCs of the illustrated example are split into four sets including a benign training set, two benign validation sets, and one benign test set. The attack HPCs of the illustrated example are divided into two sets, including an attack validation set and an attack test set. However, the time-series HPC data may be split into any number of smaller data sets for training and/or validation purposes. As noted above, the attack data is not used for training of the machine learning model, but is instead used for determination of an anomaly detection threshold.
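The splitting of collected time-series data described above (four benign sets, two attack sets) can be sketched as follows. This is an illustrative sketch only: the equal-sized splits are an assumption, since the source does not specify ratios.

```python
import numpy as np

def split_benign(samples):
    """Four-way split of benign time-series samples: one training set,
    two validation sets, and one test set (assumed equal sizes)."""
    train, val1, val2, test = np.array_split(np.asarray(samples), 4)
    return {"train": train, "val1": val1, "val2": val2, "test": test}

def split_attack(samples):
    """Attack samples are not used for training; they feed threshold
    selection only: a validation set and a test set (assumed 50/50)."""
    samples = np.asarray(samples)
    half = len(samples) // 2
    return {"val": samples[:half], "test": samples[half:]}
```

The benign training set would feed model training, while the attack validation set (together with benign sets) would feed selection of the anomaly detection threshold.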
- The example side
channel anomaly detector 102 trains a machine learning model using the benign time-series HPC data. (Block 320). The trained model is stored in the machine learning model datastore 150 for future use. An example process for training the machine learning model and the anomaly detection thresholds is described below in connection with the illustrated example of FIG. 5. In this example, a machine learning model implemented using a stacked gated recurrent unit (GRU) is trained to create predictions of forward-looking HPC counter values. Those predictions are used to calculate an error vector representing deviations in the predicted values from the actual values (e.g., actual values included in the time-series HPC data). The error vector is then used to determine parameters including, for example, a d-dimensional mean and a covariance matrix. Such parameters are used to determine subsequent probabilities of observing a particular error vector. - The example side
channel anomaly detector 102 determines an anomaly detection threshold τ. (Block 330). In examples disclosed herein, the anomaly detection threshold represents a threshold probability that collected HPC data represents benign data. An example process for determining the anomaly detection threshold is described below in connection with the illustrated example of FIG. 6. In this example, one or more of the benign data sets (e.g., the first benign validation data set, the second benign validation data set, the benign test set, etc.) and the attack validation set are used to determine a value for the anomaly detection threshold that reduces a number of false positives in the benign test set. In some examples, the attack test set is used to determine a false positive rate. - Once training is complete, the example side
channel anomaly detector 102 enters the operational phase 350. The example side channel anomaly detector 102 gathers time-series HPC data. (Block 310 c). The gathered time-series HPC data represents live operations of the computing system and can be used in connection with the trained machine learning model and determined anomaly detection threshold to determine whether an anomaly is detected. The example side channel anomaly detector 102 performs anomaly detection using the trained machine learning model and, using a result of the machine learning model, determines a probability (referred to herein as a p-value) of the time-series HPC data being benign. (Block 360). An example approach to performing such anomaly detection is described in further detail in connection with FIG. 7, below. The example p-value produced by the side channel anomaly detector 102 represents a similarity of the collected time-series HPC data and benign time-series HPC data that can be used to determine if the collected HPC data is more similar to an attack operation or benign operation. In examples disclosed herein, p-values and their corresponding thresholds are created on a scale of zero to one. However, any other scale or nomenclature for representing a similarity may additionally or alternatively be used. - Using the returned probability value, the example
anomaly detection controller 120 determines whether an anomaly has been detected. (Block 370). In examples disclosed herein, the anomaly is detected when the p-value is less than the anomaly detection threshold τ. In response to the detection of the anomaly signifying potential onset or incidence of a cache side channel attack (block 370 returning a result of YES), the example anomaly detection controller 120 implements one or more responsive actions (e.g., error handling techniques) to further analyze and/or mitigate such side channel attacks. (Block 390). - For example, the
anomaly detection controller 120 may inform the corresponding system software (OS/VMM) 110 of the detected anomaly through available inter-process communication and/or other communication approaches (e.g., flags, interrupts, etc.). In some examples, additional information such as, for example, attacker and/or victim domain identifiers (e.g., process identifiers and/or virtual machine identifiers of the process suspected to be under attack, process identifiers and/or virtual machine identifiers of the process suspected to be performing an attack) are identified in the HPC data and, as such, the OS/VMM 110 is notified of that information as well. In some examples, such information is obtained by a runtime environment and/or scheduler of the OS/VMM 110. Such information enables the domains (e.g., an attack domain and a victim domain) to be physically separated (e.g., on two separate cores, on two separate CPUs) by the scheduler of the OS/VMM 110. Such separation reduces (e.g., minimizes) the shared hardware resources between the two domains (process, VM, etc.) and thereby reduces (e.g., minimizes) a risk that sensitive data may be exposed. - In some examples, the
anomaly detection controller 120 informs the OS/VMM 110 about potential onset of the side channel attack. The OS/VMM 110 can enable one or more architectural feature(s) that defend against cache side channel attacks. Such architectural features may be disabled by default to avoid performance costs, but may be enabled in situations where the potential onset of such an attack is detected. Such architectural features may include, for example, cache partitioning through cache allocation technology in a last level cache (LLC) of that CPU, activating memory tagging based capabilities for Level 1-Instruction (L1-I) and/or Level 1-Data (L1-D) caches, limiting speculation of memory accesses across domains, activating flushing of at least the L1-I/D caches across context switches, etc. - In some examples, the performance of the responsive action involves further analysis to determine whether a side channel attack (or a particular phase thereof) is being performed. That is, the detection/identification disclosed above in connection with
FIG. 3 may be used as a first level of screening. For example, more resource-intensive analysis of the histogram(s), statistics of the histogram(s), etc. may additionally be performed. For example, further processing of the time-series HPC data may be performed using more computationally intensive techniques such as, for example, using a machine learning model implemented using a long short-term memory (LSTM) recurrent neural network (RNN). As such, further responsive actions may be performed based on a result of the more computationally intensive techniques. In some examples, the potential attacker process is sandboxed (through methods guaranteed to be side channel attack safe) by the OS/VMM 110 and more extensive monitoring is applied to the activities performed by the process such as, for example, trace-based profiling, dynamic binary-instrumentation based checks, etc. - Returning to block 370, if the example
anomaly detection controller 120 determines that no anomaly is detected, the example anomaly detection controller 120 determines whether any re-training is to occur. (Block 395). In some examples, such re-training may occur in parallel with ongoing monitoring. That is, training may occur in an online fashion. In some examples, regularization is imposed to penalize false positives through, for example, a feedback loop. For example, as the anomaly detection controller 120 produces anomaly predictions, subsequent training can be performed using information identifying whether the detected anomaly was truly an anomaly. For example, after a threshold number of false positives are detected (e.g., block 395 returns a result of YES), further training may be performed (e.g., control may return to block 320 for further training utilizing additional information concerning the false positives). In effect, such further training serves to reduce the number of false positives. In addition, false negatives may also be reduced. If no retraining is to be performed (e.g., block 395 returns a result of NO), control proceeds to block 310 c, where further monitoring is performed. - While in the illustrated example of
FIG. 3 , a single threshold is used to determine whether an anomaly has been detected, in some examples, multiple thresholds may be used. For example, if the p-value is less than or equal to a first threshold (e.g., indicating an anomaly), a responsive action may be performed; if the p-value is greater than the first threshold and less than or equal to a second threshold (e.g., indicating a potential anomaly), further analysis may be performed (and a responsive action may be performed if the further analysis identifies that an anomaly has actually occurred); and finally, if the p-value is greater than the second threshold (e.g., indicating no anomaly), no action is taken. -
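The two-threshold decision just described can be sketched as a small helper. The return labels are illustrative, not terminology from the source.

```python
def classify(p_value, tau1, tau2):
    """Two-threshold anomaly decision:
    p <= tau1            -> anomaly: take a responsive action,
    tau1 < p <= tau2     -> potential anomaly: analyze further,
    p > tau2             -> no anomaly: take no action.
    Assumes tau1 < tau2, both on the zero-to-one p-value scale."""
    if p_value <= tau1:
        return "respond"
    if p_value <= tau2:
        return "analyze"
    return "no_action"
```

With a single threshold, the decision collapses to the p < τ test of block 370; the second threshold adds a middle band reserved for more expensive follow-up analysis.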
FIG. 4 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to gather time-series Hardware Performance Counter (HPC) data. In examples disclosed herein, the collection of the time-series HPC is considered a D-dimensional tensor (a tensor as used herein is defined to be a multi-dimensional array), where D is the number of HPCs to be collected. Each HPC produces a data matrix of size Mn×d, where n is the number of time-series samples, and d is the number of values collected during the time period. Take, for example, a cache miss value. If ten benign workloads were executed independently five times each, and a spectre-kernel-read attack were executed twenty independent times, then there would be seventy time-series samples (10×5+1×20=70) to be collected. Further, if each cache-miss value were sampled every 500 milliseconds for ten minutes (e.g., twelve hundred samples per time-series), the resultant matrix would include eighty-four thousand cache miss data values. In operation, it is expected that many other types of HPC values may additionally be collected. In some examples, such HPC values are sampled at different rates. If the time-series are sampled at different frequencies, example approaches disclosed herein impute missing values using, for example, average values, median values, etc. In some examples, if the time-series data is of different lengths, padding can be used to achieve equal time length. - The example process of
FIG. 4 begins when the example HPC data organizer 126 identifies types of HPC data to be collected, and a length of a time period for which such data is to be collected. (Block 405). In some examples, the HPC data organizer 126 identifies a frequency at which such data is to be collected. The example HPC data organizer 126 then collects the HPC data for each HPC type at the corresponding rate. (Block 410). In the illustrated example of FIG. 4, the example HPC data organizer 126 collects HPC data for different types of HPC data in parallel (represented by blocks 411 and 412). However, in some examples, the collection of the HPC data for the other types of HPC data may be performed serially. To collect the HPC data, the example HPC data organizer 126 requests the HPC value from the processor using a type index (e.g., an index identifying the type of data to be retrieved) via the HPC interface 125. (Block 415). The example HPC data organizer 126 adds a timestamp to the retrieved HPC data and stores the retrieved HPC data (and timestamp) in the example HPC data datastore 127. (Block 420). In some examples, the timestamp value may be omitted. - The example
HPC data organizer 126 then waits an amount of time according to the rate at which the HPC data is to be collected. (Block 425). The example HPC data organizer 126 determines whether collection of the HPC data is complete. (Block 430). The example HPC data organizer 126 may determine that collection of the HPC data is complete when the length of time to collect HPC data has elapsed (e.g., from the execution of the first iteration of block 415). In some examples, the data collection is considered complete when a threshold number of samples (e.g., a number of samples based on the length of time to collect HPC data and the sampling frequency) has been reached. If data collection is not complete (e.g., block 430 returns a result of NO), control returns to block 415, where the process of blocks 415 through 430 is repeated until block 430 determines that data collection is complete. - Upon completion of the collection of the HPC data for each of the HPC data types (e.g., upon completion of
blocks 411 and 412), the example HPC data organizer 126 analyzes the returned data to determine whether any values are missing. (Block 450). Data may be missing when, for example, values for a first data type are collected at a first frequency while values for a second data type are collected at a second frequency different from the first frequency. Data may be considered missing when, for example, a value having a first timestamp appears in connection with a first data type, but no value having the first timestamp (or a timestamp within a threshold amount of time from the first timestamp) is present within a second data type. If any data points are missing, the example HPC data organizer 126 imputes missing values to fill in those data points missing from the HPC data. (Block 460). In examples disclosed herein, the example HPC data organizer 126 imputes the missing values using, for example, average values, median values, etc. In some examples, if the time-series data is of different lengths, padding can be used to achieve equal time length. Upon completion of the missing value imputation (block 460), or upon determination that there are no missing values in the HPC data (e.g., block 450 returning a result of NO), the example process of FIG. 4 terminates. The example process 310 of FIG. 4 may be re-executed in response to, for example, a request to collect further HPC data. -
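The mean-based imputation of block 460 can be sketched as follows, with missing data points represented as NaN; the helper name is illustrative, and a median fill would work the same way.

```python
import numpy as np

def impute_missing(series):
    """Fill gaps (NaNs) in a collected HPC time-series with the series
    mean, as one of the imputation options described for block 460."""
    series = np.asarray(series, dtype=float)
    return np.where(np.isnan(series), np.nanmean(series), series)
```

For series of different lengths, the same NaN markers could first be appended as padding and then filled, so every series reaches equal time length.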
FIG. 5 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to train a machine learning model on time-series HPC data. The example process 320 of FIG. 5 begins when the example anomaly detection controller 120 identifies a number of forward-looking values to be predicted by the trained machine learning model. (Block 510). - The example machine
learning model trainer 155, in connection with the example machine learning model processor 145, trains one or more models based on the benign training set to produce the identified number of forward-looking values. (Block 520). In examples disclosed herein, the machine learning model is implemented using stacked gated recurrent units (GRUs). Stacked GRUs capture the structure of time-series data (e.g., time-series HPC data). Given a time series X = {x1, x2, . . . , xn}, where each xi is a value of the HPC at a specific sampling time, the stacked GRU is trained to predict the next l forward-looking values of HPCs. - While stacked GRUs are used in the examples disclosed herein, any other type of machine learning model may additionally or alternatively be used such as, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) neural network, etc. During training, the example machine
learning model trainer 155 updates the model(s) stored in the model datastore 150 to reduce an amount of error generated by the example machine learning model processor 145 when using input HPC data to attempt to predict the number of forward-looking values. In examples disclosed herein, training is performed using Stochastic Gradient Descent. However, any other approach to training a machine learning model may additionally or alternatively be used. - The example machine
learning model processor 145 tests the machine learning model using the first benign validation set (e.g., the first benign validation set created at block 315). (Block 530). To perform the testing, the final l values are omitted from the first benign validation set and are used to determine whether the machine learning model processor 145 properly predicted the final l values. The example machine learning model trainer 155 calculates an accuracy between the predicted l values and the actual l values (e.g., the values omitted from the first benign validation set). The example machine learning model trainer 155 compares the calculated accuracy to an accuracy threshold. (Block 540). If the threshold accuracy is not satisfied (e.g., the calculated accuracy does not meet the accuracy threshold, block 540 returns a result of NO), control returns to block 520, where further training is performed. If the threshold accuracy is satisfied (e.g., the calculated accuracy is greater than or equal to the accuracy threshold, block 540 returns a result of YES), the example machine learning model processor 145, using the model trained at block 520, processes a first portion of the second benign validation set data to predict the l next values appearing in a second portion of the second benign validation set. (Block 550). The example error vector generator 160 generates an error vector et. (Block 560). The error vector et represents the difference between the predicted time-series HPC data and the second portion of the captured time-series HPC data. In examples disclosed herein, the error vector et is calculated using the following equation: -
e_t = (e_t^1, . . . , e_t^l) = |predicted(x) − actual(x)|     Equation 3 - The example
error vector analyzer 165 then creates an error model representing the error vector et. (Block 570). In examples disclosed herein, the error vector is modeled as a multivariate Gaussian distribution parameterized by N(μ,Σ). In examples disclosed herein, the error model parameters are determined by fitting a multivariate Gaussian distribution via maximum likelihood estimation (MLE). However, any other approach to selecting the error model parameters may additionally or alternatively be used. The parameter μ represents a d-dimensional mean, and the parameter Σ represents a covariance matrix. Such parameters can later be used to determine a probability of observing a particular error vector (e.g., during the testing described below in connection with FIG. 7). - The
example process 320 of FIG. 5 then terminates. At this point, the monitoring and detection described in connection with the inference phase 350 of FIG. 3 could be performed, producing a probability of observing an error that is compared against an error threshold. However, at this point the anomaly detection threshold has not yet been determined. As noted above in connection with block 330 of FIG. 3, the example threshold determiner 170 determines an anomaly detection threshold that can be used in connection with returned probability values to determine whether an anomaly has been detected. -
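The stacked-GRU prediction of FIG. 5, the error vector of Equation 3, and the MLE fit of N(μ,Σ) at block 570 can be sketched as follows. This is an illustrative approximation only: the layer depth, hidden size, random (untrained) weights, and stand-in validation errors are assumptions, and the sketch shows the forward pass and error-model fit rather than the full training loop.

```python
# Minimal stacked-GRU forward pass over a scalar HPC series, followed by
# the Equation 3 error vector and the N(mu, Sigma) error model fit by MLE.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_layer(xs, d_in, d_h):
    """Run one GRU layer over a sequence xs of (d_in,) vectors."""
    Wz, Wr, Wc = (rng.normal(0.0, 0.2, (d_h, d_in)) for _ in range(3))
    Uz, Ur, Uc = (rng.normal(0.0, 0.2, (d_h, d_h)) for _ in range(3))
    h = np.zeros(d_h)
    states = []
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)          # update gate
        r = sigmoid(Wr @ x + Ur @ h)          # reset gate
        c = np.tanh(Wc @ x + Uc @ (r * h))    # candidate state
        h = (1.0 - z) * h + z * c
        states.append(h)
    return states

def stacked_gru_predict(series, l, d_h=8):
    """Map a scalar series through two stacked GRU layers and a linear
    readout to produce l forward-looking values (weights untrained)."""
    xs = [np.array([v]) for v in series]
    h1 = gru_layer(xs, 1, d_h)
    h2 = gru_layer(h1, d_h, d_h)
    W_out = rng.normal(0.0, 0.2, (l, d_h))
    return W_out @ h2[-1]

l = 2
series = [0.1, 0.4, 0.2, 0.5, 0.3, 0.6]   # toy HPC samples x1..xn
predicted = stacked_gru_predict(series, l)
actual = np.array([0.4, 0.7])             # the next l observed samples

# Equation 3: the error vector is the elementwise absolute prediction error.
e_t = np.abs(predicted - actual)

# Block 570: fit the error model N(mu, Sigma) by maximum likelihood over a
# batch of validation error vectors (stand-in data here).
validation_errors = np.abs(rng.normal(0.0, 0.1, (50, l)))
mu = validation_errors.mean(axis=0)                         # d-dimensional mean
Sigma = np.cov(validation_errors, rowvar=False, bias=True)  # MLE covariance (1/n)
```

In practice the GRU weights would be learned (e.g., by stochastic gradient descent against the forward-looking targets) before the error model is fit.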
FIG. 6 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector 102 of FIG. 1 to determine an anomaly detection threshold. In examples disclosed herein, the processing of FIG. 6 utilizes the attack validation set, the benign test set, and the attack test set determined in connection with block 315 of FIG. 3. The example process 330 of FIG. 6 begins when the example machine learning model processor 145 processes a first portion of the attack validation set using the trained machine learning model (e.g., the model trained at block 520) to generate a forward-looking prediction. (Block 610). The error vector generator 160 compares the forward-looking prediction to a second portion of the attack validation set to generate an attack validation error vector. (Block 615). Using the attack validation error vector and the error model parameters N(μ,Σ) determined in connection with block 570 of FIG. 5, the example error vector analyzer 165 calculates a first probability of detecting an anomaly (e.g., an attack). (Block 620). - The example machine
learning model processor 145 processes a first portion of the benign test set using the trained machine learning model (e.g., the model trained at block 520) to generate a forward-looking prediction. (Block 630). The error vector generator 160 then compares the forward-looking prediction to a second portion of the benign test set to generate a benign error vector. (Block 635). Using the benign error vector and the error model parameters N(μ,Σ) determined in connection with block 570 of FIG. 5, the example error vector analyzer 165 calculates a second probability of detecting an anomaly (e.g., an attack). (Block 640). In examples disclosed herein, the first probability (based on the attack validation set) is expected to be less than the second probability (based on the benign test set). - The
example threshold determiner 170 selects a threshold τ based on the first probability and the second probability to reduce false positives and false negatives. (Block 660). In examples disclosed herein, the threshold is selected such that it lies between the first probability and the second probability (e.g., the mean of the first probability and the second probability). However, any other approach for selecting the threshold may additionally or alternatively be used. In the illustrated example of FIG. 6, the process 330 continues to determine a false positive rate (blocks 670 through 690). However, in some examples, the process 330 of FIG. 6 terminates after selection of the threshold τ. - To determine the false positive rate, the example machine
learning model processor 145 processes a first portion of the attack test set using the trained machine learning model (e.g., the model trained at block 520) to generate a forward-looking prediction. (Block 670). The example error vector generator 160 then compares the forward-looking prediction to a second portion of the attack test set to generate an attack test error vector. (Block 675). Using the attack test error vector and the error model parameters N(μ,Σ) determined in connection with block 570 of FIG. 5, the example error vector analyzer 165 calculates a third probability of detecting an anomaly (e.g., an attack). (Block 680). The example anomaly detection controller 120 compares the third probability to the threshold τ to determine the false positive rate. (Block 690). In some examples, the threshold τ may be adjusted to a new value if, for example, the false positive rate is greater than an acceptable rate of false positives. In some other examples, the false positive rate is reported to a user and/or administrator of the computer system 100. -
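The threshold selection of FIG. 6 can be sketched as follows. The likelihood values are illustrative stand-ins, not measured data; the midpoint rule is the mean-based example named in the text, and other selection rules are equally permitted.

```python
# Sketch of threshold selection (Block 660) and threshold evaluation
# (Blocks 670-690).
def select_threshold(p_attack, p_benign):
    """Place tau intermediate the two probabilities (here, their mean)."""
    return (p_attack + p_benign) / 2.0

def flag_rate(probabilities, tau):
    """Fraction of samples whose likelihood under the benign error model
    falls below tau (i.e., the fraction flagged as anomalous)."""
    return sum(1 for p in probabilities if p < tau) / len(probabilities)

# Stand-in likelihoods from Blocks 620 and 640: the benign error model
# assigns low likelihood to attack-validation errors and high likelihood
# to benign-test errors.
p_attack_validation = 0.02
p_benign_test = 0.80

tau = select_threshold(p_attack_validation, p_benign_test)  # ~0.41

# Score attack-test likelihoods against tau; here 3 of 4 fall below it.
attack_test_likelihoods = [0.01, 0.05, 0.50, 0.03]
rate = flag_rate(attack_test_likelihoods, tau)
```

As in block 690, the resulting rate can be compared against an acceptable rate, and τ adjusted if needed.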
FIG. 7 is a flowchart representative of example machine readable instructions which may be executed to implement the example side channel anomaly detector of FIG. 1 to perform anomaly detection using the machine learning model and anomaly detection threshold against time-series HPC data. The example process 360 of FIG. 7 is performed using captured HPC data gathered in connection with block 310 c of FIG. 3. Using the captured HPC data and the machine learning model trained in connection with block 320 of FIG. 3, the example machine learning model processor 145 processes a first portion of the HPC data to predict the l next values appearing in the second portion of the HPC data. (Block 710). The example error vector generator 160 generates an error vector et. (Block 720). The error vector et represents the difference between the predicted time-series HPC data and the second portion of the captured time-series HPC data. Using the error model parameters N(μ,Σ) determined in connection with block 570 of FIG. 5, the example error vector analyzer 165 calculates a probability of observing the error vector. (Block 730). The example error vector analyzer 165 returns the probability of having detected an anomaly to the anomaly detection controller 120 (Block 740), enabling the anomaly detection controller 120 at block 370 of FIG. 3 to determine whether an anomaly has been detected. -
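The inference-time scoring of FIG. 7 can be sketched as follows: an observed error vector is scored under the benign error model N(μ,Σ), and an anomaly is flagged when the density falls below the threshold τ. The values of μ, Σ, and τ here are illustrative stand-ins for the parameters produced by the training and threshold-determination phases.

```python
# Sketch of the inference step (Blocks 710-740 and Block 370).
import numpy as np

def gaussian_density(e, mu, Sigma):
    """Multivariate normal density of error vector e under N(mu, Sigma)."""
    d = len(mu)
    diff = e - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ inv @ diff) / norm)

def detect_anomaly(e, mu, Sigma, tau):
    """Flag an anomaly when the observed error is unlikely under the
    benign error model."""
    return gaussian_density(e, mu, Sigma) < tau

mu = np.array([0.05, 0.05])
Sigma = np.array([[0.01, 0.0], [0.0, 0.01]])
tau = 0.5

benign_error = np.array([0.06, 0.04])  # close to the benign mean
attack_error = np.array([0.90, 0.80])  # far outside the benign model
```

Here `detect_anomaly(benign_error, ...)` is not flagged while `detect_anomaly(attack_error, ...)` is, mirroring the comparison performed at block 370 of FIG. 3.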
FIG. 8 is a block diagram of an example processor platform 800 structured to execute the instructions of FIGS. 3, 4, 5, 6, and/or 7 to implement the example side channel anomaly detector 102 of FIG. 1. The processor platform 800 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a gaming console, a personal video recorder, a set top box, a headset or other wearable device, or any other type of computing device. - The
processor platform 800 of the illustrated example includes a processor 812. The processor 812 of the illustrated example is hardware. For example, the processor 812 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example anomaly detection controller 120, the example HPC interface 125, the example HPC data organizer 126, the example machine learning model processor 145, the example machine learning model trainer 155, the example error vector generator 160, the example error vector analyzer 165, and the example threshold determiner 170. - The
processor 812 of the illustrated example includes a local memory 813 (e.g., a cache). The processor 812 of the illustrated example is in communication with a main memory including a volatile memory 814 and a non-volatile memory 816 via a bus 818. The volatile memory 814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 814, 816 is controlled by a memory controller. - The
processor platform 800 of the illustrated example also includes an interface circuit 820. The interface circuit 820 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface. - In the illustrated example, one or
more input devices 822 are connected to the interface circuit 820. The input device(s) 822 permit(s) a user to enter data and/or commands into the processor 812. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system. - One or
more output devices 824 are also connected to the interface circuit 820 of the illustrated example. The output devices 824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 820 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor. - The
interface circuit 820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 826. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc. - The
processor platform 800 of the illustrated example also includes one or more mass storage devices 828 for storing software and/or data. Examples of such mass storage devices 828 include floppy disk drives, hard disk drives, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives. - The machine
executable instructions 832 of FIGS. 3, 4, 5, 6, and/or 7 may be stored in the mass storage device 828, in the volatile memory 814, in the non-volatile memory 816, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD. In the illustrated example of FIG. 8, the example mass storage device 828 implements the example HPC data datastore 127 and the example machine learning model datastore 150. - From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that enable detection of side channel attacks. Some such methods, apparatus and articles of manufacture disclosed herein improve the efficiency of using a computing device by enabling detection of an ongoing side channel attack before a data leak can occur. In this manner, data leaks can be prevented without the need for patching existing systems, applications, and/or hardware, thereby achieving one or more improvement(s) in the functioning of a computer.
- Example 1 includes an apparatus for detecting side channel anomalies, the apparatus comprising a hardware performance counter data organizer to collect a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, a machine learning model processor to apply a machine learning model to predict a third value corresponding to the second time, an error vector generator to generate an error vector representing a difference between the second value and the third value, an error vector analyzer to determine a probability of the error vector indicating an anomaly, and an anomaly detection orchestrator to, in response to the probability satisfying a threshold, cause the performance of a responsive action to mitigate a side channel anomaly.
- Example 2 includes the apparatus of example 1, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 3 includes the apparatus of example 1, further including a machine learning model trainer to train the machine learning model based on benign hardware performance counter data.
- Example 4 includes the apparatus of example 3, wherein the machine learning model trainer does not train the machine learning model based on attack hardware performance counter data.
- Example 5 includes the apparatus of example 1, further including a threshold determiner to determine the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 6 includes the apparatus of example 1, wherein the hardware performance counter data organizer is further to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 7 includes the apparatus of example 1, wherein the machine learning model is a first machine learning model, and the responsive action includes utilization of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 8 includes at least one non-transitory computer-readable medium comprising instructions that, when executed, cause at least one processor to at least collect a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, apply a machine learning model to predict a third value corresponding to the second time, generate an error vector representing a difference between the second value and the third value, determine a probability of the error vector indicating an anomaly, and cause, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 9 includes the at least one non-transitory computer-readable medium of example 8, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 10 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to train the machine learning model based on benign hardware performance counter data.
- Example 11 includes the at least one non-transitory computer-readable medium of example 10, wherein the instructions, when executed, further cause the at least one processor to train the machine learning model without using attack hardware performance counter data.
- Example 12 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to determine the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 13 includes the at least one non-transitory computer-readable medium of example 8, wherein the instructions, when executed, further cause the at least one processor to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 14 includes the at least one non-transitory computer-readable medium of example 8, wherein the machine learning model is a first machine learning model, and the responsive action includes utilization of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 15 includes an apparatus for detecting side channel anomalies, the apparatus comprising means for collecting a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, means for predicting a third value corresponding to the second time using a machine learning model, means for generating an error vector representing a difference between the second value and the third value, means for determining a probability of the error vector indicating an anomaly, and means for causing, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 16 includes the apparatus of example 15, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 17 includes the apparatus of example 15, further including means for training the machine learning model based on benign hardware performance counter data.
- Example 18 includes the apparatus of example 17, wherein the means for training is not to train the machine learning model based on attack hardware performance counter data.
- Example 19 includes the apparatus of example 15, further including means for selecting the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 20 includes the apparatus of example 15, wherein the means for collecting is further to impute a fourth value having a timestamp intermediate the first time and the second time.
- Example 21 includes the apparatus of example 15, wherein the machine learning model is a first machine learning model, and means for causing is further to cause the use of a second machine learning model implemented using a long short-term memory recurrent neural network.
- Example 22 includes a method for detecting side channel anomalies, the method comprising collecting a first value of a hardware performance counter at a first time and a second value of the hardware performance counter at a second time, applying, by executing an instruction with a processor, a machine learning model to predict a third value corresponding to the second time, generating, by executing an instruction with the processor, an error vector representing a difference between the second value and the third value, determining, by executing an instruction with the processor, a probability of the error vector indicating an anomaly, and causing, in response to determining that the probability satisfies a threshold, performance of a responsive action to mitigate a side channel anomaly.
- Example 23 includes the method of example 22, wherein the machine learning model is implemented using a stacked gated recurrent unit architecture.
- Example 24 includes the method of example 22, further including training the machine learning model based on benign hardware performance counter data.
- Example 25 includes the method of example 24, wherein the training of the machine learning model does not utilize attack hardware performance counter data.
- Example 26 includes the method of example 22, further including determining the threshold based on a first probability associated with benign hardware performance data and a second probability associated with attack hardware performance data.
- Example 27 includes the method of example 22, further including imputing a fourth value having a timestamp intermediate the first time and the second time.
- Example 28 includes the method of example 22, wherein the machine learning model is a first machine learning model, and further including utilizing a long short-term memory recurrent neural network to determine whether an anomaly is detected.
- Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/234,144 US11188643B2 (en) | 2018-12-27 | 2018-12-27 | Methods and apparatus for detecting a side channel attack using hardware performance counters |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190130101A1 true US20190130101A1 (en) | 2019-05-02 |
US11188643B2 US11188643B2 (en) | 2021-11-30 |
Family
ID=66243991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/234,144 Active 2039-09-13 US11188643B2 (en) | 2018-12-27 | 2018-12-27 | Methods and apparatus for detecting a side channel attack using hardware performance counters |
Country Status (1)
Country | Link |
---|---|
US (1) | US11188643B2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020208639A2 (en) * | 2019-04-11 | 2020-10-15 | Saferide Technologies Ltd | A system and method for detection of anomalous controller area network (can) messages |
US11943245B2 (en) * | 2021-07-05 | 2024-03-26 | Allot Ltd. | System, device, and method of protecting electronic devices against fraudulent and malicious activities |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9774614B2 (en) * | 2014-06-24 | 2017-09-26 | Qualcomm Incorporated | Methods and systems for side channel analysis detection and protection |
US9792435B2 (en) * | 2014-12-30 | 2017-10-17 | Battelle Memorial Institute | Anomaly detection for vehicular networks for intrusion and malfunction detection |
WO2016115280A1 (en) * | 2015-01-14 | 2016-07-21 | Virta Laboratories, Inc. | Anomaly and malware detection using side channel analysis |
US9910984B2 (en) * | 2015-02-27 | 2018-03-06 | Qualcomm Incorporated | Methods and systems for on-device high-granularity classification of device behaviors using multi-label models |
US9842209B2 (en) * | 2015-05-08 | 2017-12-12 | Mcafee, Llc | Hardened event counters for anomaly detection |
US9904587B1 (en) * | 2015-12-18 | 2018-02-27 | Amazon Technologies, Inc. | Detecting anomalous behavior in an electronic environment using hardware-based information |
US20180300621A1 (en) * | 2017-04-13 | 2018-10-18 | International Business Machines Corporation | Learning dependencies of performance metrics using recurrent neural networks |
JP7010641B2 (en) * | 2017-09-27 | 2022-01-26 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Abnormality diagnosis method and abnormality diagnosis device |
US20190138719A1 (en) | 2018-12-27 | 2019-05-09 | Salmin Sultana | Methods and apparatus for detecting a side channel attack using a cache state |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11722382B2 (en) | 2012-09-28 | 2023-08-08 | Intel Corporation | Managing data center resources to achieve a quality of service |
US20190004922A1 (en) * | 2017-06-29 | 2019-01-03 | Intel Corporation | Technologies for monitoring health of a process on a compute device |
US10592383B2 (en) * | 2017-06-29 | 2020-03-17 | Intel Corporation | Technologies for monitoring health of a process on a compute device |
US20210319098A1 (en) * | 2018-12-31 | 2021-10-14 | Intel Corporation | Securing systems employing artificial intelligence |
CN110245094A (en) * | 2019-06-18 | 2019-09-17 | 华中科技大学 | A kind of block grade cache prefetching optimization method and system based on deep learning |
US20210027157A1 (en) * | 2019-07-24 | 2021-01-28 | Nec Laboratories America, Inc. | Unsupervised concept discovery and cross-modal retrieval in time series and text comments based on canonical correlation analysis |
US20210058415A1 (en) * | 2019-08-23 | 2021-02-25 | Mcafee, Llc | Methods and apparatus for detecting anomalous activity of an iot device |
US11616795B2 (en) * | 2019-08-23 | 2023-03-28 | Mcafee, Llc | Methods and apparatus for detecting anomalous activity of an IoT device |
FR3103039A1 (en) * | 2019-11-07 | 2021-05-14 | Electricite De France | Detecting attacks using hardware performance counters |
WO2021089357A1 (en) * | 2019-11-07 | 2021-05-14 | Electricite De France | Method for detecting attacks using hardware performance counters |
US11057425B2 (en) * | 2019-11-25 | 2021-07-06 | Korea Internet & Security Agency | Apparatuses for optimizing rule to improve detection accuracy for exploit attack and methods thereof |
WO2021122920A1 (en) | 2019-12-19 | 2021-06-24 | Thales | Method for monitoring an electronic system using low-level performance counters and comprising at least one set of uncontrolled software applications that are executed on a processor, and a monitoring device |
FR3105483A1 (en) | 2019-12-19 | 2021-06-25 | Method for monitoring an electronic system using low-level performance counters and comprising at least one set of uncontrolled software application(s) executing on a processor, and a monitoring device |
CN112416378A (en) * | 2020-12-02 | 2021-02-26 | Beijing Hangzhi Information Technology Co., Ltd. | Cloud architecture system for silent installation of student mobile terminal applications |
CN112464248A (en) * | 2020-12-04 | 2021-03-09 | Institute of Information Engineering, Chinese Academy of Sciences | Processor exploit threat detection method and device |
US11567878B2 (en) * | 2020-12-23 | 2023-01-31 | Intel Corporation | Security aware prefetch mechanism |
CN113158181A (en) * | 2021-04-15 | 2021-07-23 | Shanghai Jiao Tong University | Method for performing end-to-end attacks on raw side-channel data using a neural network |
US20220343031A1 (en) * | 2021-04-23 | 2022-10-27 | Korea University Research And Business Foundation | Apparatus and method of detecting cache side-channel attack |
US20220391754A1 (en) * | 2021-06-03 | 2022-12-08 | Oracle International Corporation | Monte carlo simulation framework that produces anomaly-free training data to support ml-based prognostic surveillance |
US20230026135A1 (en) * | 2021-07-20 | 2023-01-26 | Bank Of America Corporation | Hybrid Machine Learning and Knowledge Graph Approach for Estimating and Mitigating the Spread of Malicious Software |
US11914709B2 (en) * | 2021-07-20 | 2024-02-27 | Bank Of America Corporation | Hybrid machine learning and knowledge graph approach for estimating and mitigating the spread of malicious software |
US20230092190A1 (en) * | 2021-09-22 | 2023-03-23 | The Regents Of The University Of California | Two-layer side-channel attacks detection method and devices |
EP4235469A1 (en) | 2022-02-25 | 2023-08-30 | Commissariat À L'Énergie Atomique Et Aux Énergies Alternatives | A system for detecting malwares in a resources constrained device |
Also Published As
Publication number | Publication date |
---|---|
US11188643B2 (en) | 2021-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11188643B2 (en) | Methods and apparatus for detecting a side channel attack using hardware performance counters | |
Wei et al. | Leaky DNN: Stealing deep-learning model secret with GPU context-switching side-channel | |
US20190138719A1 (en) | Methods and apparatus for detecting a side channel attack using a cache state | |
Abdelsalam et al. | Malware detection in cloud infrastructures using convolutional neural networks | |
Alam et al. | Performance counters to rescue: A machine learning based safeguard against micro-architectural side-channel-attacks | |
Gulmezoglu et al. | FortuneTeller: Predicting microarchitectural attacks via unsupervised deep learning | |
US11966473B2 (en) | Methods and apparatus to detect side-channel attacks | |
Brown et al. | Automated machine learning for deep learning based malware detection | |
EP3812929A1 (en) | Utilizing a neural network model to determine risk associated with an application programming interface of a web application | |
CN112396173A (en) | Method, system, article of manufacture, and apparatus for mapping workloads | |
JP2017508210A (en) | Application execution control using ensemble machine learning for identification | |
WO2017014896A1 (en) | Memory hierarchy monitoring systems and methods | |
JP2020523649A (en) | Method, apparatus, and electronic device for identifying risk regarding transaction to be processed | |
US20190362269A1 (en) | Methods and apparatus to self-generate a multiple-output ensemble model defense against adversarial attacks | |
US12032711B2 (en) | System and method for controlling confidential information | |
Wang et al. | Enabling micro AI for securing edge devices at hardware level | |
Mirbagher-Ajorpaz et al. | PerSpectron: Detecting invariant footprints of microarchitectural attacks with perceptron | |
Belhadi et al. | Reinforcement learning multi-agent system for faults diagnosis of microservices in industrial settings | |
Zheng et al. | CBA-detector: A self-feedback detector against cache-based attacks | |
He et al. | Image-based zero-day malware detection in IoMT devices: A hybrid AI-enabled method | |
Gao et al. | DeepTheft: Stealing DNN model architectures through power side channel | |
Gogineni et al. | Foreseer: Efficiently forecasting malware event series with long short-term memory | |
Kasarapu et al. | Resource-and workload-aware malware detection through distributed computing in iot networks | |
KR102124443B1 (en) | A real-time detection system and method for Flush+Reload attacks using PCM |
US20200302017A1 (en) | Chat analysis using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BASAK, ABHISHEK;GOTTSCHLICH, JUSTIN;CHEN, LI;AND OTHERS;SIGNING DATES FROM 20190102 TO 20190107;REEL/FRAME:048136/0651 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |