WO2015120243A1 - Application execution control utilizing ensemble machine learning for discernment

Info

Publication number
WO2015120243A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
execute
feature
executing
preventing
Prior art date
Application number
PCT/US2015/014769
Other languages
French (fr)
Other versions
WO2015120243A8 (en)
Inventor
Ryan PERMEH
Derek A. SOEDER
Glenn Chisholm
Braden RUSSELL
Gary Golomb
Matthew Wolff
Carl A. KUKKONEN
Original Assignee
Cylance Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cylance Inc.
Priority to JP2016550628A (JP6662782B2)
Priority to EP15708931.9A (EP3103070B1)
Priority to CA2938580A (CA2938580C)
Priority to AU2015213797A (AU2015213797B2)
Publication of WO2015120243A1
Publication of WO2015120243A8
Priority to HK17105692.1A (HK1232326A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/51 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2115 Third party
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2125 Just-in-time application of countermeasures, e.g., on-the-fly decryption, just-in-time obfuscation or de-obfuscation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Definitions

  • the subject matter described herein relates to techniques for selectively allowing applications to execute that utilize ensemble machine learning models.
  • the current subject matter is directed to enabling computers to efficiently determine if they should run a program based on an immediate (i.e., real-time, etc.) analysis of the program.
  • This approach leverages highly trained ensemble machine learning algorithms to create a real-time discernment on a combination of static and dynamic features collected from the program, the computer's current environment, and external factors.
  • data is received (i.e., received from a remote data source, loaded into memory, accessed from local or connected storage, etc.) that includes at least one feature associated with a program. Thereafter, it is determined, based on the received data and using at least one machine learning model, whether to allow the program to execute or continue to execute (if it is already executing). The program executes or continues to execute if it is determined that the program is allowed to execute. Otherwise, the program is prevented from executing or continuing to execute if it is determined that the program is not allowed to execute.
  • One or more of the utilized machine learning models can be trained using feature data derived from a plurality of different programs.
  • one or more of the machine learning models can be trained using supervised learning.
  • one or more of the machine learning models can be trained using unsupervised learning.
  • the at least one feature of the program can be collected by a feature collector.
  • the feature collector can collect features at a pre-specified point in time (e.g., at commencement of execution of the program or subsequent to execution of the program).
  • the at least one feature collected by the feature collector can include a combination of point in time measurements and ongoing measurements during execution of the program.
  • the at least one feature collected by the feature collector can include one or more operational features that are passively collected prior to execution of the program, and such operational features can be stored in a cache.
  • the at least one feature can include at least one operational feature that characterizes an operational environment of a system to execute the program.
  • the at least one operational feature can include one or more of: program reputation, contextual information, system state, system operating statistics, time-series data, existing programs, operating system details, program run status, and configuration variables.
  • the at least one feature can include at least one static feature that characterizes the program.
  • the at least one static feature can be, for example, measurements of the program, structural elements of the program, or contents of the program.
  • the at least one feature can include at least one dynamic feature that characterizes execution of the program.
  • the at least one dynamic feature can include, for example, interactions with an operating system, subroutine executions, process state, program or system execution statistics, or an order of an occurrence of events associated with the program.
  • the at least one feature can include at least one external feature from a source external to a system to execute the program.
  • the external feature or features can be obtained, for example, from at least one remote database or other data source.
  • At least one feature can take a format selected from a group consisting of: binary, continuous, and categorical.
  • the at least one machine learning model can include an ensemble of machine learning models.
  • the ensemble of machine learning models can include one or more models such as neural network models, support vector machine models, scorecard models, logistic regression models, Bayesian models, decision tree models or other applicable classification models.
  • An output of two or more machine learning models can be combined and used to determine whether or not to allow the program to execute or continue to execute.
  • the determination can include generating a score characterizing a level of safety for executing the program.
  • the generated score can be used to determine whether or not to allow the program to execute.
  • the determination can also include generating a confidence level for the generated score that is used to determine whether or not to allow the program to execute.
  • Preventing the program from executing or continuing to execute can include at least one of many actions. These actions can include one or more of: blocking at least a portion of the program from loading into memory, determining that a dynamic library associated with the program is unsafe, blocking the dynamic library associated with the program from loading into memory, unloading a previously loaded module (portion of code, etc.) associated with the program, disabling the program while it is running, implementing constraints on the program prior to it being run or before it continues to run, quarantining at least a portion of the program, or deleting at least a portion of the program.
  • preventing the program from executing or continuing to execute can include one or more of preventing the program from executing individual operations, by modifying an access level of the program, selectively blocking attempted operations, or preventing an attempted operation and instead causing an alternative operation.
  • Non-transitory computer program products (i.e., physically embodied computer program products) store instructions which, when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein.
  • computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein.
  • methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems.
  • Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
  • FIG. 1 is a system diagram illustrating elements used to provide application execution control discernment
  • FIG. 2 is a diagram characterizing ensemble discernment
  • FIG. 3 is a process flow diagram illustrating a method for discernment using at least one machine learning model.
  • discernment refers to the characterization of whether or not to allow a particular application / application module to execute on a particular computing system or systems.
  • FIG. 1 can include a feature collection system 110 (sometimes referred to as a feature collector), a discernment engine 120, and an enforcement system 130.
  • a “feature” as used herein can include any salient data / data point that can be used to measure the implied safety of a potentially run program.
  • a “program” as used herein is a piece of executable computer code that a user or system wishes to execute, and may include associated data / metadata.
  • “Discernment” as used herein is the process of deciding whether the program should be executed or not (including whether or not to continue executing a program).
  • “Enforcement” as used herein is a process in which the effects of discernment are made effective in a computer system.
  • the current subject matter can utilize one or more machine learning models that are each a mathematically based understanding of a particular situation and one or more algorithms defined to determine an outcome from a particular input against the model.
  • an ensemble of machine learning models can be used which is a collection of models utilized in a particular way to generally improve accuracy or reduce variance.
  • the current subject matter offers an effective method of application control that differs from traditional approaches in a variety of ways.
  • Traditional approaches utilize either the concept of a "blacklist", or a set of programs to explicitly disallow, or a "whitelist", or a set of programs to explicitly allow.
  • the current subject matter foregoes both as primary selection criteria and instead measures various features from the system and uses these against a previously trained machine learning model and/or ensemble of machine learning models.
  • the ensemble of machine learning models can be devised and trained before application control. Due to the predictive nature of various machine learning algorithms, a trained model allows a "fuzzy" match against safe and unsafe programs. By carefully selecting and training the models in the ensemble, the system can act resiliently against change over time, accommodating small and large changes in program behaviors that resemble "safety" or a lack thereof.
  • a machine learning model may be characterized by an algorithm it incorporates, which may include, as an example, neural networks, support vector machines, logistic regressions, scorecard models, Bayesian algorithms, and decision trees.
  • a machine learning model can be trained using supervised learning, in which a training set of input samples labeled with the desired output values conditions the model to correctly classify samples that do not occur in the training set, or it may be trained using unsupervised learning, in which an algorithm identifies hidden structure in unlabeled data. Reinforcement learning represents a third process for training a model.
  • the feature collector 110 can send passive features (operational and dynamic) on an ongoing basis to the discernment engine 120.
  • the discernment engine 120 can request point in time features from the feature collector 110 at a particular decision point, such as execution. These point in time features can include observations about the computer's state extrinsic to the program or related features from an external source.
  • the discernment engine 120 can then decide if the program should execute. If execution is allowed, the program executes; if execution is disallowed, the enforcement system 130 prevents the application from executing.
  • FIG. 2 is a diagram 200 characterizing ensemble discernment in which an original vector 210 can be passed to the discernment engine 120 for scoring 230.
  • the discernment engine 120 can use a model selector 220 to choose one or more models to run (in this example, Models A, B, C).
  • the selection of a model can be predicated on features provided by the feature collector 110, a user configuration, the current availability or scarcity of computing resources, and/or other state information.
  • Each such model can be comprised of several possible algorithms.
  • the output of the various algorithms and models can be combined (using, for example, a weighting arrangement or model) in a scoring component 230.
  • a final output can be a decision (or in some cases a score) characterizing the results and a confidence level.
  • Feature collection can be a combination of point in time and ongoing measurements, and can include the passive collection of features into a general cache.
  • Features can be used to generate data points for which the discernment engine 120 makes a decision.
  • the discernment engine 120 can utilize the features collected to make a decision based on previously collected data.
  • the enforcement system 130 can implement the technical details of operation regarding the decisions made from the discernment engine 120.
  • When a user or other program wishes to execute a program, it will first ask the discernment engine 120 to decide if this is a positive action.
  • the discernment engine 120 can either answer with previous discernments, or create a new discernment using a combination of previously collected features and features collected via a point in time analysis.
  • the enforcement system 130 can implement the logic to allow or disallow execution of the program, and any other elements necessary to implement the discernment decision in an ongoing manner.
  • features can be collected from four primary sources.
  • a first source can comprise operational features that relate to the operational environment of the system. Operational features can include existing programs, details about the operating system, run status of the program, configuration variables associated with the program, and other measures particular to the environment in which the program is intended to run. Some of these features can be ongoing (i.e., they are active features); others can be determined at a particular point in time (i.e., they are passive features).
  • a second source can comprise static features that concern the program that wishes to run. Measurements about the program itself, including structural elements and program contents, can be collected. These features can be calculated by examining the contents of the file and processing through analytic methods.
  • One example of a static feature of a program is the size of such program.
  • Examples of structural elements of a program can include the number of sections it comprises, the proportion of the program described by each section, and the proportion of the program not described by any section.
  • the computed Shannon entropy of each section is an example of a feature derived from processing.
  • a third source can comprise dynamic features that relate to individual program execution. Dynamic features can generally be collected in an ongoing manner. The dynamic features can be associated with a particular program, rather than the system itself. These features can be used to determine potentially hostile activities from a program that was either unable to receive a high confidence discernment prior to execution or otherwise authorized to run under direct management policy.
  • a fourth source can comprise external features that can be generally extracted from sources of information outside of the host computer itself, generally via a remote data source such as a lookup on the network. This lookup can include a query against a cloud database, or a deeper analysis of certain elements on a network based computer. For example, external features can include a determination by a trusted third party as to a program's authenticity, a program's prevalence among a larger population of computers, and/or the reputations of other computers contacted by a program.
  • these features entail knowledge that is impractical to host on an individual computer due to size, complexity, or frequency of updates. Due to the latency of a network lookup, these features can generally be collected in response to a particular request from the discernment engine 120, at a particular point in time.
  • Features can be collected into efficient computer data structures, such as hash tables, binary trees, and vectors, and the features can be passed to the discernment engine 120.
  • Ongoing features can be collected and held for an appropriate amount of time to ensure their ability to usefully affect the discernment process.
  • Point in time features can be collected in an on-demand manner, typically on the event of discernment.
  • Features can be binary, continuous, or categorical in nature.
  • Binary features can only be in one of two states.
  • Continuous features can represent a value along a range, and are generally numeric in nature.
  • Categorical features can represent a value within a discrete set of possible values.
  • First order features are features measured directly from the source. These features can be combined or further analyzed by various methods to generate second order features. Such further analysis can include a mathematical analysis of the value of a first order feature, or the application of combinations of first order features to develop a truly unique second order feature.
  • the discernment engine 120 can create a decision on the anticipated safety of an application.
  • the discernment engine 120 can receive input from the feature collector 110 and apply an ensemble of machine learning models to calculate a score that determines if an application is safe to run or not, as well as a confidence in the accuracy of the score.
  • the discernment engine 120 can take features in combination or singly and can, in some cases, use a process known as vectorization to turn individual features into a mathematical vector. This process can involve creating a compact and efficient representation of the input. The vector can be used by the various machine learning algorithms to generate a score.
  • Ensemble models and/or their outputs can be combined using individualized measured error rates in a weighting scheme (such as a scorecard model). Each model that scores can be normalized and adjusted by its measured error rate. This final combination allows for the most accurate understanding from a variety of sources.
  • the enforcement system 130 can be a component that implements methods for disabling execution of a program.
  • the enforcement system 130 can use a variety of tactics to disable execution in a safe and reliable way.
  • the enforcement system 130 can implement one or more of blocking a process or dynamic library from loading into memory, unloading a previously loaded module, disabling a running program, implementing constraints on a program to be run, quarantining hostile applications, and/or deleting hostile applications. It is often desirable for the enforcement system 130 to issue an alert when a module determined to be hostile is accessed and/or when action is attempted against a hostile module.
  • the enforcement system 130 can utilize processes implemented both in the operating system core, and implanted in each process. These processes can allow for high degrees of control from both the core operating system level, as well as deep introspection and control from within the application itself.
  • the enforcement system 130 can utilize tactics for preventing an application from running or restricting its level of access. Such tactics can include moving, renaming, or deleting the program; applying attributes or access controls to the program; forcing the application to run with reduced privileges; forcing the application to run in a "sandbox," where certain actions are redirected to access a virtualized system state; and/or otherwise monitoring and controlling the actions an application may perform.
  • the systems / technique herein can go into effect when an attempt is made to run a program, or a decision is otherwise warranted by user defined behavior, such as intentionally scanning a file to ascertain its safety.
  • the features originating from the operating system and the dynamic feature collection system 110 can continue to stream into the discernment engine 120 in an ongoing manner. These can be generally available for use within the discernment engine 120, and may initiate a discernment action if one is warranted.
  • the system / methods can be activated during the actions of the system or the user when they choose to either start an application or otherwise choose to determine a file's safety.
  • the discernment engine 120 can request additional details from the feature collector.
  • the feature collector 110 can then gather the appropriate details and pass them to the discernment engine 120.
  • the discernment engine 120 can take all collected features, and use a vectorization process to develop a vector as input (see diagram 200 of FIG. 2).
  • the input vector 210 can be associated with one or more models by the model selector 220 of the discernment engine 120. For each model the model selector 220 chooses, the input vector 210 can be applied.
  • Each model can have one or more algorithms associated with it, generating a series of individual scores.
  • the outputs of the individual models can be combined in a scoring component 230, utilizing a weighting scheme (e.g., a scorecard model).
  • the scoring component 230 can generate a final score, comprised of a result (e.g., safe or not) and a confidence in that result.
  • FIG. 3 is a process flow diagram 300 in which, at 310, data is received (i.e., accessed, obtained, etc.) that comprises at least one feature associated with a program. Thereafter, at 320, it can be determined, based on the received data and using at least one machine learning model, whether to allow at least a portion of the program to execute.
  • the at least one machine learning model used in this regard can be trained using, for example, supervised learning and/or unsupervised learning (in some cases there may be a combination of models that use each type of learning).
  • the program can execute if it is determined that at least a portion of the program is allowed to execute. Otherwise, at 330, at least a portion of the program is prevented from executing / continuing to execute if it is determined that the program (or portion thereof) is not allowed to execute.
  • One or more aspects or features of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include
  • a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device (e.g., mouse, touch screen, etc.), and at least one output device.
  • These computer programs which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional
  • machine-readable medium refers to physically embodied apparatus and/or device, such as for example magnetic disks, optical discs, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable data processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable data processor.
  • the machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium.
  • the machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
  • the subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
  • the computing system may include clients and servers.
  • a client and server are generally remote from each other and typically interact through a
  • client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
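
The static-feature and vectorization passages in the list above (program size, the Shannon entropy of a section, and features that are binary, continuous, or categorical) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the `is_packed` flag, the `signer` categorical feature and its three values, and the one-hot encoding used for it.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

# Hypothetical categorical feature: how the program is signed.
SIGNER_CATEGORIES = ("unsigned", "self_signed", "trusted_ca")

def vectorize(section: bytes, is_packed: bool, signer: str) -> list[float]:
    """Turn individual features into one numeric vector (illustrative layout)."""
    if signer not in SIGNER_CATEGORIES:
        raise ValueError(f"unknown signer category: {signer!r}")
    vector = [
        1.0 if is_packed else 0.0,   # binary feature
        float(len(section)),         # continuous feature: size in bytes
        shannon_entropy(section),    # continuous feature derived by processing
    ]
    # Categorical feature, one-hot encoded (an assumed encoding choice).
    vector.extend(1.0 if signer == c else 0.0 for c in SIGNER_CATEGORIES)
    return vector
```

For instance, `vectorize(b"ab", True, "trusted_ca")` yields a six-element vector whose third entry is the section entropy. A real collector would compute such values per section of an executable file rather than over a single byte string.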

Abstract

Described are techniques to enable computers to efficiently determine if they should run a program based on an immediate (i.e., real-time, etc.) analysis of the program. Such an approach leverages highly trained ensemble machine learning algorithms to create a real-time discernment on a combination of static and dynamic features collected from the program, the computer's current environment, and external factors. Related apparatus, systems, techniques and articles are also described.

Description

Application Execution Control Utilizing Ensemble Machine Learning For Discernment
RELATED APPLICATION
[0001] This application claims priority to U.S. Pat. App. Ser. No. 61/937,379 filed on February 7, 2014, the contents of which are hereby fully incorporated by reference.
TECHNICAL FIELD
[0002] The subject matter described herein relates to techniques for selectively allowing applications to execute that utilize ensemble machine learning models.
BACKGROUND
[0003] Conventional techniques of application execution control for programs run on computer systems rely on static methods such as databases of signatures to determine if a computer can safely run a particular program. Existing application control systems require frequent updates to these databases, and require significant overhead to manage this process. Additionally, their ability to control execution efficiently and correctly reduces as their databases grow. Such approaches utilize significant resources (e.g., memory, CPU, etc.) and additionally have a high management overhead.
SUMMARY
[0004] The current subject matter is directed to enabling computers to efficiently determine if they should run a program based on an immediate (i.e., real-time, etc.) analysis of the program. This approach leverages highly trained ensemble machine learning algorithms to create a real-time discernment on a combination of static and dynamic features collected from the program, the computer's current environment, and external factors.
[0005] In one aspect, data is received (i.e., received from a remote data source, loaded into memory, accessed from local or connected storage, etc.) that includes at least one feature associated with a program. Thereafter, it is determined, based on the received data and using at least one machine learning model, whether to allow the program to execute or continue to execute (if it is already executing). The program executes or continues to execute if it is determined that the program is allowed to execute. Otherwise, the program is prevented from executing or continuing to execute if it is determined that the program is not allowed to execute.
[0006] One or more of the utilized machine learning models can be trained using feature data derived from a plurality of different programs. In addition or in the alternative, one or more of the machine learning models can be trained using supervised learning. Further in addition or in the alternative, one or more of the machine learning models can be trained using unsupervised learning.
[0007] The at least one feature of the program can be collected by a feature collector. The feature collector can collect features at a pre-specified point in time (e.g., at commencement of execution of the program or subsequent to execution of the program).
[0008] The at least one feature collected by the feature collector can include a combination of point in time measurements and ongoing measurements during execution of the program. The at least one feature collected by the feature collector can include one or more operational features that are passively collected prior to execution of the program, and such operational features can be stored in a cache.
[0009] The at least one feature can include at least one operational feature that characterizes an operational environment of a system to execute the program. The at least one operational feature can include one or more of: program reputation, contextual information, system state, system operating statistics, time-series data, existing programs, operating system details, program run status, and configuration variables.
[0010] The at least one feature can include at least one static feature that characterizes the program. The at least one static feature can be, for example, measurements of the program, structural elements of the program, or contents of the program.
[0011] The at least one feature can include at least one dynamic feature that characterizes execution of the program. The at least one dynamic feature can include, for example, interactions with an operating system, subroutine executions, process state, program or system execution statistics, or an order of an occurrence of events associated with the program.
[0012] The at least one feature can include at least one external feature from a source external to a system to execute the program. The external feature or features can be obtained, for example, from at least one remote database or other data source.
[0013] At least one feature can take a format selected from a group consisting of: binary, continuous, and categorical.
[0014] The at least one machine learning model can include an ensemble of machine learning models. The ensemble of machine learning models can include one or more models such as neural network models, support vector machine models, scorecard models, logistic regression models, Bayesian models, decision tree models or other applicable classification models. An output of two or more machine learning models can be combined and used to determine whether or not to allow the program to execute or continue to execute.
[0015] The determination can include generating a score characterizing a level of safety for executing the program. The generated score can be used to determine whether or not to allow the program to execute. The determination can also include generating a confidence level for the generated score that is used to determine whether or not to allow the program to execute.
[0016] Preventing the program from executing or continuing to execute can include at least one of many actions. These actions can include one or more of: blocking at least a portion of the program from loading into memory, determining that a dynamic library associated with the program is unsafe, blocking the dynamic library associated with the program from loading into memory, unloading a previously loaded module (portion of code, etc.) associated with the program, disabling the program while it is running, implementing constraints on the program prior to it being run or before it continues to run, quarantining at least a portion of the program, or deleting at least a portion of the program.
[0017] In some cases, preventing the program from executing or continuing to execute can include one or more of preventing the program from executing individual operations, by modifying an access level of the program, selectively blocking attempted operations, or preventing an attempted operation and instead causing an alternative operation.
[0018] Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
[0019] The subject matter described herein provides many advantages. For example, the current subject matter provides more rapid discernment while, at the same time, consuming fewer resources such as memory and processors.
[0020] The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
DESCRIPTION OF DRAWINGS
[0021] FIG. 1 is a system diagram illustrating elements used to provide application execution control discernment;
[0022] FIG. 2 is a diagram characterizing ensemble discernment; and
[0023] FIG. 3 is a process flow diagram illustrating a method for discernment using at least one machine learning model.
DETAILED DESCRIPTION
[0024] The current subject matter can be implemented, in some examples, using three major elements to produce an efficient method of discernment. In this regard, discernment refers to the characterization of whether or not to allow a particular application / application module to execute on a particular computing system or systems. These major software elements are illustrated in diagram 100 of FIG. 1 and can include a feature collection system 110 (sometimes referred to as a feature collector), a discernment engine 120, and an enforcement system 130. The feature collection system 110 collects or otherwise accesses features characterizing a program and/or the environment in which the program is being executed or to be executed. These features are passed on to the discernment engine 120 which can make a decision on whether or not to allow the program to execute. If it is determined that the program should not execute, the enforcement system 130 takes action to prevent the application from executing / continuing to execute.
[0025] A "feature" as used herein can include any salient data / data point that can be used to measure the implied safety of a potentially run program. A "program" as used herein is a piece of executable computer code that a user or system wishes to execute, and may include associated data / metadata. "Discernment" as used herein is the process of deciding whether the program should be executed or not (including whether or not to continue executing a program). "Enforcement" as used herein is a process in which the effects of discernment are made effective in a computer system. The current subject matter can utilize one or more machine learning models that are each a mathematically based understanding of a particular situation and one or more algorithms defined to determine an outcome from a particular input against the model. In some variations, an ensemble of machine learning models can be used which is a collection of models utilized in a particular way to generally improve accuracy or reduce variance.
[0026] The current subject matter offers an effective method of application control that differs from traditional approaches in a variety of ways. Traditional approaches utilize either the concept of a "blacklist", or a set of programs to explicitly disallow, or a "whitelist", or a set of programs to explicitly allow. The current subject matter forgoes both as primary selection criteria and instead measures various features from the system and uses these against a previously trained machine learning model and/or ensemble of machine learning models.
[0027] The ensemble of machine learning models can be devised and trained before application control. Due to the predictive nature of various machine learning algorithms, a trained model allows a "fuzzy" match against safe and unsafe programs. By carefully selecting and training the models in the ensemble, the system can act resiliently against change over time, accommodating small and large changes in program behaviors that resemble "safety" or a lack thereof. A machine learning model may be characterized by an algorithm it incorporates, which may include, as an example, neural networks, support vector machines, logistic regressions, scorecard models, Bayesian algorithms, and decision trees. A machine learning model can be trained using supervised learning, in which a training set of input samples labeled with the desired output values conditions the model to correctly classify samples that do not occur in the training set, or it may be trained using unsupervised learning, in which an algorithm identifies hidden structure in unlabeled data. Reinforcement learning represents a third process for training a model.
[0028] Referring again to diagram 100 of FIG. 1, the feature collector 110 can send passive features (operational and dynamic) on an ongoing basis to the discernment engine 120. The discernment engine 120 can request point in time features from the feature collector 110 at a particular decision point, such as execution. These point in time features can include observations about the computer's state extrinsic to the program or related features from an external source. The discernment engine 120 can then decide if the program should execute. If execution is allowed, the program executes; if execution is disallowed, the enforcement system 130 prevents the application from executing.
[0029] FIG. 2 is a diagram 200 characterizing ensemble discernment in which an original vector 210 can be passed to the discernment engine 120 for scoring 230. The discernment engine 120 can use a model selector 220 to choose one or more models to run (in this example, Models A, B, C). The selection of a model can be predicated on features provided by the feature collector 110, a user configuration, the current availability or scarcity of computing resources, and/or other state information. Each such model can comprise several possible algorithms. The output of the various algorithms and models can be combined (using, for example, a weighting arrangement or model) in a scoring component 230. A final output can be a decision (or in some cases a score) characterizing the results and a confidence level.
[0030] Feature collection can be a combination of point in time and ongoing measurements, and can include the passive collection of features into a general cache. Features can be used to generate data points for which the discernment engine 120 makes a decision. The discernment engine 120 can utilize the features collected to make a decision based on previously collected data. The enforcement system 130 can implement the technical details of operation regarding the decisions made from the discernment engine 120.
[0031] If a user or other program wishes to execute a program, it will first ask the discernment engine 120 to decide if this is a positive action. The discernment engine 120 can either answer with previous discernments, or create a new discernment using a combination of previously collected features and features collected via a point in time analysis. With the decision made, the enforcement system 130 can implement the logic to allow or disallow execution of the program, and any other elements necessary to implement the discernment decision in an ongoing manner.
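By way of non-limiting illustration, the choice between answering with a previous discernment and creating a new one can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `DiscernmentCache`, keying previous discernments by a SHA-256 digest of the program contents, and the `discern` callable are all hypothetical and are not specified by the description above.

```python
import hashlib
from typing import Callable, Dict


class DiscernmentCache:
    """Answers with a previous discernment when one exists, else creates one."""

    def __init__(self, discern: Callable[[bytes], bool]):
        # `discern` is a hypothetical stand-in for the discernment engine.
        self._discern = discern
        self._previous: Dict[str, bool] = {}

    def decide(self, program: bytes) -> bool:
        # Assumption: identical program contents receive the same decision.
        key = hashlib.sha256(program).hexdigest()
        if key not in self._previous:
            self._previous[key] = self._discern(program)
        return self._previous[key]
```

A deployed system would likely also expire entries when the models or the operating environment change; that is omitted here for brevity.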
[0032] Features can be collected from various sources. In one implementation, features can be collected from four primary sources.
[0033] A first source can comprise operational features that relate to the operational environment of the system. Operational features can include existing programs, details about the operating system, run status of the program, configuration variables associated with the program, and other measures particular to the environment in which the program is intended to run. Some of these features can be ongoing (i.e., they are active features); others can be determined at a particular point in time (i.e., they are passive features).
[0034] A second source can comprise static features that concern the program that wishes to run. Measurements about the program itself, including structural elements and program contents, can be collected. These features can be calculated by examining the contents of the file and processing through analytic methods. One example of a static feature of a program is the size of such program. Examples of structural elements of a program can include the number of sections it comprises, the proportion of the program described by each section, and the proportion of the program not described by any section. The computed Shannon entropy of each section is an example of a feature derived from processing.
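The static features named above (program size, number of sections, the proportion of the program described by each section, the proportion not described by any section, and per-section Shannon entropy) can be computed along the following lines. This is an illustrative sketch only: the function names and the `sections` mapping of a section name to an (offset, length) pair are assumptions, sections are assumed not to overlap, and parsing of any real executable format is left out.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def static_features(program: bytes,
                    sections: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute size, section count, per-section proportion and entropy.

    `sections` maps a hypothetical section name to (offset, length);
    sections are assumed not to overlap.
    """
    size = len(program)
    features = {"size": float(size), "section_count": float(len(sections))}
    covered = 0
    for name, (offset, length) in sections.items():
        chunk = program[offset:offset + length]
        features[f"{name}_proportion"] = len(chunk) / size if size else 0.0
        features[f"{name}_entropy"] = shannon_entropy(chunk)
        covered += len(chunk)
    # Proportion of the program not described by any section.
    features["uncovered_proportion"] = (size - covered) / size if size else 0.0
    return features
```

High entropy in a section often indicates compressed or encrypted content, which is one reason such a derived measurement can be informative to a model.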
[0035] A third source can comprise dynamic features that relate to individual program execution. Dynamic features can generally be collected in an ongoing manner. The dynamic features can be associated with a particular program, rather than the system itself. These features can be used to determine potentially hostile activities from a program that was either unable to receive a high confidence discernment prior to execution or otherwise authorized to run under direct management policy.
[0036] A fourth source can comprise external features that can be generally extracted from sources of information outside of the host computer itself, generally via a remote data source such as a lookup on the network. This lookup can include a query against a cloud database, or a deeper analysis of certain elements on a network based computer. For example, external features can include a determination by a trusted third party as to a program's authenticity, a program's prevalence among a larger population of computers, and/or the reputations of other computers contacted by a program. Frequently, these features entail knowledge that is impractical to host on an individual computer due to size, complexity, or frequency of updates. Due to the latency of a network lookup, these features can generally be collected in response to a particular request from the discernment engine 120, at a particular point in time.
[0037] Features can be collected into efficient computer data structures, such as hash tables, binary trees, and vectors, and the features can be passed to the discernment engine 120. Ongoing features can be collected and held for an appropriate amount of time to ensure their ability to usefully affect the discernment process. Point in time features can be collected in an on-demand manner, typically on the event of discernment.
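As a non-limiting sketch of holding ongoing features "for an appropriate amount of time", a hash table keyed by feature name can store each value with the time it was recorded and drop stale entries when a snapshot is requested. The class name `FeatureCache`, the `max_age_seconds` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class FeatureCache:
    """Holds ongoing features in a hash table and forgets stale ones."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def record(self, name: str, value: Any) -> None:
        # Store the value together with the time it was observed.
        self._entries[name] = (self._clock(), value)

    def snapshot(self) -> Dict[str, Any]:
        """Return the features that are still fresh, discarding the rest."""
        now = self._clock()
        self._entries = {
            name: (seen, value)
            for name, (seen, value) in self._entries.items()
            if now - seen <= self._max_age
        }
        return {name: value for name, (_, value) in self._entries.items()}
```

Point in time features would bypass such a cache and be gathered on demand at the moment of discernment.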
[0038] Features can be binary, continuous, or categorical in nature. Binary features can only be in one of two states. Continuous features can represent a value along a range, and are generally numeric in nature. Categorical features can represent a value within a discrete set of possible values.
[0039] Features can be considered first order, second order, or nth order. First order features are features measured directly from the source. These features can be combined or further analyzed by various methods to generate second order features. Such further analysis can include performing a mathematical analysis of the value of a first order feature, or combining multiple first order features to develop a distinct second order feature.
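For illustration only, the sketch below derives two second order features from first order measurements: one by mathematical analysis of a single value and one by combining two values. The feature names `size` and `import_count`, and the particular derivations, are hypothetical examples rather than features prescribed by this description.

```python
import math
from typing import Dict


def second_order(first: Dict[str, float]) -> Dict[str, float]:
    """Derive second order features from first order measurements."""
    size = first["size"]
    derived = {}
    # Mathematical analysis of a single first order feature.
    derived["log2_size"] = math.log2(size) if size > 0 else 0.0
    # Combination of two first order features.
    derived["imports_per_kib"] = (
        first["import_count"] / (size / 1024) if size > 0 else 0.0
    )
    return derived
```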
[0040] The discernment engine 120 can create a decision on the anticipated safety of an application. The discernment engine 120 can receive input from the feature collector 110 and apply an ensemble of machine learning models to calculate a score that determines if an application is safe to run or not, as well as a confidence in the accuracy of the score.
[0041] The discernment engine 120 can take features in combination or singly and can, in some cases, use a process known as vectorization to turn individual features into a mathematical vector. This process can involve creating a compact and efficient representation of the input. The vector can be used by the various machine learning algorithms to generate a score.
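One possible form of vectorization, offered as a sketch rather than as the described implementation, maps binary features to 0 or 1, scales continuous features into a fixed range, and one-hot encodes categorical features. The `schema` layout and the example feature names are assumptions; each continuous range is assumed to have `high` greater than `low`.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Each schema entry is (feature name, kind, options), where options is
# None for binary, (low, high) for continuous, or the allowed values
# for categorical features.
Schema = Sequence[Tuple[str, str, Any]]


def vectorize(features: Dict[str, Any], schema: Schema) -> List[float]:
    """Turn named features into a fixed-order numeric vector."""
    vector: List[float] = []
    for name, kind, options in schema:
        value = features.get(name)
        if kind == "binary":
            vector.append(1.0 if value else 0.0)
        elif kind == "continuous":
            low, high = options
            # Missing values fall back to the low end of the range.
            clipped = low if value is None else min(max(float(value), low), high)
            vector.append((clipped - low) / (high - low))
        elif kind == "categorical":
            # One-hot encoding: one slot per allowed value.
            vector.extend(1.0 if value == option else 0.0 for option in options)
        else:
            raise ValueError(f"unknown feature kind: {kind}")
    return vector
```

Because the schema fixes the order and width of every feature, each model in the ensemble receives vectors of identical shape.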
[0042] The use of ensembles allows multiple, distinct models to be tailored to suit more specialized combinations of features within the more common types of programs. Each sample can be approached with a model that is more appropriate for its type. In addition to model specificity, the general ensemble can offer multiple different learning algorithms per model. This allows sample discernment to benefit from multiple different assessments. Some specific models have lower error rates for particular algorithms, and combining them in a weighted manner helps achieve the most accurate results.
[0043] Ensemble models and/or their outputs can be combined using individualized measured error rates in a weighting scheme (such as a scorecard model). Each model that scores can be normalized and adjusted by its measured error rate. This final combination allows for the most accurate understanding from a variety of sources.
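A weighting scheme of this kind can be sketched as a weighted average in which each model's normalized score counts in proportion to one minus its measured error rate. The specific weight formula is an assumption chosen for simplicity; the description above does not fix a particular formula, and a scorecard model could be substituted.

```python
from typing import Sequence, Tuple


def combine_scores(results: Sequence[Tuple[float, float]]) -> float:
    """Combine (score, error_rate) pairs into a single score.

    Scores are assumed to be normalized to [0, 1]; error rates are the
    measured error rates of each model, also in [0, 1].
    """
    # Assumption: a model's weight is its measured accuracy (1 - error).
    weights = [1.0 - error_rate for _, error_rate in results]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one model must have an error rate below 1")
    weighted = sum(score * w for (score, _), w in zip(results, weights))
    return weighted / total
```

With this choice, a model that is wrong half the time still contributes, but a model with a 5% error rate contributes nearly twice as much.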
[0044] The enforcement system 130 can be a component that implements methods for disabling execution of a program. The enforcement system 130 can use a variety of tactics to disable execution in a safe and reliable way.
[0045] Decisions regarding a program may not always be determined before program execution, and so there may be some more complex scenarios that require additional handling. The enforcement system 130 can be integrated deeply with the computer operating system and act on behalf of the discernment engine 120.
[0046] The enforcement system 130 can implement one or more of blocking a process or dynamic library from loading into memory, unloading a previously loaded module, disabling a running program, implementing constraints on a program to be run, quarantining hostile applications, and/or deleting hostile applications. It is often desirable for the enforcement system 130 to issue an alert when a module determined to be hostile is accessed and/or when action is attempted against a hostile module.
[0047] The enforcement system 130 can utilize processes implemented both in the operating system core, and implanted in each process. These processes can allow for high degrees of control from both the core operating system level, as well as deep introspection and control from within the application itself.
[0048] Additionally, the enforcement system 130 can utilize tactics for preventing an application from running or restricting its level of access. Such tactics can include moving, renaming, or deleting the program; applying attributes or access controls to the program; forcing the application to run with reduced privileges; forcing the application to run in a "sandbox," where certain actions are redirected to access a virtualized system state; and/or other monitoring and controlling the actions an application may perform.
[0049] The systems / techniques herein can go into effect when an attempt is made to run a program, or a decision is otherwise warranted by user defined behavior, such as intentionally scanning a file to ascertain its safety.
[0050] With reference again to diagram 100 of FIG. 1, the features originating from the operating system and the dynamic feature collection system 110 can continue to stream into the discernment engine 120 in an ongoing manner. These can be generally available for use within the discernment engine 120, and may initiate a discernment action if one is warranted.
[0051] Generally, however, the system / methods can be activated during the actions of the system or the user when they choose to either start an application or otherwise choose to determine a file's safety. When one of these events is triggered, the discernment engine 120 can request additional details from the feature collector 110. The feature collector 110 can then gather the appropriate details and pass them to the discernment engine 120. These features may originate from static, dynamic, operational, or external sources.
[0052] The discernment engine 120 can take all collected features, and use a vectorization process to develop a vector as input (see diagram 200 of FIG. 2). The input vector 210 can be associated with one or more models by the model selector 220 of the discernment engine 120. For each model the model selector 220 chooses, the input vector 210 can be applied. Each model can have one or more algorithms associated with it, generating a series of individual scores. The outputs of the individual models can be combined in a scoring component 230, utilizing a weighting scheme (e.g., a scorecard model). The scoring component 230 can generate a final score, comprised of a result (e.g., safe or not) and a confidence in that result.
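The flow of FIG. 2 can be approximated, purely as a sketch, by a selector that picks the models applicable to an input vector, runs every algorithm of each selected model, and combines the per-model scores by weight into a result and a confidence. The `Model` fields, the averaging of algorithm scores within a model, the 0.5 threshold, and the definition of confidence as normalized distance from the threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Model:
    name: str
    applies_to: Callable[[Vector], bool]          # used by the model selector
    algorithms: List[Callable[[Vector], float]]   # each returns a score in [0, 1]
    weight: float                                 # e.g. derived from error rate


def discern(vector: Vector, models: Sequence[Model],
            threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (safe, confidence) for an input vector."""
    # Model selector: keep only the models suited to this input.
    selected = [m for m in models if m.applies_to(vector)]
    if not selected:
        raise ValueError("no model applies to this input")
    weighted, total_weight = 0.0, 0.0
    for model in selected:
        scores = [algorithm(vector) for algorithm in model.algorithms]
        weighted += model.weight * (sum(scores) / len(scores))
        total_weight += model.weight
    # Scoring component: weighted combination of the model outputs.
    score = weighted / total_weight
    safe = score >= threshold
    # Assumption: confidence grows with distance from the threshold.
    confidence = abs(score - threshold) / max(threshold, 1.0 - threshold)
    return safe, confidence
```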
[0053] FIG. 3 is a process flow diagram 300 in which, at 310, data is received (i.e., accessed, obtained, etc.) that comprises at least one feature associated with a program. Thereafter, at 320, it can be determined, based on the received data and using at least one machine learning model, whether to allow at least a portion of the program to execute. The at least one machine learning model used in this regard can be trained using, for example, supervised learning and/or unsupervised learning (in some cases there may be a combination of models that use each type of learning). Subsequently, at 330, the program can execute if it is determined that at least a portion of the program is allowed to execute. Otherwise, at 340, at least a portion of the program is prevented from executing / continuing to execute if it is determined that the program (or portion thereof) is not allowed to execute.
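The receive, determine, and allow-or-prevent sequence of FIG. 3 reduces to a short control function. In this sketch the `model`, `allow`, and `prevent` callables and the score threshold are hypothetical placeholders; a real enforcement system would hook into the operating system rather than call plain functions.

```python
from typing import Callable, Mapping


def control_execution(features: Mapping[str, float],
                      model: Callable[[Mapping[str, float]], float],
                      allow: Callable[[], None],
                      prevent: Callable[[], None],
                      threshold: float = 0.5) -> bool:
    """Decide whether a program may run and enforce the decision.

    `features` is the received data; `model` returns a safety score in
    [0, 1]. Returns True when execution was allowed.
    """
    score = model(features)      # determine, using the machine learning model
    if score >= threshold:
        allow()                  # the program executes or continues to execute
        return True
    prevent()                    # the enforcement action is taken
    return False
```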
[0054] One or more aspects or features of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device (e.g., mouse, touch screen, etc.), and at least one output device.
[0055] These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term "machine-readable medium" (sometimes referred to as a computer program product) refers to physically embodied apparatus and/or device, such as for example magnetic disks, optical discs, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable data processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable data processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
[0056] The subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
[0057] The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0058] The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flow(s) depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
receiving data comprising at least one feature associated with a program;
determining, based on the received data and using at least one machine learning model, whether to allow the program to execute or continue to execute;
allowing at least a portion of the program to execute or continue executing if it is determined that the program is allowed to execute; and
preventing at least a portion of the program from executing or continuing to execute if it is determined that the program is not allowed to execute.
2. A method as in claim 1, wherein one or more of the at least one machine learning model is trained using feature data derived from a plurality of different programs.
3. A method as in claim 1 or 2, wherein one or more of the at least one machine learning model is trained using supervised learning.
4. A method as in any of the preceding claims, wherein one or more of the at least one machine learning model is trained using unsupervised learning.
5. A method as in any of the preceding claims further comprising:
collecting the at least one feature of the program by a feature collector.
6. A method as in claim 5, wherein the feature collector collects features at a pre-specified point in time.
7. A method as in claim 6, wherein the pre-specified point in time is at execution of the program.
8. A method as in any of claims 5 to 7, wherein the at least one feature comprises a combination of point in time measurements and ongoing measurements during execution of the program.
9. A method as in any of claims 5 to 8, wherein the at least one feature comprises one or more operational features that are passively collected prior to execution of the program and the method further comprises:
storing the operational features in a cache.
10. A method as in any of the preceding claims, wherein the at least one feature comprises at least one operational feature that characterizes an operational environment of a system to execute the program.
11. A method as in claim 10, wherein the at least one operational feature is selected from a group consisting of: program reputation, contextual information, system state, system operating statistics, time-series data, existing programs, operating system details, program run status, or configuration variables.
12. A method as in any of the preceding claims, wherein the at least one feature comprises at least one static feature that characterizes the program.
13. A method as in claim 12, wherein the at least one static feature is selected from a group consisting of: measurements of the program, structural elements of the program, or contents of the program.
14. A method as in any of the preceding claims, wherein the at least one feature comprises at least one dynamic feature that characterizes execution of the program.
15. A method as in claim 11, wherein the at least one dynamic feature is selected from a group consisting of: interactions with an operating system, subroutine executions, process state, program or system execution statistics, or an order of an occurrence of events associated with the program.
16. A method as in any of the preceding claims, wherein the at least one feature comprises at least one external feature from a source external to a system to execute the program.
17. A method as in claim 16, wherein the at least one external feature is obtained from at least one remote database or other data source.
18. A method as in any of the preceding claims, wherein the at least one feature takes a format selected from a group consisting of: binary, continuous, and categorical.
19. A method as in any of the preceding claims, wherein the at least one machine learning model comprises an ensemble of machine learning models.
20. A method as in claim 19, wherein the ensemble of machine learning models is selected from a group consisting of: neural network models, support vector machine models, scorecard models, logistic regression models, Bayesian models, or decision tree models.
21. A method as in claim 19, wherein an output of two or more machine learning models is combined and used to determine whether or not to allow the program to execute or continue to execute.
22. A method as in any of the preceding claims, wherein the determining comprises generating a score characterizing a level of safety for executing the program, wherein the generated score is used to determine whether or not to allow the at least a portion of the program to execute.
23. A method as in claim 22, wherein the determining further comprises generating a confidence level for the generated score, wherein the generated confidence level is used to determine whether or not to allow the at least a portion of the program to execute.
24. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises blocking at least a portion of the program from loading into memory.
25. A method as in any of the preceding claims further comprising:
determining that a dynamic library associated with the program is unsafe;
wherein preventing the at least a portion of the program from executing or continuing to execute comprises: blocking the dynamic library associated with the program from loading into memory.
26. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: unloading a previously loaded module associated with the program.
27. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: disabling the program while it is running.
28. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: implementing constraints on the program prior to it being run or before it continues to run.
29. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: quarantining at least a portion of the program.
30. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: deleting at least a portion of the program.
31. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: preventing the program from executing individual operations.
32. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: modifying an access level of the program.
33. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: selectively blocking attempted operations.
34. A method as in any of the preceding claims, wherein preventing the at least a portion of the program from executing or continuing to execute comprises: preventing an attempted operation and instead causing an alternative operation.
35. A non-transitory computer program product storing instructions which, when executed by at least one hardware data processor forming part of at least one computing device, result in a method as in any of the preceding claims.
36. A system comprising:
at least one data processor; and
memory storing instructions which, when executed by the at least one data processor, result in a method as in any of claims 1 to 34.
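The following is an illustrative sketch only, not the claimed implementation, of how the scoring and decision steps recited in claims 18 to 23 and 29 might fit together. It assumes three hypothetical models (model_entropy, model_signed, model_reputation), a simple average as the way of combining model outputs, a confidence level derived from the spread of the individual scores, and arbitrary thresholds (allow_at, min_confidence). None of these names, formulas, or values appear in the claims.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List, Union

# Features may be binary, continuous, or categorical (claim 18).
FeatureValue = Union[bool, float, str]
Features = Dict[str, FeatureValue]
# Each model maps features to a safety score in [0, 1] (1 = safest).
Model = Callable[[Features], float]


@dataclass
class Verdict:
    score: float       # combined safety score (claim 22)
    confidence: float  # confidence level for the score (claim 23)
    action: str        # "allow", "block", or "quarantine"


# Hypothetical models; real ones would be trained (claims 3, 4, 20).
def model_entropy(f: Features) -> float:
    # Continuous static feature: higher entropy is treated as less safe.
    return 1.0 - min(float(f["entropy"]) / 8.0, 1.0)


def model_signed(f: Features) -> float:
    # Binary static feature.
    return 0.9 if f["is_signed"] else 0.3


def model_reputation(f: Features) -> float:
    # Categorical operational feature (program reputation, claim 11).
    return {"good": 0.95, "unknown": 0.5, "bad": 0.05}[str(f["reputation"])]


MODELS: List[Model] = [model_entropy, model_signed, model_reputation]


def decide(features: Features, models: List[Model],
           allow_at: float = 0.7, min_confidence: float = 0.6) -> Verdict:
    # Combine the outputs of two or more models (claim 21); averaging is assumed.
    scores = [m(features) for m in models]
    score = mean(scores)
    # Assumed confidence measure: agreement between models. The population
    # standard deviation of values in [0, 1] is at most 0.5, so this is in [0, 1].
    confidence = max(1.0 - 2.0 * pstdev(scores), 0.0)
    if confidence < min_confidence:
        action = "quarantine"  # models disagree: isolate (claim 29)
    elif score >= allow_at:
        action = "allow"
    else:
        action = "block"
    return Verdict(score, confidence, action)


if __name__ == "__main__":
    sample = {"entropy": 2.0, "is_signed": True, "reputation": "good"}
    print(decide(sample, MODELS))
```

In this sketch a low confidence level takes precedence over the score, so a program on which the models disagree is quarantined rather than allowed or blocked outright. That ordering is a design choice made for the example, not something the claims require.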
PCT/US2015/014769 2014-02-07 2015-02-06 Application execution control utilizing ensemble machine learning for discernment WO2015120243A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2016550628A JP6662782B2 (en) 2014-02-07 2015-02-06 Application execution control using ensemble machine learning for identification
EP15708931.9A EP3103070B1 (en) 2014-02-07 2015-02-06 Application execution control utilizing ensemble machine learning for discernment
CA2938580A CA2938580C (en) 2014-02-07 2015-02-06 Application execution control utilizing ensemble machine learning for discernment
AU2015213797A AU2015213797B2 (en) 2014-02-07 2015-02-06 Application execution control utilizing ensemble machine learning for discernment
HK17105692.1A HK1232326A1 (en) 2014-02-07 2017-06-08 Application execution control utilizing ensemble machine learning for discernment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461937379P 2014-02-07 2014-02-07
US61/937,379 2014-02-07

Publications (2)

Publication Number Publication Date
WO2015120243A1 true WO2015120243A1 (en) 2015-08-13
WO2015120243A8 WO2015120243A8 (en) 2016-09-09

Family

ID=52633591

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/014769 WO2015120243A1 (en) 2014-02-07 2015-02-06 Application execution control utilizing ensemble machine learning for discernment

Country Status (7)

Country Link
US (2) US10235518B2 (en)
EP (1) EP3103070B1 (en)
JP (1) JP6662782B2 (en)
AU (1) AU2015213797B2 (en)
CA (1) CA2938580C (en)
HK (1) HK1232326A1 (en)
WO (1) WO2015120243A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018200342A1 (en) * 2017-04-25 2018-11-01 Xaxis, Inc. Double blind machine learning insight interface apparatuses, methods and systems

Families Citing this family (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015120243A1 (en) 2014-02-07 2015-08-13 Cylance Inc. Application execution control utilizing ensemble machine learning for discernment
WO2016128491A1 (en) 2015-02-11 2016-08-18 British Telecommunications Public Limited Company Validating computer resource usage
US9465940B1 (en) 2015-03-30 2016-10-11 Cylance Inc. Wavelet decomposition of software entropy to identify malware
US9495633B2 (en) 2015-04-16 2016-11-15 Cylance, Inc. Recurrent neural networks for malware analysis
WO2017021153A1 (en) 2015-07-31 2017-02-09 British Telecommunications Public Limited Company Expendable access control
WO2017021154A1 (en) 2015-07-31 2017-02-09 British Telecommunications Public Limited Company Access control
US10853750B2 (en) 2015-07-31 2020-12-01 British Telecommunications Public Limited Company Controlled resource provisioning in distributed computing environments
KR101625660B1 (en) * 2015-11-20 2016-05-31 한국지질자원연구원 Method for making secondary data using observed data in geostatistics
US9602531B1 (en) 2016-02-16 2017-03-21 Cylance, Inc. Endpoint-based man in the middle attack detection
CA3015352A1 (en) 2016-02-23 2017-08-31 Carbon Black, Inc. Cybersecurity systems and techniques
US9928363B2 (en) * 2016-02-26 2018-03-27 Cylance Inc. Isolating data for analysis to avoid malicious attacks
WO2017147441A1 (en) * 2016-02-26 2017-08-31 Cylance Inc. Sub-execution environment controller
US11159549B2 (en) 2016-03-30 2021-10-26 British Telecommunications Public Limited Company Network traffic threat identification
US11153091B2 (en) 2016-03-30 2021-10-19 British Telecommunications Public Limited Company Untrusted code distribution
WO2017167548A1 (en) 2016-03-30 2017-10-05 British Telecommunications Public Limited Company Assured application services
EP3437007B1 (en) 2016-03-30 2021-04-28 British Telecommunications public limited company Cryptocurrencies malware based detection
WO2017167544A1 (en) 2016-03-30 2017-10-05 British Telecommunications Public Limited Company Detecting computer security threats
US10681059B2 (en) 2016-05-25 2020-06-09 CyberOwl Limited Relating to the monitoring of network security
CN105975861A (en) * 2016-05-27 2016-09-28 百度在线网络技术(北京)有限公司 Application detection method and device
US10586171B2 (en) 2016-05-31 2020-03-10 International Business Machines Corporation Parallel ensemble of support vector machines
WO2017214131A1 (en) * 2016-06-08 2017-12-14 Cylance Inc. Deployment of machine learning models for discernment of threats
WO2018039792A1 (en) 2016-08-31 2018-03-08 Wedge Networks Inc. Apparatus and methods for network-based line-rate detection of unknown malware
KR20180070103A (en) * 2016-12-16 2018-06-26 삼성전자주식회사 Method and apparatus for recognition
WO2018135881A1 (en) 2017-01-19 2018-07-26 Samsung Electronics Co., Ltd. Vision intelligence management for electronic devices
US10909371B2 (en) * 2017-01-19 2021-02-02 Samsung Electronics Co., Ltd. System and method for contextual driven intelligence
WO2018178034A1 (en) * 2017-03-30 2018-10-04 British Telecommunications Public Limited Company Anomaly detection for computer systems
EP3382591B1 (en) 2017-03-30 2020-03-25 British Telecommunications public limited company Hierarchical temporal memory for expendable access control
EP3602380B1 (en) 2017-03-30 2022-02-23 British Telecommunications public limited company Hierarchical temporal memory for access control
EP3622448A1 (en) 2017-05-08 2020-03-18 British Telecommunications Public Limited Company Adaptation of machine learning algorithms
US11698818B2 (en) 2017-05-08 2023-07-11 British Telecommunications Public Limited Company Load balancing of machine learning algorithms
WO2018206408A1 (en) 2017-05-08 2018-11-15 British Telecommunications Public Limited Company Management of interoperating machine leaning algorithms
WO2018206405A1 (en) 2017-05-08 2018-11-15 British Telecommunications Public Limited Company Interoperation of machine learning algorithms
US10958422B2 (en) * 2017-06-01 2021-03-23 Cotiviti, Inc. Methods for disseminating reasoning supporting insights without disclosing uniquely identifiable data, and systems for the same
US10592666B2 (en) * 2017-08-31 2020-03-17 Micro Focus Llc Detecting anomalous entities
CN107944259A (en) * 2017-11-21 2018-04-20 广东欧珀移动通信有限公司 Using the management-control method of startup, device and storage medium and mobile terminal
US10360482B1 (en) * 2017-12-04 2019-07-23 Amazon Technologies, Inc. Crowd-sourced artificial intelligence image processing services
KR102456579B1 (en) 2017-12-07 2022-10-20 삼성전자주식회사 Computing apparatus and method thereof robust to encryption exploit
US11164086B2 (en) 2018-07-09 2021-11-02 International Business Machines Corporation Real time ensemble scoring optimization
CN109167882A (en) * 2018-09-27 2019-01-08 努比亚技术有限公司 A kind of association starting control method, terminal and computer readable storage medium
KR102277172B1 (en) * 2018-10-01 2021-07-14 주식회사 한글과컴퓨터 Apparatus and method for selecting artificaial neural network
US11321611B2 (en) 2018-10-03 2022-05-03 International Business Machines Corporation Deployment verification of authenticity of machine learning results
US10880328B2 (en) * 2018-11-16 2020-12-29 Accenture Global Solutions Limited Malware detection
JP2022535658A (en) * 2019-04-02 2022-08-10 トライノミアル グローバル リミティド Remote management of user devices
US11144735B2 (en) 2019-04-09 2021-10-12 International Business Machines Corporation Semantic concept scorer based on an ensemble of language translation models for question answer system
CN110362995B (en) * 2019-05-31 2022-12-02 电子科技大学成都学院 Malicious software detection and analysis system based on reverse direction and machine learning
US11620207B2 (en) 2020-01-08 2023-04-04 International Business Machines Corporation Power efficient machine learning in cloud-backed mobile systems
KR102330081B1 (en) * 2020-11-20 2021-11-23 부산대학교 산학협력단 Operation method and device for blockchain based android malicious app detection ensemble model
EP4256488A1 (en) * 2020-12-02 2023-10-11 Deep Forest Sciences, Inc. Differentiable machines for physical systems
KR20230089966A (en) * 2021-12-14 2023-06-21 주식회사 엔젤게임즈 Method and system for controlling training an artificial intelligence robot and trading aritifical intelligence model that trains the aritificial intelligence robot
US20240036999A1 (en) * 2022-07-29 2024-02-01 Dell Products, Lp System and method for predicting and avoiding hardware failures using classification supervised machine learning

Family Cites Families (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5841947A (en) * 1996-07-12 1998-11-24 Nordin; Peter Computer implemented machine learning method and system
US6430590B1 (en) 1999-01-29 2002-08-06 International Business Machines Corporation Method and apparatus for processing executable program modules having multiple dependencies
US6546551B1 (en) 1999-09-28 2003-04-08 International Business Machines Corporation Method for accurately extracting library-based object-oriented applications
US7181768B1 (en) 1999-10-28 2007-02-20 Cigital Computer intrusion detection system and method based on application monitoring
US20110238855A1 (en) * 2000-09-25 2011-09-29 Yevgeny Korsunsky Processing data flows with a data flow processor
US6898737B2 (en) * 2001-05-24 2005-05-24 Microsoft Corporation Automatic classification of event data
US7065764B1 (en) 2001-07-20 2006-06-20 Netrendered, Inc. Dynamically allocated cluster system
US7240048B2 (en) 2002-08-05 2007-07-03 Ben Pontius System and method of parallel pattern matching
EP1636731A2 (en) 2003-06-25 2006-03-22 Siemens Medical Solutions USA, Inc. Systems and methods for automated diagnosis and decision support for breast imaging
JP2005044330A (en) * 2003-07-24 2005-02-17 Univ Of California San Diego Weak hypothesis generation device and method, learning device and method, detection device and method, expression learning device and method, expression recognition device and method, and robot device
US8301584B2 (en) * 2003-12-16 2012-10-30 International Business Machines Corporation System and method for adaptive pruning
JP4482796B2 (en) 2004-03-26 2010-06-16 ソニー株式会社 Information processing apparatus and method, recording medium, and program
US20060047807A1 (en) 2004-08-25 2006-03-02 Fujitsu Limited Method and system for detecting a network anomaly in a network
US20060112388A1 (en) 2004-11-22 2006-05-25 Masaaki Taniguchi Method for dynamic scheduling in a distributed environment
US20090282070A1 (en) * 2004-12-01 2009-11-12 Nec Corporation Application contention management system method thereof, and information processing terminal using the same
JP4654776B2 (en) 2005-06-03 2011-03-23 富士ゼロックス株式会社 Question answering system, data retrieval method, and computer program
US7716645B2 (en) 2005-06-10 2010-05-11 International Business Machines Corporation Using atomic sets of memory locations
US7945902B1 (en) 2005-07-13 2011-05-17 Oracle America, Inc. Detection of non-standard application programming interface usage via analysis of executable code
US7912698B2 (en) * 2005-08-26 2011-03-22 Alexander Statnikov Method and system for automated supervised data analysis
US20080134326A2 (en) 2005-09-13 2008-06-05 Cloudmark, Inc. Signature for Executable Code
US7536373B2 (en) * 2006-02-14 2009-05-19 International Business Machines Corporation Resource allocation using relational fuzzy modeling
JP2007280031A (en) * 2006-04-06 2007-10-25 Sony Corp Information processing apparatus, method and program
JPWO2007135723A1 (en) * 2006-05-22 2009-09-24 富士通株式会社 Neural network learning apparatus, method, and program
WO2008055156A2 (en) * 2006-10-30 2008-05-08 The Trustees Of Columbia University In The City Of New York Methods, media, and systems for detecting an anomalous sequence of function calls
JP2008129714A (en) * 2006-11-17 2008-06-05 Univ Of Tsukuba Abnormality detection method, abnormality detection device, abnormality detection program, and learning model generation method
US8370818B2 (en) 2006-12-02 2013-02-05 Time Warner Cable Inc. Methods and apparatus for analyzing software interface usage
US20080133571A1 (en) 2006-12-05 2008-06-05 International Business Machines Corporation Modifying Behavior in Messaging Systems According to Organizational Hierarchy
US9009649B2 (en) 2007-05-16 2015-04-14 Accenture Global Services Limited Application search tool for rapid prototyping and development of new applications
KR100942795B1 (en) 2007-11-21 2010-02-18 한국전자통신연구원 A method and a device for malware detection
US7958068B2 (en) * 2007-12-12 2011-06-07 International Business Machines Corporation Method and apparatus for model-shared subspace boosting for multi-label classification
US8364528B2 (en) * 2008-05-06 2013-01-29 Richrelevance, Inc. System and process for improving product recommendations for use in providing personalized advertisements to retail customers
US8347272B2 (en) 2008-07-23 2013-01-01 International Business Machines Corporation Call graph dependency extraction by static source code analysis
US8108325B2 (en) 2008-09-15 2012-01-31 Mitsubishi Electric Research Laboratories, Inc. Method and system for classifying data in system with limited memory
US8504504B2 (en) 2008-09-26 2013-08-06 Oracle America, Inc. System and method for distributed denial of service identification and prevention
US20100082400A1 (en) * 2008-09-29 2010-04-01 Yahoo! Inc.. Scoring clicks for click fraud prevention
US20100107245A1 (en) * 2008-10-29 2010-04-29 Microsoft Corporation Tamper-tolerant programs
US8505015B2 (en) 2008-10-29 2013-08-06 Teradata Us, Inc. Placing a group work item into every prioritized work queue of multiple parallel processing units based on preferred placement of the work queues
US9239740B2 (en) 2009-06-16 2016-01-19 Microsoft Technology Licensing, Llc Program partitioning across client and cloud
US8726254B2 (en) 2009-06-20 2014-05-13 Microsoft Corporation Embedded annotation and program analysis
US8370613B1 (en) * 2009-06-30 2013-02-05 Symantec Corporation Method and apparatus for automatically optimizing a startup sequence to improve system boot time
US8560465B2 (en) * 2009-07-02 2013-10-15 Samsung Electronics Co., Ltd Execution allocation cost assessment for computing systems and environments including elastic computing systems and environments
US8429097B1 (en) * 2009-08-12 2013-04-23 Amazon Technologies, Inc. Resource isolation using reinforcement learning and domain-specific constraints
US9081958B2 (en) 2009-08-13 2015-07-14 Symantec Corporation Using confidence about user intent in a reputation system
US8516452B2 (en) 2009-12-08 2013-08-20 International Business Machines Corporation Feedback-directed call graph expansion
US8818923B1 (en) 2011-06-27 2014-08-26 Hrl Laboratories, Llc Neural network device with engineered delays for pattern storage and matching
US8887163B2 (en) 2010-06-25 2014-11-11 Ebay Inc. Task scheduling based on dependencies and resources
US8856545B2 (en) * 2010-07-15 2014-10-07 Stopthehacker Inc. Security level determination of websites
US8359223B2 (en) * 2010-07-20 2013-01-22 Nec Laboratories America, Inc. Intelligent management of virtualized resources for cloud database systems
US9262228B2 (en) 2010-09-23 2016-02-16 Microsoft Technology Licensing, Llc Distributed workflow in loosely coupled computing
WO2012071989A1 (en) 2010-11-29 2012-06-07 北京奇虎科技有限公司 Method and system for program identification based on machine learning
AU2011336466C1 (en) * 2010-12-01 2017-01-19 Cisco Technology, Inc. Detecting malicious software through contextual convictions, generic signatures and machine learning techniques
US8549647B1 (en) 2011-01-14 2013-10-01 The United States Of America As Represented By The Secretary Of The Air Force Classifying portable executable files as malware or whiteware
US9398033B2 (en) 2011-02-25 2016-07-19 Cavium, Inc. Regular expression processing automaton
US20120222097A1 (en) * 2011-02-28 2012-08-30 Wilson Jobin System and method for user classification and statistics in telecommunication network
US8990149B2 (en) * 2011-03-15 2015-03-24 International Business Machines Corporation Generating a predictive model from multiple data sources
US9286182B2 (en) * 2011-06-17 2016-03-15 Microsoft Technology Licensing, Llc Virtual machine snapshotting and analysis
US8631395B2 (en) 2011-09-02 2014-01-14 Microsoft Corporation Inter-procedural dead catch handler optimizations
US9329887B2 (en) 2011-10-19 2016-05-03 Hob Gmbh & Co. Kg System and method for controlling multiple computer peripheral devices using a generic driver
US20130152200A1 (en) 2011-12-09 2013-06-13 Christoph Alme Predictive Heap Overflow Protection
CN103186406B (en) 2011-12-30 2016-08-17 国际商业机器公司 Method and apparatus for control flow analysis
US8713684B2 (en) 2012-02-24 2014-04-29 Appthority, Inc. Quantifying the risks of applications for mobile devices
US8627291B2 (en) 2012-04-02 2014-01-07 International Business Machines Corporation Identification of localizable function calls
WO2013174451A1 (en) * 2012-05-25 2013-11-28 Nec Europe Ltd. Method for executing processes on a worker machine of a distributed computing system and a distributed computing system
US9292688B2 (en) 2012-09-26 2016-03-22 Northrop Grumman Systems Corporation System and method for automated machine-learning, zero-day malware detection
US9069916B2 (en) * 2012-11-13 2015-06-30 Chevron U.S.A. Inc. Model selection from a large ensemble of models
US8880446B2 (en) * 2012-11-15 2014-11-04 Purepredictive, Inc. Predictive analytics factory
US20140180738A1 (en) * 2012-12-21 2014-06-26 Cloudvu, Inc. Machine learning for systems management
US20140189703A1 (en) * 2012-12-28 2014-07-03 General Electric Company System and method for distributed computing using automated provisoning of heterogeneous computing resources
US20140188768A1 (en) * 2012-12-28 2014-07-03 General Electric Company System and Method For Creating Customized Model Ensembles On Demand
US9104525B2 (en) 2013-01-22 2015-08-11 Microsoft Technology Licensing, Llc API usage pattern mining
US9015685B2 (en) 2013-03-01 2015-04-21 International Business Machines Corporation Code analysis for simulation efficiency improvement
US9218574B2 (en) * 2013-05-29 2015-12-22 Purepredictive, Inc. User interface for machine learning
US20140358828A1 (en) * 2013-05-29 2014-12-04 Purepredictive, Inc. Machine learning generated action plan
US20140372513A1 (en) * 2013-06-12 2014-12-18 Cloudvu, Inc. Multi-tenant enabling a single-tenant computer program product
AU2014302603A1 (en) 2013-06-24 2016-01-07 Cylance Inc. Automated system for generative multimodel multiclass classification and similarity analysis using machine learning
US9286573B2 (en) * 2013-07-17 2016-03-15 Xerox Corporation Cost-aware non-stationary online learning
EP2833594A1 (en) 2013-07-31 2015-02-04 Siemens Aktiengesellschaft Feature based three stage neural networks intrusion detection method and system
US10055434B2 (en) 2013-10-16 2018-08-21 University Of Tennessee Research Foundation Method and apparatus for providing random selection and long-term potentiation and depression in an artificial network
US8930916B1 (en) 2014-01-31 2015-01-06 Cylance Inc. Generation of API call graphs from static disassembly
US9262296B1 (en) 2014-01-31 2016-02-16 Cylance Inc. Static feature extraction from structured files
WO2015120243A1 (en) 2014-02-07 2015-08-13 Cylance Inc. Application execution control utilizing ensemble machine learning for discernment
US9171154B2 (en) 2014-02-12 2015-10-27 Symantec Corporation Systems and methods for scanning packed programs in response to detecting suspicious behaviors
EP3238611B1 (en) * 2016-04-29 2021-11-17 Stichting IMEC Nederland A method and device for estimating a condition of a person
EP3255573A1 (en) * 2016-06-10 2017-12-13 Electronics and Telecommunications Research Institute Clinical decision supporting ensemble system and clinical decison supporting method using the same
WO2019236997A1 (en) * 2018-06-08 2019-12-12 Zestfinance, Inc. Systems and methods for decomposition of non-differentiable and differentiable models

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"LECTURE NOTES IN COMPUTER SCIENCE", vol. 3488, 31 January 2005, SPRINGER BERLIN HEIDELBERG, Berlin, Heidelberg, ISBN: 978-3-54-045234-8, ISSN: 0302-9743, article SALVATORE J. STOLFO ET AL: "Anomaly Detection in Computer Security and an Application to File System Accesses", pages: 14 - 28, XP055192090, DOI: 10.1007/11425274_2 *
HAJIME INOUE: "Anomaly detection in dynamic execution environments", 31 January 2005 (2005-01-31), XP055191583, ISBN: 978-0-54-249408-6, Retrieved from the Internet <URL:https://www.cs.unm.edu/~forrest/dissertations/inoue-dissertation.pdf> [retrieved on 20150526] *
XUN WANG ET AL: "Detecting worms via mining dynamic program execution", SECURITY AND PRIVACY IN COMMUNICATIONS NETWORKS AND THE WORKSHOPS, 2007. SECURECOMM 2007. THIRD INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 17 September 2007 (2007-09-17), pages 412 - 421, XP031276575, ISBN: 978-1-4244-0974-7 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018200342A1 (en) * 2017-04-25 2018-11-01 Xaxis, Inc. Double blind machine learning insight interface apparatuses, methods and systems
US11449787B2 (en) 2017-04-25 2022-09-20 Xaxis, Inc. Double blind machine learning insight interface apparatuses, methods and systems

Also Published As

Publication number Publication date
US20150227741A1 (en) 2015-08-13
JP2017508210A (en) 2017-03-23
EP3103070B1 (en) 2023-09-13
US10235518B2 (en) 2019-03-19
CA2938580C (en) 2022-08-16
US20190188375A1 (en) 2019-06-20
HK1232326A1 (en) 2018-01-05
JP6662782B2 (en) 2020-03-11
CA2938580A1 (en) 2015-08-13
AU2015213797B2 (en) 2019-09-26
EP3103070A1 (en) 2016-12-14
AU2015213797A1 (en) 2016-09-08
US10817599B2 (en) 2020-10-27
WO2015120243A8 (en) 2016-09-09

Similar Documents

Publication Publication Date Title
US10817599B2 (en) Application execution control utilizing ensemble machine learning for discernment
US20220121995A1 (en) Automatic generation of training data for anomaly detection using other user's data samples
US11182471B2 (en) Isolating data for analysis to avoid malicious attacks
US11334671B2 (en) Adding adversarial robustness to trained machine learning models
KR101789962B1 (en) Method and system for inferring application states by performing behavioral analysis operations in a mobile device
US20190130101A1 (en) Methods and apparatus for detecting a side channel attack using hardware performance counters
US20210349865A1 (en) Data migration system
JP2017508210A5 (en)
CN111656350A (en) Malware sequence detection
JP2019192198A (en) System and method of training machine learning model for detection of malicious container
EP3812929A1 (en) Utilizing a neural network model to determine risk associated with an application programming interface of a web application
WO2021056275A1 (en) Optimizing generation of forecast
US20210264199A1 (en) Control of hyperparameter tuning based on machine learning
Brown et al. Automated machine learning for deep learning based malware detection
US20220004904A1 (en) Deepfake detection models utilizing subject-specific libraries
US11113579B2 (en) Machine learning model score obfuscation using step function, position-dependent noise
CN110225019B (en) Network security processing method and device
Tiwaskar et al. Performance Comparison of Imputation Methods for Heart Disease Prediction
US11960746B2 (en) Storage context aware tiering policy advisor
US20220383154A1 (en) Computer-automated processing with rule-supplemented machine learning
Lopes et al. Predicting the Impact of Android Malicious Samples Via Machine Learning
WO2023186771A1 (en) Quantum computer performance enhancement
Kumar et al. An efficient security testing mechanism for Android Apps based on malware analysis and optimized XGBoost

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15708931

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2938580

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2016550628

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2015708931

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015708931

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2015213797

Country of ref document: AU

Date of ref document: 20150206

Kind code of ref document: A