US20160269423A1 - Methods and systems for malware analysis - Google Patents

Methods and systems for malware analysis

Info

Publication number
US20160269423A1
US20160269423A1
Authority
US
United States
Prior art keywords
malware
sample
analyzers
user
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/138,919
Inventor
Mark McLarnon
Mark V. Raugas
Ryan Fisher
Nate Rogers
Mike Kolodny
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CYBERPOINT INTERNATIONAL LLC
Original Assignee
CYBERPOINT INTERNATIONAL LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CYBERPOINT INTERNATIONAL LLC
Priority to US15/138,919
Assigned to CYBERPOINT INTERNATIONAL LLC. Assignment of assignors interest (see document for details). Assignors: KOLODNY, MIKE; MCLARNON, MARK; RAUGAS, MARK V.; ROGERS, NATE; FISHER, RYAN
Publication of US20160269423A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 Countermeasures against malicious traffic
    • H04L 63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F 17/30876
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/0482 Interaction with lists of selectable items, e.g. menus
    • G06N 99/005
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 Countermeasures against malicious traffic
    • H04L 63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

  • Communication interface 780 may include any transceiver-like mechanism that enables computing device 700 to communicate with other devices and/or systems, such as a client, a server, a license manager, a vendor, etc.
  • communication interface 780 may include one or more interfaces, such as a first interface coupled to a network and/or a second interface coupled to a license manager.
  • communication interface 780 may include other mechanisms (e.g., a wireless interface) for communicating via a network, such as a wireless network.
  • communication interface 780 may include logic to send code to a destination device, such as a target device that can include general purpose hardware (e.g., a personal computer form factor), dedicated hardware (e.g., a digital signal processing (DSP) device adapted to execute a compiled version of a model or a part of a model), etc.
  • a target device such as a target device that can include general purpose hardware (e.g., a personal computer form factor), dedicated hardware (e.g., a digital signal processing (DSP) device adapted to execute a compiled version of a model or a part of a model), etc.
  • DSP digital signal processing
  • Computing device 700 may perform certain functions in response to processor 720 executing software instructions contained in a computer-readable medium, such as memory 730 .
  • a computer-readable medium such as memory 730 .
  • hardwired circuitry may be used in place of or in combination with software instructions to implement features consistent with principles of the disclosure.
  • implementations consistent with principles of the disclosure are not limited to any specific combination of hardware circuitry and software.
  • FIG. 8 Depicted in FIG. 8 is one embodiment of the invention where an exemplary workflow is depicted.
  • Starting at start point 805 specifies that a workflow wherein analyzers 810 , 820 830 and 840 are run simultaneously from Divergence point 807 .
  • the workflow specifies that analyzer 850 is run after analyzers 810 through 840 are completed at convergence point 880 .
  • Decision point 890 specifies that Analyzer 860 is run if the results from analyzer 850 show that the sample is suspected to be malware.
  • the analysis workflow is complete and the results of all the analyzers are gathered for presentation at finish 897 .
  • Exemplary embodiments may be embodied in many different ways as a software component.
  • a software component may be a stand-alone software package, a combination of software packages, or it may be a software package incorporated as a “tool” in a larger software product.
  • It may be downloadable from a network, for example, a web site, as a stand-alone product or as an add-in package for installation in an existing software application. It may also be available as a client-server software application, or as a web-enabled software application. It may also be embodied as a software package installed on a hardware device.
  • any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” in the specification are not necessarily all referring to the same embodiment.
  • exemplary functional components or modules may be implemented by one or more hardware components, software components, and/or combination thereof.
  • the functional components and/or modules may be implemented, for example, by logic (e.g., instructions, data, and/or code) to be executed by a logic device (e.g., processor).
  • logic e.g., instructions, data, and/or code
  • Such logic may be stored internally or externally to a logic device on one or more types of computer-readable storage media.
  • An article of manufacture may comprise a storage medium to store logic.
  • Examples of a storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of storage media include hard drives, disk drives, solid state drives, and any other tangible storage media.
  • FIG. 1 Some of the figures may include a flow diagram. Although such figures may include a particular logic flow, it can be appreciated that the logic flow merely provides an exemplary implementation of the general functionality. Further, the logic flow does not necessarily have to be executed in the order presented unless otherwise indicated. In addition, the logic flow may be implemented by a hardware element, a software element executed by a processor, or any combination thereof.

Abstract

Methods, systems, and media for analyzing a potential malware sample are disclosed. A sample for malware analysis may be received. The sample may be received through a web interface. The sample may be analyzed using a plurality of analyzers implemented on one or more computing devices. The analyzers may perform a sequence of configurable analytic steps to extract information about the sample. The extracted information may be displayed to a user through the web interface.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This Application is a continuation of U.S. application Ser. No. 14/068,605, filed Oct. 31, 2013, the contents of which are incorporated by reference.
  • FIELD
  • This disclosure relates generally to malware analysis, and more particularly to methods, systems, and media for malware analysis.
  • BACKGROUND
  • Existing malware analysis services suffer from several deficiencies. First, some of these services, although competent against some malware threats, are not sufficient on their own to combat a malware infection; one cannot simply rely on a sandbox to determine what a piece of malware has done. Second, several prior approaches are built to target only a single type of malware or a single platform, e.g. Microsoft® Windows®, yet malware is often platform agnostic and can target multiple platforms. Third, some of these services do not produce output understandable to anyone beyond those with specialized training, e.g. a degree in Computer Science, which limits their usefulness to users who do not possess that training.
  • What is needed is a design in which, as malware threats change and evolve, the analysis conducted by the various processing elements can change and evolve as well.
  • SUMMARY
  • Various embodiments are generally directed to malware analysis to overcome the aforementioned problems.
  • One or more embodiments may include a method for analyzing a potential malware sample, the method comprising: receiving a sample for malware analysis through a web interface; analyzing the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and displaying the extracted information to a user through the web interface.
  • One or more embodiments may include a system comprising: a memory; and a processor coupled to the memory, the processor being configured to: receive a sample for malware analysis through a web interface; analyze the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and display the extracted information to a user through the web interface.
  • One or more embodiments may include a computer readable storage medium comprising instructions that, if executed, enable a computing system to: receive a sample for malware analysis through a web interface; analyze the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and display the extracted information to a user through the web interface.
  • These and other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments will now be described in connection with the associated drawings, in which:
  • FIG. 1 depicts a block diagram of an exemplary system in accordance with one or more embodiments.
  • FIG. 2 depicts a block diagram of an exemplary system in accordance with one or more embodiments.
  • FIG. 3 depicts a block flow diagram of an exemplary method in accordance with one or more embodiments.
  • FIG. 4 depicts an exemplary workflow editor in accordance with one or more embodiments.
  • FIG. 5-1 depicts a block diagram of an exemplary system in accordance with one or more embodiments.
  • FIG. 5-2 depicts an example of custom rules in accordance with one or more embodiments.
  • FIG. 5-3 depicts an exemplary analytic summary in accordance with one or more embodiments.
  • FIG. 6 depicts an exemplary interface in accordance with one or more embodiments.
  • FIG. 7 depicts an exemplary architecture for implementing a computing device in accordance with one or more embodiments.
  • FIG. 8 is an exemplary embodiment of the invention depicting an example workflow where malware analyzers are run in a specified sequence.
  • DETAILED DESCRIPTION OF THE DRAWINGS
  • Exemplary embodiments are discussed in detail below. While specific exemplary embodiments are discussed, it should be understood that this is done for illustration purposes only. In describing and illustrating the exemplary embodiments, specific terminology is employed for the sake of clarity. However, the embodiments are not intended to be limited to the specific terminology so selected. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the embodiments. It is to be understood that each specific element includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. The examples and embodiments described herein are non-limiting examples.
  • A system, method, medium, or computer-based product may provide tools to assist analysts and computer incident responders when analyzing malware. The system, method, medium, or product may be designed to reduce the effort required to analyze and reverse engineer malware. It may help identify the malware, determine what the malware did to a system and what it could have done, establish whether the malware ran on one or more systems, and determine how to remove the malware from a system. The system, method, medium, or product may combine an expandable set of machine learning algorithms and rule sets for automated analysis, adaptors for external analytics, a workflow management framework for processing and reporting, and a web-based user interface.
  • The system, method, medium, or product can substantially increase the work productivity of malware analysts and computer incident responders. The system, method, medium, or product may provide users, e.g. novice- and intermediate-level security practitioners, with the tools to perform at expert levels and with much greater efficiency. The system, method, medium, or product can be deployed as a stand-alone tool or can be integrated into an existing automated workflow.
  • FIG. 1 depicts a block diagram of an exemplary system 100 in accordance with one or more embodiments. System 100 may include one or more user devices, e.g. user device 120-1, user device 120-2, and user device 120-3, network 130, server 150, database 155, software module 165, and server 180.
  • The one or more user devices, e.g. user device 120-1, user device 120-2, and user device 120-3, may be any type of computing device, including a mobile telephone, a laptop, tablet, or desktop computer, a netbook, a video game device, a smart phone, an ultra-mobile personal computer (UMPC), etc. The one or more user devices may run one or more applications, such as Internet browsers, voice calls, video games, videoconferencing, and email, among others. The one or more user devices may be any combination of computing devices. These devices may be coupled to network 130.
  • Network 130 may provide network access, data transport and other services to the devices coupled to it. In general, network 130 may include and implement any commonly defined network architectures including those defined by standards bodies, such as the Global System for Mobile communication (GSM) Association, the Internet Engineering Task Force (IETF), and the Worldwide Interoperability for Microwave Access (WiMAX) forum. For example, network 130 may implement one or more of a GSM architecture, a General Packet Radio Service (GPRS) architecture, a Universal Mobile Telecommunications System (UMTS) architecture, and an evolution of UMTS referred to as Long Term Evolution (LTE). Network 130 may, again as an alternative or in conjunction with one or more of the above, implement a WiMAX architecture defined by the WiMAX forum. Network 130 may also comprise, for instance, a local area network (LAN), a wide area network (WAN), the Internet, a virtual LAN (VLAN), an enterprise LAN, a layer 3 virtual private network (VPN), an enterprise IP network, or any combination thereof.
  • Server 150 or server 180 may also be any type of computing device coupled to network 130, including but not limited to a personal computer, a server computer, a series of server computers, a mini computer, and a mainframe computer, or combinations thereof. Server 150 or server 180 may be a web server (or a series of servers) running a network operating system, examples of which may include but are not limited to Microsoft Windows Server, Novell NetWare, or Linux. Server 150 or server 180 may be used for and/or provide cloud and/or network computing. Although not shown in FIG. 1, server 150 and/or server 180 may have connections to external systems like email, SMS messaging, text messaging, ad content providers, etc. Any of the features of server 150 may also be implemented in server 180 and vice versa.
  • Database 155 may be any type of database, including a database managed by a database management system (DBMS). A DBMS is typically implemented as an engine that controls organization, storage, management, and retrieval of data in a database. DBMSs frequently provide the ability to query, backup and replicate, enforce rules, provide security, do computation, perform change and access logging, and automate optimization. Examples of DBMSs include Oracle database, IBM DB2, Adaptive Server Enterprise, FileMaker, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, and a NoSQL implementation. A DBMS typically includes a modeling language, data structure, database query language, and transaction mechanism. The modeling language is used to define the schema of each database in the DBMS, according to the database model, which may include a hierarchical model, network model, relational model, object model, or some other applicable known or convenient organization. Data structures can include fields, records, files, objects, and any other applicable known or convenient structures for storing data. A DBMS may also include metadata about the data that is stored.
  • Software module 165 may be a module that is configured to send, process, and receive information at server 150. Software module 165 may provide another mechanism for sending and receiving data at server 150 besides handling requests through web server functionalities. Software module 165 may send and receive information using any technique for sending and receiving information between processes or devices, including but not limited to using a scripting language, a remote procedure call, an email, a tweet, an application programming interface, Simple Object Access Protocol (SOAP) methods, Common Object Request Broker Architecture (CORBA), HTTP (Hypertext Transfer Protocol), REST (Representational State Transfer), any interface for software components to communicate with each other, any other known technique for sending information from one device to another, or any combination thereof.
  • Although software module 165 may be described in relation to server 150, software module 165 may reside on any other device. Further, the functionality of software module 165 may be duplicated on, distributed across, and/or performed by one or more other devices, either in whole or in part.
  • FIG. 2 depicts a block diagram of an exemplary system 200 in accordance with one or more embodiments. System 200 may provide a workflow management system for the automated, collaborative analysis, and/or reverse engineering of malware. System 200 may combine an expandable set of machine learning algorithms and rule sets for automated analysis, adaptors for external analytics, a workflow management framework for processing and reporting, and a web-based user interface. System 200 may be implemented on system 100. For example, the software modules may be implemented by software module 165, and any information may be stored in database 155.
  • A user 210 may utilize system 200. System 200 may include one or more honeypots, e.g., honeypot 215-1, honeypot 215-2, and honeypot 215-3, threat navigation module 220, data bridge 225, workflow manager 230, analysis manager 235, one or more analyzers, e.g. 240-1, 240-2, and 240-3, one or more environments, e.g. 245-1, 245-2, and 245-3, results 250, and web interface 255.
  • FIG. 3 depicts a block flow diagram of an exemplary method 300 in accordance with one or more embodiments. Although exemplary method 300 will be discussed in conjunction with system 200, exemplary method 300 is not limited to execution on system 200, and may be implemented by any system capable of performing or being configured to perform exemplary method 300.
  • In block 310, a sample for malware analysis may be received. User 210, one or more honeypots, or any combination thereof, may submit one or more samples, e.g. files, binary files, etc., to initiate malware analysis. Samples may also be received via a data feed 211; in some instances, samples may be automatically collected and submitted via data feed 211. The samples may be submitted via a web interface. Each of the one or more honeypots, e.g. honeypot 215-1, honeypot 215-2, and honeypot 215-3, may be a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. User 210 may be any user of system 200.
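  • For illustration only, the Python sketch below shows one way a sample might be submitted through such a web interface; the endpoint URL, form-field names, and use of the requests library are assumptions and are not part of the original disclosure.

```python
# Hypothetical sketch of submitting a sample to the malware-analysis web interface.
# The URL, route, and field names are illustrative assumptions, not a documented API.
import hashlib
import requests  # third-party HTTP client, assumed to be installed

def submit_sample(path: str, api_url: str = "https://analysis.example.com/api/samples") -> dict:
    """Upload a file for analysis and return the service's JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    files = {"sample": (path, data)}
    # A hash lets the service recognize samples it has already analyzed.
    meta = {"sha1": hashlib.sha1(data).hexdigest()}
    response = requests.post(api_url, files=files, data=meta, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(submit_sample("suspicious.bin"))
```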
  • Threat navigation module 220 may receive one or more samples, which may initiate a series of automated, configurable analytic steps. These steps may include application of machine learning models for signature-free assessment of threat severity, as well as external static and dynamic analytics, including file hashing, comparison against public or private whitelists/blacklists, and storage of ingested files and their resulting metadata, or any combination thereof. The threat navigation module 220 may be responsible for preprocessing the sample before entry into the data bridge 225. Results of the preprocessing step may assist the system in determining initial workflows. Examples of preprocessing include uncompressing a sample, decrypting a sample, and identifying the file type. Attributes such as the file type may affect the workflow by determining the analyzers that are applicable to the sample. Thus, the analyzers used in a workflow may be assigned based on the results of the preprocessing.
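  • A minimal sketch of such a preprocessing step is shown below: it identifies a file type from magic bytes and uses that attribute to pick applicable analyzers. The signatures, the type-to-analyzer mapping, and the ArchiveUnpacker name are assumptions; the other analyzer names are taken from the FIG. 4 example.

```python
# Illustrative preprocessing: identify file type and select applicable analyzers.
MAGIC_SIGNATURES = {
    b"MZ": "pe",           # Windows PE executable
    b"\x7fELF": "elf",     # Linux ELF binary
    b"PK\x03\x04": "zip",  # ZIP archive (may need uncompressing first)
    b"%PDF": "pdf",
}

ANALYZERS_BY_TYPE = {
    "pe":  ["BinaryFeatureExtractor", "CLAM_AV", "FindStrings", "ModelScoringEngine"],
    "elf": ["BinaryFeatureExtractor", "FindStrings"],
    "zip": ["ArchiveUnpacker"],  # hypothetical analyzer that re-submits extracted files
    "pdf": ["FindStrings"],
}

def identify_file_type(data: bytes) -> str:
    for magic, file_type in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return file_type
    return "unknown"

def preprocess(data: bytes) -> dict:
    """Return preprocessing attributes that drive the initial workflow."""
    file_type = identify_file_type(data)
    return {
        "file_type": file_type,
        "size": len(data),
        "analyzers": ANALYZERS_BY_TYPE.get(file_type, ["FindStrings"]),
    }

print(preprocess(b"MZ\x90\x00" + b"\x00" * 60))
```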
  • The files may be forwarded to a data bridge 225 for storage in a sample repository. Data bridge 225 and/or the sample repository may be implemented by database 155. Samples captured by honeypots may be presented to a threat navigation module 220 and forwarded to the data bridge 225 for storage.
  • Workflow manager 230 may leverage high-availability and fault-tolerant computing technologies to scale in processing power as the user base expands. Workflow manager 230 can integrate new analyzers easily while giving users the ability not only to schedule new workflows but also to stop existing workflows from the administrative interface of the system, all without having to shut down or redeploy the system 200.
  • Users may be able to create and/or modify existing workflows by invoking a workflow editor and selecting the desired analyzers. FIG. 4 depicts an exemplary workflow editor 400 in accordance with one or more embodiments. Workflow editor 400 may be presented to a user (e.g. user 210) on a user device (e.g. one or more of user devices 120-1, 120-2, or 120-3). Workflow editor 400 may be presented as a web page, by an application, or any combination thereof. Using workflow editor 400, a user may select one or more analyzers to run in the workflow. For example, in FIG. 4, the BinaryFeatureExtractor, CLAM_AV, FindStrings, and ModelScoringEngine analyzers have been selected. Any analyzer may be listed and/or selected by the user. A user may also specify an order in which to apply the selected analyzers, e.g. by using workflow editor 400. A user may also specify one or more workflow options. For example, a user may specify whether or not a workflow will support and/or use virtual machines, whether or not a workflow will support scripts, or any other user-selectable option associated with a workflow.
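  • As an illustration, a workflow assembled in the editor might be captured as a small declarative structure like the sketch below. The analyzer names come from the FIG. 4 example; the workflow name and option field names (use_virtual_machines, allow_scripts) are assumptions standing in for the user-selectable options described above.

```python
# Hypothetical representation of a workflow built in workflow editor 400.
import json

workflow = {
    "name": "default-binary-triage",
    "analyzers": [                      # list order is the order of application
        "BinaryFeatureExtractor",
        "CLAM_AV",
        "FindStrings",
        "ModelScoringEngine",
    ],
    "options": {
        "use_virtual_machines": True,   # whether the workflow may use sandbox VMs
        "allow_scripts": False,         # whether analyzer scripts are permitted
    },
}

# The editor could persist this as JSON for the workflow manager to schedule.
print(json.dumps(workflow, indent=2))
```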
  • Referring back to FIG. 3, in block 320, once the samples are stored, workflow manager 230 may invoke an analysis manager 235, which may invoke one or more analyzers, e.g. 240-1, 240-2, and 240-3, that perform a sequence of configured analytic steps to extract information about the sample. The analysis manager 235 may be pre-configured to follow a specific sequence created by default or a sequence generated by the user. In some embodiments, analysis manager 235 may have control only of one or more data analyzers, whereas the workflow manager 230 may have a wider influence on the sequence of system actions.
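  • A minimal sketch of how an analysis manager might drive a configured sequence of analyzers follows; the callable-based registry and the toy analyzers are assumptions for illustration, not the disclosed design.

```python
# Illustrative analysis manager: run the configured analyzers in order and
# merge whatever information each one extracts about the sample.
from typing import Callable, Dict, List

Analyzer = Callable[[bytes], Dict[str, object]]

def find_strings(sample: bytes) -> Dict[str, object]:
    # Toy stand-in for a FindStrings-style analyzer.
    chunks = [c for c in sample.split(b"\x00") if len(c) >= 4]
    return {"strings": [c.decode("latin-1") for c in chunks[:10]]}

def size_features(sample: bytes) -> Dict[str, object]:
    # Toy stand-in for a BinaryFeatureExtractor-style analyzer.
    return {"size": len(sample)}

REGISTRY: Dict[str, Analyzer] = {
    "FindStrings": find_strings,
    "BinaryFeatureExtractor": size_features,
}

def run_sequence(sample: bytes, sequence: List[str]) -> Dict[str, object]:
    """Invoke each configured analyzer in order and collect extracted information."""
    extracted: Dict[str, object] = {}
    for name in sequence:
        extracted.update(REGISTRY[name](sample))
    return extracted

print(run_sequence(b"hello\x00world\x00MZ", ["BinaryFeatureExtractor", "FindStrings"]))
```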
  • An analyzer, e.g. analyzers 240-1, 240-2, and 240-3, may refer to a discrete program, script, or environment designed to process a piece of malware in some manner to extract some useful piece of information within, or metadata about, the malware. The analyzer may be provided with a complete API of functions for storage, extraction, processing, and reporting on malware. An API, such as a RESTful interface, may be used to make the extracted information available to other computing devices and to upload the file of potential malware. An analyzer may be implemented in any programming language, e.g. Python or Java, and may be developed for any operating system, e.g. Linux, OS X, Windows, etc. However, the analyzers, regardless of implementation, may all integrate with the application programming interface.
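  • The sketch below illustrates, under assumed names, how an analyzer might integrate with a REST-style reporting interface of the kind described above; the endpoint path and JSON payload layout are assumptions.

```python
# Illustrative analyzer base class that reports results through a REST-style API.
import hashlib
import json
import urllib.request

class BaseAnalyzer:
    api_url = "https://analysis.example.com/api/results"  # assumed endpoint

    def analyze(self, sample: bytes) -> dict:
        raise NotImplementedError

    def report(self, sample_id: str, results: dict) -> None:
        """POST extracted information so other components and users can retrieve it."""
        body = json.dumps({"sample_id": sample_id, "results": results}).encode("utf-8")
        request = urllib.request.Request(
            self.api_url, data=body, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(request)  # network call; requires the service to be running

class HashAnalyzer(BaseAnalyzer):
    def analyze(self, sample: bytes) -> dict:
        return {"sha1": hashlib.sha1(sample).hexdigest()}

analyzer = HashAnalyzer()
print(analyzer.analyze(b"example sample bytes"))
```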
  • The system may be capable of recursive analysis, in which each analytical outcome could reveal more information to invoke more analyzers. For example, a first analyzer may be run and produce a first analytical outcome as a result of the execution. The first analyzer may run a second analyzer, e.g. another analyzer different from the first analyzer or even the same first analyzer, to process the first analytical outcome. The first analyzer may call the second analyzer before or after completing its own analysis. The first analyzer may use the results of the run of the second analyzer when performing its analysis.
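  • A toy sketch of the recursive behavior described above follows: one analyzer's outcome triggers a second analyzer on that outcome. Both analyzers here are placeholders invented for illustration.

```python
# Illustrative recursive analysis: an unpacking analyzer invokes a second analyzer
# on each payload it recovers.
import hashlib

def hash_analyzer(sample: bytes) -> dict:
    return {"sha1": hashlib.sha1(sample).hexdigest(), "size": len(sample)}

def unpack_analyzer(sample: bytes) -> dict:
    # First analytical outcome: split the blob into embedded "payloads".
    payloads = [p for p in sample.split(b"--") if p]
    # Recursive step: run another analyzer on each newly revealed payload.
    return {
        "payload_count": len(payloads),
        "payload_reports": [hash_analyzer(p) for p in payloads],
    }

print(unpack_analyzer(b"PAYLOAD_ONE--PAYLOAD_TWO"))
```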
  • The analyzers performing a sequence of configured analytic steps may include forwarding the sample to one or more environments, e.g. 245-1, 245-2, and 245-3, for execution and behavioral profiling. The one or more environments may include a sandbox environment for execution and behavioral profiling. The one or more environments may include hardware configurations to which samples may be sent for processing.
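  • As an illustration only (not the disclosed sandbox), forwarding a sample to an execution environment and collecting a behavioral profile might look like the sketch below; the environment names and profile fields are assumptions, and the execution is simulated.

```python
# Illustrative dispatch of a sample to execution environments for behavioral profiling.
from dataclasses import dataclass, field
from typing import List

@dataclass
class BehaviorProfile:
    environment: str
    observed_behaviors: List[str] = field(default_factory=list)

class SimulatedSandbox:
    def __init__(self, name: str):
        self.name = name

    def execute(self, sample: bytes) -> BehaviorProfile:
        profile = BehaviorProfile(environment=self.name)
        # A real sandbox would run the sample and record file, registry, and
        # network activity; here we only record a placeholder observation.
        profile.observed_behaviors.append(f"executed {len(sample)} bytes (simulated)")
        return profile

environments = [SimulatedSandbox("windows-7-x86"), SimulatedSandbox("ubuntu-14.04-x64")]
for env in environments:
    print(env.execute(b"\x4d\x5a fake sample"))
```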
  • Instructions to and results from the analyzers may be passed via a heterogeneous set of messaging mechanisms.
  • FIG. 5-1 depicts one or more analyzers 520, 521, and 522 processing binary sample 510 (suspected malware) and interrogating the rule knowledgebase 540 (via the rules engine 530) to extract knowledge to produce classification, observations, and conclusions that are presented to the user as an analytic summary 501. The analytic summary 501 may be a conversion of technical data into actionable data points that can be consumed by users of the system, e.g. novice users of the system. The rule knowledgebase may be updated as new rules are developed. FIG. 5-2 depicts an example of a rule.
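  • The following sketch, invented for illustration, shows one way a rules engine could match analyzer findings against a rule knowledgebase to produce the kind of human-readable observations and confidence levels shown in the analytic summary; the rules, field names, and confidence values are assumptions.

```python
# Illustrative rules engine: match analyzer findings against a small rule
# knowledgebase and emit observations with confidence values.
RULE_KNOWLEDGEBASE = [
    {
        "id": "hooking",
        "condition": lambda findings: "SetWindowsHookEx" in findings.get("imports", []),
        "observation": "The target was observed installing a function hook for "
                       "desktop programs to intercept graphical actions.",
        "confidence": 0.70,
    },
    {
        "id": "network-beacon",
        "condition": lambda findings: bool(findings.get("contacted_hosts")),
        "observation": "The target attempted to contact a remote host.",
        "confidence": 0.60,
    },
]

def evaluate_rules(findings: dict) -> list:
    summary = []
    for rule in RULE_KNOWLEDGEBASE:
        if rule["condition"](findings):
            summary.append(f"{rule['observation']} ({rule['confidence']:.0%})")
    return summary

findings = {"imports": ["SetWindowsHookEx", "CreateFileW"], "contacted_hosts": ["203.0.113.7"]}
print("\n".join(evaluate_rules(findings)))
```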
  • FIG. 5-3 depicts an exemplary analytic summary 501 in accordance with one or more embodiments. Analytic summary 501 may include several examples of actionable data for one or more analyzed samples. For example, analytic summary 501 includes the actionable data "The target was observed installing a function hook for all desktop programs to interrupt all graphical actions (e.g. mouse clicks, menu options, new windows, etc.). (70%)." From this actionable data, a user may gain an understanding of the behavior of the sample and determine whether or not to pursue further action, e.g. removing the malware, alerting someone about the malware, etc. Actionable data may include a percentage that indicates the system's confidence level.
  • One or more analyzers may leverage machine learning technology to automatically classify each submitted sample and attempt to determine if the sample is malware or not without requiring any antivirus signatures.
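  • As a rough, hedged sketch of the signature-free classification idea (the features, tiny training set, and choice of a random forest are assumptions for illustration, not the disclosed models), classification from simple static features might look like this:

```python
# Illustrative signature-free classification: score samples from simple static
# features rather than antivirus signatures. scikit-learn is assumed installed.
import math
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def byte_entropy(data: bytes) -> float:
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def features(data: bytes) -> list:
    return [len(data), byte_entropy(data), data.count(0) / max(len(data), 1)]

# Tiny invented training set: (sample bytes, label) with 1 = malicious, 0 = benign.
training = [
    (b"\x00" * 200 + b"MZ" + bytes(range(256)) * 4, 1),
    (b"hello world, this is an ordinary text file\n" * 20, 0),
    (bytes(range(256)) * 8, 1),
    (b"plain configuration data = 42\n" * 30, 0),
]

X = [features(d) for d, _ in training]
y = [label for _, label in training]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

unknown = bytes(range(256)) * 6
print("probability malicious:", model.predict_proba([features(unknown)])[0][1])
```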
  • Referring back to FIG. 3, results from the analyzers may be stored, and, once analysis is complete, the results may be presented at the user interface as a report. In block 330, results of the analysis 250 may be displayed to the user in the web interface 255. The results may be information extracted about the sample during the analysis. As shown in FIG. 5-3, the results may be a clear, concise, and simple explanation of the malware submitted, and may include everything from complex classifications to basic, high-level conclusions ("What is it?"), and even suggestions for further proof or remediation of the target, or any combination thereof. The output may be designed to be user friendly to anyone from a newly hired junior system administrator to an executive-level user responsible for thousands of machines. As discussed above, FIG. 5-3 depicts an exemplary analytic summary 501. Analytic summary 501 may be an example of the report displayed at the user interface.
  • Via the web interface 255, results may be annotated and shared, and additional analytics may be requested. Users may retrieve via the web interface 255 the results of prior analyses, and current and prior analyses may be annotated and shared. For example, a user may provide an annotation of extracted information through web interface 255 that provides an identification of, or steps for remediating, the sample. The annotation may be transmitted to one or more other users, so that the other users can more easily identify and/or remediate the sample.
  • FIG. 6 depicts an exemplary interface 600 in accordance with one or more embodiments. Interface 600 may be presented via the web interface 255.
  • Alerts may inform users when the results of new analyses are available. For example, a user may be identified as having been interested in a particular instance, type, or class of malware. Whenever a new analysis of a sample is performed and that sample matches the particular instance, type, or class of interest, system 200 may transmit an alert to the user when the new analysis is available. The alert may include the timestamp for the identification, the filename of the triggering malware, the SHA1 or other unique hash for the binary, and the name of the alert that was triggered. A URL may also be provided to view any meta-data or report information generated for the binary.
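  • The sketch below assembles an alert containing the fields described above; the helper function and the report URL scheme are hypothetical and shown only for illustration:

    # Sketch of building an alert record; the URL scheme and transmission
    # mechanism are assumptions, not disclosed details.

    import hashlib
    from datetime import datetime, timezone

    def build_alert(sample_bytes: bytes, filename: str, alert_name: str) -> dict:
        sha1 = hashlib.sha1(sample_bytes).hexdigest()
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "filename": filename,
            "sha1": sha1,
            "alert_name": alert_name,
            "report_url": f"https://analysis.example.com/reports/{sha1}",  # hypothetical
        }

    alert = build_alert(b"suspected-malware-bytes", "invoice.exe", "keylogger-behavior")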
  • FIG. 7 depicts an exemplary architecture for implementing a computing device 700 in accordance with one or more embodiments, which may be used to implement any of the devices discussed herein, or any other computer system or computing device component thereof. It will be appreciated that other devices that can be used with the computing device 700, such as a client or a server, may be similarly configured. As illustrated in FIG. 7, computing device 700 may include a bus 710, a processor 720, a memory 730, a read only memory (ROM) 740, a storage device 750, an input device 760, an output device 770, and a communication interface 780.
  • Bus 710 may include one or more interconnects that permit communication among the components of computing device 700. Processor 720 may include any type of processor, microprocessor, or processing logic that may interpret and execute instructions (e.g., a field programmable gate array (FPGA)). Processor 720 may include a single device (e.g., a single core) and/or a group of devices (e.g., multi-core). Memory 730 may include a random access memory (RAM) or another type of dynamic storage device that may store information and instructions for execution by processor 720. Memory 730 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 720.
  • ROM 740 may include a ROM device and/or another type of static storage device that may store static information and instructions for processor 720. Storage device 750 may include a magnetic disk and/or optical disk and its corresponding drive for storing information and/or instructions. Storage device 750 may include a single storage device or multiple storage devices, such as multiple storage devices operating in parallel. Moreover, storage device 750 may reside locally on the computing device 700 and/or may be remote with respect to a server and connected thereto via network and/or another type of connection, such as a dedicated link or channel.
  • Input device 760 may include any mechanism or combination of mechanisms that permit an operator to input information to computing device 700, such as a keyboard, a mouse, a touch sensitive display device, a microphone, a pen-based pointing device, and/or a biometric input device, such as a voice recognition device and/or a finger print scanning device. Output device 770 may include any mechanism or combination of mechanisms that outputs information to the operator, including a display, a printer, a speaker, etc.
  • Communication interface 780 may include any transceiver-like mechanism that enables computing device 700 to communicate with other devices and/or systems, such as a client, a server, a license manager, a vendor, etc. For example, communication interface 780 may include one or more interfaces, such as a first interface coupled to a network and/or a second interface coupled to a license manager. Alternatively, communication interface 780 may include other mechanisms (e.g., a wireless interface) for communicating via a network, such as a wireless network. In one implementation, communication interface 780 may include logic to send code to a destination device, such as a target device that can include general purpose hardware (e.g., a personal computer form factor), dedicated hardware (e.g., a digital signal processing (DSP) device adapted to execute a compiled version of a model or a part of a model), etc.
  • Computing device 700 may perform certain functions in response to processor 720 executing software instructions contained in a computer-readable medium, such as memory 730. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions to implement features consistent with principles of the disclosure. Thus, implementations consistent with principles of the disclosure are not limited to any specific combination of hardware circuitry and software.
  • FIG. 8 depicts an exemplary workflow in accordance with one embodiment of the invention. Starting at start point 805, the workflow specifies that analyzers 810, 820, 830, and 840 are run simultaneously from divergence point 807. The workflow then specifies that analyzer 850 is run at convergence point 880, after analyzers 810 through 840 have completed. Decision point 890 specifies that analyzer 860 is run if the results from analyzer 850 show that the sample is suspected to be malware. At convergence point 895, the analysis workflow is complete, and the results of all the analyzers are gathered for presentation at finish 897.
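  • The sketch below mirrors the shape of the FIG. 8 workflow (divergence point 807, convergence point 880, decision point 890, and finish 897); the analyzer functions and their outputs are hypothetical placeholders rather than the disclosed analyzers:

    # Sketch of the FIG. 8 workflow shape; analyzers and outputs are placeholders.

    from concurrent.futures import ThreadPoolExecutor

    def make_analyzer(name, suspicious=False):
        def run(sample):
            return {"analyzer": name, "suspected_malware": suspicious}
        return run

    analyzer_810 = make_analyzer("810")
    analyzer_820 = make_analyzer("820")
    analyzer_830 = make_analyzer("830")
    analyzer_840 = make_analyzer("840")
    analyzer_850 = make_analyzer("850", suspicious=True)
    analyzer_860 = make_analyzer("860")

    sample = b"suspected-malware-bytes"
    results = []

    # Divergence point 807: run analyzers 810-840 simultaneously.
    with ThreadPoolExecutor() as pool:
        results.extend(pool.map(lambda a: a(sample),
                                [analyzer_810, analyzer_820, analyzer_830, analyzer_840]))

    # Convergence point 880: analyzer 850 runs after 810-840 complete.
    result_850 = analyzer_850(sample)
    results.append(result_850)

    # Decision point 890: analyzer 860 runs only if 850 suspects malware.
    if result_850["suspected_malware"]:
        results.append(analyzer_860(sample))

    # Convergence point 895 / finish 897: all results gathered for presentation.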
  • Exemplary embodiments may be embodied in many different ways as a software component. For example, it may be a stand-alone software package, a combination of software packages, or it may be a software package incorporated as a “tool” in a larger software product. It may be downloadable from a network, for example, a web site, as a stand-alone product or as an add-in package for installation in an existing software application. It may also be available as a client-server software application, or as a web-enabled software application. It may also be embodied as a software package installed on a hardware device.
  • Numerous specific details have been set forth to provide a thorough understanding of the embodiments. It will be understood, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details are representative and do not necessarily limit the scope of the embodiments.
  • It is worthy to note that any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in the specification are not necessarily all referring to the same embodiment.
  • Although some embodiments may be illustrated and described as comprising exemplary functional components or modules performing various operations, it can be appreciated that such components or modules may be implemented by one or more hardware components, software components, and/or combination thereof. The functional components and/or modules may be implemented, for example, by logic (e.g., instructions, data, and/or code) to be executed by a logic device (e.g., processor). Such logic may be stored internally or externally to a logic device on one or more types of computer-readable storage media.
  • Some embodiments may comprise an article of manufacture. An article of manufacture may comprise a storage medium to store logic. Examples of a storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of storage media include hard drives, disk drives, solid state drives, and any other tangible storage media.
  • It also is to be appreciated that the described embodiments illustrate exemplary implementations, and that the functional components and/or modules may be implemented in various other ways which are consistent with the described embodiments. Furthermore, the operations performed by such components or modules may be combined and/or separated for a given implementation and may be performed by a greater number or fewer number of components or modules.
  • Some of the figures may include a flow diagram. Although such figures may include a particular logic flow, it can be appreciated that the logic flow merely provides an exemplary implementation of the general functionality. Further, the logic flow does not necessarily have to be executed in the order presented unless otherwise indicated. In addition, the logic flow may be implemented by a hardware element, a software element executed by a processor, or any combination thereof.
  • While various exemplary embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should instead be defined only in accordance with the following claims and their equivalents.

Claims (21)

1-20. (canceled)
21. A computer-implemented method comprising:
obtaining a particular analyzer that is trained using machine learning to classify data samples as likely including malware or as likely not including malware;
providing data for generating a graphical user interface at a user device, the graphical user interface being configured to receive, through selectable options, configuration data that defines a user-defined workflow to control one or more analyzers for analyzing malware having a particular malware attribute;
receiving, at a server from the user device, the configuration data;
storing the configuration data in a workflow definition database, the workflow definition database including workflow definitions for a plurality of workflows respectively associated with a plurality of malware attributes;
receiving a sample including a potential malware;
determining, by the server, at least one malware attribute of the sample;
determining, by the server, that the at least one malware attribute of the sample is associated with the particular malware attribute;
selecting, from the plurality of workflows, the user-defined workflow for analyzing the sample;
causing, by the server, the one or more analyzers to analyze the sample according to the user-defined workflow associated with the stored configuration data to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware, the one or more analyzers including the particular analyzer that is trained using machine learning; and
providing the analysis result for output.
22. The computer-implemented method of claim 21, wherein determining, by the server, at least one malware attribute of the sample comprises one or more of:
uncompressing the sample;
decrypting the sample; and
identifying a file type of the sample.
23. The computer-implemented method of claim 22, further comprising:
selecting the particular analyzer to analyze the sample based on the identified file type.
24. The computer-implemented method of claim 21, wherein receiving, at a server from the user device, the configuration data comprises one or more of:
receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and
receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
25. The computer-implemented method of claim 21, further comprising:
causing, by the server, an analyzer other than the particular analyzer to analyze the analysis result.
26. The computer-implemented method of claim 21, wherein causing, by the server, one or more analyzers to analyze the sample according to the user-defined workflow comprises:
providing the sample to one or more environments for execution and behavioral profiling.
27. The computer-implemented method of claim 21, wherein causing, by the server, one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises:
obtaining one or more rules from a database;
classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and
generating the analysis result based, in part, on the classifying.
28. A non-transitory computer-readable storage medium encoded with a computer program, the computer program comprising instructions that, upon execution by a computer, cause the computer to perform operations comprising:
obtaining a particular analyzer that is trained using machine learning to classify data samples as likely including malware or as likely not including malware;
providing data for generating a graphical user interface at a user device, the graphical user interface being configured to receive, through selectable options, configuration data that defines a user-defined workflow to control one or more analyzers for analyzing malware having a particular malware attribute;
receiving, from the user device, the configuration data;
storing the configuration data in a workflow definition database, the workflow definition database including workflow definitions for a plurality of workflows respectively associated with a plurality of malware attributes;
receiving a sample including a potential malware;
determining that the particular malware attribute is an attribute of the sample;
causing the one or more analyzers to analyze the sample according to the user-defined workflow associated with the stored configuration data to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware, the one or more analyzers including the particular analyzer that is trained using machine learning; and
providing the analysis result for output.
29. The non-transitory computer-readable storage medium of claim 28, wherein determining that the particular malware attribute is an attribute of the sample comprises one or more of:
uncompressing the sample;
decrypting the sample; and
identifying a file type of the sample and selecting the particular analyzer to analyze the sample based on the identified file type.
30. The non-transitory computer-readable storage medium of claim 28, wherein receiving, from the user device, the configuration data comprises one or more of:
receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and
receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
31. The non-transitory computer-readable storage medium of claim 28, wherein the operations further comprise:
causing an analyzer other than the particular analyzer to analyze the analysis result.
32. The non-transitory computer-readable storage medium of claim 28, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow comprises:
providing the sample to one or more environments for execution and behavioral profiling.
33. The non-transitory computer-readable storage medium of claim 28, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises:
obtaining one or more rules from a database;
classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and
generating the analysis result based, in part, on the classifying.
34. A system comprising:
one or more processors and one or more computer storage media storing instructions that are operable and, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
obtaining a particular analyzer that is trained using machine learning to classify data samples as likely including malware or as likely not including malware;
providing data for generating a graphical user interface at a user device, the graphical user interface being configured to receive, through selectable options, configuration data that defines a user-defined workflow to control one or more analyzers for analyzing malware having a particular malware attribute;
receiving, from the user device, the configuration data;
storing the configuration data in a workflow definition database, the workflow definition database including workflow definitions for a plurality of workflows respectively associated with a plurality of malware attributes;
receiving a sample including a potential malware;
determining that the particular malware attribute is an attribute of the sample;
causing the one or more analyzers to analyze the sample according to the user-defined workflow associated with the stored configuration data to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware, the one or more analyzers including the particular analyzer that is trained using machine learning; and
providing the analysis result for output.
35. The system of claim 34, wherein determining that the particular malware attribute is an attribute of the sample comprises one or more of:
uncompressing the sample;
decrypting the sample; and
identifying a file type of the sample.
36. The system of claim 35, wherein the operations further comprise:
selecting the particular analyzer to analyze the sample based on the identified file type.
37. The system of claim 34, wherein receiving, from the user device, the configuration data comprises one or more of:
receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and
receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
38. The system of claim 34, wherein the operations further comprise:
causing an analyzer other than the particular analyzer to analyze the analysis result.
39. The system of claim 34, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow comprises:
providing the sample to one or more environments for execution and behavioral profiling.
40. The system of claim 34, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises:
obtaining one or more rules from a database;
classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and
generating the analysis result based, in part, on the classifying.
US15/138,919 2013-10-31 2016-04-26 Methods and systems for malware analysis Abandoned US20160269423A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/138,919 US20160269423A1 (en) 2013-10-31 2016-04-26 Methods and systems for malware analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/068,605 US9350747B2 (en) 2013-10-31 2013-10-31 Methods and systems for malware analysis
US15/138,919 US20160269423A1 (en) 2013-10-31 2016-04-26 Methods and systems for malware analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/068,605 Continuation US9350747B2 (en) 2013-10-31 2013-10-31 Methods and systems for malware analysis

Publications (1)

Publication Number Publication Date
US20160269423A1 true US20160269423A1 (en) 2016-09-15

Family

ID=51897482

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/068,605 Active 2034-05-13 US9350747B2 (en) 2013-10-31 2013-10-31 Methods and systems for malware analysis
US15/138,919 Abandoned US20160269423A1 (en) 2013-10-31 2016-04-26 Methods and systems for malware analysis

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/068,605 Active 2034-05-13 US9350747B2 (en) 2013-10-31 2013-10-31 Methods and systems for malware analysis

Country Status (2)

Country Link
US (2) US9350747B2 (en)
WO (1) WO2015066509A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108234400A (en) * 2016-12-15 2018-06-29 北京金山云网络技术有限公司 A kind of attack determines method, apparatus and Situation Awareness System
CN108306891A (en) * 2018-02-13 2018-07-20 第四范式(北京)技术有限公司 The method, apparatus and system of machine learning are executed using data to be exchanged

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9756074B2 (en) * 2013-12-26 2017-09-05 Fireeye, Inc. System and method for IPS and VM-based detection of suspicious objects
US9591015B1 (en) 2014-03-28 2017-03-07 Fireeye, Inc. System and method for offloading packet processing and static analysis operations
US10084813B2 (en) 2014-06-24 2018-09-25 Fireeye, Inc. Intrusion prevention and remedy system
US10805340B1 (en) 2014-06-26 2020-10-13 Fireeye, Inc. Infection vector and malware tracking with an interactive user display
TWI604320B (en) * 2014-08-01 2017-11-01 緯創資通股份有限公司 Methods for accessing big data and systems using the same
US10360378B2 (en) 2014-08-22 2019-07-23 Nec Corporation Analysis device, analysis method and computer-readable recording medium
KR102069698B1 (en) * 2014-11-20 2020-02-12 한국전자통신연구원 Apparatus and Method Correcting Linguistic Analysis Result
US9690933B1 (en) 2014-12-22 2017-06-27 Fireeye, Inc. Framework for classifying an object as malicious with machine learning for deploying updated predictive models
US9838417B1 (en) 2014-12-30 2017-12-05 Fireeye, Inc. Intelligent context aware user interaction for malware detection
JP6174826B2 (en) * 2015-01-28 2017-08-02 日本電信電話株式会社 Malware analysis system, malware analysis method and malware analysis program
US10148693B2 (en) 2015-03-25 2018-12-04 Fireeye, Inc. Exploit detection system
US20160381049A1 (en) * 2015-06-26 2016-12-29 Ss8 Networks, Inc. Identifying network intrusions and analytical insight into the same
US10176321B2 (en) * 2015-09-22 2019-01-08 Fireeye, Inc. Leveraging behavior-based rules for malware family classification
WO2017074479A1 (en) * 2015-10-30 2017-05-04 Intuit Inc. Globally scalable solution
US20170178026A1 (en) * 2015-12-22 2017-06-22 Sap Se Log normalization in enterprise threat detection
US10075462B2 (en) 2015-12-22 2018-09-11 Sap Se System and user context in enterprise threat detection
IL250797B (en) 2016-02-25 2020-04-30 Cyren Ltd Multi-threat analyzer array system and method of use
US10505960B2 (en) 2016-06-06 2019-12-10 Samsung Electronics Co., Ltd. Malware detection by exploiting malware re-composition variations using feature evolutions and confusions
US10587647B1 (en) * 2016-11-22 2020-03-10 Fireeye, Inc. Technique for malware detection capability comparison of network security devices
US10839351B1 (en) * 2017-09-18 2020-11-17 Amazon Technologies, Inc. Automated workflow validation using rule-based output mapping
US10944766B2 (en) * 2017-09-22 2021-03-09 Microsoft Technology Licensing, Llc Configurable cyber-attack trackers
US11537439B1 (en) * 2017-11-22 2022-12-27 Amazon Technologies, Inc. Intelligent compute resource selection for machine learning training jobs
KR102046262B1 (en) * 2017-12-18 2019-11-18 고려대학교 산학협력단 Device and method for managing risk of mobile malware behavior in mobiel operating system, recording medium for performing the method
US10826931B1 (en) * 2018-03-29 2020-11-03 Fireeye, Inc. System and method for predicting and mitigating cybersecurity system misconfigurations
US10853489B2 (en) * 2018-10-19 2020-12-01 EMC IP Holding Company LLC Data-driven identification of malicious files using machine learning and an ensemble of malware detection procedures
US11368475B1 (en) * 2018-12-21 2022-06-21 Fireeye Security Holdings Us Llc System and method for scanning remote services to locate stored objects with malware
US20230214487A1 (en) * 2022-01-04 2023-07-06 Samsung Eletrônica da Amazônia Ltda. Non-invasive computer implemented method for malware detection, a non-transitory computer-readable medium, and, a system for detecting malware in an application

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181975A1 (en) * 2012-11-06 2014-06-26 William Spernow Method to scan a forensic image of a computer system with multiple malicious code detection engines simultaneously from a master control point

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8424011B2 (en) 2007-05-31 2013-04-16 Sap Ag Multiple instance management for workflow process models
US8103727B2 (en) * 2007-08-30 2012-01-24 Fortinet, Inc. Use of global intelligence to make local information classification decisions
WO2011127488A2 (en) 2010-04-08 2011-10-13 Lynux Works, Inc. Systems and methods of processing data associated with detection and/or handling of malware
RU2449348C1 (en) 2010-11-01 2012-04-27 Закрытое акционерное общество "Лаборатория Касперского" System and method for virus-checking data downloaded from network at server side
US8800044B2 (en) 2011-03-23 2014-08-05 Architelos, Inc. Storing and accessing threat information for use in predictive modeling in a network security service
US8640246B2 (en) * 2011-06-27 2014-01-28 Raytheon Company Distributed malware detection
US8732831B2 (en) * 2011-07-14 2014-05-20 AVG Netherlands B.V. Detection of rogue software applications
US8776242B2 (en) 2011-11-29 2014-07-08 Raytheon Company Providing a malware analysis using a secure malware detection process
US9171160B2 (en) * 2013-09-30 2015-10-27 Fireeye, Inc. Dynamically adaptive framework and method for classifying malware using intelligent static, emulation, and dynamic analyses

Also Published As

Publication number Publication date
US20150121526A1 (en) 2015-04-30
WO2015066509A1 (en) 2015-05-07
US9350747B2 (en) 2016-05-24

Similar Documents

Publication Publication Date Title
US9350747B2 (en) Methods and systems for malware analysis
US11586972B2 (en) Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs
US11252168B2 (en) System and user context in enterprise threat detection
US11811805B1 (en) Detecting fraud by correlating user behavior biometrics with other data sources
US9197665B1 (en) Similarity search and malware prioritization
US11755585B2 (en) Generating enriched events using enriched data and extracted features
US20170178025A1 (en) Knowledge base in enterprise threat detection
US20170178026A1 (en) Log normalization in enterprise threat detection
US11503070B2 (en) Techniques for classifying a web page based upon functions used to render the web page
US11870741B2 (en) Systems and methods for a metadata driven integration of chatbot systems into back-end application services
US10740164B1 (en) Application programming interface assessment
US10454967B1 (en) Clustering computer security attacks by threat actor based on attack features
US20160124979A1 (en) Providing rule based analysis of content to manage activation of web extension
US20210165785A1 (en) Remote processing of memory and files residing on endpoint computing devices from a centralized device
US20150286663A1 (en) Remote processing of memory and files residing on endpoint computing devices from a centralized device
US11315010B2 (en) Neural networks for detecting fraud based on user behavior biometrics
US20160098563A1 (en) Signatures for software components
US20230269272A1 (en) System and method for implementing an artificial intelligence security platform
US20210124661A1 (en) Diagnosing and remediating errors using visual error signatures
WO2016188334A1 (en) Method and device for processing application access data
CN114175067A (en) Incident survey workspace generation and survey control
WO2023064007A1 (en) Augmented threat investigation
US20220050839A1 (en) Data profiling and monitoring
US20240070319A1 (en) Dynamically updating classifier priority of a classifier model in digital data discovery
US20230319062A1 (en) System and method for predicting investigation queries based on prior investigations

Legal Events

Date Code Title Description
AS Assignment

Owner name: CYBERPOINT INTERNATIONAL LLC, MARYLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MCLARNON, MARK;RAUGAS, MARK V.;FISHER, RYAN;AND OTHERS;SIGNING DATES FROM 20140303 TO 20140318;REEL/FRAME:038392/0275

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION