US20200117802A1 - Systems, methods, and media for identifying and responding to malicious files having similar features - Google Patents
- Publication number
- US20200117802A1 (U.S. application Ser. No. 16/370,328)
- Authority
- US
- United States
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F21/565—Static detection by checking file integrity
- G06F21/562—Static detection
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
- G06F21/567—Computer malware detection or handling, e.g. anti-virus arrangements, using dedicated hardware
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/145—Countermeasures against malicious traffic, the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
- G06F2221/033—Test or assess software

(Each classification is listed by its most specific CPC code; the parent groups G—PHYSICS, G06F—ELECTRIC DIGITAL DATA PROCESSING, G06F21/00, G06F21/50, G06F21/55, G06F21/56, H—ELECTRICITY, H04L—TRANSMISSION OF DIGITAL INFORMATION, H04L63/00, and G06F2221/00 apply as in the full hierarchy.)
Definitions
- Modifications can include instruction reordering, instruction virtualization, file encryption or packing, dynamic recompilation, appending data to the end of the file, and a variety of other techniques.
- systems, methods, and media for identifying and responding to malicious files having similar features are provided. More particularly, in some embodiments, systems for identifying and responding to malicious files having similar features are provided, the systems comprising: a memory; and a hardware processor coupled to the memory and configured to: receive feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; create clusters based on the feature information; determine if a file corresponding to one of the clusters is malicious; and report to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- methods for identifying and responding to malicious files having similar features comprising: receiving, at a hardware processor, feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; creating, using the hardware processor, clusters based on the feature information; determining, using the hardware processor, if a file corresponding to one of the clusters is malicious; and reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- non-transitory computer-readable media containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for identifying and responding to malicious files having similar features
- the method comprising: receiving feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; creating clusters based on the feature information; determining if a file corresponding to one of the clusters is malicious; and reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- FIG. 1 shows an example of a process for extracting and sending feature information from a file from an endpoint to a server in accordance with some embodiments.
- FIGS. 2A and 2B show an example of a process for creating clusters of files and adding files to existing clusters of files in accordance with some embodiments.
- FIG. 3 shows an example of a process for determining whether extracted feature information for a file matches reference feature information for a reference object (e.g., which can be a reference existing cluster or a reference file) in accordance with some embodiments.
- FIG. 4 shows an example of a process for processing a malicious file (or a file that is otherwise of concern) in accordance with some embodiments.
- FIG. 5 shows an example of hardware in accordance with some embodiments.
- FIG. 6 shows an example of more specific hardware that can be used to implement some components of FIG. 5 in accordance with some embodiments.
- mechanisms (which can include systems, methods, and media) for identifying and responding to malicious files having similar features are provided.
- these mechanisms can identify a malicious file and warn endpoints having similar, but not identical, files that those files may be malicious too.
- these mechanisms can extract feature information from a file (e.g., as described in connection with FIG. 1 ), create clusters of files based on the extracted feature information (e.g., as described in connection with FIGS. 2A, 2B, and 3 ), determine if a file is malicious (e.g., as described in connection with FIG. 4 ), and report that other files in a corresponding cluster may be malicious so that appropriate action can be taken (e.g., as described in connection with FIG. 4 ).
- Process 100 can be performed by any suitable device and in any suitable manner in some embodiments.
- process 100 can be performed by an endpoint, such as an endpoint as illustrated in and described in connection with FIGS. 5 and 6 below.
- process 100 can be performed in response to a file being received at an endpoint, in response to a file being selected by a user of an endpoint, in response to a file being scanned by a security application on an endpoint, and/or in response to any other event, and multiple instances of process 100 may be performed in order to process multiple files at the same time, or about the same time, in some embodiments.
- process 100 can begin by executing a file at 102 .
- any suitable file type can be executed and the file can be executed in any suitable manner.
- the file can be a Microsoft Windows operating system (or any other operating system) executable file that is executed by a Microsoft Windows operating system (or any other operating system).
- the file can be a script file that is executed by a suitable script interpreter.
- the file can be a media file that is executed by a suitable media player.
- the file can be executed in a secure environment, such as a sandbox or virtual machine.
- static feature information can be extracted.
- any suitable static feature information can be extracted and the static feature information can be extracted in any suitable manner.
- static feature information can include information that describes the contents of the file. More particularly, for example, in some embodiments, static feature information can include information such as a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, strings embedded in the file, anything else that can be statically derived from the file, and/or any other suitable information that describes the contents of a file.
- environmental feature information can be extracted.
- any suitable environmental feature information can be extracted and the environmental feature information can be extracted in any suitable manner.
- environmental feature information can include information that describes how the file is installed and executed on a device.
- environmental feature information can include information that identifies a path from which the file is executed, that identifies a parent process of the file, that indicates that the file is installed as a service, that indicates that the file has an uninstaller registered for the same path, that indicates that the file has a run key or other automated execution condition, that indicates that the file is registered as a shell extension, that indicates the file's age in the environment, that indicates the file's prevalence in the environment, that indicates whether or not any shortcuts reference the file, that indicates what operating system the file is configured to run in, and/or any other suitable information that describes how the file is installed and executed on a device.
- behavioral feature information can be extracted.
- any suitable behavioral feature information can be extracted and the behavioral feature information can be extracted in any suitable manner.
- behavioral feature information can include information that describes observable outcomes of executing the file. More particularly, for example, in some embodiments, behavioral feature information can include information that indicates that the file, when executed, connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, installs a WMI provider, and/or any other suitable information that describes observable outcomes of executing the file.
- the extracted feature information for the file can be sent to a server for further processing as described below in connection with FIGS. 2A and 2B.
- the extracted feature information can be sent to the server in any suitable manner in some embodiments.
- the extracted feature information can be sent as three vectors: (1) a static feature information vector; (2) an environmental feature information vector; and (3) a behavioral feature information vector. Each of these vectors can reflect the feature information corresponding to the vector in some embodiments.
- a static feature information vector can describe information such as a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, strings embedded in the file, anything else that can be statically derived from the file, and/or any other suitable information that describes the contents of a file.
- an environmental feature information vector can describe information that identifies a path from which the file is executed, that identifies a parent process of the file, that indicates that the file is installed as a service, that indicates that the file has an uninstaller registered for the same path, that indicates that the file has a run key or other automated execution condition, that indicates that the file is registered as a shell extension, that indicates the file's age in the environment, that indicates the file's prevalence in the environment, that indicates whether or not any shortcuts reference the file, that indicates what operating system the file is configured to run in, and/or any other suitable information that describes how the file is installed and executed on a device.
- a behavioral feature information vector can describe information that indicates that the file, when executed, connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, installs a WMI provider, and/or any other suitable information that describes observable outcomes of executing the file.
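The three feature-information vectors described above can be sketched as a simple payload sent from the endpoint to the server. This is a minimal illustration assuming a JSON transport; all field names and values are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of packaging the three categories of extracted
# feature information for transmission to the server.
import json

def build_feature_payload(static, environmental, behavioral):
    """Bundle the three categories of feature information into one message."""
    return json.dumps({
        "static": static,                 # e.g., file size, entropy, imports
        "environmental": environmental,   # e.g., execution path, parent process
        "behavioral": behavioral,         # e.g., ports opened, firewall changes
    })

# Illustrative values only.
payload = build_feature_payload(
    static={"size": 482304, "entropy": 7.1, "imports": ["kernel32.dll"]},
    environmental={"path": "C:\\Users\\x\\AppData", "is_service": False},
    behavioral={"opens_ports": True, "disables_firewall": False},
)
```

In practice each category could equally be encoded as a fixed-length numeric vector; the dictionary form is used here only to keep the feature names visible.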
- Process 200 can be performed by any suitable device and in any suitable manner in some embodiments.
- process 200 can be performed by a server, such as a server as illustrated in and described in connection with FIGS. 5 and 6 below.
- process 200 can be performed in response to extracted feature information being made available to a given device, such as a server, and multiple instances of process 200 may be performed in order to process extracted feature information being received at the same time, or about the same time, at a server.
- process 200 begins by receiving extracted feature information for a file from an endpoint.
- the extracted feature information for the file can be received in any suitable manner in some embodiments.
- the extracted feature information can be sent as three vectors: (1) a static feature information vector; (2) an environmental feature information vector; and (3) a behavioral feature information vector.
- process 200 can select the first cluster to check the file against.
- This cluster can be selected in any suitable manner. For example, a most recently created cluster can be selected as the first cluster, a cluster with the most members can be selected as the first cluster, etc.
- process 200 can determine whether the extracted feature information for the file matches the selected existing cluster. This determination can be made in any suitable manner in some embodiments. For example, in some embodiments, process 200 can determine whether the extracted feature information for a file matches the selected existing cluster by determining a mathematical distance (which can be calculated in any suitable manner) for each category of the extracted feature information for the file and the extracted feature information representing the selected existing cluster, and determining if a combined distance of the determined mathematical distances for the categories is less than a threshold (which can be any suitable threshold). As another example, process 200 can determine whether the extracted feature information for the file matches an existing cluster using the process illustrated in FIG. 3 .
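The distance-and-threshold test described above can be sketched as follows. The Euclidean metric, the per-category vector layout, and the threshold value are all assumptions for illustration; the text leaves both the distance calculation and the threshold open.

```python
# Sketch of the match test at 206: compute a per-category distance,
# combine the distances, and compare against a threshold.
import math

def category_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matches_cluster(file_vectors, cluster_vectors, threshold=1.0):
    """True if the summed per-category distances fall below the threshold."""
    combined = sum(
        category_distance(file_vectors[c], cluster_vectors[c])
        for c in ("static", "environmental", "behavioral")
    )
    return combined < threshold
```

A weighted sum of the per-category distances, mirroring the weighted score described later in this section, would be an equally valid way to combine them.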
- process 300 begins by selecting a first category among the categories of feature information for the file at 302 . Any suitable category can be selected as the first category.
- the static feature information can be selected as the first category of feature information in some embodiments.
- the environmental feature information can be selected as the first category of feature information in some embodiments.
- the behavioral feature information can be selected as the first category of feature information in some embodiments.
- process 300 can select the first feature from the selected category. Any suitable feature can be selected as the first feature.
- process 300 can determine if there is a match between the value of the selected feature of the file and the value of the selected feature of the reference object (e.g., an existing cluster or a file).
- Process 300 can determine if there is a match between the values in any suitable manner. For example, for some features, a match may only exist when the values are identical. More particularly, for example, if a value is binary, then both values may be required to be true or false for a match to exist. As another example, for some features, a match may exist when the values are the same or within a certain percentage of each other. More particularly, for example, if values are continuous, a match may exist when the values are within 10% (or any other suitable percentage) of the larger value (e.g., values 90 and 100 would match because 90 is within 10% of 100).
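The per-feature comparison rules above can be sketched as a small helper. The 10% tolerance follows the example in the text; the handling of zero values is an added assumption.

```python
# Sketch of the per-feature comparison at 306: binary features must be
# identical, continuous features match when within 10% of the larger value.
def feature_match(a, b, tolerance=0.10):
    """Return True if two feature values match under the rules above."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a == b                        # binary: exact match required
    larger = max(abs(a), abs(b))
    if larger == 0:
        return a == b                        # both zero: trivially equal
    return abs(a - b) <= tolerance * larger  # continuous: within 10%
```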
- process 300 can determine if there is another feature in the selected category. If there is, then process 300 can branch to 310, at which it can select the next feature, and then loop back to 306. Otherwise, if there is not another feature in the selected category, then process 300 can branch to 312, at which it can assign a score for the category as a ratio of the matched features to the total features. For example, if a category has ten features and six match, a score of 0.6 can be assigned to the category.
- process 300 can determine if there is another category. If so, process 300 can select the next category at 318 and loop back to 304 . Otherwise, process 300 can indicate a match between the file and the reference object (e.g., an existing cluster or a file) if more than 50% (or any other suitable percentage) of the categories have scores greater than 0.5 (or any other suitable value) at 320 and then end at 322 .
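The scoring loop of FIG. 3 can be sketched as follows, using exact-equality comparison per feature for brevity (the text allows richer per-feature rules): each category is scored as the ratio of matched features, and the file matches the reference object when more than 50% of the categories score above 0.5.

```python
# Minimal sketch of the category scoring and overall match decision
# of FIG. 3. Exact equality stands in for the per-feature match rules.
def category_score(file_feats, ref_feats):
    """Ratio of matching features to total features in one category."""
    matched = sum(1 for k in file_feats if file_feats[k] == ref_feats.get(k))
    return matched / len(file_feats)

def object_match(file_info, ref_info):
    """True if more than half of the categories score above 0.5."""
    scores = [category_score(file_info[c], ref_info[c]) for c in file_info]
    return sum(1 for s in scores if s > 0.5) > len(scores) / 2
```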
- process 200 can calculate a combined score for the cluster at 208 .
- This combined score can be calculated in any suitable manner.
- the combined score for the cluster can be set equal to a combined mathematical distance based on calculated mathematical distances of each category of features between the file and the selected cluster.
- the combined score for the cluster can be a weighted combination of the scores for each category. More particularly, for example, the combined score (CS) can be:

  CS = (W_S × S_S) + (W_E × S_E) + (W_B × S_B)

  where:
- W_S is the weight for the static category
- S_S is the score for the static category
- W_E is the weight for the environmental category
- S_E is the score for the environmental category
- W_B is the weight for the behavioral category
- S_B is the score for the behavioral category
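The weighted combination above can be written out as a minimal sketch; the weight values used here are arbitrary illustrations, as the text leaves them unspecified.

```python
# Weighted combined score across the three categories:
# CS = W_S*S_S + W_E*S_E + W_B*S_B.
def combined_score(s_s, s_e, s_b, w_s=0.3, w_e=0.3, w_b=0.4):
    """Combine per-category scores using illustrative weights."""
    return w_s * s_s + w_e * s_e + w_b * s_b

# Example: static 0.6, environmental 0.8, behavioral 0.9.
cs = combined_score(0.6, 0.8, 0.9)  # 0.18 + 0.24 + 0.36 = 0.78
```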
- process 200 can determine if there are more clusters to check at 210. If there are more clusters to check, then process 200 can branch to 212, at which the next cluster can be selected, and then loop back to 206. Otherwise, if process 200 determines that there are no more clusters at 210, then process 200 can branch to 214, at which the process can determine if there are any matching clusters. If there are matching clusters, then process 200 can add the file to the matching existing cluster with the best combined score at 216 and then end at 218. Otherwise, if there are no matching clusters, process 200 can branch via 220 to 222 of FIG. 2B .
- a file can be added to an existing cluster at 216 in any suitable manner.
- the name of the file, the location of the file (e.g., IP address), the extracted feature information for the file, and/or any other suitable information can be stored in a database in association with the cluster.
- a cluster can be assigned values that reflect the median, mean, or most common values for those features of each category (based on the nature of the feature) so that assigned values can be compared to future files to determine if the future file matches the cluster.
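One plausible realization of the representative values described above: the median for continuous features and the most common value for binary or categorical ones. The per-feature selection rule is an assumption; the text permits median, mean, or mode depending on the nature of the feature.

```python
# Sketch of deriving a cluster's representative feature values from
# its member files, for comparison against future files.
from statistics import median, mode

def cluster_representative(member_features):
    """member_features: list of dicts, one per file in the cluster."""
    rep = {}
    for key in member_features[0]:
        values = [m[key] for m in member_features]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
            rep[key] = median(values)   # continuous: median
        else:
            rep[key] = mode(values)     # binary/categorical: most common
    return rep
```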
- process 200 can determine if the file is interesting. This can be performed in any suitable manner. For example, in some embodiments, a file may be determined to be interesting if the file, when executed, exhibits behavior that is anomalous (e.g., by performing actions that similar files do not do). As another example, a file may be determined to be interesting if it comes from an external source, a less-trusted process (e.g., an email client, a browser, etc.), etc., is packed, is low prevalence, etc.
- process 200 can end at 224 .
- process 200 can proceed to 226 , at which it can retrieve feature information for a first non-clustered file from a database.
- Any suitable non-clustered file can be selected as the first non-clustered file.
- the first non-clustered file can be randomly selected from a group of recent non-clustered files (e.g., files received in the last 24 hours) or can be selected as the most recent non-clustered file.
- process 200 can determine whether the extracted feature information for the file matches the feature information for the selected file from the database. This determination can be made in any suitable manner in some embodiments. For example, in some embodiments, process 200 can determine whether the extracted feature information for a file matches the selected file from the database by determining a mathematical distance (which can be calculated in any suitable manner) for each category of the extracted feature information for the file and the extracted feature information representing the selected file from the database, and determining if a combined distance of the determined mathematical distances for the categories is less than a threshold (which can be any suitable threshold). As another example, process 200 can determine whether the extracted feature information for the file matches the selected file from the database using the process illustrated in FIG. 3 and as described above.
- process 200 can calculate a combined score for the file from the database at 230 .
- This combined score can be calculated in any suitable manner.
- the combined score for the file from the database can be set equal to a combined mathematical distance based on calculated mathematical distances of each category of features between the file and the file from the database.
- the combined score for the file from the database can be a weighted combination of the scores for each category. More particularly, for example, the combined score (CS) can be:

  CS = (W_S × S_S) + (W_E × S_E) + (W_B × S_B)

  where:
- W_S is the weight for the static category
- S_S is the score for the static category
- W_E is the weight for the environmental category
- S_E is the score for the environmental category
- W_B is the weight for the behavioral category
- S_B is the score for the behavioral category
- process 200 can determine if there are more files to check at 232. If there are more files to check, then process 200 can branch to 234, at which the feature information for the next file from the database can be retrieved, and process 200 can then loop back to 228. Otherwise, if process 200 determines that there are no more files to check at 232, then process 200 can branch to 236, at which the process can determine if there are any matching files. If there are matching files, then process 200 can create a cluster with the file and the file from the database having the best score at 238. Otherwise, if there are no matching files from the database, process 200 can branch to 242, at which it can add the file to the database, and then end at 240.
- a cluster can be created at 238 in any suitable manner.
- the name of the files, the location of the files (e.g., IP addresses), the extracted feature information for the files, and/or any other suitable information can be stored in a database in association with the cluster.
- a nearest neighbor search can be used to identify a matching cluster or a matching file in some embodiments.
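The nearest-neighbor search mentioned above could be realized, in a brute-force sketch, as follows; the combined-vector layout and the Euclidean metric are assumptions for illustration (an indexed structure such as a k-d tree would serve the same purpose at scale).

```python
# Brute-force nearest-neighbor search over combined feature vectors.
import math

def nearest_neighbor(query, candidates):
    """Return (index, distance) of the candidate closest to query."""
    best_idx, best_dist = None, math.inf
    for i, vec in enumerate(candidates):
        d = math.dist(query, vec)   # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist
```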
- Process 400 can be performed by any suitable device and in any suitable manner in some embodiments.
- process 400 can be performed by a server, such as a server as illustrated in and described in connection with FIGS. 5 and 6 below.
- process 400 can be performed in response to a server detecting a malicious file or the server being informed about a malicious file from an endpoint.
- process 400 can determine that a file is malicious or otherwise of concern. This can be performed in any suitable manner.
- process 400 can determine if the file matches an existing cluster. This can be performed in any suitable manner. For example, static, environmental, and behavioral feature information for the file can be extracted (e.g., as described above in connection with 104, 106, and 108 of FIG. 1 ) and a match between the malicious file and an existing cluster can be found (e.g., as described above in connection with 204, 206, 208, 210, 212, 214, and 216 of FIG. 2A ).
- process 400 can report the reputation of the malicious file to endpoints indicated as having the files in the matching cluster. In response to this, the endpoints can then take any suitable action such as quarantining the files in the matching cluster. Finally, after reporting the reputation at 406 or determining that no matching existing cluster exists at 404 , process 400 can end at 408 .
- hardware 500 can include a server 502 , a database 504 , a communication network 506 , and/or one or more endpoints 508 , 510 , and 512 .
- Server 502 can be any suitable server(s) for performing the functions described in connection with FIGS. 2A, 2B, 3, and 4 , and/or any other suitable functions.
- Database 504 can be any suitable database for storing information, such as the database information described above in connection with FIG. 2B .
- Communication network 506 can be any suitable combination of one or more wired and/or wireless networks in some embodiments.
- communication network 506 can include any one or more of the Internet, an intranet, a wide-area network (WAN), a local-area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), BLUETOOTH, BLUETOOTH LE, and/or any other suitable communication network.
- server 502, database 504, communication network 506, and/or one or more endpoints 508, 510, and 512 can be interconnected by any suitable links 514, such as network links, dial-up links, wireless links, hard-wired links, any other suitable communications links, or any suitable combination of such links.
- Endpoints 508 , 510 , and 512 can include any suitable endpoints.
- the endpoints can include desktop computers, laptop computers, tablet computers, mobile phones, Internet of Things (IoT) devices (such as smart thermostats, smart home devices (e.g., an alarm clock, an electric toothbrush, a scale, a lock, a VoIP phone, and/or any other suitable home device), smart personal assistants, smart exercise machines, smart appliances (e.g., lighting systems, kitchen appliances, washers, dryers, fire alarms, spray systems, and/or any other suitable appliances), smart media systems (e.g., a television, a speaker, a streaming media device, a virtual assistant device, and/or any other suitable media device), smart computing devices (e.g., a printer, a computer, a network router, and/or any other suitable computing device), smart HVAC systems, smart security systems, etc.) and/or any other suitable device capable of connecting to a network to transmit and receive data.
- although server 502 and database 504 are each illustrated as one device, the functions performed by devices 502 and 504 can be performed using any suitable number of devices in some embodiments. For example, in some embodiments, multiple devices can be used to implement the functions performed by each of devices 502 and 504. Similarly, although devices 502 and 504 are illustrated as being separate, in some embodiments, these devices can be integrated.
- although only three endpoints 508, 510, and 512 are shown in FIG. 5 to avoid over-complicating the figure, any suitable number of endpoints, and/or any suitable types of endpoints, can be used in some embodiments.
- Server 502 , database 504 , and endpoints 508 , 510 , and 512 can be implemented using any suitable hardware in some embodiments.
- devices 502 , 504 , 508 , 510 , and 512 can be implemented using any suitable general-purpose computer or special-purpose computer.
- an endpoint which is a mobile phone can be implemented using a special-purpose computer.
- Any such general-purpose computer or special-purpose computer can include any suitable hardware. For example, as illustrated in example hardware 600 of FIG. 6 , such hardware can include hardware processor 602, memory and/or storage 604, an input device controller 606, an input device 608, display/audio drivers 610, display and audio output circuitry 612, communication interface(s) 614, an antenna 616, and a bus 618.
- Hardware processor 602 can include any suitable hardware processor, such as a microprocessor, a micro-controller, digital signal processor(s), dedicated logic, and/or any other suitable circuitry for controlling the functioning of a general-purpose computer or a special-purpose computer in some embodiments.
- hardware processor 602 can be controlled by a program stored in memory and/or storage 604 .
- Memory and/or storage 604 can be any suitable memory and/or storage for storing programs, data, and/or any other suitable information in some embodiments.
- memory and/or storage 604 can include random access memory, read-only memory, flash memory, hard disk storage, optical media, and/or any other suitable memory.
- Input device controller 606 can be any suitable circuitry for controlling and receiving input from one or more input devices 608 in some embodiments.
- input device controller 606 can be circuitry for receiving input from a touchscreen, from a keyboard, from one or more buttons, from a voice recognition circuit, from a microphone, from a camera, from an optical sensor, from an accelerometer, from a temperature sensor, from a near field sensor, from a pressure sensor, from an encoder, and/or any other type of input device.
- Display/audio drivers 610 can be any suitable circuitry for controlling and driving output to one or more display/audio output devices 612 in some embodiments.
- display/audio drivers 610 can be circuitry for driving a touchscreen, a flat-panel display, a cathode ray tube display, a projector, a speaker or speakers, and/or any other suitable display and/or presentation devices.
- Communication interface(s) 614 can be any suitable circuitry for interfacing with one or more communication networks (e.g., communication network 506 ).
- interface(s) 614 can include network interface card circuitry, wireless communication circuitry, and/or any other suitable type of communication network circuitry.
- Antenna 616 can be any suitable one or more antennas for wirelessly communicating with a communication network (e.g., communication network 506 ) in some embodiments. In some embodiments, antenna 616 can be omitted.
- Bus 618 can be any suitable mechanism for communicating between two or more components 602, 604, 606, 610, and 614 in some embodiments.
- At least some of the above described blocks of the processes of FIGS. 1, 2A, 2B, 3, and 4 can be executed or performed in any order or sequence not limited to the order and sequence shown in and described in connection with the figures. Also, some of the above blocks of FIGS. 1, 2A, 2B, 3, and 4 can be executed or performed substantially simultaneously where appropriate or in parallel to reduce latency and processing times. Additionally or alternatively, some of the above described blocks of the processes of FIGS. 1, 2A, 2B, 3, and 4 can be omitted.
- any suitable computer readable media can be used for storing instructions for performing the functions and/or processes herein.
- computer readable media can be transitory or non-transitory.
- non-transitory computer readable media can include media such as non-transitory forms of magnetic media (such as hard disks, floppy disks, and/or any other suitable magnetic media), non-transitory forms of optical media (such as compact discs, digital video discs, Blu-ray discs, and/or any other suitable optical media), non-transitory forms of semiconductor media (such as flash memory, electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and/or any other suitable semiconductor media), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media.
- transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Systems, methods, and media for identifying and responding to malicious files having similar features are provided. More particularly, in some embodiments, systems for identifying and responding to malicious files having similar features are provided, the systems comprising: a memory; and a hardware processor coupled to the memory and configured to: receive feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; create clusters based on the feature information; determine if a file corresponding to one of the clusters is malicious; and report to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
Description
- This application claims the benefit of U.S. Provisional Patent Application No. 62/745,919, filed Oct. 15, 2018, which is hereby incorporated by reference herein in its entirety.
- Malicious computer files are frequently modified in order to evade detection by anti-virus and anti-intrusion software and systems. Modifications can include instruction reordering, instruction virtualization, file encryption or packing, dynamic recompilation, appending data to the end of the file, and a variety of other techniques.
- It is desirable to provide new mechanisms for identifying and responding to malicious files having similar features.
- In accordance with some embodiments, systems, methods, and media for identifying and responding to malicious files having similar features are provided. More particularly, in some embodiments, systems for identifying and responding to malicious files having similar features are provided, the systems comprising: a memory; and a hardware processor coupled to the memory and configured to: receive feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; create clusters based on the feature information; determine if a file corresponding to one of the clusters is malicious; and report to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- In some embodiments, methods for identifying and responding to malicious files having similar features are provided, the methods comprising: receiving, at a hardware processor, feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; creating, using the hardware processor, clusters based on the feature information; determining, using the hardware processor, if a file corresponding to one of the clusters is malicious; and reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- In some embodiments, non-transitory computer-readable media containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for identifying and responding to malicious files having similar features are provided, the method comprising: receiving feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information; creating clusters based on the feature information; determining if a file corresponding to one of the clusters is malicious; and reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
- Various objects, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.
- FIG. 1 shows an example of a process for extracting and sending feature information from a file from an endpoint to a server in accordance with some embodiments.
- FIGS. 2A and 2B show an example of a process for creating clusters of files and adding files to existing clusters of files in accordance with some embodiments.
- FIG. 3 shows an example of a process for determining whether extracted feature information for a file matches reference feature information for a reference object (e.g., which can be a reference existing cluster or a reference file) in accordance with some embodiments.
- FIG. 4 shows an example of a process for processing a malicious file (or a file that is otherwise of concern) in accordance with some embodiments.
- FIG. 5 shows an example of hardware in accordance with some embodiments.
- FIG. 6 shows an example of more specific hardware that can be used to implement some components of FIG. 5 in accordance with some embodiments.
- In accordance with some embodiments, mechanisms (which can include systems, methods, and media) for identifying and responding to malicious files having similar features are provided. By using the techniques described herein, the mechanisms can identify a malicious file and warn endpoints having similar, but not identical, files that those files may be malicious too.
- In some embodiments, these mechanisms can extract feature information from a file (e.g., as described in connection with
FIG. 1), create clusters of files based on the extracted feature information (e.g., as described in connection with FIGS. 2A, 2B, and 3), determine if a file is malicious (e.g., as described in connection with FIG. 4), and report that other files in a corresponding cluster may be malicious so that appropriate action can be taken (e.g., as described in connection with FIG. 4). - Turning to
FIG. 1, an example of a process 100 for extracting and sending feature information for a file in accordance with some embodiments is shown. Process 100 can be performed by any suitable device and in any suitable manner in some embodiments. For example, in some embodiments, process 100 can be performed by an endpoint, such as an endpoint as illustrated in and described in connection with FIGS. 5 and 6 below. As another example, process 100 can be performed in response to a file being received at an endpoint, in response to a file being selected by a user of an endpoint, in response to a file being scanned by a security application on an endpoint, and/or in response to any other event, and multiple instances of process 100 may be performed in order to process multiple files at the same time, or about the same time, in some embodiments. - As illustrated,
process 100 can begin by executing a file at 102. In some embodiments, any suitable file type can be executed and the file can be executed in any suitable manner. For example, in some embodiments, the file can be a Microsoft Windows operating system (or any other operating system) executable file that is executed by a Microsoft Windows operating system (or any other operating system). As another example, in some embodiments, the file can be a script file that is executed by a suitable script interpreter. As still another example, in some embodiments, the file can be a media file that is executed by a suitable media player. In some embodiments, the file can be executed in a secure environment, such as a sandbox or virtual machine. - Next, at 104, static feature information can be extracted. In some embodiments, any suitable static feature information can be extracted and the static feature information can be extracted in any suitable manner. For example, static feature information can include information that describes the contents of the file. More particularly, for example, in some embodiments, static feature information can include information such as a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, strings embedded in the file, anything else that can be statically derived from the file, and/or any other suitable information that describes the contents of a file.
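To make the static category concrete, a few of the features above (file size, byte entropy, and embedded strings) can be computed directly from a file's bytes. The following is a minimal sketch, not the patent's implementation; the function name and the particular feature choices are illustrative assumptions.

```python
import math
import re

def extract_static_features(data: bytes) -> dict:
    """Compute a few illustrative static features from raw file bytes."""
    # Shannon entropy over byte values, in bits per byte (0.0-8.0);
    # high entropy often suggests packing or encryption.
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    entropy = 0.0
    for c in counts:
        if c:
            p = c / len(data)
            entropy -= p * math.log2(p)
    # Printable ASCII runs of length >= 4, a common "strings" feature.
    strings = re.findall(rb"[\x20-\x7e]{4,}", data)
    return {"size": len(data), "entropy": entropy, "num_strings": len(strings)}

features = extract_static_features(b"MZ\x90\x00hello world\x00PAYLOAD")
```

A real extractor would parse the file format (sections, imports, overlays) rather than treat the file as an opaque byte string.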
- Then, at 106, environmental feature information can be extracted. In some embodiments, any suitable environmental feature information can be extracted and the environmental feature information can be extracted in any suitable manner. For example, environmental feature information can include information that describes how the file is installed and executed on a device. More particularly, for example, in some embodiments, environmental feature information can include information that identifies a path from which the file is executed, that identifies a parent process of the file, that indicates that the file is installed as a service, that indicates that the file has an uninstaller registered for the same path, that indicates that the file has a run key or other automated execution condition, that indicates that the file is registered as a shell extension, that indicates the file's age in the environment, that indicates the file's prevalence in the environment, that indicates whether or not any shortcuts reference the file, that indicates what operating system the file is configured to run in, and/or any other suitable information that describes how the file is installed and executed on a device.
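Unlike static features, environmental features are mostly lookups against the host's state rather than the file's contents. The sketch below assembles such a record from a hypothetical host-state snapshot; all field, parameter, and path names are illustrative assumptions, not the patent's schema.

```python
def extract_environmental_features(path, host_state):
    """Assemble illustrative environmental features for the file at `path`
    from a snapshot of host state (a hypothetical structure)."""
    return {
        "path": path,
        "parent_process": host_state.get("parent_process", "unknown"),
        "installed_as_service": path in host_state.get("services", set()),
        "has_run_key": path in host_state.get("run_keys", set()),
        "age_days": host_state.get("first_seen_days_ago", {}).get(path, 0),
        "prevalence": host_state.get("install_counts", {}).get(path, 1),
    }

env = extract_environmental_features(
    r"C:\Users\x\AppData\tool.exe",
    {"parent_process": "explorer.exe",
     "run_keys": {r"C:\Users\x\AppData\tool.exe"}},
)
```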
- Next, at 108, behavioral feature information can be extracted. In some embodiments, any suitable behavioral feature information can be extracted and the behavioral feature information can be extracted in any suitable manner. For example, behavioral feature information can include information that describes observable outcomes of executing the file. More particularly, for example, in some embodiments, behavioral feature information can include information that indicates that the file, when executed, connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, installs a WMI provider, and/or any other suitable information that describes observable outcomes of executing the file.
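The three categories above can then be flattened into fixed-order numeric vectors so that a server can compare files positionally. This is a hypothetical sketch; the key orderings and names are assumptions, not values from the patent.

```python
# Fixed key orderings so every endpoint emits positionally comparable
# vectors (the keys here are illustrative, not the patent's).
STATIC_KEYS = ["size", "entropy", "num_strings"]
ENV_KEYS = ["installed_as_service", "has_run_key", "prevalence"]
BEHAVIORAL_KEYS = ["connects_external_urls", "disables_firewall", "opens_ports"]

def to_vector(features, keys):
    # Booleans become 0/1; missing features default to 0.
    return [float(features.get(k, 0)) for k in keys]

static_vec = to_vector({"size": 23, "entropy": 3.5, "num_strings": 2}, STATIC_KEYS)
env_vec = to_vector({"has_run_key": True, "prevalence": 4}, ENV_KEYS)
behavioral_vec = to_vector({"connects_external_urls": True, "opens_ports": True},
                           BEHAVIORAL_KEYS)
```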
- Finally, at 110, the extracted feature information for the file can be sent to a server for further processing as described below in connection with
FIG. 2. The extracted feature information can be sent to the server in any suitable manner in some embodiments. For example, in some embodiments, the extracted feature information can be sent as three vectors: (1) a static feature information vector; (2) an environmental feature information vector; and (3) a behavioral feature information vector. Each of these vectors can reflect the feature information corresponding to the vector in some embodiments. For example, in some embodiments, a static feature information vector can describe information such as a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, strings embedded in the file, anything else that can be statically derived from the file, and/or any other suitable information that describes the contents of a file. As another example, in some embodiments, an environmental feature information vector can describe information that identifies a path from which the file is executed, that identifies a parent process of the file, that indicates that the file is installed as a service, that indicates that the file has an uninstaller registered for the same path, that indicates that the file has a run key or other automated execution condition, that indicates that the file is registered as a shell extension, that indicates the file's age in the environment, that indicates the file's prevalence in the environment, that indicates whether or not any shortcuts reference the file, that indicates what operating system the file is configured to run in, and/or any other suitable information that describes how the file is installed and executed on a device.
As still another example, in some embodiments, a behavioral feature information vector can describe information that indicates that the file, when executed, connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, installs a WMI provider, and/or any other suitable information that describes observable outcomes of executing the file. - Turning to
FIGS. 2A and 2B, an example of a process 200 for creating clusters of files and adding files to existing clusters of files in accordance with some embodiments is shown. Process 200 can be performed by any suitable device and in any suitable manner in some embodiments. For example, in some embodiments, process 200 can be performed by a server, such as a server as illustrated in and described in connection with FIGS. 5 and 6 below. As another example, process 200 can be performed in response to extracted feature information being made available to a given device, such as a server, and multiple instances of process 200 may be performed in order to process extracted feature information being received at the same time, or about the same time, at a server. - As illustrated in
FIG. 2A, process 200 begins by receiving extracted feature information for a file from an endpoint. The extracted feature information for the file can be received in any suitable manner in some embodiments. For example, as described in connection with block 110 of FIG. 1, in some embodiments, the extracted feature information can be sent as three vectors: (1) a static feature information vector; (2) an environmental feature information vector; and (3) a behavioral feature information vector. - Next, at 204,
process 200 can select the first cluster to check the file against. This cluster can be selected in any suitable manner. For example, a most recently created cluster can be selected as the first cluster, a cluster with the most members can be selected as the first cluster, etc. - Then, at 206,
process 200 can determine whether the extracted feature information for the file matches the selected existing cluster. This determination can be made in any suitable manner in some embodiments. For example, in some embodiments, process 200 can determine whether the extracted feature information for a file matches the selected existing cluster by determining a mathematical distance (which can be calculated in any suitable manner) for each category of the extracted feature information for the file and the extracted feature information representing the selected existing cluster, and determining if a combined distance of the determined mathematical distances for the categories is less than a threshold (which can be any suitable threshold). As another example, process 200 can determine whether the extracted feature information for the file matches an existing cluster using the process illustrated in FIG. 3. - Turning to
FIG. 3, an example 300 of a process for determining whether extracted feature information for a file matches reference feature information for a reference object (e.g., which can be a reference existing cluster or a reference file) is illustrated in accordance with some embodiments. As shown, process 300 begins by selecting a first category among the categories of feature information for the file at 302. Any suitable category can be selected as the first category. For example, the static feature information can be selected as the first category of feature information in some embodiments. As another example, the environmental feature information can be selected as the first category of feature information in some embodiments. As yet another example, the behavioral feature information can be selected as the first category of feature information in some embodiments. - Next, at 304,
process 300 can select the first feature from the selected category. Any suitable feature can be selected as the first feature. - Then, at 306,
process 300 can determine if there is a match between the value of the selected feature of the file and the value of the selected feature of the reference object (e.g., an existing cluster or a file). Process 300 can determine if there is a match between the values in any suitable manner. For example, for some features, a match may only exist when the values are identical. More particularly, for example, if a value is binary, then both values may be required to be true or false for a match to exist. As another example, for some features, a match may exist when the values are the same or within a certain percentage of each other. More particularly, for example, if values are continuous, a match may exist when the values are within 10% (or any other suitable percentage) of the larger value (e.g., values 90 and 100 would match because 90 is within 10% of 100). - At 308,
process 300 can determine if there is another feature in the selected category. If there is, then process 300 can branch to 310 at which it can select the next feature and then loop back to 306. Otherwise, if there is not another feature in the selected category, then process 300 can branch to 312 at which it can assign a score for the category as a ratio of the matched features to the total features. For example, if a category has ten features and six match, a score of 0.6 can be assigned to the category. - Next, at 314,
process 300 can determine if there is another category. If so, process 300 can select the next category at 318 and loop back to 304. Otherwise, process 300 can indicate a match between the file and the reference object (e.g., an existing cluster or a file) if more than 50% (or any other suitable percentage) of the categories have scores greater than 0.5 (or any other suitable value) at 320 and then end at 322. - Referring back to
process 200 shown in FIG. 2, if a match is determined at 206, then process 200 can calculate a combined score for the cluster at 208. This combined score can be calculated in any suitable manner. For example, in some embodiments, the combined score for the cluster can be set equal to a combined mathematical distance based on calculated mathematical distances of each category of features between the file and the selected cluster. As another example, in some embodiments, the combined score for the cluster can be a weighted combination of the scores for each category. More particularly, for example, the combined score (CS) can be: -
CS = WS*SS + WE*SE + WB*SB - where:
- WS is the weight for the static category;
- SS is the score for the static category;
- WE is the weight for the environmental category;
- SE is the score for the environmental category;
- WB is the weight for the behavioral category;
- SB is the score for the behavioral category; and
- WS+WE+WB=1.
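The weighted combination above is straightforward to compute. In the sketch below, the weight values are arbitrary placeholders; the patent does not prescribe particular weights.

```python
def combined_score(scores, weights):
    """CS = WS*SS + WE*SE + WB*SB, with the weights summing to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[cat] * scores[cat] for cat in weights)

cs = combined_score(
    scores={"static": 0.6, "environmental": 0.8, "behavioral": 0.5},
    weights={"static": 0.5, "environmental": 0.3, "behavioral": 0.2},
)
# 0.5*0.6 + 0.3*0.8 + 0.2*0.5 = 0.64
```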
- If no match is determined at 206 or after calculating a combined score at 208,
process 200 can determine if there are more clusters to check at 210. If there are more clusters to check, then process 200 can branch to 212 at which the next cluster can be selected and then loop back to 206. Otherwise, if process 200 determines that there are no more clusters at 210, then process 200 can branch to 214 at which the process can determine if there are any matching clusters. If there are matching clusters, then process 200 can add the file to the matching existing cluster with the best combined score at 216 and then end at 218. Otherwise, if there are no matching clusters, process 200 can branch via 220 to 222 of FIG. 2B. - A file can be added to an existing cluster at 216 in any suitable manner. For example, in some embodiments, the name of the file, the location of the file (e.g., IP address), the extracted feature information for the file, and/or any other suitable information can be stored in a database in association with the cluster. As another example, in some embodiments, feature information associated with the cluster (e.g., that can be used to determine whether extracted feature information of one or more files matches the cluster at 206) can be updated using the extracted feature information for the file. More particularly, for each feature of each category, a cluster can be assigned values that reflect the median, mean, or most common values for those features of each category (based on the nature of the feature) so that the assigned values can be compared to future files to determine if a future file matches the cluster.
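Assigning a cluster representative values per feature, as described above, might be sketched as follows, using the median for numeric features and the most common value otherwise (two of the options the text mentions; the helper name is illustrative).

```python
from collections import Counter
from statistics import median

def cluster_representative(member_features):
    """Collapse per-file feature dicts into one representative dict:
    median for numeric features, most common value for the rest."""
    rep = {}
    for key in member_features[0]:
        values = [f[key] for f in member_features]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        rep[key] = median(values) if numeric else Counter(values).most_common(1)[0][0]
    return rep

rep = cluster_representative([
    {"size": 100, "packed": True},
    {"size": 120, "packed": True},
    {"size": 300, "packed": False},
])
```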
- At 222 of
FIG. 2B, process 200 can determine if the file is interesting. This can be performed in any suitable manner. For example, in some embodiments, a file may be determined to be interesting if the file, when executed, exhibits behavior that is anomalous (e.g., by performing actions that similar files do not perform). As another example, a file may be determined to be interesting if it comes from an external source or a less-trusted process (e.g., an email client, a browser, etc.), is packed, has low prevalence, etc.
- Otherwise, if the file is determined to be interesting at 222, then process 200 can proceed to 226, at which it can retrieve feature information for a first non-clustered file from a database. Any suitable non-clustered file can be selected as the first non-clustered file. For example, in some embodiments, the first non-clustered can be randomly selected from a group of recent non-clustered files (e.g., file in the last 24 hours) or can be selected as the most recent non-clustered file.
- Then, at 228,
process 200 can determine whether the extracted feature information for the file matches the feature information for the selected file from the database. This determination can be made in any suitable manner in some embodiments. For example, in some embodiments,process 200 can determine whether the extracted feature information for a file matches the selected file from the database by determining a mathematical distance (which can be calculated in any suitable manner) for each category of the extracted feature information for the file and the extracted feature information representing the selected file from the database, and determining if a combined distance of the determined mathematic distances for the categories is less than a threshold (which can be any suitable threshold). As another example,process 200 can determine whether the extracted feature information for the file matches the selected file from the database using the process illustrated inFIG. 3 and as described above. - If a match is determined at 228, then process 200 can calculate a combined score for the file from the database at 230. This combined score can be calculated in any suitable manner. For example, in some embodiments, the combined score for the file from the database can be set equal to a combined mathematical distance based on calculated mathematical distances of each category of features between the file and the file from the database. As another example, in some embodiments, the combined score the file from the database can be a weighted combination of the scores for each category. More particularly, for example, the combined score (CS) can be:
-
CS = WS*SS + WE*SE + WB*SB - where:
- WS is the weight for the static category;
- SS is the score for the static category;
- WE is the weight for the environmental category;
- SE is the score for the environmental category;
- WB is the weight for the behavioral category;
- SB is the score for the behavioral category; and
- WS+WE+WB=1.
- If no match is determined at 228 or after calculating a combined score at 230,
process 200 can determine if there are more files to check at 232. If there are more files to check, then process 200 can branch to 234 at which the feature information for the next file from the database can be retrieved and process 200 can then loop back to 228. Otherwise, if process 200 determines that there are no more files to check at 232, then process 200 can branch to 236 at which the process can determine if there are any matching files. If there are matching files, then process 200 can create a cluster with the file and the file from the database having the best score at 238. Otherwise, if there are no matching files from the database, process 200 can branch to 242, at which it can add the file to the database, and then end at 240. - A cluster can be created at 238 in any suitable manner. For example, in some embodiments, the names of the files, the locations of the files (e.g., IP addresses), the extracted feature information for the files, and/or any other suitable information can be stored in a database in association with the cluster. As another example, in some embodiments, feature information associated with the cluster (e.g., that can be used to determine whether extracted feature information of one or more files matches the cluster at 206) can be created using the extracted feature information for the files.
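Taken together, blocks 226-242 reduce to a best-score linear scan over the non-clustered files. A hypothetical sketch, with `matches` and `score` standing in for the FIG. 3 matching test and the combined score described above:

```python
def best_database_match(candidate, database, matches, score):
    """Blocks 226-236: linearly scan non-clustered files, returning the
    matching entry with the best combined score, or None if none match."""
    best_entry, best_score = None, float("-inf")
    for entry in database:
        if matches(candidate, entry):
            s = score(candidate, entry)
            if s > best_score:
                best_entry, best_score = entry, s
    return best_entry

# Toy stand-ins: entries "match" within distance 2; closer scores higher.
db = [10, 14, 11]
near = lambda a, b: abs(a - b) <= 2
closeness = lambda a, b: -abs(a - b)
best = best_database_match(12, db, near, closeness)
```

On a large database, the same selection could be served by a precomputed nearest neighbor index instead of a linear scan.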
- While specific examples of techniques for identifying matching clusters and matching files from a database are provided and described in connection with the blocks in
FIGS. 2A and 2B, it should be apparent that other mechanisms can be used as well. For example, in some embodiments, a nearest neighbor search can be used to identify a matching cluster or a matching file. - Turning to
FIG. 4, an example 400 of a process for processing a malicious file (or a file that is otherwise of concern) is illustrated. Process 400 can be performed by any suitable device and in any suitable manner in some embodiments. For example, in some embodiments, process 400 can be performed by a server, such as a server as illustrated in and described in connection with FIGS. 5 and 6 below. As another example, process 400 can be performed in response to a server detecting a malicious file or the server being informed about a malicious file from an endpoint. - As shown, at 402,
process 400 can determine that a file is malicious or otherwise of concern. This can be performed in any suitable manner. Next, at 404, process 400 can determine if the file matches an existing cluster. This can be performed in any suitable manner. For example, static, environmental, and behavioral feature information for the file can be extracted (e.g., as described above in connection with 104, 106, and 108 of FIG. 1) and a match between the malicious file and an existing cluster can be found (e.g., as described above in connection with 204, 206, 208, 210, 212, 214, and 216 of FIG. 2A). If a match is determined to exist at 404, then, at 406, process 400 can report the reputation of the malicious file to endpoints indicated as having the files in the matching cluster. In response to this, the endpoints can then take any suitable action, such as quarantining the files in the matching cluster. Finally, after reporting the reputation at 406 or determining that no matching existing cluster exists at 404, process 400 can end at 408. - Turning to
FIG. 5, an example 500 of hardware that can be used in accordance with some embodiments of the disclosed subject matter is shown. As illustrated, hardware 500 can include a server 502, a database 504, a communication network 506, and/or one or more endpoints 508, 510, and 512. -
Server 502 can be any suitable server(s) for performing the functions described in connection with FIGS. 2A, 2B, 3, and 4, and/or any other suitable functions. -
Database 504 can be any suitable database for storing information, such as the database information described above in connection with FIG. 2B. -
Communication network 506 can be any suitable combination of one or more wired and/or wireless networks in some embodiments. For example, communication network 506 can include any one or more of the Internet, an intranet, a wide-area network (WAN), a local-area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), BLUETOOTH, BLUETOOTH LE, and/or any other suitable communication network. - In some embodiments,
server 502, database 504, communication network 506, and/or one or more endpoints can be connected by any suitable links 514, such as network links, dial-up links, wireless links, hard-wired links, any other suitable communications links, or any suitable combination of such links. -
Endpoints - Although
server 502 and database 504 are each illustrated as one device, the functions performed by device(s) 502 and 504 can be performed using any suitable number of devices in some embodiments. For example, in some embodiments, multiple devices can be used to implement the functions performed by each of device(s) 502 and 504. Similarly, although device - Although three
endpoints are shown in FIG. 5 to avoid over-complicating the figure, any suitable number of endpoints, and/or any suitable types of endpoints, can be used in some embodiments. -
Server 502, database 504, and the endpoints can be implemented using any suitable hardware devices in some embodiments. For example, as illustrated in example hardware 600 of FIG. 6, such hardware can include hardware processor 602, memory and/or storage 604, an input device controller 606, an input device 608, display/audio drivers 610, display and audio output circuitry 612, communication interface(s) 614, an antenna 616, and a bus 618. -
Hardware processor 602 can include any suitable hardware processor, such as a microprocessor, a micro-controller, digital signal processor(s), dedicated logic, and/or any other suitable circuitry for controlling the functioning of a general-purpose computer or a special-purpose computer in some embodiments. In some embodiments, hardware processor 602 can be controlled by a program stored in memory and/or storage 604. - Memory and/or
storage 604 can be any suitable memory and/or storage for storing programs, data, and/or any other suitable information in some embodiments. For example, memory and/or storage 604 can include random access memory, read-only memory, flash memory, hard disk storage, optical media, and/or any other suitable memory. -
Input device controller 606 can be any suitable circuitry for controlling and receiving input from one or more input devices 608 in some embodiments. For example, input device controller 606 can be circuitry for receiving input from a touchscreen, from a keyboard, from one or more buttons, from a voice recognition circuit, from a microphone, from a camera, from an optical sensor, from an accelerometer, from a temperature sensor, from a near field sensor, from a pressure sensor, from an encoder, and/or any other type of input device. - Display/
audio drivers 610 can be any suitable circuitry for controlling and driving output to one or more display/audio output devices 612 in some embodiments. For example, display/audio drivers 610 can be circuitry for driving a touchscreen, a flat-panel display, a cathode ray tube display, a projector, a speaker or speakers, and/or any other suitable display and/or presentation devices. - Communication interface(s) 614 can be any suitable circuitry for interfacing with one or more communication networks (e.g., communication network 506). For example, interface(s) 614 can include network interface card circuitry, wireless communication circuitry, and/or any other suitable type of communication network circuitry.
-
Antenna 616 can be any suitable one or more antennas for wirelessly communicating with a communication network (e.g., communication network 506) in some embodiments. In some embodiments, antenna 616 can be omitted. -
Bus 618 can be any suitable mechanism for communicating between two or more components in some embodiments. - Any other suitable components can be included in
hardware 600 in accordance with some embodiments. - In some embodiments, at least some of the above described blocks of the processes of
FIGS. 1, 2A, 2B, 3, and 4 can be executed or performed in any order or sequence not limited to the order and sequence shown in and described in connection with the figures. Also, some of the above blocks ofFIGS. 1, 2A, 2B, 3, and 4 can be executed or performed substantially simultaneously where appropriate or in parallel to reduce latency and processing times. Additionally or alternatively, some of the above described blocks of the processes ofFIGS. 1, 2A, 2B, 3, and 4 can be omitted. - In some embodiments, any suitable computer readable media can be used for storing instructions for performing the functions and/or processes herein. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as non-transitory forms of magnetic media (such as hard disks, floppy disks, and/or any other suitable magnetic media), non-transitory forms of optical media (such as compact discs, digital video discs, Blu-ray discs, and/or any other suitable optical media), non-transitory forms of semiconductor media (such as flash memory, electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and/or any other suitable semiconductor media), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
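The flow of process 400 described above in connection with FIG. 4 can be sketched as follows. The cluster store, the reputation message format, and the endpoint-notification callback below are hypothetical stand-ins, not part of the disclosure:

```python
def process_400(file_id, is_malicious, find_matching_cluster, notify_endpoint):
    """Sketch of process 400: when a file is malicious (402) and matches an
    existing cluster (404), report its reputation to every endpoint that
    holds files in that cluster (406), then end (408)."""
    if not is_malicious:                      # 402: file is not of concern
        return []
    cluster = find_matching_cluster(file_id)  # 404: match against clusters
    if cluster is None:                       # no matching cluster: end
        return []
    notified = []
    for endpoint in cluster["endpoints"]:     # 406: report the reputation
        notify_endpoint(endpoint, {"cluster": cluster["id"],
                                   "reputation": "malicious"})
        notified.append(endpoint)
    return notified                           # 408: end

# Hypothetical cluster store: file "f1" belongs to cluster "c1", whose
# files have been observed on endpoints ep-1 and ep-2.
clusters = {"f1": {"id": "c1", "endpoints": ["ep-1", "ep-2"]}}
sent = []
notified = process_400("f1", True, clusters.get,
                       lambda ep, msg: sent.append((ep, msg)))
```

On receiving such a report, an endpoint could, for example, quarantine its local copies of the cluster's files, as the description above suggests.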
- Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention. Features of the disclosed embodiments can be combined and rearranged in various ways.
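As a concrete illustration of the nearest-neighbor search mentioned above as an alternative to the matching mechanisms of FIGS. 2A and 2B, the following sketch finds the cluster centroid closest to a file's numeric feature vector; the feature encoding, the centroids, and the distance threshold are illustrative assumptions:

```python
import math

def nearest_cluster(features, centroids, max_distance=1.0):
    """Return the id of the centroid nearest to `features`, or None when
    no centroid lies within max_distance (i.e., no cluster matches)."""
    best_id, best_dist = None, float("inf")
    for cluster_id, centroid in centroids.items():
        dist = math.dist(features, centroid)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id if best_dist <= max_distance else None

# Hypothetical normalized features (e.g., size, entropy, API count).
centroids = {"cluster-a": [0.2, 0.9, 0.1], "cluster-b": [0.8, 0.3, 0.7]}
match = nearest_cluster([0.25, 0.85, 0.15], centroids)  # closest: cluster-a
```

A production system would more likely use an indexed search (e.g., a k-d tree or locality-sensitive hashing) rather than this linear scan, but the matching decision is the same.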
Claims (20)
1. A system for identifying and responding to malicious files having similar features, comprising:
a memory; and
a hardware processor coupled to the memory and configured to:
receive feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information;
create clusters based on the feature information;
determine if a file corresponding to one of the clusters is malicious; and
report to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
2. The system of claim 1 , wherein the feature information includes static feature information, and the static feature information includes information that describes contents of the file.
3. The system of claim 2, wherein the static feature information includes at least one of a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, and strings embedded in the file.
4. The system of claim 1 , wherein the feature information includes environmental feature information, and the environmental feature information includes information that describes how the file is installed and executed on a device.
5. The system of claim 4, wherein the environmental feature information includes at least one of information that identifies a path from which the file is executed, information that identifies a parent process of the file, information that indicates that the file is installed as a service, information that indicates that the file has an uninstaller registered for the same path, information that indicates that the file has a run key or other automated execution condition, information that indicates that the file is registered as a shell extension, information that indicates the file's age in the environment, information that indicates the file's prevalence in the environment, information that indicates whether or not any shortcuts reference the file, and information that indicates what operating system the file is configured to run in.
6. The system of claim 1 , wherein the feature information includes behavioral feature information, and the behavioral feature information includes information that describes observable outcomes of executing the file.
7. The system of claim 6, wherein the behavioral feature information includes information that indicates that the file, when executed, does at least one of: connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, and installs a WMI provider.
8. A method for identifying and responding to malicious files having similar features, comprising:
receiving, at a hardware processor, feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information;
creating, using the hardware processor, clusters based on the feature information;
determining, using the hardware processor, if a file corresponding to one of the clusters is malicious; and
reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
9. The method of claim 8 , wherein the feature information includes static feature information, and the static feature information includes information that describes contents of the file.
10. The method of claim 9, wherein the static feature information includes at least one of a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, and strings embedded in the file.
11. The method of claim 8 , wherein the feature information includes environmental feature information, and the environmental feature information includes information that describes how the file is installed and executed on a device.
12. The method of claim 11, wherein the environmental feature information includes at least one of information that identifies a path from which the file is executed, information that identifies a parent process of the file, information that indicates that the file is installed as a service, information that indicates that the file has an uninstaller registered for the same path, information that indicates that the file has a run key or other automated execution condition, information that indicates that the file is registered as a shell extension, information that indicates the file's age in the environment, information that indicates the file's prevalence in the environment, information that indicates whether or not any shortcuts reference the file, and information that indicates what operating system the file is configured to run in.
13. The method of claim 8 , wherein the feature information includes behavioral feature information, and the behavioral feature information includes information that describes observable outcomes of executing the file.
14. The method of claim 13, wherein the behavioral feature information includes information that indicates that the file, when executed, does at least one of: connects to external URLs, creates certain files, disables a firewall or one or more features of a firewall, opens up ports for listening, interacts with other processes, the registry, or files, executes with a certain frequency, requires a certain user security level or integrity level when executed, and installs a WMI provider.
15. A non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for identifying and responding to malicious files having similar features, the method comprising:
receiving feature information extracted from a file, wherein the feature information includes at least two of static feature information, environmental feature information, and behavioral feature information;
creating clusters based on the feature information;
determining if a file corresponding to one of the clusters is malicious; and
reporting to a plurality of endpoints that other files corresponding to the one of the clusters are malicious.
16. The non-transitory computer-readable medium of claim 15 , wherein the feature information includes static feature information, and the static feature information includes information that describes contents of the file.
17. The non-transitory computer-readable medium of claim 16, wherein the static feature information includes at least one of a size of the file, a description of one or more overlays in the file, a geometry of the file, resources used by the file, application programming interfaces (APIs) used by the file, entropy of portions of the file, sections of code executed by the file, libraries imported or referenced by the file, and strings embedded in the file.
18. The non-transitory computer-readable medium of claim 15 , wherein the feature information includes environmental feature information, and the environmental feature information includes information that describes how the file is installed and executed on a device.
19. The non-transitory computer-readable medium of claim 18, wherein the environmental feature information includes at least one of information that identifies a path from which the file is executed, information that identifies a parent process of the file, information that indicates that the file is installed as a service, information that indicates that the file has an uninstaller registered for the same path, information that indicates that the file has a run key or other automated execution condition, information that indicates that the file is registered as a shell extension, information that indicates the file's age in the environment, information that indicates the file's prevalence in the environment, information that indicates whether or not any shortcuts reference the file, and information that indicates what operating system the file is configured to run in.
20. The non-transitory computer-readable medium of claim 15 , wherein the feature information includes behavioral feature information, and the behavioral feature information includes information that describes observable outcomes of executing the file.
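Among the static features recited in claims 3, 10, and 17 are the size of the file and the entropy of portions of the file. A minimal sketch of how those two features could be computed is shown below; the fixed chunk size and the feature-record layout are illustrative assumptions, not part of the claims:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0-8.0). High-entropy regions
    often indicate packed or encrypted content."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def static_features(data: bytes, chunk_size: int = 256) -> dict:
    """Toy static feature record: file size plus per-chunk entropy."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {"size": len(data),
            "chunk_entropies": [round(shannon_entropy(c), 3) for c in chunks]}

# A 512-byte sample: 256 zero bytes (entropy 0.0) followed by each of the
# 256 distinct byte values once (maximum entropy, 8.0 bits/byte).
features = static_features(bytes(256) + bytes(range(256)))
```

A real extractor would compute entropy over PE sections rather than fixed chunks, alongside the other recited features (overlays, imports, embedded strings, and so on).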
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/370,328 US20200117802A1 (en) | 2018-10-15 | 2019-03-29 | Systems, methods, and media for identifying and responding to malicious files having similar features |
PCT/US2019/054774 WO2020081264A1 (en) | 2018-10-15 | 2019-10-04 | Systems, methods, and media for identifying and responding to malicious files having similar features |
US17/195,130 US11989293B2 (en) | 2018-10-15 | 2021-03-08 | Systems, methods, and media for identifying and responding to malicious files having similar features |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862745919P | 2018-10-15 | 2018-10-15 | |
US16/370,328 US20200117802A1 (en) | 2018-10-15 | 2019-03-29 | Systems, methods, and media for identifying and responding to malicious files having similar features |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/195,130 Continuation US11989293B2 (en) | 2018-10-15 | 2021-03-08 | Systems, methods, and media for identifying and responding to malicious files having similar features |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200117802A1 true US20200117802A1 (en) | 2020-04-16 |
Family
ID=70160047
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/370,328 Abandoned US20200117802A1 (en) | 2018-10-15 | 2019-03-29 | Systems, methods, and media for identifying and responding to malicious files having similar features |
US17/195,130 Active 2040-02-21 US11989293B2 (en) | 2018-10-15 | 2021-03-08 | Systems, methods, and media for identifying and responding to malicious files having similar features |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/195,130 Active 2040-02-21 US11989293B2 (en) | 2018-10-15 | 2021-03-08 | Systems, methods, and media for identifying and responding to malicious files having similar features |
Country Status (2)
Country | Link |
---|---|
US (2) | US20200117802A1 (en) |
WO (1) | WO2020081264A1 (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130246352A1 (en) | 2009-06-17 | 2013-09-19 | Joel R. Spurlock | System, method, and computer program product for generating a file signature based on file characteristics |
US8572740B2 (en) * | 2009-10-01 | 2013-10-29 | Kaspersky Lab, Zao | Method and system for detection of previously unknown malware |
KR20150089664A (en) * | 2014-01-28 | 2015-08-05 | 한국전자통신연구원 | System for detecting mobile malware |
US9754106B2 (en) * | 2014-10-14 | 2017-09-05 | Symantec Corporation | Systems and methods for classifying security events as targeted attacks |
US9594906B1 (en) * | 2015-03-31 | 2017-03-14 | Juniper Networks, Inc. | Confirming a malware infection on a client device using a remote access connection tool to identify a malicious file based on fuzzy hashes |
KR101819322B1 (en) * | 2016-03-16 | 2018-02-28 | 주식회사 엘지유플러스 | Malicious Code Analysis Module and Method therefor |
US9998484B1 (en) * | 2016-03-28 | 2018-06-12 | EMC IP Holding Company LLC | Classifying potentially malicious and benign software modules through similarity analysis |
KR101880686B1 (en) * | 2018-02-28 | 2018-07-20 | 에스지에이솔루션즈 주식회사 | A malware code detecting system based on AI(Artificial Intelligence) deep learning |
- 2019-03-29: US application US16/370,328 filed (published as US20200117802A1; abandoned)
- 2019-10-04: PCT application PCT/US2019/054774 filed (published as WO2020081264A1)
- 2021-03-08: US continuation US17/195,130 filed (granted as US11989293B2; active)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220318384A1 * | 2021-03-30 | 2022-10-06 | Microsoft Technology Licensing, LLC | Malicious pattern identification in clusters of data items |
US11868472B2 (en) * | 2021-03-30 | 2024-01-09 | Microsoft Technology Licensing, Llc | Malicious pattern identification in clusters of data items |
Also Published As
Publication number | Publication date |
---|---|
US11989293B2 (en) | 2024-05-21 |
WO2020081264A1 (en) | 2020-04-23 |
US20210374240A1 (en) | 2021-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11126716B2 (en) | System security method and apparatus | |
US10956477B1 (en) | System and method for detecting malicious scripts through natural language processing modeling | |
JP7086972B2 (en) | Continuous learning for intrusion detection | |
CN112040468B (en) | Method, computing device, and computer storage medium for vehicle interaction | |
WO2014201830A1 (en) | Method and device for detecting software-tampering | |
CA2955457A1 (en) | System, method and apparatus for detecting vulnerabilities in electronic devices | |
US9349002B1 (en) | Android application classification using common functions | |
US10860719B1 (en) | Detecting and protecting against security vulnerabilities in dynamic linkers and scripts | |
US11989293B2 (en) | Systems, methods, and media for identifying and responding to malicious files having similar features | |
EP2728472B1 (en) | User terminal, reliability management server, and method and program for preventing unauthorized remote operation | |
WO2019242441A1 (en) | Dynamic feature-based malware recognition method and system and related apparatus | |
CN110399131B (en) | Method, device and computer equipment for improving stability of application program | |
US9507621B1 (en) | Signature-based detection of kernel data structure modification | |
CN116028917A (en) | Authority detection method and device, storage medium and electronic equipment | |
CN113839944B (en) | Method, device, electronic equipment and medium for coping with network attack | |
US10019582B1 (en) | Detecting application leaks | |
CN114500368A (en) | Data transmission method and device and router adopting device | |
US10216947B2 (en) | System and method for activating a data entry mechanism | |
US10706146B2 (en) | Scanning kernel data structure characteristics | |
US20230289442A1 (en) | Computer-implemented automatic security methods and systems | |
US11727113B1 (en) | System and method for training of antimalware machine learning models | |
US11785028B1 (en) | Dynamic analysis for detecting harmful content | |
US20230297671A1 (en) | Computer-implemented automatic security methods and systems | |
US20230274000A1 (en) | Computer-implemented automatic security methods and systems | |
US20230101198A1 (en) | Computer-implemented systems and methods for application identification and authentication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: MCAFEE, LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SPURLOCK, JOEL R.; FRITTELLI, LEONARDO; REEL/FRAME: 050764/0438. Effective date: 20191010 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |