GB2616506A - Malware detection by distributed telemetry data analysis - Google Patents

Malware detection by distributed telemetry data analysis

Info

Publication number
GB2616506A
Authority
GB
United Kingdom
Prior art keywords
neural network
processors
trained neural
federated
network system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2300649.7A
Other versions
GB202300649D0 (en)
Inventor
Udupi Raghavendra Arjun
Uwe Scheideler Tim
Seul Matthias
Giovannini Andrea
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kyndryl Inc
Original Assignee
Kyndryl Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kyndryl Inc filed Critical Kyndryl Inc
Publication of GB202300649D0 publication Critical patent/GB202300649D0/en
Publication of GB2616506A publication Critical patent/GB2616506A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • H04L63/145Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Virology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method, computer program product, and system for detecting a malicious process by a selected instance of an anti-malware system are provided. The method includes one or more processors examining a process for indicators of compromise to the process. The method further includes one or more processors determining a categorization of the process based upon a result of the examination. In response to determining that the categorization of the process does not correspond to a known benevolent process and a known malicious process, the method further includes one or more processors executing the process in a secure enclave. The method further includes one or more processors collecting telemetry data from executing the process in the secure enclave. The method further includes one or more processors passing the collected telemetry data to a locally trained neural network system.
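The triage logic in the abstract — examine for indicators of compromise, categorize, detonate only unknown processes in the secure enclave, then score the collected telemetry — can be sketched as follows. This is an illustrative outline only; the function names, the enclave runner, and the scorer are assumptions, not identifiers from the patent.

```python
def categorize(process_name, ioc_hits, known_benevolent, known_malicious):
    """Map the result of an IoC examination onto the three categories
    named in claim 12: known benevolent, known malicious, or unknown."""
    if process_name in known_malicious or ioc_hits:
        return "known_malicious"
    if process_name in known_benevolent:
        return "known_benevolent"
    return "unknown"


def handle_process(process_name, ioc_hits, run_in_enclave, score_telemetry,
                   known_benevolent=frozenset(), known_malicious=frozenset()):
    """Triage a process: only processes that match neither known category
    are executed in the (hypothetical) secure enclave, and the telemetry
    collected there is passed to the anomaly scorer."""
    category = categorize(process_name, ioc_hits,
                          known_benevolent, known_malicious)
    if category != "unknown":
        return category
    telemetry = run_in_enclave(process_name)  # isolated execution
    return "anomalous" if score_telemetry(telemetry) else "regular"
```

In this sketch, `run_in_enclave` and `score_telemetry` are injected callables standing in for the sandbox and the locally trained neural network system; a regular verdict corresponds to claim 7 (move the process out of the enclave), an anomalous one to claim 6 (discard it).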

Claims (20)

1. A method comprising: examining, by one or more processors, a process for indicators of compromise to the process; determining, by one or more processors, a categorization of the process based upon a result of the examination; in response to determining that the categorization of the process does not correspond to a known benevolent process and a known malicious process, executing, by one or more processors, the process in a secure enclave; collecting, by one or more processors, telemetry data from executing the process in the secure enclave; passing, by one or more processors, the collected telemetry data to a locally trained neural network system, wherein training data of the locally trained neural network system comprises telemetry data from processes being executed on a host system underlying the locally trained neural network system; determining, by one or more processors, a result of a first loss function for the locally trained neural network system; and comparing, by one or more processors, the result with a result of said loss function at an end of a training of said locally trained neural network system.
2. The method of claim 1, further comprising: passing, by one or more processors, the collected telemetry data to a federated trained neural network system, wherein the federated trained neural network system is adapted to receive a federated trained neural network model; determining, by one or more processors, a result of a second loss function for the federated trained neural network system; comparing, by one or more processors, the result with a result of the loss function of the received federated trained neural network model of the federated trained neural network system; aggregating, by one or more processors, results of the first loss function and the second loss function; and determining, by one or more processors, whether the process is anomalous based on the aggregated results.
3. The method of claim 1, further comprising: collecting, by one or more processors, telemetry data of non-malicious processes being executed on a selected instance of an anti-malware system; and retraining, by one or more processors, the locally trained neural network system with the collected telemetry data to build an updated local neural network model on a regular basis.
4. The method of claim 2, further comprising: receiving, by one or more processors, an updated federated neural network model for the federated neural network system, wherein the federated neural network model is built using locally trained neural network models of a plurality of selected instances as input.
5. The method of claim 2, wherein at least one of the locally trained neural network systems and the federated trained neural network system is an auto-encoder system.
6. The method of claim 1, further comprising: in response to determining that the categorization of the process corresponds to an anomalous process, discarding, by one or more processors, the process.
7. The method of claim 1, further comprising: in response to determining that the process is a regular process, based on execution in the secure enclave, moving, by one or more processors, the process out of the secure enclave; and executing, by one or more processors, the process as a regular process.
8. The method of claim 2, wherein the federated trained neural network system is trained with telemetry data from a plurality of hosts.
9. The method of claim 8, wherein training of the federated neural network system further comprises: processing, by one or more processors, by each of a plurality of received locally trained neural network models, a set of representative telemetry data and storing respective results; and training, by one or more processors, the federated neural network using input/output pairs of telemetry data used and generated during processing of the telemetry data as input data for a training of the federated neural network model.
10. The method of claim 9: wherein the input/output pairs of telemetry data used for the training of the federated neural network model are weighted depending on a geographical vicinity between geographical host locations of the local received neural network models and a geographical host location for which a new federated neural network model is trained, and wherein the input/output pairs of telemetry data used for the training of said federated neural network model are weighted depending on a predefined metric.
11. The method of claim 2, wherein aggregating results of the first loss function and the second loss function further comprises: building, by one or more processors, a weighted average of the first loss function and the second loss function.
12. The method of claim 1, wherein the determined categorization is selected from the group consisting of: a known benevolent process, a known malicious process, and an unknown process.
13. A computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to examine a process for indicators of compromise to the process; program instructions to determine a categorization of the process based upon a result of the examination; in response to determining that the categorization of the process does not correspond to a known benevolent process and a known malicious process, program instructions to execute the process in a secure enclave; program instructions to collect telemetry data from executing the process in the secure enclave; program instructions to pass the collected telemetry data to a locally trained neural network system, wherein training data of the locally trained neural network system comprises telemetry data from processes being executed on a host system underlying the locally trained neural network system; program instructions to determine a result of a first loss function for the locally trained neural network system; and program instructions to compare the result with a result of said loss function at an end of a training of said locally trained neural network system.
14. The computer system of claim 13, further comprising program instructions, stored on the computer readable storage media for execution by at least one of the one or more processors, to: pass the collected telemetry data to a federated trained neural network system, wherein the federated trained neural network system is adapted to receive a federated trained neural network model; determine a result of a second loss function for the federated trained neural network system; compare the result with a result of the loss function of the received federated trained neural network model of the federated trained neural network system; aggregate results of the first loss function and the second loss function; and determine whether the process is anomalous based on the aggregated results.
15. The computer system of claim 13, further comprising program instructions, stored on the computer readable storage media for execution by at least one of the one or more processors, to: collect telemetry data of non-malicious processes being executed on a selected instance of an anti-malware system; and retrain the locally trained neural network system with the collected telemetry data to build an updated local neural network model on a regular basis.
16. The computer system of claim 13, further comprising program instructions, stored on the computer readable storage media for execution by at least one of the one or more processors, to: in response to determining that the categorization of the process corresponds to an anomalous process, discard the process.
17. The computer system of claim 13, further comprising program instructions, stored on the computer readable storage media for execution by at least one of the one or more processors, to: in response to determining that the process is a regular process, based on execution in the secure enclave, move the process out of the secure enclave; and execute the process as a regular process.
18. The computer system of claim 14, wherein the federated trained neural network system is trained with telemetry data from a plurality of hosts.
19. A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to examine a process for indicators of compromise to the process; program instructions to determine a categorization of the process based upon a result of the examination; in response to determining that the categorization of the process does not correspond to a known benevolent process and a known malicious process, program instructions to execute the process in a secure enclave; program instructions to collect telemetry data from executing the process in the secure enclave; program instructions to pass the collected telemetry data to a locally trained neural network system, wherein training data of the locally trained neural network system comprises telemetry data from processes being executed on a host system underlying the locally trained neural network system; program instructions to determine a result of a first loss function for the locally trained neural network system; and program instructions to compare the result with a result of said loss function at an end of a training of said locally trained neural network system.
20. The computer program product of claim 19, further comprising program instructions, stored on the one or more computer readable storage media, to: pass the collected telemetry data to a federated trained neural network system, wherein the federated trained neural network system is adapted to receive a federated trained neural network model; determine a result of a second loss function for the federated trained neural network system; compare the result with a result of the loss function of the received federated trained neural network model of the federated trained neural network system; aggregate results of the first loss function and the second loss function; and determine whether the process is anomalous based on the aggregated results.
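Claims 1, 2, 5, and 11 together describe a scoring scheme: compute a loss for the telemetry under both the local and the federated model (claim 5 allows auto-encoders, for which reconstruction error is the natural loss), combine the two results as a weighted average, and compare against the loss observed at the end of training. A minimal numeric sketch, under assumed names and with an assumed threshold factor not taken from the patent:

```python
def reconstruction_loss(model, telemetry):
    """Mean squared error between a telemetry vector and its reconstruction
    by a model (e.g. the auto-encoder of claim 5)."""
    reconstructed = model(telemetry)
    return sum((a - b) ** 2 for a, b in zip(telemetry, reconstructed)) / len(telemetry)


def aggregate(local_loss, federated_loss, w_local=0.5):
    """Weighted average of the first and second loss results (claim 11)."""
    return w_local * local_loss + (1.0 - w_local) * federated_loss


def is_anomalous(local_model, federated_model, telemetry,
                 end_of_training_loss, factor=2.0, w_local=0.5):
    """Flag the process when the aggregated loss clearly exceeds the loss
    at the end of training (the comparisons of claims 1 and 2).
    `factor` is an illustrative assumption for the decision margin."""
    combined = aggregate(reconstruction_loss(local_model, telemetry),
                         reconstruction_loss(federated_model, telemetry),
                         w_local)
    return combined > factor * end_of_training_loss
```

A well-reconstructed telemetry vector yields a loss near the training-time baseline and the process is treated as regular; telemetry the models cannot reconstruct drives the aggregated loss above the margin and the process is flagged as anomalous.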
GB2300649.7A 2020-10-13 2021-09-27 Malware detection by distributed telemetry data analysis Pending GB2616506A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/069,189 US11886587B2 (en) 2020-10-13 2020-10-13 Malware detection by distributed telemetry data analysis
PCT/CN2021/120874 WO2022078196A1 (en) 2020-10-13 2021-09-27 Malware detection by distributed telemetry data analysis

Publications (2)

Publication Number Publication Date
GB202300649D0 GB202300649D0 (en) 2023-03-01
GB2616506A true GB2616506A (en) 2023-09-13

Family

ID=81077750

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2300649.7A Pending GB2616506A (en) 2020-10-13 2021-09-27 Malware detection by distributed telemetry data analysis

Country Status (5)

Country Link
US (1) US11886587B2 (en)
JP (1) JP2023549284A (en)
DE (1) DE112021004808T5 (en)
GB (1) GB2616506A (en)
WO (1) WO2022078196A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220405573A1 (en) * 2021-06-18 2022-12-22 Ford Global Technologies, Llc Calibration for a distributed system
US11693965B1 (en) * 2022-06-17 2023-07-04 Uab 360 It Malware detection using federated learning
EP4336294A1 (en) * 2022-09-09 2024-03-13 AO Kaspersky Lab System and method for detecting anomalies in a cyber-physical system
US12002055B1 (en) * 2023-09-13 2024-06-04 Progressive Casualty Insurance Company Adaptable processing framework

Citations (4)

Publication number Priority date Publication date Assignee Title
US20070294768A1 (en) * 2006-01-31 2007-12-20 Deutsche Telekom Ag Method and system for detecting malicious behavioral patterns in a computer, using machine learning
CN109635563A (en) * 2018-11-30 2019-04-16 北京奇虎科技有限公司 The method, apparatus of malicious application, equipment and storage medium for identification
CN111260053A (en) * 2020-01-13 2020-06-09 支付宝(杭州)信息技术有限公司 Method and apparatus for neural network model training using trusted execution environments
CN111368297A (en) * 2020-02-02 2020-07-03 西安电子科技大学 Privacy protection mobile malicious software detection method, system, storage medium and application

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
US7331062B2 (en) * 2002-08-30 2008-02-12 Symantec Corporation Method, computer software, and system for providing end to end security protection of an online transaction
US8881282B1 (en) 2004-04-01 2014-11-04 Fireeye, Inc. Systems and methods for malware attack detection and identification
US8745361B2 (en) 2008-12-02 2014-06-03 Microsoft Corporation Sandboxed execution of plug-ins
US8306931B1 (en) 2009-08-06 2012-11-06 Data Fusion & Neural Networks, LLC Detecting, classifying, and tracking abnormal data in a data stream
US9134996B2 (en) * 2011-04-28 2015-09-15 F-Secure Corporation Updating anti-virus software
CN105247532B (en) 2013-03-18 2019-05-31 纽约市哥伦比亚大学理事会 Use the unsupervised detection to abnormal process of hardware characteristics
US9195833B2 (en) 2013-11-19 2015-11-24 Veracode, Inc. System and method for implementing application policies among development environments
US9262635B2 (en) * 2014-02-05 2016-02-16 Fireeye, Inc. Detection efficacy of virtual machine-based analysis with application specific events
US9853997B2 (en) 2014-04-14 2017-12-26 Drexel University Multi-channel change-point malware detection
US9959405B2 (en) 2014-05-28 2018-05-01 Apple Inc. Sandboxing third party components
US10419452B2 (en) * 2015-07-28 2019-09-17 Sap Se Contextual monitoring and tracking of SSH sessions
AU2017200941B2 (en) 2016-02-10 2018-03-15 Accenture Global Solutions Limited Telemetry Analysis System for Physical Process Anomaly Detection
US10375090B2 (en) 2017-03-27 2019-08-06 Cisco Technology, Inc. Machine learning-based traffic classification using compressed network telemetry data
CN108092962B (en) * 2017-12-08 2020-11-06 奇安信科技集团股份有限公司 Malicious URL detection method and device
WO2019229728A1 (en) * 2018-06-01 2019-12-05 Thales Canada Inc. System for and method of data encoding and/or decoding using neural networks
US11699080B2 (en) * 2018-09-14 2023-07-11 Cisco Technology, Inc. Communication efficient machine learning of data across multiple sites
US11706499B2 (en) * 2018-10-31 2023-07-18 Sony Interactive Entertainment Inc. Watermarking synchronized inputs for machine learning
US11258813B2 (en) 2019-06-27 2022-02-22 Intel Corporation Systems and methods to fingerprint and classify application behaviors using telemetry
US20210365841A1 (en) * 2020-05-22 2021-11-25 Kiarash SHALOUDEGI Methods and apparatuses for federated learning
US20220076133A1 (en) * 2020-09-04 2022-03-10 Nvidia Corporation Global federated training for neural networks


Also Published As

Publication number Publication date
US20220114260A1 (en) 2022-04-14
WO2022078196A1 (en) 2022-04-21
US11886587B2 (en) 2024-01-30
JP2023549284A (en) 2023-11-22
GB202300649D0 (en) 2023-03-01
DE112021004808T5 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
GB2616506A (en) Malware detection by distributed telemetry data analysis
CN110572297B (en) Network performance evaluation method, server and storage medium
WO2017118133A1 (en) Anomaly detection method for internal virtual machine of cloud system
KR102120214B1 (en) Cyber targeted attack detect system and method using ensemble learning
CN114499979B (en) SDN abnormal flow cooperative detection method based on federal learning
CN112217650B (en) Network blocking attack effect evaluation method, device and storage medium
JP2021056927A (en) Abnormality detection system, abnormality detection method, and abnormality detection program
CN109040113B (en) Distributed denial of service attack detection method and device based on multi-core learning
US20180330226A1 (en) Question recommendation method and device
CN113434859A (en) Intrusion detection method, device, equipment and storage medium
CN113660273B (en) Intrusion detection method and device based on deep learning under super fusion architecture
JP2018148350A (en) Threshold determination device, threshold level determination method and program
Al-Yaseen et al. Real-time intrusion detection system using multi-agent system
WO2020158398A1 (en) Sound generation device, data generation device, abnormality degree calculation device, index value calculation device, and program
RU2020111006A (en) VERIFICATION DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
CN108600270A (en) A kind of abnormal user detection method and system based on network log
CN114363212B (en) Equipment detection method, device, equipment and storage medium
CN104008038A (en) Method and device for detecting and evaluating software
Mughaid et al. Utilizing machine learning algorithms for effectively detection iot ddos attacks
CN108989083B (en) Fault detection performance optimization method based on hybrid strategy in cloud environment
CN108121912B (en) Malicious cloud tenant identification method and device based on neural network
KR102354094B1 (en) Method and Apparatus for Security Management Based on Machine Learning
CN109768995B (en) Network flow abnormity detection method based on cyclic prediction and learning
EP4221081A1 (en) Detecting behavioral change of iot devices using novelty detection based behavior traffic modeling
CN111277427B (en) Data center network equipment inspection method and system