WO2022167840A1

WO2022167840A1 - Profiling workloads using graph based neural networks in a cloud native environment

Info

Publication number: WO2022167840A1
Application number: PCT/IB2021/050921
Authority: WO
Inventors: Amine BOUKHTOUTA; Taous MADI
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2022-08-11

Abstract

A computer-implemented method of training graph based neural networks for profiling workloads in a cloud native environment is provided. The method includes: collecting data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; building, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; building, using the dependency graphs,computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; training and validating the computation neural network graphs using the generated embeddings for the workloads; and generating, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads.

Description

PROFILING WORKLOADS USING GRAPH BASED NEURAL NETWORKS IN A CLOUD NATIVE ENVIRONMENT TECHNICAL FIELD [001] Disclosed are embodiments related to cloud native environments and, in particular, to profiling workloads using graph based neural networks in a cloud native environment. BACKGROUND [002] In the prevailing of convergence of traditional deployments (i.e., Telco and industry operational networks) towards cloud native applications, there is a keen interest on how to protect these deployments. The protection is meant to define and integrate security controls to corroborate the security posture and management by introducing preventive, detective, corrective and dissuasive security controls. [003] By providing simple and agile execution environments, container technology has enabled building, rolling out and running applications faster than ever. The corresponding workloads are also scaled and adjusted with an unprecedented velocity to seamlessly cope with different needs in a timely manner. Although this technology allows for taking full advantage of the cloud dynamic, elastic and self-service nature by prioritizing agility and abstracting the infrastructure’s complexity, the latter also emphasizes all the existing security risks related to the virtualized infrastructures due to the fast time to roll out, the increased fluidity, and the lack of visibility. This makes it difficult to track the operation of different kinds of workloads and identify potential threats when they are manifested. [004] Considering the huge streams of data produced by the cloud native environments, which needs to be processed and analyzed instantly to support run-time detection capabilities, a data driven analytic approach using Machine Learning (ML) is seen as a key enabler for defining detective and preventive security controls based on predictive models. Furthermore, considering the scarcity of data collected from environments undergoing malicious activities on one side, and the increasing number of zero-day threats which do not have any clear characterization on the other side, an anomaly-based detection approach constitutes a good fit. The ETSI GS NFV-SEC 013 V3.1.1 (2017-02) “Network Functions Virtualisation (NFV) Release 3; Security; Security Management and Monitoring specification” document pinpointed the usefulness of advanced machine learning algorithms and various big data analytics methods to detect patterns and threat vectors as part of the security monitoring life cycle in NFV workloads. SUMMARY [005] In comparison to other machine learning approaches, embodiments disclosed herein utilize graph based neural networks, which consider workloads features as well as interactions between workloads through learning dependency graphs’ structures, whereas other approaches use flat structured features collected at the level of workloads. Embodiments disclosed herein utilize a double depth learning, which are structures of the workloads interaction and attributes characterizing workloads. [006] Embodiments disclosed herein utilize an approach based on establishing a baseline reflecting the normal behavior of a system, then monitoring the system behavior to capture deviations potentially related to suspicious activities and disturbances. This novel approach introduces predictive capabilities to support preventive and detective security controls for cloud native application workloads. This novel approach provides the capability to model the environment in a way to properly capture its dynamically changing and interacting components, to build a security baseline out of different perspectives or scopes for running workloads based on the created model, and to use the security baseline to detect disturbances and attacks on cloud native applications. [007] Embodiments disclosed herein utilize a data driven approach to build a ground truth for running workloads using neural graph embedding techniques. Embedding techniques can be used, for example, to transform nodes and their features, as well as edges, into space vectors. These vectors, which can represent latent values computed through neural graphs layers and final predictive values, are used, for example, in profiling workloads through classification or clustering. An advantage of the embodiments disclosed herein is that they provide the capability to build a learning function out of interactions between workloads to map workloads into reduced n-dimensional space for classifying workloads, including from many perspectives, for example, clusters, namespaces, working nodes, microservice types, and microservice instances. [008] Embodiments disclosed herein provide for modeling of different workload perspectives dependencies through computation neural network graphs by extracting compute features out of perspective attributes for each workload and the dependencies of each workload with other workloads in a neighborhood of that workload. An advantage of the embodiments disclosed herein is that they provide for one class profiling, multi-class profiling and multi- model segregation, where a class can be defined according to different criteria and abstractions levels. [009] Embodiments disclosed herein provide for anomaly detection depending on the model building strategy. In the one class classification approach, deviations are defined with respect to the unique classification class, then a major vote mechanism helps identify the anomalies. In the multi-class classifications approach, unclassified workloads are considered as anomalies. [0010] Embodiments disclosed herein provide for workloads visibility to support cloud native applications security posture management. Embodiments disclosed herein provide for workloads abstraction through different perspectives, which range in terms of granularity from, for example, cloud native clusters, worker machines including masters and slaves, namespaces and micro-services types. Embodiments disclosed herein provide for evaluation of attributes for workloads to identify interactions between entities in different workload perspectives. [0011] An advantage of the embodiments disclosed herein is that they enable a common generic modeling of workloads ground truth in cloud native applications to harden the security posture, for example, of telco and operation technologies deployments. Embodiments disclosed herein provide for the use of workloads ground truth modeling as a security baseline to identify deviations generated from anomalies including disturbances and attacks. Embodiments disclosed herein provide for anomalies and novelties detection support using a supervised classification model, which is trained on two modes, one model-based mode, and multi-model-based mode. Embodiments disclosed herein provide for anomalies and novelties detection support using an unsupervised clustering model, which characterizes workloads into clusters, which are used as ground truth to identify anomalies and novelties. [0012] According to a first aspect, a computer-implemented method of training graph based neural networks for profiling workloads in a cloud native environment is provided. The method includes collecting data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives. The method further includes building, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads. The method further includes building, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads. The method further includes training and validating the computation neural network graphs using the generated embeddings for the workloads. The method further includes generating, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. [0013] In some embodiments, the plurality of sources for collecting data includes one or more of an orchestrator, an operating system, and a network device. In some embodiments, the data collected includes one or more of system call statistics, resource usage, network communications between workloads, workload-related metadata, labels, container image properties, workload type data, microservice type data, virtual cluster, and assigned IP address. In some embodiments, the attribute information includes one or more of scheduling type information, networking type information, storage type information, CPU type information, memory type information, and meta-information type information. In some embodiments, the represented profiles for workloads are based on one or more of a service, a type of service, and a set of services. In some embodiments, the multiple perspectives for the workloads includes one or more of clusters, namespaces, working nodes, microservice types, and microservice instances. In some embodiments, the embeddings are indexed based on one or more perspectives, including the clusters, the namespaces, the working nodes, the microservice types, and the microservice instances. [0014] In some embodiments, building the dependency graphs includes using a set of vertices representing workloads indexed per each perspective and a set of dependencies between workloads. In some embodiments, the set of dependencies between workloads includes dependencies based on one or more of: network attributes, memory attributes, and input/output storage attributes. [0015] In some embodiments, building the computation neural network graphs includes sampling the dependency graphs to learn the attribute information for each workload and the dependencies of each workload with other workloads in a neighborhood of that workload. In some embodiments, building the computation neural network graphs further includes training a set of aggregation functions by aggregating information learned from the sampling. In some embodiments, the training includes the use of nodes’ neighborhood and aggregator functions to generate latent values through computational neural networks. The final latent values are nodes’ embeddings used either for classification or clustering. The classification (supervised learning) uses a loss function to measure accuracy of the model, adjusting it until the error has been sufficiently minimized. The clustering (unsupervised learning) uses a loss function to optimize quality of grouping items in clusters (i.e., distance), or generating workloads embeddings and using a complementary algorithm (distance based algorithm) to group them into clusters. In some embodiments, the training and validation includes one or more of: unsupervised or supervised loss function to evaluate the quality of workload embeddings, mapped to workload perspectives. In some embodiments, the aggregation functions include one or more of: a mean aggregator function, a pool aggregator function, and a long short-term memory (LSTM) aggregator function. In some embodiments, building the computation neural network graphs includes one or more of offline learning and online learning. In some embodiments, training and validating the computation neural network graphs includes one or more of profiling types of workloads, segregating types of workloads, and generating a one class model that represents all workloads as a baseline. [0016] In some embodiments, the method further includes comparing the reference model profiling the first set of workloads with a candidate graph generated for a second set of workloads. In some embodiments, the method further includes triggering an alert if an anomaly is detected. [0017] According to a second aspect, a network node is provided. The network node includes processing circuitry and a memory containing instructions executable by the processing circuitry to train graph based neural networks for profiling workloads in a cloud native environment. The network node is operative to collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives. The network node is further operative to build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads. The network node is further operative to build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating workloads embeddings for the workloads. The network node is further operative to train and validate the computation neural network graphs using the generated embeddings for the workloads. The network node is further operative to generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. [0018] According to a third aspect, a network node operable to train graph based neural networks for profiling workloads in a cloud native environment is provided. The network node includes a multi-perspective data collector module operative to collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives. The network node further includes a dependency graph builder module operative to build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads. The network node further includes a multi-perspective computation graph builder module operative to build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads. The network node further includes a training and validating module operative to train and validate the computation neural network graphs using the generated embeddings for the workloads. The network node further includes a reference model generation module operative to generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. [0019] According to a fourth aspect, a computer program is provided comprising instructions which when executed by processing circuitry causes the processing circuitry to perform the method of any one of the embodiments of the first aspect. [0020] According to a fifth aspect, a carrier is provided containing the computer program of the fourth aspect, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium. BRIEF DESCRIPTION OF THE DRAWINGS [0021] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments. [0022] FIG.1 illustrates a layered architecture including a workloads layer, intelligence layer, and security posture management layer according to an embodiment. [0023] FIG.2 illustrates a Kubernetes (K8S) architecture. [0024] FIG.3 illustrates multi-points of presence Kubernetes workload deployments of cloud native applications. [0025] FIG.4 is a block diagram illustrating a continuous integration and continuous delivery (CI/CD) pipeline. [0026] FIG.5 illustrates a learning function out of workloads inner-workings and interactions between workloads to build d-dimensional workloads dependency embeddings according to an embodiment. [0027] FIG.6 is a table listing exemplary workload attributes according to an embodiment. [0028] FIG.7 is a block diagram illustrating different workloads views and levels according to an embodiment. [0029] FIG.8 is a block diagram illustrating an architecture for data collection from multiple sources according to an embodiment. [0030] FIG.9 is a block diagram illustrating multiple Kubernetes workload deployments of cloud native applications and an architecture for a network node according to an embodiment. [0031] FIG.10 is a flow chart illustrating a process according to an embodiment. [0032] FIG.11 illustrates a workloads dependency graph according to an embodiment. [0033] FIG.12A illustrates a neighborhood sampling workloads dependency graph according to an embodiment. [0034] FIG.12B illustrates a neighborhood features aggregation workloads dependency graph according to an embodiment. [0035] FIG.12C illustrates a computation neural network graph according to an embodiment. [0036] FIG.13A illustrates a neighborhood sampling workloads dependency graph according to an embodiment. [0037] FIG.13B illustrates a computation neural network graph according to an embodiment. [0038] FIG.14A illustrates a collection of workloads dependency graphs for offline learning according to an embodiment. [0039] FIG.14B illustrates an accumulated workloads dependency graph for offline learning according to an embodiment. [0040] FIG.14C illustrates computation neural network graphs for offline learning according to an embodiment. [0041] FIG.15A illustrates a collection of workloads dependency graphs for online learning according to an embodiment. [0042] FIG.15B illustrates computation neural network graphs for online learning according to an embodiment. [0043] FIG.16A illustrates a collection of workloads dependency graphs for workloads types profiling according to an embodiment. [0044] FIG.16B illustrates computation neural network graphs for workloads types profiling according to an embodiment. [0045] FIG.17A illustrates a collection of workloads dependency graphs for workloads types segregation according to an embodiment. [0046] FIG.17B illustrates computation neural network graphs for workloads types segregation according to an embodiment. [0047] FIG.18A illustrates a collection of workloads dependency graphs for workloads as an anomaly detection unsupervised model according to an embodiment. [0048] FIG.18B illustrates computation neural network graphs for workloads as an anomaly detection unsupervised model according to an embodiment. [0049] FIG.18C illustrates neural networks embeddings as signatures for workloads as an anomaly detection unsupervised model according to an embodiment. [0050] FIG.19 is a flow chart illustrating a process according to an embodiment. [0051] FIG.20 is a flow chart illustrating a process according to an embodiment. [0052] FIG.21 is a flow chart illustrating a process according to an embodiment. [0053] FIG.22 is a flow chart illustrating a process according to an embodiment. [0054] FIG.23 is a block diagram of an apparatus according to an embodiment. [0055] FIG.24 is a block diagram of an apparatus according to an embodiment. DETAILED DESCRIPTION [0056] Architecture Overview [0057] FIG.1 illustrates a layered architecture 100 including a workloads layer 110, an intelligence layer 120, and a security posture management layer 130. Workloads layer 110 includes workloads dependency graphs, built using data that includes attribute information for multiple workload perspectives, representing interaction relationships between workloads. The intelligence layer 120 digests workloads inner-workings and interactions to an intelligence by building, using the dependency graphs, computation neural network graphs representing profiles for workloads. Building of the computation neural network graphs includes generating embeddings for the workloads indexed per a perspective. The computation neural network graphs are then trained and validated using the generated embeddings for the workloads. A reference model is then generated, using the trained and validated computation neural network graphs, profiling the workloads. This “intelligence” is referred to in FIG.1 as “Fingerprinting” and includes services, tenancy, and management to create the reference model profiling the workloads. [0058] The reference model profiling the workloads can be used as a security baseline to help to detect deviations, like anomalies and attacks. The anomalies detection referred to in the intelligence layer 120 may be performed by comparing the reference model profiling the workloads with a candidate graph generated for other workloads, and triggering an alert if an anomaly is detected. These detection events are ingested in the security posture management layer 130. If the events are identified, a mitigation through security orchestration can be applied. If they are not identified, a root cause analysis through security incident event management (SIEM) can be performed. [0059] Cloud Native Workloads [0060] Cloud native means building, packaging and running applications, also known as workloads, by taking advantage of cloud computing principles, i.e. on demand computing power and resource allocation, to deploy business models with greater agility, resiliency and portability. The definition of a workload can be established at different levels of granularity. For the sake of illustration of a fine-grained view, reference is made to the Kubernetes (K8S) architecture. [0061] Kubernetes Overview [0062] FIG.2 illustrates a Kubernetes (K8S) architecture 200. This architecture is one of the mostly widely used cloud native orchestration platforms. In this fine-grained view, the workload concept refers to a microservice instance. The cluster 210 consists of several worker machines (nodes) 220 which can be bare metal or virtualized. Each node can have a master role 222 or a worker role 224. Application workloads are typically scheduled on the worker nodes 224, while the master nodes 222 run certain central parts of Kubernetes responsible for computing resource allocation, orchestrating changes, etc. (scheduler, controller, API server). On all nodes 220 in the cluster 210 there are agents that are part of the Kubernetes platform. The Kubernetes platform relies on operating system (OS) isolation capabilities to enforce strict isolation of application workloads. These isolation capabilities on Linux, for example, include Linux OS namespaces (not the same concept as Kubernetes namespaces) and control groups. [0063] Workloads and Pods in Kubernetes [0064] Workloads are packaged as a multitude of container images which are in a special image format that contains the application binaries and necessary supporting software like libraries, etc. The resulting isolated run-time entities are called pods. In accordance with the current Kubernetes documentation, pods are the smallest deployable units of computing that can be created and managed in Kubernetes, and are a model of the pattern of multiple cooperating processes which form a cohesive unit of service. [0065] Every workload in Kubernetes is run in/as a pod. In a microservice application architecture, each application pod represents a microservice instance of a certain type. There can be a multitude of pods running the same container image, providing the same service. This is called horizontal scaling of the instances that are part of the same Kubernetes ReplicaSet/StatefulSet/DaemonSet. [0066] FIG.3 illustrates multi-points of presence Kubernetes workload deployments of cloud native applications. Each microservice instance is represented as a cluster 210. As shown in FIG.3, there are a plurality of N clusters 210, each representing a point of presence workload deployment of a cloud native application pod and microservice instance. [0067] Even parts of the Kubernetes platform (e.g., control plane, DNS, and networking support) run in special pods. Every pod in Kubernetes has an associated Kubernetes namespace (i.e., a virtual cluster). This makes it possible to group them together which is useful during the management of workloads (e.g., different namespaces contain different applications (VNFs), the infrastructure, etc.). “Namespace” as used herein refers to a Kubernetes namespace (or virtual cluster) unless qualified otherwise. [0068] Graph Neural Networks [0069] Vector embeddings in graphs referring to complex data structures like social or biological networks proved its usefulness for predictions and graph analysis. See, e.g., S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information; CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, Pages 891–900; A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks; KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, Pages 855–864; B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations; KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2014, Pages 701–710; J.Tang, M.Qu, M.Wang, M.Zhang, J.Yan, and Q.Mei. Line: Large scale information network embedding; WWW 2015, May 18–22, 2015, Florence, Italy; D. Wang, P. Cui, and W. Zhu. Structural deep network embedding; KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, Pages 1225–123. Initially, the main idea behind using embeddings was to reduce dimensionality of attributes into dense vectors representations. However, proposed approaches suffered from inductive node embedding problem due to a lack of graph generalization of unseen nodes. Therefore, there is a need to align sub-graphs structure and unseen nodes through an algorithm that already optimizes a predictive model. The algorithm is meant to recognize nodes’ neighborhood as well as catching inner properties of nodes of the same class or sharing characteristics. [0070] As a first step, graph convolution networks (GCNs) were applied on fixed graphs (see, e.g., T. N. Kipf and M. Welling. Semi-VXSHUYLVHG^FODVVL¿FDWLRQ^ZLWK^JUDSK^FRQYROXWLRQDO^ networks, ICLR 2017; T. N. Kipf and M. Welling. Variational graph auto-encoders. In NIPS Workshop on Bayesian Deep Learning, 2016) presenting an initial attempt for generalization. An extension of this work, namely GraphSAGE (Graph SAmple and aggreGatE) (see, e.g., Hamilton, W., Ying, Z., & Leskovec, J. Inductive representation learning on large graphs, Advances in neural information processing systems (pp.1024-1034), 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA)), was proposed to improve generalization by using trainable aggregation functions. The extension is a framework that leverages nodes’ features as well as the distribution of nodes’ features in the neighborhood. Instead of training a distinct embedding vector for each node, GraphSAGE trains a set of aggregator functions that learn to aggregate feature information from different number of hops, known also as search depth in the graph. Number of hops represent the neighborhood order, where aggregators are built and trained on. Another advantage is that GraphSAGE can train on data in both supervised and unsupervised manner. [0071] Given the fact that cloud native applications are an interaction between components (e.g., microservice instances) or composition of components (microservices types, namespaces, clusters, worker machines), these interactions can be modeled as a dependency graph, where vertices are workloads (e.g., microservice instances, types, clusters, worker machines). The dependency graphs can be used as ground truth to profile workloads and, as well, as identification of anomalies. [0072] GraphSAGE can be used, for example, to train on dependency graphs to profile workloads and identify anomalies in accordance with some of the embodiments. [0073] Continuous Integration and Continuous Delivery (CI/CD) [0074] FIG.4 is a block diagram illustrating a continuous integration and continuous delivery (CI/CD) pipeline 400. Some embodiments fit into Continuous Integration and Continuous Delivery (CI/CD) pipeline 400. Cloud native applications are characterized as workloads, which can be fingerprinted once on testing stage before being delivered. With reference to FIG.4, a possible, simplified CI/CD flow is illustrated for an application with a microservice architecture. Embodiments of the present disclosure can be applied on an application level 410, after assembling the set of necessary microservices in the application staging module 420. An application microservice module 412 and a generic microservice module 414 can function as inputs to the application staging module 420. The approach can be also applied during production. The application staging module 420 can function as an input to production module 430 and to monitoring and analytics module 440. In some cases, the models/fingerprints obtained in one environment can be transferred to and used in another environment. Furthermore, microservice-level verification can also benefit from using the technique. [0075] An advantage of the embodiments disclosed herein is that they provide the capability to build a learning function out of interactions between workloads to map workloads into reduced n-dimensional space for classifying workloads, including from many perspectives, for example, clusters, namespaces, working nodes, microservice types, and microservice instances. [0076] FIG.5 illustrates a learning function 500 out of workloads inner-workings and interactions between workloads to build d-dimensional workloads dependency embeddings 510 according to an embodiment. [0077] Workload Attributes [0078] A workload in cloud native environment can be overlaid through many angles like services, tenancy, management profiled through clusters, worker machines, namespaces and orchestrated micro-services. Despite considered different angles and perspectives, a set of attributes can be formed for workloads based on observation of the running system. The observation can in one embodiment be done on resource usage of different kinds, relations (communicating peers), workload meta-information from the orchestrator. [0079] FIG.6 is a table 600 listing exemplary workload attributes according to an embodiment. As indicated in the table 600, the attribute information may include one or more of: scheduling type information 610, networking type information 620, storage type information 630, CPU type information 640, memory type information 650, and meta-information type information 660. [0080] Intelligence [0081] By Intelligence, it is meant: (1) the ability to turn workloads seen from different perspectives into a profile, which, in some embodiments, is a security baseline, which can (2) then be used as a reference point to detect misbehaviors of cloud native applications. The profile and, in some embodiments, security baseline is defined by turning running workloads into digital fingerprints, which represent the baseline, which is expressed in three forms: workload as a profile (service), type of workload (type of service) as a profile and set of workloads (namespaces, worker machines, clusters) as a profile. Profiles can be gathered from different perspectives including clusters, worker machines including masters and slaves, namespaces and micro-services. Profiles are used as references to, for example, detect deviations, meant to be disturbances, misconfigurations or attacks, and commonly referred to be anomalies. [0082] Security Posture Management Enforcement [0083] Any viable security solution needs to bring forward comprehensive and accurate asset knowledge. In the case of cloud native applications, having a consistent visibility and control on workloads, regardless of the granularity of workload and associated perspective, is desirable. Visibility, in this context, refers to getting information beyond static attributes of workloads, like the name of a service, namespace, IP address, worker machines and clusters. The visibility needs to be shifted to gaining insights on inner workings of workloads. Embodiments of the present disclosure include digesting complex behavior of workloads through dependency graph embedding mechanism. This mechanism captures interactions between workloads to create the profiles which, in some embodiments, serve as a security baseline, therefore providing deeper visibility to the security posture management. Deviations from security baseline are considered as inputs to SIEMs or security orchestrator to corroborate root cause analysis of security problems and mitigation. [0084] Workloads Different Views/Levels [0085] In a complex system, usually there is a plethora of different kinds of pods working together to provide a certain high-level functionality. It can be useful to lift the workload concept to encompass different granularities in the system view. [0086] In Kubernetes terms, the usual structure of concepts is as follows (< denotes part of): Pod < ReplicaSet (Deployment)/StatefulSet/DaemonSet < Namespace < Cluster. In an application with a microservice-architecture, this can be roughly translated to: Microservice instance < Microservice type < Namespace (Application) < Cluster. In some cases, multiple applications might be co-located in the same system (solution level). In other cases, there might be different clusters that logically belong to the same application but are separated to provide geo-redundancy. [0087] The clusters naturally consist of worker nodes. The mapping between pods and worker nodes can be controlled (with e.g. labels, affinity rules), otherwise - in the Kubernetes case - the scheduling of pods on different worker nodes is undefined. If the mapping between pods and worker nodes is well-defined, it can be meaningful to treat a worker node also as a workload-view (i.e. a well-defined set of lower-level workloads). [0088] FIG.7 is a block diagram 700 illustrating different workloads views and levels according to an embodiment. With reference to FIG.7, the following views in a typical system containing one or more virtual network functions (VNFs) are identified as: Microservice instance 710, Microservice type (set of its instances) 720, Namespace (set of Microservices) or Application 730, Cluster (set of Namespaces) 740, and Worker machine, if applicable. [0089] Observability of Cloud Native Workloads [0090] FIG.8 is a block diagram illustrating an architecture 800 for data collection from multiple sources according to an embodiment. With reference to FIG.8, an example of a workload 810 run by a container orchestrator 820 (e.g., Kubernetes) without a virtualization layer is shown. Data is collected about the workload 810 from orchestrator 820, operating system 830, and intermediary network device 840. The collected data is transmitted to, for example, a database 850, where the collected data may be stored. [0091] Observability mechanisms in the operating system 830 provide means to collect information about the behavior of running processes, etc. that make up cloud native workloads 810. The collected data can, for example, include system call statistics and resource usage of different kinds. In systems with virtualization, data about running workloads may, for example, be collected from the hypervisor layer. [0092] Network communication between workloads can also be inspected with a multitude of tools. These can be based on operating system features or facilities in intermediary network devices 840 (virtualized or hardware-based). The collected data can, for example, include information on communicating peers (e.g., IP addresses, ports, etc.), protocols, flow summaries, fault statistics, etc. [0093] Workload-related metadata defined or maintained within the orchestrator 820 can be accessed via management interfaces provided by the orchestrator. In, for example, Kubernetes, such data is accessible via the “Kubernetes API server.” Collected data can be, for example, labels, container image properties, data denoting the type of the workload/microservice, Kubernetes namespace, assigned IP addresses, etc. Data may, for example, be collected in accordance with the architecture illustrated in FIG.8 and as described without requiring modifying or instrumenting the application/workload 810 code and, with the necessary permissions granted, the workloads 810 behavior can be passively monitored. [0094] System Architecture [0095] FIG.9 is a block diagram 900 illustrating multiple Kubernetes workload deployments of cloud native applications, clusters 210, and an architecture for a network node 910 according to an embodiment. In some embodiments, the network node architecture 910 includes a data collection and processing module 920, a multi-perspective dependency graph embeddings generation module 930, and workload embeddings 940 indexed per a perspective. In some embodiments, the network node architecture 910 may also include a prediction and anomaly detection module 960. [0096] In some embodiments, the data collection and processing module 920 includes multi-perspective data collectors 922 and dependency graph builders 924. Given different point of presences constituting a distributed cloud native application, deployed agents collect workload features as well as dependencies to shape and build dependency graphs. This constitutes a phase to gather a ground truth for training phase. The building of dependency graphs can be done on a certain view or perspective (e.g., microservice instances, types, clusters and worker machines). In addition, building of dependencies can be either a composition of sub graphs into a bigger one or snapshots of sub-graphs’ dependencies taken on discrete manner or periodically. [0097] In some embodiments, the multi-perspective dependency graph embeddings generation module 930 includes computation graph building 932 and graph embeddings calculation 934. Any dependency graph is used to build a neural network as a set of computation graphs that characterizes workloads interactions. The trained computation graphs can be stored as graph embeddings, which represent, for example, a workload type classifier, a workload type multi-classifiers committee model or a one-class workloads novelty and anomalies model. The embeddings can be indexed based on training workload perspective. In some embodiments, the multi-perspective dependency graph embeddings 940 include workloads embeddings indexed per clusters 942, workloads embeddings indexed per namespaces 944, workloads embeddings indexed per working nodes 946, workloads embeddings indexed per microservice types 948, and workloads embeddings indexed per microservice instances 950. [0098] FIG.10 illustrates a flow chart according to an embodiment. Process 1000 is a computer-implemented method of training graph based neural networks for profiling workloads in a cloud native environment. Process 1000 may begin with step s1002. [0099] Step s1002 comprises collecting data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives. [00100] Step s1004 comprises building, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads. [00101] Step s1006 comprises building, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads. [00102] Step s1008 comprises training and validating the computation neural network graphs using the generated embeddings for the workloads. [00103] Step s1010 comprises generating, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. [00104] Workloads Dependency Graphs [00105] FIG. 11 illustrates a workloads dependency graph 1100 according to an embodiment. Workloads dependency graph G =( V,ε) describes an interaction relationship between workloads. V is a set of vertices representing workloads indexed per a perspective (services, type of services, namespaces, worker machines or clusters). ε is a set of dependencies between workloads (undirected edges). Workloads vi and vj, are dependent if they interact with each other through system operations or network communications. With reference to FIG.11, the workloads dependency graph 1100 depicts the interaction relationship between workloads V₁1110 and V₂1120. [00106] In some embodiments, for example, a workload from a certain service needs to send TCP or UDP data to another workload in another service. A dependency between workloads can be built, where network attributes can be considered for each workload. In some embodiments, for example, a workload can produce a memory pipe, where another workload consumes data on, and memory attributes can be considered for each workload. In some embodiments, for example, a workload can create a file on a disk, where another workload writes on, and input/output file attributes can be considered. [00107] Workloads Dependency Graph Embeddings [00108] The solution uses GraphSAGE learning approach – see, e.g., Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in neural information processing systems (pp.1024-1034) -- to create profiles, such as, for example, a security baseline, for workloads. This learning approach is different from classical node embeddings approaches (i.e., matrix factorization), where workload attributes, such as, for example, the workload attributes illustrated in FIG.6, are leveraged to learn a generalization embedding function that predicts workloads classes (indexed per services, type of services, namespaces, worker machines, clusters). [00109] FIG.12A illustrates a neighborhood sampling workloads dependency graph 1210 according to an embodiment. By considering workload features in the learning algorithm, the topological structure of the dependencies that a workload has with its neighborhood workloads can be learned. The learning is two-fold: workloads attributes and structural dependencies of workloads with their neighborhoods. Each workload is characterized by an N-order neighborhood, which defines the depth of the structural dependencies for learning the workloads dependency graph embeddings. [00110] FIG.12B illustrates a neighborhood features aggregation workloads dependency graph 1220 according to an embodiment. At the opposite of training an embedding for each workload, in some embodiments, a set of aggregator functions to catch feature information from workload’s neighborhood are trained. Each function aggregates information from a search path away from a workload to build a compute graph. FIG.12C illustrates a computation neural network graph 1230 according to an embodiment. The training can be done, for example, in two ways: (1) unsupervised embeddings, or (2) supervised by mapping embeddings to existing classes (i.e., indexed per services, type of services, namespaces, worker machines, clusters). [00111] Learning [00112] Computation Graph [00113] In some aspects, the solution involves turning workloads view from different perspectives into a computation graph that generates digital fingerprints (final embeddings) for targeted workloads. The fingerprints represent a ground truth to profile workloads, which can be indexed by classes (services, type of services, namespaces, worker machines, clusters). The digital fingerprints (final embeddings) can, for example, be used to catch the security baseline and identify anomalies. [00114] FIG.13A illustrates a neighborhood sampling workloads dependency graph 1300 according to an embodiment. FIG.13B illustrates a computation neural network graph 1350 according to an embodiment. The neighborhood sampling workloads dependency graph 1300 and computation neural network graph 1350 illustrated in FIGs.13A and 13B depict an example of the generation of the fingerprint (final embedding) of a target workload A, which is dependent on first order neighbors (i.e., B, C, D) and second order neighbors (i.e., A, F, E). Once neighbors are collected, the interaction of workloads with respect to workload A is expressed through a neural network 1350 representing a computation graph, which has the same depth as the considered maximum order of the neighborhood (2 in the example) as depicted in FIG.13A. The inputs depicted in Layer 01360 to the first layer 11370 are the attributes vector (e.g., set of workload attributes, such as, for example, the workload attributes illustrated in FIG.6) of the highest order neighborhood (second order neighborhood as depicted in FIG.13A). The layers 1360, 1370, and 1380 represent a set of black boxes, known as aggregators (order invariant functions, e.g., Mean), used to compute transition fingerprints (i.e., workloads dependency graph embeddings), which can be recursively combined until reaching the output aggregation function to output a targeted fingerprint (i.e., final embedding), which, in this example, is a fingerprint (i.e., final embedding) for workload A. [00115] Workload Embedding Generation [00116] Under the assumption that aggregator functions have been trained out of information from workload neighborhood to train their parameters, a set of weight matrices ^k Bk (where k ∈{1,,K},Kbeing the chosenneighborhood depth) are used to propagate information between different layers of the compute graphs. Within an iteration of each compute graph, vertices representing workloads aggregate information from their neighbors. Given the case of a workloads dependency graph G=(V,ε) and features for all workloads Xv, a feature vector x_v represents the initial embedding of a workload v, namely, h_v ⁰. The embedding in a layer k, namely, h_v ^k represents the term that is learnt at a layer k within the computation graph for each workload v. The generation of embedding of a workload is done based on non-linear activation function σ (e.g., sigmoid), which transforms the representations to the next iteration of embedding generation. A stochastic gradient descent algorithm can be used to train weight matrices Wk and Bk. The first may include, for example, using an aggregation function to aggregate representations of workload neighborhood N(v),whereas the second term is the previous embedding generated for the workload v, namely, h_v ^k-1. An embedding of a workload is a training on aggregation of features collected from neighborhood’s (first term) and previous embedding of the workload itself (second term). [00117] The different notation of terms used in the workloads embeddings generation is shown in the following equation:

where: σ refers to non-linearity (e.g., Sigmoid, ReLu, Htan); W_k and B_k refer to trainable weight matrices (e.g., stochastic gradient to train the weight parameters);

_{refers to generalized} aggregation; ∀u ∈ N(v) refers to workload “v” neighborhood;

refers to previous neighborhood embeddings generated in previous layer for workload “v”; represents the initial embedding for workload “v”;

and

represents the embedding for workload “v” after K layers. [00118] The last embedding Z_v represents a vector representation of a workload, which represents an input to the last output prediction layer, which can be unsupervised learning through a graph-based loss function (using random walks) or a supervised binary classification or multi-class classification (Softmax) loss functions. [00119] Aggregation Functions [00120] In some embodiments, a mean aggregator function may be used. The mean aggregator function may, for example, include: taking an element wise mean of embedding vectors representing neighborhood of a workload; computing the first term of an embedding at a layer k within a compute graph by considering the sum of neighborhood embeddings generated at layer k-1; and dividing by the cardinality of vectors in the neighborhood. The following equation depicts this aggregation function:

where: u ∈ N(v) refers to workload “v” neighborhood; refers to previous neighborhood embeddings generated in

previous layer for workload “v”; N(v) refers to the neighborhood of vector v; and

_{refers to the cardinality of neighborhood set of} workload “v”. [00121] In some embodiments, a pool aggregator function may be used. The pool aggregator function may be, for example, symmetric (permutation invariant) and trainable since each embedding generated at layer k-1 is fed to a fully connected neural network (i.e., a non- linear activation function), which can be a multi-layer perceptron architecture, where element- wise max or mean operator can be applied on trained neighborhood embeddings. Regarding pooling (mean or max), given stacked neighborhood workloads embeddings, the formula down- samples them by using Element wise mean or max mean pooling, where neighborhood vectors are transformed with symmetric vector function (Q) by using element wise mean (meanpool) or max vector (maxpool) "y". The following equation depicts this aggregation function:

where: ∀u ∈ N(v) refers to workload “v” neighborhood; refers to previous neighborhood embeddings generated in

previous layer for workload “v”; Q refers to symmetric vector function; and Υ refers to element wise mean (meanpool) or max vector (maxpool). [00122] In some embodiments, a long short-term memory (LSTM) aggregator function may be used. The LSTM aggregator function, which is based on LSTM architecture – see, e.g., S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735– 1780, 1997 --, has the advantage of larger expressive capability compared to the mean aggregator function. LSTMs are not inherently symmetric (i.e., they are not permutation invariant), since they are based on processing inputs as sequences. In GraphSAGE, LSTMs operate on an unordered set by applying a random permutation of workload’s neighbors’ embeddings generated at layer k-1. LSTM is used on many shuffles (i.e., random permutation sampling from neighborhood π(N(v))) of neighborhood embeddings to create an embedding for a workload v in a certain layer in a computation graph. The following equation depicts this aggregation function:

where: ∀u ∈ π(N(v)) refers to workload “v” neighborhood random permutation sampling; refers to previous neighborhood embeddings generated in

previous layer for workload “v”; and LSTM refers to Long Short-Term Memory. [00123] Offline Learning [00124] FIG.14A illustrates a collection of workloads dependency graphs 1400 for offline learning according to an embodiment. In the offline learning mode, a collection of data is done periodically to collect dependency graphs 1400. After a period, these graphs 1400 are accumulated to build a workloads dependency graph 1440. FIG. 14B illustrates an accumulated workloads dependency graph 1440 for offline learning according to an embodiment. [00125] The accumulation of attributes of workloads can be enriched by considering statistical functions like, maximum, minimum, average and standard deviation. For the sake of illustration, we can consider per workload the maximum, minimum, average, standard deviation of the number of processes observed during the collection period. Once, the solution builds the dependencies, the computation neural network graphs 1480 are generated based on neighborhood embeddings to build predictive models. FIG.14C illustrates computation neural network graphs 1480 for offline learning according to an embodiment. [00126] Online Learning [00127] FIG.15A illustrates a collection of workloads dependency graphs 1500 for online learning according to an embodiment. In the online learning mode, workloads dependency graphs 1500 are collected as a set of snapshots interaction between dependencies based on time. By considering the time, the collection of snapshots (samples) can be done either on discrete manner or periods, where time-series of different attributes can be stacked on features’ data frames. [00128] FIG.15B illustrates computation neural network graphs 1550 for online learning according to an embodiment. Computation neural network graphs 1550 are generated to train weak models that evolve to stronger models. Training of enough captured snapshots of the system may continue until predictive models are satisfactory in terms of evaluation criteria (e.g., function loss convergence, micro and macro F1 scores). [00129] Modeling Strategies for Profiling Workloads [00130] In some embodiments, a modeling strategy that includes labeling workloads types for training and creating one model is used. This modeling strategy profiles workloads and outputs a probabilistic embedding (vector) having the same length as the number of profiled workloads types. FIG.16A illustrates a collection of workloads dependency graphs 1600 for workloads types profiling according to an embodiment. FIG.16B illustrates computation neural network graphs 1650 for workloads types profiling according to an embodiment. In the example depicted in FIGs.16A and 16B, three types of workloads are profiled with a computation graph encompassing three sub-computation graphs for targeted workloads (types 1, 2, 3). [00131] In some embodiments, a modeling strategy that includes generating a model per workload type to segregate it from the rest of workloads types is used. This modeling strategy is based on the creation models as many as the number of workloads types. To train each model, a compute graph is considered to model a workload as a positive class, and the rest of workloads as a negative class. The model implements a binary classification. The same process is repeated for each workload type to create a workload type target model. A committee of binary classification models is built. For each unseen workload, each model decides either that it is classified with its workload type characterization or that it is not. If all models output that a workload is not characterized, an anomaly can be raised. [00132] FIG.17A illustrates a collection of workloads dependency graphs 1700 for workloads types segregation according to an embodiment. FIG.17B illustrates computation neural network graphs 1750 for workloads types segregation according to an embodiment. In the example depicted in FIGs.17A and 17B, three types of workloads are profiled, and a committee of three models is generated. Each output (final embedding) of the workload is binary, meaning either the workload is classified or not by a model. [00133] In some embodiments, a modeling strategy that includes generating a one class model that represents all workloads as a baseline, then using the one class classifier as a detector of anomalies is used. This is done by considering final embeddings of different workloads as signatures that are gathered into a one class entropy classifier, which allows to characterize future workloads as novel or abnormal by considering final neural networks embeddings as signatures comparable to the ones learnt based on GraphSAGE unsupervised approach. Figure 18 illustrates this strategy [00134] FIG.18A illustrates a collection of workloads dependency graphs 1800 for workloads as an anomaly detection unsupervised model according to an embodiment. FIG.18B illustrates computation neural network graphs 1840 for workloads as an anomaly detection unsupervised model according to an embodiment. FIG.18C illustrates neural networks embeddings 1880 as signatures for workloads as an anomaly detection unsupervised model according to an embodiment. [00135] Supervised Training Model Training/Validation Metrics [00136] In some embodiments, a supervised learning approach is used. Training/Validation loss is the error after running the training/validation set of data through the trained model. As training epochs (episodes) increase both training and validation error drop and converges toward a certain slightly changing value. The ratio between training and validation loss should, preferably, be maintained in a reasonable range (0.8 to 1.0). It is an indicator to check model generalization to avoid overfitting or underfitting of data. [00137] F1 score is computed per workload type (class) considering precision and recall. Equation 1 is used to compute F1 score for workload type “xi”: ƒ1_score(x_i) = 2 × (precision_x_i × recall_x_i)/ (precision_x_i + recall_x_i) (1) [00138] Average macro F1 score is the arithmetic mean of F1 score for each class. Equation 2 is used to compute average macro F1 score considering “n” workload types:

[00139] Average micro F1 score is computed globally by counting the total true positive, false negative, and false positive samples per class. Weighted F1 scores per classes are used to calculate average micro F1 score. For each workload type, the weights are computed by dividing the number of workloads labeled with a type “xi” with the number of all workloads “|X|”. Equations 3-6 are used to compute average micro F1 score:

[00140] Unsupervised Training Model Metrics [00141] In some embodiments, an unsupervised learning approach is used. Final embeddings neural representation of workloads in dependency are used to build clusters of different workloads. The clusters allow to label workloads, the quality of clustering can be evaluated by cohesion and separation. Cohesion is computed within clusters (an intra-cluster metric), whereas separation is computed between clusters (an inter-cluster measure). The silhouette coefficient combines cohesion and separation in a single measure. It is based on computing the average distance (e.g., in Euclidian space) between elements (i.e., a(i)) within clusters as well as the average distance between elements located in different clusters (i.e., b(i)). Equation 7 depicts how silhouette coefficient is calculated: s(i) = b(i) - a(i)/ max (a(i), b(i)) (7) [00142] If silhouette coefficient converges positively towards “1”, it means that the clustering solution is good. If it converges towards “0”, it means that clustering solution suffers from some noisy data. If it converges towards “-1”, it is a bad quality clustering solution. [00143] Novelty and Anomaly Detection [00144] FIG.19 is a flow chart illustrating a process according to an embodiment. Process 1900 is a computer-implemented method of training graph based neural networks for profiling workloads in a cloud native environment that includes further steps from the process illustrated in the flow chart of FIG.10. Process 1900 may begin with step s1902. [00145] Step s1902, which may follow from step s1010 of FIG.10, comprises generating a candidate graph for a second set of workloads. [00146] Step s1904 comprises comparing the reference model profiling the first set of workloads (generated in step s1010 of FIG.10) with the candidate graph generated for the second set of workloads. [00147] Step s1906 comprises detecting, based on the comparison of the reference model profiling the first set of workloads with the candidate graph generated for the second set of workloads, if there is an anomaly. [00148] If an anomaly is detected – that is, if the candidate graph is anomalous --, then, in step s1906, an alert is triggered. If an anomaly is not detected – that is, if the candidate graph is not anomalous, then, in step s1908, the candidate graph can be tagged as benign or normal. Then, a new candidate graph can be considered by looping back to step s1902. [00149] Modeling Strategy for Detection of Novelties and Anomalies [00150] FIG.20, FIG.21, and FIG.22 are flow charts illustrating processes according to some embodiments, and depict different modeling strategies for detection of anomalies. Test Graph G, shown in step s2002 in FIG.20, step s2102 in FIG.21, and step s2202 in FIG.22, is a candidate graph which is tested against a Model, shown in step s2004 in FIG.20, step s2104 in FIG.21, and step s2204 in FIG.22. [00151] Referring to FIG.20, the Model shown in step s2004 is the set of computation graphs 1650 of FIG.16B generated to profile workload types. During the training phase (learning), a set of training graphs is maintained to build the model. Another set of graphs is used for validation, which checks the fitting of unseen data (validation graphs) on the model. The fitting is controlled by the cumulative loss values. A model is validated if the validation convergence of loss values and the estimation of average micro and macro F1 scores do not generate a big gap, meaning over/under-fitting data (loss values), or a drop in accuracy and recall (macro and micro F1 scores). When the training phase is consumed and the metrics are satisfactory, any candidate graph, for example Test Graph G shown in step s2002 can be tested against the Model shown in step s2004. In this case, once a new candidate (new sub-graphs or complex accumulated graphs) is tested against the reference model, the obtained metrics, namely, testing scores including loss value, average micro F1 score and average macro F1 score, as shown in step s2006, are compared to the training/validation scores, as shown in step s2010. The comparison is made and the gap between scores evaluated, as shown in step s2008. If there is a big drop (e.g., a drop from 90% range to 70% range with respect to f1 scores), or a big gap in terms of loss value, predictions are extracted, as shown in step s2012, and bad predictions indicative of an anomaly or novelty are flagged, as shown in step s2014. A security expert can then check the workloads types wrong predictions. If there is not a big drop or a big gap in terms of loss value, the Graph G is considered normal, as shown in step s2016. [00152] Referring to FIG.21, the Model shown in step s2104 is a committee of binary classification models (graph neural networks) of FIG.17B. With respect to the modeling strategy depicted in FIG.21, the prediction is based on a committee of binary classification models (graph neural networks), as shown in step s2104, where each model segregates a workload type from other workloads types, as depicted in FIG.17A and FIG.17B. The workloads in a candidate graph, such as Test Graph G, as shown in step s2102, are labeled by their type. For each workload, it is checked if its type is correctly predictable by segregating it from the rest of workloads. Step s2110 includes extracting labels for each workload present in the candidate graph, Test Graph G. Step s2112 includes mapping a workload “j” to a one-hot encoding of length “n”, where the latter is the total number of workload types. A bit is equal to 0 with respect to a workload type “i”, if the workload “j” is not labeled as workload type “i”. A bit is equal to 1 if the workload “j” is labeled as workload type “i”. Each workload in the test graph G, as shown in step s2102, can be tested against the committee of models, as shown in step s2104, where each model characterizes the segregation between corresponding workload type “i” and the rest of workload types “not i”. As shown in step s2106, the output of the committee is a mapping associating a workload “j” to binary predictions represented with a one-hot encoder. A bit is equal to 0 with respect to a workload type “i”, if the workload “j” is not predicted as workload type “i”. A bit is equal to 1 if the workload “j” is predicted as workload type “i”. The labeling one-hot encoder is compared with the prediction one-hot encoding by computing Hamming distance, as shown in step s2108, and outputting the distance, as shown in step s2114. In step s2116, a determination is made whether the Hamming distance equals 0 or 1. If the Hamming distance does not equal 0 – that is, it equals 1 – a bad prediction indicative of an anomaly is flagged, as shown in step s2118. A security expert can then investigate the workload. If the Hamming distance is equal to 0, the workload is well predicted, as shown in steps s2120. [00153] Referring to FIG.22, the Model shown in step s2204 is the neural networks embeddings 1880 as signatures for workloads of FIG.18C. With respect to modeling strategy depicted in FIG.22, new workloads candidates’ attributes and neighborhood random walking are transformed to final embedding vectors, which are checked against clustering ground truth built during unsupervised model training. If an embedding vector is not cluster-able (outlier), it is flagged as novelty or anomaly. [00154] In step s2202, a test graph G is an input for an unsupervised model, as shown in step s2204, which maps workloads to final embedding vectors, as depicted in FIG.18C. Step s2206 includes generating a final embedding vector for a workload “j” which corresponds to a numerical signature for a workload “j”. A test graph G, as shown in step s2202, corresponds to a set of final embeddings mapped to workloads, where “n” is the number of workloads. The model, as shown in step s2204, outputs a set of clusters, as shown in step s2212, where “x” is the number of clusters. Each final embedding vector corresponding to a workload “j”, as shown in step s2206, is checked whether it is cluster-able in a cluster “i” or not, as shown in steps s2208 and s2210. If the vector is not cluster-able, the workload “j” is flagged as novelty or anomaly, as shown in step s2214. A security expert can then check it. If the vector is cluster-able, it is normal. [00155] In some embodiments, the plurality of sources for collecting data includes one or more of an orchestrator, an operating system, and a network device. In some embodiments, the data collected includes one or more of system call statistics, resource usage, network communications between workloads, workload-related metadata, labels, container image properties, workload type data, microservice type data, virtual cluster, and assigned IP address. In some embodiments, the attribute information includes one or more of scheduling type information, networking type information, storage type information, CPU type information, memory type information, and meta-information type information. In some embodiments, the represented profiles for workloads are based on one or more of a service, a type of service, and a set of services. In some embodiments, the multiple perspectives for the workloads includes one or more of clusters, namespaces, working nodes, microservice types, and microservice instances. In some embodiments, the workloads embeddings are indexed based on one or more perspectives, including the clusters, the namespaces, the working nodes, the microservice types, and the microservice instances. [00156] In some embodiments, building the dependency graphs includes using a set of vertices representing workloads indexed per each perspective and a set of dependencies between workloads. In some embodiments, the set of dependencies between workloads includes dependencies based on one or more of: network attributes, memory attributes, and input/output storage attributes. [00157] In some embodiments, building the computation neural network graphs includes sampling the dependency graphs to learn the attribute information for each workload and the dependencies of each workload with other workloads in a neighborhood of that workload. In some embodiments, building the computation neural network graphs further includes training a set of aggregation functions by aggregating information learned from the sampling. In some embodiments, the training and validation includes one or more of: unsupervised or supervised loss function to evaluate the quality of workload embeddings, mapped to workload perspectives. In some embodiments, the aggregation functions include one or more of: a mean aggregator function, a pool aggregator function, and a long short-term memory (LSTM) aggregator function. In some embodiments, building the computation neural network graphs includes one or more of offline learning and online learning. In some embodiments, training and validating the computation neural network graphs includes one or more of profiling types of workloads, segregating types of workloads, and generating a one class model that represents all workloads as a baseline. [00158] In some embodiments, the method further includes comparing the reference model profiling the first set of workloads with a candidate graph generated for a second set of workloads. In some embodiments, the method further includes triggering an alert if an anomaly is detected. [00159] FIG.23 is a block diagram of an apparatus 2300 (e.g., a network node), according to some embodiments. As shown in FIG.23, the apparatus may comprise: processing circuitry (PC) 2302, which may include one or more processors (P) 2304 (e.g., a general purpose microprocessor and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like); a network interface 2306 comprising a transmitter (Tx) 2308 and a receiver (Rx) 2310 for enabling the apparatus to transmit data to and receive data from other computing devices connected to a network 2312 (e.g., an Internet Protocol (IP) network) to which network interface 2306 is connected; and a local storage unit (a.k.a., “data storage system”) 2314, which may include one or more non- volatile storage devices and/or one or more volatile storage devices. In embodiments where PC 2302 includes a programmable processor, a computer program product (CPP) 2316 may be provided. CPP 2316 includes a computer readable medium (CRM) 2318 storing a computer program (CP) 2320 comprising computer readable instructions (CRI) 2322. CRM 2318 may be a non-transitory computer readable medium, such as, magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like. In some embodiments, the CRI 2322 of CP 2320 is configured such that when executed by PC 2302, the CRI 2322 causes the apparatus 2300 to perform steps described herein (e.g., steps described herein with reference to the flow charts). In other embodiments, the apparatus may be configured to perform steps described herein without the need for code. That is, for example, PC 2302 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software. [00160] According to some embodiments, apparatus 2300 may be a network node comprising processing circuitry and a memory containing instructions executable by the processing circuitry to train graph based neural networks for profiling workloads in a cloud native environment. The network node may be operative to: collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; train and validate the computation neural network graphs using the generated embeddings for the workloads; and generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. The network node may be further operative to: generate a candidate graph for a second set of workloads; compare the reference model profiling the first set of workloads with the candidate graph generated for the second set of workloads; and trigger an alert if an anomaly is detected. [00161] FIG.24 is a schematic block diagram of the apparatus 2300 according to some other embodiments. The apparatus 2300 includes one or more modules 2400, each of which is implemented in software. The module(s) 2400 provide the functionality of apparatus 2300 described herein (e.g., steps described herein with reference to the flow charts). [00162] According to some embodiments, apparatus 2300 may be a network node operable to train graph based neural networks for profiling workloads in a cloud native environment, and the modules 2400 providing the functionality of apparatus 2300 may include: a multi-perspective data collector module operative to collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; a dependency graph builder module operative to build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; a multi-perspective computation graph builder module operative to build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; a training and validating module operative to train and validate the computation neural network graphs using the generated embeddings for the workloads; and a reference model generation module operative to generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads. The modules 2400 providing the functionality of apparatus 2300 may further include: a prediction and anomaly detection module operative to compare the reference model profiling the first set of workloads with a candidate graph generated for a second set of workloads and to trigger an alert if an anomaly is predicted or detected. [00163] While various embodiments of the present disclosure are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above- described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context. [00164] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.

Claims

CLAIMS: 1. A computer-implemented method of training graph based neural networks for profiling workloads in a cloud native environment, the method comprising: collecting data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; building, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; building, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; training and validating the computation neural network graphs using the generated embeddings for the workloads; and generating, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads.

2. The method of claim 1, wherein the plurality of sources for collecting data includes one or more of: an orchestrator, an operating system, a network device.

3. The method of any one of claims 1-2, wherein the data collected includes one or more of: system call statistics, resource usage, network communications between workloads, workload-related metadata, labels, container image properties, workload type data, microservice type data, virtual cluster, and assigned IP address.

4. The method of any one of claims 1-3, wherein the attribute information includes one or more of: scheduling type information, networking type information, storage type information, CPU type information, memory type information, and meta-information type information.

5. The method of any one of claims 1-4, wherein the represented profiles for workloads are based on one or more of: a service, a type of service, and a set of services.

6. The method of any one of claims 1-5, wherein multiple perspectives for the workloads includes one or more of: clusters, namespaces, working nodes, microservice types, and microservice instances.

7. The method of any one of claims 1-6, wherein building the dependency graphs includes using a set of vertices representing workloads indexed per each perspective and a set of dependencies between workloads.

8. The method of any one of claims 1-6, wherein building the dependency graphs is according to G=(V,ε) where V a set of vertices representing workloads indexed per each perspective and ε is a set of dependencies between workloads.

9. The method of any one of claims 7-8, wherein the set of dependencies between workloads includes dependencies based on one or more of: network attributes, memory attributes, and input/output storage attributes.

10. The method of any one of claims 1-9, wherein the embeddings are indexed based on one or more of: the clusters, the namespaces, the working nodes, the microservice types, and the microservice instances.

11. The method of any one of claims 1-10, wherein building the computation neural network graphs includes sampling the dependency graphs to learn the attribute information for each workload and the dependencies of each workload with other workloads in a neighborhood of that workload.

12. The method of claim 11, wherein building the computation neural network graphs further includes training a set of aggregation functions by aggregating information learned from the sampling.

13. The method of claim 12, wherein the training and validation includes one or more of: unsupervised or supervised loss function to evaluate the quality of workload embeddings, mapped to workload perspectives.

14. The method of claim 12, wherein the aggregation functions include one or more of: a mean aggregator function, a pool aggregator function, and a long short-term memory (LSTM) aggregator function.

15. The method of claim 12, wherein the embeddings are generated according to:

where: σ refers to non-linearity; W_k and B_k refer to trainable weight matrices;

_{refers to generalized} aggregation; ∀u ∈ N(v) refers to workload “v” neighborhood; refers to previous neighborhood embeddings generated in

previous layer for workload “v”; represents the initial embedding for workload “v”;

and represents the embedding for workload “v” after K

layers.

16. The method of claim 14, wherein the mean aggregator function is carried out according to:

where: u ∈ N(v r)efers to workload “v” neighborhood; refers to previous neighborhood embeddings generated in

previous layer for workload “v”; N(v) refers to the neighborhood of vector v; and refers to the cardinality of neighborhood set of

workload “v”.

17. The method of claim 14, wherein the pool aggregator function is carried out according to:

previous layer for workload “v”; Q refers to symmetric vector function; and Υ rerers to element wise mean (meanpool) or max vector (maxpool).

18. The method of claim 14, wherein the long short-term memory (LSTM) aggregator function is carried out according to:

previous layer for workload “v”; and LSTM refers to Long Short-Term Memory.

19. The method of any one of claims 1-18, wherein building the computation neural network graphs includes one or more of: offline learning and online learning.

20. The method of any one of claims 1-19, wherein training and validating the computation neural network graphs includes one or more of: profiling types of workloads, segregating types of workloads, and generating a one class model that represents all workloads as a baseline.

21. The method of any of claims 1-20, further comprising: generating a candidate graph for a second set of workloads; comparing the reference model profiling the first set of workloads with the candidate graph generated for the second set of workloads; and triggering an alert if an anomaly is detected.

22. A network node comprising: processing circuitry; and a memory containing instructions executable by the processing circuitry to train graph based neural networks for profiling workloads in a cloud native environment, the network node operative to: collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; train and validate the computation neural network graphs using the generated embeddings for the workloads; and generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads.

23. The network node of claim 22, wherein the plurality of sources for collecting data includes one or more of: an orchestrator, an operating system, a network device.

24. The network node of any one of claims 22-23, wherein the data collected includes one or more of: system call statistics, resource usage, network communications between workloads, workload-related metadata, labels, container image properties, workload type data, microservice type data, virtual cluster, and assigned IP address.

25. The network node of any one of claims 22-24, wherein the attribute information includes one or more of: scheduling type information, networking type information, storage type information, CPU type information, memory type information, and meta-information type information.

26. The network node of any one of claims 22-25, wherein the represented profiles for workloads are based on one or more of: a service, a type of service, and a set of services.

27. The network node of any one of claims 22-26, wherein multiple perspectives for the workloads includes one or more of: clusters, namespaces, working nodes, microservice types, and microservice instances.

28. The network node of any one of claims 22-27, wherein building the dependency graphs includes using a set of vertices representing workloads indexed per each perspective and a set of dependencies between workloads.

29. The network node of any one of claims 22-27, wherein building the dependency graphs is according to G=(V,ε) , where V a set of vertices representing workloads indexed per each perspective and ε is a set of dependencies between workloads.

30. The network node of any one of claims 28-29, wherein the set of dependencies between workloads includes dependencies based on one or more of: network attributes, memory attributes, and input/output storage attributes.

31. The network node of any one of claims 22-30, wherein the workloads embeddings are indexed based on one or more of: the clusters, the namespaces, the working nodes, the microservice types, and the microservice instances.

32. The network node of any one of claims 22-31, wherein building the computation neural network graphs includes sampling the dependency graphs to learn the attribute information for each workload and the dependencies of each workload with other workloads in a neighborhood of that workload.

33. The network node of claim 32, wherein building the computation neural network graphs further includes training a set of aggregation functions by aggregating information learned from the sampling.

34. The network node of claim 33, wherein the training and validation includes one or more of: unsupervised or supervised loss function to evaluate the quality of workload embeddings, mapped to workload perspectives.

35. The network node of claim 33, wherein the aggregation functions include one or more of: a mean aggregator function, a pool aggregator function, and a long short-term memory (LSTM) aggregator function.

36. The network node of claim 33, wherein the embeddings are generated according to:

previous layer for workload “v”;

represents the initial embedding for workload “v”; and represents the embedding for workload “v” after K

layers.

37. The network node of claim 35, wherein the mean aggregator function is carried out according to:

_{refers to the cardinality of neighborhood set of} workload “v”.

38. The network node of claim 35, wherein the pool aggregator function is carried out according to:

previous layer for workload “v”; Q refers to symmetric vector function; and Υ refers to element wise mean (meanpool) or max vector (maxpool).

39. The network node of claim 35, wherein the long short-term memory (LSTM) aggregator function is carried out according to:

previous layer for workload “v”; and LSTM refers to Long Short-Term Memory.

40. The network node of any one of claims 22-39, wherein building the computation neural network graphs includes one or more of: offline learning and online learning.

41. The network node of any one of claims 22-40, wherein training and validating the computation neural network graphs includes one or more of: profiling types of workloads, segregating types of workloads, and generating a one class model that represents all workloads as a baseline.

42. The network node of any of claims 22-41, wherein the network node is further operative to: generate a candidate graph for a second set of workloads; compare the reference model profiling the first set of workloads with the candidate graph generated for the second set of workloads; and trigger an alert if an anomaly is detected.

43. A network node operable to train graph based neural networks for profiling workloads in a cloud native environment, the network node comprising: a multi-perspective data collector module operative to collect data for a first set of workloads running in the cloud native environment from a plurality of sources, wherein the data collected includes attribute information for multiple workload perspectives; a dependency graph builder module operative to build, using the attribute information from the data collected, dependency graphs representing interaction relationships between workloads; a multi-perspective computation graph builder module operative to build, using the dependency graphs, computation neural network graphs representing profiles for workloads, wherein building the computation neural network graphs includes generating embeddings for the workloads; a training and validating module operative to train and validate the computation neural network graphs using the generated embeddings for the workloads; and a reference model generation module operative to generate, using the trained and validated computation neural network graphs, a reference model profiling the first set of workloads.

44. The network node of claim 43, further comprising: a prediction and anomaly detection module operative to compare the reference model profiling the first set of workloads with a candidate graph generated for a second set of workloads and to trigger an alert if an anomaly is predicted or detected.

45. A computer program comprising instructions which when executed by processing circuitry causes the processing circuitry to perform the method of any one of claims 1-21.

46. A carrier containing the computer program of claim 45, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium.