NL2029790B1

NL2029790B1 - Key management for crypto processors attached to other processing units

Info

Publication number: NL2029790B1
Application number: NL2029790A
Authority: NL
Inventors: Kounavis Michael; M Smith Ned; Wang Junyuan; Guo Kaijie
Original assignee: Intel Corp
Priority date: 2020-12-24
Filing date: 2021-11-17
Publication date: 2023-06-16
Also published as: NL2029790A; TW202232354A; WO2022133860A1

Abstract

Technology for protecting tenant workloads and tenant data within a multi-tenant workload environment may include a workload processor allocable into tenant slices, a cryptographic engine, memory storing a key table, and a key provisioner to provision a separate tenant key for each workload, each tenant key stored in the key table with an associated unique key handle and a resource identifier for a separate one of the plurality of tenant slices assigned to the respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

Description

KEY MANAGEMENT FOR CRYPTO PROCESSORS ATTACHED TO OTHER PROCESSING UNITS

TECHNICAL FIELD

Embodiments generally relate to technology for cloud or edge computing systems. More particularly, embodiments relate to managing keys used in protecting client workloads and client data across processing units.

BACKGROUND

Cloud or edge computing systems may include a platform having a plurality of processors of various types, including one or more of a central processing unit (CPU), a graphics processing unit (GPU), an intelligence (or artificial intelligence) processing unit (IPU), a network processing unit (NPU), etc. (processors may generically be referred to as xPUs). A processor such as a GPU may include one or more compute engines for handling complex or parallel processing tasks. In such computing environments, a processor such as a GPU must integrate seamlessly with other xPUs that share in processing a tenant workload. Additionally, tenant data may need to be protected end-to-end and across the respective xPUs handling the data.

BRIEF DESCRIPTION OF THE DRAWINGS

The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:

FIG. 1 provides a diagram of an example key management system according to one or more embodiments;

FIG. 2 provides a diagram of an example key management system according to one or more embodiments;

FIG. 3 provides a diagram of another example key management system according to one or more embodiments;

FIG. 4 provides a diagram illustrating operation of an example key management system according to one or more embodiments;

FIG. 5 provides a flowchart illustrating an example method of operating a key management system according to one or more embodiments;

FIG. 6 provides a diagram illustrating an example key management computing system according to one or more embodiments;

FIG. 7 is a block diagram illustrating an example semiconductor apparatus for key management according to one or more embodiments;

FIG. 8 is a block diagram illustrating an example processor according to one or more embodiments; and

FIG. 9 is a block diagram illustrating an example of a multi-processor based computing system according to one or more embodiments.

DESCRIPTION OF EMBODIMENTS

An improved computing system as described herein provides for protecting tenant workloads and tenant data within a multi-tenant workload environment. In the improved computing system of this disclosure, a key management system provisions and securely stores tenant-specific keys for cryptographic operations, protecting tenant workload content during workload execution that involves scheduling tasks across the various xPUs. The key management system ensures that each tenant's workload and data remain protected within the tenant's assigned tenant slice. A key table is maintained that holds, per workload, resource partitioning context associated with a tenant key and a key handle unique to the tenant, to permit secure exchange of data between tenants and workload resources. Compute engines and other resources can use the key handle to point a cryptographic engine to the key table for unlocking and securing tenant data, without themselves holding the key. The key management system coordinates the provisioning and handling of keys to maintain security.

FIG. 1 provides a diagram of an example key management system 100 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description. The system 100 may include a cloud or edge computing platform 105, which provides a multi-tenant processing environment. The platform 105 may include one or more processing units such as a graphics processing unit (GPU) 110 and/or an xPU 120, a cryptographic processing unit (CrPU) 130, a key provisioner (KP) 140, and a resource director (RD) 150. The GPU 110 and the xPU 120 may be collectively referred to as a workload processor.

The GPU 110 may include a key provisioning module (GKP) 115 to interface with the key provisioner 140. Likewise, the xPU 120 may include an xPU-specific key provisioning module (xKP) 125 to interface with the key provisioner 140. The cryptographic processing unit 130 may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more selected cryptographic algorithms and techniques. The key provisioner 140 may provision keys and manage a key table 145 that holds keys used for securing (encrypting) and decrypting tenant workloads and data. The key table 145 may be isolated from the GPUs/xPUs, effectively blocking the GPUs/xPUs from accessing any keys stored in it. The key table 145 may also be hardened against malicious attack by, for example, restricting access to read-only or write-only and/or restricting its interfaces to only the CrPU 130 and the KP 140. Additionally, the key table 145 may be stored in unprotected storage after its contents are encrypted using a storage key generated by a hardware root of trust. The key provisioner 140 may provision keys in concert with the cryptographic processing unit 130 (e.g., the cryptographic processing unit 130 may generate the keys upon request by the key provisioner 140). A memory for storing data entries in the key table 145 may include any suitable memory for secure data storage and may be hosted by any portion of the platform 105. The resource director 150 may receive workload requests from tenants (e.g., via orchestrator 170) and determine per-tenant compute resources such as, e.g., memory partitions, compute engine assignment and/or compute thread assignment. Compute resources may be assigned by the resource director 150 in concert with one or more of the GPU 110 and/or the xPU 120. In some embodiments, the key provisioner 140 may be incorporated within other system components, such as the resource director 150. In some embodiments, the resource director 150 may maintain and/or verify the integrity of the key table 145.
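Sealing the key table 145 before it is written to unprotected storage could be sketched as follows. This is a minimal sketch under stated assumptions. `STORAGE_KEY` is a hypothetical stand-in for the storage key generated by the hardware root of trust, and the table is serialized as JSON. An encrypt-then-MAC construction built on an HMAC-SHA256 keystream stands in for a vetted AEAD such as AES-GCM, which the Python standard library does not provide.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical stand-in for a storage key generated by a hardware root of trust.
STORAGE_KEY = secrets.token_bytes(32)


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the single storage key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode used as a PRF keystream (illustration only).
    blocks = []
    for counter in range((length + 31) // 32):
        blocks.append(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def seal_key_table(table: dict, storage_key: bytes) -> bytes:
    """Encrypt-then-MAC the serialized key table so it can sit in unprotected storage."""
    plaintext = json.dumps(table, sort_keys=True).encode()
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(storage_key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(storage_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal_key_table(blob: bytes, storage_key: bytes) -> dict:
    """Check the MAC before decrypting; reject any tampered blob."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(storage_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("key table integrity check failed")
    stream = _keystream(_subkey(storage_key, b"enc"), nonce, len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))
```

Keys would be stored as hex strings, because JSON cannot hold raw bytes. Checking the tag before decrypting means a modified blob is rejected outright rather than yielding altered key entries.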

The platform 105 may communicate with one or more tenants 160_1 … 160_N through an orchestrator 170, which may be situated in or close to a cloud/edge 180. The orchestrator 170 may provide a network-based system or service that provides an interface between the tenants 160 and the platform 105. The orchestrator 170 may schedule tenant workloads, and may provide for decomposition of workloads into distributed applications / assignments to be performed by one or more of the GPU 110 and/or the xPU 120. In some embodiments, one or more of the GPU 110, the xPU 120, the CrPU 130, the KP 140, the key table 145, the RD 150, and/or the orchestrator 170 may include a root of trust with attestation keys to enable verification of system integrity. In some embodiments, cloud/edge 180 may include at least portions of a public network such as the Internet.

In some embodiments, cloud/edge 180 may be a private or other network. In some embodiments, functionality of the key provisioner 140 may be implemented based on Intel® Key Protection Technology (and extensions thereto). In some embodiments, functionality of the resource director 150 may be implemented based on Intel® Resource Director Technology (and extensions thereto).

FIG. 2 provides a diagram of an example key management system 200 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description. The system 200 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed), and those components and features will not be repeated except as necessary to describe the additional components and features shown. The system 200 may include, in addition to the components and features shown in and described with reference to FIG. 1, one or more processing units such as a graphics processing unit (GPU) 210 and/or an xPU 220, a cryptographic processing unit (CrPU) 230, and a key provisioner (KP) 240. The GPU 210, the xPU 220, the CrPU 230 and the KP 240 may correspond to the GPU 110, the xPU 120, the CrPU 130 and the KP 140, respectively (FIG. 1, already discussed). The GPU 210 and the xPU 220 may be collectively referred to as a workload processor.

The GPU 210 may include a plurality of compute engines (CE), such as a compute engine E1 211, a compute engine E2 212, and a compute engine E3 213. Each of the compute engines may, e.g., handle processing tasks particularly suited for parallel processing. Although FIG. 2 shows three compute engines for illustrative purposes, the GPU 210 may have many more than three compute engines. Each compute engine may be selectively allocated and assigned to perform all or portions of tenant workloads. The GPU 210 may also include a resource manager (GRM) 214 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed. The GPU 210 may also include a key provisioning module (GKP) 215 to interface with the key provisioner 240. The GKP 215 may correspond to the GKP 115 (FIG. 1, already discussed).

The xPU 220 may include a plurality of processing cores, such as a core C1 221, a core C2 222, and a core C3 223. Although FIG. 2 shows three processing cores for illustrative purposes, the xPU 220 may have many more than three processing cores. Each processing core may be selectively allocated and assigned to perform all or portions of tenant workloads. The xPU 220 may also include a resource manager (xRM) 224 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed. The xPU 220 may also include a key provisioning module (xKP) 225 to interface with the key provisioner 240. The xKP 225 may correspond to xKP 125 (FIG. 1, already discussed).

In response to a tenant workload request and allocation of resources, the GPU 210 (via GRM 214) may assign resources and associate the resources with a process address space identifier (PASID) 219 (as a resource identifier) for a specific tenant workload. Each PASID is to identify a memory partition and other resources, including resource isolation context / boundary, of the particular GPU/xPU assigned to a specific tenant workload. The PASID 219 may be provided to the key provisioner 240. Likewise, in response to a tenant workload request and allocation of resources, the xPU 220 (via xRM 224) may assign resources and associate the resources with a PASID. If the tenant workload is the same as the workload for the GPU 210, the xPU may associate the resources with PASID 219 or a separate PASID 229 (as a resource identifier). The PASID 229 may be provided to the key provisioner 240. Additionally, tenant context information for a particular tenant workload may be provided to the key provisioner 240 by, e.g., the resource director 150. For a given tenant workload, the resources assigned to the workload for the tenant may be referred to as a tenant slice.
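The per-slice resource assignment described above can be modeled as follows. This is a hypothetical sketch, not the GRM/xRM interface. It assumes each resource manager issues PASIDs from a simple counter and records which compute engines or cores belong to which tenant slice, so that the PASID can later be reported to the key provisioner.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TenantSlice:
    """Resources that one processor assigns to one tenant workload."""
    workload_id: str
    pasid: int
    resources: list


@dataclass
class ResourceManager:
    """Hypothetical model of a per-processor resource manager (e.g., GRM 214 or xRM 224)."""
    name: str
    free: list  # compute engines or cores not yet assigned to any slice
    slices: dict = field(default_factory=dict)  # PASID -> TenantSlice
    _pasids: count = field(default_factory=lambda: count(1), init=False)

    def assign(self, workload_id: str, n: int) -> TenantSlice:
        """Carve out a tenant slice of n resources and tag it with a fresh PASID."""
        if n > len(self.free):
            raise RuntimeError(f"{self.name}: not enough free resources")
        taken, self.free = self.free[:n], self.free[n:]
        tenant_slice = TenantSlice(workload_id, next(self._pasids), taken)
        self.slices[tenant_slice.pasid] = tenant_slice
        return tenant_slice

    def release(self, pasid: int) -> None:
        """Return a slice's resources once its workload assignment completes."""
        self.free.extend(self.slices.pop(pasid).resources)
```

For example, `ResourceManager("GPU 210", ["E1", "E2", "E3"]).assign("WL_T", 1)` yields a slice holding `E1` under PASID 1. That PASID is the value the key provisioner would later bind to a tenant key and slice key handle.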

For example, as shown in FIG. 2, tenant slice resources assigned for a tenant workload for tenant T may include the compute engine E3 213, labeled as 255, and the core C1 221, labeled as 259. These assignments are shown for illustrative purposes, and other resource allocations and assignments may be made for other tenant workloads; in this manner, a plurality of tenant workloads may be executed at the same time or during overlapping time periods.

The cryptographic processing unit 230 may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more cryptographic algorithms and techniques. The cryptographic processing unit 230 may include one or more cryptographic engines (not shown in FIG. 2) to perform selected cryptographic operations. A cryptographic accelerator may be used to implement all or portions of the cryptographic processing unit 230. The key provisioner 240 may provision keys and manage a key table 245 that holds keys used for securing and decrypting tenant workloads/data. The key provisioner 240 may provision keys in concert with the cryptographic processing unit 230 (e.g., the cryptographic processing unit 230 may generate the keys upon request by the key provisioner 240).

A memory for storing data entries in the key table 245 may include any suitable memory for secure data storage and may be hosted by any portion of the system 200. The key table 245 may include, for each tenant workload, a PASID issued by each processor having resources assigned to the workload, a tenant key K_T for protecting the tenant workload, and a slice key handle (SKH) H_T for the tenant workload. Each slice key handle is unique to the associated tenant key. As an example, for the tenant workload slices 255 and 259 shown in FIG. 2, the key table may include the following entries: the tenant key K_T; a PASID_T issued by GPU 210 for the assigned compute engine E3 213; a PASID_T issued by xPU 220 for the assigned core C1 221; and a slice key handle (SKH) H_T associated with the key K_T and each of the PASID_T entries. In some embodiments, the tenant key K_T may include multiple keys. Further details regarding keys and key generation are described with reference to FIG. 4 herein.

An example of an arrangement of the key table 245 may be illustrated by the following table, where the xA, xB etc. labels refer to respective GPU/xPUs that may have assigned resources associated with a PASID:

TABLE 1: Tenant Key Table

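Tenant WL | Tenant Key | Handle | xA PASID    | xB PASID    | … | xN PASID
T1        | K_T1       | H_T1   | xA PASID_T1 | xB PASID_T1 | … | xN PASID_T1
T2        | K_T2       | H_T2   | xA PASID_T2 | xB PASID_T2 | … | xN PASID_T2
T3        | K_T3       | H_T3   | xA PASID_T3 | xB PASID_T3 | … | xN PASID_T3
…         | …          | …      | …           | …           | … | …
Tn        | K_Tn       | H_Tn   | xA PASID_Tn | xB PASID_Tn | … | xN PASID_Tn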

In some embodiments, there may be a single PASID covering all GPU/xPU resource assignments for the particular tenant slice and, thus, the key table 245 may be presented in a simpler format:

Tenant WL | Tenant Key | Handle | PASID
T1        | K_T1       | H_T1   | PASID_T1
…         | …          | …      | …
Tn        | K_Tn       | H_Tn   | PASID_Tn

The entries in the key table 245 may be arranged in any order, and in some embodiments fewer (or more) entries may be contained in the key table. In the example illustrated in FIG. 2, the key table 245 includes the following entries: for a first tenant workload, {K_T1, GPU PASID_T1, xPU PASID_T1, and H_T1}; for a second tenant workload, {K_T2, xPU PASID_T2, and H_T2}; and for a third tenant workload, {K_T3, GPU PASID_T3, and H_T3}.
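The sketch below shows a key table of this shape, in which each slice key handle maps to exactly one tenant key and to the PASIDs of its tenant slice. The class and method names are hypothetical. The random 64-bit handle is an assumed encoding; the description only requires that each handle be unique to its tenant key.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KeyTableEntry:
    tenant_key: bytes        # K_T: never returned to GPUs/xPUs in the described design
    pasids: Dict[str, int]   # processor -> PASID issued for this tenant slice


class KeyTable:
    """Hypothetical key table indexed by slice key handle H_T."""

    def __init__(self) -> None:
        self._entries: Dict[int, KeyTableEntry] = {}

    def add(self, tenant_key: bytes, pasids: Dict[str, int]) -> int:
        """Store K_T with its PASIDs and return a handle unique to that key."""
        handle = secrets.randbits(64)
        while handle in self._entries:  # redraw on the (unlikely) collision
            handle = secrets.randbits(64)
        self._entries[handle] = KeyTableEntry(tenant_key, dict(pasids))
        return handle

    def lookup(self, handle: int, pasid: Optional[int] = None) -> bytes:
        """Resolve H_T to K_T; intended to be callable only by the crypto engine."""
        entry = self._entries.get(handle)
        if entry is None:
            raise KeyError("unknown slice key handle")
        # Optional check: the caller's PASID must belong to this tenant slice.
        if pasid is not None and pasid not in entry.pasids.values():
            raise PermissionError("PASID is not bound to this handle")
        return entry.tenant_key

    def remove(self, handle: int) -> None:
        """Drop the entry when the workload completes."""
        del self._entries[handle]


table = KeyTable()
k_t1 = secrets.token_bytes(32)
h_t1 = table.add(k_t1, {"GPU 210": 219, "xPU 220": 229})  # PASIDs 219/229 as in FIG. 2
```

Covering several PASIDs with one entry corresponds to the row layout of Table 1. The single-PASID layout would simply store a one-element mapping.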

In some embodiments, one or more tenants may each have multiple keys. In some embodiments, one or more tenants may each have at least one seed used to generate keys. Seeds for generating keys for tenants may also be stored in the key table 245, and a seed may be supplied by the key table to the cryptographic processing unit 230 (or to a cryptographic engine) to generate a key that is then stored in the key table and made available for other cryptographic operations.
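Deriving tenant keys from a seed held in the key table could look like the following sketch. The description does not name a derivation function. HKDF with SHA-256 (RFC 5869) is swapped in here as one standard choice, and the `info` labels are hypothetical.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(seed: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    prk = hmac.new(salt or bytes(32), seed, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


seed_t1 = secrets.token_bytes(32)  # hypothetical seed stored in the key table for tenant T1
k_t1_0 = hkdf_sha256(seed_t1, b"tenant T1 / key 0")
k_t1_1 = hkdf_sha256(seed_t1, b"tenant T1 / key 1")
```

Distinct `info` labels give a tenant several independent keys from one seed. The derivation is also deterministic, so a key can be regenerated on demand instead of being stored.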

The key table 245 may correspond to the key table 145 (FIG. 1, already discussed). In operation, the key provisioner 240 may share, for a particular workload, the slice key handle H_T for the workload with the processor(s) performing the workload. The slice key handle H_T may be shared with the respective key provisioning module (e.g., GKP 215 and/or xKP 225 as shown in FIG. 2) of the assigned processor(s). In turn, the respective key provisioning module may share the slice key handle H_T with the assigned slice (e.g., the compute engine 213 for tenant slice 255, or the core 221 for tenant slice 259, as shown in FIG. 2). When executing the tenant workload, to decrypt any portion of the tenant workload/data, the respective tenant slice (e.g., the compute engine 213 or the core 221, as shown in FIG. 2) may send the encrypted data (ciphertext) along with the slice key handle H_T to the cryptographic processing unit 230. The cryptographic processing unit 230 may use the slice key handle H_T to retrieve the appropriate key (K_T) from the key table, use that key to decrypt the ciphertext, and return the decrypted data (cleartext) to the requesting slice. At the completion of the workload (or of the portion of the workload completed by the respective tenant slice), the respective tenant slice may send workload results (cleartext) along with the slice key handle H_T to the cryptographic processing unit 230, which may encrypt the workload results using the appropriate key retrieved from the key table and return encrypted results (ciphertext) to the requesting slice. In some embodiments, when requesting decryption or encryption, the requesting slice may also send the respective PASID along with the key and slice key handle to the cryptographic processing unit 230.
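The handle-based decrypt/encrypt exchange between a tenant slice and the cryptographic processing unit 230 can be sketched as follows. The sketch makes three assumptions. First, the key provisioner and key table are folded into a hypothetical `CryptoProcessor` class for brevity. Second, the PASID check is always enforced, whereas the description makes it optional. Third, an HMAC-SHA256 keystream stands in for a real cipher. That keystream is unauthenticated and serves only to show the data flow.

```python
import hashlib
import hmac
import secrets


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Illustrative PRF-based stream cipher: XOR data with HMAC-SHA256(key, nonce || block index).
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hmac.new(key, nonce + (i // 32).to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(d ^ p for d, p in zip(data[i:i + 32], pad))
    return bytes(out)


class CryptoProcessor:
    """Hypothetical CrPU model: tenant slices present a handle, never a key."""

    def __init__(self) -> None:
        self._key_table = {}  # H_T -> (K_T, PASIDs of the tenant slice)

    def provision(self, pasids: set) -> int:
        """Stand-in for the key provisioner: create K_T and return only H_T."""
        handle = secrets.randbits(64)
        while handle in self._key_table:
            handle = secrets.randbits(64)
        self._key_table[handle] = (secrets.token_bytes(32), frozenset(pasids))
        return handle

    def _key_for(self, handle: int, pasid: int) -> bytes:
        key, pasids = self._key_table[handle]  # KeyError for an unknown handle
        if pasid not in pasids:
            raise PermissionError("handle presented from outside its tenant slice")
        return key

    def encrypt(self, handle: int, pasid: int, cleartext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh nonce per message, prepended to the ciphertext
        return nonce + _xor_keystream(self._key_for(handle, pasid), nonce, cleartext)

    def decrypt(self, handle: int, pasid: int, ciphertext: bytes) -> bytes:
        nonce, body = ciphertext[:16], ciphertext[16:]
        return _xor_keystream(self._key_for(handle, pasid), nonce, body)


crpu = CryptoProcessor()
h_t = crpu.provision({219, 229})  # GPU and xPU PASIDs of one tenant slice
ct = crpu.encrypt(h_t, 219, b"workload results")
```

The slice only ever handles `h_t` and ciphertext. K_T stays inside `_key_table`, mirroring the isolation of the key table from the GPUs/xPUs.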

FIG. 3 provides a diagram of another example key management system 300 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description. The system 300 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed) and/or system 200 (FIG. 2, already discussed), and those components and features will not be repeated except as necessary to describe the additional components and features shown. The system 300 may include, in addition to the components and features shown in and described with reference to FIG. 1, a workload processing unit which may be a graphics processing unit (GPU) 310, and a key provisioner (KP) 340. In some embodiments, the workload processing unit 310 may be another xPU.

The GPU 310 may include a plurality of compute engines (CE), such as a compute engine E1 311, a compute engine E2 312, and a compute engine E3 313. Each of the compute engines may, e.g., handle processing tasks particularly suited for parallel processing. Although FIG. 3 shows three compute engines for illustrative purposes, the GPU 310 may have many more than three compute engines. Each compute engine may be selectively allocated and assigned to perform tenant workloads.

The GPU 310 may also include a resource manager (GRM) 314 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed. The GRM 314 may correspond to the GRM 214 (FIG. 2, already discussed). The GPU 310 may also include a key provisioning module (GKP) 315 to interface with the key provisioner 340. The GKP 315 may correspond to the GKP 115 (FIG. 1, already discussed) and/or to the GKP 215 (FIG. 2, already discussed).

The GPU 310 may also include a plurality of cryptographic engines (CrE), such as a cryptographic engine CrE1 316, a cryptographic engine CrE2 317, and a cryptographic engine CrE3 318. Each of the cryptographic engines may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more cryptographic algorithms and techniques. Although FIG. 3 shows three cryptographic engines for illustrative purposes, the GPU 310 may have many more than three cryptographic engines. Each cryptographic engine may be selectively allocated and assigned to perform tenant workloads. A cryptographic accelerator may be used to implement all or portions of the cryptographic engines CrE1 316, CrE2 317, and CrE3 318. In some embodiments, the cryptographic engines CrE1 316, CrE2 317, and CrE3 318 may each be a part of a separate respective tenant slice.

In response to a tenant workload request and allocation of resources, the GPU 310 (via GRM 314) may assign resources and associate the resources with a PASID 319. Tenant context information for a particular tenant workload, which may include the PASID(s) issued for the workload, may be provided to the key provisioner 340 by, e.g., the resource director 150. For a given tenant workload, the resources assigned to the workload for the tenant may be referred to as a tenant slice.

For example, as shown in FIG. 3, resources assigned for a tenant workload for tenant T1 may include the compute engine E3 313 and the cryptographic engine CrE3 318, collectively labeled as tenant slice 355. These assignments are shown for illustrative purposes, and other resource allocations and assignments may be made for other tenant workloads; in this manner, a plurality of tenant workloads may be executed at the same time or during overlapping time periods.

The key provisioner 340 may provision keys and manage a key table 345 that holds keys used for securing and decrypting tenant workloads/data. The key provisioner 340 may provision keys in concert with a cryptographic engine (e.g., the cryptographic engine may generate the keys upon request by the key provisioner 340), where the cryptographic engine may be one of CrE1 316, CrE2 317, or CrE3 318, or another cryptographic engine (not shown). In some embodiments, the key provisioner 340 may incorporate a cryptographic engine for key generation. The KP 340 may correspond to the KP 140 (FIG. 1, already discussed) and/or to the KP 240 (FIG. 2, already discussed).

A memory for storing data entries in the key table 345 may include any suitable memory for secure data storage and may be hosted by any portion of the system 300. The key table 345 may include, for each tenant workload, a PASID assigned by the GPU 310, a tenant key K_T for protecting the tenant workload, and a slice key handle (SKH) H_T for the tenant workload. Each slice key handle is unique to the associated tenant key. As an example, for the tenant workload slice 355 shown in FIG. 3, the key table may include the following entries: the tenant key K_T1; a PASID_T1 for the assigned compute engine E3 313 and cryptographic engine CrE3 318; and a slice key handle (SKH) H_T1 associated with the key K_T1 and the PASID_T1 entry. Further details regarding keys and key generation are described with reference to FIG. 4 herein. In the example illustrated in FIG. 3, the key table 345 includes the following entries: for a first tenant workload, {K_T1, PASID_T1, and H_T1}; for a second tenant workload, {K_T2, PASID_T2, and H_T2}; and for a third tenant workload, {K_T3, PASID_T3, and H_T3}.

The key table 345 may correspond to the key table 145 (FIG. 1, already discussed) and/or the key table 245 (FIG. 2, already discussed). In operation, the key provisioner 340 may share, for a particular workload, the slice key handle H_T for the workload with the processor performing the workload. For example, the slice key handle H_T may be shared with the key provisioning module GKP 315. In turn, the key provisioning module GKP 315 may share the slice key handle H_T with the assigned slice (e.g., the compute engine 313 for tenant slice 355, as shown in FIG. 3). When executing the tenant workload, to decrypt any portion of the tenant workload/data, the tenant slice (e.g., the compute engine 313 as shown in FIG. 3) may send the encrypted data (ciphertext) along with the slice key handle H_T to the assigned cryptographic engine (e.g., the CrE3 318 as shown in FIG. 3). The assigned cryptographic engine may use the slice key handle H_T to retrieve the appropriate key from the key table, use that key to decrypt the ciphertext, and return the decrypted data (cleartext) to the requesting compute engine. At the completion of the workload, the respective tenant slice may send workload results (cleartext) along with the slice key handle H_T to the cryptographic engine, which may encrypt the workload results using the appropriate key retrieved from the key table and return encrypted results (ciphertext) to the requesting slice. In some embodiments, when requesting decryption or encryption, the requesting slice may also send the respective PASID along with the key and slice key handle to the assigned cryptographic engine.

In some embodiments, resource assignment may include use of memory address partitions, such that a handle value may be combined (e.g., “swizzling”) with the memory partition assigned to the cryptographic engine slice. An attempt by a second cryptographic engine to use a handle assigned to a first cryptographic engine can then be detected as malicious or erroneous use by the second cryptographic engine. The key table 345 may have access logic (or firmware) that implements an algorithm to check the handle origination (e.g., “de-swizzling”) to enforce slice isolation semantics.
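The swizzling check can be illustrated with a keyed XOR mask. The description does not specify the combining function. As a hypothetical choice, the handle here is XORed with a 64-bit mask derived by HMAC-SHA256 from the memory partition. The access logic then de-swizzles using the requester's own partition and checks that the recovered handle was issued to that partition. `TABLE_SECRET` and the function names are assumptions.

```python
import hashlib
import hmac
import secrets

TABLE_SECRET = secrets.token_bytes(32)  # hypothetical secret held by the key-table access logic


def _mask(partition: int) -> int:
    """64-bit mask derived from a memory partition ID."""
    digest = hmac.new(TABLE_SECRET, partition.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def swizzle(handle: int, partition: int) -> int:
    """Combine a raw handle with the partition of the slice it is issued to."""
    return handle ^ _mask(partition)


def deswizzle_and_check(swizzled: int, partition: int, owners: dict) -> int:
    """Undo the swizzle with the requester's partition and enforce slice isolation."""
    handle = swizzled ^ _mask(partition)
    if owners.get(handle) != partition:
        raise PermissionError("handle was not issued to this slice's partition")
    return handle


owners = {0x1234ABCD: 3}         # raw handle -> partition it was issued to
issued = swizzle(0x1234ABCD, 3)  # value handed to the crypto-engine slice in partition 3
```

When an engine in a different partition presents `issued`, de-swizzling with its own mask recovers an unrelated value. The lookup therefore fails instead of resolving to the first engine's key.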

FIG. 4 provides a diagram illustrating operation of an example key management system 400 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description. The system 400 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed), system 200 (FIG. 2, already discussed), and/or system 300 (FIG. 3, already discussed), and those components and features will not be repeated except as necessary to describe the additional components and features shown. The system 400 may include one or more processors (e.g., GPU or xPU) 410, one or more cryptographic engines 415, a key provisioner (KP) 420, and a resource director (RD) 150, which may be part of a platform 405. The system 400 may communicate with one or more tenants 160_1 … 160_N (e.g., tenant 160_T) through an orchestrator 170, which may be situated in or close to a cloud/edge 180 (not shown in FIG. 4). The GPU/xPU 410 may correspond to the GPU 110 (FIG. 1, already discussed), the xPU 120 (FIG. 1, already discussed), the GPU 210 (FIG. 2, already discussed), the xPU 220 (FIG. 2, already discussed), and/or the GPU 310 (FIG. 3, already discussed). The cryptographic engine 415 may correspond to the cryptographic processing unit 130 (FIG. 1, already discussed), the cryptographic processing unit 230 (FIG. 2, already discussed), and/or one or more of the cryptographic engines 316, 317, and/or 318 (FIG. 3, already discussed).

The platform 405 may correspond to platform 105 (FIG. 1, already discussed).

The key provisioner 420 may provision keys in concert with the cryptographic engine 415 (e.g., the cryptographic engine 415 may generate the keys upon request by the key provisioner 420). In some embodiments, the key provisioner 420 may incorporate a cryptographic engine for key generation. A memory for storing data entries in the key table 425 may include any suitable memory for secure data storage and may be hosted by any portion of the system 400. The KP 420 may correspond to the KP 140 (FIG. 1, already discussed), the KP 240 (FIG. 2, already discussed), and/or the KP 340 (FIG. 3, already discussed). The key table 425 may correspond to the key table 145 (FIG. 1, already discussed), the key table 245 (FIG. 2, already discussed), and/or the key table 345 (FIG. 3, already discussed).

Operation of system 400 may be illustrated through a sequence of events (process 440) as shown in the example in FIG. 4. At label 442, a tenant 160_T may submit a request to the orchestrator 170 to carry out a workload WL. In response, at label 444 the orchestrator 170 may schedule the workload WL for execution by the platform 405 and send a workload request for WL to the resource director 150. At label 446, the resource director 150 may determine which resource allocations are required and send a request to the GPU/xPU 410 for resource assignments to meet the required resource allocations. The request may be accompanied by a PASID for the workload. Resources may be allocated and assigned from among one or more of the following: compute engines, processing cores, memory, input/output devices (e.g., sensors), accelerators (e.g., artificial intelligence accelerators), cryptographic engines (or cryptographic processors), applications, etc. At label 448, the GPU/xPU 410 may assign the resources (i.e., the tenant slice) required for workload WL, and may send a confirmation of the resource assignment for the workload slice, with the associated PASID, to the resource director 150. At label 450, the resource director 150 may provide tenant context information to the key provisioner 420. This information may include the PASID for the tenant slice for the workload WL and other information about the tenant/workload.

At label 452, after receiving the PASID for the workload slice, the key provisioner 420 may provision a tenant key K_T for the workload WL. The tenant key K_T may be generated by the key provisioner 420 in concert with the cryptographic engine 415. Any suitable cryptographic algorithm(s) may be used for generating the tenant key K_T and for using the tenant key to encrypt or decrypt data. For example, the tenant key K_T may be an asymmetric key pair generated based on a public/private key algorithm such as, e.g., the RSA algorithm. The key provisioner 420 may then store, in the key table 425, the generated tenant key K_T, the PASID for the workload WL, and a slice key handle H_T associated with the tenant key K_T and the PASID. These may be stored in a single entry, as related entries, or as a database record, etc. In some embodiments, only the private key of the K_T pair is stored in the key table 425.

At label 454, the key provisioner 420 may send the tenant key K_T to the orchestrator 170. In embodiments where the tenant key K_T is a public/private key pair, only the public key of the K_T pair is sent to the orchestrator 170. At label 456, the key provisioner 420 may send the slice key handle (SKH) H_T along with the associated PASID for the workload to the GPU/xPU 410 for use in subsequent processing of the tenant workload by the assigned tenant slice.
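Label 452, label 454 and the hand-off of H_T to the GPU/xPU can be sketched as follows. This is a hypothetical model: `KeyProvisioner` and its method names are illustrative. K_T is generated as textbook RSA with 256-bit moduli and Miller-Rabin primality testing, and no padding is applied. The keys are therefore far too small, and too weak, for real use; a deployment would use 2048-bit-or-larger keys with OAEP padding.

```python
import math
import secrets


def _is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)  # random witness in [2, n-2]
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def _random_prime(bits: int) -> int:
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # full-size and odd
        if _is_probable_prime(candidate):
            return candidate


def generate_keypair(bits: int = 256):
    """Textbook RSA key pair: returns ((n, e), (n, d)). Toy size, no padding."""
    e = 65537
    while True:
        p, q = _random_prime(bits // 2), _random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            return (p * q, e), (p * q, pow(e, -1, phi))


class KeyProvisioner:
    """Hypothetical key provisioner: keeps the private K_T, releases public K_T and H_T."""

    def __init__(self) -> None:
        self.key_table = {}  # H_T -> {"private": (n, d), "pasid": PASID}

    def provision(self, pasid: int):
        public, private = generate_keypair()  # label 452
        handle = secrets.randbits(64)
        while handle in self.key_table:
            handle = secrets.randbits(64)
        self.key_table[handle] = {"private": private, "pasid": pasid}
        to_orchestrator = public      # label 454: only the public part leaves the provisioner
        to_gpu_xpu = (handle, pasid)  # H_T and PASID for the assigned tenant slice
        return to_orchestrator, to_gpu_xpu


kp = KeyProvisioner()
public_k_t, (h_t, pasid) = kp.provision(319)  # PASID as reported by the resource director
```

Only the `(n, e)` pair goes to the orchestrator and only `(h_t, pasid)` goes to the GPU/xPU. The private exponent never leaves `key_table`.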

At label 458, the sent K_T (e.g., a public key) may be used to wrap (i.e., encrypt) a data key DK_T (e.g., a symmetric key) specific to the tenant, where the data key DK_T is used to encrypt the tenant workload/data. The result is a key-wrapped key KWK_T that may be sent to the platform 405 in a manner that secures the data key DK_T and allows authorized access only for use in decrypting the tenant workload/data during performance of the workload. The symmetric data key DK_T may be generated, and the workload/data encrypted, by the tenant 160_T and/or the orchestrator 170 (e.g., working in concert). The encryption of the workload/data may be performed in advance or at any time before sending the encrypted data to the platform 405. In some embodiments (not shown in FIG. 4), if the tenant and the orchestrator 170 are unable to generate the data key, the key provisioner 420 may, upon request, provision a data key DK_T (e.g., generate a symmetric key) and send the DK_T to the orchestrator 170 for use in encrypting the tenant workload/data. In some embodiments, upon provisioning a data key, the key provisioner 420 may store the data key DK_T in the key table 425.

At label 460, the orchestrator 170 may send the encrypted workload/data WL_T, along with the key-wrapped data key KWK_T, to the GPU/xPU 410 for execution via the assigned tenant slice. The assigned tenant slice in GPU/xPU 410 may then commence processing the workload.

In executing the workload WL_T, the assigned tenant slice of the GPU/xPU 410 may need to decrypt all or part of the data in the workload, which was previously encrypted using the data key DK_T (discussed herein with reference to label 458). At label 462, the assigned tenant slice may provide (via the GPU/xPU 410) the ciphertext (i.e., the encrypted data to be decrypted), the KWK_T, and the slice key handle H_T to the cryptographic engine 415. In some embodiments, the PASID may also be provided to the cryptographic engine 415.

At label 464, using the slice key handle H_T, the cryptographic engine 415 may access the key table 425 and retrieve the respective tenant key K_T, the K_T having previously been used to generate the key-wrapped key KWK_T. Where K_T is a public/private key pair, the key retrieved from the key table is the private key of the pair (the public key of the pair having been sent to the orchestrator 170 and used to generate the KWK_T). At label 466, the cryptographic engine 415 may then use the retrieved K_T to unwrap (i.e., decrypt) the KWK_T and obtain the data key DK_T for the tenant. Where the retrieved K_T is the private key of the public/private pair, the private key successfully recovers the data key DK_T by unwrapping (decrypting) the key-wrapped key KWK_T, because the KWK_T was generated by wrapping the DK_T with the public key of the K_T pair.
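On the engine side, the handle-based unwrap might look like the sketch below. The key table is reduced to a plain dict from a hypothetical handle value to the private half of a toy RSA pair (the same fixed, insecure Mersenne-prime parameters used only for illustration); `engine_unwrap` is a hypothetical name. The point it shows is that the slice supplies only the handle, never the key.

```python
import secrets

# Toy, insecure RSA parameters standing in for a real K_T pair.
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))

# Hypothetical key table: slice key handle H_T -> private half of K_T.
key_table = {0xA5A5: (N, D)}


def engine_unwrap(handle, kwk, key_len=16):
    """Crypto engine: resolve H_T to the private key and unwrap KWK_T into DK_T."""
    n, d = key_table[handle]  # KeyError if the handle was never provisioned
    return pow(kwk, d, n).to_bytes(key_len, "big")


# Orchestrator side (label 458): wrap a fresh DK_T with the public half (N, E).
dk = secrets.token_bytes(16)
kwk = pow(int.from_bytes(dk, "big"), E, N)
recovered = engine_unwrap(0xA5A5, kwk)  # engine side (labels 464-466)
```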

The tenant key K_T and the data key DK_T may have different lifetimes/lifecycles. In some embodiments, the tenant key K_T may be a symmetric key generated with a symmetric key algorithm. A symmetric K_T performs the same function as an asymmetric K_T; for example, a symmetric K_T may be used to wrap the data key DK_T. Although there is a potential risk in the orchestrator 170 having access to a symmetric K_T (e.g., the orchestrator could masquerade as the tenant/tenant compute engine), the risk may be minimal because of other trust mechanisms that may be in place. For example, the orchestrator 170 may be a highly trusted entity in the system, with precautions in place such as attestation and lifecycle management that ensure trust and/or monitor for misbehavior.
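With a symmetric K_T, the same key both wraps and unwraps DK_T, which is exactly why an orchestrator holding it could act for the tenant. The sketch below assumes a stdlib-only toy authenticated cipher (HMAC-SHA256 counter-mode keystream plus an HMAC tag, encrypt-then-MAC) in place of the AES key wrap or AES-GCM a real engine would more likely use; `seal` and `unseal` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    # Derive independent encryption and MAC subkeys from one key.
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    # XOR data with an HMAC-SHA256 counter-mode keystream.
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC; output is nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    # Verify the tag before decrypting; raise on tampering or a wrong key.
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


tenant_key = secrets.token_bytes(32)  # symmetric K_T, also known to the orchestrator
data_key = secrets.token_bytes(16)    # DK_T
kwk = seal(tenant_key, data_key)      # orchestrator wraps DK_T
recovered = unseal(tenant_key, kwk)   # engine unwraps with K_T from the key table
```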

After the data key DK_T is unwrapped, the DK_T may be used to decrypt the ciphertext (i.e., the encrypted tenant workload/data) into cleartext. At label 468, the cryptographic engine 415 may send the cleartext to the GPU/xPU 410. In some embodiments, the PASID may also be sent with the cleartext. The data key DK_T is not sent to the GPU/xPU 410. In some embodiments, the data key DK_T and/or the key-wrapped key KWK_T may be stored in the key table 425 with the associated slice key handle H_T.
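A minimal engine model for this step, assuming a symmetric K_T for brevity and the same stdlib-only toy cipher (a stand-in for AES-GCM), shows the key property: the engine unwraps DK_T internally, optionally caches it and KWK_T under H_T, and returns only cleartext. `CryptoEngine` and the dict layout of the key table are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


class CryptoEngine:
    """Hypothetical model of cryptographic engine 415."""

    def __init__(self, key_table):
        # key_table: handle -> dict with "tenant_key" and, once cached, "data_key"/"kwk"
        self._table = key_table

    def decrypt(self, handle, kwk, ciphertext):
        entry = self._table[handle]
        data_key = entry.get("data_key")
        if data_key is None:
            data_key = unseal(entry["tenant_key"], kwk)  # unwrap KWK_T -> DK_T
            entry["data_key"] = data_key                 # optional caching under H_T
            entry["kwk"] = kwk
        return unseal(data_key, ciphertext)              # only cleartext leaves the engine


tenant_key = secrets.token_bytes(32)
data_key = secrets.token_bytes(16)
engine = CryptoEngine({7: {"tenant_key": tenant_key}})
cleartext = engine.decrypt(7, seal(tenant_key, data_key), seal(data_key, b"tenant workload"))
```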

The tenant slice may then continue executing the tenant workload. Once the tenant slice has completed the workload, the workload results may need to be reported back to the orchestrator 170 and/or the tenant 160T. At label 470, the GPU/xPU 410 may provide the results (e.g., data) as cleartext to the cryptographic engine 415, along with the slice key handle H_T (and, in some embodiments, the PASID), to encrypt the results.

Using the slice key handle H_T, the cryptographic engine 415 may, at label 472, access the key table 425 and retrieve whichever of the data key DK_T or the key-wrapped key KWK_T is stored there. If it retrieves the KWK_T, the cryptographic engine 415 may also retrieve the tenant key K_T in order to obtain the data key DK_T from the KWK_T (as previously described). In either case, the cryptographic engine 415 may use the DK_T to encrypt the cleartext (results) into ciphertext.
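The two retrieval paths at label 472 can be sketched as one function: use DK_T directly if it was cached, otherwise recover it from the cached KWK_T with K_T. The symmetric K_T, the dict-based key table, and the stdlib-only toy cipher (standing in for AES-GCM) are assumptions; `encrypt_results` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


def encrypt_results(key_table, handle, cleartext):
    """Hypothetical engine step: encrypt results with DK_T resolved via H_T."""
    entry = key_table[handle]
    data_key = entry.get("data_key")
    if data_key is None:
        # Only KWK_T was cached: recover DK_T with the tenant key K_T first.
        data_key = unseal(entry["tenant_key"], entry["kwk"])
    return seal(data_key, cleartext)


tenant_key = secrets.token_bytes(32)
data_key = secrets.token_bytes(16)
table = {
    1: {"tenant_key": tenant_key, "data_key": data_key},               # DK_T cached
    2: {"tenant_key": tenant_key, "kwk": seal(tenant_key, data_key)},  # only KWK_T cached
}
```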

In an alternative performed in some embodiments (not shown in FIG. 4), the data key DK_T is not used to encrypt the workload results. Instead, the key provisioner 420 may (in concert with the cryptographic engine 415) provision a new symmetric data key ND_T and request from the orchestrator 170 a new public key (part of a public/private key pair NK_T), which may be provided by the tenant 160T. The new symmetric data key ND_T may be stored in the key table 425 with the associated slice key handle H_T. Upon receiving the new public key NK_T from the orchestrator 170, the key provisioner 420 may store the new public key in the key table 425 with the associated slice key handle H_T. Using the handle H_T, the cryptographic engine 415 may retrieve the new symmetric data key ND_T from the key table 425 and use it to encrypt the cleartext (results) into ciphertext. The cryptographic engine 415 may also retrieve the new public key NK_T and use it to encrypt the new data key into a new key-wrapped key NWK_T.
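The engine side of this alternative can be sketched as: generate ND_T, store it under H_T, encrypt the results with it, and wrap it with the tenant's new public key to form NWK_T. Both primitives are toys chosen so the sketch runs on the standard library alone: textbook RSA over fixed Mersenne primes stands in for the NK_T pair, and an HMAC-based encrypt-then-MAC cipher stands in for AES-GCM. Function and field names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


# Toy NK_T pair (fixed Mersenne primes; insecure, illustration only).
P, Q = 2**89 - 1, 2**107 - 1
NEW_PUBLIC = (P * Q, 65537)                               # received via the orchestrator
NEW_PRIVATE = (P * Q, pow(65537, -1, (P - 1) * (Q - 1)))  # stays with the tenant


def encrypt_results_with_new_key(key_table, handle, cleartext):
    entry = key_table[handle]
    new_dk = secrets.token_bytes(16)   # ND_T provisioned for the results
    entry["new_data_key"] = new_dk     # stored under H_T
    n, e = entry["new_public_key"]     # NK_T public half stored under H_T
    ciphertext = seal(new_dk, cleartext)
    nwk = pow(int.from_bytes(new_dk, "big"), e, n)  # NWK_T (textbook RSA)
    return ciphertext, nwk


table = {3: {"new_public_key": NEW_PUBLIC}}
ciphertext, nwk = encrypt_results_with_new_key(table, 3, b"workload results")
```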

At label 474, the cryptographic engine 415 may send the ciphertext (encrypted results) to the GPU/xPU 410. In the alternative embodiments that provision and use the new data key ND_T to encrypt the results, the cryptographic engine 415 may also send the new key-wrapped key NWK_T with the ciphertext results.

At label 476, the GPU/xPU 410 may (via the tenant slice) use the ciphertext received from the cryptographic engine 415 to package the encrypted results of the tenant workload WL_T and send them to the orchestrator 170. In the alternative embodiments that use the new data key ND_T to encrypt the results, the new key-wrapped key NWK_T may be sent to the orchestrator 170 with the results. At label 478, the orchestrator 170 may send the encrypted workload results to the tenant. Where the results were encrypted with the data key DK_T, the tenant may use the DK_T (to which it should already have access) to decrypt the workload results. In the alternative embodiments where the results were encrypted with the new data key ND_T, the orchestrator 170 may provide the new key-wrapped key NWK_T to the tenant, which may use the private key of the new pair NK_T (to which it should already have access) to obtain the new data key ND_T from the key-wrapped key NWK_T. Once it has obtained the new data key ND_T, the tenant can use it to decrypt the workload results.
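The tenant-side choice at label 478 reduces to picking the right key: DK_T directly, or ND_T recovered from NWK_T with the private half of NK_T. The sketch reuses the same toy, insecure primitives (textbook RSA over fixed Mersenne primes and an HMAC-based encrypt-then-MAC cipher standing in for AES-GCM); `tenant_decrypt` and its parameters are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
E, D = 65537, pow(65537, -1, (P - 1) * (Q - 1))  # toy NK_T pair


def tenant_decrypt(ciphertext, data_key=None, nwk=None, new_private_key=None):
    """Tenant side: use DK_T, or recover ND_T from NWK_T, then decrypt the results."""
    if nwk is None:
        key = data_key
    else:
        n, d = new_private_key
        key = pow(nwk, d, n).to_bytes(16, "big")
    return unseal(key, ciphertext)


# Case 1: results encrypted with the original DK_T.
dk = secrets.token_bytes(16)
r1 = tenant_decrypt(seal(dk, b"results"), data_key=dk)

# Case 2: results encrypted with ND_T, which arrives wrapped as NWK_T.
nd = secrets.token_bytes(16)
nwk = pow(int.from_bytes(nd, "big"), E, N)
r2 = tenant_decrypt(seal(nd, b"results"), nwk=nwk, new_private_key=(N, D))
```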

The foregoing example sequence of events of process 440 is described for purposes of illustrating the operation of system 400. In some embodiments, the events of process 440 may occur in a different order and may include different, additional, and/or fewer events.

In some embodiments, some tenants may have a permanent (or long-term) presence in the platform 405. For example, some tenants may have a workload that runs periodically on the platform.

In such cases, the tenant keys K_T may be persistent keys that are wrapped using a key-wrapping storage key that requires user authentication to unwrap. The user authentication may be provided by the tenant or the orchestrator and may be performed at or near the beginning of each workload cycle. Furthermore, the tenant storage keys may be stronger keys requiring larger key sizes and additional entropy supplied by the tenant or orchestrator. Additionally, a seed that generates storage keys may include tenant-related data (such as a PASID or other tenant identifier). Seed data obtained from a tenant may be stored in the key table with an associated handle so that a tenant-specific key table archive can be created.
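One way to picture a persistent K_T protected by an authentication-gated storage key: derive the storage key from a tenant passphrase with PBKDF2, salted with the PASID plus tenant-supplied seed entropy, and wrap K_T with it. PBKDF2, the passphrase, the iteration count, and the salt layout are all assumptions made for illustration, and the wrap uses the same stdlib-only toy cipher standing in for AES-GCM or AES key wrap.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


def derive_storage_key(passphrase, pasid, tenant_seed, iterations=100_000):
    # Salt mixes a tenant identifier (PASID) with tenant-supplied seed entropy.
    salt = pasid.to_bytes(4, "big") + tenant_seed
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


seed = secrets.token_bytes(16)        # kept in the key table under its handle
tenant_key = secrets.token_bytes(32)  # persistent K_T
wrapped_tk = seal(derive_storage_key(b"tenant passphrase", 0x42, seed), tenant_key)

# Next workload cycle: user authentication re-derives the storage key to unwrap K_T.
restored = unseal(derive_storage_key(b"tenant passphrase", 0x42, seed), wrapped_tk)
```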

FIG. 5 provides a flowchart illustrating an example method 500 of operating a key management system according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description. The method 500 may be implemented as one or more modules in a set of logic instructions stored in a non-transitory machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., in configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), in fixed-functionality hardware logic using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof.

For example, computer program code to carry out operations shown in the method 500 may be written in any combination of one or more programming languages, including an object-oriented programming language such as JAVA, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.

Additionally, logic instructions might include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, etc.).

Illustrated processing block 510 provides for provisioning a separate tenant key for each workload, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices assigned to the respective workload. At illustrated processing block 520, each respective tenant key is provided to a respective requestor of each workload. At illustrated processing block 530, for each workload, the respective key handle is provided to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

At illustrated processing block 540, the key handle is provided to the cryptographic engine, and the cryptographic engine is to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation. Illustrated processing block 550 provides for decrypting, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key. Illustrated processing block 560 provides for using the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext.
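Blocks 510 through 560 can be traced end to end in a compact sketch. It assumes a symmetric K_T, a dict-based key table mapping H_T to (PASID, K_T), and the stdlib-only toy cipher standing in for AES-GCM; every function and variable name is hypothetical, and the comments mark which flowchart block each line corresponds to.

```python
import hashlib
import hmac
import secrets


def _subkeys(key):
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(enc_key, nonce, data):
    stream = b"".join(
        hmac.new(enc_key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key, plaintext):
    # Toy encrypt-then-MAC: nonce || ciphertext || tag.
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ct = _xor_stream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key, blob):
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(enc_key, nonce, ct)


key_table = {}  # H_T -> (PASID, K_T)


def provision(pasid):  # block 510: separate K_T per workload, unique handle
    handle = secrets.randbits(64)
    while handle in key_table:
        handle = secrets.randbits(64)
    key_table[handle] = (pasid, secrets.token_bytes(32))
    return handle


def engine_decrypt(handle, wrapped_dk, ciphertext):
    _, tenant_key = key_table[handle]          # block 540: handle -> K_T
    data_key = unseal(tenant_key, wrapped_dk)  # block 550: unwrap DK_T
    return unseal(data_key, ciphertext)        # block 560: ciphertext -> cleartext


handle = provision(pasid=0x5)
requestor_key = key_table[handle][1]  # block 520: K_T given to the workload requestor
dk = secrets.token_bytes(16)
wrapped, ct = seal(requestor_key, dk), seal(dk, b"workload data")
slice_handle = handle                 # block 530: H_T handed to the tenant slice
cleartext = engine_decrypt(slice_handle, wrapped, ct)
```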

FIG. 6 shows a block diagram illustrating an example computing system 10 for key management for multi-tenant workload environments according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description. The system 10 may generally be part of an electronic device/platform having computing and/or communications functionality (e.g., server, cloud infrastructure controller, database controller, notebook computer, desktop computer, personal digital assistant/PDA, tablet computer, convertible tablet, smart phone, etc.), imaging functionality (e.g., camera, camcorder), media playing functionality (e.g., smart television/TV), wearable functionality (e.g., watch, eyewear, headwear, footwear, jewelry), vehicular functionality (e.g., car, truck, motorcycle), robotic functionality (e.g., autonomous robot), Internet of Things (IoT) functionality, etc., or any combination thereof. In the illustrated example, the system 10 may include a host processor 12 (e.g., central processing unit/CPU) having an integrated memory controller (IMC) 14 that may be coupled to system memory 20. The host processor 12 may include any type of processing device, such as, e.g., microcontroller, microprocessor, RISC processor, ASIC, etc., along with associated processing modules or circuitry.

The system memory 20 may include any non-transitory machine- or computer-readable storage medium such as RAM, ROM, PROM, EEPROM, firmware, flash memory, etc., configurable logic such as, for example, PLAs, FPGAs, CPLDs, fixed-functionality hardware logic using circuit technology such as, for example, ASIC, CMOS or TTL technology, or any combination thereof suitable for storing instructions 28.

The system 10 may also include an input/output (I/O) subsystem 16. The I/O subsystem 16 may communicate with, for example, one or more input/output (I/O) devices 17, a network controller 24 (e.g., wired and/or wireless NIC), and storage 22. The storage 22 may be comprised of any appropriate non-transitory machine- or computer-readable memory type (e.g., flash memory, DRAM, SRAM (static random access memory), solid state drive (SSD), hard disk drive (HDD), optical disk, etc.). The storage 22 may include mass storage. In some embodiments, the host processor 12 and/or the I/O subsystem 16 may communicate with the storage 22 (all or portions thereof) via a network controller 24. In some embodiments, the system 10 may also include a graphics processor 26 (e.g., a graphics processing unit/GPU) and a cryptographic accelerator 27. The cryptographic accelerator 27 may include an ASIC with algorithms specifically tuned for performing cryptographic operations.

The host processor 12 and the I/O subsystem 16 may be implemented together on a semiconductor die as a system on chip (SoC) 11, shown encased in a solid line. The SoC 11 may therefore operate as a computing apparatus for key management for multi-tenant workload environments. In some embodiments, the SoC 11 may also include one or more of the system memory 20, the network controller 24, and/or the graphics processor 26 (shown encased in dotted lines). In some embodiments, the SoC 11 may also include other components of the system 10.

The host processor 12 and/or the I/O subsystem 16 may execute program instructions 28 retrieved from the system memory 20 and/or the storage 22 to perform one or more aspects of process 440 (FIG. 4) and/or process 500 (FIG. 5). The system 10 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4. The system 10 is therefore considered to be performance-enhanced at least to the extent that the technology provides for the secure exchange and use of tenant specific keys within a multi-tenant workload environment.

Computer program code to carry out the processes described above may be written in any combination of one or more programming languages, including an object-oriented programming language such as JAVA, JAVASCRIPT, PYTHON, SMALLTALK, C++ or the like and/or conventional procedural programming languages, such as the “C” programming language or similar programming languages, and implemented as program instructions 28. Additionally, program instructions 28 may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, microprocessor, etc.).

I/O devices 17 may include one or more input devices, such as a touch-screen, keyboard, mouse, cursor-control device, microphone, digital camera, video recorder, camcorder, biometric scanners and/or sensors; input devices may be used to enter information and interact with system 10 and/or with other devices. The I/O devices 17 may also include one or more output devices, such as a display (e.g., touch screen, liquid crystal display/LCD, light emitting diode/LED display, plasma panels, etc.), speakers and/or other visual or audio output devices. The input and/or output devices may be used, e.g., to provide a user interface.

FIG. 7 shows a block diagram illustrating an example semiconductor apparatus 30 for key management for multi-tenant workload environments according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description. The semiconductor apparatus 30 may be implemented, e.g., as a chip, die, or other semiconductor package. The semiconductor apparatus 30 may include one or more substrates 32 comprised of, e.g., silicon, sapphire, gallium arsenide, etc. The semiconductor apparatus 30 may also include logic 34 (comprised of, e.g., transistor array(s) and other integrated circuit (IC) components) coupled to the substrate(s) 32. The logic 34 may be implemented at least partly in configurable logic or fixed-functionality logic hardware. The logic 34 may implement the system on chip (SoC) 11 described above with reference to FIG. 6. The logic 34 may implement one or more aspects of the processes described above, including process 440 (FIG. 4) and/or process 500 (FIG. 5). The logic 34 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4. The apparatus 30 is therefore considered to be performance-enhanced at least to the extent that the technology provides for the secure exchange and use of tenant specific keys within a multi-tenant workload environment.

The semiconductor apparatus 30 may be constructed using any appropriate semiconductor manufacturing processes or techniques. For example, the logic 34 may include transistor channel regions that are positioned (e.g., embedded) within the substrate(s) 32. Thus, the interface between the logic 34 and the substrate(s) 32 may not be an abrupt junction. The logic 34 may also be considered to include an epitaxial layer that is grown on an initial wafer of the substrate(s) 32.

FIG. 8 is a block diagram illustrating an example processor core 40 according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description. The processor core 40 may be the core for any type of processor, such as a micro-processor, an embedded processor, a digital signal processor (DSP), a network processor, a graphics processing unit (GPU), or other device to execute code. Although only one processor core 40 is illustrated in FIG. 8, a processing element may alternatively include more than one of the processor core 40 illustrated in FIG. 8. The processor core 40 may be a single-threaded core or, for at least one embodiment, the processor core 40 may be multithreaded in that it may include more than one hardware thread context (or "logical processor") per core.

FIG. 8 also illustrates a memory 41 coupled to the processor core 40. The memory 41 may be any of a wide variety of memories (including various layers of memory hierarchy) as are known or otherwise available to those of skill in the art. The memory 41 may include one or more code 42 instruction(s) to be executed by the processor core 40. The code 42 may implement one or more aspects of process 440 (FIG. 4) and/or process 500 (FIG. 5). The processor core 40 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4. The processor core 40 may follow a program sequence of instructions indicated by the code 42. Each instruction may enter a front end portion 43 and be processed by one or more decoders 44. The decoder 44 may generate as its output a micro operation such as a fixed width micro operation in a predefined format, or may generate other instructions, microinstructions, or control signals which reflect the original code instruction. The illustrated front end portion 43 also includes register renaming logic 46 and scheduling logic 48, which generally allocate resources and queue the operation corresponding to the convert instruction for execution.

The processor core 40 is shown including execution logic 50 having a set of execution units 55-1 through 55-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. The illustrated execution logic 50 performs the operations specified by code instructions.

After completion of execution of the operations specified by the code instructions, back end logic 58 retires the instructions of code 42. In one embodiment, the processor core 40 allows out of order execution but requires in order retirement of instructions. Retirement logic 59 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like). In this manner, the processor core 40 is transformed during execution of the code 42, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 46, and any registers (not shown) modified by the execution logic 50.

Although not illustrated in FIG. 8, a processing element may include other elements on chip with the processor core 40. For example, a processing element may include memory control logic along with the processor core 40. The processing element may include I/O control logic and/or may include I/O control logic integrated with memory control logic. The processing element may also include one or more caches.

FIG. 9 is a block diagram illustrating an example of a multi-processor based computing system 60 according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description. The multiprocessor system 60 includes a first processing element 70 and a second processing element 80. While two processing elements 70 and 80 are shown, it is to be understood that an embodiment of the system 60 may also include only one such processing element.

The system 60 is illustrated as a point-to-point interconnect system, wherein the first processing element 70 and the second processing element 80 are coupled via a point-to-point interconnect 71. It should be understood that any or all of the interconnects illustrated in FIG. 9 may be implemented as a multi-drop bus rather than point-to-point interconnect.

As shown in FIG. 9, each of the processing elements 70 and 80 may be multicore processors, including first and second processor cores (i.e., processor cores 74a and 74b and processor cores 84a and 84b). Such cores 74a, 74b, 84a, 84b may be configured to execute instruction code in a manner similar to that discussed above in connection with FIG. 8.

Each processing element 70, 80 may include at least one shared cache 99a, 99b. The shared cache 99a, 99b may store data (e.g., instructions) that are utilized by one or more components of the processor, such as the cores 74a, 74b and 84a, 84b, respectively. For example, the shared cache 99a, 99b may locally cache data stored in a memory 62, 63 for faster access by components of the processor. In one or more embodiments, the shared cache 99a, 99b may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or combinations thereof.

While shown with only two processing elements 70, 80, it is to be understood that the scope of the embodiments is not so limited. In other embodiments, one or more additional processing elements may be present in a given processor. Alternatively, one or more of the processing elements 70, 80 may be an element other than a processor, such as an accelerator or a field programmable gate array. For example, additional processing element(s) may include additional processor(s) that are the same as a first processor 70, additional processor(s) that are heterogeneous or asymmetric to a first processor 70, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processing element. There can be a variety of differences between the processing elements 70, 80 in terms of a spectrum of metrics of merit including architectural, micro architectural, thermal, power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processing elements 70, 80. For at least one embodiment, the various processing elements 70, 80 may reside in the same die package.

The first processing element 70 may further include memory controller logic (MC) 72 and point-to-point (P-P) interfaces 76 and 78. Similarly, the second processing element 80 may include a MC 82 and P-P interfaces 86 and 88. As shown in FIG. 9, MCs 72 and 82 couple the processors to respective memories, namely a memory 62 and a memory 63, which may be portions of main memory locally attached to the respective processors. While the MCs 72 and 82 are illustrated as integrated into the processing elements 70, 80, for alternative embodiments the MC logic may be discrete logic outside the processing elements 70, 80 rather than integrated therein.

The first processing element 70 and the second processing element 80 may be coupled to an I/O subsystem 90 via P-P interconnects 76 and 86, respectively. As shown in FIG. 9, the I/O subsystem 90 includes P-P interfaces 94 and 98. Furthermore, the I/O subsystem 90 includes an interface 92 to couple I/O subsystem 90 with a high performance graphics engine 64. In one embodiment, a bus 73 may be used to couple the graphics engine 64 to the I/O subsystem 90. Alternately, a point-to-point interconnect may couple these components.

In turn, the I/O subsystem 90 may be coupled to a first bus 65 via an interface 96. In one embodiment, the first bus 65 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the embodiments is not so limited.

As shown in FIG. 9, various I/O devices 65a (e.g., biometric scanners, speakers, cameras, and/or sensors) may be coupled to the first bus 65, along with a bus bridge 66 which may couple the first bus 65 to a second bus 67. In one embodiment, the second bus 67 may be a low pin count (LPC) bus. Various devices may be coupled to the second bus 67 including, for example, a keyboard/mouse 67a, communication device(s) 67b, and a data storage unit 68 such as a disk drive or other mass storage device which may include code 69, in one embodiment. The illustrated code 69 may implement one or more aspects of the processes described above, including process 440 (FIG. 4) and/or process 500 (FIG. 5). The illustrated code 69 may be similar to the code 42 (FIG. 8), already discussed. Further, an audio I/O 67c may be coupled to second bus 67 and a battery 61 may supply power to the computing system 60. The system 60 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4.

Note that other embodiments are contemplated. For example, instead of the point-to-point architecture of FIG. 9, a system may implement a multi-drop bus or another such communication topology. Also, the elements of FIG. 9 may alternatively be partitioned using more or fewer integrated chips than shown in FIG. 9.

Embodiments of each of the above systems, devices, components and/or methods, including the system 10, the semiconductor apparatus 30, the processor core 40, the system 60, system 100, platform 105, system 200, system 300, system 400, platform 405, process 440, process 500, and/or any other system components, may be implemented in hardware, software, or any suitable combination thereof. For example, hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), or fixed-functionality logic hardware using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof.

Alternatively, or additionally, all or portions of the foregoing systems and/or components and/or methods may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system (OS) applicable/appropriate programming languages, including an object-oriented programming language such as PYTHON, PERL, JAVA, SMALLTALK, C++, C# or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.

Additional Notes and Examples:

Example 1 includes a computing system comprising a workload processor allocable into a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, a cryptographic engine coupled to the workload processor, a memory coupled to the cryptographic engine, the memory to store a key table, a key provisioner coupled to the memory, the key provisioner including one or more substrates and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic to provision a separate tenant key for each workload, each tenant key stored in the key table with an associated unique key handle and a resource identifier for a separate one of the plurality of tenant slices assigned to the respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

Example 2 includes the system of Example 1, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

Example 3 includes the system of Example 2, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

Example 4 includes the system of Example 3, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.

Example 5 includes the system of Example 2, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair.

Example 6 includes the system of any one of Examples 1-5, wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.

Example 7 includes a semiconductor apparatus comprising one or more substrates, and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

Example 8 includes the semiconductor apparatus of Example 7, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

Example 9 includes the semiconductor apparatus of Example 8, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

Example 10 includes the semiconductor apparatus of Example 9, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.

Example 11 includes the semiconductor apparatus of Example 8, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair, and wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.

Example 12 includes the semiconductor apparatus of any one of Examples 7-11, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.

Example 13 includes at least one non-transitory computer readable storage medium comprising a set of instructions which, when executed by a computing system, cause the computing system to provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

Example 14 includes the at least one non-transitory computer readable storage medium of Example 13, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

Example 15 includes the at least one non-transitory computer readable storage medium of Example 14, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

Example 16 includes the at least one non-transitory computer readable storage medium of Example 15, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.

Example 17 includes the at least one non-transitory computer readable storage medium of Example 14, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair.

Example 18 includes the at least one non-transitory computer readable storage medium of any one of Examples 13-17, wherein the instructions, when executed, further cause the computing system to provision, for each workload, a second data key for use in encrypting the respective workload results.

Example 19 includes a method comprising provisioning a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, providing each respective tenant key to a respective requestor of each workload, and providing, for each workload, the respective key handle to the assigned tenant slice, the tenant slice using the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

Example 20 includes the method of Example 19, wherein using the key handle to perform, via the cryptographic engine, the cryptographic operation comprises providing the key handle to the cryptographic engine, the cryptographic engine accessing, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

Example 21 includes the method of Example 20, wherein the cryptographic operation comprises decrypting, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

Example 22 includes the method of Example 21, further comprising using, by the cryptographic engine, the unwrapped data key in a second cryptographic operation, wherein the second cryptographic operation includes decrypting ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine further provides the cleartext to the assigned tenant slice.

Example 23 includes the method of Example 20, wherein the tenant key comprises a public-private key pair, wherein providing each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein accessing, based on the key handle, the associated tenant key in the key table comprises accessing the private key of the key pair.

Example 24 includes the method of any one of Examples 19-23, further comprising provisioning, for each workload, a second data key for use in encrypting the respective workload results.

Example 25 includes an apparatus comprising means for performing the method of any one of Examples 19-23.

Thus, technology described herein improves the performance of computing systems by providing for the secure exchange and use of tenant-specific keys for cryptographic operations required in executing tenant workloads within a multi-tenant workload environment. By maintaining a key table separate from tenant slices and limiting key access to the key provisioner and cryptographic processor/engines, tenant workloads and data may be securely held within the tenant slice context. For example, the technology may enable GPU/xPU caching schemes that allocate L1 and L2 cache lines based on a tenant-specific context. Respecting tenant boundaries in the cache lines helps mitigate a variety of possible side-channel attacks.
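One way a tenant-aware cache of the kind mentioned above might behave is sketched below, purely as an assumption, since no allocation policy is specified here. In this hypothetical `TenantPartitionedCache`, each tenant owns a fixed number of ways in every set and lines are tracked per tenant, so one tenant can neither hit on nor evict another tenant's lines. This is intended to deny a tenant the cross-tenant eviction and shared-line observations that cache side channels such as Prime+Probe typically rely on.

```python
from collections import OrderedDict


class TenantPartitionedCache:
    """Toy set-associative cache with ways statically split among tenants."""

    def __init__(self, num_sets: int, ways_per_tenant: dict, line_size: int = 64):
        self.num_sets = num_sets
        self.line_size = line_size
        self.ways_per_tenant = ways_per_tenant
        # sets[i][tenant]: that tenant's lines in set i, tag -> None, in LRU order
        self.sets = [{t: OrderedDict() for t in ways_per_tenant} for _ in range(num_sets)]

    def access(self, tenant: str, address: int) -> bool:
        """Return True on a hit; on a miss, fill only within the tenant's own ways."""
        line = address // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        lines = self.sets[index][tenant]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(lines) >= self.ways_per_tenant[tenant]:
            lines.popitem(last=False)  # evict this tenant's least recently used line
        lines[tag] = None
        return False


cache = TenantPartitionedCache(num_sets=4, ways_per_tenant={"A": 2, "B": 2})
cache.access("B", 0)
for addr in (0, 256, 512, 768):  # tenant A thrashes set 0
    cache.access("A", addr)
print(cache.access("B", 0))  # True: B's line survived A's activity
```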

Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.

Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments.

Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.

The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.

As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrase “one or more of A, B or C” may mean A; B; C; A and B; A and C; B and C; or A, B and C.
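The combinations covered by this definition can be enumerated mechanically. The small Python helper below, whose name is illustrative only, lists every non-empty combination of the listed items; for three items it yields the seven cases recited above (2^3 - 1), in the same order.

```python
from itertools import combinations


def one_or_more_of(items):
    """Every non-empty combination of items, smallest combinations first."""
    return [combo for r in range(1, len(items) + 1) for combo in combinations(items, r)]


print(one_or_more_of(["A", "B", "C"]))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```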

Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.

Claims


1. A computer system comprising: a workload processor allocable into a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload; a cryptographic engine coupled to the workload processor; a memory coupled to the cryptographic engine, the memory to store a key table; and a key provisioner coupled to the memory, the key provisioner comprising one or more substrates and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic to: provision a separate tenant key for each workload, each tenant key stored in the key table with an associated unique key handle and a resource identifier for a separate one of the plurality of tenant slices assigned to the respective workload; provide each respective tenant key to a respective requestor of each workload; and provide, for each workload, the respective key handle to the assigned tenant slice, wherein the tenant slice is to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

2. The system of claim 1, wherein to use the key handle to perform the cryptographic operation via the cryptographic engine, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

3. The system of claim 1 or 2, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

4. The system of claim 1, 2 or 3, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into plaintext, and wherein the cryptographic engine is to provide the plaintext to the assigned tenant slice.

5. The system of claim 1, 2, 3 or 4, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises accessing the private key of the key pair.

6. The system of any one of claims 1-5, wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.

7. A semiconductor device comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to: provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload; provide each respective tenant key to a respective requestor of each workload; and provide, for each workload, the respective key handle to the assigned tenant slice, wherein the tenant slice is to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

8. The semiconductor device of claim 7, wherein to use the key handle to perform the cryptographic operation via the cryptographic engine, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

9. The semiconductor device of claim 7 or 8, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

10. The semiconductor device of claim 7, 8 or 9, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into plaintext, and wherein the cryptographic engine is to provide the plaintext to the assigned tenant slice.

11. The semiconductor device of claim 7, 8, 9 or 10, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, wherein to access, based on the key handle, the associated tenant key in the key table comprises accessing the private key of the key pair, and wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.

12. The semiconductor device of any one of claims 7 to 11, wherein the logic coupled to the one or more substrates comprises transistor channel regions positioned within the one or more substrates.

13. At least one non-transitory computer-readable storage medium comprising a set of instructions which, when executed by a computer system, cause the computer system to: provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload; provide each respective tenant key to a respective requestor of each workload; and provide, for each workload, the respective key handle to the assigned tenant slice, wherein the tenant slice is to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

14. The at least one non-transitory computer-readable storage medium of claim 13, wherein to use the key handle to perform the cryptographic operation via the cryptographic engine, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

15. The at least one non-transitory computer-readable storage medium of claim 13 or 14, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

16. The at least one non-transitory computer-readable storage medium of claim 13, 14 or 15, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into plaintext, and wherein the cryptographic engine is to provide the plaintext to the assigned tenant slice.

17. The at least one non-transitory computer-readable storage medium of claim 13, 14, 15 or 16, wherein the tenant key comprises a public-private key pair, wherein providing each respective tenant key to a respective requestor of each workload includes sending only the public key of the key pair to the workload requestor, and wherein accessing, based on the key handle, the associated tenant key in the key table includes accessing the private key of the key pair.

18. The at least one non-transitory computer-readable storage medium of any one of claims 13-17, wherein the instructions, when executed, further cause the computer system to provision, for each workload, a second data key for use in encrypting the respective workload results.

19. A method comprising: provisioning a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload; providing each respective tenant key to a respective requestor of each workload; and providing, for each workload, the respective key handle to the assigned tenant slice, the tenant slice using the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.

20. The method of claim 19, wherein using the key handle to perform the cryptographic operation via the cryptographic engine comprises providing the key handle to the cryptographic engine, the cryptographic engine accessing, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.

21. The method of claim 19 or 20, wherein the cryptographic operation comprises decrypting, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.

22. The method of claim 19, 20 or 21, further comprising using, by the cryptographic engine, the unwrapped data key in a second cryptographic operation, the second cryptographic operation decrypting ciphertext associated with the respective workload into plaintext, wherein the cryptographic engine further provides the plaintext to the assigned tenant slice.

23. The method of claim 19, 20, 21 or 22, wherein the tenant key comprises a public-private key pair, wherein providing each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein accessing, based on the key handle, the associated tenant key in the key table comprises accessing the private key of the key pair.

24. The method of any one of claims 19-23, further comprising provisioning, for each workload, a second data key for use in encrypting the respective workload results.

25. An apparatus comprising means for performing the method of any one of claims 19-24.