WO2022133860A1 - Key management for cryptographic processors attached to other processing units

Info

Publication number
WO2022133860A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
tenant
workload
cryptographic
handle
Application number
PCT/CN2020/138853
Other languages
English (en)
Inventor
Michael Kounavis
Ned M. Smith
Junyuan Wang
Kaijie GUO
Original Assignee
Intel Corporation
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to PCT/CN2020/138853 (WO2022133860A1)
Priority to TW110135370A (TW202232354)
Priority to NL2029790A (NL2029790B1)
Publication of WO2022133860A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L 9/0816: Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L 9/0819: Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
    • H04L 9/0822: Key transport or distribution using key encryption key
    • H04L 9/088: Usage controlling of secret information, e.g. techniques for restricting cryptographic keys to pre-authorized uses, different access levels, validity of crypto-period, different key- or password length, or different strong and weak cryptographic algorithms
    • H04L 9/14: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications using a plurality of keys or algorithms

Definitions

  • Embodiments generally relate to technology for cloud or edge computing systems. More particularly, embodiments relate to managing keys used in protecting client workloads and client data across processing units.
  • Cloud or edge computing systems may include a platform having a plurality of processors of various types, including one or more of a central processing unit (CPU), a graphics processing unit (GPU), an intelligence (or artificial intelligence) processing unit (IPU), a network processing unit (NPU), etc. (processors may generically be referred to as xPU).
  • a processor such as a GPU may include one or more compute engines for handling complex or parallel processing tasks.
  • a processor such as a GPU must integrate seamlessly with other xPUs that share in processing a tenant workload. Additionally, tenant data may need to be protected end-to-end and across the respective xPUs handling the data.
  • FIG. 1 provides a diagram of an example key management system according to one or more embodiments.
  • FIG. 2 provides a diagram of an example key management system according to one or more embodiments.
  • FIG. 3 provides a diagram of another example key management system according to one or more embodiments.
  • FIG. 4 provides a diagram illustrating operation of an example key management system according to one or more embodiments.
  • FIG. 5 provides a flowchart illustrating an example method of operating a key management system according to one or more embodiments.
  • FIG. 6 provides a diagram illustrating an example key management computing system according to one or more embodiments.
  • FIG. 7 is a block diagram illustrating an example semiconductor apparatus for key management according to one or more embodiments.
  • FIG. 8 is a block diagram illustrating an example processor according to one or more embodiments.
  • FIG. 9 is a block diagram illustrating an example of a multi-processor based computing system according to one or more embodiments.
  • An improved computing system as described herein provides for protecting tenant workloads and tenant data within a multi-tenant workload environment.
  • a key management system provides for provisioning and securely storing tenant-specific keys for cryptographic operations while protecting tenant workload content during workload execution, which involves scheduling tasks across the various xPUs.
  • the key management system ensures that each tenant workload and data remains protected within the tenant’s assigned tenant slice.
  • a key table is maintained having resource partitioning context per workload associated with a tenant key and a key handle, unique to the tenant, to permit secure exchange of data between tenants and workload resources. Compute engines and other resources can use the key handle to point a cryptographic engine to the key table for unlocking and securing tenant data, as sketched below.
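A minimal Python sketch of such a key table follows, assuming a software representation for illustration; the entry fields mirror the resource identifier (PASID), tenant key, and slice key handle described above, while the names (KeyTableEntry, KeyTable, lookup) are hypothetical and not taken from the patent.

```python
# Hypothetical key-table sketch: entries are indexed by slice key handle,
# and a lookup succeeds only for a PASID the handle was issued to.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class KeyTableEntry:
    pasids: tuple      # resource identifiers issued by each GPU/xPU for the slice
    tenant_key: bytes  # tenant key K_T (or the private half of a K_T pair)
    key_handle: bytes  # slice key handle H_T, unique to the tenant key

@dataclass
class KeyTable:
    entries: dict = field(default_factory=dict)  # H_T -> KeyTableEntry

    def add(self, entry: KeyTableEntry) -> None:
        self.entries[entry.key_handle] = entry

    def lookup(self, key_handle: bytes, pasid: str) -> bytes:
        # Release the tenant key only toward the slice (PASID) that the
        # handle was provisioned for; the slice itself never sees the key.
        entry = self.entries[key_handle]
        if pasid not in entry.pasids:
            raise PermissionError("PASID does not match this key handle")
        return entry.tenant_key
```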
  • the key management system coordinates the provision and handling of keys to maintain security.
  • FIG. 1 provides a diagram of an example key management system 100 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description.
  • the system 100 may include a cloud or edge computing platform 105, which provides a multi-tenant processing environment.
  • the platform 105 may include one or more processing units such as a graphics processing unit (GPU) 110 and/or an xPU 120, a cryptographic processing unit (CrPU) 130, a key provisioner (KP) 140, and a resource director (RD) 150.
  • the GPU 110 and the xPU 120 may be collectively referred to as a workload processor.
  • the GPU 110 may include a key provisioning module (GKP) 115 to interface with the key provisioner 140.
  • the xPU 120 may include an xPU-specific key provisioning module (xKP) 125 to interface with the key provisioner 140.
  • the cryptographic processing unit 130 may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more selected cryptographic algorithms and techniques.
  • the key provisioner 140 may provision keys and manage a key table 145 that holds keys used for securing (encrypting) and decrypting tenant workloads and data.
  • the key table may be isolated from GPUs/xPUs, thus effectively blocking the GPUs/xPUs from accessing any keys stored in the key table.
  • the key table 145 may also be hardened against malicious attack by, for example, restricting the access to read-only or write-only and/or by restricting the interfaces to only CrPU and KP. Additionally, the key table 145 may be stored in unprotected storage after encrypting its contents using a hardware root of trust generated storage key.
  • the key provisioner 140 may provision keys in concert with the cryptographic processing unit 130 (e.g., the cryptographic processing unit 130 may generate the keys upon request by the key provisioner 140) .
  • a memory for storing data entries in the key table 145 may include any suitable memory for secure data storage and may be hosted by any portion of the platform 105.
  • the resource director 150 may receive workload requests from tenants (e.g., via orchestrator 170) and determine per-tenant compute resources such as, e.g., memory partitions, compute engine assignment and/or compute thread assignment. Compute resources may be assigned by the resource director 150 in concert with one or more of the GPU 110 and/or the xPU 120.
  • the key provisioner 140 may be incorporated within other system components, such as the resource director 150.
  • the resource director 150 may maintain and/or verify the integrity of the key table 145.
  • the platform 105 may communicate with one or more tenants 160_1...160_N through an orchestrator 170, which may be situated in or close to a cloud/edge 180.
  • the orchestrator 170 may provide a network-based system or service that provides an interface between tenants 160 and platform 105.
  • the orchestrator 170 may schedule tenant workloads, and may provide for decomposition of workloads into distributed applications/assignments to be performed by one or more of the GPU 110 and/or the xPU 120.
  • one or more of the GPU 110, the xPU 120, the CrPU 130, the KP 140, the key table 145, the RD 150, and/or the orchestrator 170 may include a root of trust with attestation keys to enable verification of system integrity.
  • cloud/edge 180 may include at least portions of a public network such as the Internet. In some embodiments, cloud/edge 180 may be a private or other network. In some embodiments, functionality of the key provisioner 140 may be implemented based on Key Protection Technology (and extensions thereto) . In some embodiments, functionality of the resource director 150 may be implemented based on Resource Director Technology (and extensions thereto) .
  • FIG. 2 provides a diagram of an example key management system 200 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description.
  • the system 200 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed) , and those components and features will not be repeated except as necessary to describe the additional components and features shown.
  • the system 200 may include, in addition to the components and features shown in and described with reference to FIG. 1, one or more processing units such as a graphics processing unit (GPU) 210 and/or an xPU 220, a cryptographic processing unit (CrPU) 230, and a key provisioner (KP) 240.
  • the GPU 210, the xPU 220, the CrPU 230 and the KP 240 may correspond to the GPU 110, the xPU 120, the CrPU 130 and the KP 140, respectively (FIG. 1, already discussed) .
  • the GPU 210 and the xPU 220 may be collectively referred to as a workload processor.
  • the GPU 210 may include a plurality of compute engines (CE) , such as a compute engine E1 211, a compute engine E2 212, and a compute engine E3 213. Each of the compute engines may, e.g., handle processing tasks particularly suited for parallel processing. Although FIG. 2 shows three compute engines for illustrative purposes, the GPU 210 may have many more than three compute engines. Each compute engine may be selectively allocated and assigned to perform all or portions of tenant workloads.
  • the GPU 210 may also include a resource manager (GRM) 214 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed.
  • the GPU 210 may also include a key provisioning module (GKP) 215 to interface with the key provisioner 240.
  • the GKP 215 may correspond to the GKP 115 (FIG. 1, already discussed) .
  • the xPU 220 may include a plurality of processing cores, such as a core C1 221, a core C2 222, and a core C3 223. Although FIG. 2 shows three processing cores for illustrative purposes, the xPU 220 may have many more than three processing cores. Each processing core may be selectively allocated and assigned to perform all or portions of tenant workloads.
  • the xPU 220 may also include resource manager (xRM) 224 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed.
  • the xPU 220 may also include a key provisioning module (xKP) 225 to interface with the key provisioner 240.
  • the xKP 225 may correspond to xKP 125 (FIG. 1, already discussed) .
  • the GPU 210 may assign resources and associate the resources with a process address space identifier (PASID) 219 (as a resource identifier) for a specific tenant workload.
  • Each PASID is to identify a memory partition and other resources, including resource isolation context /boundary, of the particular GPU/xPU assigned to a specific tenant workload.
  • the PASID 219 may be provided to the key provisioner 240.
  • the xPU 220 (via xRM 224) may assign resources and associate the resources with a PASID.
  • the xPU may associate the resources with PASID 219 or a separate PASID 229 (as a resource identifier) .
  • the PASID 229 may be provided to the key provisioner 240.
  • tenant context information for a particular tenant workload may be provided to the key provisioner 240 by, e.g., the resource director 150.
  • the resources assigned to the workload for the tenant may be referred to as a tenant slice.
  • tenant slice resources assigned for a tenant workload for tenant T_1 may include the compute engine E3 213, labeled as 255, and the core C1 221, labeled as 259.
  • the cryptographic processing unit 230 may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more cryptographic algorithms and techniques.
  • the cryptographic processing unit 230 may include one or more cryptographic engines (not shown in FIG. 2) to perform selected cryptographic operations.
  • a cryptographic accelerator may be used to implement all or portions of the cryptographic processing unit 230.
  • the key provisioner 240 may provision keys and manage a key table 245 that holds keys used for securing and decrypting tenant workloads/data.
  • the key provisioner 240 may provision keys in concert with the cryptographic processing unit 230 (e.g., the cryptographic processing unit 230 may generate the keys upon request by the key provisioner 240) .
  • a memory for storing data entries in the key table 245 may include any suitable memory for secure data storage and may be hosted by any portion of the system 200.
  • the key table 245 may include, for each tenant workload, a PASID issued by each processor having resources assigned to the workload, a tenant key K_T for protecting the tenant workload, and a slice key handle (SKH) H_T for the tenant workload.
  • Each slice key handle is unique to the associated tenant key. As an example, for the tenant workload slices 255 and 259 shown in FIG. 2, the key table may include entries: tenant key K_T1, a PASID_T1 for the assigned compute engine E3 213 in GPU 210, a PASID_T1 for the assigned core C1 221 in xPU 220, and a slice key handle (SKH) H_T1 associated with the key K_T1 and each of the PASID_T1 entries.
  • the tenant key K_T may include multiple keys. Further details regarding keys and key generation are described with reference to FIG. 4 herein.
  • the key table 245 may be presented in a simpler format, where the xA, xB, etc. labels refer to respective GPU/xPUs that may have assigned resources associated with a PASID. For the example of tenant T_1 above, the key table 245 includes entries as follows:

    Tenant key | PASID (xA: GPU 210) | PASID (xB: xPU 220) | Slice key handle (SKH)
    K_T1       | PASID_T1            | PASID_T1            | H_T1

  • the entries in the key table 245 may be arranged in any order, and in some embodiments fewer (or more) entries may be contained in the key table.
  • one or more tenants may each have multiple keys. In some embodiments, one or more tenants may each have at least one seed used to generate keys. Seeds for generating keys for tenants may also be stored in the key table 245, and a seed may be supplied by the key table to the cryptographic processing unit 230 (or to a cryptographic engine) to generate a key that is then stored in the key table and made available for other cryptographic operations, as sketched below.
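As a sketch of the seed-based option, the following assumes an HKDF-style derivation (the patent does not prescribe a particular algorithm) using the widely available `cryptography` package; the function name and the `info` binding are illustrative.

```python
# Sketch of deriving a tenant key from a stored seed; HKDF and the
# tenant-id binding are assumptions, not mandated by the patent.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_tenant_key(seed: bytes, tenant_id: bytes) -> bytes:
    """Expand a tenant-supplied seed into a 256-bit key for the key table."""
    hkdf = HKDF(
        algorithm=hashes.SHA256(),
        length=32,                       # 256-bit key
        salt=None,                       # could be a per-platform salt
        info=b"tenant-key:" + tenant_id, # binds the key to the tenant context
    )
    return hkdf.derive(seed)

seed = os.urandom(32)  # stand-in for a tenant-provided seed
k_t = derive_tenant_key(seed, b"T1")
```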
  • the key table 245 may correspond to the key table 145 (FIG. 1, already discussed) .
  • the key provisioner 240 may share, for a particular workload, the slice key handle H_T for the workload with the processor(s) performing the workload.
  • the slice key handle H_T may be shared with the respective key provisioner module (e.g., GKP 215 and/or xKP 225 as shown in FIG. 2) of the assigned processor(s).
  • the respective key provisioner module may share the slice key handle H_T with the assigned slice (e.g., the compute engine 213 for tenant slice 255, or the core 221 for tenant slice 259, as shown in FIG. 2).
  • the respective tenant slice (e.g., the compute engine 213 or the core 221, as shown in FIG. 2) may send the encrypted data (ciphertext) along with the slice key handle H_T to the cryptographic processor 230, which may use the slice key handle H_T to retrieve the appropriate key (K_T) from the key table, use that key for decryption of the ciphertext, and return the decrypted data (cleartext) to the requesting slice.
  • the respective tenant slice may likewise send workload results (cleartext) along with the slice key handle H_T to the cryptographic processor 230, which may encrypt the workload results using the appropriate key retrieved from the key table and return encrypted results (ciphertext) to the requesting slice.
  • when requesting decryption or encryption, the requesting slice may also send the respective PASID along with the data and the slice key handle to the cryptographic processor 230. This round trip is sketched below.
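The decrypt round trip can be sketched as follows, assuming AES-GCM as the tenant-key cipher and reducing the key table to a plain dict for brevity; all names are illustrative and the patent does not mandate these algorithm choices.

```python
# Sketch of the decrypt request path from a tenant slice to the
# cryptographic processor: the slice submits only ciphertext, handle,
# and PASID, and never handles the key material itself.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key_table = {}  # slice key handle H_T -> (PASID, tenant key K_T)

def crypto_processor_decrypt(handle: bytes, pasid: str,
                             nonce: bytes, ciphertext: bytes) -> bytes:
    """Resolve the handle to K_T in the key table and return cleartext."""
    entry_pasid, k_t = key_table[handle]
    if entry_pasid != pasid:
        raise PermissionError("handle not issued to this slice")
    return AESGCM(k_t).decrypt(nonce, ciphertext, None)

# Provisioning: K_T and its handle go into the table; the slice gets the handle.
k_t = AESGCM.generate_key(bit_length=256)
h_t = os.urandom(16)
key_table[h_t] = ("PASID_T1", k_t)

# The slice submits ciphertext plus handle and PASID, and gets cleartext back.
nonce = os.urandom(12)
ct = AESGCM(k_t).encrypt(nonce, b"tenant workload data", None)
assert crypto_processor_decrypt(h_t, "PASID_T1", nonce, ct) == b"tenant workload data"
```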
  • FIG. 3 provides a diagram of another example key management system 300 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description.
  • the system 300 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed) and/or system 200 (FIG. 2, already discussed) , and those components and features will not be repeated except as necessary to describe the additional components and features shown.
  • the system 300 may include, in addition to the components and features shown in and described with reference to FIG. 1, a workload processing unit which may be a graphics processing unit (GPU) 310, and a key provisioner (KP) 340.
  • the workload processing unit 310 may be another xPU.
  • the GPU 310 may include a plurality of compute engines (CE) , such as a compute engine E1 311, a compute engine E2 312, and a compute engine E3 313.
  • Each of the compute engines may, e.g., handle processing tasks particularly suited for parallel processing.
  • Although FIG. 3 shows three compute engines for illustrative purposes, the GPU 310 may have many more than three compute engines.
  • Each compute engine may be selectively allocated and assigned to perform tenant workloads.
  • the GPU 310 may also include a resource manager (GRM) 314 to interface with the resource director 150 in allocating and assigning resources for handling tenant workloads, and in deallocating resources once the tenant workload assignment is completed.
  • the GRM 314 may correspond to the GRM 214 (FIG. 2, already discussed) .
  • the GPU 310 may also include a key provisioning module (GKP) 315 to interface with the key provisioner 340.
  • the GKP 315 may correspond to the GKP 115 (FIG. 1, already discussed) and/or to the GKP 215 (FIG. 2, already discussed) .
  • the GPU 310 may also include a plurality of cryptographic engines (CrE) , such as a cryptographic engine CrE1 316, a cryptographic engine CrE2 317, and a cryptographic engine CrE3 318.
  • Each of the cryptographic engines may perform cryptographic operations such as, e.g., generating keys, encrypting data and decrypting data according to one or more cryptographic algorithms and techniques.
  • Although FIG. 3 shows three cryptographic engines for illustrative purposes, the GPU 310 may have many more than three cryptographic engines.
  • Each cryptographic engine may be selectively allocated and assigned to perform tenant workloads.
  • a cryptographic accelerator may be used to implement all or portions of the cryptographic processing engines CrE1 316, CrE2 317, and CrE3 318.
  • the cryptographic engines CrE1 316, CrE2 317, and CrE3 318 may each be a part of a separate respective tenant slice.
  • the GPU 310 may assign resources and associate the resources with a PASID 319.
  • Tenant context information for a particular tenant workload, which may include the PASID(s) issued for the workload, may be provided to the key provisioner 340 by, e.g., the resource director 150.
  • the resources assigned to the workload for the tenant may be referred to as a tenant slice.
  • resources assigned for a tenant workload for tenant T_1 may include the compute engine E3 313 and the cryptographic engine CrE3 318, collectively labeled as tenant slice 355. These assignments are shown for illustrative purposes, and other resource allocations and assignments may be made for other tenant workloads; in this manner, a plurality of tenant workloads may be executed at the same time or during overlapping time periods.
  • the key provisioner 340 may provision keys and manage a key table 345 that holds keys used for securing and decrypting tenant workloads/data.
  • the key provisioner 340 may provision keys in concert with a cryptographic engine (e.g., the cryptographic engine may generate the keys upon request by the key provisioner 340) , where the cryptographic engine may be one of CrE1 316, CrE2 317, or CrE3 318, or another cryptographic engine (not shown) .
  • the key provisioner 340 may incorporate a cryptographic engine for key generation.
  • the KP 340 may correspond to the KP 140 (FIG. 1, already discussed) and/or to the KP 240 (FIG. 2, already discussed).
  • a memory for storing data entries in the key table 345 may include any suitable memory for secure data storage and may be hosted by any portion of the system 300.
  • the key table 345 may include, for each tenant workload, a PASID assigned by the GPU 310, a tenant key K_T for protecting the tenant workload, and a slice key handle (SKH) H_T for the tenant workload.
  • Each slice key handle is unique to the associated tenant key.
  • for the tenant slice 355 shown in FIG. 3, the key table may include entries: tenant key K_T1, a PASID_T1 for the assigned compute engine E3 313 and cryptographic engine CrE3 318, and a slice key handle (SKH) H_T1 associated with the key K_T1 and the PASID_T1 entries. Further details regarding keys and key generation are described with reference to FIG. 4 herein.
  • the key table 345 includes entries as follows:

    Tenant key | PASID (GPU 310) | Slice key handle (SKH)
    K_T1       | PASID_T1        | H_T1
  • the key table 345 may correspond to the key table 145 (FIG. 1, already discussed) , and/or the key table 245 (FIG. 2, already discussed) .
  • the key provisioner 340 may share, for a particular workload, the slice key handle H_T for the workload with the processor performing the workload.
  • the slice key handle H_T may be shared with the key provisioner module GKP 315.
  • the key provisioner module 315 may share the slice key handle H_T with the assigned slice (e.g., the compute engine 313 for tenant slice 355, as shown in FIG. 3).
  • the tenant slice (e.g., the compute engine 313 as shown in FIG. 3) may send encrypted data (ciphertext) along with the slice key handle H_T to the assigned cryptographic engine (e.g., the CrE3 318 as shown in FIG. 3), which may use the slice key handle H_T to retrieve the appropriate key (K_T) from the key table, use that key for decryption of the ciphertext, and return the decrypted data (cleartext) to the requesting slice.
  • the respective tenant slice may likewise send workload results (cleartext) along with the slice key handle H_T to the cryptographic engine, which may encrypt the workload results using the appropriate key retrieved from the key table and return encrypted results (ciphertext) to the requesting slice.
  • when requesting decryption or encryption, the requesting slice may also send the respective PASID along with the data and the slice key handle to the assigned cryptographic engine.
  • resource assignment may include use of memory address partitions such that a handle value may be combined (e.g., "swizzled") with the memory partition assigned to the cryptographic engine slice; an attempt by a second cryptographic engine to use a handle assigned to a first cryptographic engine can then be detected as malicious or erroneous use by the second cryptographic engine.
  • the key table 345 may have access logic (or firmware) that implements an algorithm to check the handle origination (e.g., "de-swizzling") to enforce slice isolation semantics, as sketched below.
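One way to realize this check is sketched below; the XOR combiner and the 48-bit handle width are assumptions for illustration, since the patent only states that the handle value is combined with the memory partition assignment.

```python
# Sketch of handle "swizzling": the raw handle is combined with the
# memory-partition identifier of the slice it was issued to, so the key
# table can detect a handle replayed from a different slice. Raw handles
# are assumed to fit in 48 bits.
def swizzle(raw_handle: int, partition_id: int) -> int:
    return raw_handle ^ (partition_id << 48)  # fold partition into high bits

def deswizzle_and_check(swizzled: int, requesting_partition: int) -> int:
    """Recover the raw handle; reject if it was swizzled for another partition."""
    raw = swizzled ^ (requesting_partition << 48)
    if raw >> 48:  # leftover high bits indicate a partition mismatch
        raise PermissionError("handle originated from a different slice")
    return raw

h = swizzle(0xABCD1234, partition_id=7)
assert deswizzle_and_check(h, requesting_partition=7) == 0xABCD1234
```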
  • FIG. 4 provides a diagram illustrating operation of an example key management system 400 according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description.
  • the system 400 may include components and features the same as or similar to those in system 100 (FIG. 1, already discussed) , system 200 (FIG. 2, already discussed) , and/or system 300 (FIG. 3, already discussed) , and those components and features will not be repeated except as necessary to describe the additional components and features shown.
  • the system 400 may include one or more processors (e.g., GPU or xPU) 410, one or more cryptographic engines 415, a key provisioner (KP) 420, and a resource director (RD) 150, which may be part of a platform 405.
  • the system 400 may communicate with one or more tenants 160_1...160_N (e.g., tenant 160_1) through an orchestrator 170, which may be situated in or close to a cloud/edge 180 (not shown in FIG. 4).
  • the GPU/xPU 410 may correspond to the GPU 110 (FIG. 1, already discussed) , the xPU 120 (FIG. 1, already discussed) , the GPU 210 (FIG. 2, already discussed) , the xPU 220 (FIG. 2, already discussed) , and/or the GPU 310 (FIG. 3, already discussed) .
  • the cryptographic engine 415 may correspond to the cryptographic processing unit 130 (FIG. 1, already discussed) , the cryptographic processing unit 230 (FIG. 2, already discussed) , and/or one or more of the cryptographic engines 316, 317, and/or 318 (FIG. 3, already discussed) .
  • the platform 405 may correspond to platform 105 (FIG. 1, already discussed) .
  • the key provisioner 420 may provision keys in concert with the cryptographic engine 415 (e.g., the cryptographic engine 415 may generate the keys upon request by the key provisioner 420) .
  • the key provisioner 420 may incorporate a cryptographic engine for key generation.
  • a memory for storing data entries in the key table 425 may include any suitable memory for secure data storage and may be hosted by any portion of the system 400.
  • the KP 420 may correspond to the KP 140 (FIG. 1, already discussed) , the KP 240 (FIG. 2, already discussed) , and/or the KP 340 (FIG. 3, already discussed) .
  • the key table 425 may correspond to the key table 145 (FIG. 1, already discussed) , the key table 245 (FIG. 2, already discussed) , and/or the key table 345 (FIG. 3, already discussed) .
  • Operation of system 400 may be illustrated through a sequence of events (process 440) as shown in the example in FIG. 4.
  • a tenant 160_1 may submit a request to orchestrator 170 to carry out a workload WL_T.
  • the orchestrator 170 may schedule the workload WL_T for execution by the platform 405 and send a workload request for WL_T to the resource director 150.
  • the resource director 150 may determine which resource allocations are required and send a request to the GPU/xPU 410 for resource assignments to meet the required resource allocations.
  • the request may be accompanied by a PASID for the workload.
  • Resources may be allocated and assigned from among one or more of the following: compute engines, processing cores, memory, input/output devices (e.g., sensors) , accelerators (e.g., artificial intelligence accelerators) , cryptographic engines (or cryptographic processors) , applications, etc.
  • the GPU/xPU 410 may assign the resources (i.e., tenant slice) required for workload WL_T, and may send a confirmation of the resource assignment for the workload slice with the associated PASID to the resource director 150.
  • the resource director 150 may provide tenant context information to the key provisioner 420, which may include the PASID for the tenant slice for the workload WL_T and other information about the tenant/workload.
  • the key provisioner 420 may provision a tenant key K_T for the workload WL_T.
  • the tenant key K_T may be generated by the key provisioner 420 in concert with the cryptographic engine 415. Any suitable cryptographic algorithm(s) may be used for generating the tenant key K_T and for using the tenant key to encrypt or decrypt data.
  • the tenant key K_T may be an asymmetric key pair generated based on a public/private key algorithm, such as, e.g., the RSA algorithm.
  • the key provisioner 420 may then store, in key table 425, the generated tenant key K_T, the PASID for the workload WL_T, and a slice key handle H_T associated with the tenant key K_T and PASID; these may be stored in a single entry or as related entries, or may be stored as a database record, etc. In some embodiments only the respective private key of the K_T pair is stored in the key table 425.
  • the key provisioner 420 may send the tenant key K_T to the orchestrator 170. Where the tenant key K_T is a public/private key pair, only the respective public key of the K_T pair is sent to the orchestrator 170.
  • the key provisioner 420 may send the slice key handle (SKH) H_T along with the associated PASID for the workload to the GPU/xPU 410 for use in subsequent processing of the tenant workload by the assigned tenant slice.
  • the sent K_T (e.g., a public key) may be used to wrap (i.e., encrypt) a data key DK_T (e.g., a symmetric key) specific to the tenant, where the data key DK_T is used to encrypt the tenant workload/data.
  • the result is a key-wrapped key KWK_T that may be sent to the platform 405 in a manner that secures the data key DK_T and allows authorized access only for use in decrypting the tenant workload/data during performance of the workload.
  • the symmetric data key DK_T may be generated and the workload/data encrypted by the tenant 160_1 and/or the orchestrator 170 (e.g., working in concert).
  • the encryption of the workload/data may be performed in advance or at any time before sending the encrypted data to the platform 405.
  • the key provisioner 420 may, upon request, provision a data key DK_T (e.g., a symmetric key is generated) and send the DK_T to the orchestrator 170 for use in encrypting the tenant workload/data.
  • the key provisioner 420 may store the data key DK_T in key table 425.
  • the orchestrator 170 may send the encrypted workload/data WL_T, along with a key-wrapped data key (KWK_T), to the GPU/xPU 410 for execution via the assigned tenant slice.
  • the assigned tenant slice in GPU/xPU 410 may then commence processing the workload.
  • the assigned tenant slice of the GPU/xPU 410 may need to decrypt all or part of the data in the workload (which has been previously encrypted using the data key, discussed herein with reference to label 458).
  • the assigned tenant slice may provide (via the GPU/xPU 410) the ciphertext (i.e., encrypted data to be decrypted), the KWK_T, and the slice key handle H_T to the cryptographic engine 415.
  • the PASID may also be provided to the cryptographic engine 415.
  • the cryptographic engine 415 may access the key table 425 and retrieve the respective tenant key K_T from the key table 425, the K_T being previously used to generate the key-wrapped key KWK_T.
  • where the tenant key K_T is a public/private key pair, the key retrieved from the key table is the respective private key of the pair (the respective public key of the pair having been sent to the orchestrator 170 and used to generate the KWK_T).
  • the cryptographic engine 415 may then use the retrieved K_T to unwrap (i.e., decrypt) the KWK_T and obtain the data key DK_T for the tenant.
  • the private key will successfully obtain the data key DK_T by unwrapping (decrypting) the key-wrapped key KWK_T because the KWK_T was generated by wrapping the DK_T with the public key of the K_T pair, as sketched below.
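A sketch of this wrap/unwrap exchange follows, assuming RSA-OAEP for the K_T pair (the patent names RSA as one example algorithm) and AES-GCM for the data key, using the `cryptography` package; key sizes and padding parameters are illustrative.

```python
# Sketch of the key-wrapping step: the orchestrator/tenant wraps the
# symmetric data key DK_T with the public half of K_T, and the
# cryptographic engine later unwraps KWK_T with the private half
# retrieved from the key table.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Key provisioner side: K_T is an asymmetric pair; the public half is exported.
k_t_private = rsa.generate_private_key(public_exponent=65537, key_size=3072)
k_t_public = k_t_private.public_key()

# Tenant/orchestrator side: generate DK_T and wrap it into KWK_T.
dk_t = AESGCM.generate_key(bit_length=256)
kwk_t = k_t_public.encrypt(dk_t, oaep)

# Cryptographic engine side: unwrap KWK_T with the private key from the
# key table to recover DK_T for decrypting the tenant workload.
assert k_t_private.decrypt(kwk_t, oaep) == dk_t
```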
  • the tenant key K_T and the data key DK_T may have different lifetimes/lifecycles.
  • the tenant key K_T may be a symmetric key generated based on a symmetric key algorithm.
  • a symmetric K_T performs the same function as an asymmetric K_T.
  • a symmetric K_T may be used to wrap the data key DK_T.
  • the orchestrator 170 may be a highly trusted entity in the system, with precautions in place such as attestation and lifecycle management that ensure trust and/or monitor for misbehavior.
  • the DK_T may be used to decrypt ciphertext (i.e., the encrypted tenant workload/data) into cleartext.
  • the cryptographic engine 415 may send the cleartext to the GPU/xPU 410.
  • the PASID may also be sent with the cleartext.
  • the data key DK_T is not sent to the GPU/xPU 410.
  • the data key DK_T and/or the key-wrapped key KWK_T may be stored in the key table 425 with the associated slice key handle H_T.
  • the tenant slice may then continue executing the tenant workload. Once the tenant slice has completed the workload, the workload results may need to be reported back to the orchestrator 170 and/or the tenant 160_1.
  • the GPU/xPU 410 may provide the results (e.g., data) as cleartext to the cryptographic engine 415, along with the slice key handle H_T (and, in some embodiments, the PASID), to encrypt the results.
  • the cryptographic engine 415 may, at label 472, access the key table 425 and retrieve the data key DK_T (if stored in the key table) or the key-wrapped key KWK_T (if stored in the key table); if retrieving the KWK_T, the cryptographic engine 415 may also retrieve the tenant key K_T in order to obtain the data key DK_T from the KWK_T (as previously described). In either case, the cryptographic engine 415 may use the DK_T to encrypt the cleartext (results) into ciphertext.
  • in some embodiments, the data key DK_T is not used to encrypt the workload results.
  • the key provisioner 420 may (in concert with the cryptographic engine 415) provision a new symmetric data key ND_T and request a new public key (part of a public/private key pair NK_T) from the orchestrator 170, which may be provided by the tenant 160_1.
  • the new symmetric data key ND_T may be stored in the key table 425 with the associated slice key handle H_T.
  • the key provisioner 420 may store the new public key in the key table 425 with the associated slice key handle H_T.
  • the cryptographic engine 415 may retrieve the new symmetric data key ND_T from the key table 425 and use the key to encrypt the cleartext (results) into ciphertext.
  • the cryptographic engine 415 may also retrieve the new public key NK_T and use the public key to encrypt the new data key into a new key-wrapped key NWK_T.
  • the cryptographic engine 415 may send the ciphertext (encrypted results) to the GPU/xPU 410.
  • the cryptographic engine 415 may also send the new key-wrapped key NWK_T with the ciphertext results.
  • the GPU/xPU 410 may (via the tenant slice) use the ciphertext received from the cryptographic engine 415 to package the encrypted results of the tenant workload WL_T and send them to the orchestrator 170.
  • the new key-wrapped key NWK_T may be sent to the orchestrator 170 with the results.
  • the orchestrator 170 may send the encrypted workload results to the tenant. Where the results were encrypted with the data key DK_T, the tenant may use the DK_T (to which it should already have access) to decrypt the workload results.
  • the orchestrator 170 may provide the new key-wrapped key NWK_T to the tenant, which may use the private key of the NK_T pair (to which it should already have access) to obtain the new data key ND_T from the key-wrapped key NWK_T. Having obtained the new data key ND_T, the tenant can use the key to decrypt the workload results, as sketched below.
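The tenant-side recovery of results under this re-keyed path might look like the following sketch, under the same illustrative RSA-OAEP/AES-GCM assumptions as above; the function and parameter names are hypothetical.

```python
# Sketch of tenant-side unwrap of returned results: nk_t_private is the
# tenant's private half of the NK_T pair; nwk_t, nonce, and
# encrypted_results are what arrived via the orchestrator.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def recover_results(nk_t_private, nwk_t: bytes,
                    nonce: bytes, encrypted_results: bytes) -> bytes:
    oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    nd_t = nk_t_private.decrypt(nwk_t, oaep)  # unwrap the new data key ND_T
    return AESGCM(nd_t).decrypt(nonce, encrypted_results, None)
```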
  • some tenants may have a permanent (or long-term) presence in the platform 405.
  • some tenants may have a workload that runs periodically on the platform.
  • the tenant keys K_T may be persistent keys that are wrapped using a key-wrapping storage key that requires a user authentication to unwrap.
  • the user authentication may be provided by the tenant or the orchestrator and may be performed at or near the beginning of each workload cycle.
  • the tenant storage keys may be stronger keys requiring larger key sizes and additional entropy supplied by the tenant or orchestrator.
  • a seed that generates storage keys may include tenant-related data (such as, e.g., a PASID or other tenant identifier) . Seed data obtained from a tenant may be stored in the key table with an associated handle so that a tenant specific key table archive can be created.
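As an illustrative sketch of such an authentication-bound storage key, the following assumes a PBKDF2 derivation over a user-authentication secret, salted with tenant-supplied entropy and tenant-identifying data such as a PASID; the algorithm choice and parameters are assumptions, not from the patent.

```python
# Sketch of deriving a key-wrapping storage key that can only be
# reproduced with the user-authentication secret; the tenant entropy and
# PASID salt bind the key to the tenant, per the seed discussion above.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_storage_key(auth_secret: bytes, tenant_entropy: bytes,
                       pasid: bytes) -> bytes:
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(),
        length=32,
        salt=tenant_entropy + pasid,  # binds the key to the tenant
        iterations=600_000,           # slows brute force of the auth secret
    )
    return kdf.derive(auth_secret)
```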
  • FIG. 5 provides a flowchart illustrating an example method 500 of operating a key management system according to one or more embodiments, with reference to components and features described above including but not limited to the figures and associated description.
  • the method 500 may be implemented as one or more modules in a set of logic instructions stored in a non-transitory machine-or computer-readable storage medium such as random access memory (RAM) , read only memory (ROM) , programmable ROM (PROM) , firmware, flash memory, etc., in configurable logic such as, for example, programmable logic arrays (PLAs) , field programmable gate arrays (FPGAs) , complex programmable logic devices (CPLDs) , in fixed-functionality hardware logic using circuit technology such as, for example, application specific integrated circuit (ASIC) , complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof.
  • computer program code to carry out operations shown in the method 500 may be written in any combination of one or more programming languages, including an object-oriented programming language such as JAVA, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • logic instructions might include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, etc. ) .
  • Illustrated processing block 510 provides for provisioning a separate tenant key for each workload, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices assigned to the respective workload.
  • Illustrated processing block 520 provides for providing each respective tenant key to a respective requestor of each workload.
  • Illustrated processing block 530 provides for providing, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
  • Illustrated processing block 540 provides for providing the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.
  • Illustrated processing block 550 provides for decrypting, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.
  • Illustrated processing block 560 provides for using the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext.
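Tying the blocks together, a compact end-to-end sketch of method 500 under the same illustrative algorithm assumptions (RSA-OAEP tenant key, AES-GCM data key) might read:

```python
# Minimal end-to-end sketch of blocks 510-560; all names and algorithm
# choices are illustrative, not prescribed by the patent.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
key_table = {}

# Block 510: provision a tenant key; store it with handle + resource id.
k_t = rsa.generate_private_key(public_exponent=65537, key_size=3072)
h_t = os.urandom(16)
key_table[h_t] = ("PASID_T1", k_t)

# Block 520: the requestor receives the public half of K_T and uses it
# to wrap its symmetric data key DK_T into KWK_T.
dk_t = AESGCM.generate_key(bit_length=256)
kwk_t = k_t.public_key().encrypt(dk_t, oaep)

# Blocks 530/540: the slice presents the handle; the engine resolves K_T.
_, k_t_entry = key_table[h_t]

# Block 550: decrypt the wrapped data key with the tenant key.
unwrapped = k_t_entry.decrypt(kwk_t, oaep)

# Block 560: use the unwrapped data key to turn workload ciphertext into cleartext.
nonce = os.urandom(12)
ct = AESGCM(dk_t).encrypt(nonce, b"workload input", None)
assert AESGCM(unwrapped).decrypt(nonce, ct, None) == b"workload input"
```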
  • FIG. 6 shows a block diagram illustrating an example computing system 10 for key management for multi-tenant workload environments according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description.
  • the system 10 may generally be part of an electronic device/platform having computing and/or communications functionality (e.g., server, cloud infrastructure controller, database controller, notebook computer, desktop computer, personal digital assistant/PDA, tablet computer, convertible tablet, smart phone, etc.).
  • the system 10 may include a host processor 12 (e.g., central processing unit/CPU) having an integrated memory controller (IMC) 14 that may be coupled to system memory 20.
  • the host processor 12 may include any type of processing device, such as, e.g., microcontroller, microprocessor, RISC processor, ASIC, etc., along with associated processing modules or circuitry.
  • the system memory 20 may include any non-transitory machine-or computer-readable storage medium such as RAM, ROM, PROM, EEPROM, firmware, flash memory, etc., configurable logic such as, for example, PLAs, FPGAs, CPLDs, fixed-functionality hardware logic using circuit technology such as, for example, ASIC, CMOS or TTL technology, or any combination thereof suitable for storing instructions 28.
  • the system 10 may also include an input/output (I/O) subsystem 16.
  • the I/O subsystem 16 may communicate with, for example, one or more input/output (I/O) devices 17, a network controller 24 (e.g., wired and/or wireless NIC), and storage 22.
  • the storage 22 may be comprised of any appropriate non-transitory machine-or computer-readable memory type (e.g., flash memory, DRAM, SRAM (static random access memory) , solid state drive (SSD) , hard disk drive (HDD) , optical disk, etc. ) .
  • the storage 22 may include mass storage.
  • the host processor 12 and/or the I/O subsystem 16 may communicate with the storage 22 (all or portions thereof) via a network controller 24.
  • the system 10 may also include a graphics processor 26 (e.g., a graphics processing unit/GPU) and a cryptographic accelerator 27.
  • the cryptographic accelerator 27 may include an ASIC with algorithms specifically tuned for performing cryptographic operations.
  • the host processor 12 and the I/O subsystem 16 may be implemented together on a semiconductor die as a system on chip (SoC) 11, shown encased in a solid line.
  • SoC 11 may therefore operate as a computing apparatus for key management for multi-tenant workload environments.
  • the SoC 11 may also include one or more of the system memory 20, the network controller 24, and/or the graphics processor 26 (shown encased in dotted lines) .
  • the SoC 11 may also include other components of the system 10.
  • the host processor 12 and/or the I/O subsystem 16 may execute program instructions 28 retrieved from the system memory 20 and/or the storage 22 to perform one or more aspects of process 440 (FIG. 4) and/or process 500 (FIG. 5) .
  • the system 10 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4.
  • the system 10 is therefore considered to be performance-enhanced at least to the extent that the technology provides for the secure exchange and use of tenant specific keys within a multi-tenant workload environment.
  • Computer program code to carry out the processes described above may be written in any combination of one or more programming languages, including an object-oriented programming language such as JAVA, JAVASCRIPT, PYTHON, SMALLTALK, C++ or the like and/or conventional procedural programming languages, such as the “C” programming language or similar programming languages, and implemented as program instructions 28.
  • program instructions 28 may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, microprocessor, etc. ) .
  • I/O devices 17 may include one or more of input devices, such as a touch-screen, keyboard, mouse, cursor-control device, microphone, digital camera, video recorder, camcorder, biometric scanners and/or sensors; input devices may be used to enter information and interact with system 10 and/or with other devices.
  • the I/O devices 17 may also include one or more of output devices, such as a display (e.g., touch screen, liquid crystal display/LCD, light emitting diode/LED display, plasma panels, etc. ) , speakers and/or other visual or audio output devices.
  • the input and/or output devices may be used, e.g., to provide a user interface.
  • FIG. 7 shows a block diagram illustrating an example semiconductor apparatus 30 for key management for multi-tenant workload environments according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description.
  • the semiconductor apparatus 30 may be implemented, e.g., as a chip, die, or other semiconductor package.
  • the semiconductor apparatus 30 may include one or more substrates 32 comprised of, e.g., silicon, sapphire, gallium arsenide, etc.
  • the semiconductor apparatus 30 may also include logic 34 comprised of, e.g., transistor array(s) and other integrated circuit (IC) components coupled to the substrate(s) 32.
  • the logic 34 may be implemented at least partly in configurable logic or fixed- functionality logic hardware.
  • the logic 34 may implement the system on chip (SoC) 11 described above with reference to FIG. 6.
  • the logic 34 may implement one or more aspects of the processes described above, including process 440 (FIG. 4) and/or process 500 (FIG. 5) .
  • the logic 34 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4.
  • the apparatus 30 is therefore considered to be performance-enhanced at least to the extent that the technology provides for the secure exchange and use of tenant specific keys within a multi-tenant workload environment.
  • the semiconductor apparatus 30 may be constructed using any appropriate semiconductor manufacturing processes or techniques.
  • the logic 34 may include transistor channel regions that are positioned (e.g., embedded) within the substrate (s) 32. Thus, the interface between the logic 34 and the substrate (s) 32 may not be an abrupt junction.
  • the logic 34 may also be considered to include an epitaxial layer that is grown on an initial wafer of the substrate(s) 32.
  • FIG. 8 is a block diagram illustrating an example processor core 40 according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description.
  • the processor core 40 may be the core for any type of processor, such as a micro-processor, an embedded processor, a digital signal processor (DSP) , a network processor, a graphics processing unit (GPU) , or other device to execute code. Although only one processor core 40 is illustrated in FIG. 8, a processing element may alternatively include more than one of the processor core 40 illustrated in FIG. 8.
  • the processor core 40 may be a single-threaded core or, for at least one embodiment, the processor core 40 may be multithreaded in that it may include more than one hardware thread context (or “logical processor” ) per core.
  • FIG. 8 also illustrates a memory 41 coupled to the processor core 40.
  • the memory 41 may be any of a wide variety of memories (including various layers of memory hierarchy) as are known or otherwise available to those of skill in the art.
  • the memory 41 may include one or more code 42 instruction (s) to be executed by the processor core 40.
  • the code 42 may implement one or more aspects of process 440 (FIG. 4) and/or process 500 (FIG. 5) .
  • the processor core 40 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4.
  • the processor core 40 may follow a program sequence of instructions indicated by the code 42. Each instruction may enter a front end portion 43 and be processed by one or more decoders 44.
  • the decoder 44 may generate as its output a micro operation such as a fixed width micro operation in a predefined format, or may generate other instructions, microinstructions, or control signals which reflect the original code instruction.
  • the illustrated front end portion 43 also includes register renaming logic 46 and scheduling logic 48, which generally allocate resources and queue the operation corresponding to each code instruction for execution.
  • the processor core 40 is shown including execution logic 50 having a set of execution units 55-1 through 55-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function.
  • the illustrated execution logic 50 performs the operations specified by code instructions.
  • back end logic 58 retires the instructions of code 42.
  • the processor core 40 allows out-of-order execution but requires in-order retirement of instructions.
  • Retirement logic 59 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like) . In this manner, the processor core 40 is transformed during execution of the code 42, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 46, and any registers (not shown) modified by the execution logic 50.
  • a processing element may include other elements on chip with the processor core 40.
  • a processing element may include memory control logic along with the processor core 40.
  • the processing element may include I/O control logic and/or may include I/O control logic integrated with memory control logic.
  • the processing element may also include one or more caches.
  • FIG. 9 is a block diagram illustrating an example of a multi-processor based computing system 60 according to one or more embodiments, with reference to components and features described herein including but not limited to the figures and associated description.
  • the multiprocessor system 60 includes a first processing element 70 and a second processing element 80. While two processing elements 70 and 80 are shown, it is to be understood that an embodiment of the system 60 may also include only one such processing element.
  • the system 60 is illustrated as a point-to-point interconnect system, wherein the first processing element 70 and the second processing element 80 are coupled via a point-to-point interconnect 71. It should be understood that any or all of the interconnects illustrated in FIG. 9 may be implemented as a multi-drop bus rather than point-to-point interconnect.
  • each of the processing elements 70 and 80 may be multicore processors, including first and second processor cores (i.e., processor cores 74a and 74b and processor cores 84a and 84b) .
  • processor cores 74a and 74b and processor cores 84a and 84b may be configured to execute instruction code in a manner similar to that discussed above in connection with FIG. 8.
  • Each processing element 70, 80 may include at least one shared cache 99a, 99b.
  • the shared cache 99a, 99b may store data (e.g., instructions) that are utilized by one or more components of the processor, such as the cores 74a, 74b and 84a, 84b, respectively.
  • the shared cache 99a, 99b may locally cache data stored in a memory 62, 63 for faster access by components of the processor.
  • the shared cache 99a, 99b may include one or more mid-level caches, such as level 2 (L2) , level 3 (L3) , level 4 (L4) , or other levels of cache, a last level cache (LLC) , and/or combinations thereof.
  • additional processing elements may be present in a given processor.
  • one or more of the processing elements 70, 80 may be an element other than a processor, such as an accelerator or a field programmable gate array.
  • additional processing element(s) may include additional processor(s) that are the same as a first processor 70, additional processor(s) that are heterogeneous or asymmetric to the first processor 70, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processing element.
  • there can be a variety of differences between the processing elements 70, 80 in terms of a spectrum of metrics of merit including architectural, microarchitectural, thermal, power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processing elements 70, 80.
  • the various processing elements 70, 80 may reside in the same die package.
  • the first processing element 70 may further include memory controller logic (MC) 72 and point-to-point (P-P) interfaces 76 and 78.
  • the second processing element 80 may include a MC 82 and P-P interfaces 86 and 88.
  • MCs 72 and 82 couple the processors to respective memories, namely a memory 62 and a memory 63, which may be portions of main memory locally attached to the respective processors. While the MCs 72 and 82 are illustrated as integrated into the processing elements 70, 80, for alternative embodiments the MC logic may be discrete logic outside the processing elements 70, 80 rather than integrated therein.
  • the first processing element 70 and the second processing element 80 may be coupled to an I/O subsystem 90 via P-P interconnects 76 and 86, respectively.
  • the I/O subsystem 90 includes P-P interfaces 94 and 98.
  • the I/O subsystem 90 includes an interface 92 to couple I/O subsystem 90 with a high performance graphics engine 64.
  • a bus 73 may be used to couple the graphics engine 64 to the I/O subsystem 90.
  • a point-to-point interconnect may couple these components.
  • the I/O subsystem 90 may be coupled to a first bus 65 via an interface 96.
  • the first bus 65 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the embodiments is not so limited.
  • various I/O devices 65a may be coupled to the first bus 65, along with a bus bridge 66 which may couple the first bus 65 to a second bus 67.
  • the second bus 67 may be a low pin count (LPC) bus.
  • Various devices may be coupled to the second bus 67 including, for example, a keyboard/mouse 67a, communication device(s) 67b, and a data storage unit 68 such as a disk drive or other mass storage device which may include code 69, in one embodiment.
  • the illustrated code 69 may implement one or more aspects of the processes described above, including process 440 (FIG. 4) and/or process 500 (FIG. 5).
  • the illustrated code 69 may be similar to the code 42 (FIG. 8) , already discussed.
  • an audio I/O 67c may be coupled to second bus 67 and a battery 61 may supply power to the computing system 60.
  • the system 60 may implement one or more aspects of system 100, system 200, system 300, and/or system 400 as described herein with reference to FIGs. 1-4.
  • instead of the point-to-point interconnect architecture of FIG. 9, a system may implement a multi-drop bus or another such communication topology.
  • the elements of FIG. 9 may alternatively be partitioned using more or fewer integrated chips than shown in FIG. 9.
  • Embodiments of each of the above systems, devices, components and/or methods including the system 10, the semiconductor apparatus 30, the processor core 40, the system 60, system 100, platform 105, system 200, system 300, system 400, platform 405, process 440, process 500, and/or any other system components, may be implemented in hardware, software, or any suitable combination thereof.
  • hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), or fixed-functionality logic hardware using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof.
  • all or portions of the foregoing systems and/or components and/or methods may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., to be executed by a processor or computing device.
  • computer program code to carry out the operations of the components may be written in any combination of one or more operating system (OS) applicable/appropriate programming languages, including an object-oriented programming language such as PYTHON, PERL, JAVA, SMALLTALK, C++, C# or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • Example 1 includes a computing system comprising a workload processor allocable into a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, a cryptographic engine coupled to the workload processor, a memory coupled to the cryptographic engine, the memory to store a key table, a key provisioner coupled to the memory, the key provisioner including one or more substrates and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic to provision a separate tenant key for each workload, each tenant key stored in the key table with an associated unique key handle and a resource identifier for a separate one of the plurality of tenant slices assigned to the respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
  • Example 2 includes the system of Example 1, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.
  • Example 3 includes the system of Example 2, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.
  • Example 4 includes the system of Example 3, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.
  • Example 5 includes the system of Example 2, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair.
  • Example 6 includes the system of any one of Examples 1-5, wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.
  • Example 7 includes a semiconductor apparatus comprising one or more substrates, and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
  • Example 8 includes the semiconductor apparatus of Example 7, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.
  • Example 9 includes the semiconductor apparatus of Example 8, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.
  • Example 10 includes the semiconductor apparatus of Example 9, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.
  • Example 11 includes the semiconductor apparatus of Example 8, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair, and wherein the logic is further to provision, for each workload, a second data key for use in encrypting the respective workload results.
  • Example 12 includes the semiconductor apparatus of any one of Examples 7-11, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.
  • Example 13 includes at least one non-transitory computer readable storage medium comprising a set of instructions which, when executed by a computing system, cause the computing system to provision a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
  • Example 14 includes the at least one non-transitory computer readable storage medium of Example 13, wherein to use the key handle to perform, via the cryptographic engine, the cryptographic operation, the tenant slice is to provide the key handle to the cryptographic engine, the cryptographic engine to access, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.
  • Example 15 includes the at least one non-transitory computer readable storage medium of Example 14, wherein the cryptographic operation is to decrypt, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.
  • Example 16 includes the at least one non-transitory computer readable storage medium of Example 15, wherein the cryptographic engine is to use the unwrapped data key in a second cryptographic operation, the second cryptographic operation to decrypt ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine is to provide the cleartext to the assigned tenant slice.
  • Example 17 includes the at least one non-transitory computer readable storage medium of Example 14, wherein the tenant key comprises a public-private key pair, wherein to provide each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein to access, based on the key handle, the associated tenant key in the key table comprises to access the private key of the key pair.
  • Example 18 includes the at least one non-transitory computer readable storage medium of any one of Examples 13-17, wherein the instructions, when executed, further cause the computing system to provision, for each workload, a second data key for use in encrypting the respective workload results.
  • Example 19 includes a method comprising provisioning a separate tenant key for each of a plurality of workloads, each tenant key stored in a key table with an associated unique key handle and a resource identifier for a separate one of a plurality of tenant slices, each respective tenant slice to execute at least a portion of a separate respective workload, providing each respective tenant key to a respective requestor of each workload, and providing, for each workload, the respective key handle to the assigned tenant slice, the tenant slice using the key handle to perform, via a cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
  • Example 20 includes the method of Example 19, wherein using the key handle to perform, via the cryptographic engine, the cryptographic operation comprises providing the key handle to the cryptographic engine, the cryptographic engine accessing, based on the key handle, the associated tenant key in the key table for use in performing the cryptographic operation.
  • Example 21 includes the method of Example 20, wherein the cryptographic operation comprises decrypting, using the tenant key, the wrapped data key associated with the workload to generate an unwrapped data key.
  • Example 22 includes the method of Example 21, further comprising using, by the cryptographic engine, the unwrapped data key in a second cryptographic operation, wherein the second cryptographic operation includes decrypting ciphertext associated with the respective workload into cleartext, and wherein the cryptographic engine further provides the cleartext to the assigned tenant slice.
  • Example 23 includes the method of Example 20, wherein the tenant key comprises a public-private key pair, wherein providing each respective tenant key to a respective requestor of each workload comprises sending only the public key of the key pair to the workload requestor, and wherein accessing, based on the key handle, the associated tenant key in the key table comprises accessing the private key of the key pair.
  • Example 24 includes the method of any one of Examples 19-23, further comprising provisioning, for each workload, a second data key for use in encrypting the respective workload results.
  • Example 25 includes an apparatus comprising means for performing the method of any one of Examples 19-23.
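  • As a concrete illustration of the key provisioning, handle-based lookup, and data-key unwrapping flow recited in Examples 1-4 and 19-24, a minimal software sketch follows. It assumes Python with the third-party cryptography package for AES key wrap (RFC 3394) and AES-GCM; all names (KeyProvisioner, CryptoEngine, KeyTableEntry, and their methods) are hypothetical, and the hardware key table and cryptographic engine of the embodiments are modeled as ordinary objects, so the sketch is illustrative rather than the claimed implementation.

```python
# Illustrative model only: names are hypothetical and the hardware key table
# and cryptographic engine are modeled as ordinary Python objects.
# Requires the third-party "cryptography" package.
import secrets
from dataclasses import dataclass

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap


@dataclass
class KeyTableEntry:
    tenant_key: bytes   # per-workload key-encryption key
    resource_id: int    # identifies the tenant slice assigned to the workload


class KeyProvisioner:
    """Provisions one tenant key per workload and records it in the key table."""

    def __init__(self) -> None:
        self.key_table: dict[bytes, KeyTableEntry] = {}

    def provision(self, resource_id: int) -> tuple[bytes, bytes]:
        tenant_key = AESGCM.generate_key(bit_length=256)
        handle = secrets.token_bytes(16)  # unique, opaque key handle
        self.key_table[handle] = KeyTableEntry(tenant_key, resource_id)
        # The raw tenant key goes back to the workload requestor; the
        # assigned tenant slice only ever sees the handle.
        return tenant_key, handle


class CryptoEngine:
    """Resolves key handles against the key table; raw keys never leave it."""

    def __init__(self, key_table: dict) -> None:
        self.key_table = key_table

    def unwrap_data_key(self, handle: bytes, wrapped_data_key: bytes) -> bytes:
        entry = self.key_table[handle]  # handle-based lookup (Example 2)
        return aes_key_unwrap(entry.tenant_key, wrapped_data_key)  # Example 3

    def decrypt_workload(self, handle: bytes, wrapped_data_key: bytes,
                         nonce: bytes, ciphertext: bytes) -> bytes:
        data_key = self.unwrap_data_key(handle, wrapped_data_key)
        return AESGCM(data_key).decrypt(nonce, ciphertext, None)  # Example 4


# Requestor side: receive the tenant key, wrap a data key under it, and
# encrypt the workload input under the data key.
provisioner = KeyProvisioner()
tenant_key, handle = provisioner.provision(resource_id=7)
data_key = AESGCM.generate_key(bit_length=256)
wrapped = aes_key_wrap(tenant_key, data_key)
nonce = secrets.token_bytes(12)
ciphertext = AESGCM(data_key).encrypt(nonce, b"tenant workload input", None)

# Tenant slice side: the handle alone (plus the wrapped key and ciphertext)
# suffices to recover the cleartext via the cryptographic engine.
engine = CryptoEngine(provisioner.key_table)
assert engine.decrypt_workload(handle, wrapped, nonce, ciphertext) == b"tenant workload input"
```

  • In the public-private variant of Examples 5, 11, 17, and 23, the table entry would instead hold the private key of an asymmetric pair, with only the public key returned to the workload requestor; the second data key of Examples 6, 18, and 24 for encrypting workload results could be provisioned and used with the same wrap-and-encrypt pattern.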
  • technology described herein improves the performance of computing systems by providing for the secure exchange and use of tenant-specific keys for cryptographic operations required in executing tenant workloads within a multi-tenant workload environment.
  • tenant workloads and data may be securely held within the tenant slice context.
  • the technology may enable GPU/xPU caching schemes that allocate cache lines for L1 and L2 cache based on a tenant-specific context. Respecting tenant boundaries in the cache lines helps mitigate a variety of possible side-channel attacks, as sketched below.
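  • A minimal sketch of such a tenant-partitioned allocation policy follows, assuming a simple set-associative model in which each cache way is statically owned by a single tenant; the class and method names are hypothetical, and a real GPU/xPU cache controller would enforce the policy in hardware.

```python
# Hypothetical model of tenant-aware cache allocation: each way of a
# set-associative cache is statically owned by one tenant, so a miss can
# only fill or evict lines inside the requesting tenant's own partition.

class TenantPartitionedCache:
    def __init__(self, num_sets: int, ways_per_tenant: dict[int, int]) -> None:
        self.num_sets = num_sets
        # Flatten the per-tenant way budget into a fixed owner map,
        # e.g. {0: 2, 1: 2} -> [0, 0, 1, 1].
        self.way_owner: list[int] = []
        for tenant, ways in ways_per_tenant.items():
            self.way_owner.extend([tenant] * ways)
        # sets[set_index][way] holds (tenant, tag) or None.
        self.sets = [[None] * len(self.way_owner) for _ in range(num_sets)]

    def access(self, tenant: int, address: int) -> bool:
        """Returns True on a hit; a miss fills only the tenant's own ways."""
        set_idx = address % self.num_sets
        tag = address // self.num_sets
        my_ways = [w for w, owner in enumerate(self.way_owner) if owner == tenant]
        for w in my_ways:
            if self.sets[set_idx][w] == (tenant, tag):
                return True
        # Prefer an empty way; otherwise evict within the tenant's partition
        # (a real design would use LRU, still confined to that partition).
        victim = next((w for w in my_ways if self.sets[set_idx][w] is None),
                      my_ways[0])
        self.sets[set_idx][victim] = (tenant, tag)
        return False


cache = TenantPartitionedCache(num_sets=64, ways_per_tenant={0: 2, 1: 2})
cache.access(tenant=0, address=0x1000)  # miss: fills one of tenant 0's ways
cache.access(tenant=1, address=0x1000)  # same set, but cannot touch tenant 0's ways
cache.access(tenant=0, address=0x1000)  # hit: tenant 1 could not evict the line
```

  • Because a miss can only fill or evict ways inside the requesting tenant's partition, one tenant's access pattern cannot be observed through evictions of another tenant's lines, which is the mechanism underlying many eviction-based cache side-channel attacks.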
  • Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips.
  • Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like.
  • in the figures described herein, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner.
  • Any represented signal lines may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
  • Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured.
  • well-known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within the purview of one skilled in the art.
  • the term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections.
  • the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
  • a list of items joined by the term “one or more of” may mean any combination of the listed terms.
  • the phrase “one or more of A, B or C” may mean A; B; C; A and B; A and C; B and C; or A, B and C.


Abstract

Technology for protecting tenant workloads and tenant data in a multi-tenant workload environment may include a workload processor allocable into tenant slices, a cryptographic engine, memory storing a key table, and a key provisioner to provision a separate tenant key for each workload, each tenant key stored in the key table with an associated unique key handle and a resource identifier for a separate one of the plurality of tenant slices assigned to the respective workload, provide each respective tenant key to a respective requestor of each workload, and provide, for each workload, the respective key handle to the assigned tenant slice, the tenant slice to use the key handle to perform, via the cryptographic engine, a cryptographic operation on a wrapped data key associated with the workload.
PCT/CN2020/138853 2020-12-24 2020-12-24 Key management for crypto processors attached to other processing units WO2022133860A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2020/138853 WO2022133860A1 (en) Key management for crypto processors attached to other processing units
TW110135370A TW202232354A (zh) Key management techniques for cryptographic processors attached to other processing units
NL2029790A NL2029790B1 (en) 2020-12-24 2021-11-17 Key management for crypto processors attached to other processing units

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/138853 WO2022133860A1 (en) Key management for crypto processors attached to other processing units

Publications (1)

Publication Number Publication Date
WO2022133860A1 (en)

Family

ID=82158586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138853 WO2022133860A1 (en) Key management for crypto processors attached to other processing units

Country Status (3)

Country Link
NL (1) NL2029790B1 (en)
TW (1) TW202232354A (fr)
WO (1) WO2022133860A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10177908B2 (en) * 2016-08-30 2019-01-08 Workday, Inc. Secure storage decryption system
US11397692B2 (en) * 2018-06-29 2022-07-26 Intel Corporation Low overhead integrity protection with high availability for trust domains
US11829517B2 (en) * 2018-12-20 2023-11-28 Intel Corporation Method and apparatus for trust domain creation and destruction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150055780A1 (en) * 2013-08-21 2015-02-26 International Business Machines Corporation Event-driven, asset-centric key management in a smart grid
CN104104513A (zh) * 2014-07-22 2014-10-15 浪潮电子信息产业股份有限公司 Method for secure isolation of multi-tenant data storage in the cloud
CN109643284A (zh) * 2016-09-30 2019-04-16 英特尔公司 Multi-tenant encryption for storage class memory
US20200159676A1 (en) * 2019-06-29 2020-05-21 Intel Corporation Cryptographic computing using encrypted base addresses and used in multi-tenant environments
US20200201789A1 (en) * 2019-06-29 2020-06-25 Intel Corporation Cryptographic computing using encrypted base addresses and used in multi-tenant environments

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116055048A (zh) * 2023-03-31 2023-05-02 成都四方伟业软件股份有限公司 Distributed key storage and restoration method and apparatus
CN116055048B (zh) * 2023-03-31 2023-05-30 成都四方伟业软件股份有限公司 Distributed key storage and restoration method and apparatus

Also Published As

Publication number Publication date
NL2029790B1 (en) 2023-06-16
NL2029790A (en) 2022-07-20
TW202232354A (zh) 2022-08-16

Similar Documents

Publication Publication Date Title
US11088846B2 (en) Key rotating trees with split counters for efficient hardware replay protection
US11159518B2 (en) Container independent secure file system for security application containers
CN107750363B Protecting communications with hardware accelerators to increase workflow security
US8954753B2 (en) Encrypting data in volatile memory
US9678894B2 (en) Cache-less split tracker architecture for replay protection trees
US20210042294A1 (en) Blockchain-based consent management system and method
EP3876095A1 Agile, agnostic container launches through lateral reuse of capabilities in standard runtimes
Arasu et al. A secure coprocessor for database applications
US11144213B2 (en) Providing preferential access to a metadata track in two track writes
US20180349631A1 (en) Efficient and secure sharing of large data repositories
NL2029790B1 (en) Key management for crypto processors attached to other processing units
CN106030602B Virtualization-based intra-block workload isolation
EP4198780A1 (fr) Attestation distribuée dans des grappes informatiques hétérogènes
WO2022001878A1 System-generated data set encryption key
WO2023092320A1 In-memory protection for artificial neural networks
US11907405B2 (en) Secure data storage device access control and sharing
US11874777B2 (en) Secure communication of virtual machine encrypted memory
US20210081332A1 (en) Cache set permutations based on galois field operations
US20220245252A1 (en) Seamless firmware update mechanism
WO2021120092A1 Hardware-based abstraction sharing of hardware devices between computing platforms

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20966433; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20966433; Country of ref document: EP; Kind code of ref document: A1)