CN116719667A - Method for reducing time consumption of GPU (graphics processing unit) for realizing ECC (error correcting code) error correction based on MAC (message authentication code) and hardware structure thereof - Google Patents
- Publication number
- CN116719667A
- Authority
- CN
- China
- Prior art keywords
- mac
- gpu
- error correction
- ecc
- time consumption
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1044—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0618—Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
- H04L9/0631—Substitution permutation network [SPN], i.e. cipher composed of a number of stages or rounds each involving linear and nonlinear transformations, e.g. AES algorithms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/065—Encryption by serially and continuously modifying data stream elements, e.g. stream cipher systems, RC4, SEAL or A5/3
- H04L9/0656—Pseudorandom key sequence combined element-for-element with data sequence, e.g. one-time-pad [OTP] or Vernam's cipher
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for reducing the time a GPU spends performing ECC (error correcting code) error correction based on MACs (message authentication codes), and a hardware structure thereof. When the MEE (memory encryption engine) in a memory controller MC (Memory Controller) detects a read-data error that requires correction, encryption operations in all memory controllers are suspended, and the MAC computation load scheduler MCLS (MAC Computation Load Scheduler) in the GPU distributes the MAC computations required for error correction to the MAC engines in the other memory controllers. Because the GPU contains multiple independent, concurrently operating DRAM channels and corresponding memory controllers, having the MAC engines in these memory controllers perform the required MAC computations concurrently reduces the time the GPU spends on MAC-based ECC error correction. The proposed method is practical.
Description
Technical Field
The invention belongs to the technical fields of computer architecture and integrated circuit design, and particularly relates to a hardware structure and a method for reducing the time a GPU (Graphics Processing Unit) spends performing ECC (Error Correcting Code) error correction based on MACs (Message Authentication Codes).
Background
With the widespread application of GPUs in fields such as computer vision, natural language processing, and high-performance computing, GPU security has attracted broad attention, and there is an urgent practical need to build a GPU trusted execution environment (TEE: Trusted Execution Environment). At present, no commercial GPU product provides a TEE, and some research efforts attempt to construct a GPU TEE through software or software-hardware co-design. The memory encryption engine (MEE: Memory Encryption Engine), an important component of TEE technology, protects the data held in the processor's off-chip main memory, DRAM (Dynamic Random Access Memory). The architecture of a GPU system with integrated MEEs is shown in Fig. 1: the AES (Advanced Encryption Standard) engine encrypts data with the AES algorithm to guarantee data confidentiality, and the MAC engine generates a MAC for the data with a hash algorithm to guarantee data integrity. AES encryption typically uses counter mode, as shown in Fig. 2: each data block in DRAM has a corresponding counter, whose value serves as a leaf node of a BMT (Bonsai Merkle Tree), so that the BMT guarantees data freshness. Memory encryption generates large amounts of security metadata (counter data, BMT data, and MAC data) that is stored in off-chip DRAM, and accessing this metadata consumes substantial memory bandwidth. Although an MEE typically caches security metadata in integrated counter/MAC/BMT caches, these caches have limited capacity because of constraints such as chip area and power consumption, so the heavy volume of security-metadata accesses still severely degrades GPU system performance.
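The counter-mode encryption and MAC roles described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MEE design: HMAC-SHA256 stands in as the pseudorandom function in place of the AES engine (AES is not in the Python standard library), the MAC is a hypothetical 64-bit truncated HMAC over address, counter and ciphertext, the BMT that protects the counters is omitted, and all function names are illustrative.

```python
import hashlib
import hmac


def _prf(key: bytes, msg: bytes) -> bytes:
    # Stand-in PRF: HMAC-SHA256 replaces AES here, purely for illustration.
    return hmac.new(key, msg, hashlib.sha256).digest()


def one_time_pad(key: bytes, address: int, counter: int, length: int) -> bytes:
    """Counter-mode pad derived from (address, counter, pad-block index)."""
    pad = b""
    index = 0
    while len(pad) < length:
        seed = (address.to_bytes(8, "little") + counter.to_bytes(8, "little")
                + index.to_bytes(4, "little"))
        pad += _prf(key, seed)
        index += 1
    return pad[:length]


def ctr_encrypt(key: bytes, address: int, counter: int, data: bytes) -> bytes:
    """XOR data with the pad; applying it again with the same counter decrypts."""
    pad = one_time_pad(key, address, counter, len(data))
    return bytes(d ^ p for d, p in zip(data, pad))


def block_mac(key: bytes, address: int, counter: int, ciphertext: bytes,
              tag_bytes: int = 8) -> bytes:
    """Hypothetical truncated MAC binding address, counter and ciphertext."""
    seed = address.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return _prf(key, seed + ciphertext)[:tag_bytes]


enc_key, mac_key = b"E" * 16, b"M" * 16
address, counter = 0x1000, 7  # in the described design the counter is a BMT leaf
plaintext = bytes(range(64))
ciphertext = ctr_encrypt(enc_key, address, counter, plaintext)
tag = block_mac(mac_key, address, counter, ciphertext)
```

Because the pad depends on the counter, incrementing the counter on every write gives each write a fresh pad; this is why the counters themselves need freshness protection from the BMT.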
Related research (see "Analyzing Secure Memory Architecture for GPUs", "PLUTUS: Bandwidth-Efficient Memory Security for GPUs", etc.) shows that, among the three types of security metadata, MAC data is the largest in volume and consumes the most memory bandwidth. Some research efforts in CPU systems (see "SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories", etc.) store MAC data in ECC DRAM, use the MAC data to detect and correct errors, and exploit the fact that the ECC DRAM channel operates concurrently with the regular-data DRAM channel, so that MAC data accesses do not occupy regular-data DRAM bandwidth; this reduces the impact of memory encryption on processor performance. SECDED (Single-Error Correcting, Double-Error Detecting) is the most commonly used ECC mechanism: it generates 8 bits of ECC data for every 8B of regular data, enabling single-bit error correction and double-bit error detection. When ECC error correction is implemented with MACs, the MAC data can be stored in the ECC memory; the resulting distribution of regular data and security metadata is shown in Fig. 3(a). Fig. 3(b) compares the capability of MAC-based checking, which can detect any number of bit-flip errors and correct them by exhaustive search, with the error correction/detection capability of SECDED ECC. Let N be the number of 8B fields and n the number of flipped bits. The maximum number M of MAC computations required for exhaustive error correction is $M = \binom{64N}{n}$; as n increases from 1 to 3, M grows from 256 to 2,763,520 (consistent with N = 4, i.e., a 32B block of 256 bits). If MAC-based error correction is given the same correction capability as SECDED ECC, i.e., at most one flipped bit is corrected in each 8B field, then $M = \binom{N}{n} \cdot 64^{n}$; when n = 3, M = 1,048,576. When N or n is large, MAC-based error correction therefore requires a large number of MAC computations and consumes a great deal of time.
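The exhaustive search described above can be illustrated with a small Python sketch. It assumes a hypothetical 64-bit MAC (truncated HMAC-SHA256; the text does not fix the MAC function) and implements only the SECDED-like variant, flipping one bit in each of up to `max_fields` distinct 8B fields until the recomputed MAC matches the stored one. The helpers `m_any_bits` and `m_secded_like` evaluate the two formulas for M; all names are illustrative.

```python
import hashlib
import hmac
from itertools import combinations, product
from math import comb

FIELD_BITS = 64  # one 8B field


def mac_tag(key: bytes, data: bytes) -> bytes:
    # Hypothetical 64-bit MAC: truncated HMAC-SHA256.
    return hmac.new(key, data, hashlib.sha256).digest()[:8]


def m_any_bits(n_fields: int, n: int) -> int:
    """Candidates with exactly n flipped bits anywhere in 64*N bits: C(64N, n)."""
    return comb(FIELD_BITS * n_fields, n)


def m_secded_like(n_fields: int, n: int) -> int:
    """Candidates with one flipped bit in each of n distinct fields: C(N, n) * 64**n."""
    return comb(n_fields, n) * FIELD_BITS ** n


def exhaustive_correct(key: bytes, data: bytes, tag: bytes, max_fields: int):
    """Return (corrected bytes or None, number of MACs computed)."""
    macs = 1
    if hmac.compare_digest(mac_tag(key, data), tag):
        return data, macs  # no error
    n_fields = len(data) // 8
    for n in range(1, max_fields + 1):
        for fields in combinations(range(n_fields), n):
            for offsets in product(range(FIELD_BITS), repeat=n):
                cand = bytearray(data)
                for f, o in zip(fields, offsets):
                    bit = f * FIELD_BITS + o
                    cand[bit // 8] ^= 1 << (bit % 8)
                macs += 1
                if hmac.compare_digest(mac_tag(key, bytes(cand)), tag):
                    return bytes(cand), macs
    return None, macs


key = b"K" * 16
good = bytes(range(32))      # N = 4 fields of 8B
tag = mac_tag(key, good)
bad = bytearray(good)
bad[1 * 8] ^= 1 << 5         # one flipped bit in field 1
bad[3 * 8 + 7] ^= 1 << 4     # one flipped bit in field 3
fixed, cost = exhaustive_correct(key, bytes(bad), tag, max_fields=2)
```

The returned `cost` counts MAC evaluations, which is the quantity M bounds; it is what makes the search slow as N and n grow.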
According to publicly available literature, there is currently no prior art addressing how to reduce the time a GPU spends performing MAC-based ECC error correction.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for reducing the time a GPU spends performing MAC-based ECC error correction, and a hardware structure thereof. When the MEE in a memory controller MC (Memory Controller) detects a read-data error that requires correction, encryption operations in all memory controllers are suspended, and the MAC computation load scheduler MCLS (MAC Computation Load Scheduler) in the GPU distributes the MAC computations required for error correction to the MAC engines in the other memory controllers. Because the GPU contains multiple independent, concurrently operating DRAM channels and corresponding memory controllers, having the MAC engines in these memory controllers perform the required MAC computations concurrently reduces the time the GPU spends on MAC-based ECC error correction. The proposed method is practical.
The technical solution adopted by the invention to solve the technical problem comprises the following steps:
step 1: in the GPU, the MAC data generated by memory encryption is stored in ECC DRAM; because the GPU can access MAC data and regular data concurrently, the MEE does not need an integrated MAC cache to cache MAC data;
step 2: when the MEE in a memory controller MC detects a read-data error that requires correction, encryption operations in all memory controllers are suspended;
step 3: the MAC computation load scheduler MCLS in the GPU distributes the MAC computations required for error correction to the MAC engines in the other memory controllers;
step 4: because the GPU contains multiple independent, concurrently operating DRAM channels and corresponding memory controllers, the MAC engines in these memory controllers perform the MAC computations required for error correction concurrently, which reduces the time the GPU spends on MAC-based ECC error correction.
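Steps 3 and 4 can be sketched in Python as a scheduler that splits the candidate search round-robin across `num_mcs * engines_per_mc` MAC engines, with a shared flag so the other engines stop once one finds the correction. This is a hedged illustration, not the patented MCLS: the function names, the round-robin split, the early-stop flag and the truncated HMAC-SHA256 MAC are all assumptions, step 2 (suspending encryption) is not modelled, and Python threads only model the division of work; they do not reproduce the hardware speedup.

```python
import hashlib
import hmac
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, islice, product


def mac_tag(key: bytes, data: bytes) -> bytes:
    # Hypothetical 64-bit MAC: truncated HMAC-SHA256.
    return hmac.new(key, data, hashlib.sha256).digest()[:8]


def candidates(n_fields: int, max_fields: int):
    """Yield bit-position tuples: one flipped bit in each of 1..max_fields fields."""
    for n in range(1, max_fields + 1):
        for fields in combinations(range(n_fields), n):
            for offsets in product(range(64), repeat=n):
                yield tuple(f * 64 + o for f, o in zip(fields, offsets))


def engine_search(key, data, tag, max_fields, engine_id, num_engines, found):
    """One MAC engine tests every num_engines-th candidate (its round-robin share)."""
    share = islice(candidates(len(data) // 8, max_fields), engine_id, None, num_engines)
    for bits in share:
        if found.is_set():  # another engine has already corrected the block
            return None
        cand = bytearray(data)
        for b in bits:
            cand[b // 8] ^= 1 << (b % 8)
        if hmac.compare_digest(mac_tag(key, bytes(cand)), tag):
            found.set()
            return bytes(cand)
    return None


def mcls_correct(key, data, tag, max_fields, num_mcs, engines_per_mc):
    """Hypothetical MCLS: distribute the search over all MAC engines in all MCs."""
    num_engines = num_mcs * engines_per_mc
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        jobs = [pool.submit(engine_search, key, data, tag, max_fields, e, num_engines, found)
                for e in range(num_engines)]
        results = [job.result() for job in jobs]
    return next((r for r in results if r is not None), None)
```

Each engine handles about 1/(K x N) of the candidates, which is the source of the reduction claimed in step 4.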
A hardware structure for implementing the method for reducing the time a GPU spends on MAC-based ECC error correction comprises N DRAMs, N MPs (memory partitions), one MCLS, and L SMs (streaming multiprocessors);
each MP comprises an MC and an L2 cache; the N DRAMs are connected one-to-one to the N MPs, each DRAM being connected to the MC of its MP; the MC of each MP is connected to the MCLS;
the L2 cache of each MP is connected to the interconnection network bus;
the L SMs are all connected to the interconnection network bus.
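The wiring above can be captured in a small, hypothetical Python data model: N MPs each own one MC and are tied to one DRAM, a single MCLS holds references to every MC, and L SMs are represented only by a count. The per-MC engine count K and the `end_correction` resume step are assumptions (the text only describes suspension); L2 caches and the interconnection network are not modelled beyond this description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MC:
    """Memory controller containing an MEE with K MAC engines (K is hypothetical)."""
    mac_engines: int = 1
    encryption_suspended: bool = False


@dataclass
class MP:
    """Memory partition wired to one DRAM; its L2 cache slice is not modelled."""
    dram_id: int
    mc: MC = field(default_factory=MC)


@dataclass
class MCLS:
    """MAC computation load scheduler, connected to the MC of every MP."""
    mcs: List[MC]

    def begin_correction(self) -> int:
        """Suspend encryption in all MCs; return the number of usable MAC engines."""
        for mc in self.mcs:
            mc.encryption_suspended = True
        return sum(mc.mac_engines for mc in self.mcs)

    def end_correction(self) -> None:
        # Assumption: encryption resumes once correction finishes.
        for mc in self.mcs:
            mc.encryption_suspended = False


@dataclass
class GPU:
    """N DRAMs / N MPs, one MCLS, and L SMs on the interconnection network."""
    mps: List[MP]
    mcls: MCLS
    num_sms: int


def build_gpu(num_partitions: int, engines_per_mc: int, num_sms: int) -> GPU:
    mps = [MP(dram_id=i, mc=MC(mac_engines=engines_per_mc)) for i in range(num_partitions)]
    return GPU(mps=mps, mcls=MCLS([mp.mc for mp in mps]), num_sms=num_sms)
```

Keeping the MCLS as the only component that sees every MC mirrors the described topology: no MC needs to know about its peers.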
The beneficial effects of the invention are as follows:
in the prior art, if MAC-based ECC checking has the same error correction capability as SECDED ECC, and the number of 8B fields and the number of flipped bits are N and n respectively, the maximum number M of MAC computations required for exhaustive error correction is $M = \binom{N}{n} \cdot 64^{n}$. Because a MAC check that detects a data error cannot determine how many bits have flipped, the maximum number of MAC computations required to complete error correction when up to P flipped bits are supported must be revised to $M = \sum_{n=1}^{P} \binom{N}{n} \cdot 64^{n}$. Taking a 128B GPU L2 cache line as an example (N = 16), if the MEE integrates a single pipelined MAC engine operating at 1 GHz, the longest time T required for MAC-based ECC error correction is about 0.49 ms, 147.29 ms, and 30681.83 ms for P = 2, 3, and 4, respectively. With the invention, when the MEE in one MC detects an error and performs correction, the MCLS distributes the MAC computation load to the MAC engines in the other MCs. Assuming the GPU has N MCs (here N counts memory controllers, as in the hardware structure, not 8B fields) and each MC has K MAC engines, the number of MAC computations per engine, and hence T, can be reduced to 1/(K × N) of the original values, greatly reducing the time required for MAC-based ECC error correction.
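The timing figures above can be checked with a short Python calculation. It assumes, as the figures imply, that a pipelined MAC engine completes one MAC per cycle, so T = M / (f x engines). The 8-MC, 2-engines-per-MC configuration in the loop is purely hypothetical and only shows the 1/(K x N) scaling.

```python
from math import comb


def worst_case_macs(n_fields: int, max_flips: int) -> int:
    """M = sum over n = 1..P of C(N, n) * 64**n (one flipped bit per flipped 8B field)."""
    return sum(comb(n_fields, n) * 64 ** n for n in range(1, max_flips + 1))


def worst_case_ms(n_fields: int, max_flips: int, freq_hz: float = 1e9,
                  engines: int = 1) -> float:
    """T in milliseconds, assuming each pipelined engine finishes one MAC per cycle."""
    return worst_case_macs(n_fields, max_flips) / (freq_hz * engines) * 1e3


for p in (2, 3, 4):
    single = worst_case_ms(16, p)                 # one MAC engine, N = 16 fields
    spread = worst_case_ms(16, p, engines=8 * 2)  # hypothetical: 8 MCs x 2 engines
    print(f"P={p}: {single:.2f} ms -> {spread:.2f} ms")
```

With one engine this reproduces the 0.49 ms, 147.29 ms and 30681.83 ms values, which confirms the one-MAC-per-cycle reading of the text.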
It should be noted that when the number P of supported error-correction bits is large, the time required for error correction remains very large even when the MAC engines in multiple MCs are used, and MAC-based ECC checking on the GPU is then impractical. Fortunately, the probability of bit flips occurring in multiple 8B fields simultaneously is very low, as related research has also shown, so P can be set to a small value, which makes the proposed method practical.
Drawings
Fig. 1 is a schematic diagram of a GPU system architecture with integrated MEEs.
Fig. 2 is a schematic diagram of counter-mode encryption.
Fig. 3 is a schematic diagram of the data storage distribution and the error correction/detection capability of MAC-based ECC checking.
Fig. 4 is a schematic diagram of the GPU system of the invention with inter-MC scheduling and distribution of the MAC computation load.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention mainly relates to a GPU memory encryption engine architecture that implements ECC checking based on MACs; it uses the multiple memory encryption engines in the GPU to perform MAC computations concurrently, reducing the time required for MAC-based ECC error correction.
Storing MAC data in ECC DRAM allows MAC data and regular data to be accessed concurrently, which reduces the negative impact of memory encryption on GPU system performance. ECC error correction and detection can be implemented by MAC-based exhaustive search, but this requires considerable error-correction time, especially when multiple bit flips occur. The invention provides a method for reducing the time a GPU spends on MAC-based ECC error correction, together with a hardware architecture, thereby improving GPU system performance.
The invention provides a method for reducing the time a GPU spends on MAC-based ECC error correction; the corresponding GPU system architecture is shown in Fig. 4. The MAC data generated by memory encryption is stored in ECC DRAM, and because MAC data and regular data can be accessed concurrently, the MEE does not need an integrated MAC cache to cache MAC data. When the MEE in a memory controller MC detects a read-data error that requires correction, encryption operations in all memory controllers are suspended, and the MAC computation load scheduler MCLS distributes the MAC computations required for error correction to the MAC engines in the other memory controllers. Because the GPU contains multiple independent, concurrently operating DRAM channels and corresponding memory controllers, having the MAC engines in these memory controllers perform the required MAC computations concurrently significantly reduces the time the GPU spends on MAC-based ECC error correction.
A GPU hardware structure for reducing the time a GPU spends on MAC-based ECC error correction comprises N DRAMs, N MPs, one MCLS, and L SMs;
each MP comprises an MC and an L2 cache; the N DRAMs are connected one-to-one to the N MPs, each DRAM being connected to the MC of its MP; the MC of each MP is connected to the MCLS;
the L2 cache of each MP is connected to the interconnection network bus;
the L SMs are all connected to the interconnection network bus.
Claims (2)
1. A method for reducing the time a GPU spends on MAC-based ECC error correction, characterized by comprising the following steps:
step 1: in the GPU, the MAC data generated by memory encryption is stored in ECC DRAM; because the GPU can access MAC data and regular data concurrently, the MEE does not need an integrated MAC cache to cache MAC data;
step 2: when the MEE in a memory controller MC detects a read-data error that requires correction, encryption operations in all memory controllers are suspended;
step 3: the MAC computation load scheduler MCLS in the GPU distributes the MAC computations required for error correction to the MAC engines in the other memory controllers;
step 4: because the GPU contains multiple independent, concurrently operating DRAM channels and corresponding memory controllers, the MAC engines in these memory controllers perform the MAC computations required for error correction concurrently, which reduces the time the GPU spends on MAC-based ECC error correction.
2. A hardware structure for implementing the method of claim 1, comprising N DRAMs, N MPs, one MCLS, and L SMs;
each MP comprises an MC and an L2 cache; the N DRAMs are connected one-to-one to the N MPs, each DRAM being connected to the MC of its MP; the MC of each MP is connected to the MCLS;
the L2 cache of each MP is connected to the interconnection network bus;
the L SMs are all connected to the interconnection network bus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310457019.8A CN116719667A (en) | 2023-04-25 | 2023-04-25 | Method for reducing time consumption of GPU (graphics processing unit) for realizing ECC (error correcting code) error correction based on MAC (message authentication code) and hardware structure thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310457019.8A CN116719667A (en) | 2023-04-25 | 2023-04-25 | Method for reducing time consumption of GPU (graphics processing unit) for realizing ECC (error correcting code) error correction based on MAC (message authentication code) and hardware structure thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116719667A | 2023-09-08 |
Family ID: 87874110
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310457019.8A Pending CN116719667A (en) | 2023-04-25 | 2023-04-25 | Method for reducing time consumption of GPU (graphics processing unit) for realizing ECC (error correcting code) error correction based on MAC (message authentication code) and hardware structure thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116719667A (en) |
- 2023-04-25 CN CN202310457019.8A (CN116719667A), Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||