WO2022056779A1 - Improving system memory access performance using high performance memory - Google Patents


Info

Publication number
WO2022056779A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
high performance
processing system
location information
tables
Prior art date
Application number
PCT/CN2020/115898
Other languages
French (fr)
Inventor
Tao Xu
Yufu Li
Lei Zhu
Shijie Liu
Original Assignee
Intel Corporation
Priority date
Filing date
Publication date
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to CN202080103259.2A priority Critical patent/CN115885267A/en
Priority to PCT/CN2020/115898 priority patent/WO2022056779A1/en
Publication of WO2022056779A1 publication Critical patent/WO2022056779A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1694Configuration of memory controller to different memory types
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1657Access to multiple memories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/4401Bootstrapping

Definitions

  • a processing system may include hardware and software components.
  • the software components may include one or more applications, an operating system (OS) , and firmware.
  • the applications may include control logic for performing the work that is of value to the user of the processing system.
  • the applications run on top of the OS, which runs at a lower logical level than the applications (i.e., closer to the hardware) to provide an underlying environment or abstraction layer that makes it easier to create and execute the applications.
  • the firmware runs at an even lower logical level to provide an underlying environment or abstraction layer that makes it easier to create and execute the OS.
  • the firmware may establish a basic input/output system (BIOS), and the OS may use that BIOS to communicate with different hardware components within the processing system.
  • the OS and the applications execute out of random-access memory (RAM) , which is volatile. Some or all of the firmware may also execute out of RAM. However, since the RAM is volatile, the environment for performing useful work basically disappears whenever the processing system is turned off. Consequently, whenever the processing system is turned on, the processing system should recreate that environment before useful work can be performed.
  • the operations for preparing a processing system to execute an OS may be referred to as the “boot process. ”
  • the time that elapses during the boot process may be referred to as the “boot time. ”
  • FIG. 1 depicts an illustration of a processing system for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • FIG. 2 is a block diagram illustrating a processing system for improving system memory access performance using high performance memory, according to implementations of the disclosure.
  • FIG. 3 is a flow schematic depicting a boot process for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • FIG. 4 illustrates an example flow for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • FIG. 5 illustrates another example flow for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • FIG. 6 is a schematic diagram of an illustrative electronic computing device to enable improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • Embodiments described herein are directed to improving system memory access performance using high performance memory.
  • the processing system may execute a boot process before the processing system can be utilized for work.
  • the operations for preparing a processing system to execute an operating system (OS) may be referred to as the “boot process. ”
  • the time that elapses during the boot process may be referred to as the “boot time. ”
  • the control logic or firmware that performs or controls the boot process may be referred to as the “system firmware, ” the “system bootcode, ” the “platform bootcode, ” or simply the “bootcode. ”
  • the boot process may include a memory training phase that runs memory training code, such as memory reference code (MRC).
  • MRC uses a memory controller to test the memory bus and adjust timing and voltage reference (Vref) settings to determine margins for each channel of memory modules of the system.
  • Memory training data is generated based on the system’s motherboard hardware and memory modules. As a result, the memory training phase cannot simply be skipped in order to reduce the overall boot process time.
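The margin-centering idea behind such training can be sketched as follows; the pass/fail bus test and the Vref code range below are simulated stand-ins, since real MRC is silicon-specific firmware:

```python
# Hypothetical sketch of the margining idea behind memory reference code (MRC):
# sweep a reference-voltage setting, record which settings pass a bus test,
# and centre the final setting in the passing window. The pass window here
# is simulated; real training operates on memory-controller registers.

def train_vref(passes_test, settings=range(0, 64)):
    """Return the centre of the largest contiguous window of passing settings."""
    window, best = [], []
    for s in settings:
        if passes_test(s):
            window.append(s)
            if len(window) > len(best):
                best = window[:]
        else:
            window = []
    if not best:
        raise RuntimeError("memory training failed: no passing Vref window")
    return best[len(best) // 2]  # centre of the window -> maximum margin

# Simulated channel that passes for Vref codes 20..40; centred result is 30.
vref = train_vref(lambda s: 20 <= s <= 40)
```

Centering in the passing window maximizes the margin to either failing edge, which is why the phase must be rerun whenever the board or modules change.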
  • boot time increases with memory module capacity, such as dual in-line memory module (DIMM) capacity, as the memory training time and memory test time are directly proportional to the memory module (e.g., DIMM) size. This occurs in the conventional dual data rate 5 (DDR5) server platform.
  • 5-level paging may refer to a processor extension to extend the size of virtual addresses from 48 bits to 57 bits, increasing the addressable virtual memory from 256 TiB to 128 PiB.
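The address-space figures quoted above can be checked arithmetically:

```python
# Arithmetic behind the 4-level vs 5-level paging figures: 48-bit virtual
# addresses cover 2**48 bytes (256 TiB), 57-bit addresses cover 2**57 bytes
# (128 PiB).

TiB = 2 ** 40
PiB = 2 ** 50

assert 2 ** 48 == 256 * TiB   # 4-level paging: 48-bit virtual addresses
assert 2 ** 57 == 128 * PiB   # 5-level paging: 57-bit virtual addresses

# Each additional paging level contributes 9 bits of index (512 entries
# per 4 KiB table of 8-byte entries), which is where the extra 9 bits come from:
assert 57 - 48 == 9 and 2 ** 9 == 512
```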
  • a translation lookaside buffer (TLB) is a memory cache that stores recent translations of virtual memory to physical addresses for faster retrieval. Once a virtual address has been translated into the corresponding real address, the results of that lookup are cached in the TLB to provide a 'fast path' lookup on subsequent accesses.
  • a TLB “miss” means that the association has not yet been cached, so the “long form” lookup is used: the processing system translates the virtual address to a physical address by parsing each level of the page tables.
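The TLB’s role as a translation cache can be sketched with a small lookup structure; the bounded capacity and least-recently-used eviction below are illustrative assumptions, not the patent’s design:

```python
# Minimal sketch of a TLB as a virtual-page -> physical-frame cache.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame

    def lookup(self, vpn):
        if vpn in self.entries:            # TLB hit: fast path
            self.entries.move_to_end(vpn)  # keep recently used entries warm
            return self.entries[vpn]
        return None                        # TLB miss: caller must walk page tables

    def fill(self, vpn, pfn):
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

tlb = TLB()
assert tlb.lookup(0x1234) is None    # miss: translation not yet cached
tlb.fill(0x1234, 0x9ABC)             # cache the translation after the walk
assert tlb.lookup(0x1234) == 0x9ABC  # hit on subsequent access
```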
  • page tables used for translation of virtual addresses to physical addresses when a TLB miss occurs are maintained in memory modules such as DDR5 memory of DIMM modules.
  • the page tables are 5-level page tables, which results in five memory reads when a TLB miss occurs and a total of six memory reads to obtain the memory data.
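The read counts follow from the table walk itself; a minimal sketch, using x86-64-style 9-bit indices per level, with dicts standing in for physical memory:

```python
# Sketch of why a TLB miss under 5-level paging costs five memory reads
# before a sixth read fetches the data itself. The layout (9 index bits
# per level, 12-bit page offset) matches x86-64 5-level paging.

reads = {"count": 0}

def read_entry(table, index):
    reads["count"] += 1          # every table lookup is one memory read
    return table[index]

def walk(root, vaddr):
    """Translate a 57-bit virtual address via 5 levels of page tables."""
    node = root
    for level in range(5):       # PML5 -> PML4 -> PDPT -> PD -> PT
        shift = 48 - 9 * level   # index bits: 56:48, 47:39, 38:30, 29:21, 20:12
        index = (vaddr >> shift) & 0x1FF
        node = read_entry(node, index)
    return node | (vaddr & 0xFFF)   # frame base + page offset

# Build a one-path table hierarchy mapping virtual address 0 to frame 0x7000.
pt = {0: 0x7000}
root = {0: {0: {0: {0: pt}}}}
paddr = walk(root, 0x0ABC)
assert reads["count"] == 5    # five reads for the walk...
assert paddr == 0x7ABC        # ...then a sixth read would fetch the data
```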
  • high-performance requirement software such as, for example, artificial intelligence (AI) training, virtualization systems, OS kernels, multi-thread shared data, cloud services, and enterprise applications/databases, currently runs on typical DDR memory in conventional processing systems. This impacts the performance of such high performance requirement software, and of the underlying processing system itself, when the software places a strain on the processing system due to its high performance requirements.
  • the high performance memory may refer to, for example, High Bandwidth Memory (HBM) , double data rate (DDR) memory modules, graphics DDR (GDDR) memory modules, a combination of DDRs and/or GDDRs, or any other memory providing high-performance bandwidth while using reduced power in a smaller form factor.
  • HBM may be integrated into a CPU package of a processing system and can provide improved memory bandwidth. In some cases, the HBM bandwidth can be 20 times that of conventional DDR5 bandwidth.
  • HBM has two primary usages: it can operate in “flat mode” as system-addressable memory, or it can be used as a cache for DDR5 far memory.
  • the high performance memory is initialized and established as system memory during a boot process of the processing system, and then utilized to host the system page tables and/or any high performance requirement software.
  • the boot phase of the boot process coordinates with an OS of the system to provide location and/or bandwidth information of the initialized high performance memory so that the OS can place page tables and/or high performance requirement software in the high performance memory.
  • the location and/or bandwidth information is provided during the boot process to enable the OS to establish page tables in the high bandwidth memory during system initialization.
  • the boot process creates one or more tables to store the location and bandwidth information of the high performance memory, where the one or more tables are reported to the OS during the boot process.
  • implementations of the disclosure improve processing system performance by improving processor virtual address to physical address translation efficiency, improving memory module access (such as DDR memory access, including DDR5 memory), and/or improving high performance requirement software performance (such as cloud services, AI training and inference, and virtual machine (VM) performance).
  • FIG. 1 depicts an illustration of a processing system 100 for improving system memory access performance using high performance memory, according to some embodiments.
  • processing system 100 may be embodied as and/or may include any number and type of hardware and/or software components, such as (without limitation) a processor, including but not limited to, a central processing unit ( “CPU” or simply “application processor” ) , a graphics processing unit ( “GPU” or simply “graphics processor” ) , and so on.
  • Processing system 100 may also include components such as drivers (also referred to as “driver logic” , user-mode driver (UMD) , UMD, user-mode driver framework (UMDF) , UMDF, “GPU driver” , “graphics driver logic” , or simply “driver” ) , memory, network devices, or the like, as well as input/output (I/O) sources, such as touchscreens, touch panels, touch pads, virtual or regular keyboards, virtual or regular mice, ports, connectors, etc.
  • processing system 100 may include or enable operation of an operating system (OS) serving as an interface between hardware and/or physical resources of the processing system 100 and a user.
  • processing system 100 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances.
  • Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC) , and/or a field programmable gate array (FPGA) .
  • the terms "logic” , “module” , “component” , “engine” , and “mechanism” may include, by way of example, software or hardware and/or combinations of software and hardware.
  • processing system 100 may be part of a communication and data processing device including (but not limited to) smart wearable devices, smartphones, virtual reality (VR) devices, head-mounted display (HMDs) , mobile computers, Internet of Things (IoT) devices, laptop computers, desktop computers, server computers, etc.
  • Processing system 100 may further be a part of and/or assist in the operation of (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electromechanical agent or machine, etc.
  • autonomous machines or artificially intelligent agents may include (without limitation) robots, autonomous vehicles (e.g., self-driving cars, self-flying planes, self-sailing boats, etc. ) , autonomous equipment (self-operating construction vehicles, self-operating medical equipment, etc. ) , and/or the like.
  • “computing device” may be interchangeably referred to as “autonomous machine” or “artificially intelligent agent” or simply “robot” .
  • while “autonomous vehicle” and “autonomous driving” may be referenced in this document, embodiments are not limited as such.
  • an autonomous vehicle is not limited to an automobile but may include any number and type of autonomous machines, such as robots, autonomous equipment, household autonomous devices, and/or the like, and any one or more tasks or operations relating to such autonomous machines may be interchangeably referenced with autonomous driving.
  • An autonomous vehicle may refer to a vehicle that can drive itself from a starting point to a predetermined destination in “autopilot” mode using various in-vehicle technologies and sensors.
  • Processing system 100 may further include (without limitations) large computing systems, such as server computers, desktop computers, etc., and may further include set-top boxes (e.g., Internet-based cable television set-top boxes, etc. ) , global positioning system (GPS) -based devices, etc.
  • Processing system 100 may include mobile computing devices serving as communication devices, such as cellular phones including smartphones, personal digital assistants (PDAs) , tablet computers, laptop computers, e-readers, smart televisions, television platforms, wearable devices (e.g., glasses, watches, bracelets, smartcards, jewelry, clothing items, etc. ) , media players, etc.
  • processing system 100 may include a mobile computing device employing a computer platform hosting an integrated circuit ( “IC” ) , such as system on a chip ( “SoC” or “SOC” ) , integrating various hardware and/or software components of processing system 100 on a single chip.
  • Processing system 100 may host network interface(s) (not shown) to provide access to a network, such as a LAN, a wide area network (WAN), a metropolitan area network (MAN), a personal area network (PAN), Bluetooth, a cloud network, a mobile network (e.g., 3rd Generation (3G), 4th Generation (4G), etc.), an intranet, the Internet, etc.
  • Network interface(s) may include, for example, a wireless network interface having an antenna, which may represent one or more antennae.
  • Network interface (s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
  • Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein.
  • a machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories) , and magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories) , EEPROMs (Electrically Erasable Programmable Read Only Memories) , magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
  • embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection) .
  • the term “user” may be interchangeably referred to as “viewer”, “observer”, “person”, “individual”, “end-user”, and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit”, “graphics processor”, or simply “GPU” and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit”, “application processor”, or simply “CPU”.
  • processing system 100 can include a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors or processor cores.
  • the processing system 100 can be a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices such as within Internet-of-things (IoT) devices with wired or wireless connectivity to a local or wide area network.
  • processing system 100 may couple with, or be integrated within: a server-based gaming platform; a game console, including a game and media console; a mobile gaming console, a handheld game console, or an online game console.
  • the processing system 100 is part of a mobile phone, smart phone, tablet computing device or mobile Internet-connected device such as a laptop with low internal storage capacity.
  • Processing system 100 can also include, couple with, or be integrated within: a wearable device, such as a smart watch wearable device; smart eyewear or clothing enhanced with augmented reality (AR) or virtual reality (VR) features to provide visual, audio or tactile outputs to supplement real world visual, audio or tactile experiences or otherwise provide text, audio, graphics, video, holographic images or video, or tactile feedback; other augmented reality (AR) device; or other virtual reality (VR) device.
  • processing system 100 includes or is part of a television or set top box device.
  • processing system 100 can include, couple with, or be integrated within a self-driving vehicle such as a bus, tractor trailer, car, motor or electric power cycle, plane or glider (or any combination thereof) .
  • the self-driving vehicle may use processing system 100 to process the environment sensed around the vehicle.
  • the processing system 100 includes one or more processors, such as a CPU (e.g. CPU 110) or GPU, which each include one or more processor cores to process instructions which, when executed, perform operations for system or user software.
  • at least one of the one or more processor cores is configured to process a specific instruction set.
  • instruction set may facilitate Complex Instruction Set Computing (CISC) , Reduced Instruction Set Computing (RISC) , or computing via a Very Long Instruction Word (VLIW) .
  • processor cores may process a different instruction set which may include instructions to facilitate the emulation of other instruction sets.
  • Processor core may also include other processing devices, such as a Digital Signal Processor (DSP) .
  • processing system 100 provides for improving system memory access performance using high performance memory.
  • processing system 100 may include, but is not limited to, a processor, such as CPU 110, host-attached memory modules such as DIMM 120, host-attached high performance memory 130, and a memory device 140. More or fewer components than those illustrated in FIG. 1 may be included in processing system 100.
  • the components of processing system 100 may be connected by way of a system bus or other electrical communication path (not shown) .
  • the components of processing system 100 are operative to provide improved system memory access performance using high performance memory.
  • processing system 100 may execute a boot process before the processing system 100 can be utilized for work.
  • a boot process or booting is the process of starting a computer or processing system.
  • a boot process can be initiated by hardware such as a button press, or by a software command. After it is switched on, a computer's central processing unit (CPU) has no software in its main memory, so some process should load software into memory before it can be executed.
  • the boot process may refer to a warm boot or a cold boot.
  • a warm boot (also called a “soft boot”) is the process of restarting a computer. It may be used in contrast to a cold boot, which refers to starting up a computer that has been turned off.
  • memory device 140 such as ROM or a flash memory such as non-volatile random access memory (NVRAM) , may store platform initialization firmware 145 that includes program code containing the basic routines that help to start up the processing system 100 and to transfer information between elements within the processing system 100.
  • platform initialization firmware 145 may include firmware that is compatible with the Extensible Firmware Interface (EFI) specification, the extension of the EFI interface referred to as Unified Extensible Firmware Interface (UEFI) , or any other interface between the OS and the system firmware used for platform initialization.
  • the boot process provided by platform initialization firmware 145 may include a memory training phase that runs memory training code, such as memory reference code (MRC).
  • DIMM 120 may include DDR5 memory.
  • CPU 110 may include a plurality of memory controllers including MC0 115A, MC1 115B, MC2 115C, and MC3 115D to manage input and output with the DIMM 120.
  • High performance memory 130 may refer to HBM or any other memory providing high-performance bandwidth while using reduced power in a smaller form factor.
  • the high performance memory may refer to, for example, High Bandwidth Memory (HBM) , double data rate (DDR) memory modules, graphics DDR (GDDR) memory modules, a combination of DDRs and/or GDDRs, or any other memory providing high-performance bandwidth while using reduced power in a smaller form factor.
  • HBM achieves higher bandwidth while using less power in a smaller form factor than DDR4 or GDDR5.
  • HBM achieves this by stacking multiple dynamic random access memory (DRAM) dies on a base die (often a silicon interposer), with the dies interconnected using through-silicon vias (TSVs).
  • page tables used for translation of virtual addresses to physical addresses when a TLB miss occurs are conventionally maintained in memory modules, such as DDR5 memory of DIMM modules.
  • the page tables are 5-level page tables, which results in five memory reads when a TLB miss occurs and a total of six memory reads to obtain the memory data.
  • high-performance requirement software such as, for example, AI training, virtualization systems, OS kernels, multi-thread shared data, cloud services, and enterprise applications/databases, currently runs on typical DDR memory in conventional processing systems. This impacts the performance of such high performance requirement software, and of the underlying processing system itself, when the software places a strain on the processing system due to its high performance requirements.
  • Implementations of the disclosure provide a solution to significantly improve CPU virtual address to physical address translation efficiency, and the performance of high-performance requirement software, by utilizing high performance memory 130.
  • the high performance memory 130 is initialized and established as system memory during a boot process of the processing system 100.
  • platform initialization firmware 145 may initiate and execute the boot process of the processing system 100
  • the boot process coordinates with the OS of the processing system 100 to enable the high performance memory 130 to be utilized to host page tables and/or any high performance requirement software for the processing system 100. Further details of the utilization of high performance memory 130 as system memory for page tables and/or high performance requirement software of the processing system 100 are described below with respect to FIG. 2.
  • FIG. 2 is a block diagram illustrating a memory access flow 200 using page tables maintained in high performance memory, according to implementations of the disclosure.
  • processing system 100 described with respect to FIG. 1 may perform flow 200. It is contemplated that embodiments are not limited to any particular implementation of flow 200 and that one or more of its components and/or processes may be variously implemented in implementations of the disclosure.
  • flow 200 may be performed subsequent to performance of a boot process by a processing system, where the boot process initializes high performance memory as system memory for the processing system.
  • Flow 200 may also be performed subsequent to coordination between the boot process and an OS of the processing system to cause page tables and/or high performance requirement software to be placed in the high performance memory, as detailed further below with respect to FIG. 3.
  • Flow 200 depicts a CPU core 210 requesting to access a target page 265 in DDR5 memory 260 using virtual address 230.
  • while DDR5 memory 260 is depicted in flow 200, other types of memory modules may also be implemented for storage of target page 265 in implementations of the disclosure.
  • the CPU core 210 passes the virtual address 230 to the TLB 220 in order to determine whether the address translation for the virtual address 230 is cached in the TLB 220. If a TLB hit occurs, then the cached address translation for the virtual address 230 in the TLB 220 is used to obtain a physical address to access the target page 265 in DDR5 memory 260.
  • when a TLB miss occurs, the CR3 register 240, which holds the base address of the top-level page table, is used in performing a page table lookup to translate the virtual address 230 to its corresponding physical address.
  • the boot process of the system and corresponding OS coordination have caused a 5-level page table, including level 1-5 page tables 251-255, to be stored in high performance memory 250.
  • the translation of virtual address 230 accesses the level 1-5 page tables 251-255 using five read accesses to obtain the translation to the physical address. The resulting physical address is used for a sixth read to access the target page 265 from DDR5 memory 260.
  • while a 5-level page table 251-255 is depicted in flow 200, other types of page tables and levels of page tables may also be implemented in implementations of the disclosure.
  • memory accesses to the page tables 251-255 stored in high performance memory 250 can improve system performance significantly as compared to conventional storage of page tables in DDR5 memory 260.
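A back-of-envelope model of that improvement, with assumed per-read latencies (illustrative numbers, not measured figures from the patent):

```python
# Model of the benefit of hosting page tables in high performance memory:
# the five walk reads hit the fast memory, and only the sixth (data) read
# goes to DDR5. Latency values are illustrative assumptions.

DDR_READ_NS = 100   # assumed DDR5 read latency
HPM_READ_NS = 40    # assumed high performance memory read latency

walk_reads, data_reads = 5, 1

all_in_ddr    = (walk_reads + data_reads) * DDR_READ_NS
tables_in_hpm = walk_reads * HPM_READ_NS + data_reads * DDR_READ_NS

assert all_in_ddr == 600
assert tables_in_hpm == 300
assert tables_in_hpm < all_in_ddr   # TLB-miss cost drops when tables move
```

Because five of the six reads on a miss are table reads, the walk dominates the miss cost, which is why relocating only the tables already helps.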
  • the boot process and OS coordination also enable the placement of high performance requirement software 256 in the high performance memory 250.
  • memory access to high performance memory 250 associated with the high performance requirement software 256 is improved, resulting in improved system performance of the underlying processing system.
  • FIG. 3 is a flow schematic depicting a boot process 300 for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • boot process 300 may include multiple phases including, but not limited to, a BIOS boot phase 310 and an OS runtime phase 330.
  • processing system 100 described with respect to FIG. 1 may perform the boot process 300 of FIG. 3.
  • a power on (or reset) 305 of a processing system may trigger the boot process 300.
  • pre-memory silicon initialization 315 is performed.
  • a memory initialization 320 portion of the BIOS boot phase is performed.
  • the memory initialization 320 includes initialization and training of DDR memory 322, such as DDR5 and/or DDRT memory, for example.
  • the memory initialization 320 also includes initialization and training of high performance memory 324, such as HBM, for example.
  • the memory initialization 320 then installs 326 any discovered memory, such as the high performance memory.
  • the high performance memory is made available at 326 as system memory for system boot.
  • the memory initialization 320 then creates 328 DDR (including DDR5 and DDRT) memory and high performance memory advanced configuration and power interface (ACPI) tables.
  • the ACPI tables can include, but are not limited to, SRAT (System Resource Affinity Table) , SLIT (System Locality Distance Information Table) , and HMAT (Heterogeneous Memory Attribute Table) tables.
  • BIOS reports 329 location and bandwidth information of the high performance memory to the OS during the BIOS boot phase 310 via the ACPI SRAT, SLIT, and HMAT tables.
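As a rough model of what the reported information lets the OS do, the sketch below pairs SRAT-style location entries with HMAT-style bandwidth attributes. Field names are illustrative; real ACPI tables are packed binary structures defined by the ACPI specification:

```python
# Hypothetical model of the BIOS-reported information: SRAT-style affinity
# entries give each memory range's proximity domain (location), and
# HMAT-style entries give each domain's bandwidth attribute.

srat = [  # address range -> proximity domain (memory "location")
    {"domain": 0, "base": 0x0000_0000_0000, "length": 64 << 30},  # DDR5
    {"domain": 1, "base": 0x1000_0000_0000, "length": 16 << 30},  # HPM
]
hmat = [  # proximity domain -> read bandwidth attribute (MB/s, assumed)
    {"domain": 0, "read_bandwidth": 40_000},
    {"domain": 1, "read_bandwidth": 800_000},
]

def highest_bandwidth_domain(hmat_entries):
    return max(hmat_entries, key=lambda e: e["read_bandwidth"])["domain"]

# The OS can combine both tables to find where the fast memory lives.
fast = highest_bandwidth_domain(hmat)
fast_range = next(e for e in srat if e["domain"] == fast)
assert fast == 1
assert fast_range["base"] == 0x1000_0000_0000
```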
  • the boot process 300 may then proceed to the OS runtime phase 330.
  • an OS environment 332 of the system may be initialized, including initialization of applications 334 and the kernel 335 of the OS environment 332.
  • Initialization of the kernel 335 may include initialization of a memory management driver 338 and a file system 339, for example.
  • initialization of the kernel 335 also includes the OS placing 336 a page table (e.g., a 5-level page table) and high-performance requirement software in the high performance memory space in accordance with the location and bandwidth information of the reported ACPI tables.
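The kernel placement step described above can be sketched as follows. This is a hypothetical, simplified model, not actual OS memory-management code: the kernel reads the SRAT/HMAT-style records reported by BIOS and allocates its page-table pages from the highest-bandwidth region. Record layouts and names are illustrative assumptions.

```python
# Hypothetical sketch of step 336: given the location (SRAT-like) and
# bandwidth (HMAT-like) information reported by BIOS, find the
# highest-bandwidth memory region so the page tables can live there.

def pick_high_performance_region(srat, hmat):
    """Return (base, length) of the highest-bandwidth memory region."""
    best = max(hmat, key=lambda rec: rec["bandwidth_gbps"])
    for region in srat:
        if region["proximity_domain"] == best["proximity_domain"]:
            return region["base"], region["length"]
    raise LookupError("no SRAT entry for chosen proximity domain")

srat = [
    {"proximity_domain": 0, "base": 0x0, "length": 64 << 30},       # DDR
    {"proximity_domain": 1, "base": 64 << 30, "length": 16 << 30},  # HBM
]
hmat = [
    {"proximity_domain": 0, "bandwidth_gbps": 38},
    {"proximity_domain": 1, "bandwidth_gbps": 400},
]
base, length = pick_high_performance_region(srat, hmat)
# The kernel would then place its (e.g., 5-level) page table inside
# [base, base + length), so every page-table walk hits high-bandwidth
# memory instead of DDR.
```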
  • FIG. 4 illustrates an example flow 400 for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
  • the various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device.
  • the example flow 400 may be representative of some or all of the operations that may be executed by or implemented on one or more components of processing system 100 or 200 of FIGS. 1 and/or 2. The embodiments are not limited in this context.
  • the processor may initiate a boot process of the processing system.
  • the boot process is initiated in response to a power on signal of the processing system.
  • the power on signal is caused by a reset of the processing system.
  • the processor may initialize, during the boot process, a high performance memory as system memory for the processing system.
  • the high performance memory is high bandwidth memory.
  • the processor may create, during the boot process, one or more tables that include location and bandwidth information for the high performance memory.
  • the one or more tables are ACPI SRAT, SLIT, and/or HMAT tables.
  • the processor may report, during the boot process, the location and the bandwidth information of the high performance memory to the OS using the one or more tables.
  • the location and the bandwidth information are used by the OS to place page tables and/or high performance requirement software in the high performance memory space.
  • the high performance requirement software includes, but is not limited to, AI training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  • FIG. 5 illustrates an example flow 500 for improving system memory access performance using high performance memory via an OS, in accordance with implementations of the disclosure.
  • the various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device.
  • the example flow 500 may be representative of some or all of the operations that may be executed by or implemented on one or more components of processing system 100 or 200 of FIGS. 1 and/or 2. The embodiments are not limited in this context.
  • the processor may initialize a high performance memory as system memory during a boot process of a processing system.
  • the high performance memory is separate from the DIMMs (dual in-line memory modules) of the processing system.
  • the processor may receive location and bandwidth information of the high performance memory via one or more tables during the boot process.
  • the one or more tables are ACPI SRAT, SLIT, and/or HMAT tables.
  • the processor may place page tables of the processing system in the high performance memory space in accordance with the location and the bandwidth information of the one or more tables.
  • the processor may place high performance requirement software in the high performance memory space in accordance with the location and the bandwidth information of the one or more tables.
  • the high performance requirement software includes, but is not limited to, AI training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
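The placement policy of flow 500 can be sketched as follows. This is a hypothetical model of the policy only: real operating systems would bind memory through NUMA facilities, and the workload class names and domain numbers below are illustrative assumptions, not an actual OS API.

```python
# Hypothetical sketch of flow 500's placement policy: workloads flagged
# as high-performance-requirement software (AI training, virtualization
# services, the OS kernel, multi-thread shared data, cloud services,
# enterprise applications/databases) are bound to the high-bandwidth
# proximity domain; everything else stays on ordinary DDR memory.

HIGH_PERF_CLASSES = {
    "ai_training", "virtualization", "os_kernel",
    "multithread_shared", "cloud_service", "enterprise_db",
}

def place(workloads, hbm_domain, ddr_domain):
    """Map each (name, class) workload to a memory proximity domain."""
    placement = {}
    for name, wl_class in workloads:
        domain = hbm_domain if wl_class in HIGH_PERF_CLASSES else ddr_domain
        placement[name] = domain
    return placement

plan = place(
    [("resnet_train", "ai_training"), ("log_rotator", "batch")],
    hbm_domain=1,  # proximity domain of the high performance memory
    ddr_domain=0,  # proximity domain of DDR system memory
)
```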
  • FIG. 6 is a schematic diagram of an illustrative electronic computing device to enable improving system memory access performance using high performance memory, according to some embodiments.
  • the computing device 600 includes one or more processors 610 including one or more processor dies (e.g., cores) 618, each including a platform initialization component 664, such as a component to execute platform initialization firmware 145 described with respect to FIG. 1.
  • the computing device is to provide improved system memory access performance by utilizing high performance memory 680, as provided in FIGS. 1-5.
  • the high performance memory 680 is the same as high performance memory 130 described with respect to FIG. 1 and/or the high performance memory 250 described with respect to FIG. 2.
  • the computing device 600 may additionally include one or more of the following: cache 662, a graphical processing unit (GPU) 612 (which may be the hardware accelerator in some implementations), a wireless input/output (I/O) interface 620, a wired I/O interface 630, system memory 640 (e.g., memory circuitry), power management circuitry 650, a non-transitory storage device 660, and a network interface 670 for connection to a network 672.
  • Example, non-limiting computing devices 600 may include a desktop computing device, blade server device, workstation, or similar device or system.
  • the processor cores 618 are capable of executing machine-readable instruction sets 614, reading data and/or instruction sets 614 from one or more storage devices 660 and writing data to the one or more storage devices 660.
  • Embodiments may also be implemented with other processor-based devices, including portable electronic or handheld electronic devices, for instance smartphones, portable computers, wearable computers, consumer electronics, personal computers (“PCs”), network PCs, minicomputers, server blades, mainframe computers, and the like.
  • the processor cores 618 may include any number of hardwired or configurable circuits, some or all of which may include programmable and/or configurable combinations of electronic components, semiconductor devices, and/or logic elements that are disposed partially or wholly in a PC, server, or other computing system capable of executing processor-readable instructions.
  • the computing device 600 includes a bus or similar communications link 616 that communicably couples and facilitates the exchange of information and/or data between various system components including the processor cores 618, the cache 662, the graphics processor circuitry 612, one or more wireless I/O interfaces 620, one or more wired I/O interfaces 630, one or more storage devices 660, and/or one or more network interfaces 670.
  • the computing device 600 may be referred to in the singular herein, but this is not intended to limit the embodiments to a single computing device 600, since in certain embodiments, there may be more than one computing device 600 that incorporates, includes, or contains any number of communicably coupled, collocated, or remote networked circuits or devices.
  • the processor cores 618 may include any number, type, or combination of currently available or future developed devices capable of executing machine-readable instruction sets.
  • the processor cores 618 may include (or be coupled to) but are not limited to any current or future developed single- or multi-core processor or microprocessor, such as: one or more systems on a chip (SOCs); central processing units (CPUs); digital signal processors (DSPs); graphics processing units (GPUs); application-specific integrated circuits (ASICs); programmable logic units; field programmable gate arrays (FPGAs); and the like.
  • the system memory 640 may include read-only memory (“ROM”) 642 and random access memory (“RAM”) 646.
  • a portion of the ROM 642 may be used to store or otherwise retain a basic input/output system (“BIOS”) 644.
  • the BIOS 644 provides basic functionality to the computing device 600, for example by causing the processor cores 618 to load and/or execute one or more machine-readable instruction sets 614.
  • At least some of the one or more machine-readable instruction sets 614 cause at least a portion of the processor cores 618 to provide, create, produce, transition, and/or function as a dedicated, specific, and particular machine, for example a word processing machine, a digital image acquisition machine, a media playing machine, a gaming system, a communications device, a smartphone, or similar.
  • the computing device 600 may include at least one wireless input/output (I/O) interface 620.
  • the at least one wireless I/O interface 620 may be communicably coupled to one or more physical output devices 622 (tactile devices, video displays, audio output devices, hardcopy output devices, etc.).
  • the at least one wireless I/O interface 620 may communicably couple to one or more physical input devices 624 (pointing devices, touchscreens, keyboards, tactile devices, etc.).
  • the at least one wireless I/O interface 620 may include any currently available or future developed wireless I/O interface.
  • Example wireless I/O interfaces include, but are not limited to: near field communication (NFC), and similar.
  • the computing device 600 may include one or more wired input/output (I/O) interfaces 630.
  • the at least one wired I/O interface 630 may be communicably coupled to one or more physical output devices 622 (tactile devices, video displays, audio output devices, hardcopy output devices, etc.).
  • the at least one wired I/O interface 630 may be communicably coupled to one or more physical input devices 624 (pointing devices, touchscreens, keyboards, tactile devices, etc.).
  • the wired I/O interface 630 may include any currently available or future developed I/O interface.
  • Example wired I/O interfaces include, but are not limited to, universal serial bus (USB), IEEE 1394 (“FireWire”), and similar.
  • the computing device 600 may include one or more communicably coupled, non-transitory, data storage devices 660.
  • the data storage devices 660 may include one or more hard disk drives (HDDs) and/or one or more solid-state storage devices (SSDs).
  • the one or more data storage devices 660 may include any current or future developed storage appliances, network storage devices, and/or systems. Non-limiting examples of such data storage devices 660 may include, but are not limited to, any current or future developed non-transitory storage appliances or devices, such as one or more magnetic storage devices, one or more optical storage devices, one or more electro-resistive storage devices, one or more molecular storage devices, one or more quantum storage devices, or various combinations thereof.
  • the one or more data storage devices 660 may include one or more removable storage devices, such as one or more flash drives, flash memories, flash storage units, or similar appliances or devices capable of communicable coupling to and decoupling from the computing device 600.
  • the one or more data storage devices 660 may include interfaces or controllers (not shown) communicatively coupling the respective storage device or system to the bus 616.
  • the one or more data storage devices 660 may store, retain, or otherwise contain machine-readable instruction sets, data structures, program modules, data stores, databases, logical structures, and/or other data useful to the processor cores 618 and/or graphics processor circuitry 612 and/or one or more applications executed on or by the processor cores 618 and/or graphics processor circuitry 612.
  • one or more data storage devices 660 may be communicably coupled to the processor cores 618, for example via the bus 616 or via one or more wired communications interfaces 630 (e.g., Universal Serial Bus or USB); one or more wireless communications interfaces 620 (e.g., Near Field Communication or NFC); and/or one or more network interfaces 670 (IEEE 802.3 or Ethernet, IEEE 802.11, etc.).
  • Processor-readable instruction sets 614 and other programs, applications, logic sets, and/or modules may be stored in whole or in part in the system memory 640. Such instruction sets 614 may be transferred, in whole or in part, from the one or more data storage devices 660. The instruction sets 614 may be loaded, stored, or otherwise retained in system memory 640, in whole or in part, during execution by the processor cores 618 and/or graphics processor circuitry 612.
  • the computing device 600 may include power management circuitry 650 that controls one or more operational aspects of the energy storage device 652.
  • the energy storage device 652 may include one or more primary (i.e., non-rechargeable) or secondary (i.e., rechargeable) batteries or similar energy storage devices.
  • the energy storage device 652 may include one or more supercapacitors or ultracapacitors.
  • the power management circuitry 650 may alter, adjust, or control the flow of energy from an external power source 654 to the energy storage device 652 and/or to the computing device 600.
  • the power source 654 may include, but is not limited to, a solar power system, a commercial electric grid, a portable generator, an external energy storage device, or any combination thereof.
  • the processor cores 618, the graphics processor circuitry 612, the wireless I/O interface 620, the wired I/O interface 630, the storage device 660, and the network interface 670 are illustrated as communicatively coupled to each other via the bus 616, thereby providing connectivity between the above-described components.
  • the above-described components may be communicatively coupled in a different manner than illustrated in FIG. 6.
  • one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via one or more intermediary components (not shown).
  • one or more of the above-described components may be integrated into the processor cores 618 and/or the graphics processor circuitry 612.
  • all or a portion of the bus 616 may be omitted and the components are coupled directly to each other using suitable wired or wireless connections.
  • Example 1 is a system to facilitate improving system memory access performance using high performance memory.
  • the system of Example 1 comprises a processing system comprising: a processor communicably coupled to a high performance memory; and a memory device communicably coupled to the processor to store platform initialization firmware to cause the processing system to: initialize, during a boot process of the processing system, the high performance memory as system memory for the processing system; generate, during the boot process, location information of the high performance memory; report, during the boot process, the location information of the high performance memory to an operating system (OS); and forward information corresponding to the location information to the high performance memory.
  • In Example 2, the subject matter of Example 1 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is initialized as system memory for the processing system.
  • In Example 3, the subject matter of any one of Examples 1-2 can optionally include wherein at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the location information.
  • In Example 4, the subject matter of any one of Examples 1-3 can optionally include wherein the platform initialization firmware is further to cause the processing system to report bandwidth information of the high performance memory to the OS during the boot process, and wherein the at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the bandwidth information.
  • In Example 5, the subject matter of any one of Examples 1-4 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables.
  • In Example 6, the subject matter of any one of Examples 1-5 can optionally include wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  • In Example 7, the subject matter of any one of Examples 1-6 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  • In Example 8, the subject matter of any one of Examples 1-7 can optionally include wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules.
  • Example 9 is a method for facilitating improving system memory access performance using high performance memory.
  • the method of Example 9 can optionally include initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; generating, during the boot process, location information of the high performance memory; reporting, during the boot process, the location information of the high performance memory to an operating system (OS); and forwarding information corresponding to the location information to the high performance memory.
  • In Example 10, the subject matter of Example 9 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized as system memory for the processing system.
  • In Example 11, the subject matter of any one of Examples 9-10 can optionally include wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system.
  • In Example 12, the subject matter of any one of Examples 9-11 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  • In Example 13, the subject matter of any one of Examples 9-12 can optionally include wherein at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system.
  • In Example 14, the subject matter of any one of Examples 9-13 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  • Example 15 is a non-transitory computer-readable storage medium for facilitating improving system memory access performance using high performance memory.
  • the non-transitory computer-readable storage medium of Example 15 comprises executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; generating, during the boot process, location information of the high performance memory; reporting, during the boot process, the location information of the high performance memory to an operating system (OS); and forwarding information corresponding to the location information to the high performance memory.
  • In Example 16, the subject matter of Example 15 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized as system memory of the processing system.
  • In Example 17, the subject matter of any one of Examples 15-16 can optionally include wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system.
  • In Example 18, the subject matter of any one of Examples 15-17 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  • In Example 19, the subject matter of any one of Examples 15-18 can optionally include wherein at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system.
  • In Example 20, the subject matter of any one of Examples 15-19 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  • Example 21 is an apparatus to improve system memory access performance using high performance memory.
  • the apparatus of Example 21 comprises a memory device to store platform initialization firmware to cause a processing system to: initialize, during a boot process of the processing system, a high performance memory as system memory for the processing system; generate, during the boot process, location information of the high performance memory; report, during the boot process, the location information of the high performance memory to an operating system (OS); and forward information corresponding to the location information to the high performance memory.
  • In Example 22, the subject matter of Example 21 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is initialized as system memory for the processing system.
  • In Example 23, the subject matter of any one of Examples 21-22 can optionally include wherein at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the location information.
  • In Example 24, the subject matter of any one of Examples 21-23 can optionally include wherein the platform initialization firmware is further to cause the processing system to report bandwidth information of the high performance memory to the OS during the boot process, and wherein the at least one of page tables or high performance requirement software is placed in the high performance memory in accordance with the bandwidth information.
  • In Example 25, the subject matter of any one of Examples 21-24 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables.
  • In Example 26, the subject matter of any one of Examples 21-25 can optionally include wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  • In Example 27, the subject matter of any one of Examples 21-26 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  • the subject matter of any one of Examples 21-27 can optionally include wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules.
  • Example 28 is an apparatus for facilitating improving system memory access performance using high performance memory, according to implementations of the disclosure.
  • the apparatus of Example 28 can comprise means for initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; means for generating, during the boot process, location information of the high performance memory; means for reporting, during the boot process, the location information of the high performance memory to an operating system (OS); and means for forwarding information corresponding to the location information to the high performance memory.
  • In Example 29, the subject matter of Example 28 can optionally include the apparatus further configured to perform the method of any one of Examples 10 to 14.
  • Example 30 is at least one machine readable medium comprising a plurality of instructions that in response to being executed on a computing device, cause the computing device to carry out a method according to any one of Examples 9-14.
  • Example 31 is an apparatus for facilitating improving system memory access performance using high performance memory, configured to perform the method of any one of Examples 9-14.
  • Example 32 is an apparatus for facilitating improving system memory access performance using high performance memory comprising means for performing the method of any one of Examples 9 to 14. Specifics in the Examples may be used anywhere in one or more embodiments.
  • Various embodiments may include various processes. These processes may be performed by hardware components or may be embodied in computer program or machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.
  • Portions of various embodiments may be provided as a computer program product, which may include a computer-readable medium (e.g., non-transitory computer-readable storage medium) having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) for execution by one or more processors to perform a process according to certain embodiments.
  • the computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions.
  • embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer.
  • when element A is described as coupled to element B, element A may be directly coupled to element B or be indirectly coupled through, for example, element C.
  • if it is stated that a component, feature, structure, process, or characteristic A “causes” a component, feature, structure, process, or characteristic B, it means that “A” is at least a partial cause of “B” but that there may also be at least one other component, feature, structure, process, or characteristic that assists in causing “B.” If the specification indicates that a component, feature, structure, process, or characteristic “may,” “might,” or “could” be included, that particular component, feature, structure, process, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, this does not mean there is only one of the described elements.
  • An embodiment is an implementation or example.
  • Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments.
  • the various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not all referring to the same embodiments. It should be appreciated that in the foregoing description of example embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various novel aspects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Stored Programmes (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A system includes a processor communicably coupled to a high performance memory (130); and a memory device (140) communicably coupled to the processor to store platform initialization firmware to cause the processing system (100) to: initialize, during a boot process of the processing system (100), the high performance memory (130) as system memory for the processing system (100); generate, during the boot process, location information of the high performance memory (130); report, during the boot process, the location information of the high performance memory (130) to an operating system (OS); and forward information corresponding to the location information to the high performance memory (130). The system is directed to improving system memory access performance using high performance memory (130).

Description

IMPROVING SYSTEM MEMORY ACCESS PERFORMANCE USING HIGH PERFORMANCE MEMORY

BACKGROUND
A processing system may include hardware and software components. The software components may include one or more applications, an operating system (OS), and firmware. The applications may include control logic for performing the work that is of value to the user of the processing system. In the processing system, the applications run on top of the OS, which runs at a lower logical level than the applications (i.e., closer to the hardware) to provide an underlying environment or abstraction layer that makes it easier to create and execute the applications. The firmware runs at an even lower logical level to provide an underlying environment or abstraction layer that makes it easier to create and execute the OS. For instance, the firmware may establish a basic input/output system (BIOS), and the OS may use that BIOS to communicate with different hardware components within the processing system.
Typically, the OS and the applications execute out of random-access memory (RAM) , which is volatile. Some or all of the firmware may also execute out of RAM. However, since the RAM is volatile, the environment for performing useful work basically disappears whenever the processing system is turned off. Consequently, whenever the processing system is turned on, the processing system should recreate that environment before useful work can be performed. For purposes of this disclosure, the operations for preparing a processing system to execute an OS may be referred to as the “boot process. ” Similarly, the time that elapses during the boot process may be referred to as the “boot time. ”
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments described here are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
FIG. 1 depicts an illustration of a processing system to provide improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
FIG. 2 is a block diagram illustrating a processing system for improving system memory access performance using high performance memory, according to implementations of the disclosure.
FIG. 3 is a flow schematic depicting a boot process for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
FIG. 4 illustrates an example flow for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
FIG. 5 illustrates another example flow for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
FIG. 6 is a schematic diagram of an illustrative electronic computing device to enable improving system memory access performance using high performance memory, in accordance with implementations of the disclosure.
DETAILED DESCRIPTION
Embodiments described herein are directed to improving system memory access performance using high performance memory.
As indicated above, when a processing system is turned on or reset, the processing system may execute a boot process before the processing system can be utilized for work. As discussed herein, the operations for preparing a processing system to execute an operating system (OS) may be referred to as the “boot process. ” Similarly, the time that elapses during the boot process may be referred to as the “boot time. ” The control logic or firmware that performs or controls the boot process may be referred to as  the “system firmware, ” the “system bootcode, ” the “platform bootcode, ” or simply the “bootcode. ”
The boot process may include a memory training phase. During the memory training phase, memory training code, such as memory reference code (MRC), uses a memory controller to test the memory bus and adjust timing/voltage reference (Vref) to determine margins for each channel of the system's memory modules. Memory training data may be generated that is based on the system's motherboard hardware and memory modules. As a result, the memory training phase cannot simply be skipped in order to reduce the overall boot process time.
As memory technology improves, memory module capacity, such as dual in-line memory module (DIMM) capacity, increases exponentially from generation to generation. Consequently, increases in boot time correspond to increases in memory module capacity, as the memory training time and memory test time are directly proportional to the memory module (e.g., DIMM) size. This occurs in the conventional double data rate 5 (DDR5) server platform.
Furthermore, to support increasing DDR5 DIMM capacity, conventional operating systems (OSs) utilize a five-level (5-level) page table. 5-level paging may refer to a processor extension that extends the size of virtual addresses from 48 bits to 57 bits, increasing the addressable virtual memory from 256 TiB to 128 PiB. A translation lookaside buffer (TLB) is a memory cache that stores recent translations of virtual memory to physical addresses for faster retrieval. Once a virtual address has been translated into the corresponding real address, the result of that lookup is cached in the TLB to provide a 'fast path' lookup on subsequent accesses. A TLB “miss” means that the association has not yet been cached, so the “long form” lookup is used and the processing system translates the virtual address to a physical address by walking each level of the page table.
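The index arithmetic behind 5-level paging can be illustrated with a short sketch (a hypothetical illustration, not code from any particular OS): a 57-bit virtual address decomposes into five 9-bit table indices plus a 12-bit page offset, with one index consumed per level of the walk.

```python
PAGE_SHIFT = 12   # 4 KiB pages: the low 12 bits are the page offset
INDEX_BITS = 9    # 512 entries per table, so 9 index bits per level

def split_virtual_address(va: int):
    """Split a 57-bit virtual address into (pml5, pml4, pdpt, pd, pt, offset)."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(5):  # level 0 = page table, level 4 = top-level table
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    pt, pd, pdpt, pml4, pml5 = indices
    return pml5, pml4, pdpt, pd, pt, offset

# Five 9-bit indices plus the 12-bit offset cover all 57 bits (128 PiB).
assert 5 * INDEX_BITS + PAGE_SHIFT == 57
```

With 48-bit (4-level) paging the same arithmetic gives 4 × 9 + 12 = 48 bits, i.e. the 256 TiB limit the extension removes.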
In conventional processing systems, page tables used for translation of virtual addresses to physical addresses when a TLB miss occurs are maintained in memory modules, such as DDR5 memory of DIMM modules. In some processing systems, the page tables are 5-level page tables, which results in five memory reads when a TLB miss occurs and six memory reads in total to obtain the memory data. This impacts system memory access performance, which slows the overall processing system performance. Furthermore, high-performance requirement software, such as, for example, artificial intelligence (AI) training, virtualization systems, OS kernel, multi-thread shared data, cloud services, and enterprise application/databases, currently runs on typical DDR memory in conventional processing systems. This impacts system performance of such high performance requirement software and of the underlying processing system itself when such software places a strain on the processing system due to its high performance requirements.
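The cost of such a miss can be sketched with a toy latency model (the latency numbers below are invented for illustration, not measured values): each TLB miss pays one dependent read per page-table level plus the final data read, so reducing the per-level read latency shrinks the cost of every miss.

```python
def tlb_miss_cost(walk_read_ns: float, data_read_ns: float, levels: int = 5) -> float:
    """Total latency of one TLB miss: one read per table level plus the data read."""
    return levels * walk_read_ns + data_read_ns

# Hypothetical latencies: page tables in ordinary DDR vs. in faster memory.
cost_tables_in_ddr = tlb_miss_cost(walk_read_ns=100.0, data_read_ns=100.0)  # 600.0 ns
cost_tables_in_hpm = tlb_miss_cost(walk_read_ns=60.0, data_read_ns=100.0)   # 400.0 ns
assert cost_tables_in_hpm < cost_tables_in_ddr
```

The data read itself still goes to DDR in both cases; only the five walk reads move, which is exactly the placement the disclosure proposes.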
Implementations of the disclosure address the above technical problems by providing improved system memory access performance using high performance memory. The high performance memory may refer to, for example, High Bandwidth Memory (HBM), double data rate (DDR) memory modules, graphics DDR (GDDR) memory modules, a combination of DDRs and/or GDDRs, or any other memory providing high-performance bandwidth while using reduced power in a smaller form factor. HBM may be integrated into a CPU package of a processing system and can provide improved memory bandwidth. In some cases, the HBM bandwidth can be 20 times that of conventional DDR5 bandwidth. HBM has two primary usages: it can be used either in “flat mode” or as a cache for DDR5 far memory.
In implementations of the disclosure, the high performance memory is initialized and established as system memory during a boot process of the processing system, and then utilized to host the system page tables and/or any high performance requirement software. In one implementation, the boot phase of the boot process coordinates with an OS of the system to provide location and/or bandwidth information of the initialized high performance memory so that the OS can place page tables and/or high performance requirement software in the high performance memory. The location and/or bandwidth information is provided during the boot process to enable the OS to establish page tables in the high performance memory during system initialization. In some implementations, the boot process creates one or more tables to store the location and bandwidth information of the high performance memory, where the one or more tables are reported to the OS during the boot process.
As such, implementations of the disclosure improve processing system performance by improving processor virtual address to physical address translation efficiency, improve memory module access (such as DDR memory access, including DDR5 memory) and/or improve high performance requirement software performance (such as cloud services, AI training and inference, and virtual machine (VM) performance) .
FIG. 1 depicts an illustration of a processing system 100 to provide improving system memory access performance using high performance memory, according to some embodiments. As illustrated in FIG. 1, processing system 100 may be embodied as and/or may include any number and type of hardware and/or software components, such as (without limitation) a processor, including but not limited to, a central processing unit (“CPU” or simply “application processor”), a graphics processing unit (“GPU” or simply “graphics processor”), and so on. Processing system 100 may also include components such as drivers (also referred to as “driver logic”, user-mode driver (UMD), user-mode driver framework (UMDF), “GPU driver”, “graphics driver logic”, or simply “driver”), memory, network devices, or the like, as well as input/output (I/O) sources, such as touchscreens, touch panels, touch pads, virtual or regular keyboards, virtual or regular mice, ports, connectors, etc. Although not specifically illustrated, processing system 100 may include or enable operation of an operating system (OS) serving as an interface between hardware and/or physical resources of the processing system 100 and a user.
It is to be appreciated that a lesser or more equipped system than the example described above may be utilized for certain implementations. Therefore, the configuration of processing system 100 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances.
Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC) , and/or a field programmable gate array (FPGA) . The terms "logic" , “module” , “component” , “engine” , and “mechanism” may  include, by way of example, software or hardware and/or combinations of software and hardware.
In one implementation, processing system 100 may be part of a communication and data processing device including (but not limited to) smart wearable devices, smartphones, virtual reality (VR) devices, head-mounted display (HMDs) , mobile computers, Internet of Things (IoT) devices, laptop computers, desktop computers, server computers, etc.
Processing system 100 may further be a part of and/or assist in the operation of (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electromechanical agent or machine, etc. Examples of autonomous machines or artificially intelligent agents may include (without limitation) robots, autonomous vehicles (e.g., self-driving cars, self-flying planes, self-sailing boats, etc. ) , autonomous equipment (self-operating construction vehicles, self-operating medical equipment, etc. ) , and/or the like. Throughout this document, “computing device” may be interchangeably referred to as “autonomous machine” or “artificially intelligent agent” or simply “robot” .
It is contemplated that although “autonomous vehicle” and “autonomous driving” may be referenced in this document, embodiments are not limited as such. For example, “autonomous vehicle” is not limited to an automobile but may include any number and type of autonomous machines, such as robots, autonomous equipment, household autonomous devices, and/or the like, and any one or more tasks or operations relating to such autonomous machines may be interchangeably referenced with autonomous driving. An autonomous vehicle may refer to a vehicle that can drive itself from a starting point to a predetermined destination in “autopilot” mode using various in-vehicle technologies and sensors.
Processing system 100 may further include (without limitations) large computing systems, such as server computers, desktop computers, etc., and may further include set-top boxes (e.g., Internet-based cable television set-top boxes, etc. ) , global positioning system (GPS) -based devices, etc. Processing system 100 may include mobile computing devices serving as communication devices, such as cellular phones including  smartphones, personal digital assistants (PDAs) , tablet computers, laptop computers, e-readers, smart televisions, television platforms, wearable devices (e.g., glasses, watches, bracelets, smartcards, jewelry, clothing items, etc. ) , media players, etc. For example, in one embodiment, processing system 100 may include a mobile computing device employing a computer platform hosting an integrated circuit ( “IC” ) , such as system on a chip ( “SoC” or “SOC” ) , integrating various hardware and/or software components of processing system 100 on a single chip.
Processing system 100 may host network interface (s) (not shown) to provide access to a network, such as a LAN, a wide area network (WAN) , a metropolitan area network (MAN) , a personal area network (PAN) , Bluetooth, a cloud network, a mobile network (e.g., 3 rd Generation (3G) , 4 th Generation (4G) , etc. ) , an intranet, the Internet, etc. Network interface (s) may include, for example, a wireless network interface having antenna, which may represent one or more antenna (e) . Network interface (s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein. A machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories) , and magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories) , EEPROMs (Electrically Erasable Programmable Read Only Memories) , magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in  and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection) .
Throughout the document, term “user” may be interchangeably referred to as “viewer” , “observer” , “person” , “individual” , “end-user” , and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit” , “graphics processor” , or simply “GPU” and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit” , “application processor” , or simply “CPU” .
It is to be noted that terms like “node” , “computing node” , “server” , “server device” , “cloud computer” , “cloud server” , “cloud server computer” , “machine” , “host machine” , “device” , “computing device” , “computer” , “computing system” , and the like, may be used interchangeably throughout this document. It is to be further noted that terms like “application” , “software application” , “program” , “software program” , “package” , “software package” , and the like, may be used interchangeably throughout this document. Also, terms like “job” , “input” , “request” , “message” , and the like, may be used interchangeably throughout this document.
In one embodiment, processing system 100 can include a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors or processor cores. In one embodiment, the processing system 100 can be a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices, such as within Internet-of-things (IoT) devices with wired or wireless connectivity to a local or wide area network.
In one embodiment, processing system 100 may couple with, or be integrated within: a server-based gaming platform; a game console, including a game and media console; a mobile gaming console, a handheld game console, or an online game console. In some embodiments the processing system 100 is part of a mobile phone, smart phone, tablet computing device or mobile Internet-connected device such as a laptop with low internal storage capacity. Processing system 100 can also include, couple with, or be integrated within: a wearable device, such as a smart watch wearable device;  smart eyewear or clothing enhanced with augmented reality (AR) or virtual reality (VR) features to provide visual, audio or tactile outputs to supplement real world visual, audio or tactile experiences or otherwise provide text, audio, graphics, video, holographic images or video, or tactile feedback; other augmented reality (AR) device; or other virtual reality (VR) device. In some embodiments, processing system 100 includes or is part of a television or set top box device. In one embodiment, processing system 100 can include, couple with, or be integrated within a self-driving vehicle such as a bus, tractor trailer, car, motor or electric power cycle, plane or glider (or any combination thereof) . The self-driving vehicle may use processing system 100 to process the environment sensed around the vehicle.
In some embodiments, the processing system 100 includes one or more processors, such as a CPU (e.g. CPU 110) or GPU, which each include one or more processor cores to process instructions which, when executed, perform operations for system or user software. In some embodiments, at least one of the one or more processor cores is configured to process a specific instruction set. In some embodiments, instruction set may facilitate Complex Instruction Set Computing (CISC) , Reduced Instruction Set Computing (RISC) , or computing via a Very Long Instruction Word (VLIW) . One or more processor cores may process a different instruction set which may include instructions to facilitate the emulation of other instruction sets. Processor core may also include other processing devices, such as a Digital Signal Processor (DSP) .
In implementations of the disclosure, processing system 100 provides for improving system memory access performance using high performance memory. As shown in FIG. 1, processing system 100 may include, but is not limited to, a processor, such as CPU 110, host-attached memory modules such as DIMM 120, host-attached high performance memory 130, and a memory device 140. More or fewer components than those illustrated in FIG. 1 may be included in processing system 100. The components of processing system 100 may be connected by way of a system bus or other electrical communication path (not shown). As discussed further below, the components of processing system 100 are operative to provide improved system memory access performance using high performance memory.
In implementations of the disclosure, processing system 100 may execute a boot process before the processing system 100 can be utilized for work. A boot process or booting is the process of starting a computer or processing system. A boot process can be initiated by hardware such as a button press, or by a software command. After it is switched on, a computer's central processing unit (CPU) has no software in its main memory, so some process should load software into memory before it can be executed. The boot process may refer to a warm boot or a cold boot. A warm boot (also called a "soft boot" ) is the process of restarting a computer. It may be used in contrast to a cold boot, which refers to starting up a computer that has been turned off.
In one implementation, memory device 140, such as ROM or a flash memory such as non-volatile random access memory (NVRAM) , may store platform initialization firmware 145 that includes program code containing the basic routines that help to start up the processing system 100 and to transfer information between elements within the processing system 100. In one implementation, platform initialization firmware 145 may include firmware that is compatible with the Extensible Firmware Interface (EFI) specification, the extension of the EFI interface referred to as Unified Extensible Firmware Interface (UEFI) , or any other interface between the OS and the system firmware used for platform initialization.
The boot process provided by platform initialization firmware 145 may include a memory training phase. During the memory training phase, memory training code, such as memory reference code (MRC) , performs memory initialization of DIMM 120 and high performance memory 130 utilized by CPU 110 for memory purposes during runtime of processing system 100. In one implementation, DIMM 120 may include DDR5 memory. CPU 110 may include a plurality of memory controllers including MC0 115A, MC1 115B, MC2 115C, and MC3 115D to manage input and output with the DIMM 120.
High performance memory 130 may refer to, for example, High Bandwidth Memory (HBM), double data rate (DDR) memory modules, graphics DDR (GDDR) memory modules, a combination of DDRs and/or GDDRs, or any other memory providing high-performance bandwidth while using reduced power in a smaller form factor. HBM achieves higher bandwidth while using less power in a smaller form factor than DDR4 or GDDR5. This is achieved by stacking up to eight dynamic random access memory (DRAM) dies (thus forming a three-dimensional integrated circuit), including an optional base die (often a silicon interposer) with a memory controller, which are interconnected by through-silicon vias (TSVs) and microbumps.
As previously discussed, in conventional processing systems, page tables used for translation of virtual addresses to physical addresses when a TLB miss occurs are maintained in memory modules, such as DDR5 memory of DIMM modules. In some processing systems, the page tables are 5-level page tables, which results in five memory reads when a TLB miss occurs and six memory reads in total to obtain the memory data. This impacts system memory access performance, which slows the overall processing system performance. Furthermore, high-performance requirement software, such as, for example, AI training, virtualization systems, OS kernel, multi-thread shared data, cloud services, and enterprise application/databases, currently runs on typical DDR memory in conventional processing systems. This impacts system performance of such high performance requirement software and of the underlying processing system itself when such software places a strain on the processing system due to its high performance requirements.
Implementations of the disclosure provide a solution to significantly improve CPU virtual address to physical address translation efficiency and the performance of high-performance requirement software, by utilizing high performance memory 130. In one implementation, the high performance memory 130 is initialized and established as system memory during a boot process of the processing system 100. As discussed above, platform initialization firmware 145 may initiate and execute the boot process of the processing system 100. Once initialized, the boot process coordinates with the OS of the processing system 100 to enable the high performance memory 130 to be utilized to host page tables and/or any high performance requirement software for the processing system 100. Further details of the utilization of high performance memory 130 as system memory for page tables and/or high performance requirement software of the processing system 100 are described below with respect to FIG. 2.
FIG. 2 is a block diagram illustrating a memory access flow 200 using page tables maintained in high performance memory, according to implementations of the disclosure. In one implementation, processing system 100 described with respect to FIG. 1 may perform flow 200. It is contemplated that embodiments are not limited to any particular implementation of flow 200 and that one or more of its components and/or processes may be variously implemented in implementations of the disclosure.
In implementations of the disclosure, flow 200 may be performed subsequent to performance of a boot process by a processing system, where the boot process initializes high performance memory as system memory for the processing system. Flow 200 may also be performed subsequent to coordination between the boot process and an OS of the processing system to cause page tables and/or high performance requirement software to be placed in the high performance memory, as detailed further below with respect to FIG. 3.
Flow 200 depicts a CPU core 210 requesting to access a target page 265 in DDR5 memory 260 using virtual address 230. Although DDR memory 260 is depicted in flow 200, other types of memory modules may also be implemented for storage of target page 265 in implementations of the disclosure. The CPU core 210 passes the virtual address 230 to the TLB 220 in order to determine whether the address translation for the virtual address 230 is cached in the TLB 220. If a TLB hit occurs, then the cached address translation for the virtual address 230 in the TLB 220 is used to obtain a physical address to access the target page 265 in DDR5 memory 260.
If a TLB miss occurs, then the virtual address 230 is stored in a CR3 register 240 for use in performing a page table lookup to translate the virtual address 230 to its corresponding physical address. As noted above, in implementations of the disclosure, the boot process of the system and corresponding OS coordination have caused a 5-level page table including page tables of levels 1-5 251-255 to be stored in high performance memory 250. The translation of virtual address 230 accesses the level 1-5 page tables 251-255, using five read accesses to obtain the address translation to the physical address. The resulting physical address is used for a sixth read to access the target page 265 from DDR5 memory 260. Although a 5-level page table 251-255 is depicted in flow 200, other types of page tables and levels of page tables may also be implemented in implementations of the disclosure.
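The hit and miss paths of flow 200 can be sketched as follows (the structures and names are invented for illustration; a real walk reads hardware-defined table entries, not Python dicts): a hit returns the cached translation with no table reads, while a miss costs one read per level before the translation is cached.

```python
tlb = {}  # virtual page number -> physical page number (the cached translations)

def make_tables(vpage: int, ppage: int):
    """Build a 5-level table tree holding a single mapping (illustration only)."""
    node = ppage
    for level in range(5):  # wrap from the leaf level outward to the root
        node = {(vpage >> (9 * level)) & 0x1FF: node}
    return node

def translate(vpage: int, root, reads: list) -> int:
    if vpage in tlb:                     # TLB hit: no page-table reads needed
        return tlb[vpage]
    entry = root                         # miss: CR3-style pointer to the root table
    for shift in (36, 27, 18, 9, 0):     # five dependent reads, one per level
        reads.append("table_read")
        entry = entry[(vpage >> shift) & 0x1FF]
    tlb[vpage] = entry                   # cache the translation for next time
    return entry
```

The first call to `translate` for a given page records five table reads; a repeat call for the same page records none, mirroring the fast-path behavior described above.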
In some implementations, high performance memory 250, such as HBM, may provide 20 times the amount of bandwidth as compared to DDR5 memory 260. As such, memory accesses to the page tables 251-255 stored in high performance memory 250 can improve system performance significantly as compared to conventional storage of page tables in DDR5 memory 260.
In implementations of the disclosure, the boot process and OS coordination also enable the placement of high performance requirement software 256 in the high performance memory 250. As a result, memory access to high performance memory 250 associated with the high performance requirement software 256 is improved, resulting in improved system performance of the underlying processing system.
FIG. 3 is a flow schematic depicting a boot process 300 for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure. In one implementation, boot process 300 may include multiple phases including, but not limited to, a BIOS boot phase 310 and an OS runtime phase 330. In one implementation, processing system 100 described with respect to FIG. 1 may perform the boot process 300 of FIG. 3.
A power on (or reset) 305 of a processing system may trigger the boot process 300. During the BIOS boot phase 310 of boot process 300, pre-memory silicon initialization 315 is performed. Then, a memory initialization 320 portion of the BIOS boot phase is performed. The memory initialization 320 includes initialization and training of DDR memory 322, such as DDR5 and/or DDRT memory, for example. The memory initialization 320 also includes initialization and training of high performance memory 324, such as HBM, for example. The memory initialization 320 then installs 326 any discovered memory, such as the high performance memory. The high performance memory is made available at 326 as system memory for system boot. The memory initialization 320 then creates 328 DDR (including DDR5 and DDRT) memory and high performance memory advanced configuration and power interface (ACPI) tables. The ACPI tables can include, but are not limited to, SRAT (System Resource Affinity  Table) , SLIT (System Locality Distance Information Table) , and HMAT (Heterogeneous Memory Attribute Table) tables.
In one implementation, BIOS reports 329 location and bandwidth information of the high performance memory to the OS during the BIOS boot phase 310 via the ACPI SRAT, SLIT, and HMAT tables. The boot process 300 may then proceed to the OS runtime phase 330. During the OS runtime phase 330, an OS environment 332 of the system may be initialized, including initialization of applications 334 and the kernel 335 of the OS environment 332. Initialization of the kernel 335 may include initialization of a memory management driver 338, and file system 339, for example.
In implementations of the disclosure, initialization of the kernel 335 also includes the OS placing 336 a page table (e.g., a 5-level page table) and high-performance requirement software in the high performance memory space in accordance with the location and bandwidth information of the reported ACPI tables.
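The placement step at 336 can be sketched like this (the field names are invented for illustration and do not reflect the real binary layouts of the ACPI SRAT, SLIT, or HMAT tables): given the firmware-reported memory regions, the kernel selects the highest-bandwidth region to host the page tables and high performance requirement software.

```python
from dataclasses import dataclass

@dataclass
class MemRegion:
    base: int             # physical base address, as reported by firmware
    size: int             # region size in bytes
    bandwidth_mbps: int   # bandwidth attribute, HMAT-style

def pick_placement_region(regions):
    """Choose the region with the highest reported bandwidth."""
    return max(regions, key=lambda r: r.bandwidth_mbps)

regions = [
    MemRegion(base=0x0, size=64 << 30, bandwidth_mbps=50_000),                # DDR5 DIMMs
    MemRegion(base=0x10_0000_0000, size=16 << 30, bandwidth_mbps=1_000_000),  # HBM
]
assert pick_placement_region(regions).base == 0x10_0000_0000
```

A real kernel would also weigh capacity and locality (the SRAT/SLIT information); bandwidth alone is used here to keep the sketch minimal.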
FIG. 4 illustrates an example flow 400 for improving system memory access performance using high performance memory, in accordance with implementations of the disclosure. The various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device. The example flow 400 may be representative of some or all the operations that may be executed by or implemented on one or more components of  processing system  100 or 200 of FIGS. 1 and/or 2. The embodiments are not limited in this context.
At block 410, the processor may initiate a boot process of the processing system. In one implementation, the boot process is initiated in response to a power on signal of the processing system. In one implementation, the power on signal is caused by a reset of the processing system.
At block 420, the processor may initialize, during the boot process, a high performance memory as system memory for the processing system. In one implementation, the high performance memory is high bandwidth memory.
At block 430, the processor may create, during the boot process, one or more tables that include location and bandwidth information for the high performance memory. In one implementation, the one or more tables are ACPI SRAT, SLIT, and/or HMAT tables. Lastly, at block 440, the processor may report, during the boot process, the location and the bandwidth information of the high performance memory to the OS using the one or more tables. In one implementation, the location and the bandwidth information are used by the OS to place page tables and/or high performance requirement software in the high performance memory space. In one implementation, the high performance requirement software includes, but is not limited to, AI training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
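By way of non-limiting illustration, the location half of the reporting at blocks 430-440 can resemble the Memory Affinity Structure defined for the SRAT by the ACPI specification (Type 1, 40 bytes), which assigns a physical memory range to a proximity domain; bandwidth and latency attributes would travel separately in the HMAT. The helper below packs one such structure. The function name and the specific flag handling are illustrative assumptions; only the field layout follows the ACPI specification.

```python
import struct

def srat_memory_affinity(proximity_domain, base_addr, length,
                         hot_pluggable=False):
    """Pack one ACPI SRAT Memory Affinity Structure (Type 1, 40 bytes).

    Field layout per the ACPI specification:
      Type(1) Length(1) ProximityDomain(4) Reserved(2)
      BaseAddressLow(4) BaseAddressHigh(4) LengthLow(4) LengthHigh(4)
      Reserved(4) Flags(4) Reserved(8)
    Flags bit 0 = Enabled, bit 1 = Hot Pluggable.
    """
    flags = 0x1 | (0x2 if hot_pluggable else 0)
    return struct.pack(
        "<BBIHIIIIIIQ",
        1, 40, proximity_domain, 0,
        base_addr & 0xFFFFFFFF, base_addr >> 32,
        length & 0xFFFFFFFF, length >> 32,
        0, flags, 0,
    )
```

In a system as described here, the firmware could emit one such entry placing the high performance memory range in its own proximity domain, so that the OS can distinguish it from the DDR-backed domains when consulting the SRAT alongside the HMAT bandwidth data.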
Some of the operations illustrated in FIG. 4 may be repeated, combined, modified or deleted where appropriate, and additional steps may also be added to the flow in various embodiments. Additionally, steps may be performed in any suitable order without departing from the scope of particular embodiments.
FIG. 5 illustrates an example flow 500 for improving system memory access performance using high performance memory via an OS, in accordance with implementations of the disclosure. The various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device. The example flow 500 may be representative of some or all of the operations that may be executed by or implemented on one or more components of processing system 100 or 200 of FIGS. 1 and/or 2. The embodiments are not limited in this context.
At block 510, the processor may initialize a high performance memory as system memory during a boot process of a processing system. In one implementation, the high performance memory is separate from the dual in-line memory modules (DIMMs) of the processing system. At block 520, the processor may receive location and bandwidth information of the high performance memory via one or more tables during the boot process. In one implementation, the one or more tables are ACPI SRAT, SLIT, and/or HMAT tables.
At block 530, the processor may place page tables of the processing system in the high performance memory space in accordance with the location and the bandwidth information of the one or more tables. Lastly, at block 540, the processor may place high performance requirement software in the high performance memory space in accordance with the location and the bandwidth information of the one or more tables. In one implementation, the high performance requirement software includes, but is not limited to, AI training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
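By way of non-limiting illustration, the OS-side decisions of blocks 530 and 540 amount to selecting, among the proximity domains described by the reported tables, the highest-bandwidth domain that still has room for the allocation. A minimal sketch of such a policy follows; the shape of the `domains` dictionary and the greedy selection rule are illustrative assumptions, not the claimed OS behavior.

```python
# Illustrative OS-side placement policy: given per-proximity-domain
# attributes of the kind HMAT reports (bandwidth) plus remaining
# capacity, choose the highest-bandwidth domain that can hold a
# request, so page tables and demanding software land in the high
# performance memory first and spill to DDR only when it is full.

def pick_domain(domains, size_needed):
    """Return the id of the highest-bandwidth domain with at least
    `size_needed` free bytes, or None if nothing fits."""
    candidates = [
        (attrs["bandwidth_mbps"], dom)
        for dom, attrs in domains.items()
        if attrs["free_bytes"] >= size_needed
    ]
    if not candidates:
        return None
    _, dom = max(candidates)
    return dom

def place(domains, size_needed):
    """Reserve `size_needed` bytes in the best domain (blocks 530/540)."""
    dom = pick_domain(domains, size_needed)
    if dom is not None:
        domains[dom]["free_bytes"] -= size_needed
    return dom
```

With a high-bandwidth HBM domain and a larger DDR domain, small, hot allocations (such as the page tables) are steered to HBM, while an allocation exceeding the remaining HBM capacity falls back to DDR.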
Some of the operations illustrated in FIG. 5 may be repeated, combined, modified or deleted where appropriate, and additional steps may also be added to the flow in various embodiments. Additionally, steps may be performed in any suitable order without departing from the scope of particular embodiments.
FIG. 6 is a schematic diagram of an illustrative electronic computing device to enable improving system memory access performance using high performance memory, according to some embodiments. In some embodiments, the computing device 600 includes one or more processors 610 including one or more processor dies (e.g., cores) 618, each including a platform initialization component 664, such as a component to execute platform initialization firmware 145 described with respect to FIG. 1. In some embodiments, the computing device is to provide improved system memory access performance by utilizing high performance memory 680, as provided in FIGS. 1-5. In one implementation, the high performance memory 680 is the same as high performance memory 130 described with respect to FIG. 1 and/or the high performance memory 250 described with respect to FIG. 2.
The computing device 600 may additionally include one or more of the following: cache 662, a graphical processing unit (GPU) 612 (which may be the hardware accelerator in some implementations) , a wireless input/output (I/O) interface 620, a wired I/O interface 630, system memory 640 (e.g., memory circuitry) , power management circuitry 650, non-transitory storage device 660, and a network interface 670 for connection to a network 672. The following discussion provides a brief, general description of the components forming the illustrative computing device 600. Example, non-limiting computing devices 600 may include a desktop computing device, blade server device, workstation, or similar device or system.
In embodiments, the processor cores 618 are capable of executing machine-readable instruction sets 614, reading data and/or instruction sets 614 from one or more storage devices 660 and writing data to the one or more storage devices 660. Those skilled in the relevant art will appreciate that the illustrated embodiments as well  as other embodiments may be practiced with other processor-based device configurations, including portable electronic or handheld electronic devices, for instance smartphones, portable computers, wearable computers, consumer electronics, personal computers ( “PCs” ) , network PCs, minicomputers, server blades, mainframe computers, and the like.
The processor cores 618 may include any number of hardwired or configurable circuits, some or all of which may include programmable and/or configurable combinations of electronic components, semiconductor devices, and/or logic elements that are disposed partially or wholly in a PC, server, or other computing system capable of executing processor-readable instructions.
The computing device 600 includes a bus or similar communications link 616 that communicably couples and facilitates the exchange of information and/or data between various system components including the processor cores 618, the cache 662, the graphics processor circuitry 612, one or more wireless I/O interfaces 620, one or more wired I/O interfaces 630, one or more storage devices 660, and/or one or more network interfaces 670. The computing device 600 may be referred to in the singular herein, but this is not intended to limit the embodiments to a single computing device 600, since in certain embodiments, there may be more than one computing device 600 that incorporates, includes, or contains any number of communicably coupled, collocated, or remote networked circuits or devices.
The processor cores 618 may include any number, type, or combination of currently available or future developed devices capable of executing machine-readable instruction sets.
The processor cores 618 may include (or be coupled to) , but are not limited to, any current or future developed single- or multi-core processor or microprocessor, such as: one or more systems on a chip (SoCs) ; central processing units (CPUs) ; digital signal processors (DSPs) ; graphics processing units (GPUs) ; application-specific integrated circuits (ASICs) ; programmable logic units; field programmable gate arrays (FPGAs) ; and the like. Unless described otherwise, the construction and operation of the various blocks shown in FIG. 6 are of conventional design. Consequently, such blocks are not described in further detail herein, as they should be understood by those skilled in the relevant art. The bus 616 that interconnects at least some of the components of the computing device 600 may employ any currently available or future developed serial or parallel bus structures or architectures.
The system memory 640 may include read-only memory ( “ROM” ) 642 and random access memory ( “RAM” ) 646. A portion of the ROM 642 may be used to store or otherwise retain a basic input/output system ( “BIOS” ) 644. The BIOS 644 provides basic functionality to the computing device 600, for example by causing the processor cores 618 to load and/or execute one or more machine-readable instruction sets 614. In embodiments, at least some of the one or more machine-readable instruction sets 614 cause at least a portion of the processor cores 618 to provide, create, produce, transition, and/or function as a dedicated, specific, and particular machine, for example a word processing machine, a digital image acquisition machine, a media playing machine, a gaming system, a communications device, a smartphone, or similar.
The computing device 600 may include at least one wireless input/output (I/O) interface 620. The at least one wireless I/O interface 620 may be communicably coupled to one or more physical output devices 622 (tactile devices, video displays, audio output devices, hardcopy output devices, etc.) . The at least one wireless I/O interface 620 may communicably couple to one or more physical input devices 624 (pointing devices, touchscreens, keyboards, tactile devices, etc.) . The at least one wireless I/O interface 620 may include any currently available or future developed wireless I/O interface. Example wireless I/O interfaces include, but are not limited to, near field communication (NFC) and similar.
The computing device 600 may include one or more wired input/output (I/O) interfaces 630. The at least one wired I/O interface 630 may be communicably coupled to one or more physical output devices 622 (tactile devices, video displays, audio output devices, hardcopy output devices, etc. ) . The at least one wired I/O interface 630 may be communicably coupled to one or more physical input devices 624 (pointing devices, touchscreens, keyboards, tactile devices, etc. ) . The wired I/O interface 630 may include any currently available or future developed I/O interface. Example wired I/O interfaces include, but are not limited to, universal serial bus (USB) , IEEE 1394 ( “FireWire” ) , and similar.
The computing device 600 may include one or more communicably coupled, non-transitory, data storage devices 660. The data storage devices 660 may include one or more hard disk drives (HDDs) and/or one or more solid-state storage devices (SSDs) . The one or more data storage devices 660 may include any current or future developed storage appliances, network storage devices, and/or systems. Non-limiting examples of such data storage devices 660 may include, but are not limited to, any current or future developed non-transitory storage appliances or devices, such as one or more magnetic storage devices, one or more optical storage devices, one or more electro-resistive storage devices, one or more molecular storage devices, one or more quantum storage devices, or various combinations thereof. In some implementations, the one or more data storage devices 660 may include one or more removable storage devices, such as one or more flash drives, flash memories, flash storage units, or similar appliances or devices capable of communicable coupling to and decoupling from the computing device 600.
The one or more data storage devices 660 may include interfaces or controllers (not shown) communicatively coupling the respective storage device or system to the bus 616. The one or more data storage devices 660 may store, retain, or otherwise contain machine-readable instruction sets, data structures, program modules, data stores, databases, logical structures, and/or other data useful to the processor cores 618 and/or graphics processor circuitry 612 and/or one or more applications executed on or by the processor cores 618 and/or graphics processor circuitry 612. In some instances, one or more data storage devices 660 may be communicably coupled to the processor cores 618, for example via the bus 616 or via one or more wired communications interfaces 630 (e.g., Universal Serial Bus or USB) ; one or more wireless communications interfaces 620 (e.g., Near Field Communication or NFC) ; and/or one or more network interfaces 670 (IEEE 802.3 or Ethernet, IEEE 802.11, etc.) .
Processor-readable instruction sets 614 and other programs, applications, logic sets, and/or modules may be stored in whole or in part in the system memory 640. Such instruction sets 614 may be transferred, in whole or in part, from the one or more data storage devices 660. The instruction sets 614 may be loaded, stored, or  otherwise retained in system memory 640, in whole or in part, during execution by the processor cores 618 and/or graphics processor circuitry 612.
The computing device 600 may include power management circuitry 650 that controls one or more operational aspects of the energy storage device 652. In embodiments, the energy storage device 652 may include one or more primary (i.e., non-rechargeable) or secondary (i.e., rechargeable) batteries or similar energy storage devices. In embodiments, the energy storage device 652 may include one or more supercapacitors or ultracapacitors. In embodiments, the power management circuitry 650 may alter, adjust, or control the flow of energy from an external power source 654 to the energy storage device 652 and/or to the computing device 600. The power source 654 may include, but is not limited to, a solar power system, a commercial electric grid, a portable generator, an external energy storage device, or any combination thereof.
For convenience, the processor cores 618, the graphics processor circuitry 612, the wireless I/O interface 620, the wired I/O interface 630, the storage device 660, and the network interface 670 are illustrated as communicatively coupled to each other via the bus 616, thereby providing connectivity between the above-described components. In alternative embodiments, the above-described components may be communicatively coupled in a different manner than illustrated in FIG. 6. For example, one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via one or more intermediary components (not shown) . In another example, one or more of the above-described components may be integrated into the processor cores 618 and/or the graphics processor circuitry 612. In some embodiments, all or a portion of the bus 616 may be omitted and the components are coupled directly to each other using suitable wired or wireless connections.
The following examples pertain to further embodiments. Example 1 is a system to facilitate improving system memory access performance using high performance memory. The system of Example 1 comprises a processing system comprising: a processor communicably coupled to a high performance memory; and a memory device communicably coupled to the processor to store platform initialization firmware to cause the processing system to: initialize, during a boot process of the processing system, the high performance memory as system memory for the processing  system; generate, during the boot process, location information of the high performance memory; report, during the boot process, the location information of the high performance memory to an operating system (OS) ; and forward information corresponding to the location information to the high performance memory.
In Example 2, the subject matter of Example 1 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is initialized as system memory for the processing system. In Example 3, the subject matter of any one of Examples 1-2 can optionally include wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information. In Example 4, the subject matter of any one of Examples 1-3 can optionally include wherein the platform initialization firmware to further cause the processing system to report bandwidth information of the high performance memory to the OS during the boot process, and wherein the at least one of page tables and the high performance requirement software are placed in the high performance memory in accordance with the bandwidth information.
In Example 5, the subject matter of any one of Examples 1-4 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables. In Example 6, the subject matter of any one of Examples 1-5 can optionally include wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT) , a system locality distance information table (SLIT) , or a heterogeneous memory attribute table (HMAT) .
In Example 7, the subject matter of any one of Examples 1-6 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases. In Example 8, the subject matter of any one of Examples 1-7 can optionally include wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules.
Example 9 is a method for facilitating improving system memory access performance using high performance memory. The method of Example 9 can optionally include initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; generating, during the boot process, location information of the high performance memory; reporting, during the boot process, the location information of the high performance memory to an operating system (OS) ; and forwarding information corresponding to the location information to the high performance memory.
In Example 10, the subject matter of Example 9 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized for system memory for the processing system. In Example 11, the subject matter of any one of Examples 9-10 can optionally include wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system. In Example 12, the subject matter of any one of Examples 9-11 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT) , a system locality distance information table (SLIT) , or a heterogeneous memory attribute table (HMAT) .
In Example 13, the subject matter of any one of Examples 9-12 can optionally include wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system. In Example 14, the subject matter of any one of Examples 9-13 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
Example 15 is a non-transitory computer-readable storage medium for facilitating improving system memory access performance using high performance memory. The non-transitory computer-readable storage medium of Example 15 comprises executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; generating, during the boot process, location information of the high performance memory; reporting, during the boot process, the location information of the high performance memory to an operating system (OS) ; and forwarding information corresponding to the location information to the high performance memory.
In Example 16, the subject matter of Example 15 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized for system memory of the processing system. In Example 17, the subject matter of any one of Examples 15-16 can optionally include wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system. In Example 18, the subject matter of any one of Examples 15-17 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT) , a system locality distance information table (SLIT) , or a heterogeneous memory attribute table (HMAT) .
In Example 19, the subject matter of any one of Examples 15-18 can optionally include wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system. In Example 20, the subject matter of any one of Examples 15-19 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
Example 21 is an apparatus to improve system memory access performance using high performance memory. The apparatus of Example 21 comprises a memory device to store platform initialization firmware to cause a processing system to: initialize, during a boot process of the processing system, the high performance memory as system memory for the processing system; generate, during the boot process, location information of the high performance memory; report, during the boot process, the location information of the high performance memory to an operating system (OS) ; and forward information corresponding to the location information to the high performance memory.
In Example 22, the subject matter of Example 21 can optionally include wherein the high performance memory comprises high bandwidth memory (HBM) that is initialized as system memory for the processing system. In Example 23, the subject matter of any one of Examples 21-22 can optionally include wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information. In Example 24, the subject matter of any one of Examples 21-23 can optionally include wherein the platform initialization firmware to further cause the processing system to report bandwidth information of the high performance memory to the OS during the boot process, and wherein the at least one of page tables and the high performance requirement software are placed in the high performance memory in accordance with the bandwidth information.
In Example 25, the subject matter of any one of Examples 21-24 can optionally include wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables. In Example 26, the subject matter of any one of Examples 21-25 can optionally include wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT) , a system locality distance information table (SLIT) , or a heterogeneous memory attribute table (HMAT) .
In Example 27, the subject matter of any one of Examples 21-26 can optionally include wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases. In Example 28, the subject matter of any one of Examples 21-27 can optionally include wherein the page tables are accessed in the high performance memory to perform virtual  address to physical address translation to access a memory page stored in one or more memory modules.
Example 28 is an apparatus for facilitating improving system memory access performance using high performance memory, according to implementations of the disclosure. The apparatus of Example 28 can comprise means for initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system; means for generating, during the boot process, location information of the high performance memory; means for reporting, during the boot process, the location information of the high performance memory to an operating system (OS) ; and means for forwarding information corresponding to the location information to the high performance memory.
In Example 29, the subject matter of Example 28 can optionally include the apparatus further configured to perform the method of any one of the Examples 10 to 14.
Example 30 is at least one machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to carry out a method according to any one of Examples 9-14. Example 31 is an apparatus for facilitating improving system memory access performance using high performance memory, configured to perform the method of any one of Examples 9-14. Example 32 is an apparatus for facilitating improving system memory access performance using high performance memory comprising means for performing the method of any one of Examples 9 to 14. Specifics in the Examples may be used anywhere in one or more embodiments.
In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the described embodiments. It will be apparent, however, to one skilled in the art that embodiments may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form. There may be intermediate structure between illustrated components. The components described or illustrated herein may have additional inputs or outputs that are not illustrated or described.
Various embodiments may include various processes. These processes may be performed by hardware components or may be embodied in computer program or machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.
Portions of various embodiments may be provided as a computer program product, which may include a computer-readable medium (e.g., non-transitory computer-readable storage medium) having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) for execution by one or more processors to perform a process according to certain embodiments. The computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM) , random access memory (RAM) , erasable programmable read-only memory (EPROM) , electrically-erasable programmable read-only memory (EEPROM) , magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions. Moreover, embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer.
Many of the methods are described in their basic form, but processes can be added to or deleted from any of the methods and information can be added or subtracted from any of the described messages without departing from the basic scope of the present embodiments. It will be apparent to those skilled in the art that many further modifications and adaptations can be made. The particular embodiments are not provided to limit the concept but to illustrate it. The scope of the embodiments is not to be determined by the specific examples provided above but only by the claims below.
If it is said that an element “A” is coupled to or with element “B, ” element A may be directly coupled to element B or be indirectly coupled through, for example, element C. When the specification or claims state that a component, feature, structure, process, or characteristic A “causes” a component, feature, structure, process, or characteristic B, it means that “A” is at least a partial cause of “B” but that there may also be at least one other component, feature, structure, process, or characteristic that  assists in causing “B. ” If the specification indicates that a component, feature, structure, process, or characteristic “may” , “might” , or “could” be included, that particular component, feature, structure, process, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, this does not mean there is only one of the described elements.
An embodiment is an implementation or example. Reference in the specification to “an embodiment, ” “one embodiment, ” “some embodiments, ” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments. The various appearances of “an embodiment, ” “one embodiment, ” or “some embodiments” are not all referring to the same embodiments. It should be appreciated that in the foregoing description of example embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various novel aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments utilize more features than are expressly recited in each claim. Rather, as the following claims reflect, novel aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims are hereby expressly incorporated into this description, with each claim standing on its own as a separate embodiment.

Claims (20)

  1. A processing system comprising:
    a processor communicably coupled to a high performance memory; and
    a memory device communicably coupled to the processor to store platform initialization firmware to cause the processing system to:
    initialize, during a boot process of the processing system, the high performance memory as system memory for the processing system;
    generate, during the boot process, location information of the high performance memory;
    report, during the boot process, the location information of the high performance memory to an operating system (OS) ; and
    forward information corresponding to the location information to the high performance memory.
  2. The processing system of claim 1, wherein the high performance memory comprises high bandwidth memory (HBM) that is initialized as system memory for the processing system.
  3. The processing system of claim 1, wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information.
  4. The processing system of claim 3, wherein the platform initialization firmware is further to cause the processing system to report bandwidth information of the high performance memory to the OS during the boot process, and wherein the at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the bandwidth information.
  5. The processing system of claim 1, wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables.
  6. The processing system of claim 5, wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  7. The processing system of claim 3, wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
  8. The processing system of claim 3, wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules.
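The translation recited in claim 8 can be pictured with a minimal, self-contained sketch. This is purely illustrative of virtual-to-physical translation in general, not the claimed firmware mechanism: the single-level table, the `map_page` and `translate` helpers, and the 4 KiB page size are hypothetical simplifications (real processors walk multi-level tables in hardware).

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: a single-level page table of the kind claim 8
 * references, assumed (hypothetically) to reside in the high
 * performance memory region so that table reads are fast. */

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4 KiB pages */
#define NUM_PAGES  16                   /* tiny address space for the sketch */

/* Each entry holds a physical frame base; low bit marks "present". */
static uint64_t page_table[NUM_PAGES];

/* Map a virtual page number to a physical frame base address. */
void map_page(uint64_t vpn, uint64_t phys_base)
{
    page_table[vpn] = phys_base | 1u;
}

/* Translate a virtual address; returns 0 when the page is unmapped. */
uint64_t translate(uint64_t vaddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    if (vpn >= NUM_PAGES || !(page_table[vpn] & 1u))
        return 0;
    /* Combine the frame base with the in-page offset. */
    return (page_table[vpn] & ~(uint64_t)(PAGE_SIZE - 1))
           | (vaddr & (PAGE_SIZE - 1));
}
```

The relevance to the claims is that every table lookup like the one above is itself a memory read; keeping the table in a higher-bandwidth region makes translations that miss the TLB cheaper.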
  9. A method comprising:
    initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system;
    generating, during the boot process, location information of the high performance memory;
    reporting, during the boot process, the location information of the high performance memory to an operating system (OS); and
    forwarding information corresponding to the location information to the high performance memory.
  10. The method of claim 9, wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized as system memory for the processing system.
  11. The method of claim 9, wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system.
  12. The method of claim 9, wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  13. The method of claim 9, wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system.
  14. The method of claim 13, wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
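The placement decision recited in claims 13 and 14 can be illustrated by a sketch of how an OS-side consumer might act on reported bandwidth attributes. The `mem_range_info` structure and `pick_high_bandwidth_range` helper are hypothetical stand-ins, not the actual ACPI HMAT layout, which encodes locality, latency, and bandwidth in its own defined format.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-range bandwidth
 * attributes that an HMAT-style report conveys to the OS; the field
 * names here are illustrative, not ACPI-defined. */
struct mem_range_info {
    uint64_t base;           /* physical base of the range */
    uint64_t size;           /* size of the range in bytes */
    uint32_t bandwidth_mbps; /* reported bandwidth of the range */
};

/* Pick the reported range with the highest bandwidth as the placement
 * target for page tables or bandwidth-hungry software. Returns NULL
 * when no ranges were reported. */
const struct mem_range_info *
pick_high_bandwidth_range(const struct mem_range_info *ranges, size_t n)
{
    const struct mem_range_info *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (!best || ranges[i].bandwidth_mbps > best->bandwidth_mbps)
            best = &ranges[i];
    return best;
}
```

In this simplified model, the firmware's boot-time report supplies the `ranges` array, and the OS allocates page tables or high performance requirement software out of whichever range the selection returns.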
  15. A non-transitory computer-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
    initializing, during a boot process of a processing system, a high performance memory as system memory for the processing system;
    generating, during the boot process, location information of the high performance memory;
    reporting, during the boot process, the location information of the high performance memory to an operating system (OS); and
    forwarding information corresponding to the location information to the high performance memory.
  16. The non-transitory computer-readable storage medium of claim 15, wherein the high performance memory comprises high bandwidth memory (HBM) that is utilized as system memory for the processing system.
  17. The non-transitory computer-readable storage medium of claim 15, wherein the high performance memory is separate from one or more memory modules comprising dynamic random access memory (DRAM) modules of the processing system.
  18. The non-transitory computer-readable storage medium of claim 15, wherein the location information is provided in one or more tables comprising advanced configuration and power interface (ACPI) tables, and wherein the ACPI tables comprise at least one of a system resource affinity table (SRAT), a system locality distance information table (SLIT), or a heterogeneous memory attribute table (HMAT).
  19. The non-transitory computer-readable storage medium of claim 15, wherein at least one of page tables or high performance requirement software are placed in the high performance memory in accordance with the location information, and wherein the page tables are accessed in the high performance memory to perform virtual address to physical address translation to access a memory page stored in one or more memory modules of the processing system.
  20. The non-transitory computer-readable storage medium of claim 19, wherein the high performance requirement software comprises at least one of artificial intelligence (AI) training, virtualization services, an OS kernel, multi-thread shared data, cloud services, enterprise applications, or enterprise databases.
PCT/CN2020/115898 2020-09-17 2020-09-17 Improving system memory access performance using high performance memory WO2022056779A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080103259.2A CN115885267A (en) 2020-09-17 2020-09-17 Improving system memory access performance using high performance memory
PCT/CN2020/115898 WO2022056779A1 (en) 2020-09-17 2020-09-17 Improving system memory access performance using high performance memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/115898 WO2022056779A1 (en) 2020-09-17 2020-09-17 Improving system memory access performance using high performance memory

Publications (1)

Publication Number Publication Date
WO2022056779A1 (en) 2022-03-24

Family

ID=80777535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115898 WO2022056779A1 (en) 2020-09-17 2020-09-17 Improving system memory access performance using high performance memory

Country Status (2)

Country Link
CN (1) CN115885267A (en)
WO (1) WO2022056779A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030142561A1 (en) * 2001-12-14 2003-07-31 I/O Integrity, Inc. Apparatus and caching method for optimizing server startup performance
CN104360979A (en) * 2014-10-21 2015-02-18 华侨大学 GPU-based (Graphic Processing Unit) computer system
US10379784B1 (en) * 2018-05-03 2019-08-13 International Business Machines Corporation Write management for increasing non-volatile memory reliability
US10560898B1 (en) * 2019-05-30 2020-02-11 Snap Inc. Wearable device location systems
US10725677B2 (en) * 2016-02-19 2020-07-28 Sandisk Technologies Llc Systems and methods for efficient power state transitions

Also Published As

Publication number Publication date
CN115885267A (en) 2023-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20953637

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20953637

Country of ref document: EP

Kind code of ref document: A1