WO2022032508A1 - Offloading processor memory training to on-die controller module - Google Patents

Offloading processor memory training to on-die controller module

Publication number
WO2022032508A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
die
module
processor
controller module
Prior art date
Application number
PCT/CN2020/108584
Other languages
French (fr)
Inventor
Jiewen Yao
Hong PU
Ye Li
Jun Dong
Fumin Lu
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to PCT/CN2020/108584
Publication of WO2022032508A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/02 Detection or location of defective auxiliary circuits, e.g. defective refresh counters
    • G11C29/023 Detection or location of defective auxiliary circuits, e.g. defective refresh counters in clock generator or timing circuitry
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/02 Detection or location of defective auxiliary circuits, e.g. defective refresh counters
    • G11C29/028 Detection or location of defective auxiliary circuits, e.g. defective refresh counters with adaption or trimming of parameters
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/18 Address generation devices; Devices for accessing memories, e.g. details of addressing circuits
    • G11C29/22 Accessing serial memories
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C5/00 Details of stores covered by group G11C11/00
    • G11C5/02 Disposition of storage elements, e.g. in the form of a matrix array
    • G11C5/04 Supports for storage elements, e.g. memory modules; Mounting or fixing of storage elements on such supports

Definitions

  • Embodiments described herein generally relate to the field of booting processing systems and, more particularly, to offloading processor memory training to an on-die controller module.
  • a processing system may include hardware and software components.
  • the software components may include one or more applications, an operating system (OS), and firmware.
  • the applications may include control logic for performing the work that is of value to the user of the processing system.
  • the applications run on top of the OS, which runs at a lower logical level than the applications (i.e., closer to the hardware) to provide an underlying environment or abstraction layer that makes it easier to create and execute the applications.
  • the firmware runs at an even lower logical level to provide an underlying environment or abstraction layer that makes it easier to create and execute the OS.
  • the firmware may establish a basic input/output system (BIOS), and the OS may use that BIOS to communicate with different hardware components within the processing system.
  • the OS and the applications execute out of random-access memory (RAM), which is volatile. Some or all of the firmware may also execute out of RAM. However, since the RAM is volatile, the environment for performing useful work basically disappears whenever the processing system is turned off. Consequently, whenever the processing system is turned on, the processing system should recreate that environment before useful work can be performed.
  • the operations for preparing a processing system to execute an OS may be referred to as the “boot process.”
  • the time that elapses during the boot process may be referred to as the “boot time.”
  • FIGS. 1A and 1B depict illustrations of a processing system to provide offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
  • FIG. 2 is a block diagram depicting a system implementing memory training between an integrated memory controller and a memory module implemented during offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
  • FIG. 3 illustrates an example flow for offloading processor memory training to an on-die controller module, in accordance with certain embodiments.
  • FIG. 4 illustrates another example flow for offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
  • FIG. 5 is a schematic diagram of an illustrative electronic computing device to enable offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
  • Embodiments described herein are directed to offloading processor memory training to an on-die controller module.
  • the processing system may execute a boot process before the processing system can be utilized for work.
  • the control logic or firmware that performs or controls the boot process may be referred to as the “system firmware,” the “system bootcode,” the “platform bootcode,” or simply the “bootcode.”
  • MRC memory reference code
  • BIOS boot time when the system has a full memory configuration in a multi-socket system.
  • a 2-socket configuration 24 dual in-line memory modules (DIMMs)
  • 4 socket 48 DIMMs
  • 8 socket 96 DIMMs
  • the memory training code (e.g., MRC) uses the memory controller to test, for example, the double data rate (DDR) bus and adjust timing/reference voltage (Vref) for determined margins for each channel DIMM. Furthermore, the training data is based on the motherboard hardware and DIMM. As a result, the memory training process cannot be skipped in order to reduce the BIOS boot time.
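The margin adjustment described above can be pictured as a sweep over one tunable parameter (a timing delay or a Vref step) followed by centering inside the passing window. The following is a minimal sketch under stated assumptions, not the MRC algorithm: `find_passing_window`, `center_of_window`, and `simulated_bus_test` are hypothetical names, and the pass range of 20 to 44 is invented purely for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_passing_window(
    settings: Iterable[int], passes: Callable[[int], bool]
) -> Optional[Tuple[int, int]]:
    """Return (low, high) of the widest contiguous run of passing settings."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    prev: Optional[int] = None
    for setting in settings:
        if passes(setting):
            if start is None:
                start = setting  # a new passing run begins here
            prev = setting
        elif start is not None:
            # The run just ended; keep it if it is the widest so far.
            if best is None or prev - start > best[1] - best[0]:
                best = (start, prev)
            start = None
    # Handle a run that extends to the last swept setting.
    if start is not None and (best is None or prev - start > best[1] - best[0]):
        best = (start, prev)
    return best


def center_of_window(window: Tuple[int, int]) -> int:
    """Pick the midpoint so that margin is roughly equal on both sides."""
    low, high = window
    return (low + high) // 2


def simulated_bus_test(delay_step: int) -> bool:
    # Hypothetical stand-in for a real bus test: passes only for steps 20..44.
    return 20 <= delay_step <= 44


window = find_passing_window(range(0, 64), simulated_bus_test)
trained_delay = center_of_window(window)
```

Because the passing window depends on the board and the module, a sweep of this kind has to run against the real hardware, which is consistent with the statement that training cannot simply be skipped.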
  • the time incurred for some of the stages remains the same regardless of the number of sockets in the server system.
  • a full DIMM configuration above two sockets in a server system would not satisfy a 15 second boot time standard, for example.
  • a cold boot time may be in the order of multiple minutes.
  • One prior approach to optimize the memory training time in the boot process is to wake-up an application processor in the pre-memory environment and assign each application processor to train each channel DIMM.
  • all of the memory channels in a multi-socket system can be trained in parallel to save boot time.
  • a limitation to this approach is that it utilizes complex code to enable the boot strap processor/application processor in pre-memory to train memory.
  • this approach utilizes a large cache size, which is not feasible in current systems.
  • this approach would take more time than is typical to train a single channel DIMM.
  • Another prior approach to optimize the memory training time in the boot process is to add a fast boot option to save and restore the memory training data to reduce the full memory training time.
  • a limitation with this approach is that the save and restore methodology may not work in some cases (i.e., failed training) and can force the system to do a full memory training when the environment changes, for example in temperature, humidity, or brightness.
  • the first boot time using this approach is still too long to meet customer requirements.
  • a further prior approach to optimize memory training time in the boot process is to reduce the overall memory configuration (e.g., reduce overall DIMM configuration) .
  • this approach negatively impacts the overall system memory performance.
  • Implementations of the disclosure provide an approach to optimize the memory training time by offloading central processing unit (CPU) memory training (e.g., MRC) to an on-die controller module to improve BIOS boot performance significantly and to provide for modular firmware.
  • the on-die controller module is a Secure Startup Services Module (S3M) .
  • S3M is an ARC (Argonaut RISC core) microcontroller in an uncore complex of a CPU, where the S3M provides ROM and error correcting code (ECC) RAM for firmware execution.
  • a processing system may include multiple dies (e.g., four dies) and may also integrate one complex programmable logic device (CPLD) into a package of the shared dies of the processing system.
  • Each die may include an on-die controller module (ODCM) (such as, for example, an S3M) .
  • the ODCM can run memory training code of the boot process in order to access an integrated memory controller (IMC) of the die to train the memory module (s) (e.g., DIMM) independently.
  • when the CPU is ‘power OK’, the CPU core of each die can fetch basic input/output system (BIOS) code to initialize the system hardware, and the ODCM can fetch the memory training code to run memory training at the same time in parallel.
  • each die’s ODCM can train its associated memory module independently in parallel.
  • the memory training time is consistent (e.g., 8 seconds) regardless of the particular memory module configuration.
  • the boot process time is also consistent (e.g., 13 seconds) regardless of the memory configuration (e.g., 1 socket, 2 sockets, 4 sockets, 8 sockets) of the full memory module (e.g., DIMM) system.
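The scaling argument can be made concrete with a small timing model. This is a sketch only: it assumes, for illustration, that one-at-a-time training grows linearly with the number of sockets, which the text does not state, and it reuses the 8 second example figure as the per-socket cost. The function names are hypothetical and the printed values are model outputs, not measurements.

```python
def serial_training_time(sockets: int, per_socket_seconds: float) -> float:
    """Assumed model: sockets are trained one after another."""
    return sockets * per_socket_seconds


def parallel_training_time(sockets: int, per_socket_seconds: float) -> float:
    """Each die's controller trains its own memory at the same time,
    so the total is the time of the slowest (here: any) single die."""
    return per_socket_seconds if sockets > 0 else 0.0


PER_SOCKET_SECONDS = 8.0  # example figure quoted in the text

for sockets in (1, 2, 4, 8):
    serial = serial_training_time(sockets, PER_SOCKET_SECONDS)
    parallel = parallel_training_time(sockets, PER_SOCKET_SECONDS)
    print(f"{sockets} socket(s): serial model {serial:.0f} s, parallel model {parallel:.0f} s")
```

Under this model the parallel figure stays flat as sockets are added, which is the property the disclosure claims for training time.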
  • implementations of the disclosure improve overall system availability time to the operating system (OS) . Furthermore, the reduced boot time of implementations of the disclosure improves the boot performance when security patches of firmware or the OS utilize a system reboot. The reduced boot time further reduces time in system validation and system qualification processes.
  • FIGS. 1A and 1B depict illustrations of a processing system 100 to provide offloading processor memory training to an on-die controller module (ODCM) , according to implementations of the disclosure.
  • processing system 100 may be embodied as and/or may include any number and type of hardware and/or software components, such as (without limitation) a processor, including but not limited to, a central processing unit ( “CPU” or simply “application processor” ) , a graphics processing unit ( “GPU” or simply “graphics processor” ) , and so on.
  • Processing system 100 may also include components such as drivers (also referred to as “driver logic”, user-mode driver (UMD), user-mode driver framework (UMDF), “GPU driver”, “graphics driver logic”, or simply “driver”), memory, network devices, or the like, as well as input/output (I/O) sources, such as touchscreens, touch panels, touch pads, virtual or regular keyboards, virtual or regular mice, ports, connectors, etc.
  • processing system 100 may include or enable operation of an operating system (OS) serving as an interface between hardware and/or physical resources of the processing system 100 and a user.
  • processing system 100 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances.
  • Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC) , and/or a field programmable gate array (FPGA) .
  • the terms "logic” , “module” , “component” , “engine” , and “mechanism” may include, by way of example, software or hardware and/or combinations of software and hardware.
  • Processing system 100 may further be a part of and/or assist in the operation of (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electromechanical agent or machine, etc.
  • autonomous machines or artificially intelligent agents may include (without limitation) robots, autonomous vehicles (e.g., self-driving cars, self-flying planes, self-sailing boats, etc. ) , autonomous equipment (self-operating construction vehicles, self-operating medical equipment, etc. ) , and/or the like.
  • “computing device” may be interchangeably referred to as “autonomous machine” or “artificially intelligent agent” or simply “robot” .
  • although “autonomous vehicle” and “autonomous driving” are referenced throughout this document, embodiments are not limited as such.
  • “autonomous vehicle” is not limited to an automobile but may include any number and type of autonomous machines, such as robots, autonomous equipment, household autonomous devices, and/or the like, and any one or more tasks or operations relating to such autonomous machines may be interchangeably referenced with autonomous driving.
  • Processing system 100 may further include (without limitations) large computing systems, such as server computers, desktop computers, etc., and may further include set-top boxes (e.g., Internet-based cable television set-top boxes, etc. ) , global positioning system (GPS) -based devices, etc.
  • Processing system 100 may include mobile computing devices serving as communication devices, such as cellular phones including smartphones, personal digital assistants (PDAs) , tablet computers, laptop computers, e-readers, smart televisions, television platforms, wearable devices (e.g., glasses, watches, bracelets, smartcards, jewelry, clothing items, etc. ) , media players, etc.
  • processing system 100 may include a mobile computing device employing a computer platform hosting an integrated circuit ( “IC” ) , such as system on a chip ( “SoC” or “SOC” ) , integrating various hardware and/or software components of processing system 100 on a single chip.
  • Processing system 100 may host network interface(s) (not shown) to provide access to a network, such as a LAN, a wide area network (WAN), a metropolitan area network (MAN), a personal area network (PAN), Bluetooth, a cloud network, a mobile network (e.g., 3rd Generation (3G), 4th Generation (4G), etc.), an intranet, the Internet, etc.
  • Network interface (s) may include, for example, a wireless network interface having antenna, which may represent one or more antenna (e) .
  • Network interface (s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
  • Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein.
  • a machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories) , and magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories) , EEPROMs (Electrically Erasable Programmable Read Only Memories) , magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
  • embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection) .
  • the term “user” may be interchangeably referred to as “viewer”, “observer”, “person”, “individual”, “end-user”, and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit”, “graphics processor”, or simply “GPU” and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit”, “application processor”, or simply “CPU”.
  • processing system 100 can include a single-processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors or processor cores.
  • the processing system 100 can be a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices such as within Internet-of-things (IoT) devices with wired or wireless connectivity to a local or wide area network.
  • processing system 100 may couple with, or be integrated within: a server-based gaming platform; a game console, including a game and media console; a mobile gaming console, a handheld game console, or an online game console.
  • the processing system 100 is part of a mobile phone, smart phone, tablet computing device or mobile Internet-connected device such as a laptop with low internal storage capacity.
  • Processing system 100 can also include, couple with, or be integrated within: a wearable device, such as a smart watch wearable device; smart eyewear or clothing enhanced with augmented reality (AR) or virtual reality (VR) features to provide visual, audio or tactile outputs to supplement real world visual, audio or tactile experiences or otherwise provide text, audio, graphics, video, holographic images or video, or tactile feedback; other augmented reality (AR) device; or other virtual reality (VR) device.
  • processing system 100 includes or is part of a television or set top box device.
  • processing system 100 can include, couple with, or be integrated within a self-driving vehicle such as a bus, tractor trailer, car, motor or electric power cycle, plane or glider (or any combination thereof) .
  • the self-driving vehicle may use processing system 100 to process the environment sensed around the vehicle.
  • the processing system 100 includes one or more processors, such as a CPU or GPU, which each include one or more processor cores to process instructions which, when executed, perform operations for system or user software.
  • at least one of the one or more processor cores is configured to process a specific instruction set.
  • instruction set may facilitate Complex Instruction Set Computing (CISC) , Reduced Instruction Set Computing (RISC) , or computing via a Very Long Instruction Word (VLIW) .
  • processor cores may process a different instruction set which may include instructions to facilitate the emulation of other instruction sets.
  • Processor core may also include other processing devices, such as a Digital Signal Processor (DSP) .
  • processing system 100 includes a plurality of dies shown as CPU Die 0 110A, CPU Die 1 110B, CPU Die 2 110C, and CPU Die 3 110D. It is contemplated that embodiments are not limited to any particular implementation of processing system 100 and that one or more of its components (e.g., more or less CPU dies, GPU dies, xPU dies, etc. ) may be variously implemented in embodiments of the disclosure.
  • dies 110A-110D are referred to as CPU dies, other types of dies, such as GPU dies, are also contemplated in implementations of the disclosure, and the term “CPU die” may be connoting a broad usage in implementations of the disclosure.
  • processing system 100 may include multiple dies 110A-110D and may also integrate a complex programmable logic device (CPLD) 120 into a package of the shared dies 110A-110D of the processing system 100.
  • Each die 110A-110D of processing system 100 may include a core 111A, 111B, 111C, 111D and an uncore 112A, 112B, 112C, 112D.
  • the core 111A-111D may include computation units (e.g., ALU, FPU) and upper levels of caches (e.g., L1 and L2) .
  • the uncore 112A-112D may refer to a collection of components of the CPU die 110A-110D that are not in the core 111A-111D but are used for core performance.
  • the core 111A-111D includes components involved in executing instructions, including execution units, L1 and L2 cache, branch prediction logic, and so on.
  • the uncore 112A-112D functions may include the last level cache (LLC), integrated memory controllers (IMCs), quick path interconnect (QPI) controllers, on-chip interconnect (OCI), power control logic (PWR), and so on.
  • each uncore 112A-112D of the CPU dies 110A-110D includes an ODCM 113A, 113B, 113C, 113D.
  • the ODCM 113A-113D is a Secure Startup Services Module (S3M) .
  • S3M is an ARC microcontroller with ROM and ECC RAM used for firmware execution.
  • each uncore 112A-112D of the CPU dies 110A-110D further includes an IMC 114A, 114B, 114C, 114D.
  • the IMC 114A-114D enables reading, writing, and refreshing the memory modules, such as dynamic random access memory (DRAM) that is part of a dual in-line memory module (DIMM) 116A, 116B, 116C, 116D, associated with each CPU die 110A-110D.
  • Other implementations and configurations of memory modules are possible in implementations of the disclosure and are not limited solely to those depicted in the illustrations herein.
  • when the CPU is ‘power OK’ (e.g., a POWER_GOOD or POWER_OK signal is sent indicating that system voltages are within specification and that the system may proceed to boot and operate), the CPU core 111A-111D of each die 110A-110D can fetch BIOS code to initialize the system hardware and the ODCM 113A-113D can fetch the memory training code (e.g., MRC 115) to run memory training at the same time in parallel.
  • the ODCM 113A-113D can run modular memory training code, such as MRC 115, corresponding to a boot process of processing system 100.
  • the ODCM 113A-113D can fetch the MRC 115 from the CPLD 120 and independently execute the MRC 115 as MRC 115A, 115B, 115C, 115D in parallel at each ODCM 113A-113D.
  • the independent and parallel execution of MRC 115A-115D at ODCMs 113A-113D enables each ODCM 113A-113D to access the IMC 114A-114D of the corresponding CPU die 110A-110D to initialize memory training of the memory module(s) 116A-116D (e.g., DIMM) of the processing system 100 independently and in parallel.
  • the MRC 115 is separated from CPU silicon reference code (e.g., BIOS boot code) and dedicated for the on-die IMC 114A-114D, while at the same time acting as modular and common memory training code for all of the IMCs 114A-114D of the CPU dies 110A-110D of processing system 100.
  • the MRC 115A-115D and the BIOS initialization can begin in parallel with one another.
  • the system memory (e.g., DIMMs 116A-116D) is trained when the BIOS begins to utilize the system memory.
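The parallel arrangement described above can be sketched with threads standing in for the per-die controllers and the CPU core. This is an illustrative model under stated assumptions, not firmware: `train_memory`, `bios_init`, and `boot` are hypothetical names, the short sleep is a stand-in for real training work, and a `threading.Event` per die models the "training complete" status that the BIOS path checks before it touches system memory.

```python
import threading
import time
from typing import Dict, List


def train_memory(die_id: int, done: Dict[int, threading.Event], log: List[str]) -> None:
    """Stand-in for one on-die controller running the training code."""
    time.sleep(0.01)  # placeholder for the actual training work
    log.append(f"die {die_id}: memory trained")
    done[die_id].set()  # publish completion for the BIOS path


def bios_init(done: Dict[int, threading.Event], log: List[str]) -> None:
    """Stand-in for the CPU core running BIOS initialization."""
    log.append("bios: early hardware init")  # work that does not need DRAM
    for event in done.values():
        event.wait()  # block only at the point where memory is needed
    log.append("bios: system memory ready")


def boot(num_dies: int = 4) -> List[str]:
    log: List[str] = []
    done = {die: threading.Event() for die in range(num_dies)}
    threads = [
        threading.Thread(target=train_memory, args=(die, done, log))
        for die in range(num_dies)
    ]
    threads.append(threading.Thread(target=bios_init, args=(done, log)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Calling `boot(4)` returns a log whose final entry is the BIOS reporting memory ready, since that entry can only be written after every die has signalled completion.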
  • Implementations of the disclosure result in the memory training time being consistent (e.g., 8 seconds) regardless of the particular memory module configuration.
  • the boot process time is also consistent (e.g., 13 seconds) regardless of the memory configuration (e.g., 1 socket, 2 sockets, 4 sockets, 8 sockets) of the full memory module (e.g., DIMM) system.
  • FIG. 2 is a block diagram depicting a system 200 implementing memory training between an integrated memory controller and memory module implemented during offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
  • system 200 may include a memory controller 210 and a DIMM module 220.
  • memory controller 210 may be the same as IMC 114A-114D described with respect to FIGS. 1A and 1B
  • DIMM module 220 may be the same as DIMM 116A-116D described with respect to FIGS. 1A and 1B.
  • DIMM module 220 is further depicted as including multiple DRAMs 225 (e.g., DRAM 0 through DRAM N) .
  • the processor e.g., CPU die 110A-110D of FIGS. 1A, 1B
  • System 200 performs memory training steps and signal handshaking between the memory controller 210 and DRAM 225, so that the processor’s computing does not experience a bottleneck.
  • the memory training steps may include, but are not limited to, early CTL/CLK training, early CMD/CLK training, receive enable, receive DQ (data pins) and DQS (Strobe pin) 217 basic per bit, write leveling, write fly by, transmit DQ/DQS 217 basic per bit, early DQ Vref training, CMD Vref training, and late CMD/CLK training, to name a few examples.
  • FIG. 1B provides another implementation of processing system 100 providing offloading processor memory training to an on-die controller module in accordance with implementations of the disclosure.
  • Processing system 100 of FIG. 1B includes a number of components depicted and described with respect to FIG. 1A, and their description applies equally to FIG. 1B.
  • Processing system 100 of FIG. 1B depicts that each CPU die 110A-110D includes an S3M (Secure Startup Services Module) as the ODCM 113A-113D of the die 110A-110D.
  • S3M is an ARC microcontroller with ROM and ECC RAM used for firmware execution.
  • S3M 113A-113D supports many standard I/O controller functions through a combination of basic hardware and firmware, such as UART, SPI, SMBUS, and so on.
  • FIG. 1B also depicts the CPLD 120 providing non-volatile RAM (NVRAM) 150 to store the MRC 115, where the CPLD 120 and NVRAM 150 are shared by all of the S3M 113A-113D of the CPU dies 110A-110D.
  • NVRAM 150 refers to random-access memory that retains data without applied power.
  • the S3M 113A-113D can access the CPLD’s 120 NVRAM 150 to execute the MRC 115 similar to the process described above with respect to FIG. 1A.
  • implementations of the disclosure can extend the functionality at the S3M 113A-113D to access the IMC 114A-114D to train the DIMM 116A-116D.
  • MRC 115 is stored in the CPLD NVRAM 150.
  • the NVRAM 150 can also store the extensible firmware interface (EFI) memory configuration setting variables and training data as well.
  • a serial presence detect (SPD) of the DIMM 116A-116D connects to the S3M 113A-113D via an S3M system management bus (SMBUS) interface 160A, 160B, 160C, 160D.
  • SPD refers to a standardized way to automatically access information about a memory module. SPD data stored on a memory module may contain, for example, timing parameters, manufacturer, serial number, and other useful information about the memory module.
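As an illustration of the kind of record SPD data provides, the sketch below decodes a small fixed-layout byte block into named fields. The 16-byte layout used here is hypothetical and is not the JEDEC SPD byte map; `ModuleInfo`, `parse_module_info`, and the field choices are assumptions made only to show the idea of reading module parameters before training.

```python
import struct
from dataclasses import dataclass

# Hypothetical 16-byte little-endian layout (NOT the JEDEC SPD byte map):
#   uint16 manufacturer id, uint32 serial number,
#   uint16 minimum clock period in picoseconds, 8-byte ASCII part number.
_LAYOUT = struct.Struct("<HIH8s")


@dataclass
class ModuleInfo:
    manufacturer_id: int
    serial_number: int
    tck_ps: int
    part_number: str


def parse_module_info(raw: bytes) -> ModuleInfo:
    """Decode the hypothetical layout into a typed record."""
    if len(raw) < _LAYOUT.size:
        raise ValueError("module data block is too short")
    manufacturer_id, serial_number, tck_ps, part = _LAYOUT.unpack_from(raw)
    return ModuleInfo(
        manufacturer_id=manufacturer_id,
        serial_number=serial_number,
        tck_ps=tck_ps,
        part_number=part.decode("ascii").rstrip(),  # drop space padding
    )


# Example block built with the same hypothetical layout.
example_block = _LAYOUT.pack(0x1234, 42, 625, b"DEMO    ")
example_info = parse_module_info(example_block)
```

A real implementation would follow the SPD layout defined for the specific memory generation rather than this simplified record.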
  • the SMBUS refers to a single-ended two-wire bus for the purpose of lightweight communication.
  • the S3M 113A-113D can fetch the MRC 115 from CPLD NVRAM 150 and execute the MRC 115A-115D in parallel at each S3M 113A-113D of each CPU die 110A-110D.
  • the S3M 113A-113D can also read the DIMM 116A-116D SPD information via the SMBUS interface 160A-160D.
  • the S3M 113A-113D can read SPD data and memory configuration settings from the SPD information.
  • the S3M 113A-113D can then execute the MRC 115A-115D in order to initialize the IMC 114A-114D for the DIMM 116A-116D.
  • offloading memory training to an ODCM 113A-113D can support memory setup configuration. For example, once an end user has changed a memory setting in a BIOS setup menu, the BIOS code can send the updated memory setting data to a setup data area of the CPLD 120 through, for example, a mailbox interface. On a next boot of the system, the MRC 115 code in CPLD 120 can read the setup data in the setup data area of the CPLD 120 and use it to train the memory.
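The setup-data handoff can be modeled as a small persistent key-value area that the BIOS writes and the training code reads on the following boot. This is a sketch under stated assumptions: `SetupDataArea`, `mailbox_write`, `settings_for_next_boot`, and the default values are all hypothetical, and an in-memory dictionary stands in for the setup data area of the CPLD.

```python
from typing import Any, Dict

# Hypothetical defaults used when the user has not changed anything.
DEFAULT_SETTINGS: Dict[str, Any] = {"speed_mts": 3200, "ecc": True}


class SetupDataArea:
    """In-memory stand-in for a persistent setup data area."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def mailbox_write(self, key: str, value: Any) -> None:
        # Models the BIOS sending an updated setting through a mailbox.
        self._store[key] = value

    def read_all(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored data.
        return dict(self._store)


def settings_for_next_boot(area: SetupDataArea) -> Dict[str, Any]:
    """Training code view: defaults overridden by any saved user settings."""
    settings = dict(DEFAULT_SETTINGS)
    settings.update(area.read_all())
    return settings


area = SetupDataArea()
area.mailbox_write("speed_mts", 2933)  # user lowered the speed in setup
next_boot_settings = settings_for_next_boot(area)
```

Keeping the defaults separate from the saved overrides means an empty setup data area still yields a complete configuration.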
  • offloading memory training to an ODCM 113A-113D can support “fast boot” to bypass the memory training as well.
  • the MRC 115 can save ‘golden’ memory training data into a data area of the CPLD 120 and load this saved golden memory training data on the next boot.
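The save-and-restore path can be sketched as a lookup with a fallback to full training. The source only says that golden data is saved and loaded on the next boot; checking a configuration signature before reuse is an assumption added here, and `train_or_restore`, the `"golden"` key, and the signature strings are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


def train_or_restore(
    store: Dict[str, Any],
    config_signature: str,
    full_training: Callable[[], Dict[str, int]],
) -> Tuple[Dict[str, int], str]:
    """Reuse saved training data when it matches the current configuration,
    otherwise run full training and save the result for the next boot."""
    saved = store.get("golden")
    if saved is not None and saved["signature"] == config_signature:
        return saved["data"], "fast"
    data = full_training()
    store["golden"] = {"signature": config_signature, "data": data}
    return data, "full"


training_runs = []


def fake_full_training() -> Dict[str, int]:
    # Stand-in for the real training pass; records that it was invoked.
    training_runs.append(1)
    return {"delay": 32, "vref": 17}


store: Dict[str, Any] = {}
first = train_or_restore(store, "dimm-set-A", fake_full_training)
second = train_or_restore(store, "dimm-set-A", fake_full_training)
third = train_or_restore(store, "dimm-set-B", fake_full_training)
```

In this sketch the second boot takes the fast path, while a changed configuration forces full training again.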
  • FIG. 3 illustrates an example flow 300 for offloading processor memory training to an on-die controller module, in accordance with certain embodiments.
  • the various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device.
  • the example flow 300 may be representative of some or all the operations that may be executed by or implemented on one or more components of processing system 100 of FIGS. 1A and/or 1B, such as an ODCM 113A-113D of processing system 100.
  • the embodiments are not limited in this context.
  • at the start of flow 300, the processor may detect a CPU power-on signal. Flow 300 then proceeds in parallel to blocks 310 and 320.
  • Block 310 provides a memory training portion of a boot process and block 320 provides a BIOS initialization portion of the boot process. Implementations of the disclosure enable the memory training portion 310 and the BIOS initialization 320 portion of the boot process to execute in parallel.
  • the ODCM fetches memory training code (e.g., MRC) .
  • the memory training code is fetched from a shared CPLD of the package hosting the CPU die of the ODCM.
  • the memory training code is fetched from NVRAM of the CPLD.
  • SPD data and memory configuration is read by the ODCM from the memory module corresponding to the CPU die hosting the ODCM.
  • memory training of the memory module is initialized by the ODCM.
  • BIOS initialization of block 320 is also performed.
  • in BIOS initialization 320, at block 322, the CPU fetches the BIOS code.
  • SEC security
  • CAR cache-as-RAM
  • micro-code loading
  • PEI pre-EFI initialization
  • the processor may determine whether the memory training status indicates that the memory training is complete at decision block 330. If not, flow 300 proceeds to block 340 to retry the memory training. In some implementations, the processor may wait a determined time interval and recheck if the memory training is complete. If the memory training status indicates that the memory training is complete at decision block 330, then flow 300 proceeds to block 350 where the boot process continues. Lastly, at block 360 the processor performs a boot to the EFI shell.
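The parallel structure of flow 300 can be illustrated with two threads. This is only a sketch under stated assumptions: real memory training runs on the ODCM, not in a Python thread; `run_boot` and its parameters are hypothetical; the sleeps stand in for real work; and the loop models the "wait a determined time interval and recheck" variant of decision block 330 rather than a full retry at block 340.

```python
import threading
import time


def run_boot(train_seconds=0.05, bios_seconds=0.02, poll_interval=0.01, max_polls=100):
    """Run memory training and BIOS initialization concurrently, then continue boot."""
    training_done = threading.Event()
    log = []
    lock = threading.Lock()

    def note(message):
        with lock:
            log.append(message)

    def memory_training():  # block 310 (performed by the ODCM in the source)
        note("310: fetch training code, read SPD, start training")
        time.sleep(train_seconds)  # stand-in for the training work
        training_done.set()

    def bios_init():  # block 320 (performed by the CPU in the source)
        note("320: fetch BIOS code and run early initialization")
        time.sleep(bios_seconds)

    odcm = threading.Thread(target=memory_training)
    cpu = threading.Thread(target=bios_init)
    odcm.start()
    cpu.start()
    cpu.join()

    # Decision block 330: check the training status, wait, and recheck.
    polls = 0
    while not training_done.is_set():
        polls += 1
        if polls > max_polls:
            raise TimeoutError("memory training did not complete")
        training_done.wait(timeout=poll_interval)

    odcm.join()
    note("350: continue boot")
    note("360: boot to EFI shell")
    return log
```

Because the two portions overlap, the elapsed time in this toy model approaches the longer of the two durations rather than their sum, which is the effect the parallel flow is aiming for.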
  • FIG. 4 illustrates another example flow 400 for offloading processor memory training to an on-die controller module, in accordance with certain embodiments.
  • the various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device.
  • the example flow 400 may be representative of some or all the operations that may be executed by or implemented on one or more components of processing system 100 of FIGS. 1A and/or 1B, such as an ODCM 113A-113D of processing system 100.
  • the embodiments are not limited in this context.
  • the processor may fetch, by an on-die controller module (ODCM) of a processor die of the processor, memory training code from a shared CPLD of the processor.
  • the memory training code is MRC.
  • the ODCM is an S3M.
  • the processor may execute the memory training code at the ODCM.
  • the processor may read, by the ODCM, memory module data and configuration settings from the memory module.
  • the ODCM reads SPD data of the memory module via an SMBUS interface to obtain the memory module data and configuration settings.
  • the processor initializes, via the ODCM that is executing the memory training code, an IMC of the processor die to train the memory module.
  • the memory module is a DIMM.
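The four operations of flow 400 (fetch, execute, read, initialize) can be sketched per die as shown here. All class and function names are hypothetical, the shared NVRAM is modeled as a plain dictionary, and the "training code" is a toy callable. The one real-world detail used is that byte 2 of a JEDEC SPD image identifies the DRAM device type (0x0B for DDR3, 0x0C for DDR4); everything else is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# SPD byte 2 is the DRAM device type key byte.
DEVICE_TYPES = {0x0B: "DDR3", 0x0C: "DDR4"}


@dataclass
class Dimm:
    spd: bytes  # raw SPD contents, read over SMBUS in the source


@dataclass
class Imc:
    initialized: bool = False
    device_type: str = ""


def toy_training_code(imc: Imc, module_data: Dict[str, str]) -> None:
    """Stand-in for the memory training code; it only records what it was given."""
    imc.device_type = module_data["device_type"]
    imc.initialized = True


def odcm_flow(nvram: Dict[str, Callable], dimm: Dimm, imc: Imc) -> Imc:
    """One ODCM: fetch the code, read module data, run the code against the IMC."""
    code = nvram["training_code"]                       # fetch from the shared CPLD
    module_data = {"device_type": DEVICE_TYPES.get(dimm.spd[2], "unknown")}
    code(imc, module_data)                              # execute and initialize the IMC
    return imc


def train_all_dies(nvram: Dict[str, Callable], dimms: List[Dimm]) -> List[Imc]:
    """Run the per-die flow concurrently, one worker per die."""
    imcs = [Imc() for _ in dimms]
    with ThreadPoolExecutor(max_workers=max(1, len(dimms))) as pool:
        return list(pool.map(lambda pair: odcm_flow(nvram, *pair), zip(dimms, imcs)))
```

With four `Dimm` objects whose SPD byte 2 is 0x0C, `train_all_dies` returns four initialized `Imc` objects reporting "DDR4", mirroring the four CPU dies that each run the training code from the shared CPLD.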
  • FIG. 5 is a schematic diagram of an illustrative electronic computing device to enable offloading processor memory training to an on-die controller module according to some embodiments.
  • the computing device 500 includes one or more processors 510 including one or more processor dies 518, each including an ODCM 564, such as ODCM 113A-113D described with respect to FIGS. 1A and 1B.
  • the computing device is to provide offloading processor memory training to an on-die controller module, as provided in FIGS. 1-4.
  • the computing device 500 may additionally include one or more of the following: cache 562, a graphical processing unit (GPU) 512 (which may be the hardware accelerator in some implementations) , a wireless input/output (I/O) interface 520, a wired I/O interface 530, system memory 540 (e.g., memory circuitry) , power management circuitry 550, non-transitory storage device 560, and a network interface 570 for connection to a network 572.
  • Example, non-limiting computing devices 500 may include a desktop computing device, blade server device, workstation, or similar device or system.
  • the processor cores 518 are capable of executing machine-readable instruction sets 514, reading data and/or instruction sets 514 from one or more storage devices 560 and writing data to the one or more storage devices 560.
  • processors including portable electronic or handheld electronic devices, for instance smartphones, portable computers, wearable computers, consumer electronics, personal computers ( “PCs” ) , network PCs, minicomputers, server blades, mainframe computers, and the like.
  • the processor cores 518 may include any number of hardwired or configurable circuits, some or all of which may include programmable and/or configurable combinations of electronic components, semiconductor devices, and/or logic elements that are disposed partially or wholly in a PC, server, or other computing system capable of executing processor-readable instructions.
  • the computing device 500 includes a bus or similar communications link 516 that communicably couples and facilitates the exchange of information and/or data between various system components including the processor cores 518, the cache 562, the graphics processor circuitry 512, one or more wireless I/O interfaces 520, one or more wired I/O interfaces 530, one or more storage devices 560, and/or one or more network interfaces 570.
  • the computing device 500 may be referred to in the singular herein, but this is not intended to limit the embodiments to a single computing device 500, since in certain embodiments, there may be more than one computing device 500 that incorporates, includes, or contains any number of communicably coupled, collocated, or remote networked circuits or devices.
  • the processor cores 518 may include any number, type, or combination of currently available or future developed devices capable of executing machine-readable instruction sets.
  • the processor cores 518 may include (or be coupled to) but are not limited to any current or future developed single- or multi-core processor or microprocessor, such as: one or more systems on a chip (SOCs); central processing units (CPUs); digital signal processors (DSPs); graphics processing units (GPUs); application-specific integrated circuits (ASICs), programmable logic units, field programmable gate arrays (FPGAs), and the like.
  • the bus 516 that interconnects at least some of the components of the computing device 500 may employ any currently available or future developed serial or parallel bus structures or architectures.
  • the system memory 540 may include read-only memory ( “ROM” ) 542 and random access memory ( “RAM” ) 546.
  • a portion of the ROM 542 may be used to store or otherwise retain a basic input/output system ( “BIOS” ) 544.
  • the BIOS 544 provides basic functionality to the computing device 500, for example by causing the processor cores 518 to load and/or execute one or more machine-readable instruction sets 514.
  • At least some of the one or more machine-readable instruction sets 514 cause at least a portion of the processor cores 518 to provide, create, produce, transition, and/or function as a dedicated, specific, and particular machine, for example a word processing machine, a digital image acquisition machine, a media playing machine, a gaming system, a communications device, a smartphone, or similar.
  • the computing device 500 may include at least one wireless input/output (I/O) interface 520.
  • the at least one wireless I/O interface 520 may be communicably coupled to one or more physical output devices 522 (tactile devices, video displays, audio output devices, hardcopy output devices, etc. ) .
  • the at least one wireless I/O interface 520 may communicably couple to one or more physical input devices 524 (pointing devices, touchscreens, keyboards, tactile devices, etc. ) .
  • the at least one wireless I/O interface 520 may include any currently available or future developed wireless I/O interface.
  • Example wireless I/O interfaces include, but are not limited to: near field communication (NFC) , and similar.
  • the computing device 500 may include one or more wired input/output (I/O) interfaces 530.
  • the at least one wired I/O interface 530 may be communicably coupled to one or more physical output devices 522 (tactile devices, video displays, audio output devices, hardcopy output devices, etc. ) .
  • the at least one wired I/O interface 530 may be communicably coupled to one or more physical input devices 524 (pointing devices, touchscreens, keyboards, tactile devices, etc. ) .
  • the wired I/O interface 530 may include any currently available or future developed I/O interface.
  • Example wired I/O interfaces include, but are not limited to: universal serial bus (USB) , IEEE 1394 ( “FireWire” ) , and similar.
  • the computing device 500 may include one or more communicably coupled, non-transitory, data storage devices 560.
  • the data storage devices 560 may include one or more hard disk drives (HDDs) and/or one or more solid-state storage devices (SSDs) .
  • the one or more data storage devices 560 may include any current or future developed storage appliances, network storage devices, and/or systems. Non-limiting examples of such data storage devices 560 may include, but are not limited to, any current or future developed non-transitory storage appliances or devices, such as one or more magnetic storage devices, one or more optical storage devices, one or more electro-resistive storage devices, one or more molecular storage devices, one or more quantum storage devices, or various combinations thereof.
  • the one or more data storage devices 560 may include one or more removable storage devices, such as one or more flash drives, flash memories, flash storage units, or similar appliances or devices capable of communicable coupling to and decoupling from the computing device 500.
  • the one or more data storage devices 560 may include interfaces or controllers (not shown) communicatively coupling the respective storage device or system to the bus 516.
  • the one or more data storage devices 560 may store, retain, or otherwise contain machine-readable instruction sets, data structures, program modules, data stores, databases, logical structures, and/or other data useful to the processor cores 518 and/or graphics processor circuitry 512 and/or one or more applications executed on or by the processor cores 518 and/or graphics processor circuitry 512.
  • one or more data storage devices 560 may be communicably coupled to the processor cores 518, for example via the bus 516 or via one or more wired communications interfaces 530 (e.g., Universal Serial Bus or USB); one or more wireless communications interfaces 520 (e.g., Near Field Communication or NFC); and/or one or more network interfaces 570 (e.g., IEEE 802.3 or Ethernet, IEEE 802.11, etc.).
  • Processor-readable instruction sets 514 and other programs, applications, logic sets, and/or modules may be stored in whole or in part in the system memory 540. Such instruction sets 514 may be transferred, in whole or in part, from the one or more data storage devices 560. The instruction sets 514 may be loaded, stored, or otherwise retained in system memory 540, in whole or in part, during execution by the processor cores 518 and/or graphics processor circuitry 512.
  • the computing device 500 may include power management circuitry 550 that controls one or more operational aspects of the energy storage device 552.
  • the energy storage device 552 may include one or more primary (i.e., non-rechargeable) or secondary (i.e., rechargeable) batteries or similar energy storage devices.
  • the energy storage device 552 may include one or more supercapacitors or ultracapacitors.
  • the power management circuitry 550 may alter, adjust, or control the flow of energy from an external power source 554 to the energy storage device 552 and/or to the computing device 500.
  • the power source 554 may include, but is not limited to, a solar power system, a commercial electric grid, a portable generator, an external energy storage device, or any combination thereof.
  • the processor cores 518, the graphics processor circuitry 512, the wireless I/O interface 520, the wired I/O interface 530, the storage device 560, and the network interface 570 are illustrated as communicatively coupled to each other via the bus 516, thereby providing connectivity between the above-described components.
  • the above-described components may be communicatively coupled in a different manner than illustrated in FIG. 5.
  • one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via one or more intermediary components (not shown) .
  • one or more of the above-described components may be integrated into the processor cores 518 and/or the graphics processor circuitry 512.
  • all or a portion of the bus 516 may be omitted and the components are coupled directly to each other using suitable wired or wireless connections.
  • Example 1 is a system to facilitate offloading processor memory training to an on-die controller module.
  • the system of Example 1 comprises a processing system comprising a complex programmable logic device (CPLD); one or more memory modules; and a central processing unit (CPU) die communicably coupled to the CPLD and to at least one of the one or more memory modules.
  • the CPU die comprising: at least one core; an integrated memory controller (IMC) responsive to the at least one core; and an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of the one or more memory modules independently.
  • Example 2 the subject matter of Example 1 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M) .
  • Example 3 the subject matter of any one of Examples 1-2 can optionally include wherein the S3M is an ARC microcontroller.
  • Example 4 the subject matter of any one of Examples 1-3 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  • Example 5 the subject matter of any one of Examples 1-4 can optionally include wherein the on-die controller module is to fetch the memory training code from the NVRAM of the CPLD.
  • Example 6 the subject matter of any one of Examples 1-5 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the CPU die.
  • Example 7 the subject matter of any one of Examples 1-6 can optionally include wherein the at least one memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module.
  • Example 8 the subject matter of any one of Examples 1-7 can optionally include wherein the on-die controller module reads SPD data and memory configuration settings from the DIMM via the SMBUS interface.
  • In Example 9, the subject matter of any one of Examples 1-8 can optionally include wherein changes to memory settings in BIOS cause BIOS code to send updated memory setting data to a setup data area of the CPLD via a mailbox interface, and wherein the memory training code in the CPLD is to read the setup data for use in training the one or more memory modules.
  • Example 10 is a method for facilitating offloading processor memory training to an on-die controller module.
  • the method of Example 10 can optionally include fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; executing, by the on-die controller module, the memory training code at the processor die; reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
  • Example 11 the subject matter of Example 10 can optionally include wherein the memory training code is memory reference code (MRC) .
  • In Example 12, the subject matter of any one of Examples 10-11 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M).
  • Example 13 the subject matter of any one of Examples 10-12 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  • Example 14 the subject matter of any one of Examples 10-13 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die.
  • In Example 15, the subject matter of any one of Examples 10-14 can optionally include wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module.
  • Example 16 is a non-transitory computer-readable storage medium for facilitating offloading processor memory training to an on-die controller module.
  • the non-transitory computer-readable storage medium of Example 16 comprises executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; executing, by the on-die controller module, the memory training code at the processor die; reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
  • Example 17 the subject matter of Example 16 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M) .
  • Example 18 the subject matter of any one of Examples 16-17 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  • Example 19 the subject matter of any one of Examples 16-18 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die.
  • Example 20 the subject matter of any one of Examples 16-19 can optionally include wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die module.
  • Example 21 is an apparatus to facilitate offloading processor memory training to an on-die controller module.
  • the apparatus of Example 21 comprises a central processing unit (CPU) die comprising: at least one core; an integrated memory controller (IMC) responsive to the at least one core; and an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of one or more memory modules independently.
  • Example 22 the subject matter of Example 21 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M) .
  • Example 23 the subject matter of any one of Examples 21-22 can optionally include wherein the S3M is an ARC microcontroller.
  • Example 24 the subject matter of any one of Examples 21-23 can optionally include wherein the ODCM obtains the memory training code from a complex programmable logic device (CPLD) that comprises non-volatile random access memory (NVRAM) to store the memory training code.
  • Example 25 the subject matter of any one of Examples 21-24 can optionally include wherein the on-die controller module is to fetch the memory training code from the NVRAM of the CPLD.
  • Example 26 the subject matter of any one of Examples 21-25 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the CPU die.
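  • In Example 27, the subject matter of any one of Examples 21-26 can optionally include wherein the at least one memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module.
  • In Example 28, the subject matter of any one of Examples 21-27 can optionally include wherein the on-die controller module reads SPD data and memory configuration settings from the DIMM via the SMBUS interface.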
  • In Example 29, the subject matter of any one of Examples 21-28 can optionally include wherein changes to memory settings in BIOS cause BIOS code to send updated memory setting data to a setup data area of the CPLD via a mailbox interface, and wherein the memory training code in the CPLD is to read the setup data for use in training the one or more memory modules.
  • Example 30 is an apparatus for facilitating offloading processor memory training to an on-die controller module according to implementations of the disclosure.
  • the apparatus of Example 30 can comprise means for fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; means for executing, by the on-die controller module, the memory training code at the processor die; means for reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and means for initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
  • Example 31 the subject matter of Example 30 can optionally include the apparatus further configured to perform the method of any one of the Examples 11 to 15.
  • Example 32 is at least one machine readable medium comprising a plurality of instructions that in response to being executed on a computing device, cause the computing device to carry out a method according to any one of Examples 10-15.
  • Example 33 is an apparatus for facilitating offloading processor memory training to an on-die controller module, configured to perform the method of any one of Examples 10-15.
  • Example 34 is an apparatus for facilitating offloading processor memory training to an on-die controller module comprising means for performing the method of any one of Examples 10 to 15. Specifics in the Examples may be used anywhere in one or more embodiments.
  • Various embodiments may include various processes. These processes may be performed by hardware components or may be embodied in computer program or machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.
  • Portions of various embodiments may be provided as a computer program product, which may include a computer-readable medium (e.g., non-transitory computer-readable storage medium) having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) for execution by one or more processors to perform a process according to certain embodiments.
  • the computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM) , random access memory (RAM) , erasable programmable read-only memory (EPROM) , electrically-erasable programmable read-only memory (EEPROM) , magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions.
  • embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer.
  • element A may be directly coupled to element B or be indirectly coupled through, for example, element C.
  • a component, feature, structure, process, or characteristic A “causes” a component, feature, structure, process, or characteristic B, it means that “A” is at least a partial cause of “B” but that there may also be at least one other component, feature, structure, process, or characteristic that assists in causing “B. ” If the specification indicates that a component, feature, structure, process, or characteristic “may” , “might” , or “could” be included, that particular component, feature, structure, process, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, this does not mean there is only one of the described elements.
  • An embodiment is an implementation or example.
  • Reference in the specification to “an embodiment, ” “one embodiment, ” “some embodiments, ” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments.
  • the various appearances of “an embodiment, ” “one embodiment, ” or “some embodiments” are not all referring to the same embodiments. It should be appreciated that in the foregoing description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various novel aspects.

Landscapes

  • Stored Programmes (AREA)

Abstract

Embodiments are directed to offloading processor memory training to an on-die controller module. An embodiment of a system includes a complex programmable logic device (CPLD), one or more memory modules, and a central processing unit (CPU) die communicably coupled to the CPLD and to at least one of the one or more memory modules. The CPU die can further comprise at least one core, an integrated memory controller (IMC) responsive to the at least one core, and an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of the one or more memory modules independently.

Description

[Title established by the ISA under Rule 37.2] OFFLOADING PROCESSOR MEMORY TRAINING TO ON-DIE CONTROLLER MODULE
TECHNICAL FIELD
Embodiments described herein generally relate to the field of booting processing systems and, more particularly, to offloading processor memory training to an on-die controller module.
BACKGROUND
A processing system may include hardware and software components. The software components may include one or more applications, an operating system (OS), and firmware. The applications may include control logic for performing the work that is of value to the user of the processing system. In the processing system, the applications run on top of the OS, which runs at a lower logical level than the applications (i.e., closer to the hardware) to provide an underlying environment or abstraction layer that makes it easier to create and execute the applications. The firmware runs at an even lower logical level to provide an underlying environment or abstraction layer that makes it easier to create and execute the OS. For instance, the firmware may establish a basic input/output system (BIOS), and the OS may use that BIOS to communicate with different hardware components within the processing system.
Typically, the OS and the applications execute out of random-access memory (RAM) , which is volatile. Some or all of the firmware may also execute out of RAM. However, since the RAM is volatile, the environment for performing useful work basically disappears whenever the processing system is turned off. Consequently, whenever the processing system is turned on, the processing system should recreate that environment before useful work can be performed. For purposes of this disclosure, the operations for preparing a processing system to execute an OS may be referred to as the “boot process. ” Similarly, the time that elapses during the boot process may be referred to as the “boot time. ”
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments described here are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
FIGS. 1A and 1B depict illustrations of a processing system to provide offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
FIG. 2 is a block diagram depicting a system implementing memory training between an integrated memory controller and a memory module implemented during offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
FIG. 3 illustrates an example flow for offloading processor memory training to an on-die controller module, in accordance with certain embodiments.
FIG. 4 illustrates another example flow for offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
FIG. 5 is a schematic diagram of an illustrative electronic computing device to enable offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure.
DETAILED DESCRIPTION
Embodiments described herein are directed to offloading processor memory training to an on-die controller module.
In response to a processing system being turned on or reset, the processing system may execute a boot process before the processing system can be utilized for work. As discussed herein, the operations for preparing a processing system to execute an OS may be referred to as the “boot process.” Similarly, the time that elapses during the boot process may be referred to as the “boot time.”
The control logic or firmware that performs or controls the boot process may be referred to as the “system firmware,” the “system bootcode,” the “platform bootcode,” or simply the “bootcode.”
In current data centers, such as Internet Data Centers (IDCs), rack and cloud systems typically request a basic input/output system (BIOS) boot time of less than 15 seconds. However, the memory training code (such as memory reference code (MRC)) can spend more than 15 seconds of the BIOS boot time when the system has a full memory configuration in a multi-socket system. For example, a 2-socket configuration (24 dual in-line memory modules (DIMMs)) may take 15 seconds of boot time, while a 4-socket configuration (48 DIMMs) may take 22 seconds of boot time, and an 8-socket configuration (96 DIMMs) may take 63 seconds of boot time.
During memory training, the memory training code (e.g., MRC) uses the memory controller to test, for example, the double data rate (DDR) bus and adjust timing/reference voltage (Vref) for determined margins for each channel DIMM. Furthermore, the training data is based on the motherboard hardware and DIMM. As a result, the memory training process cannot be skipped in order to reduce the BIOS boot time.
Conventionally, in multi-socket processing systems, the time incurred for some of the stages (e.g., BIST/FIT/ACM, SEC, PEI-KTI, DXE, BDS, etc.) of the boot process remains the same regardless of the number of sockets in the server system. However, in the same multi-socket processing systems, the time incurred for the memory training stage of the boot process can vary based on the particular memory configuration of the system. For example, the time incurred for the memory training stage increases relative to the number of sockets in a full DIMM configuration of the system (e.g., the boot time: 1 Socket = 17 seconds; 2 Sockets = 24 seconds; 4 Sockets = 31 seconds). As such, conventionally, a full DIMM configuration above two sockets in a server system would not satisfy a 15-second boot time standard, for example. As a result, in some conventional processing systems, a cold boot time may be on the order of multiple minutes.
One prior approach to optimize the memory training time in the boot process is to wake up the application processors in the pre-memory environment and assign each application processor to train each channel DIMM. As a result, all of the memory channels in a multi-socket system can be trained in parallel to save boot time. However, a limitation of this approach is that it utilizes complex code to enable the bootstrap processor/application processor in pre-memory to train memory. Moreover, this approach utilizes a large cache size, which is not feasible in current systems. In addition, this approach would result in more time than is typical being spent to train a one-channel DIMM.
Another prior approach to optimize the memory training time in the boot process is to add a fast boot option that saves and restores the memory training data to avoid the full memory training time. However, a limitation of this approach is that the save and restore methodology may not work in some cases (i.e., failed training) and can force the system to do a full memory training when environmental conditions change, such as temperature, humidity, brightness, and so on. Furthermore, the first boot time using this approach is still too long to meet customer requirements.
A further prior approach to optimize memory training time in the boot process is to reduce the overall memory configuration (e.g., reduce overall DIMM configuration) . However, this approach negatively impacts the overall system memory performance.
Implementations of the disclosure provide an approach to optimize the memory training time by offloading central processing unit (CPU) memory training (e.g., MRC) to an on-die controller module to improve BIOS boot performance significantly and to provide for modular firmware. In one implementation, the on-die controller module is a Secure Startup Services Module (S3M) . S3M is an ARC (Argonaut RISC core) microcontroller in an uncore complex of a CPU, where the S3M provides ROM and error correcting code (ECC) RAM for firmware execution.
In one implementation, a processing system may include multiple dies (e.g., four dies) and may also integrate one complex programmable logic device (CPLD) into a package of the shared dies of the processing system. Each die may include an on-die controller module (ODCM) (such as, for example, an S3M). In implementations of the disclosure, the ODCM can run memory training code of the boot process in order to access an integrated memory controller (IMC) of the die to train the memory module(s) (e.g., DIMM) independently. In implementations of the disclosure, when the CPU is ‘power OK’, the CPU core of each die can fetch basic input/output system (BIOS) code to initialize the system hardware while, at the same time and in parallel, the ODCM fetches the memory training code and runs memory training. After offloading the memory training to the ODCM in implementations of the disclosure, each die’s ODCM can train its associated memory module independently in parallel. As such, the memory training time is consistent (e.g., 8 seconds) regardless of the particular memory module configuration. As a result, the boot process time is also consistent (e.g., 13 seconds) regardless of the memory configuration (e.g., 1 socket, 2 sockets, 4 sockets, 8 sockets) of the full memory module (e.g., DIMM) system.
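The scaling argument above can be illustrated with a small, idealized timing model. The following Python sketch is not part of the disclosed firmware: the function names are hypothetical, and the per-die training time of 8 seconds is simply borrowed from the example figure above. It contrasts training that is serialized through a single agent with training that each die's ODCM performs concurrently. Measured conventional boot times, such as those cited earlier, need not scale exactly linearly.

```python
def serial_training_time(per_die_seconds):
    """Idealized training time when one agent trains each die's memory in turn."""
    return sum(per_die_seconds)


def parallel_training_time(per_die_seconds):
    """Idealized training time when every die's ODCM trains concurrently."""
    return max(per_die_seconds)


# Assumption for illustration: each die needs 8 seconds to train its own DIMMs.
for dies in (1, 2, 4, 8):
    times = [8.0] * dies
    print(dies, serial_training_time(times), parallel_training_time(times))
```

Under these assumptions the parallel figure stays at 8 seconds for every die count, which is the consistency property described above, while the serialized figure grows with the number of dies.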
By reducing boot time, implementations of the disclosure improve overall system availability time to the operating system (OS) . Furthermore, the reduced boot time of implementations of the disclosure improves the boot performance when security patches of firmware or the OS utilize a system reboot. The reduced boot time further reduces time in system validation and system qualification processes.
FIGS. 1A and 1B depict illustrations of a processing system 100 to provide offloading processor memory training to an on-die controller module (ODCM) , according to implementations of the disclosure. As illustrated in FIGS. 1A and 1B, processing system 100 may be embodied as and/or may include any number and type of hardware and/or software components, such as (without limitation) a processor, including but not limited to, a central processing unit ( “CPU” or simply “application processor” ) , a graphics processing unit ( “GPU” or simply “graphics processor” ) , and so on. Processing system 100 may also include components such as drivers (also referred to as “driver logic” , user-mode driver (UMD) , UMD, user-mode driver framework (UMDF) , UMDF, “GPU driver” , “graphics driver logic” , or simply “driver” ) , memory, network devices, or the like, as well as input/output (I/O) sources, such as touchscreens, touch panels, touch pads, virtual or regular keyboards, virtual or regular mice, ports, connectors, etc. Although not specifically illustrated, processing system 100 may include or enable operation of an operating system (OS) serving as an interface between hardware and/or physical resources of the processing system 100 and a user.
It is to be appreciated that a lesser or more equipped system than the example described above may be preferred for certain implementations. Therefore, the configuration of processing system 100 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances.
Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC) , and/or a field programmable gate array (FPGA) . The terms "logic" , “module” , “component” , “engine” , and “mechanism” may include, by way of example, software or hardware and/or combinations of software and hardware.
Processing system 100 may further be a part of and/or assist in the operation of (without limitations) an autonomous machine or an artificially intelligent agent, such as a mechanical agent or machine, an electronics agent or machine, a virtual agent or machine, an electromechanical agent or machine, etc. Examples of autonomous machines or artificially intelligent agents may include (without limitation) robots, autonomous vehicles (e.g., self-driving cars, self-flying planes, self-sailing boats, etc. ) , autonomous equipment (self-operating construction vehicles, self-operating medical equipment, etc. ) , and/or the like. Throughout this document, “computing device” may be interchangeably referred to as “autonomous machine” or “artificially intelligent agent” or simply “robot” .
It is contemplated that although “autonomous vehicle” and “autonomous driving” are referenced throughout this document, embodiments are not limited as such. For example, “autonomous vehicle” is not limited to an automobile but may include any number and type of autonomous machines, such as robots, autonomous equipment, household autonomous devices, and/or the like, and any one or more tasks or operations relating to such autonomous machines may be interchangeably referenced with autonomous driving.
Processing system 100 may further include (without limitations) large computing systems, such as server computers, desktop computers, etc., and may further include set-top boxes (e.g., Internet-based cable television set-top boxes, etc.), global positioning system (GPS)-based devices, etc. Processing system 100 may include mobile computing devices serving as communication devices, such as cellular phones including smartphones, personal digital assistants (PDAs), tablet computers, laptop computers, e-readers, smart televisions, television platforms, wearable devices (e.g., glasses, watches, bracelets, smartcards, jewelry, clothing items, etc.), media players, etc. For example, in one embodiment, processing system 100 may include a mobile computing device employing a computer platform hosting an integrated circuit (“IC”), such as system on a chip (“SoC” or “SOC”), integrating various hardware and/or software components of processing system 100 on a single chip.
Processing system 100 may host network interface(s) (not shown) to provide access to a network, such as a LAN, a wide area network (WAN), a metropolitan area network (MAN), a personal area network (PAN), Bluetooth, a cloud network, a mobile network (e.g., 3rd Generation (3G), 4th Generation (4G), etc.), an intranet, the Internet, etc. Network interface(s) may include, for example, a wireless network interface having an antenna, which may represent one or more antenna(e). Network interface(s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein. A machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories) , and magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories) , EEPROMs (Electrically Erasable Programmable Read Only Memories) , magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection) .
Throughout the document, term “user” may be interchangeably referred to as “viewer” , “observer” , “person” , “individual” , “end-user” , and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit” , “graphics processor” , or simply “GPU” and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit” , “application processor” , or simply “CPU” .
It is to be noted that terms like “node” , “computing node” , “server” , “server device” , “cloud computer” , “cloud server” , “cloud server computer” , “machine” , “host machine” , “device” , “computing device” , “computer” , “computing system” , and the like, may be used interchangeably throughout this document. It is to be further noted that terms like “application” , “software application” , “program” , “software program” , “package” , “software package” , and the like, may be used interchangeably throughout this document. Also, terms like “job” , “input” , “request” , “message” , and the like, may be used interchangeably throughout this document.
In one embodiment, processing system 100 can include a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors or processor cores. In one embodiment, the processing system 100 can be a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices such as within Internet-of-things (IoT) devices with wired or wireless connectivity to a local or wide area network.
In one embodiment, processing system 100 may couple with, or be integrated within: a server-based gaming platform; a game console, including a game and media console; a mobile gaming console, a handheld game console, or an online game console. In some embodiments the processing system 100 is part of a mobile phone,  smart phone, tablet computing device or mobile Internet-connected device such as a laptop with low internal storage capacity. Processing system 100 can also include, couple with, or be integrated within: a wearable device, such as a smart watch wearable device; smart eyewear or clothing enhanced with augmented reality (AR) or virtual reality (VR) features to provide visual, audio or tactile outputs to supplement real world visual, audio or tactile experiences or otherwise provide text, audio, graphics, video, holographic images or video, or tactile feedback; other augmented reality (AR) device; or other virtual reality (VR) device. In some embodiments, processing system 100 includes or is part of a television or set top box device. In one embodiment processing system 100 can include, couple with, or be integrated within a self-driving vehicle such as a bus, tractor trailer, car, motor or electric power cycle, plane or glider (or any combination thereof) . The self-driving vehicle may use processing system 100 to process the environment sensed around the vehicle.
In some embodiments, the processing system 100 includes one or more processors, such as a CPU or GPU, which each include one or more processor cores to process instructions which, when executed, perform operations for system or user software. In some embodiments, at least one of the one or more processor cores is configured to process a specific instruction set. In some embodiments, instruction set may facilitate Complex Instruction Set Computing (CISC) , Reduced Instruction Set Computing (RISC) , or computing via a Very Long Instruction Word (VLIW) . One or more processor cores may process a different instruction set which may include instructions to facilitate the emulation of other instruction sets. Processor core may also include other processing devices, such as a Digital Signal Processor (DSP) .
For example, as shown in FIG. 1A, processing system 100 includes a plurality of dies shown as CPU Die 0 110A, CPU Die 1 110B, CPU Die 2 110C, and CPU Die 3 110D. It is contemplated that embodiments are not limited to any particular implementation of processing system 100 and that one or more of its components (e.g., more or fewer CPU dies, GPU dies, xPU dies, etc.) may be variously implemented in embodiments of the disclosure. Although dies 110A-110D are referred to as CPU dies, other types of dies, such as GPU dies, are also contemplated in implementations of the disclosure, and the term “CPU die” is used broadly in implementations of the disclosure.
As discussed above, implementations of the disclosure provide an approach to optimize the memory training time by offloading processor memory training (also referred to herein as MRC) to an ODCM to improve BIOS boot performance and provide for modular firmware. In implementations of the disclosure, processing system 100 may include multiple dies 110A-110D and may also integrate a complex programmable logic device (CPLD) 120 into a package of the shared dies 110A-110D of the processing system 100.
Each die 110A-110D of processing system 100 may include a core 111A, 111B, 111C, 111D and an uncore 112A, 112B, 112C, 112D. The core 111A-111D may include computation units (e.g., ALU, FPU) and upper levels of caches (e.g., L1 and L2). The uncore 112A-112D may refer to a collection of components of the CPU die 110A-110D that are not in the core 111A-111D but are used for core performance. While the core 111A-111D includes components involved in executing instructions, including execution units, L1 and L2 cache, branch prediction logic, and so on, the uncore 112A-112D functions may include the last level cache (LLC), integrated memory controllers (IMCs), quick path interconnect (QPI) controllers, on-chip interconnect (OCI), power control logic (PWR), and so on.
In implementations of the disclosure, each uncore 112A-112D of the CPU dies 110A-110D includes an ODCM 113A, 113B, 113C, 113D. In one implementation, as shown in FIG. 1B, the ODCM 113A-113D is a Secure Startup Services Module (S3M). As previously discussed, S3M is an ARC microcontroller with ROM and ECC RAM used for firmware execution.
As shown in FIGS. 1A and 1B, each uncore 112A-112D of the CPU dies 110A-110D further includes an IMC 114A, 114B, 114C, 114D. The IMC 114A-114D enables reading, writing, and refreshing the memory modules, such as dynamic random access memory (DRAM) that is part of a dual in-line memory module (DIMM) 116A, 116B, 116C, 116D, associated with each CPU die 110A-110D. Other implementations and configurations of memory modules are possible in implementations of the disclosure and are not limited solely to those depicted in the illustrations herein.
In implementations of the disclosure, when CPU is ‘power OK’ (e.g., POWER_GOOD or POWER_OK signal is sent indicating that system voltages are within specification and that the system may proceed to boot and operate) , the CPU core 111A-111D of each die 110A-110D can fetch BIOS code to initialize the system hardware and the ODCM 113A-113D can fetch the memory training code (e.g., MRC 115) to run memory training at the same time in parallel.
Specifically, the ODCM 113A-113D can run modular memory training code, such as MRC 115, corresponding to a boot process of processing system 100. The ODCM 113A-113D can fetch the MRC 115 from the CPLD 120 and independently execute the MRC 115 as MRC 115A, 115B, 115C, 115D in parallel at each ODCM 113A-113D. The independent and parallel execution of MRC 115A-115D at ODCMs 113A-113D enables each ODCM 113A-113D to access the IMC 114A-114D of the corresponding CPU die 110A-110D to initialize memory training of the memory module(s) 116A-116D (e.g., DIMM) of the processing system 100 independently and in parallel. As such, the MRC 115 is separated from CPU silicon reference code (e.g., BIOS boot code) and dedicated for the on-die IMC 114A-114D, while at the same time acting as modular and common memory training code for all of the IMCs 114A-114D of the CPU dies 110A-110D of processing system 100. In implementations of the disclosure, the MRC 115A-115D and the BIOS initialization can begin in parallel with one another.
As a result, the system memory (e.g., DIMMs 116A-116D) is trained when the BIOS begins to utilize the system memory. Implementations of the disclosure result in the memory training time being consistent (e.g., 8 seconds) regardless of the particular memory module configuration. As a result, the boot process time is also consistent (e.g., 13 seconds) regardless of the memory configuration (e.g., 1 socket, 2 sockets, 4 sockets, 8 sockets) of the full memory module (e.g., DIMM) system.
FIG. 2 is a block diagram depicting a system 200 implementing memory training between an integrated memory controller and memory module implemented during offloading processor memory training to an on-die controller module, in accordance with implementations of the disclosure. In one implementation, system 200 may include a memory controller 210 and a DIMM module 220. In one implementation, memory controller 210 may be the same as IMC 114A-114D described with respect to FIGS. 1A and 1B, and DIMM module 220 may be the same as DIMM 116A-116D described with respect to FIGS. 1A and 1B. DIMM module 220 is further depicted as including multiple DRAMs 225 (e.g., DRAM 0 through DRAM N).
During memory training, the processor (e.g., CPU die 110A-110D of FIGS. 1A, 1B) programs the registers of memory controller 210 with a desired signal and the memory controller 210 tests (e.g., via command and control signals 215) a DDR bus and adjusts timing/Vref for determined margins for each channel DIMM 220. System 200 performs memory training steps and signal handshaking between the memory controller 210 and DRAM 225, so that the processor’s computing does not experience a bottleneck. The memory training steps may include, but are not limited to, early CTL/CLK training, early CMD/CLK training, receive enable, receive DQ (data pins) and DQS (strobe pin) 217 basic per bit, write leveling, write fly by, transmit DQ/DQS 217 basic per bit, early DQ Vref training, CMD Vref training, and late CMD/CLK training, to name a few examples.
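To make the notion of adjusting timing/Vref for determined margins more concrete, the following Python sketch sweeps a single hypothetical setting (for example, a delay tap or a Vref step), records which settings pass a test, and selects the center of the widest passing window so that the margin on either side is roughly equal. This is a generic window-centering technique offered only as an illustration; the function names, the pass/fail callback, and the setting range are assumptions, and the actual MRC training algorithms are not described in this disclosure.

```python
from typing import Callable, Optional, Tuple


def widest_passing_window(
    passes: Callable[[int], bool], lo: int, hi: int
) -> Optional[Tuple[int, int]]:
    """Return (first, last) of the longest run of passing settings in [lo, hi]."""
    best = None   # widest window found so far
    start = None  # start of the current run of passing settings
    for setting in range(lo, hi + 1):
        if passes(setting):
            if start is None:
                start = setting
            if best is None or setting - start > best[1] - best[0]:
                best = (start, setting)
        else:
            start = None
    return best


def center_setting(
    passes: Callable[[int], bool], lo: int, hi: int
) -> Optional[int]:
    """Pick the midpoint of the widest passing window, or None if nothing passes."""
    window = widest_passing_window(passes, lo, hi)
    if window is None:
        return None
    first, last = window
    return (first + last) // 2
```

In this sketch the `passes` callback stands in for whatever bus test the memory controller runs at a given setting; a result of `None` would correspond to a training failure for that signal.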
FIG. 1B provides another implementation of processing system 100 providing offloading processor memory training to an on-die controller module in accordance with implementations of the disclosure. Processing system 100 of FIG. 1B includes a number of components depicted and described with respect to FIG. 1A, and their description applies equally to FIG. 1B. Processing system 100 of FIG. 1B depicts that each CPU die 110A-110D includes an S3M (Secure Startup Services Module) as the ODCM 113A-113D of the die 110A-110D. As previously discussed, S3M is an ARC microcontroller with ROM and ECC RAM used for firmware execution. S3M 113A-113D supports many standard I/O controller functions through a combination of basic hardware and firmware, such as UART, SPI, SMBUS, and so on.
FIG. 1B also depicts the CPLD 120 providing non-volatile RAM (NVRAM) 150 to store the MRC 115, where the CPLD 120 and NVRAM 150 are shared among all of the S3Ms 113A-113D of the CPU dies 110A-110D. NVRAM 150 refers to random-access memory that retains data without applied power. The S3M 113A-113D can access the NVRAM 150 of the CPLD 120 to execute the MRC 115 similar to the process described above with respect to FIG. 1A.
As the S3M 113A-113D and IMC 114A-114D are in the CPU uncore 112A-112D, implementations of the disclosure can extend the functionality at the S3M 113A-113D to access the IMC 114A-114D to train the DIMM 116A-116D. To extend the functionality at the S3M 113A-113D, MRC 115 is stored in the CPLD NVRAM 150. In some implementations, the NVRAM 150 can also store the extensible firmware interface (EFI) memory configuration setting variables and training data as well.
In one implementation, a serial presence detect (SPD) of the DIMM 116A-116D connects to the S3M 113A-113D via an S3M system management bus (SMBUS) interface 160A, 160B, 160C, 160D. SPD refers to a standardized way to automatically access information about a memory module. SPD data stored on a memory module may contain, for example, timing parameters, manufacturer, serial number, and other useful information about the memory module. The SMBUS refers to a single-ended two-wire bus for the purpose of lightweight communication.
When CPU power is ‘power OK’, the S3M 113A-113D can fetch the MRC 115 from CPLD NVRAM 150 and execute the MRC 115A-115D in parallel at each S3M 113A-113D of each CPU die 110A-110D. The S3M 113A-113D can also read the DIMM 116A-116D SPD information via the SMBUS interface 160A-160D. The S3M 113A-113D can read SPD data and memory configuration settings from the SPD information. The S3M 113A-113D can then execute the MRC 115A-115D in order to initialize the IMC 114A-114D for the DIMM 116A-116D.
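As an illustration of reading and interpreting SPD information, the following Python sketch reads the first few bytes of an SPD image through a hypothetical `read_byte` callback, which stands in for an SMBUS byte read and is not an API of any real library, and decodes the DRAM device type. The sketch assumes the JEDEC convention in which byte 2 of the SPD image identifies the device type; the codes shown should be checked against the applicable SPD specification before being relied upon.

```python
from typing import Callable

# Assumed JEDEC SPD byte 2 ("DRAM device type") codes; verify against the spec.
DRAM_DEVICE_TYPES = {
    0x0B: "DDR3 SDRAM",
    0x0C: "DDR4 SDRAM",
    0x12: "DDR5 SDRAM",
}


def read_spd_bytes(read_byte: Callable[[int], int], count: int) -> bytes:
    """Read `count` bytes of SPD data, one offset at a time."""
    return bytes(read_byte(offset) for offset in range(count))


def dram_device_type(spd: bytes) -> str:
    """Decode the device type from byte 2 of an SPD image."""
    if len(spd) < 3:
        raise ValueError("SPD image too short to contain the device type byte")
    return DRAM_DEVICE_TYPES.get(spd[2], f"unknown (0x{spd[2]:02X})")
```

A caller might pass a lambda that indexes into a captured SPD dump, for example `dram_device_type(read_spd_bytes(lambda o: dump[o], 4))`, where `dump` is a hypothetical bytes object.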
In some implementations, offloading memory training to an ODCM 113A-113D, such as S3M, can support memory setup configuration. For example, once an end user has changed a memory setting in a BIOS setup menu, the BIOS code can send the updated memory setting data to a setup data area of the CPLD 120 through, for example, a mailbox interface. On a next boot of the system, the MRC 115 code in CPLD 120 can read the setup data in the setup data area of the CPLD 120 and use it to train the memory.
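The setup-data hand-off described above can be sketched as follows. This Python model is illustrative only: the class names, the dictionary used for the setup data area, and the setting keys are all assumptions, since the disclosure does not define the mailbox protocol or the data layout. It shows the BIOS pushing changed settings through a mailbox into a persistent area, and the training code merging them over its defaults on the next boot.

```python
class CpldDataAreas:
    """Hypothetical model of persistent data areas held by the CPLD."""

    def __init__(self):
        self.setup_data = {}  # survives across boots in this model


class Mailbox:
    """Hypothetical mailbox through which BIOS code reaches the CPLD."""

    def __init__(self, cpld: CpldDataAreas):
        self._cpld = cpld

    def send_setup_data(self, settings: dict) -> None:
        # Record only the settings the user changed in the setup menu.
        self._cpld.setup_data.update(settings)


def settings_for_training(cpld: CpldDataAreas, defaults: dict) -> dict:
    """On the next boot, overlay stored setup data on the default settings."""
    merged = dict(defaults)
    merged.update(cpld.setup_data)
    return merged
```

Keeping the defaults separate from the stored overrides is one possible design choice; it lets an empty setup data area fall back cleanly to default training parameters.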
In some implementations, offloading memory training to an ODCM 113A-113D, such as S3M, can support “fast boot” to bypass the memory training as well. The MRC 115 can save ‘golden’ memory training data into a data area of the CPLD 120 and load this saved golden memory training data on the next boot.
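A minimal sketch of such a fast boot path is shown below in Python. The dictionary `store` stands in for the CPLD data area, and the idea of keying the saved data by a hash of the SPD images, so that a changed DIMM population forces full training, is an assumption added for illustration rather than something stated in the disclosure. All function and key names are hypothetical.

```python
import hashlib


def config_signature(spd_images) -> str:
    """Hash the SPD images so a changed DIMM population is detectable."""
    digest = hashlib.sha256()
    for image in spd_images:
        digest.update(len(image).to_bytes(4, "little"))  # delimit each image
        digest.update(image)
    return digest.hexdigest()


def train_with_fast_boot(store: dict, spd_images, full_training):
    """Reuse saved 'golden' data when valid; otherwise train fully and save."""
    signature = config_signature(spd_images)
    saved = store.get("golden")
    if saved is not None and saved["signature"] == signature:
        return saved["data"], "fast"
    data = full_training()  # the slow path: run the complete memory training
    store["golden"] = {"signature": signature, "data": data}
    return data, "full"
```

The first boot in this model always takes the full path, and later boots take the fast path only while the signature matches.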
FIG. 3 illustrates an example flow 300 for offloading processor memory training to an on-die controller module, in accordance with certain embodiments. The various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device. The example flow 300 may be representative of some or all the operations that may be executed by or implemented on one or more components of processing system 100 of FIGS. 1A and/or 1B, such as an ODCM 113A-113D of processing system 100. The embodiments are not limited in this context.
At block 310, the processor may detect a CPU power on signal. Flow 300 then proceeds in parallel to blocks 310 and 320. Block 310 provides a memory training portion of a boot process and block 320 provides a BIOS initialization portion of the boot process. Implementations of the disclosure enable the memory training portion 310 and the BIOS initialization portion 320 of the boot process to execute in parallel.
In the memory training 310, at block 312 the ODCM (e.g., S3M) fetches memory training code (e.g., MRC) . In one implementation, the memory training code is fetched from a shared CPLD of the package hosting the CPU die of the ODCM. In some implementations, the memory training code is fetched from NVRAM of the CPLD. At block 314, SPD data and memory configuration is read by the ODCM from the memory module corresponding to the CPU die hosting the ODCM. At block 316, memory training of the memory module is initialized by the ODCM.
In parallel (or concurrently) with memory training 310, the BIOS initialization of block 320 is also performed. In the BIOS initialization 320, at block 322 the CPU fetches the BIOS code. At block 324, a security (SEC) phase is performed which can include enabling cache as RAM (CAR) and loading micro-code. At block 326, a pre-EFI initialization (PEI) phase is performed that can include KTI, PCH, CPU, and so on.
Subsequent to the memory training 310 and BIOS initialization 320, the processor may determine whether the memory training status indicates that the memory training is complete at decision block 330. If not, flow 300 proceeds to block 340 to retry the memory training. In some implementations, the processor may wait a determined time interval and recheck if the memory training is complete. If the memory training status indicates that the memory training is complete at decision block 330, then flow 300 proceeds to block 350 where the boot process continues. Lastly, at block 360 the processor performs a boot to the EFI shell.
Some of the operations illustrated in FIG. 3 may be repeated, combined, modified or deleted where appropriate, and additional steps may also be added to the flow in various embodiments. Additionally, steps may be performed in any suitable order without departing from the scope of particular embodiments.
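The overall shape of flow 300 can be sketched in Python using a thread to stand in for the ODCM. This is a simplified model under stated assumptions: the two callables, the polling interval, and the poll limit are hypothetical, and the sketch implements the variant in which the processor waits a determined interval and rechecks the training status rather than restarting training.

```python
import threading


def run_boot_flow(memory_training, bios_init, poll_interval=0.01, max_polls=1000):
    """Run training and BIOS initialization concurrently, then wait for training."""
    done = threading.Event()  # models the memory training status

    def training_worker():
        memory_training()  # blocks 312-316, performed by the ODCM
        done.set()

    trainer = threading.Thread(target=training_worker, daemon=True)
    trainer.start()
    bios_init()  # blocks 322-326, performed by the CPU core in the meantime

    # Decision block 330: wait an interval and recheck until training completes.
    for _ in range(max_polls):
        if done.wait(poll_interval):
            break
    else:
        raise TimeoutError("memory training did not complete")

    trainer.join()
    return "boot continues"  # blocks 350 and 360 would follow
```

If the training callable raises, the status is never set and the sketch reports a timeout, which is one simple way to surface a training failure to the caller.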
FIG. 4 illustrates another example flow 400 for offloading processor memory training to an on-die controller module, in accordance with certain embodiments. The various operations of the flow may be performed by any suitable circuitry, such as a controller of a host computing device, a controller of a memory module, or other components of a computing device. The example flow 400 may be representative of some or all the operations that may be executed by or implemented on one or more components of processing system 100 of FIGS. 1A and/or 1B, such as an ODCM 113A-113D of processing system 100. The embodiments are not limited in this context.
At block 410, the processor may fetch, by an on-die controller module (ODCM) of a processor die of the processor, memory training code from a shared CPLD of the processor. In one implementation, the memory training code is MRC. In one implementation, the ODCM is an S3M. At block 420, the processor may execute the memory training code at the ODCM.
At block 430, the processor may read, by the ODCM, memory module data and configuration settings from the memory module. In one implementation, the ODCM reads SPD data of the memory module via an SMBUS interface to obtain the memory module data and configuration settings. Lastly, at block 440, the processor initializes, via the ODCM that is executing the memory training code, an IMC of the processor die to train the memory module. In one implementation, the memory module is a DIMM.
Some of the operations illustrated in FIG. 4 may be repeated, combined, modified or deleted where appropriate, and additional steps may also be added to the flow in various embodiments. Additionally, steps may be performed in any suitable order without departing from the scope of particular embodiments.
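The per-die sequence of flow 400, repeated independently on each die, might be modeled as in the following Python sketch. The `Odcm` class, its method names, and the dictionaries used for the CPLD NVRAM and the DIMMs are all hypothetical stand-ins; the sketch only shows the ordering of blocks 410 through 440 and the fact that each die proceeds without coordinating with the others.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Odcm:
    """Hypothetical model of one die's on-die controller module."""
    die_id: int
    log: list = field(default_factory=list)

    def fetch_training_code(self, cpld_nvram):  # block 410
        self.log.append("fetch")
        return cpld_nvram["mrc"]

    def read_module_data(self, dimm):  # block 430
        self.log.append("read_spd")
        return dimm["spd"]

    def initialize_imc(self, code, spd):  # block 440
        self.log.append("init_imc")
        return {"die": self.die_id, "trained": bool(code) and bool(spd)}


def train_die(odcm, cpld_nvram, dimm):
    code = odcm.fetch_training_code(cpld_nvram)
    odcm.log.append("execute")  # block 420: the ODCM starts running the code
    spd = odcm.read_module_data(dimm)
    return odcm.initialize_imc(code, spd)


def train_all(cpld_nvram, dimms):
    """Run the same sequence on every die, one worker per die."""
    odcms = [Odcm(i) for i in range(len(dimms))]
    with ThreadPoolExecutor(max_workers=max(1, len(dimms))) as pool:
        jobs = zip(odcms, dimms)
        results = list(pool.map(lambda j: train_die(j[0], cpld_nvram, j[1]), jobs))
    return odcms, results
```

Each `Odcm` keeps its own log, so the per-die ordering can be inspected even though the dies run concurrently.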
FIG. 5 is a schematic diagram of an illustrative electronic computing device to enable offloading processor memory training to an on-die controller module according to some embodiments. In some embodiments, the computing device 500 includes one or more processors 510 including one or more processor dies 518 each including ODCMs 564, such as ODCM 113A-113D described with respect to FIGS. 1A and 1B. In some embodiments, the computing device is to provide offloading processor memory training to an on-die controller module, as provided in FIGS. 1-4.
The computing device 500 may additionally include one or more of the following: cache 562, a graphical processing unit (GPU) 512 (which may be the hardware accelerator in some implementations) , a wireless input/output (I/O) interface 520, a wired I/O interface 530, system memory 540 (e.g., memory circuitry) , power management circuitry 550, non-transitory storage device 560, and a network interface 570 for connection to a network 572. The following discussion provides a brief, general description of the components forming the illustrative computing device 500. Example, non-limiting computing devices 500 may include a desktop computing device, blade server device, workstation, or similar device or system.
In embodiments, the processor cores 518 are capable of executing machine-readable instruction sets 514, reading data and/or instruction sets 514 from one or more storage devices 560 and writing data to the one or more storage devices 560. Those skilled in the relevant art will appreciate that the illustrated embodiments as well as other embodiments may be practiced with other processor-based device configurations, including portable electronic or handheld electronic devices, for instance smartphones, portable computers, wearable computers, consumer electronics, personal computers (“PCs”), network PCs, minicomputers, server blades, mainframe computers, and the like.
The processor cores 518 may include any number of hardwired or configurable circuits, some or all of which may include programmable and/or configurable combinations of electronic components, semiconductor devices, and/or logic elements that are disposed partially or wholly in a PC, server, or other computing system capable of executing processor-readable instructions.
The computing device 500 includes a bus or similar communications link 516 that communicably couples and facilitates the exchange of information and/or data between various system components including the processor cores 518, the cache 562, the graphics processor circuitry 512, one or more wireless I/O interfaces 520, one or more wired I/O interfaces 530, one or more storage devices 560, and/or one or more network interfaces 570. The computing device 500 may be referred to in the singular herein, but this is not intended to limit the embodiments to a single computing device 500, since in certain embodiments, there may be more than one computing device 500 that incorporates, includes, or contains any number of communicably coupled, collocated, or remote networked circuits or devices.
The processor cores 518 may include any number, type, or combination of currently available or future developed devices capable of executing machine-readable instruction sets.
The processor cores 518 may include (or be coupled to) but are not limited to any current or future developed single- or multi-core processor or microprocessor, such as: one or more systems on a chip (SOCs); central processing units (CPUs); digital signal processors (DSPs); graphics processing units (GPUs); application-specific integrated circuits (ASICs), programmable logic units, field programmable gate arrays (FPGAs), and the like. Unless described otherwise, the construction and operation of the various blocks shown in FIG. 5 are of conventional design. Consequently, such blocks are not described in further detail herein, as they should be understood by those skilled in the relevant art. The bus 516 that interconnects at least some of the components of the computing device 500 may employ any currently available or future developed serial or parallel bus structures or architectures.
The system memory 540 may include read-only memory (“ROM”) 542 and random access memory (“RAM”) 546. A portion of the ROM 542 may be used to store or otherwise retain a basic input/output system (“BIOS”) 544. The BIOS 544 provides basic functionality to the computing device 500, for example by causing the processor cores 518 to load and/or execute one or more machine-readable instruction sets 514. In embodiments, at least some of the one or more machine-readable instruction sets 514 cause at least a portion of the processor cores 518 to provide, create, produce, transition, and/or function as a dedicated, specific, and particular machine, for example a word processing machine, a digital image acquisition machine, a media playing machine, a gaming system, a communications device, a smartphone, or similar.
The computing device 500 may include at least one wireless input/output (I/O) interface 520. The at least one wireless I/O interface 520 may be communicably coupled to one or more physical output devices 522 (tactile devices, video displays, audio output devices, hardcopy output devices, etc.). The at least one wireless I/O interface 520 may communicably couple to one or more physical input devices 524 (pointing devices, touchscreens, keyboards, tactile devices, etc.). The at least one wireless I/O interface 520 may include any currently available or future developed wireless I/O interface. Example wireless I/O interfaces include, but are not limited to: [Figure PCTCN2020108584-appb-000001], near field communication (NFC), and similar.
The computing device 500 may include one or more wired input/output (I/O) interfaces 530. The at least one wired I/O interface 530 may be communicably coupled to one or more physical output devices 522 (tactile devices, video displays, audio output devices, hardcopy output devices, etc.). The at least one wired I/O interface 530 may be communicably coupled to one or more physical input devices 524 (pointing devices, touchscreens, keyboards, tactile devices, etc.). The wired I/O interface 530 may include any currently available or future developed I/O interface. Example wired I/O interfaces include, but are not limited to: universal serial bus (USB), IEEE 1394 (“FireWire”), and similar.
The computing device 500 may include one or more communicably coupled, non-transitory, data storage devices 560. The data storage devices 560 may include one or more hard disk drives (HDDs) and/or one or more solid-state storage devices (SSDs). The one or more data storage devices 560 may include any current or future developed storage appliances, network storage devices, and/or systems. Non-limiting examples of such data storage devices 560 include any current or future developed non-transitory storage appliances or devices, such as one or more magnetic storage devices, one or more optical storage devices, one or more electro-resistive storage devices, one or more molecular storage devices, one or more quantum storage devices, or various combinations thereof. In some implementations, the one or more data storage devices 560 may include one or more removable storage devices, such as one or more flash drives, flash memories, flash storage units, or similar appliances or devices capable of communicable coupling to and decoupling from the computing device 500.
The one or more data storage devices 560 may include interfaces or controllers (not shown) communicatively coupling the respective storage device or system to the bus 516. The one or more data storage devices 560 may store, retain, or otherwise contain machine-readable instruction sets, data structures, program modules, data stores, databases, logical structures, and/or other data useful to the processor cores 518 and/or graphics processor circuitry 512 and/or one or more applications executed on or by the processor cores 518 and/or graphics processor circuitry 512. In some instances, one or more data storage devices 560 may be communicably coupled to the processor cores 518, for example via the bus 516 or via one or more wired communications interfaces 530 (e.g., Universal Serial Bus or USB); one or more wireless communications interfaces 520 (e.g., [Figure PCTCN2020108584-appb-000002], Near Field Communication or NFC); and/or one or more network interfaces 570 (IEEE 802.3 or Ethernet, IEEE 802.11, or [Figure PCTCN2020108584-appb-000003], etc.).
Processor-readable instruction sets 514 and other programs, applications, logic sets, and/or modules may be stored in whole or in part in the system memory 540. Such instruction sets 514 may be transferred, in whole or in part, from the one or more data storage devices 560. The instruction sets 514 may be loaded, stored, or otherwise retained in system memory 540, in whole or in part, during execution by the processor cores 518 and/or graphics processor circuitry 512.
The computing device 500 may include power management circuitry 550 that controls one or more operational aspects of the energy storage device 552. In embodiments, the energy storage device 552 may include one or more primary (i.e., non-rechargeable) or secondary (i.e., rechargeable) batteries or similar energy storage devices. In embodiments, the energy storage device 552 may include one or more supercapacitors or ultracapacitors. In embodiments, the power management circuitry 550 may alter, adjust, or control the flow of energy from an external power source 554 to the energy storage device 552 and/or to the computing device 500. The power source 554 may include, but is not limited to, a solar power system, a commercial electric grid, a portable generator, an external energy storage device, or any combination thereof.
For convenience, the processor cores 518, the graphics processor circuitry 512, the wireless I/O interface 520, the wired I/O interface 530, the storage device 560, and the network interface 570 are illustrated as communicatively coupled to each other via the bus 516, thereby providing connectivity between the above-described components. In alternative embodiments, the above-described components may be communicatively coupled in a different manner than illustrated in FIG. 5. For example, one or more of the above-described components may be directly coupled to other components, or may be coupled to each other, via one or more intermediary components (not shown) . In another example, one or more of the above-described components may be integrated into the processor cores 518 and/or the graphics processor circuitry 512. In some embodiments, all or a portion of the bus 516 may be omitted and the components are coupled directly to each other using suitable wired or wireless connections.
The following examples pertain to further embodiments. Example 1 is a system to facilitate offloading processor memory training to an on-die controller module. The system of Example 1 comprises a processing system comprising a complex programmable logic device (CPLD); one or more memory modules; and a central processing unit (CPU) die communicably coupled to the CPLD and to at least one of the one or more memory modules. In Example 1, the CPU die comprises: at least one core; an integrated memory controller (IMC) responsive to the at least one core; and an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of the one or more memory modules independently.
In Example 2, the subject matter of Example 1 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M). In Example 3, the subject matter of any one of Examples 1-2 can optionally include wherein the S3M is an ARC microcontroller. In Example 4, the subject matter of any one of Examples 1-3 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code. In Example 5, the subject matter of any one of Examples 1-4 can optionally include wherein the on-die controller module is to fetch the memory training code from the NVRAM of the CPLD.
In Example 6, the subject matter of any one of Examples 1-5 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the CPU die. In Example 7, the subject matter of any one of Examples 1-6 can optionally include wherein the at least one memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module. In Example 8, the subject matter of any one of Examples 1-7 can optionally include wherein the on-die controller module reads SPD data and memory configuration settings from the DIMM via the SMBUS interface. In Example 9, the subject matter of any one of Examples 1-8 can optionally include wherein changes to memory settings in BIOS cause BIOS code to send updated memory setting data to setup data of the CPLD via a mailbox interface, and wherein the memory training code in the CPLD is to read the setup data for use in training the one or more memory modules.
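The mailbox path of Example 9 can be pictured as a queue between BIOS code and the setup data held by the CPLD. The following is a minimal sketch under stated assumptions: the class `CpldSetupData`, its methods, and the setting names `speed_mts` and `ecc` are hypothetical and are not defined by the disclosure.

```python
from collections import deque


class CpldSetupData:
    """Hypothetical setup-data region of the CPLD, fed through a mailbox."""

    def __init__(self, defaults):
        self._settings = dict(defaults)
        self._mailbox = deque()  # pending updates posted by BIOS code

    def post(self, key, value):
        """BIOS side: queue an updated memory setting in the mailbox."""
        self._mailbox.append((key, value))

    def read_setup_data(self):
        """Training-code side: apply queued updates, then return a snapshot."""
        while self._mailbox:
            key, value = self._mailbox.popleft()
            self._settings[key] = value
        return dict(self._settings)
```

In this sketch, BIOS code would call `post()` when a user changes a memory setting, and the memory training code would call `read_setup_data()` at the start of the next training run, so the two sides never need to run at the same moment.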
Example 10 is a method for facilitating offloading processor memory training to an on-die controller module. The method of Example 10 can optionally include fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; executing, by the on-die controller module, the memory training code at the processor die; reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
In Example 11, the subject matter of Example 10 can optionally include wherein the memory training code is memory reference code (MRC). In Example 12, the subject matter of any one of Examples 10-11 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M). In Example 13, the subject matter of any one of Examples 10-12 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
In Example 14, the subject matter of any one of Examples 10-13 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die. In Example 15, the subject matter of any one of Examples 10-14 can optionally include wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die module.
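The parallel execution of memory training and BIOS initialization in Example 14 can be modeled with concurrent tasks. The sketch below is illustrative only: it uses Python threads as a stand-in for independent hardware agents, and the functions `train_memory`, `bios_init`, and `boot` are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def train_memory(die_id):
    """Placeholder for memory training run by one die's on-die controller module."""
    return f"die{die_id}: trained"


def bios_init():
    """Placeholder for BIOS initialization work that does not need trained memory."""
    return "bios: initialized"


def boot(num_dies):
    """Submit BIOS init and per-die training together, then collect the results."""
    with ThreadPoolExecutor(max_workers=num_dies + 1) as pool:
        bios = pool.submit(bios_init)
        training = [pool.submit(train_memory, d) for d in range(num_dies)]
        return [bios.result()] + [t.result() for t in training]
```

Because every task is submitted before any result is awaited, no die waits on another die or on the BIOS task, which mirrors the independence between training and BIOS initialization that the example describes.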
Example 16 is a non-transitory computer-readable storage medium for facilitating offloading processor memory training to an on-die controller module. The non-transitory computer-readable storage medium of Example 16 comprises executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; executing, by the on-die controller module, the memory training code at the processor die; reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
In Example 17, the subject matter of Example 16 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M). In Example 18, the subject matter of any one of Examples 16-17 can optionally include wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
In Example 19, the subject matter of any one of Examples 16-18 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die. In Example 20, the subject matter of any one of Examples 16-19 can optionally include wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die module.
Example 21 is an apparatus to facilitate offloading processor memory training to an on-die controller module. The apparatus of Example 21 comprises a central processing unit (CPU) die comprising: at least one core; an integrated memory controller (IMC) responsive to the at least one core; and an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of one or more memory modules independently.
In Example 22, the subject matter of Example 21 can optionally include wherein the on-die controller module comprises a secure startup services module (S3M). In Example 23, the subject matter of any one of Examples 21-22 can optionally include wherein the S3M is an ARC microcontroller. In Example 24, the subject matter of any one of Examples 21-23 can optionally include wherein the ODCM obtains the memory training code from a complex programmable logic device (CPLD) that comprises non-volatile random access memory (NVRAM) to store the memory training code. In Example 25, the subject matter of any one of Examples 21-24 can optionally include wherein the on-die controller module is to fetch the memory training code from the NVRAM of the CPLD.
In Example 26, the subject matter of any one of Examples 21-25 can optionally include wherein the memory training code and a BIOS initialization execute in parallel on the CPU die. In Example 27, the subject matter of any one of Examples 21-26 can optionally include wherein the at least one memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module. In Example 28, the subject matter of any one of Examples 21-27 can optionally include wherein the on-die controller module reads SPD data and memory configuration settings from the DIMM via the SMBUS interface. In Example 29, the subject matter of any one of Examples 21-28 can optionally include wherein changes to memory settings in BIOS cause BIOS code to send updated memory setting data to setup data of the CPLD via a mailbox interface, and wherein the memory training code in the CPLD is to read the setup data for use in training the one or more memory modules.
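As an illustration of the SPD read in Examples 27 and 28, the sketch below decodes one field of SPD data obtained over an SMBus-style byte read. It relies on two facts from the JEDEC SPD layout rather than from this disclosure: byte 2 of the SPD contents identifies the DRAM device type, and SPD devices conventionally respond at SMBus addresses starting at 0x50. The callable `smbus_read_byte` and the helper `fake_smbus` are hypothetical stand-ins for the ODCM's SMBUS interface.

```python
# JEDEC SPD byte 2 identifies the DRAM device type; only a few codes are listed.
DRAM_TYPES = {0x0B: "DDR3", 0x0C: "DDR4", 0x12: "DDR5"}
SPD_BASE_ADDR = 0x50  # conventional SMBus address of the SPD device in slot 0


def read_dram_type(smbus_read_byte, slot=0):
    """Return the DRAM type reported by the SPD of the module in `slot`.

    `smbus_read_byte(addr, offset)` is a hypothetical callable standing in for
    a byte read on the ODCM's SMBUS interface.
    """
    code = smbus_read_byte(SPD_BASE_ADDR + slot, 2)
    return DRAM_TYPES.get(code, f"unknown (0x{code:02X})")


def fake_smbus(spd_by_addr):
    """Build an in-memory reader from {address: spd_bytes} for experimentation."""
    return lambda addr, offset: spd_by_addr[addr][offset]
```

Training code would read many more SPD fields (timings, density, rank count) in the same way before programming the IMC; the device type is shown here only because it is the simplest field to decode.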
Example 30 is an apparatus for facilitating offloading processor memory training to an on-die controller module according to implementations of the disclosure. The apparatus of Example 30 can comprise means for fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package; means for executing, by the on-die controller module, the memory training code at the processor die; means for reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and means for initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
In Example 31, the subject matter of Example 30 can optionally include the apparatus further configured to perform the method of any one of the Examples 11 to 15.
Example 32 is at least one machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to carry out a method according to any one of Examples 10-15. Example 33 is an apparatus for facilitating offloading processor memory training to an on-die controller module, configured to perform the method of any one of Examples 10-15. Example 34 is an apparatus for facilitating offloading processor memory training to an on-die controller module comprising means for performing the method of any one of Examples 10 to 15. Specifics in the Examples may be used anywhere in one or more embodiments.
In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the described embodiments. It will be apparent, however, to one skilled in the art that embodiments may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form. There may be intermediate structure between illustrated components. The components described or illustrated herein may have additional inputs or outputs that are not illustrated or described.
Various embodiments may include various processes. These processes may be performed by hardware components or may be embodied in computer program or machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.
Portions of various embodiments may be provided as a computer program product, which may include a computer-readable medium (e.g., non-transitory computer-readable storage medium) having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) for execution by one or more processors to perform a process according to certain embodiments. The computer-readable medium may include, but is not limited to, magnetic disks, optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically-erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or other type of computer-readable medium suitable for storing electronic instructions. Moreover, embodiments may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer.
Many of the methods are described in their basic form, but processes can be added to or deleted from any of the methods and information can be added or subtracted from any of the described messages without departing from the basic scope of the present embodiments. It will be apparent to those skilled in the art that many further modifications and adaptations can be made. The particular embodiments are not provided to limit the concept but to illustrate it. The scope of the embodiments is not to be determined by the specific examples provided above but only by the claims below.
If it is said that an element “A” is coupled to or with element “B,” element A may be directly coupled to element B or be indirectly coupled through, for example, element C. When the specification or claims state that a component, feature, structure, process, or characteristic A “causes” a component, feature, structure, process, or characteristic B, it means that “A” is at least a partial cause of “B” but that there may also be at least one other component, feature, structure, process, or characteristic that assists in causing “B.” If the specification indicates that a component, feature, structure, process, or characteristic “may”, “might”, or “could” be included, that particular component, feature, structure, process, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, this does not mean there is only one of the described elements.
An embodiment is an implementation or example. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not all referring to the same embodiments. It should be appreciated that in the foregoing description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various novel aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments utilize more features than are expressly recited in each claim. Rather, as the following claims reflect, novel aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims are hereby expressly incorporated into this description, with each claim standing on its own as a separate embodiment.

Claims (20)

  1. A processing system comprising:
    a complex programmable logic device (CPLD);
    one or more memory modules; and
    a central processing unit (CPU) die communicably coupled to the CPLD and to at least one of the one or more memory modules, the CPU die comprising:
    at least one core;
    an integrated memory controller (IMC) responsive to the at least one core; and
    an on-die controller module to execute memory training code to initialize the integrated memory controller (IMC) to train at least one memory module of the one or more memory modules independently.
  2. The processing system of claim 1, wherein the on-die controller module comprises a secure startup services module (S3M).
  3. The processing system of claim 2, wherein the S3M is an ARC microcontroller.
  4. The processing system of claim 1, wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  5. The processing system of claim 4, wherein the on-die controller module is to fetch the memory training code from the NVRAM of the CPLD.
  6. The processing system of claim 1, wherein the memory training code and a BIOS initialization execute in parallel on the CPU die.
  7. The processing system of claim 1, wherein the at least one memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die controller module.
  8. The processing system of claim 7, wherein the on-die controller module reads SPD data and memory configuration settings from the DIMM via the SMBUS interface.
  9. The processing system of claim 1, wherein changes to memory settings in BIOS cause BIOS code to send updated memory setting data to setup data of the CPLD via a mailbox interface, and wherein the memory training code in the CPLD is to read the setup data for use in training the one or more memory modules.
  10. A method comprising:
    fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package;
    executing, by the on-die controller module, the memory training code at the processor die;
    reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and
    initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
  11. The method of claim 10, wherein the memory training code is memory reference code (MRC).
  12. The method of claim 10, wherein the on-die controller module comprises a secure startup services module (S3M).
  13. The method of claim 10, wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  14. The method of claim 10, wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die.
  15. The method of claim 10, wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die module.
  16. A non-transitory computer-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
    fetching, by an on-die controller module of a processor die in a package of processor dies, memory training code from a shared complex programmable logic device (CPLD) of the package;
    executing, by the on-die controller module, the memory training code at the processor die;
    reading, by the on-die controller module, data and configuration settings from a memory module corresponding to the processor die; and
    initializing, by the on-die controller module via the executing memory training code, an integrated memory controller (IMC) of the processor die to train the memory module.
  17. The non-transitory computer-readable storage medium of claim 16, wherein the on-die controller module comprises a secure startup services module (S3M).
  18. The non-transitory computer-readable storage medium of claim 16, wherein the CPLD comprises non-volatile random access memory (NVRAM) to store the memory training code.
  19. The non-transitory computer-readable storage medium of claim 16, wherein the memory training code and a BIOS initialization execute in parallel on the package of the processor die.
  20. The non-transitory computer-readable storage medium of claim 16, wherein the memory module is a dual in-line memory module (DIMM) comprising a serial presence detect (SPD) that connects to a system management bus (SMBUS) interface of the on-die module.
PCT/CN2020/108584 2020-08-12 2020-08-12 Offloading processor memory training to on-die controller module WO2022032508A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/108584 WO2022032508A1 (en) 2020-08-12 2020-08-12 Offloading processor memory training to on-die controller module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/108584 WO2022032508A1 (en) 2020-08-12 2020-08-12 Offloading processor memory training to on-die controller module

Publications (1)

Publication Number Publication Date
WO2022032508A1 true WO2022032508A1 (en) 2022-02-17

Family

ID=80247577

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108584 WO2022032508A1 (en) 2020-08-12 2020-08-12 Offloading processor memory training to on-die controller module

Country Status (1)

Country Link
WO (1) WO2022032508A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804247A (en) * 2017-05-03 2018-11-13 大唐移动通信设备有限公司 A kind of the startup judgment method and device of processor
US20200065266A1 (en) * 2017-09-28 2020-02-27 Intel Corporation Memory bus mr register programming process
CN111095228A (en) * 2017-09-29 2020-05-01 英特尔公司 First boot with one memory channel
CN111221582A (en) * 2020-01-02 2020-06-02 深圳中电长城信息安全系统有限公司 Memory training method and system
CN111459557A (en) * 2020-03-12 2020-07-28 烽火通信科技股份有限公司 Method and system for shortening starting time of server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20949007

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20949007

Country of ref document: EP

Kind code of ref document: A1