CN116991544B - Method and apparatus for simulating a CXL (Compute Express Link) device, electronic device, and client


Publication number
CN116991544B
Authority
CN
China
Prior art keywords
cxl
virtual
client
storage medium
memory
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311252710.9A
Other languages
Chinese (zh)
Other versions
CN116991544A (en)
Inventor
刘俊
岳龙
王彦伟
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45541 Bare-metal, i.e. hypervisor runs directly on hardware
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present application provide a method and an apparatus for simulating a CXL device, an electronic device, and a client. The method is applied to a client of a host machine; the client includes at least one first NUMA node, and the first NUMA node includes a processor and an operating system running on the processor. The method includes: compiling the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, where the predetermined protocol includes the CXL memory protocol; building a root file system for a distribution of the operating system; and loading the kernel image file and the root file system to establish a virtual CXL device in the client. The present application addresses the difficulty of analyzing the access performance of CXL devices.

Description

Method and apparatus for simulating a CXL (Compute Express Link) device, electronic device, and client
Technical Field
Embodiments of the present application relate to the field of computers, and in particular, to a method, an apparatus, a computer-readable storage medium, an electronic device, and a client for simulating a CXL device.
Background
The rapid development of data-intensive technologies has driven growing demand for new architectural solutions that provide scalable, composable, and coherent computing environments. CXL (Compute Express Link) is an open-standard interconnect protocol that overcomes limitations of current architectures by effectively expanding memory capacity and bandwidth, supporting the development of a wide range of applications.
Other interconnect technologies, such as CCIX (Cache Coherent Interconnect for Accelerators), Gen-Z, and NVLink, also exist, but each has limitations. For example, NVLink is largely closed and supported by few vendors, and CCIX uses a symmetric protocol, which raises the cost of adapting devices. As a result, more and more vendors choose to support CXL as an efficient interconnect technology for current and future systems.
The first-generation CXL 1.1 specification was released in 2019, and the specification has since advanced to CXL 3.0. However, host chips and devices that support CXL require hardware-level adaptation, which is costly and depends on support from both the device and the CPU chip. Most vendors currently do not even have access to CXL hardware IP, so it is very difficult to study the characteristics of CXL, let alone build applications on it.
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for simulating a CXL device, a computer-readable storage medium, an electronic device, and a client, which at least solve the problem in the related art that the access performance of CXL devices is difficult to analyze.
According to one embodiment of the present application, a method for simulating a CXL device is provided. The method is applied to a client of a host machine, the client including at least one first NUMA (Non-Uniform Memory Access) node, and the first NUMA node including a processor and an operating system running on the processor. The method comprises: compiling the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, wherein the predetermined protocol comprises the CXL memory protocol; building a root file system for a distribution of the operating system; and loading the kernel image file and the root file system to establish a virtual CXL device in the client.
In one exemplary embodiment, compiling the kernel of the operating system so that the compiled kernel code supports the predetermined protocol, to obtain the kernel image file, comprises: in response to a configuration instruction generated according to the predetermined protocol, configuring predetermined parameters for the kernel, wherein the predetermined parameters comprise the type of the storage medium of the virtual CXL device, the access mode of the storage medium, and the driver type of the virtual CXL device; and compiling the kernel code configured with the predetermined parameters to obtain the kernel image file.
In one exemplary embodiment, after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: in a case where the type is a persistent storage medium, dividing the persistent storage medium of the virtual CXL device into a plurality of storage medium regions; and, according to the access mode, at least configuring a namespace for each storage medium region, so as to allocate the virtual CXL device to a second NUMA node, wherein the second NUMA node does not include the processor.
In one exemplary embodiment, dividing the persistent storage medium of the virtual CXL device into the plurality of storage medium regions comprises: calling the cxl tool to partition the persistent storage medium into the plurality of storage medium regions.
In one exemplary embodiment, at least configuring a namespace for each storage medium region according to the access mode, so as to allocate the virtual CXL device to the second NUMA node, comprises: in a case where the access mode is direct memory access, configuring the namespace for each storage medium region so as to allocate the virtual CXL device to the second NUMA node; and, in a case where the access mode is system memory access, configuring the namespace for each storage medium region, converting the access mode of each storage medium region configured with the namespace into system memory access, and running a first node view instruction to allocate the converted virtual CXL device to the second NUMA node.
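The branching described above can be sketched in Python as an ordered plan of steps. This is only an illustrative sketch: the function name `plan_provisioning`, the string labels for the medium types and access modes, and the step descriptions are hypothetical, and the exact command-line invocations of the cxl, ndctl, and daxctl tools are deliberately left out because the text does not specify them.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (tool named in the text, action it performs)

def plan_provisioning(medium_type: str, access_mode: str = "direct") -> List[Step]:
    """Return the ordered steps that place a virtual CXL device on a
    CPU-less (second) NUMA node. All labels here are hypothetical."""
    if medium_type == "volatile":
        # Volatile media only require the node view instruction.
        return [("node-view", "run the second node view instruction")]
    if medium_type != "persistent":
        raise ValueError(f"unknown medium type: {medium_type}")

    # Persistent media: partition first, then configure namespaces.
    steps: List[Step] = [
        ("cxl", "divide the persistent medium into regions"),
        ("ndctl", "configure a namespace for each region"),
    ]
    if access_mode == "direct":
        return steps
    if access_mode == "system":
        # System memory access needs a conversion plus a node view step.
        steps.append(("daxctl", "convert each region to system memory access"))
        steps.append(("node-view", "run the first node view instruction"))
        return steps
    raise ValueError(f"unknown access mode: {access_mode}")
```

For example, `plan_provisioning("persistent", "system")` yields four steps ending with the node view instruction, whereas a volatile medium needs only that last kind of step.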
In one exemplary embodiment, configuring the namespace for each storage medium region comprises: calling the ndctl tool to configure the namespace for each storage medium region.
In one exemplary embodiment, converting the access mode of each storage medium region configured with the namespace into system memory access comprises: calling the daxctl tool to convert the access mode of each storage medium region configured with the namespace into system memory access.
In one exemplary embodiment, after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: in a case where the type is a volatile storage medium, running a second node view instruction to allocate the virtual CXL device to a second NUMA node, wherein the second NUMA node does not include the processor.
In one exemplary embodiment, after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: creating a driver for the virtual CXL device in the kernel.
In one exemplary embodiment, after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: creating a coherence protocol engine module in the processor, the coherence protocol engine module being used for converting the protocol of the cache data of the processor into the predetermined protocol.
In one exemplary embodiment, loading the kernel image file and the root file system to establish the virtual CXL device in the client comprises: loading the kernel image file and the root file system, and establishing an initial virtual CXL device in the client, wherein the initial virtual CXL device is a virtual device that supports communication over the predetermined protocol and includes a storage medium; and creating at least one of the following in the initial virtual CXL device: a PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) function module, which is used for parsing memory addresses of the host, accessing the configuration space of the processor, modifying the memory mappings of the base address registers, and handling message interrupts.
In one exemplary embodiment, loading the kernel image file and the root file system to establish the virtual CXL device in the client comprises: loading the kernel image file and the root file system, and determining whether a memory file of the virtual CXL device has been generated; and, in a case where the memory file of the virtual CXL device has been generated, determining that the virtual CXL device is established in the client.
In one exemplary embodiment, the operating system includes an application program, and after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: acquiring operation requirement information of the application program, wherein the operation requirement information is one of the following: a requirement for large memory capacity, a requirement for high memory bandwidth, and a requirement for low computation latency; determining a memory allocation policy of the client according to the operation requirement information; and running the client according to the memory allocation policy and testing the access performance of the virtual CXL device.
In one exemplary embodiment, determining the memory allocation policy of the client according to the operation requirement information comprises: in a case where the operation requirement information is the requirement for large memory capacity, determining the memory allocation policy to be: using the local memory of the kernel first, and then using the storage medium of the virtual CXL device; in a case where the operation requirement information is the requirement for high memory bandwidth, determining the memory allocation policy to be: storing the same data in both the storage medium and the local memory; and, in a case where the operation requirement information is the requirement for low computation latency, determining the memory allocation policy to be: using the storage medium when the locality of the application program is greater than a preset value, and using the local memory when the locality is less than or equal to the preset value.
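The three policies above can be illustrated with a small Python sketch. The names `Requirement`, `Policy`, and `choose_policy`, the target labels, and the default preset value of 0.5 are all hypothetical and chosen only for illustration; the sketch follows the text as written, including the use of the storage medium of the virtual CXL device when locality exceeds the preset value.

```python
from enum import Enum
from typing import NamedTuple, Tuple

class Requirement(Enum):
    LARGE_CAPACITY = "large memory capacity"
    HIGH_BANDWIDTH = "high memory bandwidth"
    LOW_LATENCY = "low computation latency"

class Policy(NamedTuple):
    targets: Tuple[str, ...]  # memory targets, in order of preference
    replicate: bool           # True if the same data is kept on every target

LOCAL = "local-memory"
CXL = "cxl-storage-medium"

def choose_policy(req: Requirement, locality: float = 0.0,
                  preset: float = 0.5) -> Policy:
    """Map the operation requirement information to an allocation policy."""
    if req is Requirement.LARGE_CAPACITY:
        # Use local memory first, then the CXL storage medium.
        return Policy((LOCAL, CXL), replicate=False)
    if req is Requirement.HIGH_BANDWIDTH:
        # Keep the same data in both places.
        return Policy((LOCAL, CXL), replicate=True)
    if req is Requirement.LOW_LATENCY:
        # Compare the application's locality against the preset value.
        target = CXL if locality > preset else LOCAL
        return Policy((target,), replicate=False)
    raise ValueError(f"unknown requirement: {req}")
```

Returning an ordered tuple plus a replication flag keeps the three cases distinguishable without committing to any particular allocator interface.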
In one exemplary embodiment, testing the access performance of the virtual CXL device comprises: calling the MLC (Memory Latency Checker) tool to measure the throughput between the processor and the storage medium of the virtual CXL device, to obtain a bandwidth performance parameter of the virtual CXL device; and calling the MLC tool to measure the access latency between the processor and the storage medium, to obtain a latency performance parameter of the virtual CXL device.
In one exemplary embodiment, the predetermined protocol further comprises the CXL cache protocol.
In one exemplary embodiment, the client contains at least one virtual CXL device, and after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: establishing a virtual CXL switch in the client; and establishing communication connections between the virtual CXL switch and at least one of the virtual CXL devices, and between the virtual CXL switch and at least one of the processors.
According to another embodiment of the present application, an apparatus for simulating a CXL device is provided. The apparatus is applied to a client of a host machine, the client including at least one first NUMA node, and the first NUMA node including a processor and an operating system running on the processor. The apparatus comprises: a compiling unit, configured to compile the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, wherein the predetermined protocol comprises the CXL memory protocol; a building unit, configured to build a root file system for a distribution of the operating system; and a loading unit, configured to load the kernel image file and the root file system to establish a virtual CXL device in the client.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments when run.
According to a further embodiment of the present application, there is also provided an electronic device comprising a memory, in which a computer program is stored, and a processor arranged to run the computer program to perform the steps of any of the method embodiments.
According to still another embodiment of the present application, a client is also provided, comprising: at least one first NUMA node, which includes a processor and an operating system running on the processor; and a virtual CXL device obtained by simulation using the steps of any one of the methods.
According to the present application, a virtual client is built and a CXL device supporting the CXL memory protocol is simulated in the client, so that a CXL Type 3 device is simulated purely in software. The characteristics of the virtual CXL Type 3 device can then be analyzed to understand its access performance, which supports subsequent analysis of the access performance of CXL Type 3 devices.
Drawings
FIG. 1 is a hardware block diagram of a mobile terminal for a method for simulating a CXL device according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for simulating a CXL device according to an embodiment of the present application;
FIG. 3 is a diagram of the correspondence between a client and a host according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an architecture of a client according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another architecture of a client according to an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another architecture of a client according to an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for simulating a CXL device according to an embodiment of the present application.
The above figures include the following reference numerals:
102. processor; 104. memory; 106. transmission device; 108. input/output device.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a hardware block diagram of a mobile terminal for a method for simulating a CXL device according to an embodiment of the present application. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will appreciate that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store computer programs, such as software programs of application software and modules, such as computer programs corresponding to the simulation method of the CXL device in the embodiments of the present application, and the processor 102 executes the computer programs stored in the memory 104, thereby performing various functional applications and data processing, i.e., implementing the method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
This embodiment provides a method for simulating a CXL device that runs on an electronic device. The electronic device may be integrated in a host machine and has simulation processor software installed; the method is implemented by running the simulation processor software on the electronic device. FIG. 2 is a flowchart of a method for simulating a CXL device according to an embodiment of the present application. The method is applied to a client of the host machine, the client includes at least one first NUMA node, and the first NUMA node includes a processor and an operating system running on the processor. As shown in FIG. 2, the flow includes the following steps:
Step S204: compiling the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, wherein the predetermined protocol comprises the CXL memory protocol.
Specifically, the CXL memory protocol, also known as the CXL.mem protocol, defines the transfer interface between the processor and memory and is a protocol required by CXL Type 3 devices.
Step S206: building a root file system for a distribution of the operating system.
Specifically, distributions of the operating system include, but are not limited to, Debian, Ubuntu, and CentOS.
Step S208: loading the kernel image file and the root file system to establish a virtual CXL device in the client.
Specifically, the virtual CXL device is established by starting the simulation processor software, which loads the kernel image file and the root file system.
In this embodiment, the kernel of the operating system is first compiled according to a predetermined protocol that includes the CXL memory protocol, to obtain a kernel image file; a root file system of the operating system is then built; finally, the kernel image file and the root file system are loaded, and a virtual CXL device is obtained by simulation in the client. In the prior art, CXL hardware devices are scarce, which makes it difficult to analyze and understand the access performance of CXL devices. By contrast, the present application builds a virtual client and simulates in it a CXL device that supports the CXL memory protocol, so that a CXL Type 3 device is simulated purely in software. The characteristics of the virtual CXL Type 3 device can then be analyzed to understand its access performance, which supports subsequent analysis of the access performance of CXL Type 3 devices.
The execution subject of the steps may be a server, a terminal, or the like, but is not limited thereto.
The execution order of step S204 and step S206 may be interchanged, i.e. step S206 may be executed first and then step S204 may be executed.
The operating system may be any suitable operating system, such as Linux. The first NUMA node also includes memory, whose type includes, but is not limited to, DRAM (Dynamic Random Access Memory). When there are multiple first NUMA nodes, they are interconnected through an interconnection module such as a QPI (Intel QuickPath Interconnect) or UPI (Ultra Path Interconnect) bus. One first NUMA node may contain one or more processors. The processor may be a CPU (Central Processing Unit) or another type of processor, such as a GPU (Graphics Processing Unit).
It should be noted that a CXL device is a hardware device that supports CXL, that is, one that exchanges CXL protocol traffic with other devices through a hardware implementation. The virtual CXL device simulates the CXL protocol channel through a simulator and simulates the message generation, responses, and actions of a hardware device that supports CXL.
Furthermore, in addition to the CXL.mem protocol, the predetermined protocol may include the CXL cache protocol, also called the CXL.cache protocol. The CXL.cache protocol is required by CXL Type 2 devices, so with the scheme of the present application a CXL Type 2 device can also be simulated in the client in addition to a CXL Type 3 device.
Specifically, the client (that is, the guest virtual machine) on the host machine can be built by having the simulation processor software create a process that simulates the client system, including device resources such as the client's processor and RAM. When the simulation processor software is QEMU (Quick Emulator), the flow is as follows: creating an emulated chipset; creating CPU threads to represent the CPU execution flow of the client; allocating space in the virtual address space of QEMU to serve as the physical memory of the client; creating the corresponding virtual devices for the client according to the devices the user specifies on the command line; and monitoring various events in the main thread, including I/O (input/output) accesses from the client to devices, user interactions with the client's interface, and I/O events on the host that correspond to virtual devices (such as the arrival of network data for the client).
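As a rough illustration of specifying devices on the QEMU command line, the following Python sketch assembles an argument list that boots a kernel image and root file system with one emulated CXL Type 3 device. It is a sketch under assumptions, not the configuration used in this application: the option names follow the QEMU CXL documentation for recent releases and may differ between QEMU versions, and the function name `build_qemu_argv`, the file paths, the sizes, the device IDs, and the root device `/dev/sda` are hypothetical.

```python
from typing import List

def build_qemu_argv(kernel_image: str, rootfs_image: str,
                    cxl_backing_file: str = "/tmp/cxl-pmem.raw",
                    lsa_backing_file: str = "/tmp/cxl-lsa.raw",
                    size: str = "256M") -> List[str]:
    """Assemble (but do not run) a QEMU command line; values are assumptions."""
    return [
        "qemu-system-x86_64",
        "-M", "q35,cxl=on",                  # machine type with CXL enabled
        "-m", "4G,maxmem=8G,slots=8",
        "-smp", "4",
        "-enable-kvm",
        "-nographic",
        "-kernel", kernel_image,             # kernel image from step S204
        "-append", "root=/dev/sda rw console=ttyS0",  # assumed root device
        "-drive", f"file={rootfs_image},format=raw",  # root fs from step S206
        # Host files that back the device memory and its label storage area.
        "-object", f"memory-backend-file,id=cxl-mem1,share=on,"
                   f"mem-path={cxl_backing_file},size={size}",
        "-object", f"memory-backend-file,id=cxl-lsa1,share=on,"
                   f"mem-path={lsa_backing_file},size={size}",
        # CXL host bridge, root port, and a persistent Type 3 device.
        "-device", "pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1",
        "-device", "cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2",
        "-device", "cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,"
                   "lsa=cxl-lsa1,id=cxl-pmem0",
        "-M", "cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G",
    ]
```

The list can be passed to `subprocess.run` once the paths and option names have been checked against the installed QEMU version.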
As shown in FIG. 3, when the client runs, the correspondence between the components of the client and the host side is as follows:
Processor (CPU) of the client: each CPU of the client corresponds to a thread on the host. Through the cooperation of QEMU and KVM (Kernel-based Virtual Machine), these threads are scheduled normally by the operating system of the host and directly execute the code in the client;
Memory of the client: the physical memory of the client corresponds to virtual memory in the QEMU process. To convert a client virtual address into a host physical address, the client virtual address is first converted into a client physical address, and the client physical address is then converted into a host physical address through a page table of the KVM;
Devices of the client (virtual CXL devices): devices are presented to the client through QEMU; the operating system of the client enumerates the devices at startup and loads the corresponding drivers;
Interaction between the client and the host: the operating system of the client interacts with devices through I/O ports or MMIO (memory-mapped I/O). The KVM intercepts I/O requests from the operating system of the client and, in most cases, forwards them to the QEMU process in user space, which handles them.
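The two-stage address conversion described for the memory of the client can be illustrated with a toy Python model. This is a simplified sketch: real page tables are multi-level hardware structures, whereas here each table is a plain dictionary from page number to page number, and the names `translate` and `client_virtual_to_host_physical` and the 4 KiB page size are assumptions made for illustration.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed 4 KiB pages

def translate(addr: int, page_table: Dict[int, int]) -> int:
    """Translate one address through a single-level toy page table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page fault at address {addr:#x}")
    return page_table[page] * PAGE_SIZE + offset

def client_virtual_to_host_physical(addr: int,
                                    client_table: Dict[int, int],
                                    kvm_table: Dict[int, int]) -> int:
    # Stage 1: client virtual address -> client physical address.
    client_physical = translate(addr, client_table)
    # Stage 2: client physical address -> host physical address (KVM table).
    return translate(client_physical, kvm_table)
```

With `client_table = {0x10: 0x2}` and `kvm_table = {0x2: 0x80}`, the client virtual address `0x10123` maps to the client physical address `0x2123` and then to the host physical address `0x80123`; the page offset `0x123` is preserved across both stages.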
According to some exemplary embodiments of the present application, step S204 compiles a kernel of the operating system, so that the compiled kernel code supports a predetermined protocol, and a kernel image file is obtained, which specifically includes:
Step S2041: in response to a configuration instruction generated according to the predetermined protocol, configuring predetermined parameters for the kernel, wherein the predetermined parameters comprise the type of the storage medium of the virtual CXL device, the access mode of the storage medium, and the driver type of the virtual CXL device;
Specifically, the configuration instruction is generated as follows: in response to an instruction for generating a configuration interface, a configuration interface including type options and access mode options of the storage medium is displayed, and an operator performs selection operations on the configuration interface according to the predetermined protocol. The types include persistent storage media and volatile storage media, and the access modes include direct memory access and system memory access. The driver type is used to identify and use the virtual CXL device.
Step S2042: compiling the kernel code configured with the predetermined parameters to obtain the kernel image file.
In this embodiment, the predetermined parameters such as the type and the access mode of the storage medium are configured for the kernel according to the configuration instruction, and the kernel code is then compiled. This further ensures that the resulting kernel image file supports the CXL.mem protocol, so that a CXL type 3 device can be obtained by simulation after the kernel image file and the root file system are loaded.
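As a non-limiting illustration, steps S2041 and S2042 can be pictured as deriving a kernel configuration fragment from the selected storage medium type and access mode. The symbol names below are upstream Linux Kconfig options whose availability depends on the kernel version; the mapping from the predetermined parameters to these symbols, the mode labels "dax" and "system-ram", and the function name are assumptions of this sketch, not the configuration defined by the present application.

```python
def cxl_kernel_config(medium_type, access_mode):
    """Return Kconfig lines for an assumed mapping of medium type / access mode."""
    if medium_type not in ("persistent", "volatile"):
        raise ValueError(f"unknown medium type: {medium_type}")
    if access_mode not in ("dax", "system-ram"):
        raise ValueError(f"unknown access mode: {access_mode}")

    # Core CXL bus, PCI, ACPI, memory-device, port and region support
    symbols = ["CONFIG_CXL_BUS", "CONFIG_CXL_PCI", "CONFIG_CXL_ACPI",
               "CONFIG_CXL_MEM", "CONFIG_CXL_PORT", "CONFIG_CXL_REGION",
               "CONFIG_DEV_DAX"]
    if medium_type == "persistent":
        # Persistent media are managed through the NVDIMM subsystem
        symbols += ["CONFIG_LIBNVDIMM", "CONFIG_CXL_PMEM", "CONFIG_DEV_DAX_PMEM"]
    if access_mode == "system-ram":
        # Needed to hot-add a DAX device as ordinary system memory
        symbols += ["CONFIG_MEMORY_HOTPLUG", "CONFIG_DEV_DAX_KMEM"]
    return [f"{name}=y" for name in symbols]


fragment = cxl_kernel_config("persistent", "system-ram")
print("\n".join(fragment))
```

Such a fragment would typically be merged into the kernel `.config` before the kernel is compiled; the sketch only produces the text and does not invoke the build.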
To further implement configuration of the virtual CXL device so that the configured device reflects the characteristics of a CXL hardware device more accurately, in another exemplary embodiment, after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises: in the case that the type is a persistent storage medium, dividing the persistent storage medium of the virtual CXL device into a plurality of storage medium areas; and, according to the access mode, at least configuring a namespace for each storage medium area so as to allocate the virtual CXL device to a second NUMA node, wherein the second NUMA node does not comprise the processor.
Specifically, dividing the persistent storage medium of the virtual CXL device into a plurality of storage medium areas includes: calling the cxl tool to divide the persistent storage medium into areas, obtaining the plurality of storage medium areas. The cxl tool is a management tool for the virtual CXL device whose operations include area division of the storage medium of the virtual CXL device; calling this tool further ensures that the memory partitioning of the virtual CXL device is realized simply and quickly.
In an actual application process, the access manners of the storage media in the virtual CXL device are different, and the corresponding allocation manners are also different.
In the case that the access mode is direct memory access, the namespace is configured for each storage medium area so as to allocate the virtual CXL device to the second NUMA node;
Specifically, in the case that the access mode is direct memory access, after the namespace is configured for each storage medium area, a file such as /dev/dax0.0 can be seen in the client, which indicates that the allocation of the virtual CXL device is complete. The Direct Access (DAX) mechanism is a mechanism that allows user-mode software to directly access files stored in persistent memory.
In the case that the access mode is system memory access (also called System-RAM mode), the namespace is configured for each storage medium area;
the access mode of each storage medium area configured with the namespace is converted into system memory access;
and a first node viewing instruction is run to allocate the converted virtual CXL device to the second NUMA node.
Specifically, the first node viewing instruction may be the numactl -H instruction in QEMU.
By the embodiment, the allocation of the virtual CXL devices with different access modes and different types to the NUMA nodes is further realized.
Specifically, configuring the namespace for each of the storage medium areas includes: calling the ndctl tool to configure the namespace for each storage medium area. The ndctl tool is a management tool for NVDIMMs (Non-Volatile Dual In-line Memory Modules) whose operations include namespace configuration. Calling this tool further ensures that the namespace configuration of the memory partitions is realized simply and quickly.
To further ensure that the storage medium area is converted more simply and quickly, in other embodiments, the converting the access mode of each storage medium area configured with the namespace into the system memory access includes: and calling a daxctl tool to convert the access mode of each storage medium area configured with the name space into the system memory access. The daxctl tool is a tool for managing and monitoring the DAX equipment, and is used with a kernel and an application program in the client to create, destroy and convert the DAX equipment.
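As a non-limiting illustration, the sequence of tool invocations described above (area division with cxl, namespace configuration with ndctl, conversion with daxctl, and node viewing with numactl) can be assembled as follows. The sketch only builds the command lines and does not execute them. The device names mem0, decoder0.0, region0, and dax0.0 are assumed example values, the sketch creates a single area rather than a plurality, and the exact options accepted depend on the installed tool versions.

```python
import shlex


def provisioning_commands(memdev, decoder, region, dax_dev, access_mode):
    """Build the command sequence for a persistent medium in the given access mode."""
    cmds = [
        # Create a persistent-memory region on the CXL memory device
        ["cxl", "create-region", "-m", "-d", decoder, "-t", "pmem", memdev],
        # Configure a device-DAX namespace on that region
        ["ndctl", "create-namespace", "-m", "devdax", "-r", region],
    ]
    if access_mode == "system-ram":
        # Convert the DAX device to system memory, then view the NUMA nodes
        cmds.append(["daxctl", "reconfigure-device", "--mode=system-ram", dax_dev])
        cmds.append(["numactl", "-H"])
    elif access_mode != "dax":
        raise ValueError(f"unknown access mode: {access_mode}")
    return cmds


ram_cmds = provisioning_commands("mem0", "decoder0.0", "region0", "dax0.0", "system-ram")
dax_cmds = provisioning_commands("mem0", "decoder0.0", "region0", "dax0.0", "dax")
for cmd in ram_cmds:
    print(shlex.join(cmd))
```

In direct memory access mode the sequence stops after namespace configuration, which matches the two branches described above.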
In one exemplary embodiment, after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises: in the case that the type is a volatile storage medium, running a second node viewing instruction to allocate the virtual CXL device to a second NUMA node that does not include the processor. In this embodiment, when the type of the storage medium is a volatile storage medium, the storage medium is treated directly as system memory on a node without a processor, and the second node viewing instruction is run to establish the correspondence between the virtual CXL device and the second NUMA node.
The second node viewing instruction may also be the numactl -H instruction in QEMU.
In a specific embodiment, as shown in fig. 4, the client has two first NUMA nodes, NUMA node0 and NUMA node1. After a virtual CXL device is created in the client and configured accordingly, a second NUMA node, namely NUMA node2 in fig. 4, is newly added to the system of the client. NUMA node0 and NUMA node1 each have a CPU, and the CPUs of node0 and node1 are interconnected and communicate through a QPI/UPI bus. NUMA node2 contains no CPU and is connected to NUMA node0 through a PCIe bus rather than through the QPI/UPI bus.
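As a non-limiting illustration, the second NUMA node can be recognized in the output of the node viewing instruction as a node whose CPU list is empty. The sample text below is a made-up example in the general layout printed by numactl -H and is not a measured result; the function name is an assumption of this sketch.

```python
import re

# Illustrative sample only: three nodes, of which node 2 has memory but no CPUs
SAMPLE_NUMACTL_OUTPUT = """\
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 2048 MB
node 1 cpus: 2 3
node 1 size: 2048 MB
node 2 cpus:
node 2 size: 4096 MB
"""


def cpuless_nodes(text):
    """Return the ids of NUMA nodes whose 'cpus:' line lists no CPU."""
    nodes = []
    for match in re.finditer(r"^node (\d+) cpus:(.*)$", text, re.MULTILINE):
        if not match.group(2).split():
            nodes.append(int(match.group(1)))
    return nodes


second_nodes = cpuless_nodes(SAMPLE_NUMACTL_OUTPUT)
print(second_nodes)
```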
According to still further alternative embodiments of the present application, after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises: creating a driver for the virtual CXL device in the kernel. By creating a driver for the virtual CXL device in the kernel, discovery and enumeration of the virtual CXL device in the client can be realized, and establishment of a mapping relationship of a memory address space in the client system can be realized.
Specifically, the relevant drivers of the CXL device are already integrated in the kernel code, so the compiled system supports the drivers of the CXL device as long as a kernel version that includes them is compiled.
To further achieve consistency between the processor and the virtual CXL device memory in the client, in yet another exemplary scenario, after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: a coherence protocol engine module is created in the processor for converting a protocol of cache data of the processor to the predetermined protocol. By creating the coherence protocol engine module, cache protocol data on the processor side can be converted into CXL.mem protocol data, so that the coherence between the processor side and the virtual CXL device memory is maintained.
The coherence protocol engine module integrated on the CPU side in the client enables the CPU to access the memory of the virtual CXL device in the manner of an ordinary memory access; it is therefore responsible for collecting, processing, and forwarding the access requests of the CPU to the memory of the virtual CXL device. When the CPU accesses the memory of the virtual CXL device, it first checks whether the cache holds the relevant data. If not, the CPU sends the access request to the virtual CXL device through the coherence protocol engine module; the virtual CXL device returns the corresponding data to the host through the coherence protocol engine module, and the data is stored in the cache. The host then sets and maintains the coherence state of the data as appropriate.
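As a non-limiting illustration, the cache-first access flow handled by the coherence protocol engine module can be modeled as follows. The class names, the dictionary-based cache, and the "Shared" state label are assumptions of this sketch; the present application does not specify which coherence states are used.

```python
class VirtualCxlMemory:
    """Toy stand-in for the storage medium of the virtual CXL device."""

    def __init__(self, contents=None):
        self.contents = dict(contents or {})

    def read(self, addr):
        return self.contents.get(addr, 0)


class CoherenceProtocolEngine:
    """Checks the processor-side cache first and forwards misses to the device."""

    def __init__(self, device):
        self.device = device
        self.cache = {}       # addr -> (data, coherence state)
        self.forwarded = 0    # number of requests sent to the device

    def load(self, addr):
        if addr in self.cache:              # hit: served from the cache
            return self.cache[addr][0]
        self.forwarded += 1                 # miss: forward to the virtual CXL device
        data = self.device.read(addr)
        self.cache[addr] = (data, "Shared")  # assumed state label
        return data


engine = CoherenceProtocolEngine(VirtualCxlMemory({0x1000: 42}))
first = engine.load(0x1000)
second = engine.load(0x1000)
print(first, second, engine.forwarded)
```

Only the first load reaches the device; the second is served from the cache, which is the behavior the latency-oriented allocation strategy relies on.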
Specifically, loading the kernel image file and the root file system to establish a virtual CXL device in the client comprises: loading the kernel image file and the root file system, and establishing an initial virtual CXL device in the client, wherein the initial virtual CXL device is a virtual device that supports communication over the predetermined protocol, that is, as shown in fig. 5, the initial virtual CXL device includes a CXL protocol communication function and further includes a storage medium; and creating in the initial virtual CXL device at least one of: a storage controller and a PCIe function module, wherein the PCIe function module is used for accessing the configuration space of the processor, modifying the memory mapping of the base address registers, and processing message interrupts. In this embodiment, key characteristics of the CXL hardware device are simulated by adding a PCIe function interface and/or a storage controller function related to the storage medium to the initial virtual CXL device. This improves the fidelity of the pure-software simulation of the CXL hardware device, so that the access performance results obtained by subsequent analysis of the virtual CXL type 3 device are closer to those that would be obtained by analyzing a CXL hardware device, which further ensures the accuracy of the analysis results.
In one embodiment of the present application, creating at least one of a storage controller and a PCIe function module in the initial virtual CXL device comprises: creating both a storage controller and a PCIe function module in the initial virtual CXL device. In particular, the storage controller may be realized by having the QEMU simulator support simulation of the storage controller functions of different storage media and provide the related interfaces. The existing QEMU simulator already integrates simulation of the basic functions of PCIe protocol interface devices, and recent versions of the QEMU simulator also integrate simulation of the CXL protocol interface, so these device modules only need to be configured accordingly when the QEMU simulator is started.
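As a non-limiting illustration, configuring the device modules at QEMU startup can be pictured as assembling a list of command-line arguments. The option strings below follow the pattern of the example in the QEMU CXL documentation and are assumptions with respect to the present application: property names differ between QEMU versions (older releases use a different property for the backing memory), and the identifiers, bus numbers, sizes, and file paths are example values only.

```python
def qemu_cxl_args(mem_path, lsa_path, size="256M"):
    """Build QEMU arguments for one persistent CXL type 3 device (assumed option names)."""
    return [
        # q35 machine with CXL enabled and one fixed memory window
        "-M", "q35,cxl=on,cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G",
        # File-backed storage medium and label storage area
        "-object", f"memory-backend-file,id=cxl-mem1,share=on,mem-path={mem_path},size={size}",
        "-object", f"memory-backend-file,id=cxl-lsa1,share=on,mem-path={lsa_path},size={size}",
        # CXL host bridge, root port, and the type 3 device itself
        "-device", "pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1",
        "-device", "cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2",
        "-device", "cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0",
    ]


args = qemu_cxl_args("/tmp/cxltest.raw", "/tmp/lsa.raw")
print(" ".join(args))
```

The arguments for the kernel image file and the root file system would be appended to the same list when the simulator is started.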
Optionally, loading the kernel image file and the root file system to establish a virtual CXL device in the client comprises: loading the kernel image file and the root file system, and determining whether a memory file of the virtual CXL device is generated; and, in the case that a memory file of the virtual CXL device is generated, determining that the virtual CXL device is established in the client. Specifically, after QEMU is started and loads the kernel image file and the root file system, the virtual CXL device is determined to have been simulated successfully when a memory file such as /dev/cxl/mem0 appears.
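As a non-limiting illustration, the check for the memory file of the virtual CXL device can be written as a simple directory lookup inside the client. The function name and the configurable device root are assumptions of this sketch; the demonstration uses a temporary directory so that it can run on a machine without a CXL device.

```python
import glob
import os
import tempfile


def cxl_memdevs(dev_root="/dev"):
    """Return the CXL memory device files found under <dev_root>/cxl."""
    return sorted(glob.glob(os.path.join(dev_root, "cxl", "mem*")))


# Demonstration against a temporary directory that mimics the device tree
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cxl"))
    open(os.path.join(root, "cxl", "mem0"), "w").close()
    demo_names = [os.path.basename(path) for path in cxl_memdevs(root)]

print(demo_names)
```

A non-empty result inside the client would indicate that the virtual CXL device has been established; an empty list means no memory file was generated.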
In practical applications, a test in which all data is stored in the storage medium of the CXL device and is loaded directly from that storage medium during computation yields results showing that the access latency of the storage medium of the CXL device is greater than that of the local memory and that its access bandwidth is smaller than that of the local memory. This result is expected: compared with an access to the local memory, an access to the storage medium on the CXL device undergoes more address translation operations and additionally incurs the processing overhead of the CXL protocol. However, the capacity and bandwidth of the local memory are limited for technical reasons, whereas the storage medium of the CXL device, as external storage connected through the PCIe bus, can effectively expand the bandwidth and capacity of the memory without additional cost. Therefore, latency and bandwidth measured under a memory allocation strategy in which all data is stored in, and loaded directly from, the storage medium of the CXL device cannot represent the advantages brought by the CXL device.
To solve this technical problem and further ensure that the virtual CXL device reflects the performance of a CXL hardware device more objectively, in a specific application of the present application the operating system includes an application program, and after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: acquiring operation requirement information of the application program, wherein the operation requirement information is one of a large required memory capacity, a large required memory bandwidth, and a low required computation latency; determining a memory allocation strategy of the client according to the operation requirement information; and running the client according to the memory allocation strategy and testing the access performance of the virtual CXL device. This embodiment provides a scheme in which the performance test is performed after a different memory allocation strategy is applied for each kind of operation requirement information. The performance difference of the application program before and after the storage medium of the CXL device is used is fully considered, so the advantages of the CXL device are better represented and its access performance is reflected more objectively.
Specifically, the method further comprises: an operation requirement information identifier as shown in fig. 6 is created in the client for acquiring operation requirement information of the application program.
In an exemplary embodiment, determining the memory allocation policy of the client according to the operation requirement information includes:
In the case that the operation requirement information is a large required memory capacity, the memory allocation strategy is determined as: using the local memory of the kernel preferentially, before using the storage medium of the virtual CXL device;
Specifically, for applications that require a large memory capacity, in order to minimize the performance loss caused by the larger access latency of the storage medium of the CXL device, the memory control policy of the operating system needs to be modified so that the application uses the local memory preferentially and uses the storage medium of the CXL device only when an OOM (Out of Memory) condition occurs in the local memory.
In the case that the operation requirement information is a large required memory bandwidth, the memory allocation strategy is determined as: storing the same data in both the storage medium and the local memory;
Specifically, for applications that require a large memory bandwidth, the same data needs to be stored in both the local memory and the storage medium of the CXL device. When the processor accesses a piece of data, it can issue instructions that read the local memory and the storage medium of the CXL device at the same time, thereby increasing the memory access bandwidth.
In the case that the operation requirement information is a low required computation latency, the memory allocation strategy is determined as: using the storage medium when the locality of the application program is greater than a predetermined value, and using the local memory when the locality is less than or equal to the predetermined value.
In particular, for applications that require low computation latency, since data on the storage medium of the CXL device can be cached on the processor side, it is desirable, depending on the characteristics of the application, for accesses to that data to hit the processor-side cache as often as possible. Placing a high-locality program in the memory of the CXL device exploits the high cache hit rate to reduce accesses to CXL memory data, thereby masking the performance disadvantage of CXL memory. No special placement is needed for data in the local memory, because its access latency is relatively low. The locality of the application program can be identified when the program is compiled. The locality includes temporal locality and/or spatial locality.
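As a non-limiting illustration, the selection of a memory allocation strategy from the operation requirement information can be written as a small decision function. The enumeration, the strategy labels, the numeric locality score, and the default threshold of 0.5 are assumptions of this sketch; the present application does not specify how locality is quantified or what the predetermined value is.

```python
from enum import Enum


class Requirement(Enum):
    LARGE_CAPACITY = "large memory capacity"
    LARGE_BANDWIDTH = "large memory bandwidth"
    LOW_LATENCY = "low computation latency"


def choose_strategy(requirement, locality=0.0, threshold=0.5):
    """Map operation requirement information to a memory allocation strategy label."""
    if requirement is Requirement.LARGE_CAPACITY:
        # Local memory first; the CXL storage medium only when local memory runs out
        return "local-first-then-cxl"
    if requirement is Requirement.LARGE_BANDWIDTH:
        # Same data kept in both places so both can be read at the same time
        return "replicate-in-local-and-cxl"
    if requirement is Requirement.LOW_LATENCY:
        # High-locality programs go to the CXL storage medium, others stay local
        return "cxl" if locality > threshold else "local"
    raise ValueError(f"unknown requirement: {requirement}")


print(choose_strategy(Requirement.LOW_LATENCY, locality=0.9))
```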
In other embodiments, testing access performance of the virtual CXL device includes: calling an MLC tool to measure throughput between the processor and a storage medium of the virtual CXL device to obtain a bandwidth performance parameter corresponding to the virtual CXL device; and calling the MLC tool to measure the access time delay between the processor and the storage medium, and obtaining the time delay performance parameter corresponding to the virtual CXL equipment.
Specifically, assuming that the number of processors (CPUs) in the client is m, the client system first creates m-1 threads that are responsible for generating load; on the remaining CPU it creates one thread dedicated to measuring latency. This thread traverses an array of pointers in which each pointer points to the next element to be visited, which creates a dependency between successive read operations. The average time of a read operation on this array is recorded as the latency, and the measured latency varies with the load generated by the previously created threads. Running the mlc command with root privileges and the --latency_matrix and --bandwidth_matrix options tests latency and bandwidth, respectively.
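As a non-limiting illustration, the dependent-read traversal used for latency measurement can be sketched as a pointer chase over a randomly linked array. This is not the implementation of the MLC tool; the function names are assumptions, the load-generating threads are omitted, and in Python the interpreter overhead dominates the measured time, so the sketch shows only the dependency structure and not a realistic latency figure.

```python
import random
import time


def build_chain(n, seed=0):
    """Link n slots into one random cycle: chain[i] is the index visited after i."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain; each read depends on the result of the previous read."""
    index = start
    begin = time.perf_counter()
    for _ in range(steps):
        index = chain[index]
    elapsed = time.perf_counter() - begin
    return index, elapsed / steps


chain = build_chain(1024)
final_index, seconds_per_read = chase(chain, len(chain))
print(final_index, seconds_per_read)
```

Because the chain forms a single cycle, a traversal of exactly n steps visits every element once and returns to the starting index, so no read can be issued before the previous one completes.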
In one exemplary embodiment, the virtual CXL device in the client has at least one, and after loading the kernel image file and the root file system to establish the virtual CXL device in the client, the method further comprises: establishing a virtual CXL Switch (Switch) in the client; communication connections between the virtual CXL switch and at least one of the virtual CXL devices and at least one of the processors are established, respectively. The topology structure of the CXL equipment can be expanded by simulating to obtain the virtual CXL switch, so that the characteristics and the performance of the CXL equipment can be more conveniently and subsequently analyzed and known.
Specifically, a CXL switch device type is newly created in the simulator software code; devices of this type are responsible for forwarding and processing the memory access requests of the virtual CXL devices connected to them. When the simulator is started, a device of the CXL switch type is created, and its topology with respect to the other CXL devices is configured.
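As a non-limiting illustration, the forwarding role of the virtual CXL switch can be modeled as routing each memory access request to the attached device whose address range contains the requested address. Routing by address range, the class names, and the byte-array storage are assumptions of this sketch and are not taken from the simulator code.

```python
class AttachedCxlDevice:
    """Toy virtual CXL device with a small byte-addressable storage medium."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value


class VirtualCxlSwitch:
    """Forwards memory access requests to the device that owns the address."""

    def __init__(self):
        self.ports = []  # list of (base address, size, device)

    def attach(self, base, size, device):
        self.ports.append((base, size, device))

    def _route(self, addr):
        for base, size, device in self.ports:
            if base <= addr < base + size:
                return device, addr - base
        raise ValueError(f"no device mapped at {addr:#x}")

    def read(self, addr):
        device, offset = self._route(addr)
        return device.read(offset)

    def write(self, addr, value):
        device, offset = self._route(addr)
        device.write(offset, value)


switch = VirtualCxlSwitch()
dev_a, dev_b = AttachedCxlDevice(256), AttachedCxlDevice(256)
switch.attach(0x0000, 256, dev_a)
switch.attach(0x1000, 256, dev_b)
switch.write(0x1004, 7)
print(switch.read(0x1004), dev_b.read(4), dev_a.read(4))
```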
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the described embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
This embodiment further provides an emulation apparatus for a CXL device that runs on an electronic device. The electronic device may be integrated in a host machine, and simulation processor software is installed in the electronic device, the above method being implemented by running in the simulation processor software. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 7 is a block diagram of an emulation apparatus of a CXL device according to an embodiment of the present application wherein the apparatus is applied to a client of a host machine, the client including at least a first NUMA node that includes a processor and an operating system that runs on the processor, as shown in FIG. 7, the apparatus comprising:
a compiling unit 10, configured to compile a kernel of the operating system, so that a compiled kernel code supports a predetermined protocol, and a kernel image file is obtained, where the predetermined protocol includes a CXL memory protocol;
in particular, the CXL memory protocol, also known as the cxl.mem protocol, defines the transport interface between the processor and the memory, a protocol required for CXL type 3 devices.
A creating unit 20 for creating a root file system of a release version of the operating system;
In particular, release versions of the operating system include, but are not limited to, Debian, Ubuntu, and CentOS.
And a loading unit 30, configured to load the kernel image file and the root file system, so as to establish a virtual CXL device in the client.
Specifically, the virtual CXL device is built by starting the simulation processor software to load the kernel image file and the root file system.
According to this embodiment, the compiling unit compiles the kernel of the operating system according to a predetermined protocol including the CXL memory protocol to obtain a kernel image file; the creating unit creates the root file system of the operating system; and the loading unit loads the obtained kernel image file and root file system, so that a virtual CXL device is obtained by simulation in the client. In the prior art, CXL hardware devices are in short supply, which makes it difficult to analyze and understand the access performance of CXL devices. By contrast, the present application sets up a virtual client and simulates in it a CXL device that supports the CXL memory protocol, so that a CXL type 3 device is simulated purely in software; the characteristics of the virtual CXL type 3 device can then be analyzed and its access performance understood, which provides support for subsequent analysis and understanding of the access performance of CXL type 3 devices.
The execution subject of the device may be a server, a terminal, or the like, but is not limited thereto.
The operating system may be any suitable operating system, such as Linux. The first NUMA node also includes memory, the type of which includes, but is not limited to, DRAM (Dynamic Random Access Memory). In the case of multiple first NUMA nodes, the first NUMA nodes are interconnected through an interconnection module such as a QPI (Intel QuickPath Interconnect)/UPI (Ultra Path Interconnect) bus. There may be one or more processors in one first NUMA node. The processor may be a CPU (Central Processing Unit) or another type of processor such as a GPU (Graphics Processing Unit).
It should be noted that the CXL device is a hardware device that supports the CXL system, that is, the CXL device interacts with the CXL protocol of other devices through hardware implementation. The virtual CXL device simulates the CXL protocol channel through a simulator, and simulates the information generation, response and action of the hardware device supporting the CXL system.
Furthermore, in addition to the CXL.mem protocol, the predetermined protocol may include the CXL cache protocol, also called the CXL.cache protocol. The CXL.cache protocol is required by CXL type 2 devices, so with the scheme of the present application a CXL type 2 device, in addition to a CXL type 3 device, can be obtained by simulation in the client.
Specifically, the client of the host may be set up by having the simulation processor software create a process that simulates the client-side system, including device resources such as the processor and the RAM of the client. When the simulation processor software is QEMU (Quick Emulator), the specific flow is as follows: creating a simulated chipset; creating CPU threads to represent the CPU execution flow of the client; allocating space in the virtual address space of QEMU to serve as the physical address space of the client; creating the corresponding virtual devices for the client according to the devices specified by the user on the command line; and monitoring various events in the main thread, including I/O (Input/Output) accesses of the client to devices, user interaction with the client, and I/O events on the host that correspond to virtual devices (such as receipt of network data destined for the client).
As shown in fig. 3, when the client runs, the corresponding relationship between each component in the client and the host side includes:
the processor of the client: each CPU of the client corresponds to a thread on the host; through the cooperation of QEMU and KVM (Kernel-based Virtual Machine), these threads are scheduled normally by the operating system of the host and directly execute the code of the client;
memory of the client: the physical memory of the client corresponds to virtual memory in the QEMU process; to convert a virtual address of the client into a physical address of the host, the virtual address of the client is first converted into a physical address of the client, which is then converted into a physical address of the host through a page table maintained by KVM;
device of the client: the devices in the client are presented to the client by QEMU; the operating system enumerates the devices at startup and loads the corresponding drivers;
interaction of the client with the host: the operating system of the client performs I/O through I/O ports or MMIO (Memory-Mapped I/O); KVM intercepts these I/O requests from the operating system of the client and, in most cases, dispatches them to the QEMU process in user space for handling.
According to some exemplary embodiments of the present application, the compiling unit specifically includes:
a first configuration module, configured to configure predetermined parameters for the kernel in response to a configuration instruction generated according to the predetermined protocol, wherein the predetermined parameters comprise the type of the storage medium of the virtual CXL device, the access mode of the storage medium, and the driver type of the virtual CXL device;
Specifically, the configuration instruction is generated as follows: in response to an instruction for generating a configuration interface, a configuration interface including type options and access mode options of the storage medium is displayed, and an operator performs selection operations on the configuration interface according to the predetermined protocol. The types include persistent storage media and volatile storage media, and the access modes include direct memory access and system memory access. The driver type is used to identify and use the virtual CXL device.
a compiling module, configured to compile the kernel code configured with the predetermined parameters to obtain the kernel image file.
In the embodiment, according to the configuration instruction, the type of the storage medium, the access mode and other preset parameters are configured for the kernel, and then the compiling of the kernel code is performed, so that the obtained kernel image file is further ensured to support CXL.mem protocol, and CXL type3 equipment can be obtained through simulation after the kernel image file and the root file system are loaded.
In order to further implement configuration of the virtual CXL device such that the configured CXL device may reflect the characteristics of the CXL hardware device relatively accurately, in another exemplary embodiment, the apparatus further comprises: the partitioning unit is used for partitioning the persistent storage medium of the virtual CXL device to obtain a plurality of storage medium areas under the condition that the type is the persistent storage medium after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client; and the configuration unit is used for configuring a name space for at least each storage medium area according to the access mode so as to distribute the virtual CXL equipment to a second NUMA node, wherein the second NUMA node does not comprise the processor.
Specifically, the dividing unit includes: a first calling module, configured to call the cxl tool to divide the persistent storage medium into areas, obtaining a plurality of storage medium areas. The cxl tool is a management tool for the virtual CXL device whose operations include area division of the storage medium of the virtual CXL device; calling this tool further ensures that the memory partitioning of the virtual CXL device is realized simply and quickly.
In the actual application process, the access modes of the storage media in the virtual CXL device are different, and the corresponding allocation modes are also different.
a second configuration module, configured to configure the namespace for each storage medium area in the case that the access mode is direct memory access, so as to allocate the virtual CXL device to the second NUMA node;
Specifically, in the case that the access mode is direct memory access, after the namespace is configured for each storage medium area, a file such as /dev/dax0.0 can be seen in the client, which indicates that the allocation of the virtual CXL device is complete. The direct access mechanism is a mechanism that allows user-mode software to directly access files stored in persistent memory.
a third configuration module, configured to configure the namespace for each storage medium area in the case that the access mode is system memory access;
a conversion module, configured to convert the access mode of each storage medium area configured with the namespace into system memory access;
and a running module, configured to run a first node viewing instruction so as to allocate the converted virtual CXL device to the second NUMA node.
Specifically, the first node viewing instruction may be the numactl -H instruction in QEMU.
By the embodiment, the allocation of the virtual CXL devices with different access modes and different types to the NUMA nodes is further realized.
Specifically, the second configuration module includes: a first calling sub-module, used for calling the ndctl tool to configure the namespace for each storage medium area. The ndctl tool is a management tool for NVDIMMs; its operations include namespace configuration. Calling this tool further ensures that the namespace configuration of the memory partitions can be realized simply and quickly.
To further ensure that the storage medium areas are converted simply and quickly, in other embodiments, the conversion module includes: a second calling sub-module, used for calling the daxctl tool to convert the access mode of each storage medium area configured with the namespace into system memory access. The daxctl tool is a tool for managing and monitoring DAX devices; it works together with the kernel and the application programs in the client to create, destroy and convert DAX devices.
In an exemplary embodiment, the apparatus further comprises: a first running unit, used for, after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client and in the case that the type is a volatile storage medium, running a second node checking instruction to allocate the virtual CXL device to a second NUMA node, where the second NUMA node does not include the processor. In this embodiment, when the storage medium is a volatile storage medium, it is treated directly as system memory without a processor, and running the second node checking instruction establishes the correspondence between the virtual CXL device and the second NUMA node.
The second node checking instruction may also be the numactl -H command run in the QEMU environment.
In a specific embodiment, as shown in fig. 4, the client has two first NUMA nodes, NUMA node0 and NUMA node1. After a virtual CXL device is created in the client and configured accordingly, a second NUMA node is added to the system of the client, namely NUMA node2 in fig. 4. NUMA node0 and NUMA node1 each have a CPU, and the CPUs of node0 and node1 are interconnected and communicate through a QPI/UPI bus. NUMA node2 has no CPU; it is not interconnected with the CPUs of the other nodes but is connected to NUMA node0 through a PCIe bus.
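One way to confirm that a CPU-less node such as NUMA node2 has appeared is to parse the text printed by the node checking instruction. The following is a minimal sketch, assuming the usual "node N cpus: ..." line format of numactl -H; the sample text is made up for illustration and is not measured output from the embodiment.

```python
import re
from typing import Dict, List


def cpuless_nodes(numactl_output: str) -> List[int]:
    """Return the ids of NUMA nodes whose CPU list is empty."""
    cpus_per_node: Dict[int, List[str]] = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if match:
            cpus_per_node[int(match.group(1))] = match.group(2).split()
    return sorted(node for node, cpus in cpus_per_node.items() if not cpus)


# Illustrative text only: two nodes with CPUs and one memory-only node.
SAMPLE_OUTPUT = """\
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 2048 MB
node 1 cpus: 2 3
node 1 size: 2048 MB
node 2 cpus:
node 2 size: 4096 MB
"""

if __name__ == "__main__":
    print(cpuless_nodes(SAMPLE_OUTPUT))
```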
According to still further alternative embodiments of the present application, the apparatus further comprises: a first creating unit, used for creating a driver for the virtual CXL device in the kernel after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client. By creating a driver for the virtual CXL device in the kernel, the virtual CXL device can be discovered and enumerated in the client, and the mapping of its memory address space in the client system can be established.
Specifically, the relevant CXL device drivers are already integrated in the kernel source code; as long as a kernel version that includes them is compiled, the resulting system supports the CXL device drivers.
To further achieve coherence between the processor and the memory of the virtual CXL device in the client, in yet another exemplary arrangement, the apparatus further comprises: a second creation unit, used for, after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client, creating in the processor a coherence protocol engine module for converting the protocol of the cache data of the processor into the predetermined protocol. By creating the coherence protocol engine module, cache protocol data on the processor side can be converted into CXL.mem protocol data, so that coherence between the processor side and the memory of the virtual CXL device is maintained.
The coherence protocol engine module integrated on the CPU side in the client enables the CPU to access the memory of the virtual CXL device in the manner of an ordinary memory access; the module is therefore responsible for collecting, processing and forwarding the CPU's access requests to the memory of the virtual CXL device. When the CPU accesses the memory of the virtual CXL device, it first checks whether the cache holds the relevant data. If not, the CPU sends the access request to the virtual CXL device through the coherence protocol engine module; the virtual CXL device returns the corresponding data to the host through the coherence protocol engine module, the data are stored in the cache, and the host sets and maintains the coherence state of the data as the situation requires.
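The read path just described (cache lookup first, forwarding through the coherence protocol engine module on a miss, then caching the returned data) can be illustrated with a toy model. All class names, and the single "shared" state recorded for a cached line, are hypothetical; a real coherence engine tracks more states and also handles writes and invalidations.

```python
class VirtualCxlDevice:
    """Hypothetical stand-in for the memory of the virtual CXL device."""

    def __init__(self, memory):
        self.memory = dict(memory)

    def read(self, address):
        return self.memory[address]


class CoherenceEngine:
    """Forwards host read requests to the device and counts them."""

    def __init__(self, device):
        self.device = device
        self.forwarded = 0

    def forward_read(self, address):
        self.forwarded += 1
        return self.device.read(address)


class Host:
    """CPU side: checks its cache before going through the engine."""

    def __init__(self, engine):
        self.engine = engine
        self.cache = {}  # address -> (data, coherence state)

    def read(self, address):
        if address in self.cache:            # cache hit: no device access
            return self.cache[address][0]
        data = self.engine.forward_read(address)
        # The state name is an assumption; the host would choose it
        # according to the situation.
        self.cache[address] = (data, "shared")
        return data


if __name__ == "__main__":
    host = Host(CoherenceEngine(VirtualCxlDevice({0x1000: 42})))
    print(host.read(0x1000), host.read(0x1000), host.engine.forwarded)
```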
Specifically, the second creation unit includes: a first loading module, used for loading the kernel image file and the root file system and establishing an initial virtual CXL device in the client, where the initial virtual CXL device is a virtual device supporting communication over the predetermined protocol, that is, as shown in fig. 5, the initial virtual CXL device has a CXL protocol communication function, and the initial virtual CXL device further includes a storage medium; and a creation module, used for creating at least one of the following in the initial virtual CXL device: a storage controller and a PCIe function module, where the PCIe function module is used for accessing the configuration space of the processor, modifying the memory mapping of the base address register and processing message interrupts. In this embodiment, the key characteristics of a CXL hardware device are simulated by adding a PCIe function interface and/or a storage controller function related to the storage medium to the initial virtual CXL device. This further ensures that the pure-software simulation of the CXL hardware device is more faithful, so that the access performance results subsequently obtained by analyzing the CXL Type 3 device are closer to those obtained by analyzing a CXL hardware device, which further ensures the accuracy of the analysis results.
In one embodiment of the present application, the creation module includes: a creation sub-module, used for creating a storage controller and a PCIe function module respectively in the initial virtual CXL device. Specifically, this can be realized by having the QEMU simulator support the simulation of storage controller functions for different storage media and provide the related interfaces. The existing QEMU simulator already integrates simulation of the basic functions of PCIe interface devices, and the latest version of QEMU also integrates simulation of the CXL protocol interface, so these device modules only need to be configured accordingly when QEMU is started.
Optionally, the loading unit includes: a second loading module, used for loading the kernel image file and the root file system and determining whether a memory file of the virtual CXL device is generated; and a first determining module, used for determining that the virtual CXL device is established in the client in the case that the memory file of the virtual CXL device is generated. Specifically, after QEMU is started and loads the kernel image file and the root file system, the virtual CXL device is determined to have been simulated successfully if a memory file such as /dev/cxl/mem0 appears.
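The check for the memory file can be sketched as follows. This is a minimal sketch to be run inside the client; the default directory /dev/cxl and the mem* pattern follow the example path above, and the function name is hypothetical.

```python
import glob
import os


def virtual_cxl_device_established(dev_dir: str = "/dev/cxl") -> bool:
    """Return True if at least one memN file exists under dev_dir."""
    return len(glob.glob(os.path.join(dev_dir, "mem*"))) > 0


if __name__ == "__main__":
    if virtual_cxl_device_established():
        print("virtual CXL device established")
    else:
        print("no CXL memory file found")
```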
In practical applications, if all data are stored in the storage medium of the CXL device and loaded directly from that storage medium during computation, part of the test results show that the access latency of the storage medium of the CXL device is greater than that of the local memory and that its access bandwidth is smaller. This result is normal: compared with an access to the local memory, an access to the storage medium on the CXL device undergoes more address translation operations and additionally incurs the processing overhead of the CXL protocol. However, the capacity and bandwidth of the local memory are limited for technical reasons, whereas the storage medium of the CXL device, as external storage connected through the PCIe bus, can effectively expand memory bandwidth and capacity without additional cost. Therefore, under a memory allocation policy in which all data are stored in the storage medium of the CXL device, the latency and bandwidth measured when loading directly from that storage medium cannot reflect the advantages brought by the CXL device.
In order to solve this technical problem and further ensure that the virtual CXL device reflects the performance of a CXL hardware device relatively objectively, in a specific application of the present application, the operating system includes an application program, and the apparatus further includes: an obtaining unit, used for obtaining the operation requirement information of the application program after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client, where the operation requirement information includes one of a large required memory capacity, a large required memory bandwidth and a small required computation latency; a determining unit, used for determining the memory allocation policy of the client according to the operation requirement information; and a second operation unit, used for operating the client according to the memory allocation policy and testing the access performance of the virtual CXL device. This embodiment provides a scheme in which performance is tested after a different memory allocation policy is applied for each kind of operation requirement information. It fully considers the difference in application performance before and after the storage medium of the CXL device is introduced, so the advantages of the CXL device are better represented and its access performance is reflected more objectively.
Specifically, the apparatus further comprises: a third creating unit, used for creating in the client an operation requirement information identifier, as shown in fig. 6, for acquiring the operation requirement information of the application program.
In an exemplary embodiment, the determining unit includes:
a second determining module, used for determining, when the operation requirement information is that the required memory capacity is large, that the memory allocation policy is: preferentially using the local memory of the kernel before using the storage medium of the virtual CXL device;
specifically, for applications with large memory capacity requirements, in order to reduce as far as possible the performance loss caused by the larger access latency of the storage medium of the CXL device, the memory control policy of the operating system needs to be modified so that the application uses the local memory preferentially, and the storage medium of the CXL device is used only when the local memory runs out (OOM, Out of Memory).
a third determining module, used for determining, when the operation requirement information is that the required memory bandwidth is large, that the memory allocation policy is: storing the same data in both the storage medium and the local memory;
specifically, for applications with high memory bandwidth requirements, the same data need to be stored in both the local memory and the storage medium of the CXL device; when the processor accesses a piece of data, it can issue instructions to read the local memory and the storage medium of the CXL device at the same time, thereby improving the memory access bandwidth.
a fourth determining module, used for determining, when the operation requirement information is that the required computation latency is small, that the memory allocation policy is: using the storage medium when the locality of the application program is greater than a predetermined value, and using the local memory when the locality is less than or equal to the predetermined value.
Specifically, for applications that require low computation latency, since data on the storage medium of the CXL device may be cached on the processor side, it is desirable, depending on the characteristics of the application, for accesses to data on the storage medium of the CXL device to hit in the processor-side cache as often as possible. Placing programs with high locality in the memory of the CXL device exploits their high cache hit rate to reduce accesses to CXL memory data, thereby masking the performance disadvantage of CXL memory. No special arrangement is needed for the local memory, because its access latency is relatively low. The locality of an application program can be identified when the program is compiled. Locality includes temporal locality and/or spatial locality.
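The mapping from operation requirement information to memory allocation policy can be sketched as a simple selection function. The enum members, the policy labels, the numeric locality score and the default threshold of 0.5 are all hypothetical names and values chosen for illustration; the embodiment only specifies the three cases and the comparison against a predetermined value.

```python
from enum import Enum


class Requirement(Enum):
    LARGE_CAPACITY = "large required memory capacity"
    HIGH_BANDWIDTH = "large required memory bandwidth"
    LOW_LATENCY = "small required computation latency"


def choose_policy(requirement: Requirement,
                  locality: float = 0.0,
                  threshold: float = 0.5) -> str:
    """Return a label for the memory allocation policy to apply."""
    if requirement is Requirement.LARGE_CAPACITY:
        # Local memory first; the CXL storage medium only when it runs out.
        return "local-first-then-cxl"
    if requirement is Requirement.HIGH_BANDWIDTH:
        # Same data kept in both places so that both can be read at once.
        return "replicate-local-and-cxl"
    if requirement is Requirement.LOW_LATENCY:
        # High-locality programs go to CXL memory, the rest to local memory.
        return "cxl" if locality > threshold else "local"
    raise ValueError(f"unsupported requirement: {requirement!r}")


if __name__ == "__main__":
    print(choose_policy(Requirement.LOW_LATENCY, locality=0.9))
```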
In other embodiments, the second operation unit includes: a second calling module, used for calling the MLC tool to measure the throughput between the processor and the storage medium of the virtual CXL device, obtaining the bandwidth performance parameter corresponding to the virtual CXL device; and a third calling module, used for calling the MLC tool to measure the access latency between the processor and the storage medium, obtaining the latency performance parameter corresponding to the virtual CXL device.
Specifically, assuming that the number of processors (CPUs) in the client is m, the client system first creates m-1 threads that are responsible for generating load, and one further thread, on the remaining CPU, dedicated to measuring latency. This thread traverses an array of pointers in which each pointer points to the next element to be visited, which creates a dependency chain between read operations. The average time of a read operation on this array is recorded as the latency, and the measured latency varies with the load generated by the load threads. Running the mlc command with root privileges with the latency_matrix and bandwidth_matrix options tests the latency and the bandwidth respectively.
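The dependent-read pattern used by the latency thread can be illustrated with a small pointer-chasing sketch. It is not the MLC implementation: the load-generating threads are omitted, the function names are hypothetical, and in Python the measured time is dominated by interpreter overhead, so only the access pattern is meaningful here.

```python
import random
import time
from typing import List, Tuple


def build_pointer_chain(size: int, seed: int = 0) -> List[int]:
    """Build an array in which each entry holds the index of the next one.

    The entries form a single cycle through all elements in shuffled
    order, so every read depends on the result of the previous read.
    """
    order = list(range(size))
    random.Random(seed).shuffle(order)
    chain = [0] * size
    for i in range(size):
        chain[order[i]] = order[(i + 1) % size]
    return chain


def chase(chain: List[int], steps: int) -> Tuple[int, float]:
    """Follow the chain for `steps` reads; return (last index, seconds per read)."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    index = 0
    start = time.perf_counter()
    for _ in range(steps):
        index = chain[index]  # the next address is known only after this read
    elapsed = time.perf_counter() - start
    return index, elapsed / steps


if __name__ == "__main__":
    chain = build_pointer_chain(1 << 16)
    _, seconds_per_read = chase(chain, 1 << 18)
    print(f"average time per dependent read: {seconds_per_read * 1e9:.1f} ns")
```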
In an exemplary embodiment, there is at least one virtual CXL device in the client, and the apparatus further comprises: a first establishing unit, used for establishing a virtual CXL switch in the client after the kernel image file and the root file system are loaded to establish the virtual CXL device in the client; and a second establishing unit, used for establishing communication connections between the virtual CXL switch and the at least one virtual CXL device and at least one processor respectively. Simulating a virtual CXL switch extends the topology of the CXL devices, which makes it more convenient to subsequently analyze and understand the characteristics and performance of CXL devices.
Specifically, a CXL switch device type is newly added to the simulator software code; it is responsible for forwarding and processing the memory access requests of the virtual CXL devices connected to it. When the simulator is started, the CXL switch device is instantiated, and the topology between it and the other CXL devices is configured.
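The forwarding role of the virtual CXL switch can be illustrated with a toy routing table. This sketch assumes that each attached virtual CXL device owns a contiguous address range and that the switch selects the device by address; the class name, the device names and the ranges are hypothetical and do not reflect the simulator's actual device model.

```python
class VirtualCxlSwitch:
    """Toy switch that forwards a memory address to the owning device."""

    def __init__(self):
        self.ports = []  # list of (base, size, device name)

    def attach(self, name: str, base: int, size: int) -> None:
        """Connect a device that serves addresses in [base, base + size)."""
        for other_base, other_size, other_name in self.ports:
            if base < other_base + other_size and other_base < base + size:
                raise ValueError(f"range overlaps with {other_name}")
        self.ports.append((base, size, name))

    def route(self, address: int) -> str:
        """Return the name of the device that should receive the request."""
        for base, size, name in self.ports:
            if base <= address < base + size:
                return name
        raise LookupError(f"no device serves address {address:#x}")


if __name__ == "__main__":
    switch = VirtualCxlSwitch()
    switch.attach("mem0", base=0x0000, size=0x1000)
    switch.attach("mem1", base=0x1000, size=0x1000)
    print(switch.route(0x0800), switch.route(0x1800))
```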
It should be noted that each of the above modules may be implemented in software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing a computer program.
Embodiments of the present application also provide an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor and an input/output device connected to the processor.
Embodiments of the present application also provide a client, comprising: at least one first NUMA node, the first NUMA node including a processor and an operating system running on the processor; and a virtual CXL device obtained by simulation using the steps of any one of the methods.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementations, which are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the present application described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of multiple computing devices; and they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing describes only the preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, etc. made within the principles of the present application should be included in the protection scope of the present application.

Claims (20)

1. A method of simulating a CXL device, the method being applied to a client of a host, the client comprising at least one first NUMA node, the first NUMA node comprising a processor and an operating system running on the processor, the method comprising:
compiling the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, wherein the predetermined protocol comprises the CXL memory protocol;
making a root file system of a release version of the operating system;
loading the kernel image file and the root file system to establish a virtual CXL device in the client,
wherein compiling the kernel of the operating system so that the compiled kernel code supports the predetermined protocol, to obtain the kernel image file, comprises:
in response to a configuration instruction generated according to the predetermined protocol, configuring predetermined parameters for the kernel, wherein the predetermined parameters comprise the type of a storage medium of the virtual CXL device, the access mode of the storage medium and the driver type of the virtual CXL device;
and compiling the kernel code configured with the predetermined parameters to obtain the kernel image file.
2. The method of claim 1, wherein after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
in the case that the type is a persistent storage medium, dividing the persistent storage medium of the virtual CXL device into a plurality of storage medium areas;
and according to the access mode, at least configuring a namespace for each storage medium area so as to allocate the virtual CXL device to a second NUMA node, wherein the second NUMA node does not comprise the processor.
3. The method of claim 2, wherein dividing the persistent storage medium of the virtual CXL device into a plurality of storage medium areas comprises:
calling the cxl tool to divide the persistent storage medium into regions to obtain the plurality of storage medium areas.
4. The method of claim 2, wherein, according to the access mode, at least configuring a namespace for each storage medium area so as to allocate the virtual CXL device to a second NUMA node comprises:
in the case that the access mode is direct memory access, configuring the namespace for each storage medium area so as to allocate the virtual CXL device to the second NUMA node;
in the case that the access mode is system memory access, configuring the namespace for each storage medium area;
converting the access mode of each storage medium area configured with the namespace into the system memory access;
and running a first node checking instruction to allocate the converted virtual CXL device to the second NUMA node.
5. The method of claim 4, wherein configuring the namespace for each storage medium area comprises:
calling the ndctl tool to configure the namespace for each storage medium area.
6. The method of claim 4, wherein converting the access mode of each storage medium area configured with the namespace into the system memory access comprises:
calling the daxctl tool to convert the access mode of each storage medium area configured with the namespace into the system memory access.
7. The method of claim 2, wherein after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
in the case that the type is a volatile storage medium, running a second node checking instruction to allocate the virtual CXL device to a second NUMA node, wherein the second NUMA node does not comprise the processor.
8. The method of any one of claims 1 to 7, wherein after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
creating a driver for the virtual CXL device in the kernel.
9. The method of any one of claims 1 to 7, wherein after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
creating, in the processor, a coherence protocol engine module for converting the protocol of the cache data of the processor into the predetermined protocol.
10. The method of any one of claims 1 to 7, wherein loading the kernel image file and the root file system to establish a virtual CXL device in the client comprises:
loading the kernel image file and the root file system, and establishing an initial virtual CXL device in the client, wherein the initial virtual CXL device is a virtual device supporting communication over the predetermined protocol, and the initial virtual CXL device comprises a storage medium;
creating in the initial virtual CXL device at least one of: a storage controller and a PCIe function module, wherein the PCIe function module is used for accessing the configuration space of the processor, modifying the memory mapping of the base address register and processing message interrupts.
11. The method of any one of claims 1 to 7, wherein loading the kernel image file and the root file system to establish a virtual CXL device in the client comprises:
loading the kernel image file and the root file system, and determining whether a memory file of the virtual CXL device is generated;
in the case that the memory file of the virtual CXL device is generated, determining that the virtual CXL device is established in the client.
12. The method of any one of claims 1 to 7, wherein the operating system comprises an application program, and after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
acquiring operation requirement information of the application program, wherein the operation requirement information comprises one of a large required memory capacity, a large required memory bandwidth and a small required computation latency;
determining a memory allocation policy of the client according to the operation requirement information;
and operating the client according to the memory allocation policy, and testing the access performance of the virtual CXL device.
13. The method of claim 12, wherein determining the memory allocation policy of the client according to the operation requirement information comprises:
in the case that the operation requirement information is that the required memory capacity is large, determining that the memory allocation policy is: preferentially using the local memory of the kernel before using the storage medium of the virtual CXL device;
in the case that the operation requirement information is that the required memory bandwidth is large, determining that the memory allocation policy is: storing the same data in both the storage medium and the local memory;
in the case that the operation requirement information is that the required computation latency is small, determining that the memory allocation policy is: using the storage medium when the locality of the application program is greater than a predetermined value, and using the local memory when the locality is less than or equal to the predetermined value.
14. The method of claim 12, wherein testing the access performance of the virtual CXL device comprises:
calling an MLC tool to measure the throughput between the processor and the storage medium of the virtual CXL device to obtain a bandwidth performance parameter corresponding to the virtual CXL device;
and calling the MLC tool to measure the access latency between the processor and the storage medium to obtain a latency performance parameter corresponding to the virtual CXL device.
15. The method of any one of claims 1 to 7, wherein the predetermined protocol further comprises a CXL caching protocol.
16. The method of any one of claims 1 to 7, wherein there is at least one virtual CXL device in the client, and after loading the kernel image file and the root file system to establish a virtual CXL device in the client, the method further comprises:
establishing a virtual CXL switch in the client;
establishing communication connections between the virtual CXL switch and the at least one virtual CXL device and at least one processor respectively.
17. An apparatus for simulating a CXL device, the apparatus being applied to a client of a host, the client comprising at least one first NUMA node, the first NUMA node comprising a processor and an operating system running on the processor, the apparatus comprising:
a compiling unit for compiling the kernel of the operating system so that the compiled kernel code supports a predetermined protocol, to obtain a kernel image file, wherein the predetermined protocol comprises the CXL memory protocol;
a production unit for making a root file system of a release version of the operating system;
a loading unit for loading the kernel image file and the root file system to establish a virtual CXL device in the client,
wherein the compiling unit comprises:
a first configuration module for, in response to a configuration instruction generated according to the predetermined protocol, configuring predetermined parameters for the kernel, wherein the predetermined parameters comprise the type of a storage medium of the virtual CXL device, the access mode of the storage medium and the driver type of the virtual CXL device;
and a compiling module for compiling the kernel code configured with the predetermined parameters to obtain the kernel image file.
18. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 16.
19. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 16 when the computer program is executed.
20. A client, comprising:
at least one first NUMA node that includes a processor and an operating system that runs on the processor;
a virtual CXL device simulated using the steps of the method of any one of claims 1 to 16.
CN202311252710.9A 2023-09-26 2023-09-26 Simulation method and device of CXL (control information and automation) equipment, electronic equipment and client Active CN116991544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311252710.9A CN116991544B (en) 2023-09-26 2023-09-26 Simulation method and device of CXL (control information and automation) equipment, electronic equipment and client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311252710.9A CN116991544B (en) 2023-09-26 2023-09-26 Simulation method and device of CXL (control information and automation) equipment, electronic equipment and client

Publications (2)

Publication Number Publication Date
CN116991544A CN116991544A (en) 2023-11-03
CN116991544B true CN116991544B (en) 2024-01-26

Family

ID=88523570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311252710.9A Active CN116991544B (en) 2023-09-26 2023-09-26 Simulation method and device of CXL (control information and automation) equipment, electronic equipment and client

Country Status (1)

Country Link
CN (1) CN116991544B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103645945A (en) * 2013-11-04 2014-03-19 天津汉柏信息技术有限公司 Automatic probing and drive loading method of virtual network interface card
CN109783117A (en) * 2019-01-18 2019-05-21 中国人民解放军国防科技大学 Mirror image file making and starting method of diskless system
CN113886019A (en) * 2021-10-20 2022-01-04 北京字节跳动网络技术有限公司 Virtual machine creation method, device, system, medium and equipment
CN114691286A (en) * 2020-12-29 2022-07-01 华为云计算技术有限公司 Server system, virtual machine creation method and device
CN116414526A (en) * 2023-06-12 2023-07-11 芯动微电子科技(珠海)有限公司 Simulation device and method based on virtual machine

Also Published As

Publication number Publication date
CN116991544A (en) 2023-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant