CN116521608A - Data migration method and computing device

Info

Publication number
CN116521608A
Authority
CN
China
Prior art keywords
memory
data migration
processor
data
delay
Prior art date
Legal status
Pending
Application number
CN202310279463.5A
Other languages
Chinese (zh)
Inventor
王运富
姚爽
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202310279463.5A
Publication of CN116521608A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/17Interprocessor communication using an input/output type connection, e.g. channel, I/O port
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a data migration method and a computing device, relates to the technical field of computing devices, and can improve the memory access performance of an application program. The method is applied to a first processor, wherein the first processor is one of a plurality of processors included in a computing device, and each of the plurality of processors is connected with at least one memory; the first memory accessed by the first processor comprises data to be migrated. The method comprises the following steps: determining a second memory from a data migration mapping table in response to a data migration instruction, wherein the data migration mapping table comprises a memory set corresponding to the first memory, the memory set corresponding to the first memory is obtained based on the comprehensive access time delay of each memory, and the comprehensive access time delay of each memory is determined according to the real access time delay from the first processor to the memory and the equivalent access time delay from the first processor to the memory; and migrating the data to be migrated from the first memory to the second memory. The embodiment of the application can be used in the process of optimizing a server.

Description

Data migration method and computing device
Technical Field
The present disclosure relates to the field of computing devices, and in particular, to a data migration method and a computing device.
Background
In a computing device, an application program often occupies a certain amount of memory space. With the continuous advancement of enterprise digitization and informatization transformation, diverse application programs keep emerging, placing higher requirements on the memory capacity of computing devices. Compute express link (CXL) is a bus architecture based on an open industry standard protocol, used for memory expansion of computing devices to increase their memory bandwidth and capacity. Because the CXL protocol and link introduce a certain time delay, a computing device expanded through CXL has memories with different access time delays.
During the running of an application program, the computing device needs to migrate frequently accessed data from high-latency memory to low-latency memory so as to improve the running performance of the application program. However, current data migration schemes can cause the performance of the application program to degrade in actual operation.
Disclosure of Invention
The embodiment of the application provides a data migration method and computing equipment, which can effectively improve the performance of an application program during data migration.
In a first aspect, an embodiment of the present application provides a data migration method, applied to a first processor, where the first processor is one of a plurality of processors included in a computing device, and each of the plurality of processors is connected to at least one memory; the first memory accessed by the first processor comprises data to be migrated; the method comprises the following steps: determining a second memory from the data migration mapping table in response to the data migration instruction; the data migration mapping table comprises a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delays of the memories, the comprehensive access time delays of the memories are determined according to real access time delays from a first processor to the memories and equivalent access time delays from the first processor to the memories, and the equivalent access time delays are used for indicating performance loss when the first processor accesses the memories; and migrating the data to be migrated from the first memory to the second memory.
According to the data migration method provided by the embodiment of the application, after the data migration instruction is received, the second memory is determined from the data migration mapping table, which is determined based on the comprehensive access time delay of each memory, and the data to be migrated is migrated to the second memory. The comprehensive access time delay considers not only the real access time delay of the processor accessing the memory, but also the performance loss caused by the processor accessing the memory across processors. Compared with the traditional approach of determining the second memory for migration by considering only the real access delay, the data migration method provided by the embodiment of the application better reflects the performance loss caused by cross-processor memory access in actual scenarios. Performing data migration on this basis allows the hardware performance to be fully exploited and effectively improves the memory access performance of the application program.
In one possible implementation, the data migration mapping table is stored in a memory of the computing device; determining the second memory from the data migration mapping table in response to the data migration instruction comprises: acquiring the data migration mapping table from the memory in response to the data migration instruction; and determining the second memory from the data migration mapping table. It should be understood that the data migration mapping table is built in advance and stored in the memory, and can be invoked directly by the processor when data migration is performed, so that no extra time or computation is needed to create it, which improves data migration efficiency and reduces the waste of computing resources.
In another possible implementation manner, after responding to the data migration instruction, the method further includes: acquiring a memory delay table, wherein the memory delay table comprises the comprehensive access delay from the first processor to each memory; and acquiring the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In yet another possible implementation, after responding to the data migration instruction, the method further includes: constructing a memory delay table based on a memory relation mapping table, wherein the memory relation mapping table comprises the connection mode between each processor of the plurality of processors and the memory connected to it, and is determined by the memory information of each memory; the memory delay table comprises the comprehensive access delay from the first processor to each memory; and acquiring the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In yet another possible implementation, after responding to the data migration instruction, the method further includes: acquiring memory information of each memory; acquiring a memory relation mapping table based on the memory information, wherein the memory relation mapping table comprises the connection mode between each processor of the plurality of processors and the memory connected to it; acquiring a memory delay table based on the memory relation mapping table, wherein the memory delay table comprises the comprehensive access delay from the first processor to each memory; and acquiring the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In yet another possible implementation, the integrated access latency of each memory in the data migration map table satisfies:
T_total=T_equal+T_real
wherein T_total represents the comprehensive access time delay, T_real represents the real access time delay, and T_equal represents the equivalent access time delay.
In yet another possible implementation, the equivalent access latency satisfies:
T_equal=α×T_remote
wherein, when the memory is connected with the first processor, alpha is a first value; when the memory is connected with the second processor, alpha is a second value; the second processor is one of the plurality of processors except the first processor; the second value is larger than the first value; t_remote represents a time delay conversion coefficient corresponding to the performance loss when the first processor accesses the memory under the condition that the memory is connected with the second processor.
In yet another possible implementation, the data migration instruction includes: a first data migration direction; the first data migration direction is used for indicating the migration direction of hot data; determining a second memory from the data migration mapping table comprises: determining, from the data migration mapping table, that a first memory set corresponding to the first data migration direction exists for the first memory, wherein the comprehensive access time delay of each memory in the first memory set is smaller than the comprehensive access time delay of the first memory; and taking the target memory in the first memory set as the second memory, wherein the comprehensive access time delay of the target memory is smaller than the comprehensive access time delay of other memories in the first memory set. It should be appreciated that for hot data migration, hot data may be located in a memory with a lower latency, which shortens the time taken by the processor to access the hot data, thereby ensuring memory access performance of the application.
In yet another possible implementation manner, after migrating the data to be migrated from the first memory to the second memory, the method further includes: and under the condition of migration failure, taking the next memory of the target memory in the first memory set as a second memory according to the sequence from small to large of the comprehensive access time delay. It should be understood that in the case of failure of hot data migration, new memories are sequentially selected for retry, so that full utilization of low-latency memory capacity can be ensured under the condition of meeting the requirement of data migration, and further, the memory access performance of an application program is ensured.
In yet another possible implementation, the data migration instruction includes: a second data migration direction; the second data migration direction is used for indicating the migration direction of the cold data; determining a second memory from the data migration map comprises: determining that a second memory set corresponding to a second data migration direction exists in the first memory from the data migration mapping table; the comprehensive access time delay of each memory in the second memory set is larger than the comprehensive access time delay of the first memory; and taking one memory in the second memory set as a second memory. It should be appreciated that, for migration of cold data, more capacity of the lower latency memory may be released, so that more hot data may be migrated to the lower latency memory, further ensuring memory access performance of the application.
In yet another possible implementation manner, after migrating the data to be migrated from the first memory to the second memory, the method further includes: and under the condition of migration failure, according to the magnitude of the comprehensive access delay, selecting one memory from the second memory set again as the second memory.
In yet another possible implementation manner, the connection manner between the at least one memory and the processor includes one or more of the following: direct connection, connection through a compute express link (CXL), and connection through a CXL switch chip.
In a second aspect, an embodiment of the present application provides a data migration apparatus, applied to a first processor, where the first processor is one of a plurality of processors included in a computing device, and each of the plurality of processors is connected to at least one memory; the first memory accessed by the first processor comprises data to be migrated; the device comprises: a determination module and a migration module. The determining module is used for determining a second memory from the data migration mapping table; the data migration mapping table comprises a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delays of the memories, the comprehensive access time delays of the memories are determined according to real access time delays from a first processor to the memories and equivalent access time delays from the first processor to the memories, and the equivalent access time delays are used for indicating performance loss when the first processor accesses the memories; the migration module is used for migrating the data to be migrated from the first memory to the second memory.
In one possible implementation, the data migration map is stored in a memory of the computing device; the determining module is specifically used for responding to the data migration instruction and acquiring a data migration mapping table from the memory; and determining a second memory from the data migration mapping table.
In another possible implementation manner, the apparatus further includes an acquisition module. The acquisition module is used for acquiring a memory delay table, wherein the memory delay table comprises the comprehensive access delay from the first processor to each memory; and for acquiring the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In another possible implementation manner, the acquisition module is further configured to construct a memory delay table based on a memory relation mapping table, wherein the memory relation mapping table comprises the connection mode between each processor of the plurality of processors and the memory connected to it, and is determined by the memory information of each memory; the memory delay table comprises the comprehensive access delay from the first processor to each memory; and to acquire the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In another possible implementation manner, the acquisition module is further configured to acquire memory information of each memory; acquire a memory relation mapping table based on the memory information, wherein the memory relation mapping table comprises the connection mode between each processor of the plurality of processors and the memory connected to it; acquire a memory delay table based on the memory relation mapping table, wherein the memory delay table comprises the comprehensive access delay from the first processor to each memory; and acquire the data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In yet another possible implementation, the integrated access latency of each memory in the data migration map table satisfies:
T_total=T_equal+T_real
wherein T_total represents the comprehensive access time delay, T_real represents the real access time delay, and T_equal represents the equivalent access time delay.
In yet another possible implementation, the equivalent access latency satisfies:
T_equal=α×T_remote
wherein, when the memory is connected with the first processor, alpha is a first value; when the memory is connected with the second processor, alpha is a second value; the second processor is one of the plurality of processors except the first processor; the second value is larger than the first value; t_remote represents a time delay conversion coefficient corresponding to the performance loss when the first processor accesses the memory under the condition that the memory is connected with the second processor.
In yet another possible implementation, the data migration instruction includes: a first data migration direction; the first data migration direction is used for indicating the migration direction of hot data; the determining module is specifically configured to determine, from the data migration mapping table, that a first memory set corresponding to the first data migration direction exists for the first memory, wherein the comprehensive access time delay of each memory in the first memory set is smaller than the comprehensive access time delay of the first memory; and to take the target memory in the first memory set as the second memory, wherein the comprehensive access time delay of the target memory is smaller than the comprehensive access time delay of other memories in the first memory set.
In another possible implementation manner, the migration module is further configured to, in the case of migration failure, use, in order from small to large, a next memory of the target memory in the first memory set as the second memory.
In yet another possible implementation, the data migration instruction includes: a second data migration direction; the second data migration direction is used for indicating the migration direction of the cold data; the determining module is specifically configured to determine, from the data migration mapping table, that a second memory set corresponding to the second data migration direction exists in the first memory; the comprehensive access time delay of each memory in the second memory set is larger than the comprehensive access time delay of the first memory; and taking one memory in the second memory set as a second memory.
In another possible implementation manner, the migration module is further configured to, in case of migration failure, reselect one memory from the second memory set as the second memory according to the integrated access latency.
In yet another possible implementation manner, the connection manner between the at least one memory and the processor includes one or more of the following: direct connection, connection through a compute express link (CXL), and connection through a CXL switch chip.
In a third aspect, embodiments of the present application provide a computing device comprising: a processor and a memory; the memory stores instructions executable by the processor; the processor is configured to execute the instructions to cause the computing device to implement the method of the first aspect described above.
In a fourth aspect, embodiments of the present application provide a computing device, the computing device including a plurality of processors, each processor of the plurality of processors coupled to at least one memory, a first processor being one of the plurality of processors, the first processor being configured to: determining a second memory from the data migration mapping table in response to the data migration instruction; the data migration mapping table comprises a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delays of the memories, the comprehensive access time delays of the memories are determined according to real access time delays from a first processor to the memories and equivalent access time delays from the first processor to the memories, and the equivalent access time delays are used for indicating performance loss when the first processor accesses the memories; and migrating the data to be migrated from the first memory to the second memory.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium comprising: computer software instructions; the computer software instructions, when executed in a computing device, cause the computing device to implement the method of the first aspect described above.
In a sixth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the steps of the related method described in the first aspect above, so as to implement the method of the first aspect.
Advantageous effects of the second aspect to the sixth aspect described above may refer to corresponding descriptions of the first aspect, and are not repeated.
Drawings
FIG. 1 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of connection between a local memory and a remote memory according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a relationship between multi-level delay memories according to an embodiment of the present application;
FIG. 4 is a flow chart of a data migration method according to an embodiment of the present application;
FIG. 5 is a flow chart of another data migration method according to an embodiment of the present application;
FIG. 6 is a flow chart of a data migration method according to a first embodiment of the present application;
FIG. 7 is a flow chart of a data migration method according to a second embodiment of the present application;
FIG. 8 is a flow chart of a data migration method according to a third embodiment of the present application;
FIG. 9 is a flow chart of a data migration method according to a fourth embodiment of the present application;
FIG. 10 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a data migration apparatus according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the composition of another computing device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In order to clearly describe the technical solutions of the embodiments of the present application, in the embodiments of the present application, the terms "first", "second", and the like are used to distinguish the same item or similar items having substantially the same function and effect, and those skilled in the art will understand that the terms "first", "second", and the like are not limited in number and execution order.
In order to facilitate understanding of the technical solutions of the embodiments of the present application, the terms referred to herein will be briefly described below.
1. Data migration: in order to increase the access speed of data and the performance of application programs, frequently used data is migrated to low-delay memory, and infrequently used data is migrated to high-delay memory.
2. Compute express link: CXL for short. CXL is an open industry standard bus architecture that can provide high-bandwidth, low-delay connections among dedicated computing, memory, input/output interfaces, and storage elements in a data center. Expanding the memory of a computing device through CXL can increase the memory bandwidth and capacity of the computing device and better adapt to memory-intensive application scenarios.
As described in the background art, in order to meet the requirements of diverse application programs on memory capacity and memory bandwidth, CXL is often adopted in the related art to let a computing device connect more memories, so as to expand the memory capacity and memory bandwidth of the computing device. Because CXL itself introduces a certain time delay, memory connected through CXL has a larger read-write access delay than traditional directly attached memory. Therefore, after connecting memories through CXL, the computing device has memories with different access delays, which may degrade the performance of an application program when the application program does not distinguish between memories with different access delays. At present, therefore, data used by an application program is generally migrated from high-latency memory to low-latency memory in a data migration manner, so that the application program can read the data as quickly as possible, thereby improving the performance of the application program.
The computing device may be a tower server, a rack server, or a blade server, and embodiments of the present application are not limited to a particular configuration of computing devices.
Fig. 1 is a schematic structural diagram of a computing device according to an embodiment of the present application, as shown in fig. 1, a hardware portion of the computing device may include a motherboard, a plurality of processors (two are shown in the figure as examples), and a basic input output system (basic input output system, BIOS) chip. The BIOS chip and the plurality of processors are arranged on the main board, the BIOS chip is respectively connected with the plurality of processors, and each processor in the plurality of processors is connected with at least one memory. It should be noted that, the multiple processors may be connected to each other in pairs, or may be sequentially connected to form a ring, which may be specific according to an actual scenario, and this embodiment of the present application is not limited in particular. The software portion of the computing device may additionally include an Operating System (OS) kernel (which may also be referred to as an OS management unit) and a BIOS. The OS kernel may be located within one or more processors and the BIOS is located within the BIOS chip.
In embodiments of the present application, the processor may be a processor based on the x86 architecture, or the ARM (advanced RISC machines) architecture. For example, the processor may be a central processing unit (central processing unit, CPU).
Taking a processor being a CPU as an example, in a multi-CPU computing device, when an application program is executed, the idle state of each CPU is not stable, so one application program process may be executed on different CPUs and may change dynamically. In addition, each CPU is connected with a memory, so when an application program running on a certain CPU (e.g., a first CPU) applies for memory, if the capacity of the memory connected to the first CPU is sufficient for the application program, the memory connected to the first CPU is allocated to the application program; for the application program, this memory is local memory. If the memory capacity connected to the first CPU is insufficient, the application program may be allocated memory connected to another CPU (e.g., a second CPU); this memory is not connected to the first CPU, and for the application program it is non-local memory or remote memory. In general, the access delay of remote memory is greater than that of local memory, so in a computing device with multiple CPUs, the application program may access local memory with smaller access delay or non-local memory with larger access delay, and a certain optimization space exists for memory access.
Fig. 2 is a schematic diagram of connection between a local memory and a remote memory according to an embodiment of the present application. As shown in fig. 2, CPU0 is connected to memory 0, CPU1 is connected to memory 1, and CPU0 is connected to CPU1. For example, memory 0 and memory 1 here may be dual in-line memory modules (DIMMs). Taking CPU0 as the first CPU, CPU0 accesses memory 0 directly, i.e., it accesses local memory; CPU0 accessing memory 1 requires access across CPU1, i.e., access to non-local memory or remote memory. Therefore, when the application program accesses data in non-local memory, the data in the non-local memory can be migrated to local memory by certain technical means on the premise that the local memory capacity is sufficient, so that the application program running on CPU0 can read and write the data from the local memory with smaller time delay, improving the memory access performance of the application program.
For different processors, after CXL memory expansion, each processor in the computing device may have memories with different access delays. For example, fig. 3 is a schematic diagram of a relationship between multi-level delay memories provided in an embodiment of the present application. The computing device may include two processors, CPU0 and CPU1. CPU0 is directly connected to memory 1, connected to memory 3 and memory 4 through CXL, and connected to memory 7 and memory 8 through a CXL switch chip (CXL Switch). CPU1 is directly connected to memory 2, connected to memory 5 and memory 6 through CXL, and connected to memory 9 and memory 10 through the CXL switch chip. Here, direct connection means that the memory is directly connected to pins provided by the CPU. The CXL switch chip is a switch chip based on the CXL protocol and can be used to connect multiple expanded memories and configure, according to actual requirements, the processor to which each expanded memory corresponds. For example, in fig. 3, the CXL switch chip connects memory 7 to memory 10, where memory 7 and memory 8 are configured as memory connected to CPU0, and memory 9 and memory 10 are configured as memory connected to CPU1.
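As an illustration only (not part of the patent text), the topology of fig. 3 can be represented as a simple memory relation mapping, recording which processor each memory is connected to and through which connection mode; the Python sketch below uses hypothetical names for this purpose.

```python
# Illustrative sketch: represent the topology of fig. 3 as a memory relation
# mapping, i.e. which processor each memory is connected to and through which
# connection mode (direct, CXL, or CXL Switch). Names are hypothetical.

from dataclasses import dataclass

@dataclass
class MemoryLink:
    processor: str   # processor the memory is configured to belong to
    mode: str        # "direct", "CXL", or "CXL Switch"

memory_relation_map = {
    "memory 1":  MemoryLink("CPU0", "direct"),
    "memory 2":  MemoryLink("CPU1", "direct"),
    "memory 3":  MemoryLink("CPU0", "CXL"),
    "memory 4":  MemoryLink("CPU0", "CXL"),
    "memory 5":  MemoryLink("CPU1", "CXL"),
    "memory 6":  MemoryLink("CPU1", "CXL"),
    "memory 7":  MemoryLink("CPU0", "CXL Switch"),
    "memory 8":  MemoryLink("CPU0", "CXL Switch"),
    "memory 9":  MemoryLink("CPU1", "CXL Switch"),
    "memory 10": MemoryLink("CPU1", "CXL Switch"),
}

# e.g. list the memories configured as connected to CPU0
print([m for m, link in memory_relation_map.items() if link.processor == "CPU0"])
```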
In the current data migration process, the memory to migrate to (e.g., the second memory) is generally selected according to the real access delay of the memory. The real access delay refers to the time a processor executing a read-write operation waits for the memory to become ready to respond. For example, described in connection with fig. 3 and taking CPU0 as an example, the real access latency of CPU0 accessing the local memory (memory 1) is about 100 nanoseconds, the real access latency of accessing the remote memory (memory 2) is about 180 nanoseconds, the real access latency of accessing the local CXL memory (memory 3 or memory 4) is about 260 nanoseconds, the real access latency of accessing the remote CXL memory (memory 5 or memory 6) is about 440 nanoseconds, the real access latency of accessing the local CXL Switch memory (memory 7 or memory 8) is about 500 nanoseconds, and the real access latency of accessing the remote CXL Switch memory (memory 9 or memory 10) is about 600 nanoseconds.
If the data being accessed by CPU0 is located on memory 7, when the data needs to be migrated to a memory with lower latency, memory 3 or memory 2 can be selected for migration. Although the real access latency of memory 2 is smaller than that of memory 3, for CPU0, memory 2 is a remotely accessed memory; if memory 2 is used as the second memory for data migration, frequently used data in the local memory will, when the local memory is under pressure, be frequently migrated to the remote memory (for example, memory 2 connected to CPU1). In this case, since memory 2 is then occupied from the point of view of CPU1, when CPU1 performs data migration it also has to migrate data to the memory connected to CPU0 because its own local memory is insufficient, which causes the applications running on CPU0 or CPU1 to frequently access data in remote memory and thereby degrades application performance.
That is, current data migration schemes consider only the real access latency of the memory and do not consider the factor of accessing memory across processors. Especially in cases where the latency of accessing the remote memory (the real access latency of path 1 in fig. 3) and the latency of accessing the local CXL memory (the real access latency of path 2 in fig. 3) are relatively close, the cross-processor factor has a greater impact on the performance of the application.
Based on this, the embodiment of the application provides a data migration method, which comprehensively considers the real access time delay and the performance loss of accessing the memory to determine the second memory to migrate to when data migration is performed, addressing the deficiencies of the data migration schemes in the related art and thereby effectively improving the performance of the application program.
The data migration method provided by the embodiment of the application can be applied to the scene shown in fig. 1 or fig. 3. It should be noted that, for each processor, the specific execution steps of the method for data migration provided in the embodiments of the present application are similar, and any processor (e.g., referred to as a first processor) will be described below as an example. The first memory accessed by the first processor includes data to be migrated, and it should be noted that the first memory is any memory in the computing device.
Example 1
Fig. 4 is a flowchart of a data migration method according to an embodiment of the present application. As shown in fig. 4, the data migration method provided in the first embodiment of the present application may specifically include the following steps:
s401, responding to a data migration instruction, and determining a second memory from a data migration mapping table by the first processor.
The data migration instruction is used for indicating that the first processor needs to execute a migration operation on data to be migrated. The data migration instruction may come from the operating system kernel, and the specific time at which the operating system kernel sends the data migration instruction is not limited in this embodiment. For example, a user may manually issue the data migration instruction to the first processor through the operating system; or, if the operating system kernel itself supports a memory management function, it may periodically scan attribute information of each memory (such as utilization rate and remaining space) and issue the data migration instruction when the attribute information meets a preset condition.
In this embodiment of the present application, when an application running on the first processor is accessing the first memory, if the first processor receives a data migration instruction, the second memory (i.e. the target of migration) may be determined from the data migration mapping table in response to the data migration instruction.
The data migration mapping table includes a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on a comprehensive access delay of each memory, the comprehensive access delay of each memory is determined according to a real access delay from the first processor to the memory, and an equivalent access delay from the first processor to the memory, and each memory refers to a plurality of memories connected by a plurality of processors in the computing device. The equivalent access time delay is used for indicating the performance loss when the first processor accesses the memory; or the performance penalty of the first processor accessing memory across processors.
It should be noted that, in the embodiment of the present application, the data migration mapping table may be pre-configured by any one of the processors in the computing device and stored in a memory of the computing device. By way of example, the memory may be a random access device in the computing device, a dynamic storage device, a magnetic disk storage medium, or any other form of storage device. Specifically, S401 may be implemented as: acquiring the data migration mapping table from the memory in response to the data migration instruction, and then determining the second memory from the data migration mapping table. It should be understood that the data migration mapping table is built in advance and stored in the memory, and can be invoked directly by the processor when data migration is performed, so that no extra time or computation is needed to create it, which improves data migration efficiency and reduces the waste of computing resources.
The comprehensive access delay of each memory in the data migration mapping table satisfies the following expression:
T_total=T_equal+T_real
wherein T_total represents the comprehensive access time delay, T_real represents the real access time delay, and T_equal represents the equivalent access time delay. The equivalent access time delay satisfies the following expression:
T_equal=α×T_remote
wherein α indicates whether the memory belongs to the first processor. When the memory is connected with the first processor, α is a first value; when the memory is connected with the second processor, α is a second value; the second processor is one of the plurality of processors other than the first processor, and the second value is larger than the first value. T_remote represents the delay conversion coefficient corresponding to the performance loss when the first processor accesses the memory in the case that the memory is connected to the second processor. The value of T_remote can be determined from accumulated experimental data. In addition, the magnitude of T_remote is related to the number of processors crossed: for example, when memory is accessed across one processor, the value of T_remote may be 100 nanoseconds; when memory is accessed across two processors, the value of T_remote may be 200 nanoseconds; and so on.
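As an illustration only, the following Python sketch reproduces the comprehensive access delay calculation described above, using the example values given in this description (real access delays for CPU0 from fig. 3, T_remote = 100 nanoseconds, first value 0, second value 1); the function and variable names are hypothetical and not part of the patent.

```python
# Illustrative sketch of T_total = T_equal + T_real with T_equal = alpha * T_remote.
# Real access delays (nanoseconds) are the example figures for CPU0 given above.

T_REMOTE_NS = 100  # delay conversion coefficient for one cross-processor hop

def comprehensive_delay(real_delay_ns: int, connected_to_first_processor: bool) -> int:
    """Return the comprehensive access delay of one memory as seen from the first processor."""
    alpha = 0 if connected_to_first_processor else 1   # first value / second value
    t_equal = alpha * T_REMOTE_NS                      # equivalent access delay
    return t_equal + real_delay_ns                     # comprehensive access delay

# Examples as seen from CPU0: (real delay, connected to CPU0?)
print(comprehensive_delay(100, True))    # memory 1, local        -> 100 ns
print(comprehensive_delay(180, False))   # memory 2, remote       -> 280 ns
print(comprehensive_delay(260, True))    # memory 3, local CXL    -> 260 ns
print(comprehensive_delay(440, False))   # memory 5, remote CXL   -> 540 ns
```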
It should be noted that the data in the computing device includes hot data and cold data, and accordingly, data migration is divided into hot data migration and cold data migration. For hot data migration, the data is frequently used, and the hot data needs to be migrated from memory with a larger time delay to memory with a smaller time delay, so as to improve access efficiency. For cold data migration, the data is used less often, and the cold data needs to be migrated from memory with a smaller time delay to memory with a larger time delay, so that the lower-delay memory retains a larger free capacity.
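Purely as an illustration, the following sketch shows one way hot and cold data could be distinguished, assuming the simple access-count threshold approach mentioned in the next paragraph; the threshold value and names are hypothetical.

```python
# Illustrative sketch: classify data (e.g. a page) as hot or cold using an
# access counter and a hypothetical threshold.

HOT_ACCESS_THRESHOLD = 64  # hypothetical value; the description only requires
                           # "greater than a certain threshold"

def classify(access_count: int) -> str:
    """Hot if accessed more often than the threshold, otherwise cold."""
    return "hot" if access_count > HOT_ACCESS_THRESHOLD else "cold"

print(classify(200))  # "hot"  -> candidate for migration to lower-delay memory
print(classify(3))    # "cold" -> candidate for migration to higher-delay memory
```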
In the embodiment of the present application, the specific manner of distinguishing hot data from cold data is not limited. As an example, the operating system kernel may use a dedicated statistics counter to count data accesses, determining that data is hot data if the number of accesses is greater than a certain threshold and cold data if the number of accesses is less than or equal to the threshold. As yet another example, hot data and cold data may also be distinguished by the least recently used (LRU) algorithm used in current industry memory management; for details, reference may be made to the related art, which is not described in detail in the embodiments of the present application. Depending on whether the data is cold or hot, the data migration instruction issued by the operating system kernel can include a data migration direction, which is used for indicating the migration direction of hot data or the migration direction of cold data. The detailed steps of determining the second memory for hot data and for cold data in the embodiment of the present application are described as follows:
In one embodiment, the data migration direction includes: a first data migration direction. The first data migration direction is used for indicating the migration direction of hot data, and the data to be migrated in the first memory is hot data. Based on this, as shown in fig. 5, the above-described S401 may be embodied as the following S401a to S401b.
S401a, the first processor determines that a first memory set corresponding to a first data migration direction exists in a first memory from the data migration mapping table.
The comprehensive access time delay of each memory in the first memory set is smaller than the comprehensive access time delay of the first memory.
S401b, the first processor takes a target memory in the first memory set as a second memory.
The comprehensive access time delay of the target memory is smaller than the comprehensive access time delay of other memories in the first memory set.
As described above, the data migration mapping table includes the memory set corresponding to the first memory. After receiving the data migration instruction, the first processor determines, from the data migration mapping table and according to the first data migration direction, that a first memory set corresponding to the first data migration direction exists for the first memory, and then selects the memory with the smallest comprehensive access time delay in the first memory set (namely, the target memory) as the second memory to which the hot data is to be migrated.
It should be appreciated that for hot data migration, hot data may be located in a memory with a lower latency, which shortens the time taken by the processor to access the hot data, thereby ensuring memory access performance of the application.
In another embodiment, the data migration direction includes: and a second data migration direction. The second data migration direction is used for indicating the migration direction of the cold data, and the data to be migrated in the first memory is the cold data. As shown in fig. 6, S401 described above may be embodied as S401c-S401d.
S401c, the first processor determines that a second memory set corresponding to the second data migration direction exists in the first memory from the data migration mapping table.
The comprehensive access time delay of each memory in the second memory set is larger than that of the first memory.
S401d, the first processor takes one memory in the second memory set as a second memory.
As described above, the data migration mapping table includes the memory set corresponding to the first memory. After receiving the data migration instruction, the first processor determines, from the data migration mapping table and according to the second data migration direction, that a second memory set corresponding to the second data migration direction exists for the first memory, and then uses one memory in the second memory set as the second memory to which the cold data is to be migrated. It should be noted that the memory may be selected by choosing the memory with the smallest comprehensive access delay, choosing the memory with the largest comprehensive access delay, or choosing a memory randomly; this may be determined according to the actual scenario and is not particularly limited in the embodiments of the present application.
It should be appreciated that, for migration of cold data, more capacity of the lower latency memory may be released, so that more hot data may be migrated to the lower latency memory, further ensuring memory access performance of the application.
The two embodiments described above are described in detail below with reference to specific examples.
Taking the schematic diagram shown in fig. 3 and the comprehensive access delay expression of the memory as an example, assume the first value of α is 0, the second value is 1, and the value of T_remote is 100 nanoseconds. Taking the first processor being CPU0 as an example, memory 1 is connected with CPU0, so the value of α is 0, and the comprehensive access delay of memory 1 is 0×100+100=100 nanoseconds, that is, the comprehensive access delay of memory 1 is equal to its real access delay. Memory 2 is not connected with CPU0, so the value of α is 1, and the comprehensive access delay of memory 2 is 1×100+180=280 nanoseconds. Similarly, the comprehensive access time delay of memory 3 and memory 4 is 260 nanoseconds, the comprehensive access time delay of memory 5 and memory 6 is 540 nanoseconds, the comprehensive access time delay of memory 7 and memory 8 is 500 nanoseconds, and the comprehensive access time delay of memory 9 and memory 10 is 600 nanoseconds.
Further, based on a comparison of the comprehensive access time delays, for memory 1, no other memory has a comprehensive access time delay smaller than that of memory 1, and the comprehensive access time delays of memories 2-10 are all larger than that of memory 1. Thus, the first memory set of memory 1 in the hot data migration direction is empty (NA), and the second memory set of memory 1 in the cold data migration direction comprises: memory 3, memory 4, memory 2, memory 7, memory 8, memory 5, memory 6, memory 9, and memory 10 (ordered by comprehensive access delay from small to large). Based on the principle of the above analysis, the data migration mapping table is determined as shown in table 1 below.
TABLE 1 (data migration mapping table for CPU0: for each first memory, the corresponding first memory set in the hot data migration direction and second memory set in the cold data migration direction, ordered by comprehensive access delay from small to large)
In table 1, for CPU0, each first memory has a corresponding first memory set in the hot data migration direction and a corresponding second memory set in the cold data migration direction. After CPU0 receives a data migration instruction, the second memory may be selected for migration based on table 1.
For example, described in connection with table 1, when the data accessed by the application running on CPU0 is located in memory 5, memory 5 is the first memory. After receiving a hot data migration instruction, CPU0 may determine from table 1 that memory 1, memory 3, memory 4, memory 2, memory 7, and memory 8 have a comprehensive access latency less than that of memory 5. Among these, the comprehensive access latency of memory 1 (i.e., the target memory) is the smallest, so memory 1 is the second memory to which the hot data in memory 5 needs to be migrated.
Similarly, after receiving a cold data migration instruction, CPU0 may determine from table 1 that memory 9 and memory 10 have a comprehensive access latency greater than that of memory 5; memory 9 and memory 10 thus form the second memory set. Since the comprehensive access time delays of memory 9 and memory 10 are the same, either one can be chosen at will as the second memory to which the cold data in memory 5 needs to be migrated.
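As an illustration only, a minimal Python sketch of deriving such a mapping table from the comprehensive access delays computed above (the delay values are the example figures for CPU0; the names and structure are hypothetical):

```python
# Illustrative sketch of building a per-processor data migration mapping table:
# for each first memory, the candidate memories with strictly smaller
# comprehensive access delay (hot data direction) and strictly larger
# comprehensive access delay (cold data direction), sorted small to large.

comprehensive_delay_ns = {          # as seen from CPU0
    "memory 1": 100, "memory 2": 280,
    "memory 3": 260, "memory 4": 260,
    "memory 5": 540, "memory 6": 540,
    "memory 7": 500, "memory 8": 500,
    "memory 9": 600, "memory 10": 600,
}

def build_migration_map(delays: dict[str, int]) -> dict[str, dict[str, list[str]]]:
    table = {}
    for first_mem, d in delays.items():
        lower = sorted((m for m, v in delays.items() if v < d), key=delays.get)
        higher = sorted((m for m, v in delays.items() if v > d), key=delays.get)
        table[first_mem] = {"hot": lower, "cold": higher}
    return table

migration_map = build_migration_map(comprehensive_delay_ns)
print(migration_map["memory 5"]["hot"])
# ['memory 1', 'memory 3', 'memory 4', 'memory 2', 'memory 7', 'memory 8']
print(migration_map["memory 5"]["cold"])
# ['memory 9', 'memory 10']
```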
Note that the data migration mapping table shown in table 1 is for CPU0; for CPU1, the data migration mapping table is created as shown in table 2 below.
TABLE 2 (data migration mapping table for CPU1: for each first memory, the corresponding first memory set in the hot data migration direction and second memory set in the cold data migration direction, ordered by comprehensive access delay from small to large)
In table 2, for CPU1, each first memory has a corresponding first memory set in the hot data migration direction and a corresponding second memory set in the cold data migration direction. After CPU1 receives a data migration instruction, the second memory may be selected for migration based on table 2.
S402, the first processor migrates the data to be migrated from the first memory to the second memory.
In this embodiment of the present application, after determining the target location, i.e., the second memory, to which the data to be migrated is to be migrated, the first processor may migrate the data to be migrated from the first memory to the second memory. The embodiments of the present application do not limit the specific migration process. As an example, a data migration (page migration) function is provided in the CPU and can be called by the CPU to implement the data migration. As yet another example, the CPU may set up a process to copy pages for data migration purposes.
It should be noted that, in the process of data migration, a situation of insufficient memory capacity may be encountered, which may result in migration failure. In this case, it is necessary to redetermine a memory for data migration, which is described in detail as follows:
In an embodiment, in the case of hot data migration, after S402, the data migration method provided in the embodiment of the present application may further include the following S403a.
And S403a, under the condition of migration failure, taking the next memory of the target memory in the first memory set as a second memory according to the sequence of the comprehensive access time delay from small to large.
The next memory refers to the memory in the first memory set whose comprehensive access time delay is larger than that of the target memory but smaller than that of the other remaining memories in the set.
It should be appreciated that a hot data migration failure indicates that the capacity of the currently determined second memory is insufficient to accommodate the data to be migrated. Therefore, the first processor can determine the next memory from the first memory set as the new second memory, in order of comprehensive access time delay from small to large, and execute the data migration operation again.
In other embodiments, in the case of cold data migration, after S402, the data migration method provided in the embodiments of the present application further includes the following S403b.
S403b, under the condition of migration failure, selecting one memory from the second memory set again as a second memory according to the comprehensive access delay.
Similarly, in the case of a cold data migration failure, the first processor may again determine the next memory from the second memory set, according to the comprehensive access delay, as the new second memory, and perform the data migration operation. The selection may follow the order of comprehensive access time delay from small to large, or from large to small, depending on the actual scenario; this is not specifically limited in the embodiments of the present application.
It should be understood that in the case of migration failure, the second memory node is reselected according to the magnitude of the comprehensive access delay to perform migration retry, so that full utilization of the memory capacity can be ensured under the condition of meeting the data migration requirement, and further, the performance of the application program is ensured.
The process of data migration is illustrated in detail in conjunction with table 1 or table 2 above.
Taking hot data migration as an example, when CPU0 receives a data migration instruction and needs to migrate data on memory 5, the first memory set corresponding to memory 5 in the hot data migration direction is determined according to Table 1; from left to right these are memory 1, memory 3, memory 4, memory 2, memory 7, and memory 8. When performing the migration, CPU0 first targets memory 1, which has the smallest comprehensive access delay; if that migration fails, it then attempts memory 3, memory 4, memory 2, memory 7, and memory 8 in turn. Once a migration to some memory succeeds, the process ends and no further attempts are made on the remaining memories. If all attempts fail, the current migration of the data in memory 5 is cancelled.
Taking cold data migration as an example, when CPU1 receives a data migration instruction and needs to migrate data on memory 10, CPU1 determines, according to Table 2, the second memory set corresponding to memory 10 in the cold data migration direction; from left to right these are memory 3, memory 4, memory 7, and memory 8. When performing the migration, CPU1 may attempt the second memories in order of increasing comprehensive access delay (i.e., from left to right in the table), or in order of decreasing comprehensive access delay (i.e., from right to left in the table). Likewise, once a migration to some memory succeeds the process ends, and if all attempts fail, the current migration of the data in memory 10 is cancelled.
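The retry behaviour described for both directions amounts to walking a latency-ordered candidate list until one migration succeeds. The C sketch below illustrates only this selection loop; try_migrate() is a hypothetical stand-in for the actual page-migration call, and the candidate node numbers reuse the hot-direction order for memory 5 from Table 1 purely as an example.

/* Sketch of the retry policy: try candidates in order of comprehensive
 * access delay, stop at the first success, cancel if all attempts fail.
 * try_migrate() is a hypothetical helper; a real implementation would
 * invoke the page-migration routine (e.g. one built on move_pages()). */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool try_migrate(int node) {
    /* Placeholder: pretend only node 7 has enough free capacity. */
    return node == 7;
}

static int migrate_with_retry(const int *candidates, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (try_migrate(candidates[i]))
            return candidates[i];       /* success: stop retrying */
        /* insufficient capacity: fall through to the candidate with the
         * next-larger (or next-smaller) comprehensive access delay */
    }
    return -1;                          /* every attempt failed: cancel */
}

int main(void) {
    /* Illustrative hot-direction candidates for memory 5 (Table 1 order). */
    int first_set[] = { 1, 3, 4, 2, 7, 8 };
    printf("migration target: %d\n",
           migrate_with_retry(first_set, sizeof first_set / sizeof first_set[0]));
    return 0;
}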
The technical solution provided by the foregoing embodiment offers at least the following benefits. In the data migration method provided by the embodiments of the present application, after the data migration instruction is received, the second memory is determined from a data migration mapping table built from the comprehensive access delay of each memory, and the data to be migrated is moved to that second memory. The comprehensive access delay accounts not only for the real access delay of the processor accessing the memory, but also for the performance loss incurred when the processor accesses the memory. Compared with the traditional approach, which determines the migration target from the real access delay alone, the method provided here better reflects the performance loss caused by cross-processor memory access in real scenarios. Migrating data on this basis allows the hardware performance to be fully exploited and effectively improves the memory access performance of the application program.
Furthermore, when a data migration fails, new memories can be selected in turn for retry based on the comprehensive access delay, ensuring full utilization of memory capacity while meeting the data migration requirement and thereby preserving the memory access performance of the application program.
It should be noted that the first embodiment is described using the case where the data migration mapping table is established in advance and used directly at migration time. In other embodiments, the data migration mapping table may instead be obtained after the first processor receives the data migration instruction, as described in the second to fourth embodiments below.
Embodiment Two
Fig. 7 is a flow chart of a data migration method according to a second embodiment of the present application. As shown in fig. 7, the data migration method provided in the second embodiment of the present application may specifically include the following steps:
S701, in response to a data migration instruction, acquiring memory information of each memory; acquiring a memory relation mapping table based on the memory information; acquiring a memory delay table based on the memory relation mapping table; and acquiring a data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
The memory relation mapping table comprises a connection mode between each processor in the plurality of processors and a memory connected with each processor. The memory delay table includes a comprehensive access delay from the first processor to each memory.
In this embodiment, when the computing device boots, the multiple processors negotiate to select one processor (referred to, for example, as the main processor), and the main processor is responsible for invoking the BIOS to carry out the power-on self-test. During the power-on self-test, the BIOS scans the memories in the computing device and obtains information about them, including the memory identifier, the memory timing, and the connection mode between each memory and the processors. The memory timing is a parameter describing memory performance, from which the real access delay of the memory can be determined. In some implementations, the BIOS may also obtain information such as the memory capacity and the distance between each memory and the processors.
Further, after the operating system starts, the operating system kernel may call the advanced configuration and power interface (advanced configuration and power interface, ACPI) to obtain the memory identifier, the memory timing, the connection mode between each memory and the processors, and other information from the BIOS. ACPI is a standard interface that defines communication between the operating system, firmware, and hardware; it is typically implemented in the BIOS and controlled by an operating system driver. After obtaining the memory timing of each memory, the operating system kernel can measure the real access delay of each processor to each memory based on that timing; for details, reference may be made to the related art, which is not described further here.
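The embodiments leave the latency measurement itself to the related art. Purely as an assumption-labelled illustration, real access delay to a given memory node can also be estimated from user space with a randomized pointer-chasing loop over memory bound to that node via libnuma; this is not the BIOS/ACPI-based path described above.

/* Illustrative user-space estimate of real access latency to one NUMA
 * node (NOT the firmware/ACPI measurement of this embodiment).
 * A randomly permuted pointer chain defeats the hardware prefetcher so
 * each load pays roughly the full memory latency. Build: gcc -lnuma */
#include <numa.h>       /* numa_available, numa_alloc_onnode, numa_free */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STRIDE   64                     /* one cache line */
#define BUF_SIZE (64UL << 20)           /* 64 MiB, larger than the caches */

int main(void) {
    if (numa_available() < 0) return 1;
    int node = 0;                       /* node under test (assumed) */

    char *buf = numa_alloc_onnode(BUF_SIZE, node);
    size_t steps = BUF_SIZE / STRIDE;
    size_t *order = malloc(steps * sizeof *order);
    if (buf == NULL || order == NULL) return 1;

    for (size_t i = 0; i < steps; i++) order[i] = i;
    for (size_t i = steps - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (size_t i = 0; i < steps; i++)              /* build pointer chain */
        *(char **)(buf + order[i] * STRIDE) = buf + order[(i + 1) % steps] * STRIDE;
    free(order);

    struct timespec t0, t1;
    char *p = buf;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) p = *(char **)p;   /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("node %d: ~%.1f ns per access (end=%p)\n", node, ns / steps, (void *)p);
    numa_free(buf, BUF_SIZE);
    return 0;
}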
After receiving the data migration instruction, the first processor may obtain, from the operating system kernel, memory information such as the memory identifier, the real access delay of each memory, and the connection mode between each memory and the processors, and then obtain a memory relation mapping table based on this memory information; the table may be stored in memory. The table mainly reflects the different connection modes between the processors and the memories in the computing device.
For example, Table 3 is a memory relation mapping table provided in an embodiment of the present application.
Table 3
Table 3 includes two processors (CPU0 and CPU1) and the plurality of memories connected to each processor. The memories are divided into three groups, and the memories in different groups are connected to the corresponding processor in different ways, for example: direct connection, connection through a compute express link (CXL), and connection through a CXL switching chip.
Further, in combination with Table 3, a memory delay table is obtained based on the expression for determining the comprehensive access delay; this table may also be stored in memory.
Taking CPU0 as the first processor as an example, it can be seen from Table 3 that memory 1, memory 3, memory 4, memory 7, and memory 8 are connected to CPU0; according to the foregoing analysis, the comprehensive access delay of these memories equals their real access delay. For the remaining memories, which are not connected to CPU0, the value of α is 1. For example, continuing with T_remote = 100 ns, the comprehensive access delay of memory 2 is 1×100+180 = 280 ns, that of memory 5 and memory 6 is 1×100+440 = 540 ns, and that of memory 9 and memory 10 is 1×100+500 = 600 ns. Based on this analysis, the memory delay table obtained by the first processor is shown in Table 4 below.
Table 4
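The entries of Table 4 follow directly from the expression T_total = α×T_remote + T_real. A minimal sketch reproducing the illustrative figures quoted above (T_remote = 100 ns; α = 0 for memories attached to CPU0, α = 1 otherwise) is shown below; the real access delays are the example values from the text, with a nominal local value assumed for memory 1.

/* Sketch: filling one row of the memory delay table for CPU0 using
 * T_total = alpha * T_remote + T_real, with the illustrative values
 * quoted in the text (T_remote = 100 ns; alpha = 0 when the memory is
 * attached to CPU0, alpha = 1 when it is attached to another CPU). */
#include <stddef.h>
#include <stdio.h>

#define T_REMOTE 100.0                  /* ns, assumed conversion coefficient */

struct mem {
    int    id;
    int    local;                       /* 1 if attached to CPU0 */
    double t_real;                      /* ns, real access delay */
};

int main(void) {
    struct mem table[] = {
        { 1, 1,  80.0 },                /* local latency assumed for illustration */
        { 2, 0, 180.0 }, { 5, 0, 440.0 }, { 6, 0, 440.0 },
        { 9, 0, 500.0 }, {10, 0, 500.0 },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        double alpha   = table[i].local ? 0.0 : 1.0;
        double t_total = alpha * T_REMOTE + table[i].t_real;
        printf("memory %2d: T_total = %.0f ns\n", table[i].id, t_total);
    }
    return 0;
}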
Similarly, taking the CPU1 as the first processor as an example, the memory latency table obtained by the first processor is shown in the following table 5:
Table 5
Finally, based on the memory delay table and the ordering of the memories by comprehensive access delay, the first processor obtains the data migration mapping table. Taking CPU0 as the first processor, the data migration mapping table obtained from Table 4 is as shown in Table 1 above; taking CPU1 as the first processor, the data migration mapping table obtained from Table 5 is as shown in Table 2 above.
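Deriving a row of the data migration mapping table amounts to sorting the memories of the delay table by comprehensive access delay and splitting them around the source memory, as in the following sketch; the node numbers and latencies are illustrative assumptions.

/* Sketch: deriving hot- and cold-direction candidate lists for one source
 * memory by sorting an (illustrative) memory delay table on comprehensive
 * access delay. Memories with a smaller delay than the source are hot
 * candidates; those with a larger delay are cold candidates. */
#include <stdio.h>
#include <stdlib.h>

struct entry { int id; double t_total; };

static int by_delay(const void *a, const void *b) {
    double d = ((const struct entry *)a)->t_total
             - ((const struct entry *)b)->t_total;
    return (d > 0) - (d < 0);
}

int main(void) {
    struct entry mems[] = {             /* illustrative delay table */
        {1, 80}, {3, 120}, {4, 140}, {2, 280}, {7, 300}, {8, 320}, {9, 600},
    };
    size_t n = sizeof mems / sizeof mems[0];
    double src_delay = 540;             /* e.g. the first memory (memory 5) */

    qsort(mems, n, sizeof mems[0], by_delay);

    printf("hot-direction candidates (ascending delay):");
    for (size_t i = 0; i < n; i++)
        if (mems[i].t_total < src_delay) printf(" %d", mems[i].id);
    printf("\ncold-direction candidates:");
    for (size_t i = 0; i < n; i++)
        if (mems[i].t_total > src_delay) printf(" %d", mems[i].id);
    printf("\n");
    return 0;
}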
S702, the first processor determines a second memory from the data migration mapping table.
S703, the first processor migrates the data to be migrated from the first memory to the second memory.
For the description of S702 and S703, reference may be made to the first embodiment; details are not repeated here.
Embodiment Three
Fig. 8 is a flow chart of a data migration method according to a third embodiment of the present application. As shown in fig. 8, the data migration method provided in the third embodiment of the present application may specifically include the following steps:
S801, in response to a data migration instruction, acquiring a memory delay table based on the memory relation mapping table; and acquiring a data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In this embodiment of the present application, the BIOS may obtain the identifier of each memory and the connection relationship between each memory and the processors during the power-on self-test, and either construct the memory relation mapping table described in Table 3 above from this information or report the information to the operating system kernel, which then instructs any processor to construct the memory relation mapping table and store it in memory. When the first processor receives the data migration instruction, it can read the memory relation mapping table from memory and, based on that table and the real access delay of each memory, obtain the memory delay table shown in Table 4 or Table 5 above. It can then obtain the data migration mapping table shown in Table 1 or Table 2 based on the memory delay table and the ordering of the memories by comprehensive access delay.
S802, the first processor determines a second memory from the data migration mapping table.
S803, the first processor migrates the data to be migrated from the first memory to the second memory.
For the description of S802 and S803, reference may be made to the first embodiment; details are not repeated here.
Embodiment Four
Fig. 9 is a flow chart of a data migration method according to a fourth embodiment of the present application. As shown in fig. 9, the data migration method provided in the fourth embodiment of the present application may specifically include the following steps:
S901, in response to a data migration instruction, acquiring a memory delay table; and acquiring a data migration mapping table based on the memory delay table and the ordering of the memories by comprehensive access delay.
In this embodiment of the present application, the operating system kernel may, based on the memory-related information reported by the BIOS, instruct any one or more processors to construct in advance the memory delay table corresponding to each processor and store it in memory. After receiving the data migration instruction, the first processor may obtain the corresponding memory delay table from memory and then obtain the data migration mapping table shown in Table 1 or Table 2 based on the memory delay table and the ordering of the memories by comprehensive access delay.
S902, determining a second memory from the data migration mapping table.
S903, migrating the data to be migrated from the first memory to the second memory.
For the description of S902 and S903, reference may be made to the first embodiment; details are not repeated here.
The advantageous effects of the above-described second to fourth embodiments can be seen from the description of the first embodiment.
Fig. 10 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in Fig. 10, the figure shows, from top to bottom, the application (APP), the user mode, and the kernel mode. The user mode and the kernel mode are two running levels of the operating system; most applications (APP) used directly by users, such as a Redis database or a HANA in-memory database, run in the user mode. When an application involves an operation on hardware (for example, migrating data in memory as in the embodiments of the present application), the user-mode running level does not support executing the relevant hardware-operation instructions, so a switch to the kernel mode is required to execute them. The glibc library is a C runtime library that runs in the user mode. It provides the lowest-level application programming interface (application programming interface, API) of the operating system, and a user can configure the policy of the data migration method provided in the embodiments of the present application through the API provided by the glibc library. The kernel mode contains a memory relation mapping table creation module, a hot and cold data migration target memory calculation module, and a page migration module; these modules are implemented by the user by writing code against the API. The memory relation mapping table creation module is configured to create the memory relation mapping table (Table 3 above) according to the policy configured by the user; the hot and cold data migration target memory calculation module further creates, based on that mapping table, the hot data migration target memory list and the cold data migration target memory list (Table 1 or Table 2 above) for the different compute nodes; and the page migration module performs page migration scheduling (data migration), as sketched below.
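As a concrete, assumption-labelled illustration of this user-mode/kernel-mode split, a user-state program on Linux typically reaches the kernel's page-migration machinery through a glibc-exposed system call; the sketch below uses the stock migrate_pages(2) interface via libnuma's numa_migrate_pages(), with assumed node numbers, rather than the policy modules described in this application.

/* Illustration of the user-mode -> kernel-mode path for data migration:
 * a user-state process asks the kernel to move its pages between NUMA
 * nodes through numa_migrate_pages(), which wraps migrate_pages(2).
 * Source/target node strings are assumptions. Build: gcc -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    struct bitmask *from = numa_parse_nodestring("0");  /* assumed source */
    struct bitmask *to   = numa_parse_nodestring("1");  /* assumed target */
    if (from == NULL || to == NULL) return 1;

    /* pid 0 = current process; the kernel performs the page migration. */
    if (numa_migrate_pages(0, from, to) < 0)
        perror("numa_migrate_pages");
    else
        printf("pages moved from node 0 to node 1\n");

    numa_bitmask_free(from);
    numa_bitmask_free(to);
    return 0;
}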
It can be seen that the foregoing description of the solution provided by the embodiments of the present application has been presented mainly from a method perspective. To achieve the above-mentioned functions, embodiments of the present application provide corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In an exemplary embodiment, the embodiment of the application further provides a data migration device. The data migration apparatus may include one or more functional modules for implementing the data migration method of the above method embodiment. For example, fig. 11 is a schematic diagram of a data migration apparatus according to an embodiment of the present application. The data migration apparatus may be applied to a first processor, where the first processor is one of a plurality of processors included in a computing device, and each of the plurality of processors is connected to at least one memory; the first memory accessed by the first processor comprises data to be migrated. As shown in fig. 11, the apparatus includes: a determination module 1101 and a migration module 1102.
The determining module 1101 is configured to determine a second memory from the data migration mapping table; the data migration mapping table comprises a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delays of the memories, the comprehensive access time delays of the memories are determined according to real access time delays from the first processor to the memories and equivalent access time delays from the first processor to the memories, and the equivalent access time delays are used for indicating performance loss when the first processor accesses the memories.
The migration module 1102 is configured to migrate data to be migrated from the first memory to the second memory.
In some embodiments, the data migration map is stored in a memory of the computing device; the determining module 1101 is specifically configured to obtain a data migration mapping table from the memory in response to the data migration instruction; and determining a second memory from the data migration mapping table.
In some embodiments, the apparatus further comprises: and an acquisition module 1103. The obtaining module 1103 is configured to obtain a memory latency table, where the memory latency table includes a comprehensive access latency from the first processor to each memory; and based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring a data migration mapping table.
In some embodiments, the obtaining module 1103 is further configured to obtain a memory latency table based on the memory relationship mapping table; the memory relation mapping table comprises a connection mode between each processor in the plurality of processors and a memory connected with each processor; the memory relation mapping table is determined by the memory information of each memory; the memory delay table comprises comprehensive access delay from the first processor to each memory; and based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring a data migration mapping table.
In some embodiments, the acquiring module 1103 is further configured to acquire memory information of each memory; acquiring a memory relation mapping table based on the memory information; the memory relation mapping table comprises a connection mode between each processor in the plurality of processors and a memory connected with each processor; acquiring a memory delay table based on the memory relation mapping table; the memory delay table comprises comprehensive access delay from the first processor to each memory; and based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring a data migration mapping table.
In some embodiments, the comprehensive access delay of each memory in the data migration mapping table satisfies:
T_total=T_equal+T_real
where T_total represents the comprehensive access delay, T_real represents the real access delay, and T_equal represents the equivalent access delay.
In some embodiments, the equivalent access latency satisfies:
T_equal=α×T_remote
where, when the memory is connected to the first processor, α is a first value; when the memory is connected to a second processor, α is a second value; the second processor is one of the plurality of processors other than the first processor; the second value is larger than the first value; and T_remote represents a delay conversion coefficient corresponding to the performance loss when the first processor accesses the memory in the case where the memory is connected to the second processor.
In some embodiments, the data migration instruction includes a first data migration direction; the first data migration direction is used for indicating the migration direction of hot data. The determining module 1101 is specifically configured to determine, from the data migration mapping table, the first memory set corresponding to the first memory in the first data migration direction, where the comprehensive access delay of each memory in the first memory set is smaller than the comprehensive access delay of the first memory; and to take the target memory in the first memory set as the second memory, where the comprehensive access delay of the target memory is smaller than that of the other memories in the first memory set.
In some embodiments, the migration module 1102 is further configured to, in the case of migration failure, take, as the second memory, a next memory of the target memory in the first memory set in order of from smaller to larger comprehensive access latency.
In some embodiments, the data migration instruction includes a second data migration direction; the second data migration direction is used for indicating the migration direction of cold data. The determining module 1101 is specifically configured to determine, from the data migration mapping table, the second memory set corresponding to the first memory in the second data migration direction, where the comprehensive access delay of each memory in the second memory set is larger than the comprehensive access delay of the first memory; and to take one memory in the second memory set as the second memory.
In some embodiments, the migration module 1102 is further configured to, in case of migration failure, reselect one memory from the second memory set to be the second memory according to the integrated access latency.
In some embodiments, the at least one memory is connected to the processor in one or more of the following ways: direct connection, connection through a compute express link (CXL), or connection through a CXL switching chip.
In the case of implementing the functions of the integrated modules in the form of hardware, the embodiments of the present application provide a schematic structural diagram of a computing device, which may be the data migration apparatus described above. As shown in fig. 12, the computing device 1200 includes: a processor 1202, a communication interface 1203, and a bus 1204. Optionally, the computing device may also include a memory 1201.
The processor 1202 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present application. The processor 1202 may also be a combination that implements computing functions, for example including one or more microprocessors, or a combination of a DSP and a microprocessor.
A communication interface 1203 is configured to connect with other devices via a communication network. The communication network may be an ethernet, a radio access network, a wireless local area network (wireless local area networks, WLAN), etc.
The memory 1201 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
As a possible implementation, the memory 1201 may exist separately from the processor 1202, and the memory 1201 may be connected to the processor 1202 by the bus 1204 for storing instructions or program code. The processor 1202, when calling and executing instructions or program code stored in the memory 1201, is capable of implementing the data migration method provided in the embodiments of the present application.
In another possible implementation, the memory 1201 may also be integrated with the processor 1202.
The bus 1204 may be an extended industry standard architecture (extended industry standard architecture, EISA) bus or the like. The bus 1204 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 12, but this does not mean that there is only one bus or only one type of bus.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the data migration apparatus is divided into different functional modules to perform all or part of the functions described above.
Embodiments of the present application also provide a computer-readable storage medium. All or part of the procedures in the foregoing method embodiments may be implemented by computer instructions instructing related hardware, and the program may be stored in the computer-readable storage medium; when executed, the program may include the procedures of the foregoing method embodiments. The computer-readable storage medium may be the memory of any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of the data migration apparatus, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the data migration apparatus. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the data migration apparatus. The computer-readable storage medium is used to store the computer program and other programs and data required by the data migration apparatus, and may also be used to temporarily store data that has been output or is to be output.
The present application also provides a computer program product comprising a computer program which, when run on a computer, causes the computer to perform any of the data migration methods provided in the above embodiments.
Although the present application has been described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a study of the figures, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the present application. It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A data migration method, applied to a first processor, wherein the first processor is one of a plurality of processors included in a computing device, and each of the plurality of processors is connected with at least one memory; the first memory accessed by the first processor comprises data to be migrated; the method comprises the following steps:
determining a second memory from the data migration mapping table in response to the data migration instruction; the data migration mapping table comprises a memory set corresponding to a first memory, the second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delay of each memory, the comprehensive access time delay of each memory is determined according to real access time delay from the first processor to the memory and equivalent access time delay from the first processor to the memory, and the equivalent access time delay is used for indicating performance loss when the first processor accesses the memory;
And migrating the data to be migrated from the first memory to the second memory.
2. The method of claim 1, wherein the data migration map is stored in a memory of the computing device;
the determining, in response to the data migration instruction, a second memory from the data migration mapping table includes:
responding to the data migration instruction, and acquiring the data migration mapping table from the memory;
and determining a second memory from the data migration mapping table.
3. The method of claim 1, wherein after responding to the data migration instruction, the method further comprises:
acquiring a memory delay table, wherein the memory delay table comprises comprehensive access delay from the first processor to each memory;
and based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring the data migration mapping table.
4. The method of claim 1, wherein after responding to the data migration instruction, the method further comprises:
acquiring a memory delay table based on the memory relation mapping table; the memory relation mapping table comprises a connection mode between each processor in the plurality of processors and a memory connected with each processor; the memory relation mapping table is determined by the memory information of each memory; the memory delay table comprises comprehensive access delays from the first processor to the memories;
And based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring the data migration mapping table.
5. The method of claim 1, wherein after responding to the data migration instruction, the method further comprises:
acquiring memory information of each memory;
acquiring a memory relation mapping table based on the memory information; the memory relation mapping table comprises a connection mode between each processor in the plurality of processors and a memory connected with each processor;
acquiring a memory delay table based on the memory relation mapping table; the memory delay table comprises comprehensive access delays from the first processor to the memories;
and based on the memory delay table and the comprehensive access delay size ordering of the memory, acquiring the data migration mapping table.
6. The method according to any one of claims 1-5, wherein the comprehensive access time delay of each memory in the data migration mapping table satisfies:
T_total=T_equal+T_real
wherein T_total represents the comprehensive access time delay, T_real represents the real access time delay, and T_equal represents the equivalent access time delay.
7. The method of claim 6, wherein the equivalent access latency satisfies:
T_equal=α×T_remote
Wherein, when the memory is connected with the first processor, α is a first value; when the memory is connected with the second processor, alpha is a second value; the second processor is one of the plurality of processors except the first processor; the second value is greater than the first value; and the T_remote represents a time delay conversion coefficient corresponding to the performance loss when the first processor accesses the memory under the condition that the memory is connected with the second processor.
8. The method of any of claims 1-7, wherein the data migration instruction comprises: a first data migration direction; the first data migration direction is used for indicating the migration direction of hot data;
the determining the second memory from the data migration mapping table includes:
determining a first memory set corresponding to the first memory in the first data migration direction from the data migration mapping table; the comprehensive access time delay of each memory in the first memory set is smaller than the comprehensive access time delay of the first memory;
and taking the target memory in the first memory set as the second memory, wherein the comprehensive access time delay of the target memory is smaller than that of other memories in the first memory set.
9. The method of claim 8, wherein after migrating the data to be migrated from the first memory to the second memory, the method further comprises:
and under the condition of migration failure, taking the next memory of the target memory in the first memory set as the second memory according to the sequence from small to large of the comprehensive access time delay.
10. The method of any of claims 1-7, wherein the data migration instruction comprises: a second data migration direction; the second data migration direction is used for indicating the migration direction of cold data;
the determining the second memory from the data migration mapping table includes:
determining, from the data migration mapping table, a second memory set corresponding to the first memory in the second data migration direction; the comprehensive access time delay of each memory in the second memory set is larger than the comprehensive access time delay of the first memory;
and taking one memory in the second memory set as the second memory.
11. The method according to any one of claims 1 to 10, wherein,
the connection mode of the at least one memory and the processor comprises one or more of the following modes: direct connection, connection through a compute express link (CXL), and connection through a CXL switching chip.
12. A computing device comprising a plurality of processors, each processor of the plurality of processors coupled to at least one memory, a first processor being one of the plurality of processors, the first processor being configured to:
determining a second memory from the data migration mapping table in response to the data migration instruction; the data migration mapping table comprises a memory set corresponding to a first memory, a second memory is one of the memory sets, the memory set corresponding to the first memory is obtained based on comprehensive access time delay of each memory, the comprehensive access time delay of each memory is determined according to real access time delay from the first processor to the memory and equivalent access time delay from the first processor to the memory, and the equivalent access time delay is used for indicating performance loss when the first processor accesses the memory;
and migrating the data to be migrated from the first memory to the second memory.