CN117687835A - Data processing system, memory mirroring method, memory mirroring device and computing equipment - Google Patents

Data processing system, memory mirroring method, memory mirroring device and computing equipment

Info

Publication number
CN117687835A
Authority
CN
China
Prior art keywords
area
node
memory
data
management node
Legal status
Pending
Application number
CN202211519995.3A
Other languages
Chinese (zh)
Inventor
陈智勇
孙宏伟
潘伟
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority application: PCT/CN2023/102963, published as WO2024051292A1
Publication of CN117687835A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation


Abstract

A data processing system, a memory mirroring method, a memory mirroring apparatus and a computing device are disclosed, relating to the field of computers. The system includes a plurality of nodes and a management node. A first node requests mirroring of a first area in the memory it uses; the management node allocates a second area, which indicates a storage space in a second node of the same size as the first area and is used to back up the data of the first area. When no node raises a memory mirroring requirement, the storage resources in the system are used to store distinct data; only when a memory mirroring requirement is raised is a mirror area allocated from the system's storage resources, so that the mirror area backs up the data stored in the area to be mirrored, improving data reliability. In addition, the area to be mirrored and the mirror area can be storage spaces in different nodes, so the mirror area can be allocated flexibly and dynamically to implement memory mirroring, improving both the flexibility of memory mirroring configuration and the utilization of storage resources.

Description

Data processing system, memory mirroring method, memory mirroring device and computing equipment
The present application claims priority to the Chinese patent application with application number 202211105202.3, filed on September 9, 2022 and entitled "Method for implementing memory mirroring", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computers, and in particular, to a data processing system, a memory mirroring method, a memory mirroring device, and a computing device.
Background
Memory mirroring (mirror) is an effective means of handling uncorrectable errors (Uncorrectable Error, UCE) in memory: one portion of the memory space serves as a mirror area of another portion and stores backup data for it. Memory mirroring is generally implemented either through static configuration or by having the operating system allocate adjacent pages in memory as the mirror area. If the mirror area is too large, the memory's storage resources are wasted; if it is too small, UCEs in the memory cannot be handled. Current memory mirroring configuration is therefore inflexible, resulting in low utilization of storage resources.
Disclosure of Invention
The application provides a data processing system, a memory mirroring method, a memory mirroring apparatus and a computing device, so that memory mirroring can be configured flexibly and the utilization of the memory's storage resources is improved.
In a first aspect, a data processing system is provided, the data processing system comprising a plurality of nodes and a management node. The first node is configured to request mirroring of a first area in the memory used by the first node; the management node is configured to allocate a second area. That is, the first area is the area to be mirrored and the second area is its mirror area; the second area indicates a storage space in a second node of the same size as the first area, and is used to back up the data of the first area.
Compared with statically pre-configuring a mirror area before system startup, which wastes storage resources, in this scheme the storage resources of the system store distinct data as long as no node raises a memory mirroring requirement; only when such a requirement is raised is a mirror area allocated from the system's storage resources, so that the mirror area backs up the data stored in the area to be mirrored, improving data reliability. In addition, compared with having the operating system allocate adjacent pages in memory as the mirror area, the scheme provided by this application places no restriction on the positional relationship between the area to be mirrored and the mirror area: the two areas can be storage spaces in different nodes. The mirror area is thus allocated flexibly and dynamically to implement memory mirroring, improving both the flexibility of memory mirroring configuration and the utilization of storage resources.
With reference to the first aspect, in one possible implementation manner, the first node indicates a first physical address of the first area; the management node is further configured to generate a mirror relationship between the first area and the second area, where the mirror relationship indicates the correspondence between the first physical address and a second physical address, and the second physical address indicates the second area. Thus, when the first node performs a read or write operation on the first area, the management node can determine the mirror area of the first area from the mirror relationship and perform the write operation on the mirror area as well, or read the first data from the second area when an uncorrectable error occurs in the first area, avoiding data processing failures.
In one example, the management node is further configured to receive a write indication sent by the first node and write the first data to the first area and the second area. The write indication indicates that the first data is to be stored to the first area.
In another example, the management node is further configured to receive a read indication from the first node, the read indication indicating that the first data is to be read from the first area. The management node is further configured to read the first data from the first area when no uncorrectable error occurs in the first area, or to read the first data from the second area when an uncorrectable error does occur in the first area, so that the first node still reads the first data successfully and services that require the first data are not affected.
With reference to the first aspect, in another possible implementation manner, the first area is a main storage space, and the second area is a spare storage space; the management node is further configured to determine the second area as the main storage space when an uncorrectable error occurs in the first area.
With reference to the first aspect, in another possible implementation manner, the management node is further configured to instruct the first node to modify the image identifier of the first area to be invalid. Therefore, the node can release the storage resources of the first area conveniently, and the utilization rate of the storage resources is improved.
With reference to the first aspect, in another possible implementation manner, the size of the first area is determined by an application requirement.
With reference to the first aspect, in another possible implementation manner, the second area includes any one of a local storage space of the second node, an extended storage space of the second node, and a storage space of the second node in the global memory pool.
With reference to the first aspect, in another possible implementation manner, the management node supports a cache coherence protocol.
In a second aspect, a memory mirroring method is provided, where a data processing system includes a plurality of nodes and a management node. The method comprises the following steps: the first node requests mirroring of a first area in the memory used by the first node; the management node allocates a second area, where the second area is the mirror area of the first area, the second area indicates a storage space in a second node of the same size as the first area, and the second area is used to back up the data of the first area.
With reference to the second aspect, in one possible implementation manner, the first node indicates a first physical address of the first area; the method further comprises the steps of: the management node generates a mirror image relationship between the first area and the second area, wherein the mirror image relationship is used for indicating the corresponding relationship between the first physical address and the second physical address, and the second physical address is used for indicating the second area.
With reference to the second aspect, in another possible implementation manner, the method further includes: the management node receives a write instruction sent by a first node, wherein the write instruction is used for indicating to store first data into a first area; the management node writes the first data to the first area and the second area.
With reference to the second aspect, in another possible implementation manner, the method further includes: the management node receives a read indication from the first node, where the read indication indicates that the first data is to be read from the first area; the management node reads the first data from the first area when no uncorrectable error occurs in the first area.
With reference to the second aspect, in another possible implementation manner, the method further includes: the management node reads the first data from the second area when an uncorrectable error occurs in the first area.
With reference to the second aspect, in another possible implementation manner, the first area is a main storage space, and the second area is a spare storage space, and the method further includes: the management node determines the second region as the primary storage space when an uncorrectable error occurs in the first region.
With reference to the second aspect, in another possible implementation manner, the method further includes: the management node instructs the first node to modify the image identification of the first area to be invalid.
With reference to the second aspect, in another possible implementation manner, the size of the first area is determined by an application requirement.
With reference to the second aspect, in another possible implementation manner, the second area includes any one of a local storage space of the second node, an extended storage space of the second node, and a storage space of the second node in the global memory pool.
With reference to the second aspect, in another possible implementation manner, the management node supports a cache coherence protocol.
In a third aspect, there is provided a management apparatus comprising respective modules for performing the method performed by the management node in the second aspect or any of the possible designs of the second aspect.
In a fourth aspect, there is provided a data processing node comprising respective modules for performing the method performed by the node of the second aspect or any of the possible designs of the second aspect.
In a fifth aspect, a computing device is provided, the computing device comprising at least one processor and a memory, the memory being configured to store a set of computer instructions; when the processor executes the set of computer instructions, the computing device acts as the management node in the second aspect or any possible implementation of the second aspect and performs the operation steps of the memory mirroring method in the second aspect or any possible implementation of the second aspect.
In a sixth aspect, there is provided a chip comprising: a processor and a power supply circuit; wherein the power supply circuit is used for supplying power to the processor; the processor is configured to perform the operation steps of the memory mirroring method in the second aspect or any possible implementation manner of the second aspect.
In a seventh aspect, there is provided a computer readable storage medium comprising: computer software instructions; the computer software instructions, when executed in a computing device, cause the computing device to perform the operational steps of the method as described in the second aspect or any one of the possible implementations of the second aspect.
In an eighth aspect, there is provided a computer program product for, when run on a computer, causing a computing device to perform the operational steps of the method as described in the second aspect or any one of the possible implementations of the second aspect.
Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.
Drawings
FIG. 1 is a schematic diagram of a data processing system according to the present application;
FIG. 2 is a schematic diagram of a deployment scenario of a global memory pool provided in the present application;
FIG. 3 is a flow chart of a memory mirroring method provided in the present application;
FIG. 4 is a schematic flow chart of a data processing method provided in the present application;
FIG. 5 is a schematic structural diagram of a management device provided in the present application;
FIG. 6 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
For ease of description, the terms referred to in this application are first briefly introduced.
Memory is also called internal memory or main memory (main memory). Memory is an important component of a computer system: it is the bridge between external memory (also called secondary memory) and the central processing unit (central processing unit, CPU). Memory temporarily stores the CPU's operational data and the data exchanged between the CPU and external memory such as a hard disk. For example, when a computer runs, data to be processed is loaded from memory into the CPU for computation, and when the computation is complete the CPU stores the result back to memory.
Correctable errors (Correctable Error, CE) are memory errors that can be corrected using error correction code (Error Correction Code, ECC) techniques, helping to ensure the host's reliability, availability and serviceability (Reliability, Availability and Serviceability, RAS).
Uncorrectable errors (Uncorrectable Error, UCE) are memory errors that exceed the error correction capability of ECC and therefore cannot be corrected by ECC techniques. If the storage space in which an uncorrectable error occurs is configured with a mirror area, backup data for that storage space can be obtained from the mirror area.
Global mirroring (Global Mirror) refers to using half of the memory's storage space as a mirror area for the other half, backing up the data stored in that other half.
Local mirroring, also called address-range-based memory mirroring, refers to using half of the storage space indicated by an address segment in memory as a mirror area for the other half.
Cache line (cacheline), which refers to a unit of read or write operations performed by a computer device to a storage space of a memory. One cache line may be 64 bytes (B) in size.
Interleaving refers to uniformly distributing data for accessing a memory over multiple memory channels in units of storage space (e.g., cache lines). The interleaving mode can be configured by a system administrator, and can be used for interleaving among a plurality of memory channels connected with one processor or interleaving among a plurality of memory channels of a plurality of processors.
Memory channels refer to the multiple memories connected to a processor in a computer device. The processor may operate on these memories using interleaving techniques. For example, the processor distributes the data to be written evenly across the multiple memory channels in units of cache lines, and likewise reads data from the multiple memory channels in units of cache lines. Performing data processing across memory channels in this way improves memory bandwidth utilization and the processing performance of the computer device.
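Purely as an illustration of the cache-line interleaving described above (the round-robin mapping, the function names, and the four-channel example are assumptions of this sketch, not a scheme prescribed here), the address-to-channel calculation can be modeled in C as follows:

```c
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_BYTES 64u   /* one cache line, the read/write unit described above */

/* Map a physical address to (channel, channel-local offset) under simple
 * round-robin interleaving of cache lines across n_channels memory channels.
 * This is an illustrative model, not a scheme mandated by the application. */
static void interleave_map(uint64_t pa, unsigned n_channels,
                           unsigned *channel, uint64_t *offset)
{
    uint64_t line = pa / CACHE_LINE_BYTES;        /* cache-line index */
    *channel = (unsigned)(line % n_channels);     /* round-robin over channels */
    *offset  = (line / n_channels) * CACHE_LINE_BYTES
             + pa % CACHE_LINE_BYTES;
}

int main(void)
{
    /* Four channels on one processor, a configuration chosen for this example. */
    for (uint64_t pa = 0; pa < 4 * CACHE_LINE_BYTES; pa += CACHE_LINE_BYTES) {
        unsigned ch;
        uint64_t off;
        interleave_map(pa, 4, &ch, &off);
        printf("pa 0x%llx -> channel %u, offset 0x%llx\n",
               (unsigned long long)pa, ch, (unsigned long long)off);
    }
    return 0;
}
```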
A supernode refers to multiple nodes interconnected into a high-performance cluster through high-bandwidth, low-latency inter-chip interconnect buses and switches. A supernode is larger in scale than a node under a cache-coherent non-uniform memory access (Cache-Coherent Non-Uniform Memory Access, CC-NUMA) architecture, and the interconnection bandwidth between nodes within a supernode is greater than Ethernet interconnection bandwidth.
A high performance computing (High Performance Computing, HPC) cluster is a computer cluster system. An HPC cluster comprises multiple computers connected together using interconnection technologies such as InfiniBand (IB), Remote Direct Memory Access over Converged Ethernet (RoCE), or the Transmission Control Protocol (TCP). HPC provides very high floating-point computing capability and can meet the computing demands of compute-intensive and mass-data-processing services. The aggregated computing power of the connected computers can handle large computing problems, such as those arising in scientific research, weather forecasting, finance, simulation experiments, biopharmaceuticals, gene sequencing, and image processing. Using HPC clusters to process large-scale computing problems effectively shortens data processing time and improves computational precision.
Memory operation instructions may also be referred to as memory semantics or memory operation functions. The memory operation instructions include at least one of memory allocation (malloc), memory setting (memset), memory copy (memcpy), memory move (memmove), memory release, and memory comparison (memcmp), each described below and illustrated in the sketch that follows the definitions.
The memory allocation is used for supporting the running of the application program to allocate a section of memory.
The memory settings are used to set the data mode of the global memory pool, e.g., initialization.
The memory copy is used for copying data stored in a storage space indicated by a source address (source) to a storage space indicated by a destination address (destination).
The memory move is used to copy data stored in a storage space indicated by a source address (source) to a storage space indicated by a destination address (destination), and delete data stored in the storage space indicated by the source address (source).
The memory comparison is used for comparing whether the data stored in the two storage spaces are equal.
The memory release is used for releasing the data stored in the memory so as to improve the utilization rate of the memory resources of the system and further improve the system performance.
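Most of these memory operation functions map directly onto the standard C library, as the following sketch shows. Note two assumptions about how the semantics map: C's memmove copies between overlapping regions rather than deleting the source, and free is used here to stand in for memory release.

```c
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    char *src = malloc(64);            /* memory allocation */
    char *dst = malloc(64);
    if (!src || !dst) {
        free(src);                     /* free(NULL) is harmless */
        free(dst);
        return 1;
    }

    memset(src, 0, 64);                /* memory setting: initialize the space */
    strcpy(src, "mirror me");
    memcpy(dst, src, 64);              /* memory copy: source -> destination   */

    /* memory comparison: check whether the two storage spaces hold equal data */
    printf("equal: %d\n", memcmp(src, dst, 64) == 0);

    memmove(src + 1, src, 10);         /* memory move: safe for overlapping regions */

    free(src);                         /* memory release */
    free(dst);
    return 0;
}
```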
In order to solve the problems of inflexible memory mirroring configuration and low storage resource utilization, the application provides a data processing system comprising a plurality of nodes and a management node. When a first node requests mirroring of a first area in the memory it uses, the management node allocates a second area; that is, the first area is the area to be mirrored and the second area is its mirror area. The second area indicates a storage space in a second node of the same size as the first area and is used to back up the data of the first area. Compared with statically pre-configuring a mirror area before system startup, which wastes storage resources, here the system's storage resources store distinct data as long as no memory mirroring requirement is raised; only when such a requirement is raised is a mirror area allocated from the system's storage resources, so that the mirror area backs up the data stored in the area to be mirrored, improving data reliability. In addition, compared with having the operating system allocate adjacent pages in memory as the mirror area, this application places no restriction on the positional relationship between the area to be mirrored and the mirror area: the two can be storage spaces in different nodes. The mirror area is thus allocated flexibly and dynamically to implement memory mirroring, improving both the flexibility of memory mirroring configuration and the utilization of storage resources.
FIG. 1 is a schematic diagram of a data processing system according to the present application. As shown in FIG. 1, data processing system 100 is an entity that provides high performance computing. Data processing system 100 includes a plurality of nodes 110. Node 110 may include a compute node and a storage node.
For example, node 110 may be a processor, a server, a desktop computer, a smart network card, a memory expansion card, a controller of a storage array, a memory, and the like. The processor may be a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a data processing unit (data processing unit, DPU), a neural-network processing unit (neural-network processing unit, NPU), or another XPU for data processing.
When node 110 is an XPU with higher computing power, such as a GPU, DPU or NPU, node 110 can serve as an accelerator: jobs with higher computing demands (such as HPC, big data jobs, and database jobs) are offloaded from the general-purpose processor (such as the CPU) to the accelerator for processing. This addresses the problem that the general-purpose processor's insufficient floating-point computing power cannot satisfy the heavy floating-point computing demands of scenarios such as HPC and artificial intelligence (Artificial Intelligence, AI), thereby shortening data processing time, reducing system energy consumption, and improving system performance. The computing power of a node may also be referred to as the computational capability of the node. In some embodiments, an accelerator may also be integrated within node 110. Both independently deployed accelerators and nodes with integrated accelerators support flexible plug-in, so the scale of the data processing system can be elastically expanded as needed to meet the computing requirements of different application scenarios.
The storage node comprises one or more controllers, a network card and a plurality of hard disks. The hard disk is used for storing data. The hard disk may be a magnetic disk or other type of storage medium such as a solid state disk or a shingled magnetic recording hard disk, or the like. The network card is used for communicating with the computing nodes contained in the computing cluster. The controller is used for writing data into the hard disk or reading data from the hard disk according to the data reading/writing request sent by the computing node. In the process of reading and writing data, the controller needs to convert an address carried in a read/write data request into an address which can be identified by the hard disk.
The plurality of nodes 110 are connected based on high-speed interconnect links having high bandwidth, low latency. In some embodiments, as shown in FIG. 1, a management node 120 (e.g., a switch) connects a plurality of nodes 110 based on a high-speed interconnect link. For example, the management node 120 connects the plurality of nodes 110 via optical fibers, copper cables, or copper wires. The management node may be referred to as a switch chip or interconnect chip or baseboard management controller (Baseboard Management Controller, BMC).
Data processing system 100, which is made up of multiple nodes 110 connected by management node 120 based on high-speed interconnect links, may also be referred to as a supernode. The plurality of supernodes are connected through a data center network. The data center network includes a plurality of core switches and a plurality of aggregation switches. The data center network may constitute a scale domain. Multiple supernodes may constitute a performance domain. More than two supernodes may constitute a macro-cabinet. Macro-cabinets may also be connected based on a data center network.
The management node 120 is configured to allocate, according to the memory mirroring requirement sent by a node 110, a mirror area of the same size as the area to be mirrored in the memory used by that node 110. The management node 120 may support a cache coherence protocol such as Compute Express Link (CXL), maintaining the high performance, low latency, and data coherence of the memory mirror.
In other embodiments, the plurality of nodes 110 are directly connected based on high-speed interconnect links having high bandwidth and low latency. The node 110 has the function of managing the node 120 provided in the present application.
Data processing system 100 supports running big data, database, high performance computing, artificial intelligence, distributed storage, and cloud-native applications. The data to be backed up in the embodiments of the present application includes service data of applications such as virtual machines (Virtual Machine, VM), containers, high-availability (High Availability, HA) applications, big data, databases, high performance computing, artificial intelligence (Artificial Intelligence, AI), distributed storage, and cloud-native applications.
The area to be mirrored and the mirrored area may be storage spaces in different nodes. The mirrored region may be provided by a local storage medium, an extended storage medium, or a global memory pool of any one of the nodes 110 in the system.
In some embodiments, the storage media of nodes 110 in data processing system 100 are uniformly addressed to form a global memory pool, enabling memory semantic access across supernode internal nodes (simply referred to as cross-nodes). The global memory pool is a resource shared by nodes formed by uniformly addressing storage media of the nodes.
The global memory pool provided by the application can comprise a storage medium of a computing node in the supernode and a storage medium of the storage node. The storage medium of the computing node includes at least one of a local storage medium within the computing node and an extended storage medium to which the computing node is connected. The storage medium of the storage node comprises at least one of a local storage medium within the storage node and an extended storage medium to which the storage node is connected.
For example, the global memory pool includes local storage media within the compute node and local storage media within the storage node.
As another example, the global memory pool includes any one of the local storage medium within the compute node and the extended storage medium to which the compute node is connected, together with any one of the local storage medium within the storage node and the extended storage medium to which the storage node is connected.
As another example, the global memory pool includes a local storage medium within the compute node, an extended storage medium to which the compute node is connected, a local storage medium within the storage node, and an extended storage medium to which the storage node is connected.
For example, as shown in fig. 2, a deployment scenario diagram of a global memory pool is provided in the present application. Global memory pool 200 includes storage medium 210 within each of the N computing nodes, expansion storage medium 220 to which each of the N computing nodes is connected, storage medium 230 within each of the M storage nodes, and expansion storage medium 240 to which each of the M storage nodes is connected.
It should be appreciated that the storage capacity of the global memory pool may include a portion of the storage capacity of the compute nodes' storage media and a portion of the storage capacity of the storage nodes' storage media. The global memory pool is a uniformly addressed storage medium accessible to both the compute nodes and the storage nodes within the supernode. Its storage capacity can be used by compute nodes or storage nodes through memory interfaces such as large memory, distributed data structures, data caches, and metadata. Applications running on a compute node may use these memory interfaces to perform memory operations on the global memory pool. In this way, the global memory pool constructed from the storage capacity of the compute nodes' and storage nodes' storage media provides, northbound, a unified memory interface for the compute nodes, so that a compute node uses this unified interface to write data into the storage space provided by a compute node or a storage node of the global memory pool. Computation and storage of data are thereby carried out through memory operation instructions, reducing data processing latency and improving data processing speed.
The foregoing describes an example of a storage medium in a computing node and a global memory pool constructed by the storage medium in the storage node. The deployment mode of the global memory pool can be flexible and changeable, and the embodiment of the application is not limited. For example, a global memory pool is built from storage media of storage nodes. As another example, a global memory pool is built from storage media of the compute nodes. Constructing the global memory pool using the storage medium of the separate storage node or the storage medium of the computing node may reduce the occupation of storage resources on the storage side and provide a more flexible extension scheme.
By type of storage medium, the storage media of the global memory pool provided in the embodiments of the present application include dynamic random access memory (Dynamic Random Access Memory, DRAM), solid state drives (Solid State Drive, SSD), and storage-class memory (Storage-Class Memory, SCM).
In some embodiments, global memory pools can be set up by storage medium type: one type of storage medium is used to build one type of memory pool, so different types of storage media yield different types of global memory pools suited to different scenarios, and a compute node selects a storage medium according to the access characteristics of the application. This strengthens the user's control over the system, improves the user's experience with the system, and broadens the application scenarios the system can serve. For example, the DRAM in the compute nodes and the DRAM in the storage nodes are uniformly addressed to form a DRAM memory pool, which is suited to application scenarios with high access-performance requirements, moderate data capacity, and no data persistence requirements. As another example, the SCM in the compute nodes and the SCM in the storage nodes are uniformly addressed to form an SCM memory pool, which is suited to application scenarios that are insensitive to access performance, have large data capacity, and require data persistence.
Next, embodiments of the memory mirroring method provided in the present application are described in detail with reference to fig. 3 to 4.
Fig. 3 is a flow chart of a memory mirroring method provided in the present application. The node 110A is illustrated herein as requesting a memory mirror. As shown in fig. 3, the method includes the following steps.
Step 310, node 110A sends a memory mirror requirement to management node 120.
To improve data reliability, node 110A may send a memory mirroring requirement to management node 120 to request memory mirroring of a first area in which data is stored. That is, management node 120 allocates a second area of the same size as the first area: the first area is the area to be mirrored, the second area is its mirror area, and the second area indicates a storage space in a second node of the same size as the first area, so that the mirror area backs up the data stored in the area to be mirrored.
The need for backup may come from virtual machines (VM), containers, high-availability (HA) applications, and business requirements. A business requirement may indicate that important data produced during service execution needs backup storage. That is, the data to be backed up is stored both to the area to be mirrored and to the mirror area. If the area to be mirrored fails or the data stored in it is erroneous, the data can still be obtained from the mirror area. This improves data reliability and avoids service problems, and the resulting impact on user experience, caused by a fault in or data error of the storage space holding the data.
In some embodiments, after node 110A boots up, it may send a memory mirroring requirement to management node 120 according to a mirroring policy. The mirroring policy indicates that memory mirroring requirements are determined according to the reliability level of the application. Reliability refers to the property of a product operating without failure during use: the higher a product's reliability, the longer it can operate without failing. For example, a system administrator may pre-configure the reliability level of an application, and node 110A sends memory mirroring requirements accordingly: for applications with high reliability requirements it applies to management node 120 for memory mirroring, and for applications with low reliability requirements it does not.
Step 320, the management node 120 obtains the memory mirror requirement.
Management node 120 may receive the memory mirroring requirement sent by node 110A over the link (for example, optical fiber) connecting node 110A. The memory mirroring requirement indicates the area to be mirrored in the memory used by node 110A.
The memory used by node 110A includes at least one of a local storage medium, an extended storage medium, and a global memory pool. It is understood that the area to be mirrored that the node 110A requests for memory mirroring may be a storage space in any one of a local storage medium, an extended storage medium, and a global memory pool of the node 110A.
The memory mirroring requirement specifically indicates the physical address of the region to be mirrored and the size of the region to be mirrored, so that the management node 120 directly obtains the size of the region to be mirrored from the memory mirroring requirement.
In one example, the memory mirroring requirements include a physical address field of the region to be mirrored. The management node 120 determines the size of the area to be mirrored from the physical address segment.
In another example, the memory mirroring requirements include physical addresses and offset addresses of the region to be mirrored. The management node 120 determines the size of the region to be mirrored according to the physical address and the offset address of the region to be mirrored.
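Purely as an illustration of the two encodings above (the structure layouts and field names are assumptions; the application does not prescribe a message format), both forms let the management node recover the size of the region to be mirrored:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wire formats for the memory mirroring requirement. The
 * application permits either a physical address segment or a physical
 * address plus offset; neither layout is prescribed by it. */
struct mirror_req_segment {
    uint64_t pa_start;   /* first physical address of the region to be mirrored */
    uint64_t pa_end;     /* exclusive end of the physical address segment       */
};

struct mirror_req_offset {
    uint64_t pa;         /* physical address of the region to be mirrored */
    uint64_t offset;     /* offset giving the region's extent             */
};

/* Either encoding lets the management node determine the region size. */
static uint64_t size_from_segment(const struct mirror_req_segment *r)
{
    return r->pa_end - r->pa_start;
}

static uint64_t size_from_offset(const struct mirror_req_offset *r)
{
    return r->offset;
}

int main(void)
{
    struct mirror_req_segment s = { 0x1000, 0x3000 };
    struct mirror_req_offset  o = { 0x1000, 0x2000 };
    printf("segment size: %llu, offset size: %llu\n",
           (unsigned long long)size_from_segment(&s),
           (unsigned long long)size_from_offset(&o));
    return 0;
}
```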
In step 330, the management node 120 allocates the mirror region according to the memory mirror requirement.
The management node 120 determines a free storage medium from among the storage media managed by it, and divides an area having the same size as the area to be mirrored from the free storage medium as a mirrored area. The storage media managed by the management node 120 include a local storage medium of any node in the system, an extended storage medium, and a storage medium constituting a global memory pool.
The storage medium to which the mirror area belongs may be any storage medium in the system; the relationship between the storage medium of the mirror area and that of the area to be mirrored is not limited. The free storage medium may be one that is farther away from the storage medium of the area to be mirrored; for example, the two storage media may be located in different equipment rooms or different cabinets. Keeping the mirror area far from the area to be mirrored, that is, allocating the mirror area from a storage medium different from that of the area to be mirrored, avoids the situation where both areas fail at once because they are deployed on the same storage medium. This reduces the possibility of simultaneous failure and improves the reliability of memory mirroring.
It is assumed that the management node 120 divides an area having the same size as the area to be mirrored from the node 110B as a mirrored area. Node 110A and node 110B may be two separate physical devices, the distance between node 110A and node 110B being relatively large, node 110A and node 110B may be located in different rooms or different cabinets.
Alternatively, the management node 120 may also determine the number of mirror image areas allocated according to the reliability level, that is, the management node 120 allocates different numbers of mirror image areas according to the reliability level from high to low, so as to achieve the effect of multi-point backup on the data with high reliability, and ensure the reliability of the data. For example, the reliability levels include from low to high reliability level 1 to reliability level 5. When the memory mirroring requirement indicates reliability level 1, the management node 120 allocates a mirroring area according to the reliability level 1 indicated by the memory mirroring requirement. When the memory mirroring requirement indicates reliability level 2, the management node 120 allocates two mirroring areas according to the reliability level 2 indicated by the memory mirroring requirement.
The present application does not limit the type of storage medium to which the mirroring area belongs and the storage medium to which the area to be mirrored belongs, and for example, the storage medium includes any one of DRAM, SSD, and SCM.
In addition, the size of the area to be mirrored is not limited; that is, the memory mirroring granularity is not limited. Management node 120 can perform memory mirroring on a storage area of any size, mirroring on demand to improve storage resource utilization. With static configuration, by contrast, too large a mirror area wastes the memory's storage resources, while too small a mirror area cannot handle the memory's UCEs. For example, if the memory mirroring granularity is larger than the memory interleaving granularity, a fault in the mirror area can affect multiple pieces of data that access the memory in interleaved mode, reducing storage resource utilization. Alternatively, the memory mirroring granularity may be 64 bytes (Bytes), matching the memory interleaving granularity, to avoid the extra memory waste caused by enlarging the isolation of an interleaved memory region.
In other embodiments, the management node 120 may construct a mirroring relationship between the area to be mirrored and the mirrored area, so that the management node 120 determines the mirrored area according to the mirroring relationship, and performs a read operation or a write operation on the mirrored area.
In one example, the mirroring relationship of the area to be mirrored and the mirroring area indicates a correspondence of a physical address of the area to be mirrored and a physical address of the mirroring area. The mirror relationship may be presented in tabular form, as shown in table 1.
TABLE 1

Mirror relationship | Mirror address pair
Mirror relationship 1 | Physical address 1 of the area to be mirrored <-> Physical address 2 of the mirror area
Mirror relationship 2 | Physical address 3 of the area to be mirrored <-> Physical address 4 of the mirror area
As shown in table 1, the physical address 1 of the area to be mirrored corresponds to the physical address 2 of the mirrored area, the management node 120 looks up the table according to the physical address 1 of the area to be mirrored, determines the physical address of the mirrored area as the physical address 2, and performs a read operation or a write operation on the mirrored area according to the physical address 2 of the mirrored area.
It should be noted that, table 1 only illustrates a storage form of the correspondence in the storage device in a form of a table, and is not limited to the storage form of the correspondence in the storage device, and of course, the storage form of the correspondence in the storage device may also be stored in other forms, which is not limited in this embodiment.
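As a minimal sketch of how such a mirror relationship could be represented and queried (the entry layout, the size field, and the function names are assumptions, since the application prescribes no storage form):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror-relationship entry, in the spirit of Table 1: a pair
 * of physical addresses plus the common region size (an assumed field). */
struct mirror_entry {
    uint64_t primary_pa;   /* physical address of the area to be mirrored */
    uint64_t mirror_pa;    /* physical address of the mirror area         */
    uint64_t size;         /* both areas have the same size               */
};

/* Table lookup as described above: given a physical address inside an area
 * to be mirrored, return the corresponding address in its mirror area, or
 * 0 when the address is covered by no mirror relationship. */
static uint64_t mirror_lookup(const struct mirror_entry *tbl, size_t n,
                              uint64_t pa)
{
    for (size_t i = 0; i < n; i++) {
        if (pa >= tbl[i].primary_pa && pa < tbl[i].primary_pa + tbl[i].size)
            return tbl[i].mirror_pa + (pa - tbl[i].primary_pa);
    }
    return 0;
}

int main(void)
{
    struct mirror_entry tbl[] = {
        { 0x1000, 0x8000, 0x1000 },   /* mirror relationship 1 */
        { 0x3000, 0xa000, 0x1000 },   /* mirror relationship 2 */
    };
    printf("0x1040 -> 0x%llx\n",
           (unsigned long long)mirror_lookup(tbl, 2, 0x1040));  /* 0x8040 */
    return 0;
}
```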
Step 340, the management node 120 feeds back a mirror success response to the node 110A.
After management node 120 allocates a mirror area of the same size as the area to be mirrored according to the memory mirroring requirement, it feeds back a mirroring success response to node 110A. Node 110A may generate a mirror identifier for the area to be mirrored, the mirror identifier indicating that the area has been successfully mirrored, i.e., that it has a mirror. Node 110A may also generate a mapping relationship between the virtual address (Virtual Address, VA) and the physical address (Physical Address, PA) of the area to be mirrored, so that node 110A can determine the area's physical address from its virtual address and perform read or write operations on it.
Further, after memory mirroring has been configured, when a service in the system finishes executing, a virtual machine or container is deleted, or similar events occur and highly reliable backup of the data is no longer required, the storage resources used by the memory mirror can be released. To this end, the present application also includes step 350.
Step 350, the management node 120 sends a memory mirror release indication to the node 110A and the node 110B.
In some embodiments, the management node 120 may receive a memory mirror release request from the node 110A, where the memory mirror release request indicates a region to be mirrored that is requested to be released, e.g., the memory mirror release request includes a physical address of the region to be mirrored and a size of the region to be mirrored. As another example, the memory mirror release request includes a physical address segment of the region to be mirrored. As another example, the memory mirror release request includes a physical address and an offset address of the region to be mirrored.
In other embodiments, the management node 120 determines that the area to be mirrored of the node 110A and the mirrored area of the node 110B are not used in the monitoring period, and the management node 120 determines to release the area to be mirrored of the node 110A and the mirrored area of the node 110B, so that the area to be mirrored and the mirrored area can be used for storing other data, thereby improving the utilization of the storage resource.
The management node 120 sends a first memory mirror release indication to the node 110A, where the first memory mirror release indication includes a physical address of the region to be mirrored. The management node 120 sends a second memory mirror release indication to the node 110B, the second memory mirror release indication including the physical address of the mirror region.
The node 110A releases the region to be mirrored according to the first memory mirror release instruction, or modifies the mirror identifier of the region to be mirrored to be invalid. The node 110B releases the mirror region according to the second memory mirror release indication or modifies the mirror identifier of the mirror region to be invalid.
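A minimal sketch of how a node might act on a memory mirror release indication (the per-region state, the mirror identifier field, and the function are assumptions of this illustration):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative per-region state kept by a node for a mirrored area; the
 * mirror identifier and this layout are assumptions, not a prescribed format. */
struct mirrored_region {
    uint64_t pa;           /* physical address of the (to-be-)mirrored area */
    uint64_t size;
    bool     mirror_valid; /* mirror identifier: true = successfully mirrored */
};

/* Handle a memory mirror release indication (step 350): either release the
 * region outright or just mark its mirror identifier invalid so the storage
 * space can be reused for other data. */
static void handle_release(struct mirrored_region *r, bool release_storage)
{
    r->mirror_valid = false;   /* modify the mirror identifier to invalid */
    if (release_storage) {
        r->pa = 0;             /* hand the storage space back (toy model) */
        r->size = 0;
    }
}

int main(void)
{
    struct mirrored_region r = { 0x1000, 0x1000, true };
    handle_release(&r, false); /* keep the storage, invalidate the mirror */
    printf("mirror_valid: %d\n", r.mirror_valid);
    return 0;
}
```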
Thus, the memory mirroring method does not depend on the node's operating system: the management node dynamically allocates the mirror area according to the memory mirroring requirement, implementing memory mirroring without restarting the host to configure it; and when memory mirroring is no longer needed, the mirror's storage resources are dynamically released. This achieves simpler and more efficient dynamic memory mirroring and improves storage resource utilization.
After memory mirroring has been configured, write operations are performed on the mirrored physical storage spaces by full copy, achieving the memory mirroring effect. Fig. 4 is a flow chart of a data processing method provided in the present application. The write and read operations of node 110A on the area to be mirrored are described here as an example. As shown in fig. 4, the method includes the following steps.
Step 410, node 110A sends a write indication to management node 120.
The write indication is used to indicate that the first data is stored to the area to be mirrored. For example, the node 110A determines the physical address of the region to be mirrored according to the virtual address query address mapping table of the region to be mirrored, and the write instruction includes the physical address of the region to be mirrored. The address mapping table indicates a mapping relationship of the virtual address and the physical address.
In step 420, the management node 120 writes the first data into the area to be mirrored and the mirrored area.
After the management node 120 obtains the write instruction, the first data is written into the area to be mirrored according to the physical address of the area to be mirrored included in the write instruction.
In some embodiments, the management node 120 supports a cache coherence protocol such as CXL 3.0 or a p2p mode, and writes the first data to the mirror area itself.
In other embodiments, the management node 120 queries the mirroring relationship according to the physical address of the area to be mirrored, determines the physical address of the mirrored area, and writes the first data into the mirrored area according to the physical address of the mirrored area.
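A toy model of this mirrored write path (steps 410-420) follows, in which a flat byte array stands in for physical memory reached over the interconnect; all names here are illustrative assumptions, not the application's interfaces:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Toy model: a flat byte array stands in for physical memory so the mirrored
 * write path can be exercised; a real system would go through the
 * interconnect (e.g. CXL) instead. */
static uint8_t phys_mem[1 << 20];

static void pa_write(uint64_t pa, const void *data, size_t len)
{
    memcpy(&phys_mem[pa], data, len);
}

/* The management node writes the first data to the area to be mirrored and,
 * by full copy, to its mirror area. */
static void mirrored_write(uint64_t primary_pa, uint64_t mirror_pa,
                           const void *data, size_t len)
{
    pa_write(primary_pa, data, len);   /* write to the area to be mirrored */
    pa_write(mirror_pa, data, len);    /* full copy to the mirror area     */
}

int main(void)
{
    const char msg[] = "first data";
    mirrored_write(0x1000, 0x8000, msg, sizeof msg);
    printf("primary: %s, mirror: %s\n",
           (char *)&phys_mem[0x1000], (char *)&phys_mem[0x8000]);
    return 0;
}
```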
Step 430, node 110A sends a read indication to management node 120.
The read indication is used to indicate that the first data is read from the area to be mirrored. For example, the node 110A determines the physical address of the region to be mirrored according to the virtual address query address mapping table of the region to be mirrored, and the read indication includes the physical address of the region to be mirrored.
When no uncorrectable errors occur in the area to be mirrored, step 440 is performed. When an uncorrectable error occurs in the area to be mirrored, step 450 is performed.
Step 440, the management node 120 reads the first data from the area to be mirrored. The management node 120 feeds back the first data to the node 110A.
Step 450, the management node 120 reads the first data from the mirrored region.
The management node 120 determines that an uncorrectable error occurs in the region to be mirrored, queries the mirroring relationship according to the physical address of the region to be mirrored, determines the physical address of the mirrored region of the region to be mirrored, and reads the first data from the mirrored region according to the physical address of the mirrored region.
The management node 120 reads the first data from the area to be mirrored or reads the first data from the mirrored area, and then feeds back the first data to the node 110A.
In some embodiments, after node 110A reads data from the area to be mirrored, it checks the data and may determine that the read data is erroneous, e.g., that the data read is not the first data. If node 110A cannot correct the erroneous data using ECC techniques, it instructs management node 120 to read the first data from the mirror area, i.e., step 450 is performed.
If management node 120 supports a cache coherence protocol such as CXL 3.0 or a p2p mode, it writes the first data back into the area to be mirrored after reading it from the mirror area. If management node 120 does not support such a protocol, it feeds back the first data read from the mirror area to node 110A, and node 110A requests management node 120 to write the first data into the area to be mirrored.
If the first data is written into the area to be mirrored successfully, the area has no hardware fault and the earlier error was transient. If writing the first data into the area to be mirrored fails, the area has a hardware fault, and an active-standby switch between the area to be mirrored and the mirror area is initiated.
Further, when an uncorrectable error occurs in the area to be mirrored, management node 120 may perform an active-standby switch between the area to be mirrored and the mirror area; for example, management node 120 determines the mirror area to be the primary storage space. Node 110 can then continue to read and write the first data.
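A toy model of the read path of steps 430-450, with a flag simulating an uncorrectable error in the area to be mirrored; the accessor, the write-back repair, and all names are assumptions of this sketch, and the active-standby switch described above is noted in a comment rather than implemented:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model: a flat array stands in for physical memory and one flag
 * simulates a UCE in the area to be mirrored. */
static uint8_t phys_mem[1 << 16];
static bool    uce_at_primary = false;   /* simulated uncorrectable error */

static bool pa_read(uint64_t pa, void *buf, size_t len, bool faulty)
{
    if (faulty)
        return false;                    /* read hits a UCE */
    memcpy(buf, &phys_mem[pa], len);
    return true;
}

/* Step 440: read from the area to be mirrored when no UCE occurs.
 * Step 450: on a UCE, read from the mirror area instead, then attempt a
 * write-back repair of the primary copy; if that write failed, an
 * active-standby switch would make the mirror area the main storage. */
static bool mirrored_read(uint64_t primary_pa, uint64_t mirror_pa,
                          void *buf, size_t len)
{
    if (pa_read(primary_pa, buf, len, uce_at_primary))
        return true;                               /* step 440 */
    if (!pa_read(mirror_pa, buf, len, false))
        return false;                              /* both copies lost */
    memcpy(&phys_mem[primary_pa], buf, len);       /* write-back repair */
    return true;                                   /* step 450 */
}

int main(void)
{
    char out[16];
    memcpy(&phys_mem[0x100], "first data", 11);    /* primary copy */
    memcpy(&phys_mem[0x800], "first data", 11);    /* mirror copy  */

    uce_at_primary = true;                         /* inject a UCE */
    if (mirrored_read(0x100, 0x800, out, 11))
        printf("read via mirror: %s\n", out);
    return 0;
}
```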
It will be appreciated that, in order to implement the functions of the above embodiments, the management node includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application scenario and design constraints imposed on the solution.
The memory mirroring method provided according to the present embodiment is described in detail above with reference to fig. 1 to 4, and the management device and the node provided according to the present embodiment will be described below with reference to fig. 5.
Fig. 5 is a schematic structural diagram of a possible management apparatus according to this embodiment. These management devices may be used to implement the functions of the management node in the above method embodiments, so that the beneficial effects of the above method embodiments may also be implemented. In this embodiment, the management device may be the management node 120 shown in fig. 3 or fig. 4, or may be a module (such as a chip) applied to a server.
As shown in fig. 5, the management apparatus 500 includes a communication module 510, a control module 520, and a storage module 530. The management device 500 is used to implement the functions of the management node 120 in the method embodiments shown in fig. 3 or fig. 4 described above.
The communication module 510 is configured to receive a memory mirroring requirement from a first node requesting to mirror a first area in the memory used by the first node. For example, the communication module 510 is configured to perform step 320 in fig. 3.
The control module 520 is configured to allocate a second area when the first node requests to mirror a first area in the memory used by the first node, where the second area is a mirror area of the first area, the second area is used to indicate a storage space in the second node that is the same as the first area in size, and the second area is used to store data of the first area in a backup manner. For example, the control module 520 is configured to perform step 330 of fig. 3.
The control module 520 is further configured to generate a mirror relationship between the first area and the second area, where the mirror relationship is used to indicate a correspondence between the first physical address and a second physical address, and the second physical address is used to indicate the second area.
The communication module 510 is further configured to receive a write operation or a read operation on the first area. For example, the communication module 510 is configured to perform step 340 in fig. 3. For example, the communication module 510 is configured to perform step 420, step 440, and step 450 of fig. 4.
The control module 520 is further configured to perform a write operation or a read operation on the first area and the second area according to the mirror relationship, as illustrated in the sketch following these module descriptions.
The communication module 510 is further configured to feed back to the node that the mirroring was successful. For example, the communication module 510 is configured to perform step 340 in fig. 3.
The communication module 510 is further configured to send a memory mirror release request to the node. For example, the communication module 510 is configured to perform step 350 of fig. 3.
The storage module 530 is configured to store the mirror relationship such that the control module 520 accesses the mirror region according to the mirror relationship.
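As a concrete illustration of how the control module 520 and the storage module 530 might cooperate, the following C sketch records the correspondence between the first physical address and the second physical address, and duplicates every write to the first area into the second area. The table layout, the names (mirror_relation, create_mirror, mirrored_write), and the simulated physical memory are assumptions for illustration only, not the patented design.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define MAX_MIRRORS 16

    /* Simulated physical memory so the sketch is self-contained. */
    static uint8_t phys[1 << 16];

    typedef struct {
        uint64_t first_pa;   /* first physical address: region to be mirrored */
        uint64_t second_pa;  /* second physical address: mirror region        */
        size_t   size;       /* the two regions are the same size             */
        int      valid;
    } mirror_relation;

    /* The mirror relationships kept by the storage module 530. */
    static mirror_relation table[MAX_MIRRORS];

    /* Record a mirror relationship once a second area of equal size has
     * been allocated; returning 0 corresponds to feeding back to the node
     * that mirroring succeeded. */
    int create_mirror(uint64_t first_pa, uint64_t second_pa, size_t size)
    {
        for (int i = 0; i < MAX_MIRRORS; i++) {
            if (!table[i].valid) {
                table[i] = (mirror_relation){ first_pa, second_pa, size, 1 };
                return 0;
            }
        }
        return -1;   /* no free entry in the mirror table */
    }

    /* Write path: data written to the first area is also written to the
     * second area according to the mirror relationship. */
    int mirrored_write(uint64_t pa, const void *data, size_t len)
    {
        memcpy(phys + pa, data, len);                        /* primary copy */
        for (int i = 0; i < MAX_MIRRORS; i++) {
            const mirror_relation *m = &table[i];
            if (m->valid && pa >= m->first_pa &&
                pa + len <= m->first_pa + m->size) {
                uint64_t off = pa - m->first_pa;
                memcpy(phys + m->second_pa + off, data, len); /* mirror copy */
            }
        }
        return 0;
    }

Under these assumptions, releasing a mirror (as in the memory mirror release request above) would amount to clearing the corresponding valid flag, so that the second area's storage can be returned for other uses.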
It should be appreciated that the management apparatus 500 of the embodiments of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the memory mirroring method shown in fig. 3 or fig. 4 is implemented by software, each module of the management apparatus 500 may be a software module.
The management apparatus 500 according to the embodiments of the present application may correspond to performing the methods described in the embodiments of the present application, and the above and other operations and/or functions of each unit in the management apparatus 500 are respectively for implementing the corresponding flow of each method in fig. 3 or fig. 4, and are not described herein for brevity.
Fig. 6 is a schematic structural diagram of a computing device 600 according to this embodiment. As shown, the computing device 600 includes a processor 610, a bus 620, a storage 630, a communication interface 640, and a memory unit 650 (which may also be referred to as a main memory unit). The processor 610, the storage 630, the memory unit 650, and the communication interface 640 are connected through the bus 620.
It should be appreciated that, in this embodiment, the processor 610 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The processor may also be a graphics processor (graphics processing unit, GPU), a neural network processor (neural network processing unit, NPU), a microprocessor, ASIC, or one or more integrated circuits for controlling program execution in the present application.
The communication interface 640 is used to enable communication of the computing device 600 with external devices or appliances. In this embodiment, when the computing device 600 is used to implement the functions of the management node 120 shown in fig. 1, the communication interface 640 is used to obtain the memory mirroring requirements, and the processor 610 allocates the mirroring area. When computing device 600 is used to implement the functionality of node 110 shown in fig. 1, communication interface 640 is used to send memory mirroring requirements.
Bus 620 may include a path for transferring information between components such as the processor 610, the memory unit 650, and the storage 630. In addition to a data bus, the bus 620 may include a power bus, a control bus, a status signal bus, and the like; for clarity of illustration, however, the various buses are all labeled as bus 620 in the drawing. The bus 620 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) link, or the like. The bus 620 may be divided into an address bus, a data bus, a control bus, and the like.
As one example, computing device 600 may include multiple processors. The processor may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or computing units for processing data (e.g., computer program instructions). In this embodiment, when the computing device 600 is configured to implement the function of the management node 120 shown in fig. 3, the processor 610 is further configured to allocate, when the first node requests to mirror a first area in the memory used by the first node, a second area, where the second area is a mirror area of the first area, and the second area is used to indicate a storage space in the second node that is the same size as the first area, and the second area is used to store data of the first area in a backup manner.
The processor 610 is also configured to request a write or read operation to the mirrored region when the computing device 600 is configured to implement the functionality of the node 110 shown in fig. 4.
When the computing device 600 is used to implement the functionality of the management node 120 shown in fig. 4, the processor 610 is further configured to perform a write operation or a read operation on the mirrored region according to the mirroring relationship.
It should be noted that fig. 6 takes a computing device 600 with one processor 610 and one storage 630 merely as an example; the processor 610 and the storage 630 each indicate a type of component, and in a specific embodiment the number of components of each type may be determined according to service requirements.
The memory unit 650 may correspond to the memory used to store information such as the mirror relationship in the foregoing method embodiments. The memory unit 650 may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The storage 630 may correspond to the storage medium used for storing information such as computer instructions, memory operation instructions, and node identifiers in the foregoing method embodiments, for example, a magnetic disk such as a mechanical hard disk, or a solid-state drive.
The computing device 600 may be a general purpose device or a special purpose device. For example, computing device 600 may be an edge device (e.g., a box carrying a chip with processing capabilities), or the like. Alternatively, computing device 600 may be a server or other computing device.
It should be understood that the computing device 600 according to this embodiment may correspond to the management apparatus 500 in this embodiment, and may correspond to the corresponding execution body in any one of the methods of fig. 3 or fig. 4; the above and other operations and/or functions of the modules in the management apparatus 500 are respectively intended to implement the corresponding flows of the methods in fig. 3 or fig. 4, and are not repeated herein for brevity.
The method steps in this embodiment may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a computing device. The processor and the storage medium may also reside as discrete components in a computing device.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid-state drive (SSD). While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

1. A data processing system, the data processing system comprising a plurality of nodes and a management node;
the first node is used for requesting to mirror a first area in a memory used by the first node;
the management node is configured to allocate a second area, where the second area is a mirror image area of the first area, the second area is used to indicate a storage space in the second node that is the same as the first area in size, and the second area is used to store data of the first area in a backup manner.
2. The system of claim 1, wherein the first node indicates a first physical address of the first region;
the management node is further configured to generate a mirror relationship between the first area and the second area, where the mirror relationship is used to indicate a correspondence between the first physical address and a second physical address, and the second physical address is used to indicate the second area.
3. The system according to claim 1 or 2, wherein,
the management node is further configured to receive a write instruction sent by the first node, where the write instruction is used to instruct to store first data in the first area;
The management node is further configured to write the first data into the first area and the second area.
4. The system according to claim 3, wherein,
the management node is further configured to receive a read instruction of the first node, where the read instruction is used to instruct reading of the first data from the first area;
the management node is further configured to read the first data from the first area when no uncorrectable error occurs in the first area.
5. The system according to claim 4, wherein,
the management node is further configured to read the first data from the second area when an uncorrectable error occurs in the first area.
6. The system of claim 5, wherein the first region is a primary storage space and the second region is a backup storage space;
the management node is further configured to determine the second area as a main storage space when an uncorrectable error occurs in the first area.
7. The system of any one of claims 1-6, wherein,
the management node is further configured to instruct the first node to modify the image identifier of the first area to be invalid.
8. The system of any of claims 1-7, wherein the size of the first region is determined by application requirements.
9. The system of any of claims 1-8, wherein the second region includes any of a local storage space of the second node, an extended storage space of the second node, and a storage space of the second node in a global memory pool.
10. The system of any of claims 1-9, wherein the management node supports a cache coherence protocol.
11. A memory mirroring method, wherein a data processing system comprises a plurality of nodes and a management node, and the method comprises:
the method comprises the steps that a first node requests to mirror a first area in a memory used by the first node;
the management node allocates a second area, wherein the second area is a mirror area of the first area, the second area is used for indicating a storage space in the second node that is the same size as the first area, and the second area is used for backing up and storing the data of the first area.
12. The method of claim 11, wherein the first node indicates a first physical address of the first region; the method further comprises the steps of:
The management node generates a mirror relationship between the first area and the second area, wherein the mirror relationship is used for indicating a corresponding relationship between the first physical address and a second physical address, and the second physical address is used for indicating the second area.
13. The method according to claim 11 or 12, characterized in that the method further comprises:
the management node receives a write instruction sent by the first node, wherein the write instruction is used for indicating to store first data into the first area;
the management node writes the first data to the first area and the second area.
14. The method of claim 13, wherein the method further comprises:
the management node receives a read instruction of the first node, wherein the read instruction is used for indicating the first data to be read from the first area;
the management node reads the first data from the first area when an uncorrectable error does not occur in the first area.
15. The method of claim 14, wherein the method further comprises:
the management node reads the first data from the second area when an uncorrectable error occurs in the first area.
16. The method of claim 15, wherein the first region is a primary storage space and the second region is a backup storage space, the method further comprising:
the management node determines the second region as a primary storage space when an uncorrectable error occurs in the first region.
17. The method according to any one of claims 11-16, further comprising:
the management node instructs the first node to modify the image identification of the first region to be invalid.
18. The method of any of claims 11-17, wherein the size of the first region is determined by application requirements.
19. The method of any of claims 11-18, wherein the second region includes any of a local storage space of the second node, an extended storage space of the second node, and a storage space of the second node in a global memory pool.
20. The method according to any of claims 11-19, wherein the management node supports a cache coherence protocol.
21. A management apparatus for use in a data processing system, the data processing system comprising a base node, the base node comprising a first node and a second node, the apparatus comprising:
The control module is used for distributing a second area when the first node requests to mirror a first area in the memory used by the first node, wherein the second area is a mirror area of the first area, the second area is used for indicating a storage space with the same size as the first area in the second node, and the second area is used for backing up and storing data of the first area.
22. The apparatus of claim 21, wherein the first node indicates a first physical address of the first region;
the control module is further configured to generate a mirror relationship between the first area and the second area, where the mirror relationship is used to indicate a correspondence between the first physical address and a second physical address, and the second physical address is used to indicate the second area.
23. A computing device, comprising a memory and at least one processor, wherein the memory is configured to store a set of computer instructions; and when the processor executes the set of computer instructions, the computing device performs the method of any one of claims 11-20.
CN202211519995.3A 2022-09-09 2022-11-30 Data processing system, memory mirroring method, memory mirroring device and computing equipment Pending CN117687835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/102963 WO2024051292A1 (en) 2022-09-09 2023-06-27 Data processing system, memory mirroring method and apparatus, and computing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211105202 2022-09-09
CN2022111052023 2022-09-09

Publications (1)

Publication Number Publication Date
CN117687835A true CN117687835A (en) 2024-03-12

Family

ID=90127199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211519995.3A Pending CN117687835A (en) 2022-09-09 2022-11-30 Data processing system, memory mirroring method, memory mirroring device and computing equipment

Country Status (2)

Country Link
CN (1) CN117687835A (en)
WO (1) WO2024051292A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10037371B1 (en) * 2014-07-17 2018-07-31 EMC IP Holding Company LLC Cumulative backups
KR20210041655A (en) * 2019-10-07 2021-04-16 삼성전자주식회사 Memory chip, memory system having the same and operating method thereof
CN113282342A (en) * 2021-05-14 2021-08-20 北京首都在线科技股份有限公司 Deployment method, device, system, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2024051292A1 (en) 2024-03-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination