WO2023143039A1 - A data processing method and device - Google Patents

A data processing method and device

Info

Publication number
WO2023143039A1
Authority
WO
WIPO (PCT)
Prior art keywords
operating system
slave
memory
request
slave operating
Prior art date
Application number
PCT/CN2023/071510
Other languages
English (en)
French (fr)
Inventor
屈欢
高军
高超
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023143039A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • The present application relates to the field of computer technology, and in particular to a data processing method and device.
  • Memory is one of the more error-prone components on server motherboards. As memory capacity and memory speed continue to double, the probability of memory errors doubles as well. Memory errors may cause system failures, and system recovery usually takes a long time, which has a great impact on business.
  • the present application provides a data processing method and device, which can improve system reliability and reduce the impact of memory errors on services without changing the original memory of the device.
  • The embodiment of the present application provides a data processing method that can be applied to a first computing device, where the first computing device runs at least a master operating system and a first slave operating system. In the method, when the first slave operating system is faulty and cannot handle a request, the master operating system of the first computing device sends the request to a second slave operating system, where the second slave operating system may be another slave operating system already running on the first computing device, or a slave operating system running on a second computing device; the second slave operating system then processes the request.
  • The main operating system is used to provide services for external devices, such as receiving requests, and the first slave operating system and the second slave operating system are both used to process requests.
  • When the first slave operating system fails, the main operating system can send the request to the second slave operating system for processing, so a memory error in the first slave operating system will not cause service interruption, thereby improving system reliability.
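The failover path described above can be sketched as follows. This is an illustrative sketch only; all class and method names (`MasterOS`, `SlaveOS`, `dispatch`) are hypothetical and not taken from the application:

```python
# Minimal sketch of master-OS request routing with slave-OS failover.
# Names and structure are hypothetical; the application does not
# prescribe an implementation.

class SlaveOS:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is faulty")
        return f"{self.name} processed {request}"

class MasterOS:
    """Receives client requests and dispatches them to slave OSes,
    falling back to the next slave when the preferred one is faulty."""
    def __init__(self, slaves):
        self.slaves = slaves

    def dispatch(self, request):
        for slave in self.slaves:
            if slave.healthy:
                return slave.handle(request)
        raise RuntimeError("no healthy slave OS available")

slave1 = SlaveOS("slave-1", healthy=False)  # first slave OS has failed
slave2 = SlaveOS("slave-2")                 # second slave OS (local or on another device)
master = MasterOS([slave1, slave2])
result = master.dispatch("read req#1")      # routed past the faulty slave
```

Because the master OS only routes requests and does not process them itself, a failed slave OS is invisible to the client as long as any other slave OS remains healthy.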
  • the memory reliability of the master operating system is higher than that of any slave operating system.
  • the main operating system is used to communicate with the client.
  • Because the memory reliability of the main operating system is high, the probability of memory errors in the main operating system is low.
  • As long as the main operating system does not fail, it can continue to provide services to the client, and services will not be interrupted, thereby improving system reliability and reducing the impact of memory failures on services.
  • The first computing device includes a first hardware resource group and a second hardware resource group; each hardware resource group includes processor resources and memory resources. The first hardware resource group provides resources for the main operating system, and the second hardware resource group provides resources for the first slave operating system.
  • The hardware resource group of the main operating system differs from that of the first slave operating system, which provides a way to configure the memory type and memory capacity of the main operating system while ensuring that the main operating system and the first slave operating system are isolated from each other. Because the main operating system will not access the memory of the first slave operating system, and its memory capacity is lower than it would be if the first computing device ran a single operating system, the probability of memory errors is also low, improving the reliability of the main operating system in providing services to the client.
  • the memory of the main operating system is configured using a mirroring technology.
  • The main operating system uses technologies such as mirroring to configure its memory, thereby improving the memory reliability of the main operating system, reducing the probability of service interruption caused by the main operating system encountering a memory error, and improving the overall reliability of the system.
  • the master operating system sends the request to the first slave operating system; the first slave operating system processes the request.
  • the aforementioned request may be a request from the client, such as a write request or a read request
  • The second slave operating system processes the request as follows: if the request is a write request, the second slave operating system writes the data to be written carried in the request into the storage device; or, if the request is a read request, the second slave operating system obtains the data requested to be read from the storage device and sends the data to the main operating system.
  • The embodiment of the present application also provides a data processing device that has the function of implementing the first computing device in the method embodiment of the first aspect above; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The structure of the data processing device includes a master operating system instance, a first slave operating system instance, and optionally a second slave operating system instance; these instances can execute the method of the first aspect above. For the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • The embodiment of the present application also provides a data processing device that has the function of implementing the first computing device in the method example of the first aspect above; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • the structure of the device includes a first processor, a second processor, a first memory, and a second memory, and optionally, may also include a communication interface.
  • the first processor and the second processor are configured to support the data processing device to perform corresponding functions in the method of the first aspect above.
  • The structure of the data processing device also includes a communication interface for communicating with other devices, such as sending a request from the client to the second computing device, or receiving data requested by the client from the second computing device; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • The present application also provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the method of the first aspect above and each possible implementation of the first aspect.
  • the beneficial effects may refer to the description of the first aspect and will not be repeated here.
  • The present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method of the first computing device in the first aspect above and each possible implementation of the first aspect.
  • The present application also provides a computer chip connected to a memory; the chip is used to read and execute the software program stored in the memory, and to implement the method of the first aspect above and each possible implementation of the first aspect.
  • the beneficial effects may refer to the description of the first aspect and will not be repeated here.
  • The embodiment of the present application also provides a data processing system. The system includes at least a first computing device and a second computing device; the first computing device runs at least a first master operating system and a first slave operating system, and the second computing device runs at least a second master operating system and a second slave operating system. The first computing device has the functions of the first computing device in the method example of the first aspect above, and the second slave operating system is configured to receive and process the request sent by the first master operating system when the first slave operating system of the first computing device is faulty and cannot process the request.
  • For the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • The embodiment of the present application also provides a system including a client and a first computing device, where the first computing device runs at least a master operating system and a first slave operating system. The client sends a request to the first computing device, and the request is used to request access to data. The first computing device has the function of implementing the first computing device in the method example of the first aspect above; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • FIG. 1 is a schematic diagram of a hardware architecture of a system provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a software architecture of a controller provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart corresponding to a data processing method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scene of a data processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another data processing method provided by the embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • User state/kernel state: the so-called user state and kernel state refer to operating states of the operating system.
  • computer systems generally implement hierarchical protection, that is, according to the severity of the impact on the computer system, certain operations must be performed by certain roles with corresponding permissions. For example, operations such as direct access to hardware and modification of hardware operating modes require the highest permissions to execute.
  • Modern CPUs generally provide multiple operating permission levels.
  • the operating system is generally divided into multiple operating states to cooperate with the CPU.
  • the common states of the operating system are user mode and kernel mode.
  • The kernel state generally has the highest authority, and the CPU allows all instructions and operations to be executed; the user state generally has lower authority, in which software programs can only execute limited instructions and operations, and high-risk operations are not allowed by the CPU hardware, such as configuring the CPU internal control registers or accessing memory addresses belonging to the kernel.
  • a process can refer to a running activity of a program with certain independent functions, or a carrier for running an application program. It can also be understood that a process is a running instance of an application program and is a dynamic execution of the application program. For example, when a user runs the Notepad program (Notepad), the user creates a process for accommodating the codes that make up Notepad.exe and the dynamic link library that it needs to call.
  • Memory errors include correctable errors (corrected error, CE) and uncorrectable errors (uncorrected error, UCE). A correctable error is an error that can be corrected by the memory error checking and correction function; conversely, an uncorrectable error is one that cannot be corrected by that function.
  • FIG. 1 is a schematic diagram of a possible applicable system architecture provided by the embodiment of the present application.
  • the system architecture includes an application server 100 , a switch 101 , and a storage system 120 .
  • the application server 100 may be a physical machine or a virtual machine. Physical application servers include, but are not limited to, desktops, servers, laptops, and mobile devices.
  • the application server accesses the storage system 120 through the optical fiber switch 101 to access data.
  • the switch 101 is only an optional device, and the application server 100 can also directly communicate with the storage system 120 through the network.
  • the storage system 120 shown in FIG. 1 is a centralized storage system.
  • the characteristic of the centralized storage system is that there is a unified entrance, and all data from external devices must pass through this entrance, and this entrance is the engine 121 of the centralized storage system.
  • The engine 121 is the core component of the centralized storage system, where many of the storage system's advanced functions are implemented.
  • As shown in FIG. 1, there are one or more controllers within the engine 121.
  • Figure 1 illustrates that the engine includes two controllers as an example. There is a mirror channel between controller 0 and controller 1, so that the two controllers can back up each other.
  • the engine 121 also includes a front-end interface 125 and a back-end interface 126 , wherein the front-end interface 125 is used to communicate with the application server 100 to provide storage services for the application server 100 .
  • the back-end interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 is connected with more hard disks 134, thereby forming a very large storage resource pool.
  • The hardware components and software structure of controller 1 (and other controllers not shown in FIG. 1) are similar to those of controller 0; below, controller 0 is taken as an example for illustration.
  • the controller 0 includes at least a processor 123 and a memory 124 .
  • Processor 123 is a central processing unit (central processing unit, CPU), which is used to process data access requests from outside the storage system (server or other storage systems), and is also used to process requests generated inside the storage system, such as read requests, write requests, etc.
  • the processor 123 may preferentially store the data in the memory, such as the memory 124 .
  • When the amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 to the hard disk 134 through the back-end interface 126 for persistent storage. It should be noted that only one processor 123 is shown in FIG. 1; in actual applications there are often multiple processors 123, each with one or more CPU cores. This embodiment does not limit the number of processors or CPU cores.
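The threshold-triggered flush described above can be sketched as a simple write-back buffer. The threshold value and all names are illustrative only, not part of the application:

```python
# Illustrative write-back cache: data accumulates in memory and is
# flushed to persistent storage once a size threshold is crossed
# (cf. memory 124 flushing to hard disk 134 via the back-end interface).
class WriteBackCache:
    def __init__(self, threshold_bytes, disk):
        self.threshold = threshold_bytes
        self.buffer = []          # data held in memory
        self.buffered_bytes = 0
        self.disk = disk          # stands in for the hard disk

    def write(self, data: bytes):
        self.buffer.append(data)
        self.buffered_bytes += len(data)
        if self.buffered_bytes >= self.threshold:
            self.flush()

    def flush(self):
        self.disk.extend(self.buffer)  # persist everything buffered so far
        self.buffer.clear()
        self.buffered_bytes = 0

disk = []
cache = WriteBackCache(threshold_bytes=8, disk=disk)
cache.write(b"abcd")   # 4 bytes buffered, below threshold
cache.write(b"efgh")   # reaches 8 bytes -> flushed to disk
```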
  • the memory 124 refers to an internal memory directly exchanging data with the processor, it can read and write data at any time, and the speed is very fast.
  • the memory 124 includes various types of memory, such as random access memory (random access memory, RAM) and read only memory (Read Only Memory, ROM).
  • Random access memory includes, but is not limited to: dynamic random access memory (DRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), static random access memory (SRAM), etc.
  • Read-only memory includes but is not limited to: Programmable Read Only Memory (Programmable Read Only Memory, PROM), Erasable Programmable Read Only Memory (Erasable Programmable Read Only Memory, EPROM), etc.
  • multiple memories 124 and different types of memories 124 may be configured in the controller 0 . This embodiment does not limit the quantity and type of the memory 124 .
  • The memory 124 is used as temporary data storage for an operating system (operating system, OS) or other running programs; program code is stored in it, and the processor 123 executes the program code stored in the memory 124 to realize the functions designed into that code. If a memory error occurs during operation of the processor 123, it will cause a process failure, and the process needs to be restarted, resulting in service interruption; in more serious cases, the operating system needs to be restarted, increasing the time needed for service recovery. It can be seen that the memory 124 plays a vital role in the normal operation of the system. However, as memory capacity and frequency increase, the probability of memory errors gradually increases, making the reliability of the system difficult to guarantee and the impact on business immeasurable.
  • an embodiment of the present application provides a data processing method, which is used to improve system reliability and reduce the impact of memory errors on services.
  • the software structure applicable to the data processing method is firstly introduced.
  • the following describes the software structure of the controller 0 provided in the embodiment of the present application by taking the controller 0 in the system shown in FIG. 1 as an example with reference to FIG. 2 .
  • The controller 0 has at least two operating systems installed and running (FIG. 2 shows only two operating systems, but the embodiment of the present application does not limit this); the at least two operating systems include a master operating system (master OS) and one or more slave operating systems (slave OS) (FIG. 2 shows only one slave operating system, but this embodiment of the present application does not limit it).
  • The main operating system and the slave operating system have different functions and memory reliability levels, which are introduced as follows:
  • The main operating system is used to provide services for user equipment (such as the application server 100). For example, when providing the aforementioned storage service, the main operating system (or the processor running the main operating system) receives requests sent by the application server 100 through the front-end interface 125, such as read requests, write requests, etc.
  • The main operating system is used to provide services for external devices. As shown in FIG. 2, a service running in the main operating system communicates with the application server 100 through the front-end interface 125.
  • the processes or software that provide services to users run in the user state of the operating system, and the similarities will not be repeated below.
  • the main operating system can also run front-end peripheral drivers. It is assumed that the driver runs in the kernel state, and is used to drive the front-end interface 125 to communicate with the application server 100 .
  • The main operating system can also be used to manage the slave operating system. For example, when the system is powered on, the main operating system is pulled up first; the main operating system allocates hardware resources for the slave operating system and detects whether the slave operating system is running normally, for example by monitoring the heartbeat of the slave operating system to check whether it is faulty. When a slave operating system is detected to be faulty, it can be restored to normal operation by restarting it. Exemplarily, as shown in FIG. 2, the Mgmt OS process running on the master operating system provides the function of managing the slave operating system.
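The heartbeat-based fault detection described above can be sketched as follows; the timeout value and all names are illustrative assumptions, since the application does not specify a detection mechanism:

```python
import time

# Sketch of heartbeat monitoring: the master OS considers a slave OS
# faulty when no heartbeat arrives within a timeout, then "restarts" it.
class HeartbeatMonitor:
    def __init__(self, timeout_s=1.0):
        self.timeout = timeout_s
        self.last_beat = {}   # slave name -> time of its last heartbeat
        self.restarts = []    # record of restarted slaves

    def beat(self, slave, now=None):
        """Called whenever a slave OS sends a heartbeat."""
        self.last_beat[slave] = time.monotonic() if now is None else now

    def check(self, now=None):
        """Periodically run by the master OS's management process."""
        now = time.monotonic() if now is None else now
        for slave, t in self.last_beat.items():
            if now - t > self.timeout:
                self.restart(slave, now)

    def restart(self, slave, now):
        self.restarts.append(slave)
        self.last_beat[slave] = now   # heartbeat resumes after restart

mon = HeartbeatMonitor(timeout_s=1.0)
mon.beat("slave-os-1", now=0.0)
mon.check(now=0.5)   # within timeout: nothing happens
mon.check(now=2.0)   # timeout exceeded: slave is restarted
```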
  • The main operating system also has the function of communicating with other controllers (such as controller 1). Exemplarily, as shown in FIG. 2, when the slave operating system of controller 0 cannot process a request, the received request is forwarded to controller 1, and the request is processed by controller 1.
  • the main operating system also includes a memory management driver, which is used to manage the memory of the main operating system.
  • Memory management strategies include memory mirroring and memory RAID (a redundant-array-of-independent-disks-style structure with parity distributed across memory), among others. Memory mirroring is similar to hard disk hot backup.
  • Memory mirroring keeps two copies of memory data, placed in the main memory and the mirror memory respectively. In this way, when the memory data in the main memory has an error, the data can still be obtained from the mirror memory, thereby improving the memory reliability of the main operating system.
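The mirroring behavior can be sketched as follows; the error model (a set of "corrupted" addresses) and all names are purely illustrative:

```python
# Sketch of memory mirroring: every write is duplicated into a mirror
# copy, and reads fall back to the mirror when the primary copy is bad.
class MirroredMemory:
    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.corrupted = set()   # addresses with uncorrectable errors

    def write(self, addr, value):
        self.primary[addr] = value
        self.mirror[addr] = value   # second copy in mirror memory

    def read(self, addr):
        if addr in self.corrupted:
            return self.mirror[addr]   # recover from the mirror copy
        return self.primary[addr]

mem = MirroredMemory()
mem.write(0x10, b"data")
mem.corrupted.add(0x10)   # simulate a memory error in main memory
value = mem.read(0x10)    # still served, from the mirror
```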
  • Memory RAID refers to distributing data across the memory of the main operating system in units of blocks. The memory RAID level may be RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, or RAID 10. RAID 4 and RAID 5 allow one memory unit to be damaged: when one memory unit is damaged, the data in the other memory units can be used to restore the damaged data. RAID 6 allows two memory units to be damaged.
  • Memory management can also provide basic management of memory, such as memory allocation and release; this application does not specifically limit this function.
  • The slave operating system is used to calculate or process data; in this application, it can be used to process read requests and write requests from the application server 100.
  • The services provided by the slave operating system include, but are not limited to: data plane services, metadata services, and back-end disk processing services.
  • Drivers running in the kernel mode of the slave operating system include but are not limited to: back-end peripheral drivers, memory management, etc.
  • the data plane service can be used to process the read request and write request from the application server 100.
  • For a read request, the data requested to be read can be obtained from the memory 124, or fetched from the hard disk 134 through the back-end interface 126.
  • For a write request, the data to be written carried in the write request can be written into the memory 124, or written into the hard disk 134 through the back-end interface 126.
  • The data to be written can also be further calculated or processed, for example by data deduplication, data compression, and data verification.
  • the data plane service may run in the user mode of the slave operating system.
  • the slave operating system may read or write to the hard disk 134 through the back-end interface 126 through the back-end peripheral driver running in the kernel mode.
  • The metadata service is used to generate metadata.
  • Memory management is used to manage the memory of the slave operating system, such as memory allocation and release.
  • The master OS and the slave OS can also have more or fewer functions than those shown in Figure 2. For example, the master OS can also include a service for managing the slave OS, infrastructure services, and scheduling management, and the slave OS may also include control plane services, scheduling services, etc.
  • The service for managing the slave OS detects, for example, whether the slave OS is faulty, and restarts the slave OS if it is faulty.
  • Infrastructure services are used to manage basic components such as CPUs and threads; refer to the introduction of related technologies, which is not enumerated here. This application does not specifically limit the functions of the master operating system and the slave operating system.
  • the names of the above-mentioned services or drivers are just surrogate names, and may have different names in different application scenarios, which are not specifically limited in this application.
  • The hardware used by the main operating system and the slave operating system all comes from controller 0; for the hardware structure of controller 0, refer to the introduction in Figure 1, which is not repeated here.
  • The main operating system and the slave operating system each correspond to their own set of hardware; in other words, the hardware of different operating systems is isolated from each other: the master operating system will not use the hardware of the slave operating system, and the slave operating system will not use the hardware of the main operating system.
  • The process of hardware allocation may include splitting (or isolating) the hardware of controller 0 (including but not limited to the processor 123, memory 124, front-end interface 125, and back-end interface 126) into two parts to obtain two hardware groups (such as hardware group 1 and hardware group 2 in Fig. 2), each of which is uniquely assigned to one operating system. For example, assuming controller 0 includes two processors 123, one processor 123 can be allocated to the master operating system for running the master operating system, and the other processor 123 can be assigned to the slave operating system for running the slave operating system.
  • When a processor 123 has multiple cores, the cores can be split into two parts, where one part of the cores is allocated to the main operating system and the remaining cores are allocated to the slave operating system.
  • the memory 124 is divided into two parts. Similarly, a part of the memory is allocated to the master operating system, and the rest is allocated to the slave operating system.
  • the front-end interface 125 is assigned to the master operating system
  • the back-end interface 126 is assigned to the slave operating system.
  • the hardware splitting here can be realized by multi-OS partition technology, and can also be realized by other methods, which is not specifically limited in this application.
  • hardware group 1 is allocated to the main operating system, and hardware group 1 includes part of processor resources, part of memory resources and front-end interface 125 of controller 0 .
  • the hardware group 2 is allocated to the slave operating system, and the hardware group 2 includes the rest of the processor resources of the controller 0 , the rest of the memory resources and the back-end interface 126 .
  • the main operating system uses the hardware of hardware group 1 and does not use the hardware of hardware group 2.
  • the slave operating system uses the hardware of hardware group 2, and does not use the hardware of hardware group 1.
  • The number or capacity of the hardware included in the above two hardware groups is not necessarily equal; in other words, the hardware does not need to be evenly distributed when splitting. For example, the memory capacity of the master OS differs from that of the slave OS and may be much lower; that is, the memory capacity included in hardware group 1 is lower than that included in hardware group 2. Assuming the total capacity of the memory 124 included in controller 0 is 4 GB, the memory capacity allocated to the main operating system can be 512 MB, and the rest of the memory (3 GB + 512 MB) is all allocated to the slave operating system. This is just an example, and this application does not specifically limit it.
  • The above allocation method gives more memory to the slave operating system, which ensures the performance of the slave operating system in processing requests, and relatively less memory to the master operating system, which reduces the probability of the master operating system encountering a memory error; the lower the probability of memory errors, the higher the memory reliability. This also reflects that the memory reliability levels of the master and slave operating systems differ: in this application, the memory reliability of the master operating system is higher than that of the slave operating system.
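The uneven split described above (e.g. 512 MB of a 4 GB total for the master OS, the remaining 3.5 GB for the slave OS) can be expressed as a simple resource-group configuration. The structure and field names are illustrative assumptions, not the application's format:

```python
# Illustrative hardware-group split: resources are partitioned so that
# each OS uses only its own group (cf. hardware group 1 / group 2).
TOTAL_MEMORY_MB = 4096

hardware_group_1 = {           # master OS: small, reliable memory
    "cpus": [0],
    "memory_mb": 512,
    "interfaces": ["front-end"],
}
hardware_group_2 = {           # slave OS: bulk of memory for data processing
    "cpus": [1],
    "memory_mb": TOTAL_MEMORY_MB - hardware_group_1["memory_mb"],
    "interfaces": ["back-end"],
}

# The groups are disjoint: no CPU or interface is shared between OSes.
assert not set(hardware_group_1["cpus"]) & set(hardware_group_2["cpus"])
assert not set(hardware_group_1["interfaces"]) & set(hardware_group_2["interfaces"])
```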
  • The type of memory allocated to the main operating system and the type allocated to the slave operating system can be the same or different. For example, the memory allocated to the master operating system includes SRAM, while the memory allocated to the slave operating system includes SRAM and DRAM, etc., thereby further improving the memory reliability and response speed of the main OS.
  • controller 1 includes at least one master operating system and one slave operating system.
  • The software structures of controller 1 and controller 0 can be the same or different; for example, both controller 0 and controller 1 include one master OS and one slave OS, or controller 0 includes one master OS and one slave OS while controller 1 includes one master OS and multiple slave OSs, which is not limited in this application.
  • the above, with reference to Figure 1, describes the hardware structure and software structure of a system applicable to the embodiments of this application.
  • Figure 1 shows a centralized storage system with integrated disk and controller: the engine 121 has hard disk slots, and the hard disk 134 can be deployed directly in the engine 121, that is, the hard disk 134 and the engine 121 are deployed in the same device.
  • the storage system 120 may also be a storage system with separated disk and controller, in which the engine 121 may have no hard disk slots, the hard disk 134 needs to be placed in the hard disk enclosure 130, and the back-end interface 126 communicates with the hard disk enclosure 130.
  • the back-end interface 126 exists in the engine 121 in the form of an adapter card, and one engine 121 can use two or more back-end interfaces 126 to connect multiple hard disk enclosures at the same time.
  • the adapter card can also be integrated on the motherboard, in which case it can communicate with the processor 123 through a peripheral component interconnect express (PCI-E) bus.
  • the data processing method provided by the embodiments of this application is applicable not only to centralized storage systems but also to distributed storage systems; Figure 3 shows a distributed storage system provided by an embodiment of this application.
  • the server cluster includes one or more servers 110 (three servers 110 are shown in FIG. 3, but the number is not limited to three), and the servers 110 can communicate with each other.
  • the server 110 includes at least a processor 112, a memory 113, and a network card 114, and optionally a hard disk 105.
  • the processor 112, the memory 113, the network card 114 and the hard disk 105 are connected through a bus.
  • the software structure of each server 110 can refer to the introduction of the software structure of the controller 0 shown in FIG. 2 , which will not be repeated here.
  • controller 0 shown in FIG. 1 and the server 110 shown in FIG. 3 are only examples. In actual products, the controller 0 and the server 110 may have more or fewer components.
  • the controller 0 and the server 110 may also include input/output devices such as a keyboard, a mouse, and a display screen.
  • the present application does not specifically limit the hardware structure of the device in the applicable system of the embodiment of the present application, and any device that can install at least two operating systems is applicable to the embodiment of the present application.
  • the method can be performed by controller 0 or controller 1, where controller 0 runs at least one master OS and one slave OS, and controller 1 also runs at least one master OS and one slave OS; controller 0 is taken as an example below.
  • Fig. 4 is a schematic flow chart of a data processing method provided in the embodiment of the present application. As shown in Fig. 4, the method includes the following steps:
  • Step 401: the master OS of controller 0 receives the request sent by the application server 100.
  • the master OS can configure its memory using mirroring technology.
  • illustratively, the master OS backs up the memory data generated throughout the process of handling the request from the application server 100 to a mirror memory space. In this way, when the master OS accesses a memory error, the memory data can be restored from the mirror memory space without causing a master OS failure, improving the master OS's memory reliability and reducing the probability of system failure and the impact on services.
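The mirroring idea above can be sketched minimally as follows. The class and its dictionary-backed "memory" are illustrative assumptions, not the patent's implementation; the point is only that every write is duplicated so a corrupted primary copy can be restored from the mirror.

```python
# Illustrative sketch (an assumption, not the actual mechanism): each write to
# the master OS's memory is duplicated into a mirror region, and a read that
# finds the primary copy missing recovers the value from the mirror.

class MirroredMemory:
    def __init__(self):
        self.primary = {}   # stands in for the primary memory region
        self.mirror = {}    # stands in for the mirror memory region

    def write(self, addr, value):
        # back up each write to the mirror region
        self.primary[addr] = value
        self.mirror[addr] = value

    def read(self, addr):
        value = self.primary.get(addr)
        if value is None:                # primary copy lost or corrupted
            value = self.mirror[addr]    # recover from the mirror
            self.primary[addr] = value   # repair the primary copy
        return value

mem = MirroredMemory()
mem.write(0x10, b"request state")
del mem.primary[0x10]        # simulate a memory error on the primary copy
restored = mem.read(0x10)    # restored transparently from the mirror
```

Because the master OS holds little memory, keeping a second copy of it is cheap relative to mirroring the whole device.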
  • Step 402a: if the master OS does not detect a failure of controller 0's slave OS (denoted the first slave OS), it sends the request to the first slave OS.
  • Step 403a: the first slave OS processes the request.
  • for the first slave OS's processing flow, refer to the second slave OS's processing flow below, which is not repeated here.
  • Step 402b: the master OS detects that the first slave OS has failed.
  • Step 403b: the master OS sends the request to the second slave OS.
  • Step 404b: the second slave OS processes the request.
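Steps 402a through 404b above amount to a routing decision in the master OS. The following is a hedged sketch of that decision; the health check, OS objects, and handler functions are simplified stand-ins, not the actual interfaces.

```python
# Illustrative routing sketch of steps 402a/402b-404b: the master OS sends the
# request to the first slave OS when it is healthy, and switches to the second
# slave OS when the first has failed.

def route_request(request, first_slave, second_slave, is_faulty):
    """Pick the slave OS that handles the request and return its result."""
    target = second_slave if is_faulty(first_slave) else first_slave
    return target, target["handler"](request)

first = {"name": "slave1", "handler": lambda r: f"slave1 processed {r}"}
second = {"name": "slave2", "handler": lambda r: f"slave2 processed {r}"}

# first slave healthy -> request goes to it (steps 402a/403a)
healthy_target, _ = route_request("write", first, second,
                                  is_faulty=lambda s: False)

# first slave faulty -> master switches to the second slave (steps 402b-404b)
failover_target, result = route_request("write", first, second,
                                        is_faulty=lambda s: s is first)
```

The service-visible effect of a slave failure is thus only the latency of one extra dispatch, not an interruption.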
  • in one implementation, besides the master OS, controller 0 runs only one slave OS; the second slave OS can then be a slave OS running in controller 1. This can be understood with reference to Figure 5, which is a schematic diagram of the transmission path of the request exchanged between controller 0 and controller 1.
  • the front-end protocol service of the master OS of controller 0 receives the request sent by the application server 100 through the front-end interface 125.
  • if the first slave OS has failed, the front-end protocol service sends the request to the forwarding service, and the forwarding service sends the request to the second slave OS of controller 1.
  • if the request is a write request, the second slave OS may first temporarily cache the to-be-written data carried in the write request in its memory; when the amount of data in the second slave OS's memory reaches a certain threshold, the second slave OS sends the data in memory to the hard disk 134 for storage through the back-end interface 126.
  • before writing the data to the hard disk 134, the slave OS can also process the data, such as by deduplication, data compression, or other processing, and write the processed data into the hard disk 134 to reduce the storage space occupied by the data.
  • the metadata service can also generate metadata for the data and write the metadata into the memory of the second slave OS or the hard disk 134 (not shown in FIG. 5).
  • if the request is a read request, the second slave OS obtains the requested data. For example, if the data is stored in the second slave OS's memory, that is, a memory hit, the second slave OS reads the data from memory; on a miss, it reads the data from the hard disk 134. The read data is returned to the master OS of controller 0, and controller 0 sends the data to the application server 100 through the front-end interface 125; see the transmission path shown by the dotted line in FIG. 5.
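The write-buffering and memory-hit/miss behavior just described can be sketched as follows. The threshold value, class shape, and dictionary-backed "memory" and "disk" are illustrative assumptions only.

```python
# Simplified sketch of the slave-OS data path: writes are buffered in memory
# and flushed to disk once a threshold is reached; reads are served from
# memory on a hit and from disk on a miss.

FLUSH_THRESHOLD = 3  # assumed number of buffered writes before flushing

class SlaveOS:
    def __init__(self):
        self.memory = {}   # in-memory cache of recently written data
        self.disk = {}     # stands in for hard disk 134

    def handle_write(self, key, data):
        self.memory[key] = data
        if len(self.memory) >= FLUSH_THRESHOLD:
            self.disk.update(self.memory)   # persist the buffered data
            self.memory.clear()

    def handle_read(self, key):
        if key in self.memory:              # memory hit
            return self.memory[key]
        return self.disk.get(key)           # memory miss: read from disk

slave = SlaveOS()
slave.handle_write("block-1", b"payload")
value = slave.handle_read("block-1")        # served from memory (hit)
```

Deduplication or compression, as mentioned above, would slot in just before `self.disk.update(...)`.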
  • in another implementation, controller 0 runs multiple slave OSs.
  • the second slave OS can then also be a slave OS on controller 0 other than the first slave OS. As shown in Figure 6, when controller 0 detects the failure of the first slave OS, it switches the service to another slave OS of controller 0 (for example, denoted the third slave OS), and the third slave OS processes the request; for the request transmission flow and the third slave OS's processing of the request, refer to the foregoing description, which is not repeated here.
  • in the above design, the hardware of controller 0 is split to construct at least two OSs with different memory reliability levels. The master OS with the high reliability level is used to deploy services related to upper-layer business.
  • when a slave OS with a lower reliability level accesses a memory error and fails, the master OS can detect the failure of the slave OS quickly and switch the request to another slave OS for processing, such as another slave OS of this controller or a slave OS of another controller. Those skilled in the art have determined through experiments that the time for the master OS to switch the request from the first slave OS to the second slave OS can reach the level of seconds; see Table 1, which shows some experimental data obtained by the technicians of this application.
  • here, a single-OS system means that in the hardware architecture shown in Figure 1, controller 0 runs only one OS (that is, a single OS).
  • when a user-mode process of that OS accesses a UCE, the required recovery time is 5 seconds to 3 minutes.
  • when a kernel-mode driver of that OS accesses a UCE, the required recovery time is 2 minutes to 10 minutes.
  • when both controller 0 and controller 1 are single-OS, controller 1 can take over the business of controller 0 when controller 0 fails.
  • the required switching time is about 30 seconds, and the business is interrupted during the above recovery period or during the business switching period.
  • in the technical solution provided by this application, since the master OS does not process requests, less memory is allocated to it, so memory management can use mirroring technology or memory RAID to further improve the master OS's memory reliability.
  • in this way, the probability of a memory error in the master OS is quite low. If the master OS detects that a slave OS has failed, it can switch the request to another slave OS, so the service is not interrupted by a memory error in a slave OS; the entire switch takes less than 1 second, and the upper-layer business perceives only a slight delay, which can greatly reduce the impact of memory faults on the business and improve the reliability of the entire system.
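The fast fault detection that enables the sub-second switch above could rest on heartbeat monitoring (the description elsewhere mentions the master OS listening to slave-OS heartbeats). The following is a hedged one-function sketch; the timeout value and signature are assumptions, not a specified mechanism.

```python
# Illustrative heartbeat check: the master OS treats a slave OS as faulty if
# no heartbeat has arrived within a timeout. Values here are assumptions.

import time

def is_slave_faulty(last_heartbeat, timeout_s=1.0, now=None):
    """Return True if the slave OS's last heartbeat is older than timeout_s."""
    if now is None:
        now = time.monotonic()   # monotonic clock is immune to wall-clock jumps
    return (now - last_heartbeat) > timeout_s
```

A sub-second timeout like this is consistent with the claim that the whole failover completes in under one second.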
  • the embodiment of the present application further provides a data processing device, which is configured to execute the method executed by the controller 0 in the above method embodiment.
  • the device may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the device 700 may be implemented by a system on a chip.
  • the system-on-a-chip may be composed of chips, or may include chips and other discrete devices.
  • the data processing apparatus 700 includes a master operating system instance 701 and a first slave operating system instance 702 ; optionally, a second slave operating system instance 703 may also be included.
  • the master operating system instance 701 is used to send the request to a target slave operating system instance when the first slave operating system instance 702 fails and cannot process the request; for the specific implementation, refer to the description of steps 402b to 403b in FIG. 4. The target slave operating system instance is the second slave operating system instance 703 (for the specific implementation, refer to the flow description shown in FIG. 6, which is not repeated here), or the target slave operating system instance is a slave operating system instance in a second computing device (for the specific implementation, refer to the flow description shown in FIG. 5, which is not repeated here).
  • the target slave operating system instance is used to process the request.
  • for the specific implementation, refer to the description of step 404b in FIG. 4, which is not repeated here.
  • the master operating system instance 701 is also used to receive the request sent by the client device (such as the application server 100); for the specific implementation, refer to the description of step 401 in FIG. 4, which is not repeated here.
  • the memory reliability of the master operating system instance 701 is higher than the memory reliability of the first slave operating system instance 702 and the second slave operating system instance 703 .
  • the memory of the main operating system instance 701 is configured using a mirroring technology.
  • when the device 700 includes the master operating system instance 701 and the first slave operating system instance 702, the device 700 includes a first hardware resource group and a second hardware resource group; each hardware resource group includes processor resources and memory resources; the first hardware resource group provides resources for the master operating system instance 701; the second hardware resource group provides resources for the first slave operating system instance 702.
  • the master operating system instance 701 is further configured to: if the first slave operating system instance has not failed, send the request to the first slave operating system instance; the first slave operating system instance is used to process the request.
  • the data processing apparatus provided in the embodiment of the present invention may be applied to a storage system or a server, which is not limited in the embodiment of the present invention.
  • an embodiment of the present application also provides a computer storage medium storing computer instructions; when the computer instructions run on the data processing device, the data processing device executes the above related method steps to implement the method executed by controller 0 in the above embodiments; see the description of the steps in FIG. 4, which is not repeated here.
  • the embodiment of the present application also provides a computer program product.
  • when the computer program product runs on a computer, it causes the computer to execute the above related steps to implement the method executed by controller 0 in the above embodiments; see the description of the steps in FIG. 4, which is not repeated here.
  • an embodiment of the present application also provides a device, which may specifically be a chip, a component, or a module; the device may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions, and when the device runs, the processor can execute the computer-executable instructions stored in the memory so that the chip executes the methods executed by controller 0 in the above method embodiments; refer to the description of each step in FIG. 4, which is not repeated here.
  • since the storage device, computer storage medium, computer program product, or chip provided in the embodiments of the present application is used to execute the method corresponding to controller 0 provided above, the beneficial effects it can achieve can refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division; in actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another device, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may be one physical unit or multiple physical units, which may be located in one place or distributed to multiple different places. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit (or module) in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • when an integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, which includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the computer-executed instructions in the embodiments of the present application may also be referred to as application program codes, which is not specifically limited in the embodiments of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (Solid State Disk, SSD)), etc.
  • the various illustrative logic units and circuits described in the embodiments of the present application can be implemented by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, Discrete gate or transistor logic, discrete hardware components, or any combination of the above designed to implement or operate the described functions.
  • the general-purpose processor may be a microprocessor, and optionally, the general-purpose processor may also be any conventional processor, controller, microcontroller or state machine.
  • a processor may also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.
  • the steps of the method or algorithm described in the embodiments of this application may be directly embedded in hardware, a software unit executed by a processor, or a combination of both.
  • the software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM or any other storage medium in the art.
  • the storage medium can be connected to the processor, so that the processor can read information from the storage medium, and can write information to the storage medium.
  • the storage medium can also be integrated into the processor.
  • the processor and storage medium can be provided in an ASIC.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A data processing method and apparatus. In the method, a first computing device runs at least a master operating system and a first slave operating system; when the first slave operating system fails, the master operating system sends the request to a second slave operating system, where the second slave operating system may be a slave operating system running on the first computing device other than the first slave operating system, or a slave operating system running on a second computing device; the second slave operating system processes the request. In this application, the master operating system provides services for external devices, such as receiving requests, and the first slave operating system and the second slave operating system both process requests. When the first slave operating system fails, the master operating system can send the request to the second slave operating system for processing, so a memory error in the first slave operating system does not interrupt the service, thereby improving system reliability.

Description

A data processing method and apparatus

Technical field

This application relates to the field of computer technology, and in particular to a data processing method and apparatus.
Background

Memory is one of the components on a server motherboard most prone to errors; as memory capacity and memory speed double, the probability of memory errors will also multiply. A memory error may cause a system failure, and system recovery usually takes a long time, which has a large impact on services.
Summary

This application provides a data processing method and apparatus that can improve system reliability and reduce the impact of memory errors on services without changing a device's existing memory.
In a first aspect, an embodiment of this application provides a data processing method applicable to a first computing device, where the first computing device runs at least a master operating system and a first slave operating system. In the method, when the first slave operating system fails and cannot process a request, the master operating system of the first computing device sends the request to a second slave operating system, where the second slave operating system may be a slave operating system running on the first computing device other than the first slave operating system, or a slave operating system running on a second computing device; the second slave operating system processes the request.

Through the above design, the master operating system provides services for external devices, such as receiving requests, and the first slave operating system and the second slave operating system both process requests. When the first slave operating system fails, the master operating system can send the request to the second slave operating system for processing, so a memory error in the first slave operating system does not interrupt the service, thereby improving system reliability.
In one implementation, the memory reliability of the master operating system is higher than that of any slave operating system.

Through the above design, the master operating system communicates with the client. When the memory reliability of the master operating system is high, the probability of a memory error in the master operating system is low; as long as the master operating system does not fail, it can serve the client and the service will not be interrupted, thereby improving system reliability and reducing the impact of memory faults on services.
In one implementation, the first computing device includes a first hardware resource group and a second hardware resource group; each hardware resource group includes processor resources and memory resources; the first hardware resource group provides resources for the master operating system; the second hardware resource group provides resources for the first slave operating system.

Through the above design, the hardware resource group of the master operating system differs from that of the first slave operating system, providing a way to configure the memory type and memory capacity of the master operating system while ensuring hardware isolation between the master operating system and the first slave operating system: the master operating system will not access the memory of the first slave operating system, and the memory capacity of the master operating system is lower than when the first computing device runs a single operating system, so the probability of memory errors is also lower, improving the reliability with which the master operating system serves the client.
In one implementation, the memory of the master operating system is configured using mirroring technology.

Through the above design, the master operating system configures its memory using mirroring or similar technologies, thereby improving the memory reliability of the master operating system, reducing the probability that the service is interrupted because the master operating system accesses a memory error, and improving the overall reliability of the system.
In one implementation, if the first slave operating system has not failed, the master operating system sends the request to the first slave operating system, and the first slave operating system processes the request.

In one implementation, the aforementioned request may be a request from a client, such as a write request or a read request, and the second slave operating system processing the request includes: if the request is a write request, the second slave operating system writes the to-be-written data carried in the request into a storage device; or, if the request is a read request, the second slave operating system obtains the requested data from the storage device and sends the data to the master operating system.
In a second aspect, an embodiment of this application further provides a data processing apparatus having the functions of the first computing device in the method embodiment of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. In one possible design, the structure of the data processing apparatus includes a master operating system instance and a first slave operating system instance, and optionally a second slave operating system instance; these instances can perform the corresponding functions in the method example of the first aspect. For details, refer to the detailed description in the method example, which is not repeated here; for beneficial effects, refer to the description of the first aspect, which is not repeated here.
In a third aspect, an embodiment of this application further provides a data processing apparatus having the functions of the first computing device in the method example of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here. The structure of the apparatus includes a first processor, a second processor, a first memory, and a second memory, and optionally a communication interface. The first processor and the second processor are configured to support the data processing apparatus in performing the corresponding functions of the method of the first aspect. The first memory is coupled to the first processor, the second memory is coupled to the second processor, and they store the computer program instructions and data necessary for the communication apparatus. The structure of the data processing apparatus also includes a communication interface for communicating with other devices, for example, sending a request from a client to a second computing device, or receiving data requested by the client and sent by the second computing device; for beneficial effects, refer to the description of the first aspect, which is not repeated here.
In a fourth aspect, this application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method of the first computing device in the first aspect and each possible implementation of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here.

In a fifth aspect, this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first computing device in the first aspect and each possible implementation of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here.

In a sixth aspect, this application further provides a computer chip connected to a memory; the chip is used to read and execute a software program stored in the memory and to perform the method of the first computing device in the first aspect and each possible implementation of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here.
In a seventh aspect, an embodiment of this application further provides a data processing system including at least a first computing device and a second computing device, where the first computing device runs at least a first master operating system and a first slave operating system, and the second computing device runs at least a second master operating system and a second slave operating system. The first computing device has the functions of the first computing device in the method example of the first aspect; the second slave operating system is configured to receive the request sent by the first master operating system when the first slave operating system fails and cannot process the request, and to process the request; for beneficial effects, refer to the description of the first aspect, which is not repeated here.

In an eighth aspect, an embodiment of this application further provides a system including a client and a first computing device, where the first computing device runs at least a master operating system and a first slave operating system. The client is configured to send a request to the first computing device, the request being used to request access to data; the first computing device has the functions of the first computing device in the method example of the first aspect; for beneficial effects, refer to the description of the first aspect, which is not repeated here.

On the basis of the implementations provided in the above aspects, this application can be further combined to provide more implementations.
Brief description of drawings

Figure 1 is a schematic diagram of the hardware architecture of a system provided by an embodiment of this application;

Figure 2 is a schematic diagram of the software architecture of a controller provided by an embodiment of this application;

Figure 3 is a schematic diagram of another system architecture provided by an embodiment of this application;

Figure 4 is a schematic flowchart corresponding to a data processing method provided by an embodiment of this application;

Figure 5 is a schematic diagram of a scenario of a data processing method provided by an embodiment of this application;

Figure 6 is a schematic diagram of a scenario of another data processing method provided by an embodiment of this application;

Figure 7 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application.
Detailed description

To make this application easier to understand, some basic concepts involved in the embodiments of this application are explained first. It should be noted that these explanations are intended to help those skilled in the art understand, and do not limit the scope of protection claimed by this application.
1. User mode / kernel mode. User mode and kernel mode refer to the running states of an operating system. At present, computer systems generally implement hierarchical protection: depending on the severity of the impact on the computer system, certain operations must be performed by roles with the corresponding privileges; for example, operations such as directly accessing hardware or modifying the hardware's working mode require the highest privilege.

This protection requires the CPU and the operating system to cooperate. Modern CPUs generally provide multiple privilege levels, and operating systems are generally divided into multiple running states to cooperate with the CPU. The common states of an operating system are user mode and kernel mode, where kernel mode generally has the highest privilege and all instructions and operations are allowed by the CPU, while user mode generally has lower privilege: in this state, a software program can execute only a limited set of instructions and operations, and high-risk operations, such as configuring the CPU's internal control registers or accessing kernel memory addresses, are not allowed by the CPU hardware. When the operating system needs to execute programs at different privilege levels, it usually first switches the CPU's privilege state to the corresponding state and then executes the corresponding program.
2. Process. A process can refer to one run of a program with a certain independent function, or the carrier in which an application runs; it can also be understood as a running instance of an application, one dynamic execution of the application. For example, when a user runs the Notepad program, the user creates a process that holds the code making up Notepad.exe and the dynamic link libraries it needs to call.

3. Memory errors, including corrected errors (corrected error, CE) and uncorrected errors (uncorrected error, UCE), where a corrected error is an error that can be corrected by the memory error checking and correction function and, conversely, an uncorrected error is one that cannot be corrected by that function.
Figure 1 is a schematic diagram of a possible applicable system architecture provided by an embodiment of this application. The system architecture includes an application server 100, a switch 101, and a storage system 120.

Users access data through applications. The computers running these applications are called "application servers". The application server 100 may be a physical machine or a virtual machine. Physical application servers include, but are not limited to, desktop computers, servers, laptops, and mobile devices. The application server accesses the storage system 120 through the fiber switch 101 to access data. However, the switch 101 is only an optional device, and the application server 100 may also communicate with the storage system 120 directly over a network.
The storage system 120 shown in Figure 1 is a centralized storage system. A centralized storage system is characterized by a unified entry point through which all data from external devices must pass; this entry point is the engine 121 of the centralized storage system. The engine 121 is the core component of the centralized storage system, in which many of the storage system's advanced functions are implemented.

As shown in Figure 1, the engine 121 contains one or more controllers. Figure 1 is illustrated with an engine containing two controllers; a mirror channel exists between controller 0 and controller 1 so that the two controllers can back each other up. The engine 121 also contains a front-end interface 125 and a back-end interface 126, where the front-end interface 125 communicates with the application server 100 to provide it with storage services, and the back-end interface 126 communicates with the hard disks 134 to expand the storage system's capacity. Through the back-end interface 126, the engine 121 connects more hard disks 134, forming a very large storage resource pool.
The hardware components and software structure of controller 1 (and other controllers not shown in Figure 1) are similar to those of controller 0; controller 0 is taken as an example here.
In terms of hardware, as shown in Figure 1, controller 0 includes at least a processor 123 and memory 124. The processor 123 is a central processing unit (CPU) used to process data access requests from outside the storage system (a server or another storage system) as well as requests generated inside the storage system, such as read requests and write requests. For example, when the processor 123 receives a write request sent by the application server 100 through the front-end interface 125, the processor 123 may first store the data in memory, such as in the memory 124. When the amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 to the hard disks 134 through the back-end interface 126 for persistent storage. Note that only one processor 123 is shown in Figure 1; in practical applications there are often multiple CPUs 123, and one processor 123 has one or more CPU cores. This embodiment does not limit the number of CPUs or the number of CPU cores.
The memory 124 is the internal storage that exchanges data directly with the processor; it can read and write data at any time and is very fast. The memory 124 includes multiple types of storage, such as random access memory (RAM) and read-only memory (ROM). For example, random access memory includes but is not limited to dynamic random access memory (DRAM), double data rate synchronous dynamic random access memory (DDR), and static random access memory (SRAM). Read-only memory includes but is not limited to programmable read-only memory (PROM) and erasable programmable read-only memory (EPROM). In practical applications, multiple memories 124, and memories 124 of different types, can be configured in controller 0. This embodiment does not limit the number or type of memories 124.

The memory 124 serves as temporary data storage for the operating system (OS) and other running programs and stores program code; the processor 123 executes the program code stored in the memory 124 to implement the functions the code is designed for. If a memory error occurs while the processor 123 is running, a process failure will result and the process must be restarted, interrupting the service; in more serious cases, the operating system must be restarted, increasing the service recovery time. It can be seen that the memory 124 plays a crucial role in the normal operation of the system. However, as memory capacity and frequency increase, the probability of memory errors gradually rises, making system reliability difficult to guarantee and the impact on services incalculable.
To this end, an embodiment of this application provides a data processing method for improving system reliability and reducing the impact of memory errors on services.

Before introducing the data processing method provided by the embodiments of this application, the software structure to which the method applies is introduced first. With reference to Figure 2, taking controller 0 in the system shown in Figure 1 as an example, the software structure of controller 0 provided by an embodiment of this application is introduced below.
At the software level, as shown in Figure 2, controller 0 has at least two operating systems installed and running (Figure 2 shows only two operating systems, but the embodiments of this application are not limited to this). The at least two operating systems include one master operating system (master OS) and one or more slave operating systems (slave OS) (Figure 2 shows only one slave operating system, but the embodiments of this application are not limited to this).

In this application, the master operating system and the slave operating systems differ in function and in memory reliability level, as introduced below:
1. Master operating system:

The master operating system serves user devices (such as the application server 100); for example, when providing the aforementioned storage service, the master operating system (or the processor running it) receives requests sent by the application server 100, such as read requests and write requests, through the front-end interface 125.

Specifically, the master operating system provides services to external devices. As shown in Figure 2, the master operating system provides a front-end protocol transceiver service, where the front-end protocol is the communication protocol used by the front-end interface 125 to interact with the application server 100; this service communicates with the application server 100 through the front-end interface 125. Typically, processes or software that serve users run in the operating system's user mode (similar points below are not repeated). Corresponding to the processes running in user mode, the master operating system may also run a front-end peripheral driver, which runs in kernel mode and drives the front-end interface 125 to communicate with the application server 100.
The master operating system can also manage the slave operating systems. For example, when the system powers on, the master operating system is brought up first; it allocates hardware resources to the slave operating systems and checks whether they are running normally, for example by monitoring a slave operating system's heartbeat to check whether it has failed. When a slave operating system is detected to have failed, it can be returned to normal operation by, for example, restarting it. Illustratively, as in Figure 2, the Mgmt OS process running on the master operating system provides the function of managing the slave operating systems.

The master operating system also has the function of communicating with other controllers (such as controller 1). Illustratively, as in Figure 2, the master operating system also provides a forwarding service for interacting with controller 1, for example forwarding a request received from the application server 100 to controller 1, which processes the request.
The master operating system also includes a memory management driver for managing the master operating system's memory, with memory management strategies such as memory mirroring and memory checking, for example a redundant array of independent disks (RAID) structure with distributed parity in memory. Memory mirroring is similar to hot backup of a hard disk: it keeps two copies of the memory data, one in the primary memory and one in the mirror memory, so that when the memory data in the primary memory has an error, the data can still be obtained from the mirror memory, improving the memory reliability of the master operating system. Memory RAID distributes data in block units across the master operating system's memory; specifically, the data and its corresponding parity information are stored on the memory units making up the master operating system's memory, with the parity information and the corresponding data stored on different memory units. When one memory unit is damaged, the remaining data and the corresponding parity information are used to recover the damaged data, improving memory reliability. Illustratively, the memory RAID can be RAID 4, RAID 5, RAID 6, RAID 10, RAID 0, or RAID 1, where RAID 5 tolerates the loss of one memory unit, that is, when one memory unit is damaged, the data in the other memory units can be used to recover the damaged data, and RAID 6 tolerates the loss of two memory units; for specific implementations or memory RAID types, refer to descriptions of the related art, which are not introduced here one by one.
In addition, memory management can also provide basic management of memory, such as memory allocation and release; this application does not specifically limit this function.
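The parity idea behind the memory-RAID schemes mentioned above can be illustrated with a minimal XOR sketch (the RAID 5 case of tolerating one lost unit). This is purely illustrative; real memory RAID operates on hardware memory units, not Python integers.

```python
# Minimal sketch of XOR parity: the parity of all data units lets any one
# missing unit be rebuilt from the surviving units plus the parity.

from functools import reduce

def xor_parity(units):
    """Compute the XOR parity over a list of data units."""
    return reduce(lambda a, b: a ^ b, units, 0)

def rebuild_lost_unit(surviving_units, parity):
    """XOR of the survivors with the parity reproduces the missing unit."""
    return xor_parity(surviving_units) ^ parity

data = [0b1010, 0b0110, 0b1111]
parity = xor_parity(data)
# lose data[1]; it can be rebuilt from the other units and the parity
recovered = rebuild_lost_unit([data[0], data[2]], parity)
```

RAID 6, which tolerates two lost units, requires a second, independent parity and is not captured by plain XOR.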
2. Slave operating system:

The slave operating system is used to compute on or process data; in this application, it can process read requests, write requests, and the like from the application server 100. Illustratively, the services provided by the slave operating system include but are not limited to: a data-plane service, a metadata service, and a back-end disk-write processing service. The drivers running in the slave operating system's kernel mode include but are not limited to: a back-end peripheral driver and memory management.

The data-plane service can process read and write requests from the application server 100. For a read request, it can obtain the requested data from the memory 124 or, through the back-end interface 126, from the hard disks 134. For a write request, it can write the to-be-written data carried in the request into the memory 124, or write the data into the hard disks 134 through the back-end interface 126; optionally, before writing the data into the hard disks 134, it can also compute on or process the data to be written, for example deduplication, data compression, and data verification. Illustratively, the data-plane service can run in the slave operating system's user mode; correspondingly, the slave operating system can read from or write to the hard disks 134 through the back-end interface 126 via the back-end peripheral driver running in kernel mode. The metadata service is used to generate metadata. Memory management is used to manage the slave operating system's memory, such as memory allocation and release.

It should be noted that the above is only an example; the master OS and slave OS may have more or fewer functions than shown in Figure 2. For example, the master OS may also include a service for managing the slave OS, infrastructure services, and scheduling management, and the slave OS may also include a control-plane service, a scheduling service, and so on. Specifically, the service for managing the slave OS is used, for example, to detect whether the slave OS has failed and, if so, to restart it. Infrastructure services manage basic components such as CPUs and threads; refer to descriptions of the related art, which are not enumerated here. This application does not specifically limit the functions of the master operating system or the slave operating system. In addition, the names of the above services or drivers are only designations and may differ in different application scenarios; this application does not specifically limit this.
At the hardware level, the hardware used by the master operating system and the slave operating system all comes from controller 0, whose hardware structure can be seen in the introduction to Figure 1 and is not repeated here. Specifically, the master operating system and the slave operating system each correspond to a group of hardware; in other words, the hardware of different operating systems is isolated: the master operating system will not use the slave operating system's hardware, and the slave operating system will not use the master operating system's hardware.

Illustratively, the hardware allocation process can include: splitting (or isolating) the hardware of controller 0 (including but not limited to the processor 123, memory 124, front-end interface 125, and back-end interface 126) into two parts to obtain two hardware groups (such as hardware group 1 and hardware group 2 in Figure 2), each hardware group being allocated exclusively to one operating system. For example, assuming controller 0 includes two processors 123, one processor 123 can be allocated to the master operating system to run it, and the other processor 123 can be allocated to the slave operating system to run it. As another example, assuming controller 0 includes only one processor 123 with multiple cores, the cores can be split into two parts, with some cores allocated to the master operating system and the remaining cores allocated to the slave operating system. Similarly, the memory 124 is split into two parts, with one part allocated to the master operating system and the rest allocated to the slave operating system.
In this application, since the master operating system's function is to interact with the application server 100 and the slave operating system processes requests, the front-end interface 125 is allocated to the master operating system and the back-end interface 126 is allocated to the slave operating system. Illustratively, the hardware split here can be implemented by multi-OS partitioning technology or in other ways; this application does not specifically limit this.

In short, in Figure 2, hardware group 1 is allocated to the master operating system and includes part of controller 0's processor resources, part of its memory resources, and the front-end interface 125. Hardware group 2 is allocated to the slave operating system and includes the remaining processor resources, the remaining memory resources, and the back-end interface 126. The master operating system uses the hardware of hardware group 1 and will not use that of hardware group 2; likewise, the slave operating system uses the hardware of hardware group 2 and will not use that of hardware group 1.
It is worth noting that the amount or capacity of the hardware included in the two hardware groups is not necessarily equal; in other words, the hardware need not be split evenly. For example, in this application, the memory capacity of the master operating system differs from that of the slave operating system; indeed, the master OS's memory capacity is far lower than the slave OS's, that is, hardware group 1 includes less memory capacity than hardware group 2. For instance, assuming the total capacity of the memory 124 included in controller 0 is 4 GB, the memory capacity allocated to the master operating system can be 512 MB, with the remaining memory (3 GB + 512 MB) all allocated to the slave operating system. Of course, this is only an example, and this application does not specifically limit it.

The above allocation gives more memory to the slave operating system, which can ensure the performance of processing requests, and relatively less memory to the master operating system, which reduces the probability that the master operating system accesses a memory error; the lower the probability of memory errors, the higher the memory reliability. This also reflects that the master operating system and the slave operating system have different memory reliability levels. In this application, the memory reliability of the master operating system is higher than that of the slave operating system. In addition, the type of memory allocated to the master operating system can be the same as or different from that allocated to the slave operating system; for example, the memory allocated to the master operating system includes SRAM, while the memory allocated to the slave operating system includes SRAM and DRAM, further improving the master OS's memory reliability and response speed.
The software structure of controller 1 is similar to that of controller 0: controller 1 includes at least one master operating system and one slave operating system. The software structures of controller 1 and controller 0 can be the same or different; for example, the software structures of both controller 0 and controller 1 include one master OS and one slave OS, or controller 0 includes one master OS and one slave OS while controller 1 includes one master OS and multiple slave OSs; this application does not limit this.

The above, taking Figure 1 as an example, introduced the hardware structure and software structure of a system to which the embodiments of this application apply. It should be noted that Figure 1 shows a centralized storage system with integrated disk and controller: the engine 121 has hard disk slots, and the hard disks 134 can be deployed directly in the engine 121, that is, the hard disks 134 and the engine 121 are deployed in the same device. Optionally, the storage system 120 can also be a storage system with separated disk and controller, in which the engine 121 may have no hard disk slots, the hard disks 134 need to be placed in a hard disk enclosure 130, and the back-end interface 126 communicates with the hard disk enclosure 130. The back-end interface 126 exists in the engine 121 in the form of an adapter card, and one engine 121 can use two or more back-end interfaces 126 simultaneously to connect multiple hard disk enclosures. Alternatively, the adapter card can be integrated on the motherboard, in which case it can communicate with the processor 123 through a peripheral component interconnect express (PCI-E) bus.
In addition, the data processing method provided by the embodiments of this application is applicable not only to centralized storage systems but also to distributed storage systems. Figure 3 is a schematic diagram of the system architecture of a distributed storage system provided by an embodiment of this application; the distributed storage system includes a server cluster. The server cluster includes one or more servers 110 (three servers 110 are shown in Figure 3, but the number is not limited to three), and the servers 110 can communicate with one another.

In terms of hardware, as shown in Figure 3, the server 110 includes at least a processor 112, memory 113, and a network card 114, and optionally a hard disk 105. The processor 112, memory 113, network card 114, and hard disk 105 are connected by a bus. For the roles and specific types of the processor 112, memory 113, network card 114, and hard disk 105, refer to the related description in Figure 1, which is not repeated here. At the software level, for the software structure of each server 110, refer to the introduction to the software structure of controller 0 shown in Figure 2, which is not repeated here.

It should be noted that the structures of controller 0 shown in Figure 1 and the server 110 shown in Figure 3 are only examples; in actual products, controller 0 and the server 110 may have more or fewer components, for example, input/output devices such as a keyboard, mouse, and display screen. This application does not specifically limit the hardware structure of the devices in the applicable systems; any device on which at least two operating systems can be installed is applicable to the embodiments of this application.
Taking the system architectures shown in Figures 1 to 3 as examples, the data processing method provided by the embodiments of this application is described in detail below as applied to the system shown in Figure 1. The method can be performed by controller 0 or controller 1 in Figure 1, where controller 0 runs at least one master OS and one slave OS, and controller 1 also runs at least one master OS and one slave OS; controller 0 is taken as an example below.

Figure 4 is a schematic flowchart of a data processing method provided by an embodiment of this application. As shown in Figure 4, the method includes the following steps:
Step 401: The master OS of controller 0 receives a request sent by the application server 100.

In an optional implementation, the master OS can configure its memory using mirroring technology. Illustratively, the master OS backs up the memory data generated throughout the process of handling the request from the application server 100 to a mirror memory space; in this way, when the master OS accesses a memory error, the memory data can be restored from the mirror memory space without causing a master OS failure, improving the master OS's memory reliability and reducing the probability of system failure and the impact on services.
Step 402a: If the master OS does not detect a failure of controller 0's slave OS (denoted the first slave OS), it sends the request to the first slave OS.

Step 403a: The first slave OS processes the request. For the first slave OS's processing flow, refer to the second slave OS's processing flow below, which is not repeated here.

Step 402b: The master OS detects that the first slave OS has failed.

Step 403b: The master OS sends the request to a second slave OS.

Step 404b: The second slave OS processes the request.
In one implementation, besides the master OS, controller 0 runs only one slave OS; the second slave OS can then be a slave OS running in controller 1. This can be understood with reference to Figure 5, a schematic diagram of the transmission path of the request exchanged between controller 0 and controller 1. As shown in Figure 5, the front-end protocol service of controller 0's master OS receives the request sent by the application server 100 through the front-end interface 125; if the first slave OS has failed, the front-end protocol service sends the request to the forwarding service, and the forwarding service sends the request to controller 1's second slave OS.
If the request is a write request, in one implementation the second slave OS can first temporarily cache the to-be-written data carried in the write request in its memory; when the amount of data in the second slave OS's memory reaches a certain threshold, the second slave OS sends the data in memory through the back-end interface 126 to the hard disks 134 for storage. In an optional manner, before writing the data to the hard disks 134, the second slave OS can also process the data, for example by deduplication, data compression, or other processing, and write the processed data to the hard disks 134 to reduce the storage space the data occupies. The metadata service can also generate metadata for the data and write the metadata to the second slave OS's memory or the hard disks 134 (not shown in Figure 5).

If the request is a read request, the second slave OS obtains the requested data. Illustratively, if the data is stored in the second slave OS's memory, that is, a memory hit, the second slave OS reads the data from memory; on a memory miss, it reads the data from the hard disks 134. The read data is returned to the master OS of controller 0, and controller 0 sends the data to the application server 100 through the front-end interface 125; see the transmission path shown by the dotted line in Figure 5.
In another optional implementation, controller 0 runs multiple slave OSs; the second slave OS can then also be a slave OS on controller 0 other than the first slave OS. As shown in Figure 6, when controller 0 detects the failure of the first slave OS, it switches the service to another slave OS of controller 0 (for example, denoted the third slave OS), which processes the request; for the request transmission flow and the third slave OS's processing flow, refer to the foregoing description, which is not repeated here.

In the above design, the hardware of controller 0 is split to construct at least two OSs with different memory reliability levels, where the master OS with the high reliability level is used to deploy services related to upper-layer business. When a slave OS with a lower reliability level accesses a memory error and fails, the master OS can quickly sense the slave OS failure and switch the request to another slave OS for processing, such as another slave OS of this controller or a slave OS of another controller. Those skilled in the art have determined through experiments that the time for the master OS to switch a request from the first slave OS to the second slave OS can reach the level of seconds; see Table 1, which shows some experimental data obtained by the technicians of this application through experiments.
Table 1
(Table 1 is reproduced as an image, PCTCN2023071510-appb-000001, in the original publication; its key figures are summarized in the following paragraph.)
Here, a single-OS system means that, in the hardware architecture shown in FIG. 1, controller 0 runs only one OS (a single OS). When a user-mode process of that OS encounters an uncorrectable error (UCE), the required recovery time is between 5 seconds and 3 minutes. When a kernel-mode driver of that OS encounters a UCE, the required recovery time is 2 to 10 minutes. If both controller 0 and controller 1 are single-OS systems, then when controller 0 fails, controller 1 can take over the services of controller 0, and the required switchover time is about 30 seconds. During the above recovery periods or service switchover period, service is interrupted.
With the technical solution provided in this application, because the master OS is not used to process requests, relatively little memory is allocated to the master OS, so its memory can be managed using mirroring, memory RAID, or similar techniques, further improving the master OS's memory reliability. As a result, the probability of the master OS encountering a memory error is very low. If the master OS detects that a slave OS is faulty, it can switch requests to another slave OS, so a memory error in a slave operating system does not interrupt service. The entire switchover takes less than 1 second, and upper-layer services perceive only a slight delay. This greatly reduces the impact of memory faults on services and improves the reliability of the entire system.
Based on the same inventive concept as the method embodiments, an embodiment of this application further provides a data processing apparatus, configured to perform the method performed by controller 0 in the foregoing method embodiments. The apparatus may be a hardware structure, a software module, or a hardware structure plus a software module. The apparatus 700 may be implemented by a chip system. In the embodiments of this application, the chip system may consist of a chip, or may include a chip and other discrete components. As shown in FIG. 7, the data processing apparatus 700 includes a master operating system instance 701 and a first slave operating system instance 702, and optionally may further include a second slave operating system instance 703.
The master operating system instance 701 is configured to: when the first slave operating system instance 702 is faulty and cannot process a request, send the request to a target slave operating system instance; for a specific implementation, refer to the description of steps 402b to 403b in FIG. 4. The target slave operating system instance is the second slave operating system instance 703 (for a specific implementation, refer to the flow shown in FIG. 6), or the target slave operating system instance is a slave operating system instance in a second computing device (for a specific implementation, refer to the flow shown in FIG. 5); details are not repeated here.
The target slave operating system instance is configured to process the request. For a specific implementation, refer to the description of step 404b in FIG. 4; details are not repeated here.
In a possible implementation, the master operating system instance 701 is further configured to receive a request sent by a client device (such as the application server 100); for a specific implementation, refer to the description of step 401 in FIG. 4; details are not repeated here.
In a possible implementation, the memory reliability of the master operating system instance 701 is higher than the memory reliability of the first slave operating system instance 702 and of the second slave operating system instance 703.
In a possible implementation, the memory of the master operating system instance 701 is configured using a mirroring technique.
In a possible implementation, when the apparatus 700 includes the master operating system instance 701 and the first slave operating system instance 702, the apparatus 700 includes a first hardware resource group and a second hardware resource group, each hardware resource group including a processor resource and a memory resource; the first hardware resource group provides resources for the master operating system instance 701, and the second hardware resource group provides resources for the first slave operating system instance 702.
In a possible implementation, the master operating system instance 701 is further configured to: if the first slave operating system is not faulty, send the request to the first slave operating system instance; the first slave operating system instance is configured to process the request. For a specific implementation, refer to the description of steps 402a to 403a in FIG. 4; details are not repeated here.
The data processing apparatus provided in the embodiments of the present invention may be applied to a storage system or to a server; the embodiments of the present invention impose no limitation in this regard.
An embodiment of this application further provides a computer storage medium storing computer instructions. When the computer instructions run on a data processing apparatus, the data processing apparatus is caused to perform the foregoing related method steps to implement the method performed by controller 0 in the foregoing embodiments; refer to the descriptions of the steps in FIG. 4, which are not repeated here.
An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the foregoing related steps to implement the method performed by controller 0 in the foregoing embodiments; refer to the descriptions of the steps in FIG. 4, which are not repeated here.
In addition, an embodiment of this application further provides an apparatus, which may specifically be a chip, a component, or a module. The apparatus may include a processor and a memory connected to each other, where the memory is configured to store computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory so that the chip performs the method performed by controller 0 in the foregoing method embodiments; refer to the descriptions of the steps in FIG. 4, which are not repeated here.
The storage device, computer storage medium, computer program product, and chip provided in the embodiments of this application are all configured to perform the method corresponding to controller 0 provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above; details are not repeated here.
From the description of the foregoing implementations, a person skilled in the art can understand that, for convenience and brevity of description, only the division into the foregoing functional modules is used as an example. In actual applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into modules or units is merely a division by logical function, and in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units (or modules) in the embodiments of this application may be integrated into one processing unit, each unit may physically exist alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code; this is not specifically limited in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The various illustrative logic units and circuits described in the embodiments of this application may implement or perform the described functions through a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of the above. The general-purpose processor may be a microprocessor; optionally, the general-purpose processor may also be any conventional processor, controller, microcontroller, or state machine. The processor may also be implemented by a combination of computing apparatuses, for example, a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors together with a digital signal processor core, or any other similar configuration.
The steps of the methods or algorithms described in the embodiments of this application may be directly embedded in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be connected to the processor so that the processor can read information from the storage medium and write information to the storage medium. Optionally, the storage medium may also be integrated into the processor. The processor and the storage medium may be disposed in an ASIC.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Although this application has been described with reference to specific features and the embodiments thereof, it is evident that various modifications and combinations may be made to them without departing from the spirit and scope of this application. Correspondingly, the specification and the accompanying drawings are merely illustrative descriptions of this application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Evidently, a person skilled in the art can make various modifications and variations to this application without departing from the scope of this application. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include these modifications and variations.

Claims (13)

  1. A data processing method, wherein the method is applied to a first computing device, and the first computing device runs at least a master operating system and a first slave operating system;
    the method comprising:
    if the first slave operating system is faulty and cannot process a request, sending, by the master operating system, the request to a second slave operating system, wherein the second slave operating system is a slave operating system in a second computing device, or the second slave operating system is a slave operating system in the first computing device other than the first slave operating system; and
    processing, by the second slave operating system, the request.
  2. The method according to claim 1, wherein memory reliability of the master operating system is higher than memory reliability of any slave operating system.
  3. The method according to claim 2, wherein memory of the master operating system is configured using a mirroring technique.
  4. The method according to claim 1 or 2, wherein the first computing device comprises a first hardware resource group and a second hardware resource group; each hardware resource group comprises a processor resource and a memory resource; the first hardware resource group provides resources for the master operating system; and the second hardware resource group provides resources for the first slave operating system.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    if the first slave operating system is not faulty, sending, by the master operating system, the request to the first slave operating system; and processing, by the first slave operating system, the request.
  6. A data processing apparatus, wherein the apparatus comprises:
    a master operating system instance, configured to send a request to a second slave operating system instance when a first slave operating system instance is faulty and cannot process the request, wherein the second slave operating system instance is a slave operating system instance in a first computing device other than the first slave operating system instance, or the second slave operating system instance is a slave operating system instance in a second computing device; and
    the second slave operating system instance, configured to process the request.
  7. The apparatus according to claim 6, wherein memory reliability of the master operating system instance is higher than memory reliability of any slave operating system instance.
  8. The apparatus according to claim 7, wherein memory of the master operating system instance is configured using a mirroring technique.
  9. The apparatus according to claim 6 or 7, wherein the apparatus comprises at least a first hardware resource group and a second hardware resource group; each hardware resource group comprises a processor resource and a memory resource; the first hardware resource group provides resources for the master operating system instance; and the second hardware resource group provides resources for the first slave operating system instance.
  10. The apparatus according to any one of claims 6 to 9, wherein the master operating system instance is further configured to: if the first slave operating system instance is not faulty, send the request to the first slave operating system instance; and the first slave operating system instance is configured to process the request.
  11. A computing device, comprising a first processor, a second processor, a first memory, and a second memory;
    wherein the first memory stores computer program instructions of a master operating system, and the second memory stores computer program instructions of a first slave operating system; and
    the first processor executes the computer program instructions in the first memory to implement the method performed by the master operating system according to any one of claims 1 to 5, and the second processor executes the computer program instructions in the second memory to implement the method performed by the first slave operating system according to any one of claims 1 to 5.
  12. A computer-readable storage medium, comprising program code, wherein instructions comprised in the program code are used to implement the method according to any one of claims 1 to 5.
  13. A computing device system, wherein the system comprises at least a first computing device and a second computing device;
    the first computing device runs a first master operating system and a first slave operating system, and the second computing device runs a second master operating system and a second slave operating system;
    the first master operating system is configured to send a request to the second slave operating system when the first slave operating system is faulty and cannot process the request; and
    the second slave operating system is configured to process the request.
PCT/CN2023/071510 2022-01-28 2023-01-10 Data processing method and apparatus WO2023143039A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210115056.6 2022-01-28
CN202210115056.6A CN116560827A (zh) 2022-01-28 2022-01-28 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2023143039A1 true WO2023143039A1 (zh) 2023-08-03

Family

ID=87470421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071510 WO2023143039A1 (zh) 2022-01-28 2023-01-10 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN116560827A (zh)
WO (1) WO2023143039A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000207232A (ja) * 1999-01-18 2000-07-28 Fujitsu Ltd Multi-operating-system control device and recording medium
CN103902316A (zh) * 2012-12-27 2014-07-02 联想(北京)有限公司 Switching method and electronic device
CN107807827A (zh) * 2017-10-19 2018-03-16 安徽皖通邮电股份有限公司 Method for supporting multiple operating systems on a multi-core CPU
CN110874261A (zh) * 2018-08-31 2020-03-10 横河电机株式会社 Availability system, method, and storage medium storing program


Also Published As

Publication number Publication date
CN116560827A (zh) 2023-08-08

Similar Documents

Publication Publication Date Title
US10642704B2 (en) Storage controller failover system
US7739677B1 (en) System and method to prevent data corruption due to split brain in shared data clusters
US9766992B2 (en) Storage device failover
US8713362B2 (en) Obviation of recovery of data store consistency for application I/O errors
US20190235777A1 (en) Redundant storage system
US8412672B1 (en) High availability network storage system incorporating non-shared storage suitable for use with virtual storage servers
US8910160B1 (en) Handling of virtual machine migration while performing clustering operations
US10146632B2 (en) Efficient mechanism to replicate data for multiple controllers
US20120179771A1 (en) Supporting autonomous live partition mobility during a cluster split-brained condition
US20120066543A1 (en) Autonomous propagation of virtual input/output (vio) operation(s) to second vio server (vios) due to a detected error condition at a first vios
US20120066678A1 (en) Cluster-aware virtual input/output server
US11573737B2 (en) Method and apparatus for performing disk management of all flash array server
US10782898B2 (en) Data storage system, load rebalancing method thereof and access control method thereof
WO2013188332A1 (en) Software handling of hardware error handling in hypervisor-based systems
US10318393B2 (en) Hyperconverged infrastructure supporting storage and compute capabilities
US20160266830A1 (en) Techniques for importation of information to a storage system
WO2015010543A1 (en) Moving objects in primary computer based on memory errors in secondary computer
US9063854B1 (en) Systems and methods for cluster raid data consistency
WO2023169185A1 (zh) 内存管理方法和装置
US8683258B2 (en) Fast I/O failure detection and cluster wide failover
JP6714037B2 (ja) 記憶システム、及び、クラスタ構成制御方法
US10860224B2 (en) Method and system for delivering message in storage system
WO2023143039A1 (zh) 一种数据处理方法及装置
US20130024428A1 (en) Method and system for a fast full style system check using multithreaded read ahead
US11210034B2 (en) Method and apparatus for performing high availability management of all flash array server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23745922

Country of ref document: EP

Kind code of ref document: A1