CN116560827A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN116560827A
Authority
CN
China
Prior art keywords
operating system
slave
memory
request
slave operating
Prior art date
Legal status
Pending
Application number
CN202210115056.6A
Other languages
Chinese (zh)
Inventor
屈欢
高军
高超
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210115056.6A
Priority to PCT/CN2023/071510 (WO2023143039A1)
Publication of CN116560827A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Abstract

A data processing method and apparatus. In the method, a first computing device runs at least a master operating system and a first slave operating system. When the first slave operating system fails, the master operating system sends a request to a second slave operating system; the second slave operating system may be another slave operating system running on the first computing device, or a slave operating system running on a second computing device. The second slave operating system then processes the request. In this application, the master operating system provides services to external devices, for example by receiving requests, while the first slave operating system and the second slave operating system are both used to process requests. When the first slave operating system fails, the master operating system can send the request to the second slave operating system for processing, so a memory error in the first slave operating system does not interrupt the service, which improves the reliability of the system.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
Memory is one of the components on a server motherboard most prone to errors, and the probability of a memory error multiplies as memory capacity and memory speed double. A memory error may cause a system failure; system recovery usually takes a long time, so the impact on the service is large.
Disclosure of Invention
This application provides a data processing method and apparatus, which can improve the reliability of a system and reduce the impact of memory errors on services without changing the original memory of the device.
In a first aspect, embodiments of the present application provide a data processing method that may be applied to a first computing device running at least a master operating system and a first slave operating system. In the method, when the first slave operating system fails and cannot process a request, the master operating system of the first computing device sends the request to a second slave operating system, where the second slave operating system may be a slave operating system running on the first computing device other than the first slave operating system, or may be a slave operating system running on a second computing device; the second slave operating system processes the request.
With this design, the master operating system provides services to external devices, for example by receiving requests, while the first slave operating system and the second slave operating system are both used to process requests. When the first slave operating system fails, the master operating system can send the request to the second slave operating system for processing, so a memory error in the first slave operating system does not interrupt the service, which improves the reliability of the system.
In one implementation, the memory reliability of the master operating system is higher than the memory reliability of any one of the slave operating systems.
With this design, the master operating system communicates with the client. When the memory reliability of the master operating system is high, the probability of a memory error in the master operating system is low; as long as the master operating system does not fail, services can still be provided to the client without interruption, which improves the reliability of the system and reduces the impact of memory failures on services.
In one implementation, the first computing device includes a first hardware resource group and a second hardware resource group; each hardware resource group includes processor resources and memory resources; the first hardware resource group provides resources for the master operating system, and the second hardware resource group provides resources for the first slave operating system.
With this design, the hardware resource group of the master operating system is different from that of the first slave operating system, which provides a way to configure the memory type and memory capacity of the master operating system separately. It also ensures hardware isolation between the master operating system and the first slave operating system, so the master operating system cannot access the memory of the first slave operating system. Because the memory capacity of the master operating system is lower than the memory capacity the first computing device would use when running a single standalone operating system, the probability of a memory error is also lower, which improves the reliability of the services the master operating system provides to the client.
In one implementation, the memory of the main operating system is configured using a mirroring technique.
With this design, the master operating system configures its memory using techniques such as mirroring, which improves the memory reliability of the master operating system, reduces the probability of a service interruption caused by a memory error when the master operating system accesses memory, and improves the overall reliability of the system.
In one implementation, if the first slave operating system has not failed, the master operating system sends the request to the first slave operating system, and the first slave operating system processes the request.
In one implementation, the aforementioned request may be a request from a client, such as a write request or a read request, and the second slave operating system processes the request as follows: if the request is a write request, the second slave operating system writes the data to be written carried in the request into the storage device; or, if the request is a read request, the second slave operating system obtains the data requested by the read request from the storage device and sends the data to the master operating system.
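For readability, the following is a minimal, hypothetical sketch of this write/read branching; the dict-backed stand-in for the storage device and the request fields used here are illustrative assumptions, not part of the claimed method.

```python
# Illustrative sketch only: how a slave operating system might dispatch a
# request forwarded by the master operating system. "storage" stands in for
# the storage device; its keys and the request fields are assumed.

storage = {}

def process_request(request):
    if request["type"] == "write":
        # Write request: persist the carried data into the storage device.
        storage[request["key"]] = request["data"]
        return "ok"
    if request["type"] == "read":
        # Read request: fetch the data so it can be handed back to the
        # master operating system, which returns it to the client.
        return storage.get(request["key"])
    raise ValueError("unknown request type")

# Example usage:
process_request({"type": "write", "key": "lba-0", "data": b"hello"})
assert process_request({"type": "read", "key": "lba-0"}) == b"hello"
```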
In a second aspect, embodiments of the present application further provide a data processing apparatus that has the functions of the first computing device in the method embodiment of the first aspect; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions described above. In one possible design, the structure of the data processing apparatus includes a master operating system instance, a first slave operating system instance, and optionally a second slave operating system instance, where these instances can perform the corresponding functions in the method example of the first aspect; for details and beneficial effects, refer to the description of the first aspect, which is not repeated here.
In a third aspect, an embodiment of the present application further provides a data processing apparatus that has the functions of the first computing device in the method example of the first aspect; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. The structure of the apparatus includes a first processor, a second processor, a first memory, a second memory, and optionally a communication interface. The first processor and the second processor are configured to support the data processing apparatus in performing the corresponding functions in the method of the first aspect. The first memory is coupled to the first processor and the second memory is coupled to the second processor; the memories hold the computer program instructions and data necessary for the data processing apparatus. The structure of the data processing apparatus further includes a communication interface configured to communicate with other devices, for example to send the request from the client to the second computing device, or to receive data requested by the client and sent by the second computing device; details are not repeated here.
In a fourth aspect, the present application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method of the first computing device in the first aspect and each possible implementation of the first aspect; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
In a fifth aspect, the present application further provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first computing device of the first aspect and each possible implementation manner of the first aspect, and the advantages may be found in the description of the first aspect and are not repeated here.
In a sixth aspect, the present application further provides a computer chip, where the chip is connected to a memory, and the chip is configured to read and execute a software program stored in the memory, and execute the method of the first computing device in the first aspect and each possible implementation manner of the first aspect, which may refer to the description of the first aspect and will not be repeated herein.
In a seventh aspect, embodiments of the present application further provide a data processing system, where the system includes at least a first computing device and a second computing device, the first computing device running at least a first master operating system and a first slave operating system, and the second computing device running at least a second master operating system and a second slave operating system; the first computing device has a function of implementing the first computing device in the method example of the first aspect, and the second slave operating system is configured to receive a request sent by the first master operating system and process the request when the first slave operating system fails and cannot process the request, which is not repeated herein with reference to the description of the first aspect.
In an eighth aspect, embodiments of the present application also provide a system, including a client and a first computing device, the first computing device running at least a master operating system and a first slave operating system; a client for sending a request to the first computing device, the request for requesting access to data; the first computing device has the functions of the first computing device in the method example of the first aspect, and the beneficial effects can be seen from the description of the first aspect, which is not repeated here.
Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.
Drawings
Fig. 1 is a schematic diagram of a hardware architecture of a system according to an embodiment of the present application;
Fig. 2 is a schematic software architecture diagram of a controller according to an embodiment of the present application;
Fig. 3 is a schematic diagram of another system architecture according to an embodiment of the present application;
Fig. 4 is a flow chart corresponding to a data processing method according to an embodiment of the present application;
Fig. 5 is a schematic view of a scenario of a data processing method according to an embodiment of the present application;
Fig. 6 is a schematic view of a scenario of another data processing method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the present application easier to understand, some basic concepts related to the embodiments of the present application are explained first below. It should be noted that these explanations are for the convenience of those skilled in the art, and do not limit the scope of protection claimed in the present application.
1. User mode/kernel mode: user mode and kernel mode refer to running states of the operating system. Computer systems generally implement hierarchical protection, that is, operations are classified according to the severity of their impact on the system, and certain operations may only be performed by roles with the corresponding rights; for example, directly accessing hardware or modifying the hardware operating mode requires the highest rights.
This protection is achieved through cooperation between the CPU and the operating system. A modern CPU generally provides multiple privilege levels, and the operating system is divided into several running states to cooperate with the CPU; the common states are user mode and kernel mode. Kernel mode generally has the highest privilege, and the CPU allows all instructions and operations to be executed in it. User mode generally has lower privilege; in this state a software program can only execute a limited set of instructions and operations, and high-risk operations, such as configuring the internal control registers of the CPU or accessing kernel memory addresses, are not allowed by the CPU hardware. When the operating system needs to execute programs at different privilege levels, it usually switches the privilege state of the CPU to the corresponding state and then executes the corresponding program.
2. Process: a process refers to a running activity of a program with a certain independent function, or a carrier for running an application; it can also be understood as a running instance of an application, that is, a dynamic execution of the application. For example, when a user runs the Notepad program, a process is created to host the code that makes up notepad.exe and the dynamic link libraries it calls.
3. Memory errors: memory errors include corrected errors (CE) and uncorrectable errors (UCE). A corrected error is an error that can be corrected by the memory error checking and correction function, whereas an uncorrectable error cannot be corrected by that function.
Fig. 1 is a schematic diagram of a system architecture that may be applied in the embodiment of the present application. The system architecture includes an application server 100, a switch 101, and a storage system 120.
The user accesses the data through the application. The computer running these applications is called an "application server". The application server 100 may be a physical machine or a virtual machine. Physical application servers include, but are not limited to, desktop computers, servers, notebook computers, and mobile devices. The application server accesses the storage system 120 through the fabric switch 101 to access the data. However, switch 101 is only an optional device, and application server 100 may also communicate directly with storage system 120 via a network.
The storage system 120 shown in Fig. 1 is a centralized storage system. A centralized storage system is characterized by a unified entry point through which all data from external devices passes; this entry point is the engine 121 of the centralized storage system. The engine 121 is the most central component of the centralized storage system, and many of the high-level functions of the storage system are implemented in it.
As shown in Fig. 1, there are one or more controllers in the engine 121. In Fig. 1, the engine is illustrated as including two controllers, and a mirror channel is provided between controller 0 and controller 1 so that the two controllers can back each other up. The engine 121 also includes a front-end interface 125 and a back-end interface 126: the front-end interface 125 is configured to communicate with the application server 100 to provide storage services for the application server 100, and the back-end interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 can connect more hard disks 134, forming a very large pool of storage resources.
The hardware components and software architecture of controller 1 (and of other controllers not shown in Fig. 1) are similar to those of controller 0; controller 0 is used as the example here.
In terms of hardware, as shown in Fig. 1, controller 0 includes at least a processor 123 and a memory 124. The processor 123 is a central processing unit (CPU) that processes data access requests from outside the storage system (from a server or another storage system) as well as requests generated inside the storage system, such as read requests and write requests. For example, when the processor 123 receives a write request sent by the application server 100 through the front-end interface 125, it may first store the data in a memory such as the memory 124. When the amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 through the back-end interface 126 to the hard disk 134 for persistent storage. It should be noted that only one processor 123 is shown in Fig. 1; in practical applications there are often multiple processors 123, and each processor 123 has one or more CPU cores. This embodiment does not limit the number of processors or the number of CPU cores.
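The buffer-then-flush behaviour described above can be pictured with a short, assumed sketch; the threshold value and the in-memory stand-ins for the memory 124 and the hard disk 134 are illustrative only, not the controller's actual code.

```python
# Illustrative sketch: write data is staged in memory 124 and flushed to the
# hard disk 134 once a capacity threshold is reached.

FLUSH_THRESHOLD = 4  # number of buffered entries before flushing (assumed)

memory_buffer = []   # stands in for memory 124
hard_disk = []       # stands in for hard disk 134

def handle_write(data):
    memory_buffer.append(data)           # 1. stage the write in memory first
    if len(memory_buffer) >= FLUSH_THRESHOLD:
        hard_disk.extend(memory_buffer)  # 2. persist via the back-end interface
        memory_buffer.clear()

for i in range(10):
    handle_write(f"block-{i}")

print(len(hard_disk), len(memory_buffer))  # 8 blocks persisted, 2 still buffered
```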
The memory 124 is an internal memory that exchanges data directly with the processor; it can read and write data at any time and is fast. The memory 124 includes various types of memory, such as random access memory (RAM) and read-only memory (ROM). For example, random access memory includes, but is not limited to, dynamic random access memory (DRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), and static random access memory (SRAM). Read-only memory includes, but is not limited to, programmable read-only memory (PROM) and erasable programmable read-only memory (EPROM). In practice, multiple memories 124, and memories 124 of different types, may be configured in controller 0. The number and types of the memories 124 are not limited in this embodiment.
The memory 124 serves as temporary data storage for the operating system (OS) and other running programs; program code is stored in it, and the processor 123 executes the program code stored in the memory 124 to realize the functions for which the code was designed. If a memory error occurs while the processor 123 is running, the affected process fails and needs to be restarted, which interrupts the service; in more serious cases the operating system needs to be restarted, which further lengthens the service recovery time. It can be seen that the memory 124 plays a vital role in the proper operation of the system. However, as memory capacity and frequency increase, the probability of a memory error gradually increases, so the reliability of the system is difficult to guarantee and the impact on the service can be severe.
Therefore, the embodiment of the application provides a data processing method which is used for improving the reliability of a system and reducing the influence of memory errors on services.
Before describing the data processing method provided in the embodiments of the present application, a software structure to which the data processing method is applicable will be first described. The software structure of the controller 0 provided in the embodiment of the present application will be described below with reference to fig. 2, taking the controller 0 in the system shown in fig. 1 as an example.
At the software level, as shown in fig. 2, the controller 0 is installed and runs at least two operating systems (fig. 2 only shows two operating systems, but the embodiment of the present application is not limited thereto), and the at least two operating systems include a master operating system (master OS) and one or more slave operating systems (slave OS) (fig. 2 only shows one slave operating system, but the embodiment of the present application is not limited thereto).
In this application, the functions of the master operating system and the slave operating system are different, and the memory reliability levels are different, which are described as follows:
1. Master operating system:
The master operating system is configured to provide services to a user device (such as the application server 100). For example, when providing the aforementioned storage service, the master operating system (or the processor running the master operating system) receives, through the front-end interface 125, a request sent by the application server 100, such as a read request or a write request.
Specifically, the master operating system is configured to provide services for external devices. As shown in Fig. 2, the master operating system provides a front-end protocol transceiver service, where the front-end protocol may be the communication protocol used by the front-end interface 125 to interact with the application server 100, and the service is configured to communicate with the application server 100 through the front-end interface 125. Generally, a process or software that provides services to users runs in the user mode of an operating system; this will not be repeated below. The master operating system may also run a front-end peripheral driver corresponding to the process running in user mode; the front-end peripheral driver runs in kernel mode and is used to drive the front-end interface 125 to communicate with the application server 100.
The master operating system may also be used to manage the slave operating system. For example, when the system is powered on, the master operating system is pulled up first; the master operating system then allocates hardware resources to the slave operating system and detects whether the slave operating system is operating normally, for example by monitoring the heartbeat of the slave operating system to check whether it has failed. When a slave operating system is detected to have failed, it can be restarted to restore normal operation. Illustratively, as in Fig. 2, the Mgmt OS process running on the master operating system provides the functionality for managing the slave operating system.
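A minimal sketch of such heartbeat-based monitoring is given below; the timeout value and helper names are assumptions for illustration and do not represent the actual management service.

```python
# Illustrative sketch: the master OS records the last heartbeat time of each
# slave OS and restarts a slave OS whose heartbeat has timed out.

import time

HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before a slave OS is
                         # considered failed (assumed value)

last_heartbeat = {}      # slave OS name -> timestamp of last heartbeat

def on_heartbeat(slave_os):
    last_heartbeat[slave_os] = time.monotonic()

def check_slaves(restart):
    now = time.monotonic()
    for slave_os, seen in last_heartbeat.items():
        if now - seen > HEARTBEAT_TIMEOUT:
            restart(slave_os)  # e.g., re-pull the slave OS to restore service

# Example usage:
on_heartbeat("slave-os-1")
last_heartbeat["slave-os-2"] = time.monotonic() - 10   # simulate a stalled slave OS
check_slaves(restart=lambda name: print("restarting", name))  # restarts slave-os-2
```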
The master operating system also has the functionality to communicate with other controllers, such as controller 1. Illustratively, as in Fig. 2, the master operating system also provides a forwarding service for interacting with controller 1, for example forwarding a request received from the application server 100 to controller 1 so that controller 1 processes the request.
The master operating system further includes a memory management driver for managing the memory of the master operating system, using memory management policies such as memory mirroring and memory RAID (a redundant array of independent disks structure applied to memory, performing distributed parity inside the memory, similar to a hot backup of a hard disk). Memory mirroring makes two copies of the memory data and places them in the main memory and the mirror memory respectively, so that when the data in the main memory has an error, the data can be obtained from the mirror memory, which improves the memory reliability of the master operating system. Memory RAID distributes data into the memory of the master operating system in units of blocks; specifically, the data and the parity information corresponding to the data are stored in the memory units that make up the memory of the master operating system, with the parity information and the corresponding data stored in different memory units. When one of the memory units is damaged, the damaged data is recovered using the remaining data and the corresponding parity information, which improves the reliability of the memory. Illustratively, the memory RAID may be RAID 4, RAID 5, RAID 6, RAID 10, RAID 0, or RAID 1; RAID 5 allows one memory unit to be damaged, that is, when one memory unit is damaged, the data in the other memory units can be used to recover the damaged data, and RAID 6 allows two memory units to be damaged. For specific implementations or types of memory RAID, refer to descriptions in the related art, which are not detailed here.
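The single-unit recovery idea behind memory RAID can be illustrated with a toy XOR-parity example; this is a simplified sketch and not the memory management driver's actual implementation.

```python
# Illustrative sketch: data blocks are spread over several memory units, an
# XOR parity block is kept in a different unit, and one damaged unit can be
# rebuilt from the surviving units plus the parity.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

units = [b"unitA---", b"unitB---", b"unitC---"]   # data in three memory units
parity = xor_blocks(units)                        # parity kept in a fourth unit

# Suppose memory unit 1 is damaged: rebuild it from the surviving units + parity.
surviving = [units[0], units[2], parity]
recovered = xor_blocks(surviving)
assert recovered == units[1]
```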
In addition, the memory management driver may also provide basic memory management, such as memory allocation and release, which is not specifically limited in this application.
2. Slave operating system:
The slave operating system is used to compute or process data; in this application, it may be used to process read requests, write requests, and the like from the application server 100. Illustratively, services provided by the slave operating system include, but are not limited to: data plane services, metadata services, and back-end disk processing services. Drivers running in kernel mode of the slave operating system include, but are not limited to: back-end peripheral drivers, memory management, and so on.
The data plane service may be used to process read requests and write requests from the application server 100. For example, for a read request, the data requested may be obtained from the memory 124, or from the hard disk 134 through the back-end interface 126. For a write request, the data to be written carried in the write request may be written into the memory 124, or written into the hard disk 134 through the back-end interface 126; optionally, before the data is written into the hard disk 134, it may be computed or processed, for example by data deduplication, data compression, or data verification. Illustratively, the data plane service may run in the user mode of the slave operating system, and correspondingly the slave operating system may read from or write to the hard disk 134 through the back-end interface 126 via the back-end peripheral driver running in kernel mode. The metadata service is used to generate metadata, and memory management is used to manage the memory of the slave operating system, such as memory allocation and release.
It should be noted that the foregoing is merely an example; the master OS and the slave OS may have more or fewer functions than those shown in Fig. 2. For example, the master OS may further include a service for managing the slave OS, infrastructure services, and schedule management, and the slave OS may further include a control plane service, a scheduling service, and so on. Specifically, the service for managing the slave OS is used, for example, to detect whether the slave OS has failed and to restart the slave OS if so. The infrastructure services are used to manage basic components such as CPUs and threads; refer to descriptions in the related art, which are not enumerated here. The functions of the master operating system and the slave operating system are not specifically limited in this application. In addition, the names of the services or drivers are merely labels and may differ in different application scenarios, which is not specifically limited in this application.
At the hardware level, the hardware used by the master operating system and the slave operating system both comes from controller 0; the hardware structure of controller 0 is described with reference to Fig. 1 and is not repeated here. Specifically, the master operating system and the slave operating system each correspond to their own set of hardware; in other words, the hardware of the different operating systems is isolated from each other, so the master operating system does not use the hardware of the slave operating system, and the slave operating system does not use the hardware of the master operating system.
For example, the hardware allocation process may include: the hardware of controller 0 (including but not limited to the processor 123, the memory 124, the front-end interface 125, and the back-end interface 126) is split (or isolated) into two parts to obtain two hardware groups (such as hardware group 1 and hardware group 2 in Fig. 2), and each hardware group is allocated exclusively to one operating system. For example, assuming controller 0 includes two processors 123, one processor 123 may be allocated to the master operating system for running the master operating system, and the other processor 123 allocated to the slave operating system for running the slave operating system. As another example, assuming controller 0 includes only one processor 123 with multiple cores, the cores may be split into two parts, with one part allocated to the master operating system and the remaining part to the slave operating system. Similarly, the memory 124 is split into two parts, with one part allocated to the master operating system and the remaining part to the slave operating system.
In this application, since the function of the master operating system is to interact with the application server 100 and the slave operating system processes requests, the front-end interface 125 is assigned to the master operating system and the back-end interface 126 is assigned to the slave operating system. The hardware splitting may be implemented by multi-OS partitioning or by other means, which is not specifically limited in this application.
Briefly, in Fig. 2, hardware group 1 is allocated for use by the master operating system and includes part of the processor resources of controller 0, part of its memory resources, and the front-end interface 125. Hardware group 2 is allocated for use by the slave operating system and includes the remaining processor resources of controller 0, the remaining memory resources, and the back-end interface 126. The master operating system uses the hardware of hardware group 1 and does not use the hardware of hardware group 2; similarly, the slave operating system uses the hardware of hardware group 2 and does not use the hardware of hardware group 1.
It should be noted that the amount or capacity of hardware in the two hardware groups is not necessarily equal; in other words, the hardware does not have to be split evenly. For example, in this application, the memory capacity of the master operating system differs from that of the slave operating system, and the memory capacity of the master OS may even be far lower than that of the slave OS, that is, the memory capacity of hardware group 1 is lower than that of hardware group 2. For example, assuming the total capacity of the memory 124 in controller 0 is 4 GB, the memory allocated to the master operating system may be 512 MB, with the remaining memory (3 GB + 512 MB, that is, 3.5 GB) allocated to the slave operating system. Of course, this is merely an example, which is not specifically limited in this application.
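The uneven split can be pictured with a small, assumed configuration sketch; the core counts, memory sizes, and group layout below are illustrative only and are not a required partitioning.

```python
# Illustrative sketch: controller 0's CPU cores, memory and interfaces are
# divided into two disjoint groups, and the master OS gets a deliberately
# small share of memory.

hardware_group_1 = {            # exclusively for the master operating system
    "cpu_cores": [0, 1],
    "memory_mib": 512,          # small footprint lowers memory-error probability
    "interfaces": ["front-end interface 125"],
}

hardware_group_2 = {            # exclusively for the slave operating system
    "cpu_cores": [2, 3, 4, 5, 6, 7],
    "memory_mib": 3 * 1024 + 512,   # the remaining 3.5 GB
    "interfaces": ["back-end interface 126"],
}

# The groups are disjoint, reflecting hardware isolation between the two OSs.
assert not set(hardware_group_1["cpu_cores"]) & set(hardware_group_2["cpu_cores"])
```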
With this allocation, more memory is given to the slave operating system, which guarantees the performance of request processing by the slave operating system, while relatively little memory is allocated to the master operating system, which reduces the probability that the master operating system accesses a memory error; the lower the probability of memory errors, the higher the memory reliability. This can also be expressed as the master operating system having a different memory reliability level from the slave operating system: in this application, the memory reliability of the master operating system is higher than that of the slave operating system. In addition, the type of memory allocated to the master operating system may be the same as or different from the type allocated to the slave operating system; for example, the memory allocated to the master operating system may include SRAM, while the memory allocated to the slave operating system may include SRAM, DRAM, and the like, which further improves the memory reliability and response speed of the master OS.
The software structure of controller 1 is similar to that of controller 0: controller 1 includes at least a master operating system and a slave operating system. The software structures of controller 1 and controller 0 may be the same or different; for example, controller 0 and controller 1 may each include one master OS and one slave OS, or controller 0 may include a master OS and one slave OS while controller 1 includes a master OS and multiple slave OSs, which is not limited in this application.
The foregoing, taking Fig. 1 as an example, describes the hardware and software structures of a system to which the embodiments of the present application are applicable. It should be noted that Fig. 1 shows a centralized storage system with integrated disk and controller, where the engine 121 has hard disk slots and the hard disk 134 can be placed directly in the engine 121, that is, the hard disk 134 and the engine 121 are in the same device. Alternatively, the storage system 120 may be a storage system with separate disk and controller, in which case the engine 121 may have no hard disk slot, the hard disk 134 is placed in a hard disk enclosure 130, and the back-end interface 126 communicates with the hard disk enclosure 130. The back-end interface 126 exists in the engine 121 in the form of an adapter card, and two or more back-end interfaces 126 can be used simultaneously on one engine 121 to connect multiple hard disk enclosures. Alternatively, the adapter card may be integrated onto the motherboard, in which case it communicates with the processor 123 via a high-speed serial computer expansion bus (peripheral component interconnect express, PCIe).
In addition, the data processing method provided in the embodiments of the present application is applicable not only to a centralized storage system but also to a distributed storage system. Fig. 3 is a schematic system architecture diagram of a distributed storage system provided in an embodiment of the present application; the distributed storage system includes a server cluster. The server cluster includes one or more servers 110 (three servers 110 are shown in Fig. 3, but the cluster is not limited to three servers 110), and the servers 110 can communicate with each other.
In hardware, as shown in fig. 3, the server 110 includes at least a processor 112, a memory 113, a network card 114, and optionally, a hard disk 105. The processor 112, the memory 113, the network card 114 and the hard disk 105 are connected by buses. The functions and specific types of the processor 112, the memory 113, the network card 114 and the hard disk 105 may be referred to in the relevant description of fig. 1, and will not be repeated here. At the software level, the software structure of each server 110 may refer to the description of the software structure of the controller 0 shown in fig. 2, which is not repeated herein.
It should be noted that the structures of the controller 0 shown in fig. 1 and the server 110 shown in fig. 3 are only examples, and in an actual product, the controller 0 and the server 110 may have more or fewer components, for example, the controller 0 and the server 110 may further include input/output devices such as a keyboard, a mouse, a display screen, and the like. The hardware structure of the device in the system applicable to the embodiment of the application is not particularly limited, and all devices capable of installing at least two operating systems are applicable to the embodiment of the application.
The data processing method provided in the embodiments of the present application can be applied to the system shown in Fig. 1 and may be executed by controller 0 or controller 1 in Fig. 1, where controller 0 runs at least one master OS and one slave OS, and controller 1 also runs at least one master OS and one slave OS; the method is described below taking controller 0 as an example.
Fig. 4 is a flow chart of a data processing method according to an embodiment of the present application, as shown in fig. 4, where the method includes the following steps:
In step 401, the master OS of controller 0 receives a request sent by the application server 100.
In an optional embodiment, the master OS may configure its memory using a mirroring technique. For example, the master OS backs up the memory data generated while processing requests from the application server 100 into a mirrored memory space, so that when the master OS encounters a memory error, the memory data can be recovered from the mirrored memory space. This prevents the master OS from failing, improves the memory reliability of the master OS, and reduces the probability of a system failure and the impact on the service.
In step 402a, if no failure of the slave OS of controller 0 (denoted as the first slave OS) is detected, the master OS sends the request to the first slave OS.
In step 403a, the first slave OS processes the request. For the processing flow of the request by the first slave OS, refer to the processing flow of the second slave OS described below, which is not repeated here.
In step 402b, the master OS detects a first slave OS failure.
In step 403b, the master OS sends the request to the second slave OS.
At step 404b, the second slave OS processes the request.
In one embodiment, controller 0 runs only one slave OS besides the master OS, and the second slave OS is a slave OS running in controller 1. Refer to Fig. 5, which is a schematic diagram of the transmission path of the request in the interaction between controller 0 and controller 1. As shown in Fig. 5, the front-end protocol service of the master OS of controller 0 receives, through the front-end interface 125, the request sent by the application server 100; if the first slave OS has failed, the front-end protocol service sends the request to the forwarding service, and the forwarding service sends the request to the second slave OS of controller 1.
If the request is a write request, in one embodiment, the second slave OS may temporarily buffer the data to be written carried in the write request in the memory of the second slave OS; when the amount of data in the memory of the second slave OS reaches a certain threshold, the second slave OS sends the data in the memory to the hard disk 134 through the back-end interface 126 for storage. Optionally, before writing the data into the hard disk 134, the second slave OS may further process the data, for example by data deduplication, data compression, or other processing, and write the processed data into the hard disk 134 to reduce the storage space the data occupies. The metadata service may also generate metadata for the data and write the metadata into the memory of the second slave OS or the hard disk 134 (not shown in Fig. 5).
If the request is a read request, the second slave OS obtains the data requested by the read request. For example, if the memory of the second slave OS stores the data, that is, the memory hits, the second slave OS reads the data from the memory; if the memory misses, it reads the data from the hard disk 134. The second slave OS then returns the read data to the master OS of controller 0, and controller 0 sends the data to the application server 100 through the front-end interface 125; see the transmission path shown by the dashed line in Fig. 5.
In another optional embodiment, controller 0 runs multiple slave OSs, and the second slave OS may be a slave OS on controller 0 other than the first slave OS. As shown in Fig. 6, when controller 0 detects that the first slave OS has failed, the service is switched to another slave OS of controller 0 (for example, denoted as the third slave OS), and the third slave OS processes the request. The transmission flow of the request and the processing flow of the request by the third slave OS can refer to the foregoing description and are not repeated here.
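The switching logic of Fig. 5 and Fig. 6 can be summarized with a hypothetical routing sketch; the handler names and the failure-detection callback below are assumptions for illustration and do not represent the actual forwarding service.

```python
# Illustrative sketch: the master OS sends a request to the first slave OS when
# it is healthy, and otherwise switches to another slave OS, which may live on
# the same controller (Fig. 6) or on the peer controller (Fig. 5).

def route_request(request, slave_oses, is_failed, forward_to_peer):
    """slave_oses: ordered list of local slave OS handlers (the first entry is
    the first slave OS). forward_to_peer: sends the request to controller 1."""
    for slave_os in slave_oses:
        if not is_failed(slave_os):
            return slave_os(request)       # a local slave OS handles the request
    return forward_to_peer(request)        # all local slave OSs have failed

# Example usage with stand-in handlers:
first_slave = lambda req: ("first slave OS handled", req)
peer = lambda req: ("controller 1 handled", req)
print(route_request({"type": "read"}, [first_slave],
                    is_failed=lambda s: True, forward_to_peer=peer))
```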
In the above design, the hardware of controller 0 is split to construct at least two OSs with different memory reliability levels. The master OS, with the higher reliability level, is used to deploy services related to upper-layer services. When a slave OS with a lower reliability level hits a memory error and fails, the master OS can quickly sense that the slave OS has failed and switch the request to another slave OS for processing, for example another slave OS of the same controller or a slave OS of another controller. Experiments show that the time for the master OS to switch a request from the first slave OS to the second slave OS is on the order of seconds or less; see Table 1, which lists some experimental data obtained by the applicant.
TABLE 1
A single-OS system refers to the hardware architecture shown in Fig. 1 in which controller 0 runs only one OS (that is, a single OS). When a user-mode process of that OS accesses a UCE, the required recovery time is between 5 seconds and 3 minutes; when a kernel-mode driver of the OS accesses a UCE, the required recovery time is 2 to 10 minutes. When controller 0 and controller 1 are both single-OS, controller 1 can take over the service of controller 0 when controller 0 fails, with a required switching time of about 30 seconds. During these recovery or service-switching periods, the service is interrupted.
In the technical solution of the present application, the master OS is not used to process requests and is allocated less memory, so its memory can be managed with mirroring or memory RAID, which further improves the memory reliability of the master OS and makes the probability of a memory error in the master OS very low. If the master OS detects that a slave OS has failed, it can switch the request to another slave OS, so a memory error in a slave operating system does not interrupt the service. The whole switching time is less than 1 second, and the upper-layer service perceives only a slight delay, which greatly reduces the impact of memory errors on the service and improves the reliability of the whole system.
Based on the same inventive concept as the method embodiments, the present application further provides a data processing apparatus configured to perform the method performed by controller 0 in the method embodiments. The apparatus may be a hardware structure, a software module, or a hardware structure plus a software module. The apparatus 700 may be implemented by a chip system; in the embodiments of the present application, the chip system may consist of a chip, or may include a chip and other discrete devices. As shown in Fig. 7, the data processing apparatus 700 includes a master operating system instance 701 and a first slave operating system instance 702, and may optionally further include a second slave operating system instance 703.
The master operating system instance 701 is configured to send a request to a target slave operating system instance when the first slave operating system instance 702 fails and cannot process the request; for a specific implementation, refer to the description of steps 402b to 403b in Fig. 4. The target slave operating system instance may be the second slave operating system instance 703 (refer to the flow shown in Fig. 6), or a slave operating system instance in the second computing device (refer to the flow shown in Fig. 5); details are not repeated here.
The target slave operating system instance is configured to process the request. For the specific implementation, refer to step 404b in Fig. 4, which is not described here again.
In one possible implementation, the main operating system instance 701 is further configured to receive a request sent by a client device (such as the application server 100); the detailed implementation is described with reference to step 401 in fig. 4, and will not be described herein.
In one possible implementation, the memory reliability of the master operating system instance 701 is higher than the memory reliability of the first and second slave operating system instances 702 and 703.
In one possible implementation, the memory of the main operating system instance 701 is configured using a mirroring technique.
In one possible implementation, when the apparatus 700 includes the master operating system instance 701 and the first slave operating system instance 702, the apparatus 700 includes a first hardware resource group and a second hardware resource group; each hardware resource group includes processor resources and memory resources; the first hardware resource group provides resources for the master operating system instance 701, and the second hardware resource group provides resources for the first slave operating system instance 702.
In one possible implementation, the master operating system instance 701 is further configured to: if the first slave operating system instance has not failed, send the request to the first slave operating system instance; the first slave operating system instance is configured to process the request. The specific implementation is described in steps 402a to 403a in Fig. 4 and is not repeated here.
The data processing device provided by the embodiment of the invention can be applied to a storage system or a server, and the embodiment of the invention is not limited to the above.
The embodiment of the present application further provides a computer storage medium storing computer instructions that, when executed on a data processing apparatus, cause the data processing apparatus to perform the above related method steps to implement the method performed by controller 0 in the above embodiment; refer to the descriptions of the steps in Fig. 4, which are not repeated here.
The embodiment of the present application further provides a computer program product that, when run on a computer, causes the computer to perform the above related steps to implement the method performed by controller 0 in the above embodiment; refer to the descriptions of the steps in Fig. 4, which are not repeated here.
In addition, embodiments of the present application also provide an apparatus, which may be specifically a chip, a component, or a module, and may include a processor and a memory connected to each other; the memory is configured to store computer-executable instructions, and when the device is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip executes the method executed by the controller 0 in the above method embodiments, see description of each step in fig. 4, which is not repeated herein.
The data processing apparatus, computer storage medium, computer program product, and chip provided in the embodiments of the present application are each used to execute the method corresponding to controller 0 provided above; for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit (or module) in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Alternatively, the computer-executable instructions in the embodiments of the present application may be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device including one or more servers, data centers, etc. that can be integrated with the available medium. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The various illustrative logical blocks and circuits described in the embodiments of this application may be implemented or operated by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. The general-purpose processor may be a microprocessor or, alternatively, any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors together with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments of this application may be embodied directly in hardware, in a software element executed by a processor, or in a combination of the two. The software element may be stored in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps is performed on the computer or other programmable apparatus to produce a computer-implemented process. In this way, the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although this application has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations may be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the application as defined by the appended claims, and are intended to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to this application without departing from its scope; this application is intended to cover such modifications and variations provided that they fall within the scope of the claims of this application and their equivalents.

Claims (13)

1. A data processing method, wherein the method is applied to a first computing device, the first computing device running at least a master operating system and a first slave operating system;
the method comprises the following steps:
if the first slave operating system fails and cannot process a request, the master operating system sends the request to a second slave operating system; wherein the second slave operating system is a slave operating system in a second computing device, or the second slave operating system is a slave operating system, other than the first slave operating system, in the first computing device;
the second slave operating system processes the request.
2. The method of claim 1, wherein the memory reliability of the master operating system is higher than the memory reliability of any one of the slave operating systems.
3. The method of claim 2, wherein the memory of the master operating system is configured using a mirroring technique.
4. The method of claim 1 or 2, wherein the first computing device comprises a first hardware resource group and a second hardware resource group; each hardware resource group comprises a processor resource and a memory resource; the first hardware resource group provides resources for the master operating system; and the second hardware resource group provides resources for the first slave operating system.
5. The method of any one of claims 1-4, wherein the method further comprises:
if the first slave operating system does not fail, the master operating system sends the request to the first slave operating system; the first slave operating system processes the request.
6. A data processing apparatus, the apparatus comprising:
a master operating system instance, configured to send a request to a second slave operating system instance when a first slave operating system instance fails and cannot process the request; wherein the second slave operating system instance is a slave operating system instance, other than the first slave operating system instance, in the first computing device, or the second slave operating system instance is a slave operating system instance in a second computing device;
the second slave operating system instance, configured to process the request.
7. The apparatus of claim 6, wherein the master operating system instance has a higher memory reliability than any one of the slave operating system instances.
8. The apparatus of claim 7, wherein the memory of the master operating system instance is configured using a mirroring technique.
9. The apparatus of claim 6 or 7, wherein the apparatus comprises at least a first hardware resource group and a second hardware resource group; each hardware resource group comprises a processor resource and a memory resource; the first hardware resource group provides resources for the master operating system instance; and the second hardware resource group provides resources for the first slave operating system instance.
10. The apparatus of any one of claims 6-9, wherein the master operating system instance is further configured to: send the request to the first slave operating system instance if the first slave operating system instance does not fail; and the first slave operating system instance is configured to process the request.
11. A computing device comprising a first processor, a second processor, a first memory, and a second memory;
the first memory stores computer program instructions of a main operating system; the second memory stores computer program instructions of a first slave operating system;
the first processor executes the computer program instructions in the first memory to implement the method performed by the master operating system in any one of claims 1 to 5; and the second processor executes the computer program instructions in the second memory to implement the method performed by the first slave operating system in any one of claims 1 to 5.
12. A computer-readable storage medium, comprising program code, wherein the program code comprises instructions for implementing the method of any one of claims 1 to 5.
13. A computing device system, the system comprising at least a first computing device and a second computing device;
the first computing device is operated with a first master operating system and a first slave operating system; the second computing device is provided with a second master operating system and a second slave operating system;
the first master operating system is used for sending a request to a second slave operating system when the first slave operating system fails and cannot process the request;
the second slave operating system is used for processing the request.
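
The failover behaviour recited in claims 1, 5 and 13 can be pictured with a short sketch. The following Python fragment is illustrative only: the SlaveEndpoint and MasterDispatcher names, the failed flag, and the process() method are hypothetical stand-ins for whatever inter-operating-system mechanism an embodiment actually uses, and nothing here should be read as the claimed implementation.

```python
class SlaveFailure(Exception):
    """Raised when a slave operating system cannot process a request."""

class SlaveEndpoint:
    """Stands in for a slave operating system, local or on a second computing device."""
    def __init__(self, name, local=True):
        self.name = name
        self.local = local
        self.failed = False          # e.g. set after an uncorrectable memory error

    def process(self, request):
        if self.failed:
            raise SlaveFailure(f"{self.name} cannot process {request!r}")
        return f"{self.name} handled {request!r}"

class MasterDispatcher:
    """Master operating system role: receives external requests and forwards them."""
    def __init__(self, first_slave, second_slave):
        self.first_slave = first_slave      # normal target (cf. claim 5)
        self.second_slave = second_slave    # fallback target (cf. claim 1)

    def handle(self, request):
        if not self.first_slave.failed:
            return self.first_slave.process(request)
        # The first slave has failed: redirect the request to the second slave
        # so that service is not interrupted by the first slave's error.
        return self.second_slave.process(request)

if __name__ == "__main__":
    first = SlaveEndpoint("slave-1")
    second = SlaveEndpoint("slave-2", local=False)   # could live on a second computing device
    master = MasterDispatcher(first, second)

    print(master.handle("req-001"))   # served by slave-1
    first.failed = True               # simulate a failure of the first slave operating system
    print(master.handle("req-002"))   # transparently served by slave-2
```

A failure of the first slave therefore only changes which endpoint serves the request; the party that submitted the request sees no interruption, which is the reliability gain the claims describe.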
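
Similarly, claims 2 to 4, 9, and 11 describe a split of the device's hardware into resource groups, with the master operating system's group backed by higher-reliability (mirrored) memory. The sketch below, again using purely hypothetical HardwareResourceGroup and MemoryRegion types, only illustrates that split under those assumptions; the claims do not tie it to any particular data structure or configuration mechanism.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryRegion:
    base: int            # physical base address
    size: int            # size in bytes
    mirrored: bool       # True if configured with a memory-mirroring technique

@dataclass
class HardwareResourceGroup:
    cpus: List[int]                                   # processor resources of this group
    memory: List[MemoryRegion] = field(default_factory=list)

    @property
    def reliable(self) -> bool:
        # The group counts as high-reliability only if all of its memory is mirrored.
        return all(region.mirrored for region in self.memory)

# First hardware resource group -> master operating system (mirrored memory, cf. claim 3).
master_group = HardwareResourceGroup(
    cpus=[0, 1],
    memory=[MemoryRegion(base=0x0000_0000, size=4 << 30, mirrored=True)],
)

# Second hardware resource group -> first slave operating system (ordinary memory).
slave_group = HardwareResourceGroup(
    cpus=[2, 3, 4, 5],
    memory=[MemoryRegion(base=0x1_0000_0000, size=16 << 30, mirrored=False)],
)

assert master_group.reliable and not slave_group.reliable
```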
CN202210115056.6A 2022-01-28 2022-01-28 Data processing method and device Pending CN116560827A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210115056.6A CN116560827A (en) 2022-01-28 2022-01-28 Data processing method and device
PCT/CN2023/071510 WO2023143039A1 (en) 2022-01-28 2023-01-10 Data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210115056.6A CN116560827A (en) 2022-01-28 2022-01-28 Data processing method and device

Publications (1)

Publication Number Publication Date
CN116560827A true CN116560827A (en) 2023-08-08

Family

ID=87470421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210115056.6A Pending CN116560827A (en) 2022-01-28 2022-01-28 Data processing method and device

Country Status (2)

Country Link
CN (1) CN116560827A (en)
WO (1) WO2023143039A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000207232A (en) * 1999-01-18 2000-07-28 Fujitsu Ltd Multi-operating system controller and recording medium thereof
CN103902316B (en) * 2012-12-27 2017-07-25 联想(北京)有限公司 Switching method and electronic equipment
CN107807827A (en) * 2017-10-19 2018-03-16 安徽皖通邮电股份有限公司 A kind of method for supporting multi-core CPU multiple operating system
JP6813010B2 (en) * 2018-08-31 2021-01-13 横河電機株式会社 Availability systems, methods, and programs

Also Published As

Publication number Publication date
WO2023143039A1 (en) 2023-08-03

Similar Documents

Publication Publication Date Title
US8489914B2 (en) Method apparatus and system for a redundant and fault tolerant solid state disk
US8239518B2 (en) Method for detecting and resolving a partition condition in a cluster
CN106776159B (en) Fast peripheral component interconnect network system with failover and method of operation
US7600152B2 (en) Configuring cache memory from a storage controller
US7028218B2 (en) Redundant multi-processor and logical processor configuration for a file server
EP2348413B1 (en) Controlling memory redundancy in a system
US9766992B2 (en) Storage device failover
US20140173330A1 (en) Split Brain Detection and Recovery System
US20120144233A1 (en) Obviation of Recovery of Data Store Consistency for Application I/O Errors
US20190235777A1 (en) Redundant storage system
US11573737B2 (en) Method and apparatus for performing disk management of all flash array server
US10318393B2 (en) Hyperconverged infrastructure supporting storage and compute capabilities
US10782898B2 (en) Data storage system, load rebalancing method thereof and access control method thereof
JP2008052407A (en) Cluster system
US8683258B2 (en) Fast I/O failure detection and cluster wide failover
US9063854B1 (en) Systems and methods for cluster raid data consistency
US8782465B1 (en) Managing drive problems in data storage systems by tracking overall retry time
JP6714037B2 (en) Storage system and cluster configuration control method
US20230251931A1 (en) System and device for data recovery for ephemeral storage
US10114754B1 (en) Techniques for space reservation in a storage environment
WO2023169185A1 (en) Memory management method and device
US11210034B2 (en) Method and apparatus for performing high availability management of all flash array server
CN116560827A (en) Data processing method and device
JP5511546B2 (en) Fault tolerant computer system, switch device connected to multiple physical servers and storage device, and server synchronization control method
US8762673B2 (en) Interleaving data across corresponding storage groups

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination