CN116266150A - Service recovery method, data processing unit and related equipment - Google Patents

Info

Publication number
CN116266150A
Authority
CN
China
Prior art keywords
dpu
interface card
memory
host
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210269274.5A
Other languages
Chinese (zh)
Inventor
冷超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2022/139182 (WO2023109880A1)
Publication of CN116266150A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation

Abstract

The application provides a service recovery method executed by a DPU interface card. Specifically, after software of the DPU interface card (such as an operating system or another application program) is restarted, the DPU interface card acquires service information stored in a memory of a host, where the service information is information generated when the DPU interface card processed IO (input/output) sent by a processor of the host before the software was restarted. The DPU interface card then recovers the service according to the service information stored in the memory of the host. Therefore, the DPU interface card does not depend on its local memory to resume the interrupted service. Even if service data is lost because of a version upgrade of the software (such as the operating system) of the DPU interface card, a memory failure of the DPU interface card, or a similar cause, the DPU interface card can use the memory of the host to quickly resume the service, which reduces the impact on the service.

Description

Service recovery method, data processing unit and related equipment
The present application claims priority to the Chinese patent application filed on December 16, 2021, with application number 202111540861.5 and entitled "a data processing unit processing method, a data processing unit processing and system", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a service recovery method, a data processing unit, and related devices.
Background
As service processing demands grow, the computing power required of the central processing unit (CPU) keeps increasing. Currently, a data processing unit (DPU) interface card can act as an offload engine for the CPU and process services together with the CPU to achieve more efficient data processing. The DPU interface card may include an application specific integrated circuit (ASIC), a processor, and a memory, where the ASIC and the processor provide computing power for the CPU, and the memory temporarily stores service information of the DPU interface card.
In an actual application scenario, the operating system of the DPU interface card may be restarted because the operating system is upgraded or because the memory of the DPU interface card fails, for example through a row, column, or bank failure in the memory. The restart causes the service information stored in the memory of the DPU interface card to be lost, which interrupts the service.
Disclosure of Invention
In view of this, the embodiments of the present application provide a service restoration method to enable a DPU interface card to restore service after restarting software in the DPU interface card. The application also provides a corresponding data processing unit, computing device, interface card, computer readable storage medium and computer program product.
In a first aspect, embodiments of the present application provide a service restoration method performed by a DPU interface card that is coupled to a host, for example through a bus. To resume a service, after the software of the DPU interface card (such as an operating system) is restarted, the DPU interface card obtains the service information stored in the memory of the host, where the service information is information generated when the DPU interface card processed IO sent by the processor of the host before the software was restarted. The DPU interface card can then resume processing the interrupted service according to the service information stored in the memory of the host.
Because the DPU interface card can use the service information stored in the memory of the host to restore the service promptly after the software restarts, the interrupted service can be restored without depending on the local memory of the DPU interface card. Thus, even if service information is lost because the software of the DPU interface card (such as an operating system) is restarted for a version upgrade or after a memory failure, the DPU interface card can still use the memory of the host to quickly restore the service, which reduces the impact on the service.
In one possible implementation, the DPU interface card is coupled to the host based on a PCIe bus, and the PCIe link between the DPU interface card and the host is not broken during the software restart in the DPU interface card. In this way, during the software restart process, the host may not sense the state change of the DPU interface card, so that the impact on the host may be reduced.
In one possible implementation, the software restart of the DPU interface card is triggered by a failure of the DPU interface card or by an upgrade of the software version running on the DPU interface card, so that fault recovery or a software upgrade of the DPU interface card is implemented.
In other implementations, the DPU interface card may also restart software or the like after receiving an instruction sent by the host to restart software.
In one possible implementation, when the memory of the DPU interface card fails, the DPU interface card restarts the software to implement the repair of the failed memory of the DPU interface card.
In one possible implementation, when the memory of the DPU interface card fails and the failed memory does not meet the preset condition, the DPU interface card restarts the operating system of the DPU interface card. Therefore, when the memory fails, the DPU interface card can determine whether to restart the operating system of the DPU interface card according to the memory failure condition, so that the condition of restarting the operating system of the DPU interface card is constrained.
In one possible implementation, when the memory of the DPU interface card fails and the failed memory meets a preset condition, the DPU interface card may also restart the service component in the kernel of the operating system of the DPU interface card that uses the failed memory area. Therefore, the DPU interface card can avoid restarting the whole operating system, thereby reducing the influence of the fault memory on the DPU interface card as much as possible.
For example, the preset condition that the failed memory in the DPU interface card meets may be that the size of the failed memory does not exceed a preset size, for example, the number of failed rows (or columns) in the memory does not exceed a preset number of rows (or preset columns), etc. At this time, the influence of the fault memory on the DPU interface card is relatively small, so that the DPU interface card can realize fault repair without restarting the whole operating system.
Alternatively, the preset condition satisfied by the failed memory may be that the system component using the failed memory is a preset system component, so that when the failed memory affects only specific system components, the DPU interface card may restart those system components to repair the fault.
Or, the preset condition satisfied by the failed memory may specifically be that the number of system components using the failed memory does not exceed the preset number. At this time, the fault memory only affects a small number of service components, but does not affect the rest of service components, so the DPU interface card can restart the affected service components without restarting the entire operating system and all service components to achieve fault repair.
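The choice between restarting the whole operating system and restarting only the affected service components can be sketched as follows. This is a minimal illustration, not the claimed implementation: the threshold values, the component names, and the decision to treat the three alternative preset conditions as a disjunction are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class RestartScope(Enum):
    FULL_OS = "full_os"          # restart the whole operating system
    COMPONENTS = "components"    # restart only the affected service components


@dataclass(frozen=True)
class MemoryFault:
    failed_rows: int                     # number of failed rows (or columns)
    affected_components: FrozenSet[str]  # service components using the failed memory


# Hypothetical presets; the source only states that such values are "preset".
MAX_FAILED_ROWS = 4
MAX_AFFECTED_COMPONENTS = 2
PRESET_COMPONENTS = frozenset({"network_protocol", "file_system"})


def choose_restart_scope(fault: MemoryFault) -> RestartScope:
    """Return COMPONENTS if any preset condition holds, otherwise FULL_OS."""
    small_fault = fault.failed_rows <= MAX_FAILED_ROWS
    preset_only = (bool(fault.affected_components)
                   and fault.affected_components <= PRESET_COMPONENTS)
    few_components = 0 < len(fault.affected_components) <= MAX_AFFECTED_COMPONENTS
    if small_fault or preset_only or few_components:
        return RestartScope.COMPONENTS
    return RestartScope.FULL_OS
```

An implementation could equally apply only one of the three conditions; the sketch combines them merely to show all three in one place.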
In one possible implementation, when obtaining the service information stored in the memory of the host, the DPU interface card may first obtain a first address identifier from a storage area. The first address identifier identifies a memory area in the host, that is, a region of the host's memory used to store the service information. The storage area that holds the first address identifier may be in a volatile memory or in a nonvolatile memory, and the data stored in it is not lost when the operating system of the DPU interface card is restarted. Thus, after restarting the operating system, the DPU interface card can access the memory area of the host according to the first address identifier to obtain the service information.
The storage area may be located inside the DPU interface card, for example as a logic block in a CPLD included in the DPU interface card, or outside the DPU interface card, for example as a storage area in an external memory connected to the DPU interface card.
In one possible implementation, before restarting the software, the DPU interface card may apply to the host for a memory area for storing service information and obtain a first address identifier of the memory area, where the first address identifier may be, for example, the start address of the memory area. The DPU interface card can then store the service information in the memory area according to the first address identifier. Thus, after the software in the DPU interface card is restarted, the DPU interface card can use the service information stored in the memory area to recover the service.
In a possible implementation, the DPU interface card may further obtain configuration information from the memory area, where the configuration information is used to configure the DPU interface card. The configuration information may specifically include a second address identifier of a send queue in the memory of the host (such as the start address of the send queue) and a third address identifier of a completion queue (such as the start address of the completion queue), where the send queue stores IO sent by the processor of the host, and the completion queue stores the results of the DPU interface card executing the IO.
Optionally, the configuration information may further include a communication format, a communication protocol version, etc. when the DPU interface card and the host perform data interaction, or the configuration information may further include other contents.
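As an illustration of what such configuration information might look like when stored as a fixed-size record, the following sketch packs the two queue address identifiers and a protocol version into bytes. The record layout (two little-endian 64-bit addresses and a 16-bit version), the class name, and the field names are hypothetical; the source does not specify any encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: two little-endian 64-bit addresses and a 16-bit version.
CONFIG_FORMAT = "<QQH"
CONFIG_SIZE = struct.calcsize(CONFIG_FORMAT)


@dataclass(frozen=True)
class QueueConfig:
    send_queue_addr: int        # second address identifier (send queue)
    completion_queue_addr: int  # third address identifier (completion queue)
    protocol_version: int       # communication protocol version

    def pack(self) -> bytes:
        """Serialize the configuration into a fixed-size record."""
        return struct.pack(CONFIG_FORMAT, self.send_queue_addr,
                           self.completion_queue_addr, self.protocol_version)

    @classmethod
    def unpack(cls, raw: bytes) -> "QueueConfig":
        """Rebuild the configuration from a record read back after a restart."""
        if len(raw) < CONFIG_SIZE:
            raise ValueError("configuration record is truncated")
        return cls(*struct.unpack(CONFIG_FORMAT, raw[:CONFIG_SIZE]))
```

After a software restart, the card would read the record back and call `QueueConfig.unpack` to learn where the send queue and completion queue live in host memory.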
In a second aspect, an embodiment of the present application further provides a data processing unit DPU device configured to perform the service restoration method as described in the first aspect or any implementation manner of the first aspect.
In a third aspect, the present application provides a computing device including a host and a data processing unit (DPU) interface card, where the host includes a memory and a processor, and the DPU interface card is configured to: after the software of the DPU interface card is restarted, acquire service information stored in the memory of the host, where the service information is information generated when the DPU interface card processed input/output (IO) sent by the processor before the software was restarted; and recover the service according to the service information. Illustratively, the DPU interface card is configured to perform the service restoration method as described in the first aspect or any implementation of the first aspect.
In a fourth aspect, the present application provides a data processing unit (DPU) interface card, where the DPU interface card includes a printed circuit board, an interface, and a DPU chip, the interface card communicates with a host through the interface, and the interface and the DPU chip are mounted on the printed circuit board. The DPU chip is configured to: after the software of the DPU interface card is restarted, obtain service information stored in a memory of the host, where the service information is information generated when the DPU interface card processed input/output (IO) sent by a processor of the host before the software was restarted; and recover the service according to the service information.
Illustratively, the DPU chip in the DPU interface card may be used to perform the service restoration method as described in the first aspect or any implementation of the first aspect.
In a fifth aspect, the present application provides a data processing unit (DPU) chip applied to a DPU interface card, where the DPU interface card is coupled to a host, and the DPU chip includes an acquisition circuit and a processing circuit. The acquisition circuit is configured to acquire, after the software of the DPU interface card is restarted, service information stored in a memory of the host, where the service information is information generated when input/output (IO) sent by a processor of the host was processed before the software was restarted. The processing circuit is configured to recover the service according to the service information.
Illustratively, the acquisition circuit and the processing circuit cooperate with each other to perform the service restoration method as described in the first aspect or any implementation manner of the first aspect.
The implementations provided in the above aspects of the present application may be further combined to provide additional implementations.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic diagram of an architecture of an exemplary DPU interface card provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a service recovery method provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data processing unit (DPU) device according to an embodiment of the present disclosure;
FIG. 4 is a schematic hardware structure of a computing device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a DPU interface card provided in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a DPU chip according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings in the present application.
The terms "first", "second", "third", and the like in the description, the claims, and the above drawings of the present application are used to distinguish between similar objects and do not necessarily describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances; they merely distinguish objects of the same nature when the embodiments of the application are described.
The DPU interface card serves as an offload engine for the CPU. While assisting the CPU in processing services, the DPU interface card usually stores service data temporarily in its local memory. If the memory of the DPU interface card fails, or if software in the DPU interface card (such as the operating system) undergoes a version upgrade, the corresponding software of the DPU interface card is restarted and the data in the memory of the DPU interface card is lost. For example, in practice part of the memory of the DPU interface card may develop faults, and repairing a memory fault requires restarting the software of the DPU interface card. The service data temporarily stored in that part of the memory may then be lost, which makes it difficult for the DPU interface card to continue processing the service. Likewise, when the operating system or other software in the DPU interface card needs to be upgraded, restarting the operating system causes the service data stored in the memory of the DPU interface card to be lost, which affects service processing by the DPU interface card.
Based on this, the embodiments of the application provide a service restoration method to restore a service interrupted on the DPU interface card. Specifically, before the software of the DPU interface card is restarted, the memory of the host stores in advance the service information generated when the DPU interface card processes the service. After the software is restarted (for example, when a memory failure or a software version upgrade triggers the DPU interface card to restart its operating system), the DPU interface card can acquire the service information from the memory of the host and use it to resume processing the interrupted service. Therefore, even if the service data stored in the memory of the DPU interface card is lost because the DPU interface card is restarted for the reasons described above, the DPU interface card can quickly recover the service by using the host memory without relying on its local memory, which reduces the impact on the service.
The above service restoration method may be applied, for example, to the DPU interface card 100 shown in FIG. 1. As shown in FIG. 1, the DPU interface card 100 and the host 200 are coupled by a peripheral component interconnect express (PCIe) bus or another bus. The DPU interface card 100 includes a printed circuit board 1011, an interface 1012, a DPU chip 101, and software 1013, and the interface 1012 and the DPU chip 101 are mounted on the printed circuit board 1011. Specifically, the interface 1012 may be a PCIe interface. The software 1013 may be an operating system of the DPU interface card 100, which includes a system service component unit 102, a soft reset unit 103, a memory supervision unit 104, and a microkernel unit 105. The DPU interface card 100 may further include a micro-reset unit 106 and the like. In a specific implementation, the software shown in FIG. 1 may be located in the memory of the DPU interface card or may be embedded, which is not limited in this embodiment.
The DPU chip 101 is configured to control the DPU interface card 100 to provide storage services for the host 200, that is, to process storage-type services such as non-volatile memory express (NVMe), virtio-fs (see https://virtio-fs.gitlab.io/), and virtio-scsi (see https://www.ovirt.org/develop/release-management/features/storage/virtio-scsi.html) services. Alternatively, the DPU chip 101 may control the DPU interface card 100 to provide computing services for the host 200, that is, to process computing-type services. When software such as the operating system of the DPU interface card 100 completes restarting, the DPU chip 101 may control the DPU interface card 100 to resume the interrupted service (such as the above storage-type or computing-type service).
The system service component unit 102 includes a plurality of service components in the kernel of the operating system. These service components use the memory in the DPU interface card 100 to provide services for the operating system in the DPU interface card 100, and include, for example, the driver component, file system component, memory management component, and network protocol component shown in FIG. 1. The driver component drives the DPU interface card 100 to perform data communication with the host 200, and may include a driver framework and service drivers for a group of entities. The file system component provides file system services, such as storing, reading, and managing data in the form of files. The memory management component provides memory management services, such as allocating, reclaiming, and isolating memory areas. The network protocol component provides network protocol services, such as the hypertext transfer protocol (HTTP).
A soft reset unit 103, configured to reset the hardware unit in the DPU interface card 100, and restart software such as the operating system of the DPU interface card 100.
The memory supervision unit 104 is configured to perform fault monitoring, fault repairing, fault isolation, and redundant area replacement on the memory in the DPU interface card 100, so that the memory in the DPU interface card 100 has reliability, availability, and serviceability (reliability, availability, serviceability, RAS).
The microkernel unit 105 is configured to manage resources in the DPU interface card 100, and split service components of the kernel, so as to enable the service components of the kernel to be individually restarted.
The micro-reset unit 106 is configured to control, by means of the microkernel architecture, the independent restarting of individual service components in the kernel of the operating system.
Typically, DPU interface card 100 may assist host 200 in handling traffic. When the memory supervision unit 104 monitors that the memory in the DPU interface card 100 has a fault, if the memory fault causes the service to be interrupted and the software of the DPU interface card 100 is restarted, the DPU interface card 100 can recover the service by using the service information stored in the memory of the host 200. For example, the memory supervision unit 104 may isolate the failed memory location or replace the failed unit, and trigger the soft reset unit 103 to reset the hardware unit in the DPU interface card 100 and restart the operating system in the DPU interface card 100. Microkernel unit 105 then initializes the restarted hardware units and restarts the various service components in system service component unit 102. After the soft reset unit 103 completes the restarting of the operating system, the DPU chip 101 acquires the service information stored in the memory area 201 of the host 200, and resumes the service using the service information of the DPU interface card 100.
Further, when the memory fault in the DPU interface card 100 is limited in scope, for example when only the network protocol component in the kernel of the operating system uses the failed memory, the memory supervision unit 104 may isolate the failed memory location and trigger the micro-reset unit 106 to execute a micro-reset procedure, that is, to restart only the service components that use the failed memory. For example, the micro-reset unit 106 may restart the network protocol component, while the remaining unaffected service components do not need to be restarted, so as to restore the normal network communication function of the DPU interface card 100.
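One way to picture how the affected service components might be identified before a micro-reset is an overlap test between the failed address range and the memory range each component uses. The sketch below is only illustrative: the `components_using` helper, the per-component address ranges, and the use of Python `range` objects to model memory regions are assumptions, not part of the described architecture.

```python
from typing import Dict, List


def components_using(failed: range, usage: Dict[str, range]) -> List[str]:
    """Return the service components whose memory range overlaps the failed range."""
    return sorted(
        name
        for name, region in usage.items()
        # Two half-open ranges overlap when each starts before the other ends.
        if region.start < failed.stop and failed.start < region.stop
    )


# Hypothetical memory map of kernel service components on the card.
USAGE = {
    "network_protocol": range(0x1000, 0x2000),
    "file_system": range(0x2000, 0x3000),
}
```

With this map, a fault confined to `range(0x1800, 0x1810)` touches only the network protocol component, so only that component would be restarted.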
It should be noted that, in fig. 1, the coupling between the DPU interface card 100 and the host 200 through the PCIe bus is taken as an example for illustration, and in practical application, the DPU interface card 100 may be coupled with the host 200 by other manners, which is not limited in this embodiment. Moreover, the architecture of the DPU interface card 100 shown in fig. 1 is only used as an exemplary illustration, and other architectures may be adopted by the DPU interface card 100 in practical applications, such as many other types of service components may be included in the DPU interface card 100.
Next, various non-limiting embodiments of the service restoration process are described in detail.
Fig. 2 is a schematic flow chart of a service recovery method in an embodiment of the present application. The method may be applied to the DPU interface card 100 described above in fig. 1, or may be applied to other applicable DPU interface cards. The following description will be given by taking the DPU interface card 100 applied to fig. 1 as an example. The service restoration method shown in fig. 2 specifically may include:
S201: The DPU interface card 100 applies to the host 200 for the memory area 201 and obtains the first address identifier of the allocated memory area 201.
The first address identifier may be, for example, the start address of the memory area 201, or other identification information indicating the memory area 201, such as its end address, which is not limited in this embodiment.
In this embodiment, the DPU interface card 100 may apply to the host 200 in advance for a section of memory, and the allocated memory area is used to store service information related to the services processed by the DPU interface card 100. As an implementation example, the DPU chip 101 in the DPU interface card 100 may send the host 200 a request for a memory area. In response, the host 200 selects a memory area 201 of a preset size from its available memory, assigns it to the DPU interface card 100, and returns the first address identifier of the memory area 201 to the DPU chip 101. Alternatively, in practical applications, after the DPU interface card 100 and the host 200 establish a communication connection, the host 200 may proactively allocate the memory area 201 for the DPU interface card 100 and send the first address identifier corresponding to the memory area 201 to the DPU interface card 100.
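The request-and-allocate exchange in S201 can be modeled with a toy host that hands out regions of a flat byte array and returns each region's start address as the first address identifier. The `HostMemory` class, its bump allocator, and the method names are hypothetical and stand in for whatever mechanism the host actually uses.

```python
class HostMemory:
    """Toy model of host memory: a flat bytearray with a bump allocator."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self._next_free = 0

    def allocate(self, length: int) -> int:
        """Reserve `length` bytes and return the region's start address."""
        if length <= 0 or self._next_free + length > len(self.data):
            raise MemoryError("not enough host memory for the requested area")
        first_address = self._next_free
        self._next_free += length
        return first_address

    def write(self, address: int, payload: bytes) -> None:
        """Store service information at the given address."""
        if address < 0 or address + len(payload) > len(self.data):
            raise ValueError("write falls outside host memory")
        self.data[address:address + len(payload)] = payload

    def read(self, address: int, length: int) -> bytes:
        """Read service information back, e.g. after the card's software restarts."""
        return bytes(self.data[address:address + length])
```

The card would keep the returned address and use it for every later `write` and `read` of service information.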
S202: the DPU interface card 100 stores the first address identification to the target memory area.
As an implementation example, a target storage area may be configured in the DPU interface card 100 such that the data stored in the target storage area is not lost when the DPU interface card 100 restarts software such as the operating system. For example, a complex programmable logic device (CPLD) may be configured in the DPU interface card 100, and the DPU interface card 100 may store the first address identifier in a logic block (that is, the target storage area) in the CPLD. In practice, the target storage area may be implemented by a nonvolatile memory, such as at least one of an electrically alterable read-only memory (EAROM), an electrically erasable programmable read-only memory (EEPROM), and a flash memory. Alternatively, the target storage area may be implemented by a volatile memory, such as at least one of a static random access memory (SRAM) and a dynamic random access memory (DRAM).
In other possible implementation examples, the target storage area may also be disposed outside the DPU interface card 100, for example, the DPU interface card 100 may be externally connected to a nonvolatile memory or a volatile memory, so that the DPU interface card 100 may write the acquired first address identifier into the externally connected nonvolatile memory or volatile memory.
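The role of the target storage area can be illustrated with a small model in which a software restart clears the card's working memory but leaves the target storage area untouched. The `DpuCard` class and its attribute names are hypothetical; the dictionary standing in for the CPLD logic block is only a simplification for the example.

```python
class DpuCard:
    """Toy model of a DPU interface card with two kinds of storage."""

    def __init__(self) -> None:
        self.local_memory = {}    # working memory, lost on software restart
        self.target_storage = {}  # e.g. a CPLD logic block, survives the restart

    def save_first_address(self, first_address: int) -> None:
        """S202: keep the first address identifier where a restart cannot erase it."""
        self.target_storage["first_address"] = first_address

    def restart_software(self) -> None:
        """Simulate an operating system restart: only working memory is cleared."""
        self.local_memory.clear()

    def load_first_address(self) -> int:
        """After the restart, recover the identifier of the host memory area."""
        return self.target_storage["first_address"]
```

Because the identifier survives, the card can locate the host memory area again even though everything in `local_memory` is gone.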
It should be noted that this embodiment does not limit the execution order of step S202 and step S203. For example, in other embodiments, the DPU interface card 100 may execute step S203 first and then step S202, or execute both steps simultaneously.
S203: in the process of assisting the host 200 in processing the service, the DPU interface card 100 stores service information generated by processing the service into the memory area 201 in the host 200 according to the first address identifier, where the service information is information generated by processing IO sent by the processor of the host 200 by the DPU interface card 100.
After the configuration of the DPU interface card 100 is completed, the DPU interface card 100 may begin to assist the host 200 in processing one or more services. Taking one service as an example, a processor in the host 200 may place one or more input/output (IO) requests corresponding to the service in a send queue. The DPU interface card 100 reads an IO from the send queue of the host 200, parses and executes it, and stores the data obtained by parsing and executing the IO in the memory of the DPU interface card 100. Meanwhile, the DPU interface card 100 may also store IO-related information generated while executing the IO in the memory area 201 in the host 200. The IO-related information is the service information and includes, for example, the IO execution stage and the key state of IO execution. In this way, even if the service data stored in the memory of the DPU interface card 100 is lost, the DPU interface card 100 can later read service information such as the IO execution stage and the key state of IO execution from the memory area 201 and use it to continue executing the IO, thereby recovering the service. This also avoids re-executing the IO from the beginning and reduces the service recovery delay.
In a further possible embodiment, the DPU interface card 100 may store only part of the service information generated while executing the IO in the memory area 201, so as to reduce the resources that the DPU interface card 100 consumes in processing the service. For example, during the initial stage of executing an IO, the DPU interface card 100 may not store the current execution stage of the IO or the key state of IO execution in the memory of the host 200. Accordingly, if the service later needs to be recovered based on that IO, the DPU interface card 100 may re-execute the IO. Because the DPU interface card 100 had not started executing the IO, or had only just started, before software such as the operating system was restarted, re-executing the IO during recovery adds little delay to service recovery. When execution of the IO reaches the middle or later stage, the DPU interface card 100 may store information such as the key state of IO execution and the IO execution stage in the memory area 201 of the host 200. If the service later needs to be recovered based on that IO, the DPU interface card 100 can continue executing the IO according to the information stored in the host 200 without re-executing it, which reduces the service recovery delay.
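The checkpoint-and-resume idea in S203 can be sketched as follows: after each execution stage, the card records in the host memory area which stage comes next, and after a restart it continues from that stage instead of starting over. The stage names, the JSON record format, and the `execute_io` and `resume_io` helpers are hypothetical; a dictionary stands in for the memory area 201.

```python
import json
from typing import Dict, List, Optional

STAGES = ["parse", "transfer", "complete"]  # hypothetical IO execution stages


def execute_io(io_id: str, host_area: Dict[str, str], executed: List[str],
               start: int = 0, fail_after: Optional[int] = None) -> bool:
    """Run the IO from stage `start`, checkpointing each finished stage in host memory."""
    for index in range(start, len(STAGES)):
        executed.append(STAGES[index])  # stand-in for the real work of this stage
        host_area[io_id] = json.dumps({"next_stage": index + 1})
        if fail_after == index:
            return False  # simulated software restart right after this stage
    return True


def resume_io(io_id: str, host_area: Dict[str, str], executed: List[str]) -> bool:
    """After a restart, continue from the stage recorded in host memory."""
    record = json.loads(host_area.get(io_id, '{"next_stage": 0}'))
    return execute_io(io_id, host_area, executed, start=record["next_stage"])
```

If the restart happens after the first stage, `resume_io` picks up at the second stage, so no stage is executed twice.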
Alternatively, the DPU interface card 100 may determine whether to store the service information generated by performing the IO in the memory area 201 according to the IO size. For example, when the size of the IO read by the DPU interface card 100 from the transmission queue does not exceed the preset threshold, the DPU interface card 100 may not need to send the service information generated by processing the IO to the memory of the host 200 for storage during the process of executing the IO. In this way, even if the DPU interface card 100 resumes traffic by re-executing the IO, the cost to be paid is relatively small. When the size of the IO read out from the transmission queue by the DPU interface card 100 exceeds the preset threshold, the DPU interface card 100 may store the service information such as the key status of the IO execution and the IO execution stage into the memory of the host 200, so as to avoid the DPU interface card 100 from re-executing the IO, and reduce the recovery delay of the service. In practical application, the DPU interface card 100 may also comprehensively determine whether to send service information generated by executing the IO to the memory of the host 200 for storage in combination with aspects such as the IO size and the IO execution progress.
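The storage decision described above, which combines the IO size with the IO execution progress, can be sketched as follows. This is only an illustrative sketch: the names `IoProgress` and `should_store_service_info`, the 64 KiB threshold, and the 25% boundary for the "initial stage" are assumptions made for the example and are not specified in this application.

```python
from dataclasses import dataclass

# Hypothetical values: the application only mentions "a preset threshold"
# and an initial / middle / later execution stage.
SIZE_THRESHOLD_BYTES = 64 * 1024
INITIAL_STAGE_FRACTION = 0.25


@dataclass
class IoProgress:
    size_bytes: int   # size of the IO read from the send queue
    steps_done: int   # execution steps already completed
    steps_total: int  # total execution steps for this IO


def should_store_service_info(io: IoProgress) -> bool:
    """Return True if stage and key state should be written to host memory."""
    if io.size_bytes <= SIZE_THRESHOLD_BYTES:
        # Small IO: re-executing it after a restart is cheap.
        return False
    progress = io.steps_done / io.steps_total if io.steps_total else 0.0
    # Large IO: store only once execution has left the initial stage.
    return progress >= INITIAL_STAGE_FRACTION
```

Under these assumed values, a small IO is never checkpointed, and a large IO is checkpointed only once it has passed the initial stage, so the cost of writing to host memory is paid only where re-execution would be expensive.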
In practical application, in the process of executing IO, the DPU interface card 100 may also send the IO execution result obtained in the process of executing IO to the memory of the host 200 for storage. Thus, when the IO is interrupted due to restarting of the operating system of the DPU interface card 100, the DPU interface card 100 can continue to process the IO from the interrupt location according to the IO execution result stored in the memory of the host 200 and the service information, so that the service recovery delay can be further reduced.
S204: when the condition for restarting the software is satisfied, the DPU interface card 100 restarts the software.
In this embodiment, the software in the DPU interface card 100 may be restarted when the restart condition is met, and restarting the software may affect how the DPU interface card 100 processes or recovers the service. The software may be, for example, the operating system in the DPU interface card 100, or other software. For ease of understanding, the following description takes the operating system as an example of the software.
In practice, DPU interface card 100 may restart the operating system in some scenarios. As some examples, meeting the conditions to restart the operating system may include the following:
Example one: a memory failure in DPU interface card 100 is detected.
Specifically, the DPU interface card 100 may monitor, in real time (or periodically), whether the memory in the DPU interface card 100 has a fault. When it detects a fault, for example that at least one row, column, or bank in the memory has failed and causes an uncorrectable error (UCE) during data access (other faults are also possible), it reports the location information of the fault in the memory to the memory supervision unit 104. The memory supervision unit 104 may isolate or replace the failed memory portion according to the location information of the fault, and trigger the soft reset unit 103 to execute the soft reset process. The soft reset unit 103 may then reset the hardware units (e.g., DPU chip 101) in the DPU interface card 100 and restart the operating system of the DPU interface card 100.
The soft reset unit 103 may reset all hardware units in the DPU interface card 100, in which case the PCIe link between the DPU interface card 100 and the host 200 is disconnected. In another implementation, the soft reset unit 103 may reset only the hardware units other than the PCIe core, so that the PCIe core, not being reset, remains connected to the host 200 and the PCIe link between the DPU interface card 100 and the host 200 is not broken. The PCIe core is used to establish the PCIe link with the host.
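The two reset scopes described above can be illustrated with a short sketch. Apart from the PCIe core, the hardware unit names below, as well as the `HwUnit` and `units_to_reset` names, are hypothetical and serve only as an example; the application does not enumerate the hardware units of the DPU interface card.

```python
from enum import Enum, auto
from typing import Set


class HwUnit(Enum):
    PCIE_CORE = auto()          # establishes the PCIe link with the host
    DPU_CORES = auto()          # hypothetical unit
    MEMORY_CONTROLLER = auto()  # hypothetical unit
    DMA_ENGINE = auto()         # hypothetical unit


def units_to_reset(keep_pcie_link: bool) -> Set[HwUnit]:
    """Select the hardware units a soft reset should cover."""
    units = set(HwUnit)
    if keep_pcie_link:
        # Leaving the PCIe core untouched keeps the link to the host up.
        units.discard(HwUnit.PCIE_CORE)
    return units
```

Calling `units_to_reset(True)` returns every unit except the PCIe core, which corresponds to the implementation in which the link to the host is not broken.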
After the soft reset unit 103 completes the reset of the hardware unit, the microkernel unit 105 may initialize the hardware unit and restart each service component in the system service component unit 102 to start each system service of the kernel, such as a kernel driving service, a file system service, a memory management service, and a network protocol service.
It should be noted that example one takes a detected memory failure as the trigger for the DPU interface card 100 to restart the operating system. In practical application, when another failure of the DPU interface card 100 is detected (such as an application running error in the DPU interface card 100) and the operating system needs to be restarted, the DPU interface card 100 may likewise be triggered to restart the operating system.
Example two: the upgrade of the operating system of the DPU interface card 100 from the previous version is completed.
Specifically, the host 200 may generate an upgrade instruction for the previous-version operating system of the DPU interface card 100 and transmit it to the DPU interface card 100, so that the DPU interface card 100 upgrades its operating system according to the received upgrade instruction. For example, the DPU interface card 100 may read a new version of the operating system from the host 200 according to the upgrade instruction and replace the previous version of its operating system with the new version. After determining that the version upgrade is completed, the DPU interface card 100 may start to run the new version of the operating system.
The host 200 may periodically issue an upgrade instruction to implement periodic update of the DPU interface card 100 operating system; alternatively, the host 200 may generate a corresponding upgrade instruction according to an upgrade operation of the user with respect to the operating system of the DPU interface card 100, and issue the upgrade instruction to the DPU interface card 100, and so on.
Example three: an instruction to restart the operating system is received.
Specifically, the host 200 may generate a corresponding restart instruction according to a restart operation of the user on the operating system of the DPU interface card 100, and send the restart instruction to the DPU interface card 100, so that the DPU interface card 100 restarts and runs the operating system after receiving the restart instruction.
The implementation of triggering the DPU interface card 100 to restart the operating system is merely described as some examples, and in other embodiments, the DPU interface card 100 may also restart the operating system when other possible conditions are met, for example, when the operating system of the DPU interface card 100 has a running error during the running process, the restarting of the operating system may be automatically triggered, etc.; alternatively, when the software to be restarted is other software than the operating system, the DPU interface card 100 may implement restarting the other software in the DPU interface card 100 based on the similar manner described above, which is not limited in this embodiment.
After the DPU interface card 100 restarts the operating system, the service data temporarily stored in the memory of the DPU interface card 100 is lost, so the DPU interface card 100 may interrupt service processing because of the lost service data. For this reason, in this embodiment, the DPU interface card 100 resumes processing of the interrupted service by continuing with the following steps.
S205: after restarting the software, the DPU interface card 100 acquires the service information stored in the memory area 201.
In one possible implementation, after restarting the operating system or other software, the DPU chip 101 in the DPU interface card 100 may obtain the first address identifier from the target storage area. The first address identifier (e.g., the start address of the memory area 201) identifies the memory area 201 of the host 200 that the DPU interface card 100 applied for in advance, so the DPU chip 101 may access the memory area 201 of the host according to the first address identifier and read the service information stored there.
S206: the DPU interface card 100 restores a service according to the acquired service information.
The obtained service information may specifically be data generated by the DPU interface card 100 when executing an unfinished IO, so that the DPU chip 101 may obtain the unfinished IO from the host 200 and, according to the current execution stage of the IO and the key state of IO execution stored in the memory area 201, continue to execute the IO from the current execution stage, thereby recovering service processing. Further, the DPU chip 101 may continue to execute the IO from the interrupt location within the current execution stage according to the IO execution result stored in the memory area 201. For an IO that the DPU interface card 100 had not executed, or had only just started to execute, before restarting the operating system, the memory area 201 may hold no related information, in which case the DPU chip 101 may directly re-execute the IO.
In a further possible embodiment, the service information stored in the memory area 201 may cover only some of the unfinished IOs, for example the execution stage of each such IO and the execution result corresponding to that stage, while for other IOs that the DPU interface card 100 has started but not finished, no related information may be stored in the memory area 201. Therefore, after obtaining an unfinished IO from the send queue of the host 200, the DPU interface card 100 may check whether the service information stored in the memory area 201 includes information related to that IO. If such information is found, the DPU interface card 100 may continue to execute the IO from the interrupt location according to the found information; if not, the DPU interface card 100 may re-execute the IO.
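The lookup described above, in which each unfinished IO is either continued from its recorded stage or re-executed, can be sketched as follows. The record layout `ServiceInfo`, the function `plan_recovery`, and the use of a dictionary keyed by an IO identifier to stand in for the memory area 201 are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ServiceInfo:
    stage: int                             # IO execution stage reached before the restart
    key_state: bytes = b""                 # key state of IO execution
    partial_result: Optional[bytes] = None # optional IO execution result


def plan_recovery(
    pending_io_ids: List[int],
    memory_area: Dict[int, ServiceInfo],
) -> List[Tuple[int, str, int]]:
    """For each unfinished IO, decide whether to resume it or re-execute it.

    Returns (io_id, action, start_stage) tuples in send-queue order.
    """
    plan = []
    for io_id in pending_io_ids:
        info = memory_area.get(io_id)
        if info is None:
            # Nothing was stored for this IO: run it again from the beginning.
            plan.append((io_id, "re-execute", 0))
        else:
            # Stored stage found: continue from where execution was interrupted.
            plan.append((io_id, "resume", info.stage))
    return plan
```

For example, with pending IOs 1 and 2 and a stored record only for IO 2 at stage 3, the plan re-executes IO 1 from stage 0 and resumes IO 2 from stage 3.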
In this way, even if the service is interrupted due to the software restart of the DPU interface card 100 caused by the memory failure of the DPU interface card 100 or the software upgrade, the DPU interface card 100 can quickly restore the service through the service information stored in the memory of the host 200, thereby reducing the influence on the service.
Further, when resetting the hardware unit of DPU interface card 100, DPU interface card 100 may not reset the PCIe core, so that DPU interface card 100 may continuously maintain the connection of the PCIe link with host 200 through the PCIe core, thereby implementing that the PCIe link between DPU interface card 100 and host 200 is not disconnected. In this way, when repairing the failed memory of the DPU interface card 100 or upgrading the operating system of the DPU interface card 100, the host 200 may not perceive the change of the failed state and the upgraded state of the DPU interface card 100, so that the influence on the host 200 may be reduced. In addition, the service recovery process of the DPU interface card 100 has low requirements on hardware and an operating system, and can be compatible with multiple types of computing devices and operating systems, so that the universality of scheme implementation can be improved.
In the above embodiment, the DPU interface card 100 may directly trigger the soft reset unit 103 to reset the hardware unit and restart the software such as the operating system when the memory failure occurs, and in other possible embodiments, the DPU interface card 100 may implement the failure recovery by restarting a part of service components in the kernel of the operating system. In a possible implementation manner, when the DPU interface card 100 detects that there is a memory failure, it may further determine whether the failed memory meets a preset condition, and when the failed memory meets the preset condition, the DPU interface card 100 restarts a service component using the failed memory in a kernel of the operating system, and determines an IO corresponding to data stored in the failed memory, so that the service operation is resumed by re-executing the IO, or the DPU interface card 100 may continue to execute the IO according to related information of the IO stored in the memory area 201, so as to resume the service operation. In this way, the DPU interface card 100 can implement fault repair without restarting the entire operating system and reconfiguring the DPU interface card 100, thereby reducing the cost of fault repair. And when the failed memory does not meet the preset condition, the DPU interface card 100 may resume the interrupt service by the embodiment shown in fig. 2. In this way, the fault repair can be performed by adopting different processing modes according to the fault condition of the memory of the DPU interface card 100, so as to improve the flexibility of repairing the fault memory of the DPU interface card 100.
As some implementation examples, the preset condition that the failed memory meets may specifically be that the size of the failed memory does not exceed a preset size, for example, the number of failed rows (or columns) in the memory does not exceed a preset number of rows (or preset columns), and so on. At this time, the failed memory portion has relatively little influence on the DPU interface card 100, and thus, the DPU interface card 100 can implement the failover without restarting the entire operating system.
Alternatively, the preset condition satisfied by the failed memory may specifically be that the system component using the failed memory is a preset system component, so when the failed memory affects a specific system component, the DPU interface card 100 may isolate or replace the failed memory portion, and restart the system component in the portion to implement the fault repair.
Or, the number of the system components using the failed memory does not exceed the preset number. At this time, the failed memory affects only a small number of service components, and not the remaining service components, so that DPU interface card 100 may restart the partially affected service components without restarting the entire operating system (or other software) and all service components to achieve failover. In practical application, the preset condition satisfied by the failed memory may be other conditions, which is not limited in this embodiment.
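The choice between restarting only the affected service components and restarting the entire operating system can be sketched as follows. The three predicates mirror the three example preset conditions above. All names, the thresholds, and the component names are hypothetical, since the application leaves the preset values open.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class MemoryFault:
    failed_rows: int                    # number of failed rows (or columns)
    affected_components: FrozenSet[str] # service components using the failed memory


# Hypothetical preset values.
MAX_FAILED_ROWS = 4
PRESET_COMPONENTS = frozenset({"file_system", "network_protocol"})
MAX_AFFECTED_COMPONENTS = 2


def small_fault(fault: MemoryFault) -> bool:
    """Condition 1: the failed memory does not exceed a preset size."""
    return fault.failed_rows <= MAX_FAILED_ROWS


def only_preset_components(fault: MemoryFault) -> bool:
    """Condition 2: every affected component is a preset component."""
    return bool(fault.affected_components) and fault.affected_components <= PRESET_COMPONENTS


def few_components(fault: MemoryFault) -> bool:
    """Condition 3: the number of affected components does not exceed a preset number."""
    return len(fault.affected_components) <= MAX_AFFECTED_COMPONENTS


def choose_repair(
    fault: MemoryFault,
    condition: Callable[[MemoryFault], bool] = small_fault,
) -> str:
    """Pick the repair path according to the configured preset condition."""
    return "restart_components" if condition(fault) else "restart_operating_system"
```

Passing the condition as a parameter reflects that the application presents the three conditions as alternatives rather than as a fixed combination.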
In addition, in an actual application scenario, the DPU interface card 100 may complete the corresponding configuration in advance, so as to enable normal communication between the DPU interface card 100 and the host 200. For example, the DPU interface card 100 and the host 200 may be preconfigured with a unified data communication format, communication protocol version, command parsing rules, and the like, and a send queue (SQ) and a completion queue (CQ) in the host memory may be configured for the DPU interface card 100, where the send queue is used to store at least one IO for processing a service that a processor in the host 200 sends to the DPU interface card 100, and the completion queue is used to store the execution result for the IO fed back by the DPU interface card 100. Since the DPU interface card 100 may lose its configuration after restarting the software, in a further possible implementation, after acquiring the first address identifier, the DPU interface card 100 may also store the configuration information used to configure the DPU interface card 100 into the memory area 201 according to the first address identifier. The DPU interface card 100 may be manually configured by a technician in advance, so that the DPU interface card 100 may generate a corresponding configuration file based on the technician's configuration operation and send the configuration file to the memory area 201. Alternatively, after the DPU interface card 100 and the host 200 are communicatively connected, the host 200 may generate a configuration file, automatically configure the DPU interface card 100 using the configuration file, and write the configuration file into the memory area 201, which is not limited in this embodiment.
Thus, when the software is restarted (e.g., because of a memory failure or a software version upgrade) and the previous configuration of the DPU interface card 100 has become invalid, the DPU interface card 100 may obtain the configuration information from the memory area 201 and use it to reconfigure the DPU interface card 100, for example to reconfigure the communication data format and command parsing rules, so as to ensure normal communication between the DPU interface card 100 and the host 200. The configuration information further includes a second address identifier of the send queue in the host 200 and a third address identifier of the completion queue, so that after the DPU interface card 100 is configured, it may access the send queue of the host 200 according to the second address identifier and obtain from the send queue the IOs that a processor in the host 200 issued to the send queue and that the DPU interface card 100 has not yet executed. Accordingly, the result obtained by the DPU interface card 100 executing an IO may be sent to the completion queue of the host 200 according to the third address identifier, so as to resume processing of the interrupted service.
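Saving the configuration information to the memory area before the restart and reading it back afterwards can be sketched as follows. The binary layout (a 32-bit protocol version followed by two 64-bit addresses, little-endian) and the names `DpuConfig`, `save_config`, and `load_config` are assumptions made for this example. The application does not define the format of the configuration information.

```python
import struct
from dataclasses import dataclass

# Hypothetical record layout: u32 protocol version, u64 send queue address,
# u64 completion queue address, little-endian with no padding.
_CONFIG_FORMAT = "<IQQ"


@dataclass(frozen=True)
class DpuConfig:
    protocol_version: int
    send_queue_addr: int        # second address identifier
    completion_queue_addr: int  # third address identifier


def save_config(cfg: DpuConfig) -> bytes:
    """Serialize the configuration for storage in the host memory area."""
    return struct.pack(
        _CONFIG_FORMAT,
        cfg.protocol_version,
        cfg.send_queue_addr,
        cfg.completion_queue_addr,
    )


def load_config(blob: bytes) -> DpuConfig:
    """Rebuild the configuration after the software restart."""
    return DpuConfig(*struct.unpack_from(_CONFIG_FORMAT, blob))
```

After the restart, the restored `send_queue_addr` and `completion_queue_addr` let the card locate the send queue and the completion queue again without the host reconfiguring it.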
The service restoration method provided in the embodiment of the present application is described above with reference to fig. 1 and fig. 2, and the functions of the data processing unit DPU device and the computing device implementing the data processing unit provided in the embodiment of the present application are described below with reference to the accompanying drawings.
Referring to fig. 3, a schematic diagram of a data processing unit DPU device is shown. Wherein the DPU device 300 shown in fig. 3 is coupled to a host (not shown in fig. 3), the DPU device 300 comprises:
an obtaining module 301, configured to obtain, after restarting software of the DPU device 300, service information stored in a memory of the host, where the service information is information generated by processing, by the DPU device 300, input/output IOs sent by a processor of the host before restarting the software;
and the recovery module 302 is configured to recover the service according to the service information.
Alternatively, the restarted software may be, for example, an operating system, a component in the kernel of the operating system, or may be other software in addition to the operating system.
In one possible implementation, the DPU device 300 is coupled to the host based on a peripheral component interconnect express (PCIe) bus, and the PCIe link between the DPU device 300 and the host is not broken during the software restart.
In one possible implementation, a software restart of the DPU device 300 is triggered by a failure of the DPU device 300 or by a software upgrade to a previous version of the software of the DPU device 300.
In a possible implementation manner, the DPU device 300 further includes a starting module 303, configured to restart the software when the memory of the DPU device 300 fails.
In a possible implementation manner, the starting module 303 is configured to restart the operating system of the DPU device 300 when the memory of the DPU device 300 fails and the failed memory does not meet a preset condition.
In a possible implementation manner, the starting module 303 is further configured to restart a service component in a kernel of an operating system of the DPU device 300 that uses the failed memory when the memory of the DPU device 300 fails and the failed memory meets the preset condition.
In a possible implementation manner, the acquiring module 301 is configured to:
acquiring a first address identifier, wherein the first address identifier is used for identifying a memory area in the host, and the memory area stores the service information;
and accessing the memory area according to the first address identifier to obtain the service information.
In one possible implementation, the DPU device 300 further includes:
an application module 304, configured to apply for the memory area to the host before restarting the software, and obtain a first address identifier of the memory area;
And a storage module 305, configured to store the service information to the memory area according to the first address identifier.
In a possible implementation manner, DPU device 300 may further obtain configuration information from a memory area, where the configuration information is used to configure DPU device 300, where the configuration information includes a second address identifier of a send queue in a memory of the host and a third address identifier of a completion queue, where the send queue is used to store the IO, and the completion queue is used to store an execution result of the DPU device 300 for the IO.
Since the DPU device 300 shown in fig. 3 can implement the method shown in fig. 2, the specific implementation of the DPU device 300 shown in fig. 3 and the technical effects thereof can be described with reference to the relevant points in the foregoing embodiments, which are not described herein.
The DPU device 300 shown in fig. 3 may be implemented by an application specific integrated circuit, a general purpose CPU and an application specific integrated circuit, or software, or a combination of software and hardware, which is not limited in this embodiment of the present invention.
Fig. 4 provides a computing device. Therein, as shown in fig. 4, computing device 400 includes a host 401 and a DPU interface card 402, host 401 and DPU interface card 402 being coupled by a bus 403. Bus 403 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 4, but not only one bus or one type of bus.
The host 401 includes a memory 4011 and a processor 4012, and the memory 4011 and the processor 4012 may be coupled through a bus 4013.
The bus 4013 may be a PCI bus or an EISA bus, or the like. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 4, but not only one bus or one type of bus.
The processor 4012 may be any one or more of a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a Microprocessor (MP), or a digital signal processor (digital signal processor, DSP).
The memory 4011 may include volatile memory, such as random access memory (RAM). The memory 4011 may further include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
DPU interface card 402 may be used to implement the method performed by DPU interface card 100 in the embodiment shown in FIG. 2, described above.
Computing device 400 may be a server, a storage array, or a distributed storage system.
In addition, the embodiment of the application also provides a DPU interface card. Referring to fig. 5, fig. 5 shows a schematic structural diagram of a DPU interface card. As shown in fig. 5, the DPU interface card 500 includes a printed circuit board 501, an interface 502, and a DPU chip 503. The DPU interface card 500 communicates with a host through the interface 502, and the interface 502 and the DPU chip 503 are mounted on the printed circuit board 501. The interface 502 and the DPU chip 503 may communicate via traces on the printed circuit board, via a cable, or via a bus, or the interface 502 and the DPU chip 503 may be integrated. For example, in one implementation, the interface 502 and the DPU chip 503 are integrated and packaged in a single chip. The DPU interface card 500 is used to implement the service restoration method performed by the DPU interface card 100 in the embodiment shown in fig. 2. Accordingly, for the specific implementation of the printed circuit board 501, the interface 502, and the DPU chip 503, reference may be made to the printed circuit board 1011, the interface 1012, and the DPU chip 101 in the foregoing embodiments, which are not described again here.
The embodiment of the application also provides a DPU chip. Referring to fig. 6, fig. 6 shows a schematic structural diagram of a DPU chip. As shown in fig. 6, DPU chip 600 is applied to a DPU interface card (not shown in fig. 6) coupled to a host, such as DPU interface card 100 in the foregoing embodiment, and the like; the DPU chip 600 includes an acquiring circuit 601 and a processing circuit 602, where the acquiring circuit 601 is configured to implement a function of acquiring data by the DPU chip 600, for example, acquire service information stored in a memory of a host, where the service information is information generated by input/output IO sent by a processor of the host and processed by the processing circuit 602 before software on the DPU interface card is restarted; the processing circuit 602 is configured to implement a data processing function of the DPU chip 600, such as recovering a service according to the service information acquired by the acquisition circuit 601. In particular implementations, DPU chip 600 may be an application specific integrated circuit ASIC.
The acquiring circuit 601 and the processing circuit 602 cooperate with each other to implement the service restoration method performed by the DPU chip 101 in the DPU interface card 100 in the embodiment shown in fig. 2.
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can access, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc. The computer-readable storage medium includes instructions that instruct a computing device to perform the above-described service recovery method.
Embodiments of the present application also provide a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
The computer program product may be a software installation package that can be downloaded and executed on a computing device in the event that any of the aforementioned methods of service restoration are desired.
The descriptions of the processes or structures corresponding to the drawings have emphasis, and the descriptions of other processes or structures may be referred to for the parts of a certain process or structure that are not described in detail.

Claims (19)

1. A method for service restoration, the method comprising:
after software of a data processing unit (DPU) interface card is restarted, the DPU interface card acquires service information stored in a memory of a host, wherein the service information is information generated by the DPU interface card processing input/output (IO) sent by a processor of the host before the software is restarted;
and the DPU interface card recovers the service according to the service information.
2. The method of claim 1, wherein the DPU interface card is coupled to the host based on a peripheral component interconnect express PCIe bus, and wherein a PCIe link between the DPU interface card and the host is not broken during the software restart.
3. The method according to claim 1 or 2, characterized in that a software restart of the DPU interface card is triggered by a failure of the DPU interface card or by a software upgrade of a previous version of the software of the DPU interface card.
4. The method of claim 3, wherein the DPU interface card restarting the software comprises:
and restarting the software by the DPU interface card when the memory of the DPU interface card fails.
5. The method of claim 4, wherein the DPU interface card restarting the software when the memory of the DPU interface card fails, comprising:
and restarting the operating system of the DPU interface card by the DPU interface card when the memory of the DPU interface card fails and the failed memory does not meet the preset condition.
6. The method according to claim 4, wherein the method further comprises:
when the memory of the DPU interface card fails and the failed memory meets the preset condition, the DPU interface card restarts a service component using the failed memory in the kernel of the operating system of the DPU interface card.
7. The method of any one of claims 1 to 6, wherein the acquiring, by the DPU interface card, of the service information stored in the memory of the host comprises:
acquiring, by the DPU interface card, a first address identifier, wherein the first address identifier identifies a memory area in the host, and the memory area stores the service information; and
accessing, by the DPU interface card, the memory area according to the first address identifier to obtain the service information.
8. The method of claim 7, wherein before the software is restarted, the method further comprises:
applying, by the DPU interface card, to the host for the memory area, and acquiring the first address identifier of the memory area; and
storing, by the DPU interface card, the service information in the memory area according to the first address identifier.
9. A data processing unit (DPU) device, the DPU device comprising:
an acquisition module, configured to acquire service information stored in a memory of a host after software of the DPU device is restarted; and
a recovery module, configured to recover a service according to the service information, wherein the service information is information generated when the DPU device processes, before the software is restarted, input/output (IO) sent by a processor of the host.
10. The DPU device of claim 9, wherein the DPU device is coupled to the host through a peripheral component interconnect express (PCIe) bus, and wherein a PCIe link between the DPU device and the host is not disconnected during the software restart.
11. The DPU device of claim 9 or 10, wherein the software restart of the DPU device is triggered by a fault of the DPU device or by an upgrade of an earlier version of the software of the DPU device.
12. The DPU device of claim 11, further comprising a startup module configured to restart the software when a memory of the DPU device fails.
13. The DPU device of claim 12, wherein the startup module is configured to restart an operating system of the DPU device when the memory of the DPU device fails and the failed memory does not meet a preset condition.
14. The DPU device of claim 12, wherein the startup module is further configured to restart a service component in a kernel of an operating system of the DPU device that uses the failed memory when the memory of the DPU device fails and the failed memory meets a preset condition.
15. The DPU device of any one of claims 9 to 14, wherein the acquisition module is configured to:
acquire a first address identifier, wherein the first address identifier identifies a memory area in the host, and the memory area stores the service information; and
access the memory area according to the first address identifier to obtain the service information.
16. The DPU device of claim 15, further comprising:
an application module, configured to apply to the host for the memory area before the software is restarted, and to acquire the first address identifier of the memory area; and
a storage module, configured to store the service information in the memory area according to the first address identifier.
17. A computing device, comprising a host and a data processing unit (DPU) interface card, the host comprising a memory and a processor, wherein:
the DPU interface card is configured to acquire service information stored in the memory of the host after software of the DPU interface card is restarted, and to recover a service according to the service information; the service information is information generated when the DPU interface card processes, before the software is restarted, input/output (IO) sent by the processor of the host.
18. A data processing unit (DPU) interface card, comprising a printed circuit board, an interface and a DPU chip, wherein the DPU interface card communicates with a host through the interface, the interface and the DPU chip are arranged on the printed circuit board, and the DPU chip is configured to acquire service information stored in a memory of the host after software of the DPU interface card is restarted, and to recover a service according to the service information; the service information is information generated when the DPU chip processes, before the software is restarted, input/output (IO) sent by a processor of the host.
19. A data processing unit (DPU) chip, wherein the DPU chip is applied to a DPU interface card, the DPU interface card being coupled to a host, and the DPU chip comprises an acquisition circuit and a processing circuit;
the acquisition circuit is configured to acquire service information stored in a memory of the host after software of the DPU interface card is restarted; and
the processing circuit is configured to recover a service according to the service information; the service information is information generated when the processing circuit processes, before the software is restarted, input/output (IO) sent by a processor of the host.
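For illustration only, the save-and-restore flow of claims 1, 7 and 8 and the restart decision of claims 4 to 6 can be modelled in a few lines of Python. Every name below (HostMemory, DpuCard, choose_restart_scope) is hypothetical and is not defined by the application. Host memory is simulated with a dictionary rather than reached over PCIe, and the sketch assumes that the first address identifier is kept somewhere that survives the software restart, which the claims do not specify.

```python
import copy
from typing import Dict


class HostMemory:
    """Hypothetical stand-in for host RAM that the card would reach over PCIe."""

    def __init__(self) -> None:
        self._regions: Dict[int, dict] = {}
        self._next_id = 1

    def allocate_region(self) -> int:
        """Reserve a memory area and return its address identifier."""
        region_id = self._next_id
        self._next_id += 1
        self._regions[region_id] = {}
        return region_id

    def write(self, region_id: int, data: dict) -> None:
        self._regions[region_id] = copy.deepcopy(data)

    def read(self, region_id: int) -> dict:
        return copy.deepcopy(self._regions[region_id])


class DpuCard:
    """Hypothetical model of the DPU interface card's software state."""

    def __init__(self, host: HostMemory) -> None:
        self.host = host
        # Claim 8: apply to the host for a memory area before any restart.
        # Assumption: this identifier survives a software restart.
        self.region_id = host.allocate_region()
        self.service_state: dict = {}

    def process_io(self, queue: str, request: str) -> None:
        """Handle IO from the host and mirror the resulting state to host memory."""
        self.service_state.setdefault(queue, []).append(request)
        self.host.write(self.region_id, self.service_state)

    def restart_software(self) -> None:
        """A software restart loses the card's local state."""
        self.service_state = {}

    def recover(self) -> None:
        """Claims 1 and 7: read the saved state back using the identifier."""
        self.service_state = self.host.read(self.region_id)


def choose_restart_scope(memory_failed: bool, meets_preset_condition: bool) -> str:
    """Claims 5 and 6: restart one service component or the whole operating system."""
    if not memory_failed:
        return "none"
    return "service_component" if meets_preset_condition else "operating_system"


host = HostMemory()
card = DpuCard(host)
card.process_io("q0", "write:blk7")
card.restart_software()
card.recover()
```

After `recover()` the card holds the same queue state it had before the restart, because the copy in host memory was never touched by the restart. What the "preset condition" actually tests is left open here, as it is in the claims.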
CN202210269274.5A 2021-12-16 2022-03-18 Service recovery method, data processing unit and related equipment Pending CN116266150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/139182 WO2023109880A1 (en) 2021-12-16 2022-12-15 Service recovery method, data processing unit and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021115408615 2021-12-16
CN202111540861 2021-12-16

Publications (1)

Publication Number Publication Date
CN116266150A (en) 2023-06-20

Family

ID=86744086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210269274.5A Pending CN116266150A (en) 2021-12-16 2022-03-18 Service recovery method, data processing unit and related equipment

Country Status (2)

Country Link
CN (1) CN116266150A (en)
WO (1) WO2023109880A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238093B (en) * 2011-08-16 2015-01-28 杭州华三通信技术有限公司 Service interruption prevention method and device
US10305970B2 (en) * 2016-12-13 2019-05-28 International Business Machines Corporation Self-recoverable multitenant distributed clustered systems
CN111078465A (en) * 2019-11-08 2020-04-28 苏州浪潮智能科技有限公司 Data recovery method and device and computer readable storage medium
CN113722147A (en) * 2020-05-26 2021-11-30 华为技术有限公司 Method for keeping service connection and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116795605A (en) * 2023-08-23 2023-09-22 珠海星云智联科技有限公司 Automatic recovery system and method for abnormality of peripheral device interconnection extension equipment
CN116795605B (en) * 2023-08-23 2023-12-12 珠海星云智联科技有限公司 Automatic recovery system and method for abnormality of peripheral device interconnection extension equipment

Also Published As

Publication number Publication date
WO2023109880A1 (en) 2023-06-22

Similar Documents

Publication Publication Date Title
US9665521B2 (en) System and method for providing a processing node with input/output functionality by an I/O complex switch
US10585755B2 (en) Electronic apparatus and method for restarting a central processing unit (CPU) in response to detecting an abnormality
US8782469B2 (en) Request processing system provided with multi-core processor
CN109032822B (en) Method and device for storing crash information
US9858067B2 (en) Electronic system with update control mechanism and method of operation thereof
CN104834575A (en) Firmware recovery method and device
US20110197193A1 (en) Device and method for controlling communication between bios and bmc
US11194589B2 (en) Information handling system adaptive component reset
US10353786B2 (en) Virtualization substrate management device, virtualization substrate management system, virtualization substrate management method, and recording medium for recording virtualization substrate management program
RU2653254C1 (en) Method, node and system for managing data for database cluster
CN101482823A (en) Single board application version implementing method and system
CN111124728A (en) Automatic service recovery method, system, readable storage medium and server
CN111090546B (en) Method, device and equipment for restarting operating system and readable storage medium
US20210240831A1 (en) Systems and methods for integrity verification of secondary firmware while minimizing boot time
WO2023109880A1 (en) Service recovery method, data processing unit and related device
JP6599725B2 (en) Information processing apparatus, log management method, and computer program
US8032791B2 (en) Diagnosis of and response to failure at reset in a data processing system
CN104572198A (en) Service restoration method and device
JP6135403B2 (en) Information processing system and information processing system failure processing method
CN116450046A (en) Cloud disk implementation method and device, intelligent network card, server and storage medium
US20130318310A1 (en) Processor processing method and processor system
US20140059335A1 (en) Information processing apparatus and activation method
CN111782515A (en) Web application state detection method and device, server and storage medium
TWI554876B (en) Method for processing node replacement and server system using the same
CN116483612B (en) Memory fault processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication