CN115904621B - Method and device for maintaining host of super fusion system - Google Patents

Publication number
CN115904621B
CN115904621B CN202211439186.1A CN202211439186A CN115904621B CN 115904621 B CN115904621 B CN 115904621B CN 202211439186 A CN202211439186 A CN 202211439186A CN 115904621 B CN115904621 B CN 115904621B
Authority
CN
China
Prior art keywords
virtual machine
host
maintenance
target host
checking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211439186.1A
Other languages
Chinese (zh)
Other versions
CN115904621A (en
Inventor
周依然
徐文豪
张凯
王弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiling Haina Technology Co ltd
Original Assignee
SmartX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SmartX Inc
Publication of CN115904621A
Application granted
Publication of CN115904621B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for maintaining a host of a super fusion system, comprising the following steps: performing a pre-check on a target host, the check items covering the cluster operation and maintenance component, the computing component, and the storage component; performing the pre-check a second time once the first pre-check has passed and an enter-maintenance instruction has been received from the user; after the second pre-check passes, setting the target host to non-schedulable, placing it in storage maintenance mode, and, where a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; and, after host maintenance is complete and the exit-maintenance check has passed, migrating the virtual machines that originally ran on the target host back to it. The method and device reduce the number of operator interactions, reduce the amount of data recovery generated while the target host is under maintenance, and migrate the virtual machines on the target host automatically.

Description

Method and device for maintaining host of super fusion system
Technical Field
The invention belongs to the technical field of network bandwidth management, and particularly relates to a method, a device, equipment and a storage medium for maintaining a host of a super fusion system.
Background
A super fusion (hyperconverged) infrastructure is a unified, software-defined system: a technical architecture that integrates computing, network, and storage resources as infrastructure, can be selected, combined, and customized to the needs of a specific service system, and allows a data center to be built and a service system to be deployed quickly and conveniently. In the super fusion architecture, every node is at once a computing node, a network node, and a storage node. When planned maintenance is required, for example a firmware upgrade, a kernel upgrade, or a hardware replacement, the host has to be taken offline and shut down. Because the architecture is a super fusion system, taking a node offline reduces both the computing resources and the storage resources of the cluster. From the computing perspective, the user must ensure that the remaining computing resources of the cluster are sufficient for the virtual machines running on the target node to keep running; otherwise the continuity of the user's services is affected. From the storage perspective, the user must ensure that the remaining storage resources of the cluster are sufficient to recover the data held on the target node; otherwise some data in the cluster will have fewer copies than expected, and system stability will be affected.
In existing practice, a user who maintains a host has to perform the following steps:
1. using the console, judging whether the target host meets the requirements for going offline and checking the running condition of the key components;
2. hot-migrating the virtual machines running on the target host to other hosts in the cluster, so that user services are not affected;
3. taking the target host offline and performing the planned maintenance actions, such as a firmware upgrade or replacement of faulty hardware;
4. after maintenance is completed, migrating the virtual machines that originally ran on the target host back to it, so that computing resources are evenly distributed across the nodes of the cluster;
5. checking the running condition of the host to confirm that it has returned to a healthy state.
This approach has the following drawbacks:
1. a large number of checks must be performed to ensure that taking the target node offline will not affect the cluster;
2. the placement of the virtual machines has to be recorded and the virtual machines migrated manually, so that computing resources are evenly distributed once maintenance is finished;
3. a significant amount of data recovery occurs during maintenance, and it takes a long time for the data to return to the expected number of copies.
Disclosure of Invention
In order to solve the above problems, the present invention aims to provide a method and an apparatus for maintaining a host of a super fusion system, which can reduce the operation interaction flow, reduce the data recovery amount generated in the maintenance process of a target host, and automatically migrate a virtual machine on the target host.
In order to achieve the above purpose, the technical scheme of the invention is as follows. A method for maintaining a host of a super fusion system comprises the following steps: performing a pre-check on a target host, the check items covering the cluster operation and maintenance component, the computing component, and the storage component; performing the pre-check a second time once the first pre-check has passed and an enter-maintenance instruction has been received from the user; after the second pre-check passes, setting the target host to non-schedulable, placing it in storage maintenance mode, and, where a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; and, after host maintenance is complete and the exit-maintenance check has passed, migrating the virtual machines that originally ran on the target host back to it.
With the device, the user can perform operations such as host state checks, virtual machine migration, and host state record updates directly from the console; the preparation required for offline host maintenance is completed automatically, which reduces the number of interactions. The storage maintenance mode is introduced to limit the amount of data recovery that offline host maintenance generates in a super fusion scenario, which shortens the time offline maintenance takes. Marking the target host as non-schedulable while it prepares to enter maintenance mode prevents edge cases during the entry process.
In one embodiment of the present invention, pre-checking the target host for check items including the cluster operation and maintenance component, the computing component, and the storage component further includes: checking, by querying database records, whether any host in the cluster of the super fusion system is in the entering-maintenance, in-maintenance, or exiting-maintenance state, wherein the task center assigns a universally unique identifier (UUID) to ensure that only one pre-check task runs at a time and that only one host in the cluster operates in maintenance mode; performing a health status check on the cluster and the target host, covering at least the computing component, the storage component, and the operation and maintenance component, and, when the platform is Elf/SMTXZBS, also including virtual machine detection; and, if the health status check passes, waiting to receive the user's enter-maintenance instruction.
In one embodiment of the present invention, migrating the virtual machines on the target host further includes: performing a pre-dispatch check for virtual machine migration, which covers the running state of the virtual machine, whether the virtual machine contains a passthrough device, and whether the state of the virtual machine changes after migration; hot-migrating virtual machines that are running; under a preset condition, shutting down a running virtual machine and cold-migrating it; and cold-migrating virtual machines that are shut down.
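The choice between hot and cold migration described above can be sketched as a small decision function. This is an illustration only: the text does not say what the "preset condition" for shutting down a running virtual machine is, so the sketch assumes, hypothetically, that it is the presence of a passthrough device. The names `VM`, `Plan`, and `choose_plan` are likewise invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    HOT = "hot migration"
    SHUTDOWN_THEN_COLD = "shut down, then cold migration"
    COLD = "cold migration"


@dataclass
class VM:
    name: str
    running: bool
    has_passthrough_device: bool = False


def choose_plan(vm: VM) -> Plan:
    """Pick a migration plan from the attributes examined by the pre-dispatch check."""
    if not vm.running:
        # A virtual machine that is already shut down is cold-migrated.
        return Plan.COLD
    if vm.has_passthrough_device:
        # ASSUMPTION: a passthrough device is the "preset condition" that
        # forces a running virtual machine to be shut down first.
        return Plan.SHUTDOWN_THEN_COLD
    # Any other running virtual machine is hot-migrated.
    return Plan.HOT
```

A real pre-dispatch check would also have to consider whether the virtual machine's state changes after migration, which this sketch leaves out.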
In one embodiment of the present invention, migrating the virtual machines on the target host further includes: in the storage maintenance mode, after the node where the target host is located goes offline, the cluster does not automatically trigger data recovery upon detecting it, which reduces the amount of data recovery generated during the user's subsequent maintenance.
In one embodiment of the present invention, migrating the virtual machines on the target host further includes: while virtual machines are being migrated off the target host, only one virtual machine is migrated at a time; if the current virtual machine fails to migrate, the entire entry flow fails and the remaining virtual machines are not migrated.
In one embodiment of the present invention, after host maintenance is completed, the method further includes: starting the target host, whereupon the services on it start automatically; after the target host has started and an exit-maintenance check instruction issued by the user has been received, checking the operation and maintenance component, the computing component, and the storage component of the host; and, if these components meet preset conditions, taking the node where the target host is located out of storage maintenance mode and setting the host to the schedulable state.
In one embodiment of the present invention, migrating the original virtual machines back to the target host further includes: performing a pre-dispatch check for the migration back, which covers the running state of the virtual machine, whether the virtual machine contains a passthrough device, and whether the state of the virtual machine changes after migration; hot-migrating virtual machines that are running; and cold-migrating virtual machines that are shut down.
Based on the same conception, the invention also provides a super fusion system host maintenance device, which comprises: a pre-check module for pre-checking a target host against check items including the cluster operation and maintenance component, the computing component, and the storage component; a waiting execution module for performing the pre-check again once the pre-check has passed and the user's enter-maintenance instruction has been received; an execution module for setting the target host to non-schedulable after the second pre-check passes, placing the target host in storage maintenance mode, and, where a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; and a rebuilding module for migrating the original virtual machines back to the target host after host maintenance is complete and the exit-maintenance check has passed.
Based on the same conception, the present invention also provides a computer device comprising: a memory for storing a processing program; and a processor that executes the processing program to perform the super fusion system host maintenance method.
Based on the same conception, the invention also provides a readable storage medium that stores a processing program which, when executed by a processor, implements the super fusion system host maintenance method.
With the above technical scheme, the invention has the following advantages over the prior art:
1. A maintenance mode is introduced to ease the user's burden when maintaining the target host: it reduces the number of operator interactions, reduces the amount of data recovery generated while the target host is under maintenance, and migrates the virtual machines on the target host automatically.
2. The states "entering maintenance mode" and "maintenance mode" are introduced for the hosts in the cluster, so that scheduling of computing resources is carried out correctly while a host in the cluster is in maintenance mode.
3. Ordinarily, when a node goes offline, the cluster automatically triggers data recovery once it detects this; when the host is taken offline as a planned operation, avoiding that data recovery greatly reduces the time maintenance takes. With storage maintenance mode support, the cold data on a host in storage maintenance mode does not generate data recovery. If the system finds that one copy of a data block is on a node in storage maintenance mode, the copy is marked as to-be-recovered but no recovery command is actually triggered, which achieves the goal of reducing the amount of data recovery.
Drawings
The invention is described in further detail below with reference to the attached drawing figures, wherein:
FIG. 1 is a schematic diagram of a host state machine of a host maintenance method of a super fusion system according to the present invention;
FIG. 2 is a schematic diagram of a pre-inspection flow of a host maintenance method of the super fusion system of the present invention;
FIG. 3 is a schematic diagram of a host maintenance method for a super fusion system according to the present invention;
FIG. 4 is a schematic diagram of a method for maintaining a host computer in a super fusion system according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the specific examples. Advantages and features of the invention will become more apparent from the following description and from the claims. It is noted that the drawings are in a very simplified form and utilize non-precise ratios, and are intended to facilitate a convenient, clear, description of the embodiments of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Example 1
In this embodiment, a maintenance mode is introduced to ease the user's burden when maintaining a target host: it reduces the number of operator interactions, reduces the amount of data recovery generated while the target host is under maintenance, and migrates the virtual machines on the target host automatically. The states "maintenance mode" and "entering maintenance mode" are introduced for the hosts in the cluster, so that scheduling of computing resources is carried out correctly while a host in the cluster enters maintenance mode. Refer to FIG. 1, which shows the host state machine of the super fusion system host maintenance method.
The host maintenance flow is divided into the following stages: pre-check, ready to enter maintenance, ready-to-exit maintenance check, and exit maintenance.
Referring to FIG. 2, the purpose of the pre-check is to determine whether the target host currently meets the conditions for entering maintenance mode. The cluster operation and maintenance components, the computing components, and the storage components are each checked. The check items execute independently and in parallel and are unaware of one another. They are divided into required items and secondary items: if a required item fails, the user cannot put the target host into maintenance mode; if a secondary item fails, the user is shown the reason for the failure and may still choose to put the target host into maintenance mode.
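The split between required and secondary check items can be illustrated with a short sketch. It assumes, purely for illustration, that each check item is a callable returning a pass flag and a reason; `CheckItem`, `run_precheck`, and the example item names are hypothetical and do not come from the patent. A thread pool stands in for whatever parallel execution mechanism the real task center uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckItem:
    name: str
    required: bool                       # True: required item, False: secondary item
    run: Callable[[], Tuple[bool, str]]  # returns (passed, reason)


def run_precheck(items: List[CheckItem]) -> dict:
    """Run all check items in parallel and summarize the outcome."""
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        # Each item runs independently; no item sees another item's result.
        results = list(pool.map(lambda item: item.run(), items))

    blocking, warnings = [], []
    for item, (passed, reason) in zip(items, results):
        if passed:
            continue
        (blocking if item.required else warnings).append((item.name, reason))

    return {
        "can_enter": not blocking,  # any failed required item blocks entry
        "blocking": blocking,
        "warnings": warnings,       # shown to the user, who may still proceed
    }


report = run_precheck([
    CheckItem("zookeeper_quorum", True, lambda: (True, "")),
    CheckItem("capacity_display", False, lambda: (False, "low remaining capacity")),
])
```

In this example `report["can_enter"]` is true even though a secondary item failed; the failure is only surfaced as a warning.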
Referring to FIG. 3, after the user-initiated pre-check passes, the user may choose to put the target host into maintenance mode. During entry, the following actions are performed automatically:
1. setting the host state to "ready to enter maintenance";
2. performing a secondary pre-check, which guards against the case where the user does not initiate entry promptly after the pre-check finishes, so that the cluster state no longer matches the pre-check result and subsequent actions would fail;
3. setting the host to non-schedulable, which prevents the user from creating or running virtual machines on the target host during entry;
4. migrating the virtual machines on the target host to other nodes of the cluster;
5. setting the target node to storage maintenance mode, to reduce the amount of data recovery generated during the user's subsequent maintenance;
6. setting the host state to "maintenance".
After the above actions are completed, referring to FIG. 4, the user can see at the console that the target host is in the "maintenance mode" state and can then carry out the offline maintenance.
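The six entry actions above can be sketched as one function that walks through them in order. This is a minimal illustration, not the patented implementation: the `Host` class, the `HostState` values, and the rollback performed on failure are assumptions made for the example (the text only states that the host ends in the expected state whether the task succeeds or fails).

```python
from enum import Enum
from typing import Callable


class HostState(Enum):
    NORMAL = "normal"
    ENTERING = "ready to enter maintenance"
    MAINTENANCE = "maintenance"


class Host:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = HostState.NORMAL
        self.schedulable = True
        self.storage_maintenance = False


def enter_maintenance(
    host: Host,
    precheck: Callable[[Host], bool],
    migrate_vms: Callable[[Host], None],
) -> bool:
    """Run the entry steps in order; return True if the host reached maintenance."""
    host.state = HostState.ENTERING              # step 1
    try:
        if not precheck(host):                   # step 2: secondary pre-check
            raise RuntimeError("secondary pre-check failed")
        host.schedulable = False                 # step 3
        migrate_vms(host)                        # step 4: may raise on failure
        host.storage_maintenance = True          # step 5
        host.state = HostState.MAINTENANCE       # step 6
        return True
    except Exception:
        # ASSUMED rollback: return the host to its normal, schedulable state.
        host.schedulable = True
        host.storage_maintenance = False
        host.state = HostState.NORMAL
        return False
```

Calling `enter_maintenance(Host("node-1"), lambda h: True, lambda h: None)` leaves the host non-schedulable, in storage maintenance mode, and in the `MAINTENANCE` state.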
After offline maintenance is finished, the user starts the target host, and the services on it start automatically. Once the host has started, the user can initiate the exit-maintenance check from the console; if the check passes, the user can take the target host out of maintenance. The complete flow is as follows:
1. initiating the ready-to-exit maintenance check, which checks the running state of the target host, and, after the check passes, waiting to receive the exit request initiated by the user;
2. taking the target node out of storage maintenance mode;
3. setting the target host to the schedulable state, so that the subsequent automatic virtual machine migration can schedule virtual machines onto the target host;
4. migrating the virtual machines that originally ran on the target host back to the target node;
5. exiting maintenance.
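The exit flow can be sketched in the same spirit. The dictionary layout of `host`, the `placements` map from virtual machine to node, and the `user_confirmed` flag standing in for the user's exit request are all hypothetical choices made for this illustration.

```python
from typing import Callable, Dict, List


def exit_maintenance(
    host: Dict[str, object],
    exit_check: Callable[[Dict[str, object]], bool],
    user_confirmed: bool,
    auto_migrated_vms: List[str],
    placements: Dict[str, str],
) -> bool:
    """Walk through the exit steps; return True if the host left maintenance."""
    # Step 1: ready-to-exit check, then wait for the user's exit request.
    if not exit_check(host) or not user_confirmed:
        return False                       # host stays in maintenance mode
    host["storage_maintenance"] = False    # step 2: leave storage maintenance mode
    host["schedulable"] = True             # step 3: allow scheduling again
    for vm in auto_migrated_vms:           # step 4: bring back only the virtual
        placements[vm] = str(host["name"])  # machines that were moved away on entry
    host["state"] = "normal"               # step 5: maintenance exited
    return True
```

Only the virtual machines recorded as automatically migrated during entry are moved back; any others stay where they are.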
Example 2
This embodiment provides an implementation of the pre-check, based on the super fusion system host maintenance method and device described above.
The pre-check in this embodiment keeps all check items independent and decoupled; a check item does not need to know whether it is being called by the maintenance mode device. Each check item is implemented as an independent Task-type function of a Job. When the pre-check is triggered, the task center Leader generates the Task set of the pre-check Job from the information about the current platform and the target node, and submits the Task set to the unified. After all check items have been executed, the check results are returned together. The task center does not currently support saving the result of an individual Task's execution; this project improves the task center to support that feature, so that the pre-check results are saved in the Task information of the specific Job.
A secondary check is performed while the target host is entering maintenance mode. After each check in the pre-check completes, its result must be recorded in the database. If the pre-check were converted into a single Task for this purpose, the individual check implementations would need compatibility adjustments and unnecessary conditions would be introduced; the secondary check is therefore performed within the maintenance mode flow through unified API scheduling.
Specifically, whether a host in a maintenance-mode-related state exists in the cluster:
By querying database records, the pre-check determines whether any host in the cluster is in a maintenance mode state. All maintenance mode tasks are submitted and executed through the task center, and the task center assigns a unique UUID to ensure that only one pre-check task runs at a time and that only one host in the cluster operates in maintenance mode.
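One way to read the UUID mechanism is that every maintenance mode task is submitted under the same fixed identifier, so a second submission is rejected while the first is still active. The sketch below illustrates that reading only; `TaskCenter`, `submit`, `finish`, and `MAINTENANCE_TASK_UUID` are hypothetical names, and the real task center's interface is not described in the text.

```python
import threading
import uuid

# Hypothetical fixed identifier shared by all maintenance-mode tasks.
MAINTENANCE_TASK_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "maintenance-mode.example")


class TaskCenter:
    """Toy task center that rejects a task whose UUID is already active."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = {}

    def submit(self, task_id: uuid.UUID, description: str) -> bool:
        with self._lock:
            if task_id in self._active:
                return False          # another task with this UUID is running
            self._active[task_id] = description
            return True

    def finish(self, task_id: uuid.UUID) -> None:
        with self._lock:
            self._active.pop(task_id, None)
```

Because both the pre-check for one host and the pre-check for another would use `MAINTENANCE_TASK_UUID`, the second `submit` call returns `False` until `finish` is called for the first.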
Cluster and target host health status check:
The check for entering maintenance mode is based on whether putting the host into maintenance mode and taking it offline would affect user services (computing and storage). The detection items are classified on that basis into computing component detection, storage component detection, and operation and maintenance component detection. When the platform is Elf/SMTXZBS, the detection items include virtual machine detection; when the platform is VMware, virtual machine detection is omitted:
1. The operation and maintenance component detection includes:
whether any node in the cluster is already in, or entering, the maintenance mode state; if such a node exists, the check fails.
2. The storage component detection includes:
whether single-copy data exists; if single-copy data exists on the target node, the check fails (zbs-meta's pextent list currently returns all copies only, so the caller must filter them itself; ZBS would need to support retrieving the pextent list of a specified chunk by chunk id);
zookeeper: in a 3-node cluster, if any node other than the target node is abnormal, the check fails; in a 5-node cluster, if more than one node other than the target node is abnormal, the check fails;
zbs-meta: the cluster must have at least one surviving meta node in addition to the target node; if this condition is not met, the check fails;
whether any other node in the cluster is in storage maintenance mode; if so, the check fails;
whether data recovery is in progress in the cluster (if so, storage maintenance mode cannot be entered);
capacity check, covering the current remaining capacity of the cluster, the remaining capacity of the node, and the used capacity of the node; only the current remaining capacity of the cluster is displayed, and it is not used as a basis for deciding whether maintenance mode can be entered;
3. The computing component detection includes:
mongo: in a 3-node cluster, if any node other than the target node is abnormal, the check fails; in a 5-node cluster, if more than one node other than the target node is abnormal, the check fails;
whether the storage network of the target node is reachable; if not, the check fails;
job-center-worker: whether the service is running on the target node; if not, the check fails;
libvirtd: whether the service is running on the target node; if not, the check fails;
whether the virtual machines on the target host can be hot-migrated;
whether the virtual machines on the target host can be cold-migrated.
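The zookeeper and mongo rules above (no abnormal node besides the target in a 3-node cluster, at most one in a 5-node cluster) are consistent with keeping a majority alive after the target goes offline. The sketch below generalizes them on that assumption; the generalization to other cluster sizes and the function name `quorum_check_passes` are not from the text.

```python
from typing import Dict


def quorum_check_passes(node_healthy: Dict[str, bool], target: str) -> bool:
    """Return True if the target can go offline without losing a majority.

    node_healthy maps every member node (including the target) to its health.
    """
    total = len(node_healthy)
    # ASSUMPTION: a majority quorum, which tolerates (total - 1) // 2 failures.
    tolerated_failures = (total - 1) // 2
    # The target itself will go offline, which uses up one tolerated failure.
    allowed_abnormal = tolerated_failures - 1
    abnormal_others = sum(
        1 for name, healthy in node_healthy.items()
        if name != target and not healthy
    )
    return abnormal_others <= allowed_abnormal
```

For three nodes this allows zero abnormal nodes besides the target, and for five nodes it allows one, matching the two cases listed.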
In the super fusion system, from the perspective of the storage component, every independent node is a storage node, and the storage system uses a copy (replica) mechanism to guarantee data availability. Ordinarily, after a node goes offline, the cluster automatically triggers data recovery once it detects this; when the host is taken offline as a planned operation, avoiding that data recovery greatly reduces the time maintenance takes. With storage maintenance mode support, the cold data on a host in storage maintenance mode does not generate data recovery. If the system finds that one copy of a data block is on a node in storage maintenance mode, the copy is marked as to-be-recovered but no recovery command is actually triggered, which achieves the goal of reducing the amount of data recovery.
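The difference between an unplanned node loss and a node in storage maintenance mode can be shown with a small model. The `Block` structure and `handle_node_offline` function are invented for this sketch and are not the storage system's actual data model; the point is only that copies on a maintenance node are marked but no recovery command is issued for them.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Block:
    block_id: int
    replicas: List[str]                      # names of the nodes holding a copy
    pending_recovery: Set[str] = field(default_factory=set)


def handle_node_offline(
    blocks: List[Block],
    node: str,
    storage_maintenance_nodes: Set[str],
) -> List[Tuple[int, str]]:
    """Mark copies on the offline node and return the recovery commands issued."""
    commands = []
    for block in blocks:
        if node not in block.replicas:
            continue
        block.pending_recovery.add(node)     # always mark the copy as to-be-recovered
        if node in storage_maintenance_nodes:
            continue                         # planned maintenance: no recovery command
        commands.append((block.block_id, node))
    return commands
```

With the node listed in `storage_maintenance_nodes`, the returned command list is empty while the affected blocks still carry the to-be-recovered mark.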
From the perspective of the computing component, a user's service virtual machines may run on any host in the cluster, so when a host is to be maintained its virtual machines must be migrated first to keep user services unaffected. When the user puts a host into maintenance mode, the host is first set to non-schedulable at the computing component level, so that no unexpected virtual machine is created on or migrated to the target host during entry. Before the virtual machines are migrated, a pre-dispatch check is performed on them: if the check succeeds, the migration proceeds; if it fails, the virtual machines cannot be migrated and maintenance mode cannot be entered. While the target host is entering maintenance mode, serial migration is used to keep the impact of virtual machine migration as small as possible: only one virtual machine is migrated at a time, and the next virtual machine is migrated only after the current one has migrated successfully. If the current virtual machine fails to migrate, the entire entry flow fails and the remaining virtual machines are not migrated.
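Serial, fail-fast migration as described above can be sketched as follows. The callables `pre_dispatch_check` and `migrate_one` are placeholders for the real checks and migration calls, and running the pre-dispatch check for every virtual machine before the first migration starts is an assumption of this sketch.

```python
from typing import Callable, List


class MigrationError(Exception):
    """Raised when the entry flow must fail because a migration cannot proceed."""


def migrate_serially(
    vms: List[str],
    pre_dispatch_check: Callable[[str], bool],
    migrate_one: Callable[[str], bool],
) -> List[str]:
    """Migrate virtual machines one at a time, stopping at the first failure."""
    # ASSUMPTION: all pre-dispatch checks run before any migration starts.
    for vm in vms:
        if not pre_dispatch_check(vm):
            raise MigrationError(f"pre-dispatch check failed for {vm}")

    migrated = []
    for vm in vms:
        # Only one virtual machine is in flight at any time.
        if not migrate_one(vm):
            # Later virtual machines are never attempted.
            raise MigrationError(f"migration failed for {vm}")
        migrated.append(vm)
    return migrated
```

If the second of three virtual machines fails, the third is never attempted and the caller sees a `MigrationError`, which corresponds to the whole entry flow failing.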
Entering the target host into maintenance mode is executed as an asynchronous task. The task contains no subtasks, and the complete control flow is handled by the device's own logic, with no external dependencies. In the implementation, every step of the entry flow is idempotent, so that the task can be re-entered safely after an abnormal interruption. Whether the asynchronous task succeeds or fails, the state of the target host is finally set to the expected state, which keeps the state consistent.
After maintenance of the target host is finished and the host is online again, it has to be taken out of maintenance. Before the actual exit, a ready-to-exit maintenance check verifies whether certain critical services on the target host meet the exit conditions; if they do not, exiting maintenance mode is prohibited. If they do, the subsequent exit actions are performed. During exit, the storage component takes the target node out of storage maintenance mode and brings it back online; the computing component marks the target host as schedulable and migrates back to it the virtual machines that were automatically migrated away while the target host was entering maintenance mode, which keeps the cluster's computing resources balanced. As with entering maintenance mode, the ready-to-exit maintenance check is itself executed as an asynchronous task that contains no subtasks; whether the exit succeeds or fails, the state of the target host is finally set to the expected state, which keeps the state consistent.
Example 3
This embodiment provides a specific implementation of virtual machine migration, based on the super fusion system host maintenance method and device described above.
When a host enters and exits maintenance mode, virtual machines are migrated away and migrated back. The state and configuration of a virtual machine determine whether its migration ultimately succeeds. The factors that determine whether a virtual machine can be migrated include:
1. whether the virtual machine is running;
2. whether the virtual machine contains a passthrough device;
3. whether the state of the virtual machine changes after migration.
The combinations of the above factors are divided into the following scenarios:
[First to fourth scenario: tables not reproduced in this text]
Preferably, because putting the target host into maintenance mode involves interactions among multiple steps and components, and the overall time may be on the order of minutes when the target host carries many computing resources, the user must be able to cancel the entry. Before the asynchronous task begins entering or exiting maintenance mode, it records whether the current step supports cancellation and whether the current task has been marked as canceled. Each step checks the task's flag bit before it executes, and the ability to update the cancel flag bit is provided through the API. If the flag bit is True, the subsequent steps are not executed and the whole task is marked as canceled; if the flag bit is False, the user has not initiated a cancel action, and execution continues until the check before the next step.
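The flag check performed before each step can be sketched with a simple task object. This is an illustration under assumptions: `CancellableTask`, `request_cancel`, and the status strings are hypothetical, and the per-step "supports cancellation" record mentioned in the text is left out for brevity.

```python
import threading
from typing import Callable, List, Tuple


class CancellableTask:
    """Runs named steps in order and checks a cancel flag before each one."""

    def __init__(self, steps: List[Tuple[str, Callable[[], None]]]) -> None:
        self.steps = steps
        self._cancel = threading.Event()   # the flag bit, False until set
        self.status = "pending"
        self.completed: List[str] = []

    def request_cancel(self) -> None:
        # What an API endpoint for canceling would call (hypothetical).
        self._cancel.set()

    def run(self) -> str:
        self.status = "running"
        for name, action in self.steps:
            if self._cancel.is_set():
                # Flag is True: skip the remaining steps, mark the task canceled.
                self.status = "canceled"
                return self.status
            action()
            self.completed.append(name)
        self.status = "succeeded"
        return self.status
```

A cancel request that arrives while a step is running takes effect at the check before the next step; the step already in progress is allowed to finish.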
Example 4
Based on the same conception, the invention also provides a super fusion system host maintenance device, which comprises: a pre-check module for pre-checking a target host against check items including the cluster operation and maintenance component, the computing component, and the storage component; a waiting execution module for performing the pre-check again once the pre-check has passed and the user's enter-maintenance instruction has been received; an execution module for setting the target host to non-schedulable after the second pre-check passes, placing the target host in storage maintenance mode, and, where a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; and a rebuilding module for migrating the original virtual machines back to the target host after host maintenance is complete and the exit-maintenance check has passed.
Example V
Based on the same conception, the present invention also provides a computer device. Such a device may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPUs), memory, and one or more storage media (e.g., one or more mass storage devices) storing application programs or data. The memory and storage medium may be transitory or persistent. The program stored on the storage medium may include one or more modules (not shown), each of which may include a series of instruction operations for the computer device. Still further, the processor may be arranged to communicate with the storage medium and to execute, on the computer device, the series of instruction operations in the storage medium.
The computer device may also include one or more power supplies, one or more wired or wireless network interfaces, one or more input/output interfaces, and/or one or more operating systems, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
It will be appreciated by those skilled in the art that the computer device structure of the present embodiment does not limit the computer device, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.
The computer readable instructions, when executed by the processor, cause the processor to perform the steps of the embodiments described above.
In one embodiment, a readable storage medium is provided that stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the above-mentioned method for maintaining a super fusion system host; the specific steps are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for maintaining a host of a super fusion system, comprising:
pre-checking, for a target host, check items including a cluster operation and maintenance component, a computing component and a storage component;
when the pre-check result passes and a user's instruction to enter maintenance is received, performing the pre-check again;
after the second pre-check passes, setting the target host to be non-schedulable, setting the target host to a storage maintenance mode, and, when a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; wherein migrating the virtual machines on the target host further comprises: performing a pre-dispatch check for virtual machine migration, the pre-dispatch check covering: the running state of the virtual machine, whether the virtual machine contains a passthrough device, and whether the state of the virtual machine changes after the virtual machine is migrated; performing hot migration for a running virtual machine; under the preset condition, shutting down the running virtual machine and performing cold migration; and performing cold migration for a virtual machine in the shutdown state; and wherein, in the storage maintenance mode, after the node where the target host is located goes offline, the cluster, on detecting this, does not automatically trigger data recovery, so as to reduce the amount of data recovery generated during the user's subsequent maintenance;
after the host maintenance is completed and the exit-maintenance-mode check passes, migrating the virtual machines that were originally on the target host back to the target host.
2. The super fusion system host maintenance method of claim 1, wherein pre-checking, for the target host, the check items including the cluster operation and maintenance component, the computing component and the storage component further comprises:
checking, by querying database records, whether any host in the cluster of the super fusion system is entering maintenance mode, in maintenance mode, or exiting maintenance mode, wherein a task center assigns a universally unique identifier (UUID) to ensure that only one pre-check task can run at a time and that only one host in the cluster is in maintenance mode;
performing health status checks on the cluster and the target host, wherein the health status checks cover at least the computing component, the storage component and the operation and maintenance component, and when the platform is Elf/SMTXZBS, the check items also include virtual machine checks;
and, when the health status checks pass, waiting to receive the user's instruction to enter maintenance.
3. The method of claim 1, wherein the migrating the virtual machines on the target host further comprises:
in the process of migrating the virtual machines on the target host, migrating only one virtual machine at a time, and, if migration of the current virtual machine fails, failing the entire entering flow and not migrating the subsequent virtual machines.
4. The method for maintaining a host of a super fusion system as defined in claim 1, further comprising, after the maintenance of the host is completed:
starting the target host, wherein the services on the target host start automatically;
after the target host has started and an exit-maintenance-mode check instruction issued by the user is received, checking the operation and maintenance component, the computing component and the storage component of the host;
and, when the operation and maintenance component, the computing component and the storage component meet the preset conditions, setting the node where the target host is located to a non-storage-maintenance mode and setting the host to a schedulable state.
5. The super fusion system host maintenance method as defined in claim 1, wherein the migrating the virtual machines that were originally on the target host back to the target host further comprises:
performing a pre-dispatch check for migrating the virtual machines back, the pre-dispatch check covering: the running state of the virtual machine, whether the virtual machine contains a passthrough device, and whether the state of the virtual machine changes after the virtual machine is migrated;
performing hot migration for a running virtual machine;
and performing cold migration for a virtual machine in the shutdown state.
6. A super fusion system host maintenance device, comprising:
a pre-check module for pre-checking, for a target host, check items including a cluster operation and maintenance component, a computing component and a storage component;
a waiting execution module for performing the pre-check again when the pre-check result passes and a user's instruction to enter maintenance is received;
an execution module for, after the second pre-check passes, setting the target host to be non-schedulable, setting the target host to a storage maintenance mode, and, when a preset condition is met, migrating the virtual machines on the target host so that host maintenance can be carried out; wherein migrating the virtual machines on the target host further comprises: performing a pre-dispatch check for virtual machine migration, the pre-dispatch check covering: the running state of the virtual machine, whether the virtual machine contains a passthrough device, and whether the state of the virtual machine changes after the virtual machine is migrated; performing hot migration for a running virtual machine; under the preset condition, shutting down the running virtual machine and performing cold migration; and performing cold migration for a virtual machine in the shutdown state; and wherein, in the storage maintenance mode, after the node where the target host is located goes offline, the cluster, on detecting this, does not automatically trigger data recovery, so as to reduce the amount of data recovery generated during the user's subsequent maintenance;
and a rebuilding module for migrating the virtual machines that were originally on the target host back to the target host after the host maintenance is completed and the exit-maintenance-mode check passes.
7. A computer device, comprising:
a memory for storing a processing program;
a processor which, when executing the processing program, implements the super fusion system host maintenance method according to any one of claims 1 to 5.
8. A readable storage medium, wherein a processing program is stored on the readable storage medium, and when the processing program is executed by a processor, the processing program implements the super fusion system host maintenance method according to any one of claims 1 to 5.
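For illustration only, the migration rules recited in claims 1, 3 and 5 (hot migration for running virtual machines, cold migration for shut-down ones, one virtual machine at a time, stop on the first failure) might be sketched in Python as below. The claims do not spell out the "preset condition" for shutting down a running virtual machine; this sketch assumes, hypothetically, that it applies to a running virtual machine with a passthrough device when the caller sets `allow_shutdown`. All names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Plan(Enum):
    HOT = "hot"
    SHUTDOWN_THEN_COLD = "shutdown_then_cold"
    COLD = "cold"


@dataclass
class VM:
    name: str
    running: bool
    has_passthrough: bool = False


def plan_migration(vm: VM, allow_shutdown: bool) -> Plan:
    """Pre-dispatch check: pick a migration mode from the VM's state."""
    if not vm.running:
        return Plan.COLD                # shut-down VM: cold migration
    if not vm.has_passthrough:
        return Plan.HOT                 # running VM: hot migration
    if allow_shutdown:                  # assumed form of the "preset condition"
        return Plan.SHUTDOWN_THEN_COLD
    raise ValueError(f"{vm.name}: passthrough device prevents hot migration")


def migrate_all(vms: List[VM], migrate: Callable[[VM, Plan], bool],
                allow_shutdown: bool = False) -> Tuple[bool, List[str]]:
    """Migrate one VM at a time; on the first failure stop and report failure."""
    migrated: List[str] = []
    for vm in vms:
        plan = plan_migration(vm, allow_shutdown)
        if not migrate(vm, plan):
            return False, migrated      # subsequent VMs are not migrated
        migrated.append(vm.name)
    return True, migrated
```

Here `migrate` is a caller-supplied callback standing in for whatever actually moves the virtual machine; it returns `True` on success.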
CN202211439186.1A 2022-10-12 2022-11-17 Method and device for maintaining host of super fusion system Active CN115904621B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022112451482 2022-10-12
CN202211245148 2022-10-12

Publications (2)

Publication Number Publication Date
CN115904621A CN115904621A (en) 2023-04-04
CN115904621B CN115904621B (en) 2023-09-19

Family

ID=86483544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439186.1A Active CN115904621B (en) 2022-10-12 2022-11-17 Method and device for maintaining host of super fusion system

Country Status (1)

Country Link
CN (1) CN115904621B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399201A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 A kind of method, apparatus and cloud management platform of openstack calculate node host maintenance
CN111176790A (en) * 2019-12-30 2020-05-19 北京浪潮数据技术有限公司 Active maintenance method and device of cloud platform physical host and readable storage medium
CN111669284A (en) * 2020-04-28 2020-09-15 长沙证通云计算有限公司 OpenStack automatic deployment method, electronic device, storage medium and system
US11157263B1 (en) * 2020-06-15 2021-10-26 Dell Products L.P. Pipeline rolling update

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11520673B2 (en) * 2020-07-21 2022-12-06 Hewlett Packard Enterprise Development Lp Maintenance operations based on analysis of collected data

Also Published As

Publication number Publication date
CN115904621A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
US7774785B2 (en) Cluster code management
US5664088A (en) Method for deadlock recovery using consistent global checkpoints
EP3117322B1 (en) Method and system for providing distributed management in a networked virtualization environment
US7536582B1 (en) Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US20020073410A1 (en) Replacing software at a telecommunications platform
JPH05181823A (en) Method and apparatus for controlling block in block partitioning type process environment
JPH0561808A (en) Method and device for dynamically changing i/o configulation of system
JP2001134454A (en) Method and system for updating component in computer environment and manufactured product
CN111857951A (en) Containerized deployment platform and deployment method
US20080263183A1 (en) Management of Kernel configurations for nodes in a clustered system
US6502176B1 (en) Computer system and methods for loading and modifying a control program without stopping the computer system using reserve areas
US20080294839A1 (en) System and method for dumping memory in computer systems
US8307371B2 (en) Method for efficient utilization of processors in a virtual shared environment
CN115904621B (en) Method and device for maintaining host of super fusion system
US10970098B2 (en) Methods for sharing input-output device for process automation on a computing machine and devices thereof
US5613133A (en) Microcode loading with continued program execution
US6487580B1 (en) Method and system for managing concurrently executable computer processes
US20130144842A1 (en) Failover and resume when using ordered sequences in a multi-instance database environment
CN109634721B (en) Method and related device for starting communication between virtual machine and host
CA2345200A1 (en) Cross-mvs-system serialized device control
CN113342511A (en) Distributed task management system and method
CN113342499A (en) Distributed task calling method, device, equipment, storage medium and program product
US6823498B2 (en) Masterless building block binding to partitions
US20230393882A1 (en) Management of virtual machine shutdowns in a computing environment based on resource locks
CN116954760B (en) UEFI intelligent starting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100098

Patentee after: Beijing Zhiling Haina Technology Co.,Ltd.

Country or region after: China

Address before: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100098

Patentee before: Beijing zhilinghaina Technology Co.,Ltd.

Country or region before: China