CN117519914B - Cloud host control method and device and management host

Info

Publication number: CN117519914B
Application number: CN202410024187.2A
Authority: CN
Inventors: 刘金松; 施扬; 申习之
Original assignee: Chengdu Zhuozhou Technology Co ltd
Current assignee: Chengdu Zhuozhou Technology Co ltd
Priority date: 2024-01-08
Filing date: 2024-01-08
Publication date: 2024-03-12
Anticipated expiration: 2044-01-08
Also published as: CN117519914A

Abstract

The application discloses a cloud host control method, a cloud host control device and a management host, applied to a cloud platform on which a management host and at least one operation host corresponding to the management host are deployed. The method is applied to the management host. Specifically, once the time interval between the start-up time of the management host and the current time exceeds the maximum allowable task running time, the method periodically executes the following: determining, from all operation hosts, the operation hosts that meet the shutdown condition; judging whether the time interval between the latest attempted power-on time of such an operation host and the current time exceeds the attempted power-on time interval threshold; and, if so, outputting a shutdown request to the cloud platform to request that the operation host be shut down. Exceeding the attempted power-on time interval threshold indicates that the operation host has had no operation task to process during the recent period. On that basis the operation host can be shut down, which reduces useless occupation of cloud platform resources, avoids resource waste, and saves cost.

Description

Cloud host control method and device and management host

Technical Field

The present application relates to the field of computer technologies, and in particular, to a cloud host control method, a cloud host control device, and a management host.

Background

When a user relies on a cloud platform to perform computation, a cloud host is first deployed on the cloud platform and configured with cloud platform resources such as CPU/GPU resources and memory resources; the cloud host is then started and an operation task is issued to it, and the operation task is carried out by the cloud host.

To ensure that the operation task can be carried out, sufficient cloud platform resources must be configured for the cloud host. As a result, a cloud host that stays powered on for a long time occupies a large amount of cloud platform resources and incurs substantial cost.

Disclosure of Invention

In view of the above problems, the present application provides a cloud host control method, a cloud host control device, and a management host, so as to shut down cloud hosts automatically and save overhead.

The specific scheme is as follows:

In a first aspect, a cloud host control method is provided, where a management host and at least one operation host corresponding to the management host are deployed on a cloud platform in advance, the cloud host control method is applied to the management host, and the cloud host control method includes:

after the time interval between the starting time and the current time of the management host exceeds the preset maximum allowable running time of the task, periodically executing a shutdown control method according to a preset shutdown control execution period;

The shutdown control method comprises the following steps: determining an operation host meeting a preset shutdown condition from all operation hosts corresponding to the management host; executing a shutdown operation on the operation host, wherein the shutdown operation on the operation host comprises:

judging whether the time interval between the latest attempted power-on time of the operation host and the current time exceeds a preset attempted power-on time interval threshold, where the latest attempted power-on time of the operation host is the most recent time at which it was determined that the operation host needed to be powered on for operation processing; if not, ending the shutdown operation on the operation host; if yes, outputting a shutdown request to the cloud platform to request that the operation host be shut down, and ending the shutdown operation on the operation host.

In a second aspect, a cloud host control device is provided, where a management host and at least one operation host corresponding to the management host are deployed on a cloud platform in advance, the cloud host control device is applied to the management host, and the cloud host control device includes: a shutdown control module;

the shutdown control module is used for: after the time interval between the starting time and the current time of the management host exceeds the preset maximum allowable running time of the task, periodically executing a shutdown control method according to a preset shutdown control execution period;

In a third aspect, a management host is provided, where the management host is deployed on a cloud platform, and at least one operation host corresponding to the management host is further deployed on the cloud platform, and the management host includes a memory and a processor;

the memory is used for storing programs;

and the processor is used for executing the program and realizing the steps of the cloud host control method.

With the above technical solution, the present application can be applied to a cloud platform on which cloud hosts serving a target user are deployed in advance. The cloud hosts may include a management host and at least one operation host corresponding to the management host, and the cloud host control method may be applied to the management host. The method may include: after the time interval between the start-up time of the management host and the current time exceeds the preset maximum allowable task running time, periodically executing the shutdown control method according to the preset shutdown control execution period. The shutdown control method may include: determining, from all operation hosts corresponding to the management host, an operation host that meets a preset shutdown condition, and executing a shutdown operation on that operation host, namely judging whether the time interval between the latest attempted power-on time of the operation host and the current time exceeds a preset attempted power-on time interval threshold. Exceeding this threshold indicates that the operation host has had no operation task to process for a period of time before the current time, that is, the operation host is currently idle. On that basis, a shutdown request can be output to the cloud platform to request that the operation host be shut down, which reduces useless occupation of cloud platform resources to a certain extent, avoids resource waste, and saves cost.

In addition, the shutdown control method starts to be executed only after the time interval between the start-up time of the management host and the current time exceeds the preset maximum allowable task running time. This avoids, to a certain extent, mistakenly shutting down an operation host because the management host was restarted and its monitoring data on the operation hosts was cleared, and thus improves the reliability of shutdown control.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a schematic flow chart of a shutdown control method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a shutdown operation on the operation host according to an embodiment of the present application;

FIG. 3 is another schematic flow chart of a shutdown operation on the operation host according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of a power-on control method according to an embodiment of the present application;

FIG. 5 is a schematic flow chart of a power-on operation on the operation host according to an embodiment of the present application;

FIG. 6 is another schematic flow chart of a power-on operation on the operation host according to an embodiment of the present application;

FIG. 7 is yet another schematic flow chart of a power-on operation on the operation host according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a management host according to an embodiment of the present application.

Detailed Description

The embodiments of the present application are described clearly and fully below with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without creative effort fall within the scope of the present disclosure.

The embodiment of the application provides a cloud host control method, a cloud host control device and a management host, so that automatic control tasks of the cloud host are realized, and the cost is saved.

The cloud host control scheme provided by the embodiment of the application can be applied to a cloud platform, and a management host and at least one operation host corresponding to the management host can be deployed on the cloud platform in advance. Specifically, the cloud host control method may be applied to the management host, and the cloud host control method may include:

After the time interval between the start-up time of the management host and the current time exceeds the preset maximum allowable task running time, periodically executing the shutdown control method according to the preset shutdown control execution period.

It should be noted that, the start-up time of the management host may represent the time when the management host starts to perform cloud host control. In one possible implementation manner, the cloud host control method may be implemented by using a server process running on the management host. Based on this, the boot time of the management host may be the boot time of the server process.

The time interval between the start-up time of the management host and the current time represents how long the management host has been performing cloud host control. When this interval exceeds the preset maximum allowable task running time, the management host has been performing cloud host control for long enough, and the GC (garbage collection) logic can be started; that is, the shutdown control method is executed periodically according to the preset shutdown control execution period so that useless operation hosts are shut down in time. The preset maximum allowable task running time may be set according to the typical maximum time consumed by the operation tasks processed by the at least one operation host; for example, it may be set greater than that typical maximum. For instance, if an operation host can complete an operation task within 5 minutes, the preset maximum allowable task running time may be 15 minutes.

It should be noted that restarting the management host, or restarting the server process running on the management host, may cause the management host to lose its monitoring data on each operation host. If the GC logic were started immediately, the shutdown control method might determine that an operation host that is still processing an operation task meets the preset shutdown condition and shut it down by mistake, causing the ongoing operation task to fail. To solve this problem, the time at which the shutdown control method begins to execute is constrained: it begins only after the time interval between the start-up time of the management host and the current time exceeds the preset maximum allowable task running time, which avoids mistaken shutdowns to a certain extent. In addition, because the preset maximum allowable task running time is greater than the typical maximum time consumed by the operation tasks processed by the at least one operation host, any operation task started before the restart has usually finished by the time the GC logic starts, so operation task failures caused by mistakenly shutting down an operation host can be avoided to a certain extent.
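The warm-up gate described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the injected `clock` and `sleep` callables, and the `max_rounds` parameter (added only so the loop can terminate in a demonstration) are all assumptions. The 900-second value corresponds to the 15-minute example in the text.

```python
import time


def gc_enabled(manager_start: float, now: float, max_task_runtime: float) -> bool:
    """Return True once the management host has run longer than max_task_runtime."""
    return now - manager_start > max_task_runtime


def run_gc_loop(manager_start, max_task_runtime, period, shutdown_control,
                clock=time.monotonic, sleep=time.sleep, max_rounds=None):
    """Call shutdown_control once per period, but only after the warm-up has elapsed.

    max_rounds is a hypothetical knob for demonstrations; None means run forever.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if gc_enabled(manager_start, clock(), max_task_runtime):
            shutdown_control()
        sleep(period)
        rounds += 1
```

Because a restart resets `manager_start`, no shutdown pass runs until tasks that were in flight before the restart have had time to finish.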

Specifically, fig. 1 is a schematic flow chart of a shutdown control method according to an embodiment of the present application, where the shutdown control method may include the following steps S101 to S102:

Step S101, determining, from the operation hosts corresponding to the management host, an operation host that meets a preset shutdown condition.

Optionally, the preset shutdown condition may include: the time interval between every task time of the operation host and the current time exceeds a preset idle time threshold, and the operation host is currently powered on. A task time of the operation host may be the start time or the end time of an operation task processed by the operation host; the preset idle time threshold may be set according to the typical maximum time the operation host takes to process an operation task.

In addition, operation tasks may be processed by an operation process running on the operation host. On that basis, the on-line state of the operation process can be used to represent the powered-on state of the operation host; that is, the operation host being currently powered on may include: the operation process running on the operation host is currently on-line.

For example, if the computing host can process and complete a computing task in 5 minutes, the preset idle time threshold may be 15 minutes; based on this, the operation host meeting the preset shutdown condition can be characterized as follows: among the operation tasks handled by the operation host, there is no task that starts or ends within 15 minutes, and the operation process of the operation host is on-line.

In a possible implementation manner, in step S101, determining, from the operation hosts corresponding to the management host, the operation host that meets the preset shutdown condition may include:

for each of the operation hosts: querying the locally stored operation task state of the operation host; if the time interval between every task time of the operation host and the current time exceeds the preset idle time threshold, querying whether the operation process of the operation host is on-line or off-line; and determining an operation host whose operation process is on-line to be an operation host that meets the preset shutdown condition.
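A minimal sketch of step S101 under stated assumptions: the `HostRecord` structure and its field names are hypothetical stand-ins for the management host's locally stored task state, and the on-line flag is assumed to have been queried already.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """Hypothetical per-host monitoring record kept by the management host."""
    name: str
    task_times: list = field(default_factory=list)  # start/end timestamps of tasks
    process_online: bool = False                    # is the operation process on-line?


def hosts_to_shut_down(hosts, now, idle_threshold):
    """Return the hosts whose every task time is older than idle_threshold
    and whose operation process is currently on-line."""
    selected = []
    for host in hosts:
        idle = all(now - t > idle_threshold for t in host.task_times)
        if idle and host.process_online:
            selected.append(host)
    return selected
```

Note that in this sketch a host with no recorded task times counts as idle. That is exactly the situation after the management host loses its monitoring data, which is why the shutdown control method is delayed until the maximum allowable task running time has elapsed.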

Step S102, executing the shutdown operation of the operation host.

Specifically, fig. 2 illustrates a schematic flow diagram of a shutdown operation of the computing host, where the operation may include:

First step: judging whether the time interval between the latest attempted power-on time of the operation host and the current time exceeds a preset attempted power-on time interval threshold; if yes, executing the second step; if not, ending the shutdown operation on the operation host.

The latest attempted power-on time of the operation host is the most recent time at which it was determined that the operation host needed to be powered on for operation processing, that is, the most recent time the operation host was asked to start. If the time interval between this time and the current time exceeds the preset attempted power-on time interval threshold, the operation host has been idle for a period of time since it was last required to be powered on, and it can be shut down to save overhead.

Second step: outputting a shutdown request to the cloud platform to request that the operation host be shut down.

The shutdown request is used to shut down the operation host; after it is output to the cloud platform, the shutdown operation on the operation host ends. In one possible implementation, outputting the shutdown request to the cloud platform may include: calling an API of the cloud platform to shut down the operation host.
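The two steps of FIG. 2 can be sketched as below. The `request_shutdown` callable stands in for the cloud platform API call, whose real name and signature the text does not give; the threshold value used in the usage lines is likewise illustrative only.

```python
def shutdown_operation(host_id, last_attempt, now, attempt_threshold, request_shutdown):
    """Shut the host down only if it has not been asked to power on recently.

    Returns True if a shutdown request was issued, False otherwise.
    """
    # First step: compare the time since the latest attempted power-on with the threshold.
    if now - last_attempt <= attempt_threshold:
        return False  # the host was needed recently; leave it running
    # Second step: ask the cloud platform (hypothetical callable) to shut the host down.
    request_shutdown(host_id)
    return True


# Usage with a stand-in for the cloud API:
sent = []
shutdown_operation("host-1", last_attempt=100.0, now=1100.0,
                   attempt_threshold=900.0, request_shutdown=sent.append)
```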

With the above cloud host control method, after the time interval between the start-up time of the management host and the current time exceeds the preset maximum allowable task running time, the following is executed periodically according to the preset shutdown control execution period: determining, from all operation hosts corresponding to the management host, an operation host that meets the preset shutdown condition, and judging whether the time interval between the latest attempted power-on time of the operation host and the current time exceeds the preset attempted power-on time interval threshold. If the threshold is exceeded, that is, the operation host has had no operation task to process for a period of time before the current time and is idle, a shutdown request is output to the cloud platform to request that the operation host be shut down.

Alternatively, the management host and the at least one operation host may be cloud hosts serving a specified user. Based on the cloud host control scheme, cloud host management tasks of the appointed user can be realized, and cloud platform resource expenditure of the appointed user is saved.

FIG. 3 illustrates another flow chart of a shutdown operation for the operation host, and in combination with the illustration of FIG. 3, the shutdown operation for the operation host may include the following steps:

Third step: judging whether the time interval between the output time of the shutdown request and the current time exceeds a preset shutdown timeout. If it does, executing the fourth step; if it does not, executing the fifth step.

The preset shutdown timeout time can be set according to the shutdown response speed of the cloud host. The preset shutdown timeout may be, for example, 3 minutes.

Fourth step: outputting error information indicating that shutdown of the operation host has timed out.

If the time interval between the output time of the shutdown request and the current time exceeds the preset shutdown timeout, the operation host has failed to shut down for a long time and a shutdown timeout error needs to be reported. After the error information indicating the shutdown timeout is output, the shutdown operation on the operation host ends.

Fifth step: judging whether the operation host has been shut down. If not, executing the sixth step; if so, ending the shutdown operation on the operation host.

Optionally, the determining whether the operation host is powered off may include: and inquiring the on-off state of the operation host by using the interface of the cloud platform, and judging whether the operation host is powered off or not according to an inquiry result.

Sixth step: waiting for a preset shutdown polling interval.

The preset shutdown polling interval is the time between two adjacent power-state queries; it reduces the resource waste caused by overly frequent queries. After waiting for the preset shutdown polling interval, execution returns to the third step, where it is again judged whether the time interval between the output time of the shutdown request and the current time exceeds the preset shutdown timeout.

The shutdown operation described above provides one possible implementation of shutdown polling logic; for the remaining details of the operation, refer to the description above.

In some embodiments provided herein, the shutdown operation of the operation host may further include:

If the time interval between the output time of the shutdown request and the current time has not exceeded the shutdown timeout and a shutdown error signal output by the cloud platform is received, outputting error information indicating that shutdown of the operation host has failed, and ending the shutdown operation on the operation host.

The shutdown error-reporting signal output by the cloud platform may be API error-reporting information of the cloud platform.

With the above shutdown operation, after the shutdown request is output, the power state of the operation host is polled within the preset shutdown timeout, so that the state of the operation host is determined in time. If the operation host fails to shut down for a long time, the abnormality can be detected and recorded promptly and corresponding measures taken, such as retrying the shutdown or shutting the operation host down manually. In practice, the power state of the operation host may be polled once per second for 3 minutes after the shutdown request is output, until the operation host is determined to be shut down, a shutdown error is reported, or the shutdown timeout is reached.
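The polling logic of the third to sixth steps might look like the sketch below. It assumes a hypothetical `is_powered_off` callable wrapping the cloud platform's power-state query, and it models the platform's shutdown error signal as a hypothetical `CloudApiError` exception raised by that query; neither name comes from the source. The defaults follow the 3-minute timeout and one-second polling interval given as examples.

```python
import time


class CloudApiError(Exception):
    """Hypothetical stand-in for an error reported by the cloud platform API."""


def wait_for_shutdown(is_powered_off, request_time, timeout=180.0, poll_interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll the host's power state until it is off, an error occurs, or the timeout passes.

    Returns "off", "failed" or "timeout".
    """
    while True:
        # Third step: has the shutdown request been outstanding for too long?
        if clock() - request_time > timeout:
            return "timeout"          # fourth step: caller reports a timeout error
        try:
            # Fifth step: query the power state through the cloud platform.
            if is_powered_off():
                return "off"
        except CloudApiError:
            return "failed"           # platform reported an error before the timeout
        # Sixth step: wait one polling interval, then go back to the third step.
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a caller would map "timeout" and "failed" to the corresponding error information.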

In some embodiments provided herein, the cloud host control method may further include: executing a power-on control method in response to an operation task execution request.

The operation task execution request indicates that there is currently at least one operation task to be processed by an operation host corresponding to the management host, and may contain operation task information. In one possible implementation, the management host may run a server process that starts an RPC (Remote Procedure Call) listening service, which may include an operation task start endpoint, StartTask. When a client calls this endpoint remotely over the network to request processing of an operation task, the RPC listening service receives a remote procedure call named StartTask.

Specifically, fig. 4 shows a flowchart of a power-on control method provided in the embodiment of the present application, and in combination with fig. 4, the power-on control method may include the following steps:

Step S201, determining, from the operation hosts corresponding to the management host, an operation host that meets a preset power-on condition.

The operation host to be powered on is the one that needs to process the operation task corresponding to the operation task execution request; that is, the preset power-on condition may include: the operation host is to process the operation task corresponding to the operation task execution request. Optionally, if the operation task execution request does not specify an operation host, the operation host to process the task may be determined according to the states of the operation hosts; for example, an operation host that is powered on and is currently idle or about to become idle is determined to be the operation host that meets the preset power-on condition.

Step S202, executing the starting operation of the operation host.

It should be noted that, the power-on operation of the operation host and the power-off operation of the operation host are configured to be executed by a single thread, that is, only at most one of the power-on operation of the operation host and the power-off operation of the operation host can be executed at the same time for each operation host.

Fig. 5 illustrates a flowchart of a power-on operation of the operation host according to an embodiment of the present application, where the power-on operation of the operation host may include:

First step: updating the latest attempted power-on time of the operation host to the current time.

The latest attempted power-on time of the operation host represents the most recent time at which the operation host needed to be powered on. Optionally, the cloud host control scheme may be implemented by a server process running on the management host, in which case the latest attempted power-on time of each operation host corresponding to the management host may be stored in that server process.

Second step: judging whether the operation process of the operation host is on-line. If not, executing the third step; if so, ending the power-on operation on the operation host.

It should be noted that, the processing of the operation task may be implemented by an operation process running on the operation host, where the operation process may be a process for processing the operation task that is started after the operation host is started, and if the operation process of the operation host is in an online state, it may be represented that: the operation host is in a starting state and can process operation tasks. Based on the above, if the operation process of the operation host is online, the execution of the startup operation of the operation host can be ended; if the operation process of the operation host is not on-line, the operation host can be requested to be started so as to start the operation process running on the operation host.

Third step: outputting a power-on request to the cloud platform to request that the operation host be powered on.

The power-on request is used to power on the operation host. In one possible implementation, outputting the power-on request to the cloud platform may include: calling an API of the cloud platform to power on the operation host.

Fourth step: querying whether the operation process of the operation host is on-line or off-line.

After querying whether the operation process of the operation host is online, the execution of the power-on operation of the operation host may be ended.
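The four steps of FIG. 5 can be sketched as follows. The dictionary layout of `host` and the two callables (`process_online` for the on-line query, `request_power_on` for the cloud platform API call) are hypothetical placeholders rather than interfaces named in the source.

```python
def power_on_operation(host, now, process_online, request_power_on):
    """Run the power-on operation and return whether the operation process is on-line."""
    # First step: record that the host was needed at this moment.
    host["last_attempt"] = now
    # Second step: nothing more to do if the operation process is already on-line.
    if process_online(host["id"]):
        return True
    # Third step: ask the cloud platform (hypothetical callable) to power the host on.
    request_power_on(host["id"])
    # Fourth step: query the on-line state once more and report it to the caller.
    return process_online(host["id"])
```

The latest attempted power-on time is refreshed even when the host is already on-line, so a host that keeps receiving tasks is not treated as stale by the shutdown operation.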

Step S203, when the operation process of the operation host is online, sending the operation task corresponding to the operation task execution request to the operation host, so that the operation process of the operation host processes the operation task.

In one possible implementation, the operation process of the operation host may start an RPC listening service that includes an operation task start endpoint, StartTask. When the management host calls this endpoint remotely over the network to request that the operation process handle the operation task, the RPC listening service receives a remote procedure call named StartTask, and the operation process responds by executing the operation task. If the call fails, error information indicating that the operation process of the operation host failed to process the operation task may be output.

With the above cloud host control method, the operation hosts corresponding to the management host are powered on automatically according to operation task processing requirements. Automatic power-on control of the operation hosts is thus achieved with no manual start-up operation required, which raises the degree of automation of cloud host control.

In some embodiments provided in the present application, the management host may be preconfigured with respective on-off mutual exclusion locks of the corresponding operation hosts.

The on-off mutual exclusion lock of an operation host guarantees that, at any given time, at most one thread is executing a power-on operation or a shutdown operation on that operation host. For any operation host, while one thread is executing a power-on or shutdown operation on it, any new thread that needs to execute a power-on or shutdown operation on the same host queues and waits.

On this basis, for the shutdown control scheme, executing the shutdown operation on the operation host may include: acquiring the on-off mutual exclusion lock of the operation host, and executing the shutdown operation on the operation host while holding the lock.

Accordingly, after the shutdown operation on the operation host ends, the shutdown control method may further include: releasing the on-off mutual exclusion lock of the operation host.

For the power-on control scheme, step S202 of executing the power-on operation on the operation host may include: acquiring the on-off mutual exclusion lock of the operation host, and executing the power-on operation on the operation host while holding the lock.

Correspondingly, after the power-on operation on the operation host ends, the power-on control method may further include: releasing the on-off mutual exclusion lock of the operation host.

With this scheme, the on-off mutual exclusion lock of the operation host ensures that the power-on operation and the shutdown operation on the operation host are never executed by more than one thread at a time, which avoids control logic errors caused by powering on and shutting down the same operation host simultaneously.

Furthermore, in one possible implementation, for each operation host, the latest attempted power-on time of the operation host may be stored in the on-off mutual exclusion lock of the operation host.

That is, the on-off mutual exclusion lock of the operation host protects a variable, namely the latest attempted power-on time of the operation host. The variable is updated when a power-on operation is executed on the operation host under the lock, and read when a shutdown operation is executed under the lock, in order to judge whether the time interval between the latest attempted power-on time and the current time exceeds the preset attempted power-on time interval threshold.
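One way to picture a lock that protects the latest attempted power-on time is the sketch below. The class name `HostPowerLock`, its method names, and the choice of negative infinity as the initial value (so a host that was never requested counts as stale) are assumptions made for illustration.

```python
import threading


class HostPowerLock:
    """Per-host mutex that also guards the latest attempted power-on time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_attempt = float("-inf")  # assumption: never requested yet

    def power_on(self, now, do_power_on):
        # Acquire the lock, update the protected variable, run the power-on
        # operation, and release the lock on exit from the with-block.
        with self._lock:
            self._last_attempt = now
            do_power_on()

    def shut_down_if_stale(self, now, threshold, do_shutdown):
        # Read the protected variable under the same lock, so a shutdown can
        # never interleave with a power-on of the same host.
        with self._lock:
            if now - self._last_attempt > threshold:
                do_shutdown()
                return True
            return False
```

A management process would keep one such object per operation host; threads wanting to power on or shut down the same host then queue on its lock, while different hosts proceed independently.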

Next, possible schemes of the boot operation provided in the embodiments of the present application are described.

Fig. 6 illustrates another flowchart of the power-on operation on the operation host according to an embodiment of the present application. With reference to Fig. 6, the power-on operation on the operation host may include:

Step two: judging whether the operation process of the operation host is online. If not, executing step three; if yes, ending the power-on operation on the operation host.

Step four: judging whether the time interval between the output time of the power-on request and the current time exceeds the preset power-on timeout time; if yes, executing step five; if not, executing step six.

The preset power-on timeout time can be set according to the power-on response speed of the cloud host, and may be, for example, 3 minutes.

Step five: outputting error information indicating that the power-on of the operation host has timed out.

If the time interval between the output time of the power-on request and the current time exceeds the preset power-on timeout time, the operation host has failed to start for a long time and a power-on timeout error needs to be reported. After the error information indicating the power-on timeout of the operation host is output, the power-on operation on the operation host ends.

Step six: judging whether the operation host has been started; if not, executing step seven; if yes, executing step eight.

Optionally, judging whether the operation host has been started may include: querying the on-off state of the operation host through the interface of the cloud platform, and judging from the query result whether the operation host has been started.

Step seven: waiting for a preset power-on polling interval.

The preset power-on polling interval is the time interval between two adjacent on-off state queries; it reduces the resource waste caused by overly frequent queries. After waiting for the preset power-on polling interval, execution returns to step four to judge again whether the time interval between the output time of the power-on request and the current time exceeds the preset power-on timeout time.

Step eight: querying the online/offline state of the operation process of the operation host.

After the online/offline state of the operation process of the operation host is queried, the power-on operation on the operation host ends.

In the above-mentioned power-on operation of the operation host, a possible implementation manner of the power-on polling logic is provided, and reference may be made to the above for other description of the power-on operation.

In some embodiments provided herein, the booting operation of the operation host may further include:

outputting error information indicating that the power-on of the operation host has failed, and ending the power-on operation on the operation host, in the case that the time interval between the output time of the power-on request and the current time does not exceed the power-on timeout time and a power-on error signal output by the cloud platform is received.

The power-on error-reporting signal output by the cloud platform can be API error-reporting information of the cloud platform.

After the power-on request is output, the on-off state of the operation host is polled within the preset power-on timeout time, so that the state of the operation host can be determined in time. If the operation host fails to start for a long time, the abnormal situation can be detected and recorded promptly and corresponding measures can be taken, such as retrying the power-on or starting the operation host manually. In practical applications, the on-off state of the operation host can be polled every second within 3 minutes after the power-on request is output, until the operation host is determined to be started, a power-on error is reported, or the power-on times out.
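The power-on polling logic above can be sketched as follows. This is a minimal sketch under stated assumptions: `query_is_on` is a hypothetical callable wrapping the state-query interface of the cloud platform (returning True or False and raising on an API error), and the names `wait_for_power_on` and `PowerOnResult` are illustrative only. The clock and sleep functions are injectable so that the logic can be exercised without real waiting.

```python
import time
from enum import Enum


class PowerOnResult(Enum):
    STARTED = "started"
    TIMEOUT = "timeout"
    API_ERROR = "api_error"


def wait_for_power_on(query_is_on, timeout_s=180.0, poll_interval_s=1.0,
                      now=time.monotonic, sleep=time.sleep):
    """Poll the host power state until it is on, the API fails, or time runs out."""
    request_time = now()  # output time of the power-on request
    while True:
        if now() - request_time > timeout_s:
            return PowerOnResult.TIMEOUT      # step five: power-on timeout error
        try:
            if query_is_on():                 # step six: has the host started?
                return PowerOnResult.STARTED
        except Exception:
            return PowerOnResult.API_ERROR    # cloud platform reported an error
        sleep(poll_interval_s)                # step seven: wait, then re-check
```

The defaults mirror the example values in the text (3-minute timeout, 1-second polling interval); both are parameters rather than fixed constants.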

Fig. 7 illustrates still another flowchart of the power-on operation on the operation host according to an embodiment of the present application. With reference to Fig. 7, the power-on operation on the operation host may include:

Step seven: waiting for a preset power-on polling interval.

After waiting for the preset power-on polling interval, execution returns to step four to judge whether the time interval between the output time of the power-on request and the current time exceeds the preset power-on timeout time.

Step eight: judging whether the time interval between the start time of the operation host and the current time exceeds the preset process start timeout time; if yes, executing step nine; if not, executing step ten.

The preset process start timeout time can be set according to the typical start time of the operation process, and may be, for example, 3 minutes.

Step nine: outputting error information indicating that the start of the operation process of the operation host has timed out.

If the time interval between the start time of the operation host and the current time exceeds the preset process start timeout time, the operation process of the operation host has failed to start for a long time and a process start timeout error needs to be reported. After the error information indicating the start timeout of the operation process is output, the power-on operation on the operation host ends.

Step ten: judging whether the operation process of the operation host is online; if not, executing step eleven; if yes, ending the power-on operation on the operation host.

Step eleven: waiting for a preset online polling interval.

After waiting for the preset online polling interval, execution returns to step eight to judge whether the time interval between the start time of the operation host and the current time exceeds the preset process start timeout time.

The above power-on operation on the operation host provides a possible implementation of the polling logic for the start of the operation process; for other aspects of the power-on operation, reference may be made to the description above. In practical applications, the online/offline state of the operation process of the operation host may be polled every second within 3 minutes after the operation host is determined to be started, until the operation process is determined to be online or the process start times out.

For the operation process of the operation host, in one possible implementation, the operation process may start an RPC listening service, which may include a process-online query response endpoint Ping. On this basis, when the management host remotely calls this endpoint over the network to query the online/offline state of the operation process, the RPC listening service receives a remote procedure call (RPC) named Ping, and the operation process responds to it by returning a success flag indicating that the operation process is currently online.
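A Ping endpoint of this kind can be sketched as follows. The embodiment does not name an RPC framework, so this sketch uses the XML-RPC modules of the Python standard library purely as a stand-in; the function names `start_ping_service` and `is_process_online` are hypothetical, and treating any call failure as "offline" is an assumption.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_ping_service(host="127.0.0.1", port=0):
    """Operation-host side: start an RPC listener exposing a `Ping` endpoint."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(lambda: True, "Ping")  # success flag means online
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_process_online(url):
    """Management-host side: call Ping; any failure is treated as offline."""
    try:
        return bool(xmlrpc.client.ServerProxy(url).Ping())
    except (OSError, xmlrpc.client.Error):
        return False
```

With port 0 the operating system picks a free port, which can then be read back from `server.server_address` to build the URL passed to `is_process_online`.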

Next, a description is given of a cloud host control scheme provided in an embodiment of the present application according to an application example.

Specifically, a management host and an operation host corresponding to the management host are pre-deployed on the cloud platform. The cloud platform resources occupied by the management host may include: 1 CPU, 2 GB of memory and a 20 GB hard disk; the cloud platform resources occupied by the operation host may include: 8 CPUs, 32 GB of memory and a 100 GB disk. It should be noted that, because the management host is mainly used for managing the operation host, implementing the cloud host control scheme and proxying operation tasks, it does not need to be configured with more cloud platform resources. With the resource configuration above, even though the management host stays powered on for a long time while implementing the cloud host control scheme provided by the embodiments of the present application, it does not occupy a large amount of cloud platform resources or incur a large overhead, in contrast to the existing scheme in which the cloud host that processes operation tasks is always on.

The server process running on the management host starts an RPC listening service, which includes a first operation task start endpoint StartTask. The server process running on the management host also predefines an on-off mutual exclusion lock of the operation host, which protects one variable: the latest attempted power-on time of the operation host. The power-on operation and the shutdown operation on the operation host are both executed using the on-off mutual exclusion lock of the operation host.

The operation process (i.e. the server process) running on the operation host starts an RPC listening service, which includes: the process-online query response endpoint Ping and a second operation task start endpoint StartTask.

On this basis, after receiving an RPC (remote procedure call) named "StartTask" (i.e. receiving an operation task execution request), the server process running on the management host performs the following steps:

Step one: determining the operation host as the operation host meeting the preset power-on condition;

Step two: acquiring the on-off mutual exclusion lock of the operation host, and executing the power-on operation on the operation host using the lock; after the power-on operation on the operation host is finished, releasing the on-off mutual exclusion lock of the operation host;

Step three: if the operation process of the operation host is online, calling the second operation task start endpoint StartTask in the RPC listening service of the operation process of the operation host, and then ending execution. Optionally, if the call fails, an error may be reported.
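The three steps above can be sketched as a handler on the management host. This is an illustrative sketch only: the function `handle_start_task` and its callable parameters are hypothetical stand-ins for the power-on operation, the Ping query and the forwarding call, and raising an exception when the operation process is offline is an assumption about how the error would be reported.

```python
import threading

# On-off mutual exclusion lock of the (single) operation host.
power_lock = threading.Lock()


def handle_start_task(task, power_on, is_online, forward_task):
    """Management-host handler for a StartTask call (hypothetical signature).

    power_on():      runs the power-on operation on the operation host
    is_online():     True if the operation process answers Ping
    forward_task(t): calls StartTask on the operation process
    """
    # Step two: power on under the lock; the lock is released even on error.
    with power_lock:
        power_on()
    # Step three: forward the task only if the operation process is online.
    if not is_online():
        raise RuntimeError("operation process is offline; task not forwarded")
    return forward_task(task)
```

Using `with power_lock:` keeps acquisition and release paired, so a failed power-on cannot leave the lock held and block a later shutdown pass.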

Specifically, the executing the startup operation on the operation host by using the startup and shutdown exclusive lock of the operation host may include:

Step two: after delaying for a preset time (for example, 1 second), calling the process-online query response endpoint Ping in the RPC listening service of the operation process of the operation host to query the online/offline state of the operation process, and judging whether the operation process of the operation host is online; if it is online, ending the power-on operation on the operation host; if not, executing step three.

Step three: calling the API interface of the cloud platform to start the operation host.

Step four: setting a timeout (for example, 3 minutes) according to the preset power-on timeout time, and, within the timeout, periodically querying the on-off state of the operation host through the API interface of the cloud platform until the operation host is determined to be started, the timeout is exceeded, or power-on error information is received.

If the timeout is exceeded, error information indicating that the power-on of the operation host has timed out can be output; if power-on error information is received, for example because the API interface of the cloud platform reports an error, error information indicating that the power-on of the operation host has failed can be output. After the error information is output, the power-on operation on the operation host ends.

Step five: after the operation host is determined to be started, setting a timeout (for example, 3 minutes) according to the preset process start timeout time, and, within the timeout, periodically (for example, every second) calling the process-online query response endpoint Ping in the RPC listening service of the operation process of the operation host until the operation process is determined to be online or the timeout is exceeded.

If the timeout is exceeded, error information indicating that the start of the operation process of the operation host has timed out can be output, and the power-on operation on the operation host then ends; if the operation process of the operation host is determined to be online, the power-on operation on the operation host ends.
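The complete power-on operation of this application example can be sketched as follows. The sketch makes several assumptions: `host` is a hypothetical adapter whose `ping()`, `start()` and `is_on()` methods stand in for the Ping endpoint and the API interface of the cloud platform; `state` is a plain dictionary standing in for the lock-protected variable; and recording the attempt time first follows the power-on operation as described for the power-on control module. The caller is expected to hold the on-off mutual exclusion lock.

```python
import time


def wait_until(check, timeout_s, interval_s, now, sleep):
    """Poll `check` until it returns True or `timeout_s` elapses."""
    start = now()
    while now() - start <= timeout_s:
        if check():
            return True
        sleep(interval_s)
    return False


def power_on_operation(host, state, now=time.monotonic, sleep=time.sleep):
    """Return "online" or an error string describing why power-on failed."""
    state["last_attempt"] = now()          # record the latest attempted power-on time
    sleep(1.0)                             # step two: short delay, then Ping
    if host.ping():
        return "online"
    try:
        host.start()                       # step three: cloud API power-on
        started = wait_until(host.is_on, 180.0, 1.0, now, sleep)  # step four
    except Exception:
        return "error: power-on failed"    # cloud API reported an error
    if not started:
        return "error: power-on timeout"
    if wait_until(host.ping, 180.0, 1.0, now, sleep):             # step five
        return "online"
    return "error: process start timeout"
```

The two waiting phases share one helper because they differ only in what is polled: the power state of the host first, then the Ping endpoint of the operation process.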

The server process running on the management host may enable gc logic, which may be configured as follows: after the time interval between the start time of the management host and the current time exceeds the preset maximum allowed task running time (for example, 15 minutes after the management host, or the server process running on it, is started), the gc logic is executed periodically according to a preset shutdown control execution period (for example, 1 minute), so as to implement the shutdown control scheme provided by the embodiments of the present application.
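The scheduling of the gc logic can be sketched as follows. This is a minimal sketch with hypothetical names: `run_gc` stands for one pass of the gc logic, and `max_rounds` exists only so that the loop can be stopped in an example; a real server process would presumably run it indefinitely in a background thread.

```python
import time


def run_gc_loop(run_gc, started_at, max_task_runtime_s=900.0, period_s=60.0,
                now=time.monotonic, sleep=time.sleep, max_rounds=None):
    """Run `run_gc` periodically once the initial waiting window has elapsed.

    started_at: start time of the management host (or its server process)
    max_rounds: stop after this many gc passes; None means run forever
    """
    remaining = started_at + max_task_runtime_s - now()
    if remaining > 0:
        sleep(remaining)          # wait out the maximum allowed task running time
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        run_gc()                  # one pass of the shutdown control logic
        rounds += 1
        sleep(period_s)           # shutdown control execution period
```

With the example values from the text (15 minutes, then every minute), the first three passes run 900, 960 and 1020 seconds after the start time.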

Specifically, the gc logic may include the following steps:

Step one: querying the locally stored operation task state of the operation host, and judging whether the time interval between each task time (the start time or end time of an operation task) of the operation host and the current time exceeds a preset idle time threshold (for example, 15 minutes); if not, ending this execution of the gc logic; if yes, executing step two.

Step two: after delaying for a preset time (for example, 1 second), calling the process-online query response endpoint Ping in the RPC listening service of the operation process of the operation host to query the online/offline state of the operation process, and judging whether the operation process of the operation host is online; if not, ending this execution of the gc logic; if yes, determining that the operation host meets the preset shutdown condition, and executing step three.

Step three: acquiring the on-off mutual exclusion lock of the operation host, and executing the shutdown operation on the operation host using the lock.

Step four: releasing the on-off mutual exclusion lock of the operation host after the shutdown operation on the operation host is finished.

The executing the shutdown operation on the operation host by using the on-off exclusive lock of the operation host may include:

Step one: judging whether the time interval between the latest attempted power-on time of the operation host and the current time exceeds a preset attempted power-on time interval threshold (for example, 15 minutes); if yes, executing step two; if not, ending the shutdown operation on the operation host.

Step two: calling the API interface of the cloud platform to shut down the operation host.

Step three: setting a timeout (for example, 3 minutes) according to the preset shutdown timeout time, and, within the timeout, periodically (for example, every second) querying the on-off state of the operation host through the API interface of the cloud platform until the operation host is determined to be shut down, the timeout is exceeded, or shutdown error information is received.

If the timeout is exceeded, error information indicating that the shutdown of the operation host has timed out can be output; if shutdown error information is received, for example because the API interface of the cloud platform reports an error, error information indicating that the shutdown of the operation host has failed can be output. After the error information is output or the operation host is determined to be shut down, the shutdown operation on the operation host ends.
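The decision part of one gc pass, together with the first two steps of the shutdown operation, can be sketched as follows. The sketch simplifies in ways that are assumptions: `host` is a hypothetical adapter with `last_task_time()`, `ping()` and `stop()`; only the most recent task time is checked; the 1-second delay and the shutdown state polling are omitted; and a host with no recorded power-on attempt is treated as exceeding the threshold.

```python
import threading


def gc_pass(host, lock: threading.Lock, state: dict, now: float,
            idle_threshold_s: float = 900.0,
            attempt_threshold_s: float = 900.0) -> str:
    """Run one gc pass and return a short string describing the outcome."""
    # gc step one: skip hosts that ran or finished a task recently.
    if now - host.last_task_time() <= idle_threshold_s:
        return "skipped: host recently busy"
    # gc step two: only a host whose operation process is online is a candidate.
    if not host.ping():
        return "skipped: operation process offline"
    # gc steps three and four: shutdown runs under the on-off mutual exclusion lock.
    with lock:
        last_attempt = state.get("last_attempt")
        if last_attempt is not None and now - last_attempt <= attempt_threshold_s:
            return "skipped: recent power-on attempt"
        host.stop()  # shutdown request to the cloud platform
        return "shutdown requested"
```

Reading `last_attempt` inside the locked section matters: a power-on that is in progress holds the same lock, so the shutdown path always sees the attempt time that power-on wrote.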

In some embodiments provided herein, the cloud host control method may include: receiving and recording information fed back by the API interface of the cloud platform, such as power-on API error information and shutdown API error information, so as to facilitate updating and maintaining the management host.

For example, the cause of an API error of the cloud platform may be: a network problem between the management host and the cloud platform; insufficient resources on the cloud platform, so that the operation host cannot be started; the account of the designated user served by the management host being in arrears; the API of the cloud platform being under maintenance and unavailable; or the operation host not existing.

With the cloud host control method, the shutdown control method enables automatic shutdown of an operation host that has been idle for a long time, so that the operation host is shut down according to operation demand; the power-on control method enables the management host to start the corresponding operation host according to operation demand, so that operation tasks are processed promptly. That is, in this scheme the management host automatically controls the operation host based on operation demand. Compared with a scheme in which the cloud host used for operation is always on, powering the operation host on and off according to operation demand can reduce useless occupation of cloud platform resources to a certain extent, avoid resource waste and save cost.

In one possible implementation, the operation input data of an operation task and the operation result data output by the operation host may be stored in a separate cloud storage and exchanged through the API interface of that cloud storage. In addition, tasks such as managing the operation tasks processed by the operation host corresponding to the management host, real-time operation state monitoring and log management may be implemented in the server process of the management host using RPC, or may be implemented elsewhere, so as to facilitate system maintenance.

The following describes the cloud host control device provided in the embodiments of the present application, and the cloud host control device described below and the cloud host control method described above may be referred to correspondingly to each other.

The cloud platform may be pre-deployed with a management host and at least one operation host corresponding to the management host. The cloud host control device may be applied to the management host, and may include: a shutdown control module.

Specifically, the shutdown control module may be configured to: after the time interval between the start time of the management host and the current time exceeds the preset maximum allowed task running time, periodically execute the shutdown control method according to the preset shutdown control execution period.

The shutdown control method may include: determining an operation host meeting a preset shutdown condition from all operation hosts corresponding to the management host; executing a shutdown operation on the operation host, wherein the shutdown operation on the operation host comprises: judging whether the time interval between the latest try start time and the current moment of the operation host exceeds a preset try start time interval threshold value or not; the latest try start-up time of the operation host is the latest time for determining that the operation host needs to start operation processing; if not, ending executing the shutdown operation of the operation host; if yes, outputting a shutdown request to the cloud platform to request to shut down the operation host, and ending executing the shutdown operation on the operation host.

In some embodiments provided herein, the cloud host control device may further include: a power-on control module.

Specifically, the power-on control module may be configured to: responding to an operation task execution request, executing a startup control method, wherein the startup control method comprises the following steps:

determining, from all operation hosts corresponding to the management host, an operation host meeting a preset power-on condition, wherein the operation host needs to process the operation task corresponding to the operation task execution request; executing a power-on operation on the operation host; if the operation process of the operation host is online, sending the operation task corresponding to the operation task execution request to the operation host so that the operation process of the operation host processes the operation task; wherein the power-on operation on the operation host and the shutdown operation on the operation host are configured to be executed by a single thread; the power-on operation on the operation host may include: updating the latest attempted power-on time of the operation host to the current time; judging whether the operation process of the operation host is online; if the operation process is online, ending the power-on operation on the operation host; if the operation process is not online, outputting a power-on request to the cloud platform to request that the operation host be started, querying the online/offline state of the operation process of the operation host, and ending the power-on operation on the operation host.

In some embodiments provided in the present application, the management host may be preconfigured with an on-off mutual exclusion lock for each corresponding operation host; the on-off mutual exclusion lock of an operation host can be used to ensure that, at any given time, at most one thread executes either the power-on operation or the shutdown operation on that operation host.

Based on the above, the process of executing the shutdown operation of the operation host may include:

acquiring an on-off mutual exclusion lock of the operation host, and executing shutdown operation on the operation host by using the on-off mutual exclusion lock of the operation host;

after finishing executing the shutdown operation on the operation host, the shutdown control method may further include: releasing the on-off exclusive lock of the operation host.

On the basis of the above, the executing the startup operation of the operation host may include:

acquiring an on-off mutual exclusion lock of the operation host, and executing starting operation on the operation host by using the on-off mutual exclusion lock of the operation host;

after finishing executing the boot operation on the operation host, the boot control method may further include: releasing the on-off exclusive lock of the operation host.

In some embodiments provided herein, before querying the online/offline state of the operation process of the operation host, the power-on operation on the operation host may further include:

judging whether the time interval between the output time of the starting request and the current time exceeds the preset starting timeout time or not;

if the starting time-out time is exceeded, outputting error reporting information used for representing the starting time-out of the operation host, and ending executing the starting operation of the operation host;

if the starting time-out time is not exceeded, judging whether the operation host is started or not;

if the operation host is not started, after waiting for a preset starting polling interval, returning to execute the step of judging whether the time interval between the output time of the starting request and the current time exceeds a preset starting timeout time;

if the operation host is started, continuing with the step of querying the online/offline state of the operation process of the operation host.

In some embodiments provided herein, querying the online/offline state of the operation process of the operation host may include:

judging whether the time interval between the start time of the operation host and the current time exceeds the preset process start timeout time;

if the process start timeout time is exceeded, outputting error information indicating that the start of the operation process of the operation host has timed out, and ending the power-on operation on the operation host;

if the process start timeout time is not exceeded, querying the online/offline state of the operation process of the operation host, and, if the operation process of the operation host is offline, waiting for the preset online polling interval and then returning to the step of judging whether the time interval between the start time of the operation host and the current time exceeds the preset process start timeout time.

In some embodiments provided herein, after outputting the shutdown request to the cloud platform, the shutdown operation on the operation host may further include:

Judging whether the time interval between the output time of the shutdown request and the current time exceeds the preset shutdown timeout time or not;

if the shutdown timeout time is exceeded, outputting error reporting information used for representing shutdown timeout of the operation host, and ending executing shutdown operation of the operation host;

if the shutdown timeout time is not exceeded, judging whether the operation host is shutdown;

if the operation host is not powered off, after waiting for a preset power-off polling interval, returning to execute the step of judging whether the time interval between the output time of the power-off request and the current time exceeds a preset power-off timeout time;

if the operation host is shut down, ending the shutdown operation on the operation host.

In some embodiments provided herein, the shutdown operation on the operation host may further include:

outputting error information indicating that the shutdown of the operation host has failed, and ending the shutdown operation on the operation host, in the case that the time interval between the output time of the shutdown request and the current time does not exceed the shutdown timeout time and a shutdown error signal output by the cloud platform is received.

The cloud host control device provided by the embodiments of the present application can be applied to a management host, wherein the management host is deployed on a cloud platform, and at least one operation host corresponding to the management host is also deployed on the cloud platform. Optionally, Fig. 8 shows a block diagram of the hardware structure of the management host. Referring to Fig. 8, the hardware structure of the management host may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4;

the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, etc.;

the memory 3 may comprise a high-speed RAM, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;

wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:

Optionally, for the refined functions and extended functions of the program, reference may be made to the description above.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, the embodiments may be combined as needed, and for identical or similar parts the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. The cloud host control method is characterized in that a management host and at least one operation host corresponding to the management host are deployed on a cloud platform in advance, the cloud host control method is applied to the management host, and the cloud host control method comprises the following steps:

2. The cloud host control method according to claim 1, wherein the cloud host control method further comprises:

responding to an operation task execution request, executing a startup control method, wherein the startup control method comprises the following steps:

determining an operation host meeting preset starting conditions from all operation hosts corresponding to the management host, wherein the operation host needs to process operation tasks corresponding to the operation task execution requests; executing the starting operation of the operation host; under the condition that the operation process of the operation host is online, sending an operation task corresponding to the operation task execution request to the operation host so as to enable the operation process of the operation host to process the operation task;

Wherein the power-on operation to the operation host and the power-off operation to the operation host are configured to be single-thread execution; the starting operation of the operation host comprises the following steps:

updating the latest attempted power-on time of the operation host to the current time; judging whether the operation process of the operation host is online; if the operation process is online, ending the power-on operation on the operation host; if the operation process is not online, outputting a power-on request to the cloud platform to request that the operation host be started, querying the online/offline state of the operation process of the operation host, and ending the power-on operation on the operation host.

3. The cloud host control method according to claim 2, wherein the management host is preconfigured with an on-off mutual exclusion lock for each corresponding operation host; the on-off mutual exclusion lock of the operation host is used for ensuring that, at any given time, at most one thread executes either the power-on operation or the shutdown operation on the operation host;

the executing the shutdown operation of the operation host comprises the following steps:

After finishing executing the shutdown operation on the operation host, the shutdown control method further comprises the following steps: releasing the on-off exclusive lock of the operation host;

the executing the starting operation of the operation host comprises the following steps:

after finishing executing the starting operation on the operation host, the starting control method further comprises the following steps: releasing the on-off exclusive lock of the operation host.
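The per-host lock of claim 3 can be sketched in Python as follows. This is an illustrative assumption rather than the claimed implementation: `PowerLocks` and `run_exclusive` are hypothetical names, and because the text above only states when the lock is released, the sketch assumes the lock is acquired immediately before the startup or shutdown operation begins.

```python
import threading


class PowerLocks:
    """Hypothetical registry holding one mutex per operation host."""

    def __init__(self, host_names):
        # Locks are created up front, mirroring "preconfigured" in the claim.
        self._locks = {name: threading.Lock() for name in host_names}

    def run_exclusive(self, host_name, operation):
        """Run a startup or shutdown operation while holding the host's lock."""
        lock = self._locks[host_name]
        # Assumed acquisition point: just before the operation starts.
        with lock:
            return operation()
        # Leaving the `with` block releases the lock, even if the operation raises.
```

Because startup and shutdown operations for the same host go through the same lock, they cannot overlap, while operations on different hosts can still proceed independently.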

4. The cloud host control method of claim 3, wherein, before querying the offline state of the operation process of the operation host, the startup operation on the operation host further comprises:

5. The cloud host control method of claim 4, wherein said querying the offline state of the operation process of the operation host comprises:

6. The cloud host control method of claim 4, wherein the startup operation on the operation host further comprises:

7. The cloud host control method according to any one of claims 1 to 6, wherein, after the shutdown request is output to the cloud platform, the shutdown operation on the operation host further comprises:

8. The cloud host control method of claim 7, wherein the shutdown operation for the operation host further comprises:

9. A cloud host control device, characterized in that a management host and at least one operation host corresponding to the management host are deployed on a cloud platform in advance, the cloud host control device is applied to the management host, and the cloud host control device comprises: a shutdown control module;

10. A management host, characterized in that the management host is deployed on a cloud platform on which at least one operation host corresponding to the management host is also deployed, and the management host comprises a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program to implement the respective steps of the cloud host control method according to any one of claims 1 to 8.