
CN115904642A - Cloud server control method and device, storage medium and electronic equipment

Info

Publication number: CN115904642A
Application number: CN202110957018.0A
Authority: CN
Inventors: 张瑞; 皮振伟
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2021-08-19
Filing date: 2021-08-19
Publication date: 2023-04-04
Also published as: WO2023020141A1

Abstract

The disclosure relates to a cloud service control method and device, a storage medium and an electronic device, which aim to handle memory errors in a targeted manner and improve the efficiency of memory error handling. The method comprises: when a memory error is monitored, determining the memory address where the memory error occurs, and determining an error type of the memory error according to the memory address, wherein the error type is used for identifying whether the memory error will cause the virtual machine to crash; determining the total number of monitored memory errors; and controlling a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors, wherein the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used for controlling the virtual machine to be restarted on the original host machine to which the virtual machine belongs, and the second scheduling service is used for controlling the virtual machine to be migrated from the original host machine to another host machine.

Description

Cloud server control method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of cloud computing technologies, and in particular, to a cloud service control method and apparatus, a storage medium, and an electronic device.

Background

With the rise of cloud computing services, a large number of enterprise and personal services are deployed on cloud servers, and effective management of cloud servers is extremely important. As a common hardware failure, memory errors (machine check exceptions, MCE) affect the normal operation of a virtual machine to different degrees: some memory errors may stop the virtual machine monitor (hypervisor) from running, and some memory errors may cause the virtual machine to restart.

In the related art, after a memory error is monitored, operation and maintenance personnel must log in to the machine to investigate the specific cause and handle it manually, or simply restart the virtual machine. Manual handling consumes considerable manpower and time, so memory errors are handled inefficiently. Directly restarting the virtual machine does not handle the memory error in a targeted manner, so the normal operation of the cloud computing service is affected.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a cloud service control method, including:

when a memory error is monitored, determining a memory address where the memory error occurs, and determining an error type of the memory error according to the memory address, wherein the error type is used for identifying whether the memory error can cause the virtual machine to crash;

determining a total number of errors of the monitored memory errors;

and controlling a cloud scheduler to schedule corresponding scheduling services from a scheduling service library according to the error type of the memory error and the total error times, wherein the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used for controlling the virtual machine to be restarted on an original host machine to which the virtual machine belongs, and the second scheduling service is used for controlling the virtual machine to be migrated from the original host machine to another host machine.

In a second aspect, the present disclosure provides a cloud service control apparatus, the apparatus including:

the first determining module is used for determining a memory address where the memory error occurs when the memory error is monitored, and determining an error type of the memory error according to the memory address, wherein the error type is used for identifying whether the memory error can cause the virtual machine to crash;

the second determining module is used for determining the total error times of the monitored memory errors;

and a third determining module, configured to control a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total error frequency, where the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used to control restarting of a virtual machine on an original host to which the virtual machine belongs, and the second scheduling service is used to control migrating the virtual machine from the original host to another host.

In a third aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.

In a fourth aspect, the present disclosure provides an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.

Through the above technical solution, when a memory error is monitored, the error type of the memory error and the total number of monitored memory errors can be determined, and the cloud scheduler is then controlled to schedule the corresponding scheduling service from the scheduling service library according to the error type and the total number of errors. The scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used for controlling the virtual machine to be restarted on the original host machine to which the virtual machine belongs, and the second scheduling service is used for controlling the virtual machine to be migrated from the original host machine to another host machine. Compared with directly restarting the virtual machine as in the related art, memory errors can be handled in a targeted manner, which reduces the influence of memory errors on the cloud computing service. In addition, because the service is scheduled according to the error type and the total number of errors, memory errors can be handled automatically, which reduces the manpower and time consumed in handling memory errors and improves handling efficiency.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:

fig. 1 is a flowchart illustrating a cloud service control method according to an exemplary embodiment of the present disclosure;

fig. 2 is a flowchart illustrating a cloud service control method according to another exemplary embodiment of the present disclosure;

fig. 3 is a block diagram illustrating a cloud service control apparatus according to an exemplary embodiment of the present disclosure;

fig. 4 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units. It is further noted that references to "a", "an", and "the" modifications in the present disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

As described in the Background, in the related art, after a memory error is monitored, operation and maintenance personnel must log in to the machine to investigate the specific cause and handle it manually, or simply restart the virtual machine. Manual handling consumes considerable manpower and time, so memory errors are handled inefficiently. Directly restarting the virtual machine does not handle the memory error in a targeted manner, so the normal operation of the cloud computing service is affected.

In view of this, the present disclosure provides a cloud service control method that handles a memory error in a targeted manner by automatically scheduling a service according to the error type of the memory error and the total number of errors, so as to reduce the manpower and time consumed in handling memory errors and improve handling efficiency.

Fig. 1 is a flowchart illustrating a cloud service control method according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the method includes:

Step 101, when a memory error is monitored, determining a memory address where the memory error occurs, and determining an error type of the memory error according to the memory address, wherein the error type is used to identify whether the memory error will cause the virtual machine to crash.

Step 102, determining a total error number of the monitored memory errors.

Step 103, controlling the cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors, wherein a first scheduling service and a second scheduling service are stored in the scheduling service library, the first scheduling service is used for controlling the virtual machine to be restarted on the original host machine to which the virtual machine belongs, and the second scheduling service is used for controlling the virtual machine to be migrated from the original host machine to another host machine.

Compared with directly restarting the virtual machine as in the related art, this method handles memory errors in a targeted manner, which reduces the influence of memory errors on the cloud computing service. In addition, because the service is scheduled according to the error type and the total number of errors, memory errors can be handled automatically, which reduces the manpower and time consumed in handling memory errors and improves handling efficiency.

To help those skilled in the art better understand the cloud service control method provided by the present disclosure, the above steps are described in detail below by way of example.

It should be understood that the cloud service control method provided by the present disclosure may be applied to a cloud computing scenario. For example, steps 101 to 103 may be performed by a server running a cloud computing service. More specifically, a virtual machine monitor and a cloud scheduler may run on the server, wherein the virtual machine monitor is an intermediate software layer between the server and the virtual machine and allows a plurality of virtual machines to share the server. In this scenario, the server may execute step 101 through the virtual machine monitor to determine the error type of the memory error, report the detailed information and the error type of the memory error to the cloud scheduler through the virtual machine monitor, and execute step 102 and step 103 through the cloud scheduler.

In a possible manner, before step 101, if a memory error occurs, a memory error notification may be sent to the virtual machine monitor through an operating system kernel of the cloud server, and then when the virtual machine monitor receives the memory error notification, it is determined that the memory error is monitored. Thereafter, steps 101 to 103 may be performed.

That is, when a memory error occurs, the operating system kernel of the server notifies the virtual machine monitor, and when the virtual machine monitor receives the notification of the kernel, it is determined that the memory error is monitored.
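The notification chain described above (operating system kernel, then virtual machine monitor, then cloud scheduler) can be sketched as follows. This is only an illustrative sketch: the class names `Kernel`, `Hypervisor`, `CloudScheduler` and `MemoryErrorReport`, the string labels for the error type, and the per-host counting are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryErrorReport:
    host: str        # hypothetical identifier of the host machine (server)
    address: int     # memory address where the memory error occurred
    error_type: str  # "crash" or "non_crash" (labels assumed for illustration)


@dataclass
class CloudScheduler:
    reports: List[MemoryErrorReport] = field(default_factory=list)

    def receive(self, report: MemoryErrorReport) -> int:
        """Record a report and return the total number of errors for its host."""
        self.reports.append(report)
        return sum(1 for r in self.reports if r.host == report.host)


@dataclass
class Hypervisor:
    host: str
    scheduler: CloudScheduler
    classify: Callable[[int], str]  # stand-in for the error type determination

    def on_kernel_notification(self, address: int) -> int:
        """Receiving the kernel notification means the memory error is monitored."""
        report = MemoryErrorReport(self.host, address, self.classify(address))
        return self.scheduler.receive(report)


class Kernel:
    """Stand-in for the operating system kernel of the cloud server."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[int], int]] = []

    def subscribe(self, listener: Callable[[int], int]) -> None:
        self._listeners.append(listener)

    def raise_memory_error(self, address: int) -> List[int]:
        # Notify every subscribed virtual machine monitor.
        return [listener(address) for listener in self._listeners]
```

In this sketch the virtual machine monitor only classifies and reports, while counting (and, in a fuller version, scheduling) is left to the cloud scheduler, mirroring the division of work between step 101 and steps 102 to 103.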

It should be understood that in the cloud computing scenario, the memory of the server is divided into two parts, one part corresponds to the virtual machine, and the other part corresponds to the virtual machine monitor. Therefore, after the memory error is monitored, the memory address where the memory error occurs can be further judged, so that the error type can be determined according to the memory address.
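As a minimal sketch of this address check, the function below decides which part of the server memory an erroneous address belongs to. The half-open range representation, the `Owner` enum and the function name `owner_of` are assumptions made for illustration; the disclosure does not specify how the ranges are stored.

```python
from enum import Enum
from typing import List, Tuple


class Owner(Enum):
    VIRTUAL_MACHINE = "virtual_machine"
    HYPERVISOR = "hypervisor"


# A half-open address range [start, end); this representation is an assumption.
Range = Tuple[int, int]


def owner_of(address: int, vm_ranges: List[Range]) -> Owner:
    """Return which part of the server memory the address belongs to.

    Following the two-part division described above, any address that is
    not inside a virtual machine range is attributed to the virtual
    machine monitor.
    """
    for start, end in vm_ranges:
        if start <= address < end:
            return Owner.VIRTUAL_MACHINE
    return Owner.HYPERVISOR
```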

For example, in a cloud computing service, some memory errors may cause the virtual machine monitor to stop running while the virtual machine can continue to run, and some memory errors may cause the virtual machine process to crash, so that the virtual machine needs to be restarted. Thus, the error types may include a crash error, which is a memory error that causes the virtual machine to crash, and a non-crash error, which is a memory error under which the virtual machine can continue to run.

In a possible manner, the error type of the memory error may be determined as follows: when the memory address belongs to the memory address range corresponding to the virtual machine monitor, it is determined whether the memory error can be repaired by the memory erasure code. If the memory error can be repaired by the memory erasure code, the error type is determined to be a non-crash error, which does not cause the virtual machine to crash; if the memory error cannot be repaired by the memory erasure code, the error type is determined to be a crash error, which causes the virtual machine to crash.

For example, the memory erasure code can be used to repair a memory error caused by an abnormality in a single bit in the memory. If the memory error can be repaired by the memory erasure code, the memory error was caused by a single abnormal bit and will not crash the virtual machine process, so the memory error can be determined to be a non-crash error. Conversely, if the memory error cannot be repaired by the memory erasure code, the memory error may have been caused by abnormalities in a plurality of bits in the memory and may eventually cause the virtual machine process to crash, so the memory error can be determined to be a crash error.

In another possible manner, when the memory address belongs to the memory address range corresponding to the virtual machine, it may be determined whether the memory error can be injected into the virtual machine. If the memory error is successfully injected into the virtual machine, the error type is determined to be a non-crash error, which does not cause the virtual machine to crash; if the memory error is not successfully injected into the virtual machine, the error type is determined to be a crash error, which causes the virtual machine to crash.

For example, if the memory address where the memory error occurred belongs to the virtual machine, the memory error may have been caused by an application running on the virtual machine, so an attempt may be made to inject the memory error into the virtual machine in order to further determine whether the memory error will cause the virtual machine process to crash. If the injection succeeds, the virtual machine can repair the memory error by itself, so the memory error can be determined to be a non-crash error. Otherwise, if the injection fails, the virtual machine cannot repair the memory error by itself, so the memory error can be determined to be a crash error.
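The two ways of determining the error type can be combined in a short sketch. It assumes, as in the flow of fig. 2, that the repair check applies to addresses in the virtual machine monitor range and that injection applies to addresses in the virtual machine range. The names `ErrorType` and `classify_memory_error`, and the callback `inject_into_vm`, are hypothetical; how repairability is detected and how injection is performed are platform specific and are not modelled here.

```python
from enum import Enum
from typing import Callable


class ErrorType(Enum):
    NON_CRASH = "non_crash"  # the virtual machine can continue to run
    CRASH = "crash"          # the virtual machine process will crash


def classify_memory_error(
    in_vm_range: bool,
    repairable_by_code: bool,
    inject_into_vm: Callable[[], bool],
) -> ErrorType:
    """Determine the error type of a monitored memory error.

    in_vm_range:        True if the address belongs to the virtual machine.
    repairable_by_code: True if the memory erasure code repaired the error
                        (only consulted for virtual machine monitor memory).
    inject_into_vm:     hypothetical callback that attempts the injection
                        and returns True on success.
    """
    if in_vm_range:
        # Injection is only attempted for errors in virtual machine memory.
        survives = inject_into_vm()
    else:
        survives = repairable_by_code
    return ErrorType.NON_CRASH if survives else ErrorType.CRASH
```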

After the error type of the memory error is determined, the total number of memory errors monitored by the virtual machine monitor can be determined, so that the cloud scheduler is controlled to schedule the corresponding scheduling service from the scheduling service library according to both the error type of the memory error and the total number of errors.

In a possible manner, if the error type of the memory error identifies that the memory error does not cause the virtual machine to crash, the cloud scheduler may be controlled to schedule the second scheduling service from the scheduling service library when the total number of errors reaches a preset threshold. The second scheduling service is used for controlling the virtual machine to be migrated from the original host to another host.

For example, the preset threshold may be set according to actual conditions, which is not limited by the embodiments of the present disclosure. The second scheduling service is used for controlling the virtual machine to be migrated from the original host to another host; for example, the virtual machine is migrated by live migration (hot migration) or cold migration depending on whether the virtual machine has crashed. Live migration saves the complete running state of the virtual machine while the virtual machine is running and quickly restores that state on the original hardware platform or on a different hardware platform, so that the virtual machine still runs smoothly after restoration and the user perceives no difference. Cold migration migrates the virtual machine to another host machine while the virtual machine is shut down.

In the embodiment of the present disclosure, if the error type of the memory error identifies that the memory error does not cause the virtual machine to crash, that is, the memory error is a non-crash error, the virtual machine may continue to run. It may then be determined whether the total number of errors reaches the preset threshold. If the total number of errors does not reach the preset threshold, recording of the total number of errors continues. If the total number of errors reaches the preset threshold, this indicates that the host machine to which the virtual machine belongs is in an unhealthy state and is not suitable for normal operation of the virtual machine, so the second scheduling service can be scheduled through the cloud scheduler, that is, the virtual machine is migrated to another healthy host machine. In this case, since the virtual machine has not crashed, it can be live migrated to another healthy host machine while it is running. The host to which the virtual machine belongs may be understood as the above-mentioned server that executes the method of the present disclosure, and the other host may be understood as another server, different from that server, that can ensure the normal operation of the virtual machine. In this way, the normal operation of the virtual machine, and therefore of the cloud computing service, can be ensured.
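The handling of a non-crash error can be illustrated with a small per-host counter. This is a sketch under assumptions: "reaches the preset threshold" is read as "greater than or equal to", the count is kept per host, and the names `HostErrorCounter` and `handle_non_crash_error` and the returned action strings are hypothetical.

```python
from collections import Counter


class HostErrorCounter:
    """Records the total number of memory errors monitored on each host."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # preset threshold, set according to actual conditions
        self._totals: Counter = Counter()

    def record(self, host: str) -> int:
        """Count one more memory error for the host and return the new total."""
        self._totals[host] += 1
        return self._totals[host]

    def is_unhealthy(self, host: str) -> bool:
        # Assumption: "reaches" the threshold means total >= threshold.
        return self._totals[host] >= self.threshold


def handle_non_crash_error(counter: HostErrorCounter, host: str) -> str:
    """Return the action for a non-crash error on the given host."""
    counter.record(host)
    if counter.is_unhealthy(host):
        # The virtual machine is still running, so it can be live migrated.
        return "live_migrate"
    return "keep_recording"
```

With a threshold of 3, the first two non-crash errors on a host only increase the count, and the third triggers live migration.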

In other possible manners, if the error type of the memory error identifies that the memory error will cause the virtual machine to crash, the cloud scheduler may be controlled to schedule the first scheduling service from the scheduling service library when the total number of errors does not reach the preset threshold, or the cloud scheduler may be controlled to schedule the second scheduling service from the scheduling service library when the total number of errors reaches the preset threshold.

For example, if the memory error is a crash error, the memory error will crash the virtual machine process. In this case, if the total number of errors does not reach the preset threshold, the host to which the virtual machine belongs is still in a healthy state and can ensure normal operation of the virtual machine, so the cloud scheduler can be controlled to schedule the first scheduling service from the scheduling service library, that is, the virtual machine is restarted on the original host to which the virtual machine belongs. If the total number of errors reaches the preset threshold, the host machine to which the virtual machine belongs is in an unhealthy state and cannot guarantee normal operation of the virtual machine, so the cloud scheduler can be controlled to schedule the second scheduling service from the scheduling service library, that is, the virtual machine is migrated to another host machine. In this case, since the virtual machine process has crashed, cold migration can be adopted.
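The selection rules for both error types can be summarized as a lookup table, sketched below. The labels "first" and "second" stand for the first and second scheduling services; the `Decision` tuple, the table `DECISIONS` and the function `select_service` are hypothetical names, and the threshold comparison is assumed to be "greater than or equal to".

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Decision(NamedTuple):
    service: Optional[str]    # "first" (restart), "second" (migrate) or None
    migration: Optional[str]  # "live", "cold" or None when no migration occurs


# Keys are (is_crash_error, threshold_reached).
DECISIONS: Dict[Tuple[bool, bool], Decision] = {
    (False, False): Decision(None, None),        # keep recording the total
    (False, True): Decision("second", "live"),   # VM still running
    (True, False): Decision("first", None),      # restart on the original host
    (True, True): Decision("second", "cold"),    # VM process has crashed
}


def select_service(is_crash_error: bool, total_errors: int, threshold: int) -> Decision:
    """Select the scheduling service from the error type and the total number of errors."""
    # Assumption: the threshold is "reached" when total_errors >= threshold.
    return DECISIONS[(is_crash_error, total_errors >= threshold)]
```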

In a possible manner, controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library may include: determining whether the memory address where the memory error occurred belongs to a large-page memory address; if the memory address does not belong to a large-page memory address, controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library; or, if the memory address belongs to a large-page memory address, allocating large-page memory for the original host to which the virtual machine belongs, and controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library after the large-page memory is allocated.

It should be appreciated that if a memory error occurs in the large-page memory, the capacity of the large-page memory is reduced. In this case, if the first scheduling service is scheduled directly and the virtual machine is restarted on the original host, a memory shortage may result. Therefore, in the embodiment of the present disclosure, it is first determined whether the memory address where the memory error occurred belongs to a large-page memory address.

If the memory address does not belong to a large-page memory address, the first scheduling service can be scheduled and the virtual machine restarted on the original host machine to which the virtual machine belongs. If the memory address belongs to a large-page memory address, large-page memory can first be allocated for the original host machine to which the virtual machine belongs, so that the large-page memory capacity of the original host machine is restored to its capacity before the memory error occurred. After the large-page memory is allocated, the first scheduling service may be scheduled to restart the virtual machine on the original host. In this way, a memory shortage during the restart of the virtual machine is avoided, and the normal operation of the restarted virtual machine is ensured.
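The ordering constraint described above (allocate large-page memory first, then restart) can be sketched as follows. The callbacks `allocate_large_pages` and `restart_vm` are hypothetical placeholders for host-specific operations that the disclosure does not detail, and the half-open range representation is an assumption.

```python
from typing import Callable, List, Tuple

# A half-open address range [start, end); this representation is an assumption.
Range = Tuple[int, int]


def restart_on_original_host(
    address: int,
    large_page_ranges: List[Range],
    allocate_large_pages: Callable[[], None],
    restart_vm: Callable[[], None],
) -> List[str]:
    """Restart the virtual machine, replenishing large-page memory first if needed.

    Returns the names of the actions performed, in order.
    """
    actions: List[str] = []
    in_large_page = any(start <= address < end for start, end in large_page_ranges)
    if in_large_page:
        # The faulty page reduced large-page capacity; restore it before the restart.
        allocate_large_pages()
        actions.append("allocate_large_pages")
    restart_vm()
    actions.append("restart_vm")
    return actions
```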

The cloud service control method provided by the present disclosure is explained below by another exemplary embodiment. Referring to fig. 2, the cloud service control method includes:

Step 201, determining that the virtual machine monitor has monitored a memory error.

Step 202, in a case that the memory address belongs to the memory address range corresponding to the virtual machine monitor, determining whether the memory error can be repaired by the memory erasure code; if so, executing step 203, otherwise executing step 204.

Step 203, determining that the memory error is a non-crash error, and proceeding to step 206.

Step 204, determining that the memory error is a crash error, and proceeding to step 209.

Step 205, in a case that the memory address belongs to the memory address range corresponding to the virtual machine, determining whether the memory error can be injected into the virtual machine; if so, executing step 203, otherwise executing step 204.

Step 206, determining whether the total number of memory errors reaches a preset threshold; if so, executing step 207, otherwise executing step 208.

Step 207, scheduling the second scheduling service through the cloud scheduler, and live migrating the virtual machine to another host machine.

Step 208, continuing to record the total number of memory errors.

Step 209, determining whether the total number of memory errors reaches the preset threshold; if so, executing step 211, otherwise executing step 210.

Step 210, determining whether the memory address where the memory error occurred belongs to a large-page memory address; if so, executing step 212, otherwise executing step 213.

Step 211, scheduling the second scheduling service through the cloud scheduler, and cold migrating the virtual machine to another host machine.

Step 212, allocating large-page memory for the original host to which the virtual machine belongs, scheduling the first scheduling service after the large-page memory is allocated, and restarting the virtual machine on the original host machine to which the virtual machine belongs.

Step 213, scheduling the first scheduling service, and restarting the virtual machine on the original host machine to which the virtual machine belongs.
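The overall flow can be sketched end to end as a single planning function. The sketch follows the branch logic given in the detailed description (for a crash error, restart on the original host while the threshold has not been reached, and cold migration once it has). The `MemoryErrorEvent` fields, the function name `plan_actions` and the action strings are hypothetical, and the threshold comparison is assumed to be "greater than or equal to".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryErrorEvent:
    in_vm_range: bool         # address belongs to the virtual machine range
    repairable_by_code: bool  # memory erasure code repaired the error
    injectable: bool          # error was successfully injected into the VM
    in_large_page: bool       # address belongs to large-page memory
    total_errors: int         # total number of errors, including this one


def plan_actions(event: MemoryErrorEvent, threshold: int) -> List[str]:
    """Return the ordered actions for one monitored memory error."""
    # Determine the error type from the address range and the matching check.
    if event.in_vm_range:
        non_crash = event.injectable
    else:
        non_crash = event.repairable_by_code

    reached = event.total_errors >= threshold  # assumption: "reaches" means >=

    if non_crash:
        # The virtual machine keeps running; migrate only if the host is unhealthy.
        return ["live_migrate"] if reached else ["keep_recording"]
    if reached:
        # Crash error on an unhealthy host: the VM is down, so migrate cold.
        return ["cold_migrate"]
    # Crash error on a healthy host: restart in place, replenishing
    # large-page memory first when the faulty address was in it.
    if event.in_large_page:
        return ["allocate_large_pages", "restart_vm"]
    return ["restart_vm"]
```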

The specific embodiments of the above steps have been exemplified in detail above, and are not described again here. It will also be appreciated that for simplicity of explanation, the above-described method embodiments are presented as a series of interrelated acts, although those skilled in the art will appreciate that the present disclosure is not limited by the order of acts described above. Further, those skilled in the art will also appreciate that the embodiments described above are preferred embodiments and that the steps involved are not necessarily required for the present disclosure.

It should be appreciated that in the management of large-scale cloud servers, manually handling memory errors is inefficient and untimely, and directly restarting a virtual machine cannot effectively prevent memory errors. Therefore, the present disclosure automatically schedules the scheduling service corresponding to each memory error type, so that memory errors are handled promptly and appropriately. The number of memory errors occurring on the host is also recorded in order to monitor the hardware health of the host, so that the virtual machine can be migrated as early as possible before the service is affected.

Based on the same concept, the present disclosure also provides a cloud service control apparatus, which may be implemented as part or all of an electronic device through software, hardware, or a combination of both. Referring to fig. 3, the cloud service control apparatus 300 may include:

a first determining module 301, configured to determine, when a memory error is monitored, a memory address where the memory error occurs, and determine an error type of the memory error according to the memory address, where the error type is used to identify whether the memory error may cause a virtual machine to crash;

a second determining module 302, configured to determine a total number of errors of the monitored memory errors;

a third determining module 303, configured to control a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total error frequency, where the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used to control restarting of a virtual machine on an original host to which the virtual machine belongs, and the second scheduling service is used to control migrating the virtual machine from the original host to another host.

Optionally, the third determining module 303 is configured to:

when the error type of the memory error identifies that the memory error will not cause the virtual machine to crash, controlling the cloud scheduler to schedule the second scheduling service from the scheduling service library when the total number of errors reaches a preset threshold.

Optionally, the third determining module 303 is configured to:

when the error type of the memory error identifies that the memory error will cause the virtual machine to crash, controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library when the total number of errors does not reach the preset threshold, or controlling the cloud scheduler to schedule the second scheduling service from the scheduling service library when the total number of errors reaches the preset threshold.

Optionally, the third determining module 303 is configured to:

determining whether the memory address with the memory error belongs to a large-page memory address;

and when the memory address does not belong to a large-page memory address, controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library, or when the memory address belongs to the large-page memory address, allocating a large-page memory to an original host to which the virtual machine belongs, and after the large-page memory is allocated, controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library.

Optionally, the first determining module 301 is configured to:

when the memory address belongs to the memory address range corresponding to the virtual machine monitor, determine whether the memory error can be repaired by a memory erasure code;

when the memory error can be repaired by the memory erasure code, determine that the error type of the memory error is a non-crash error, which will not cause the virtual machine to crash; and when the memory error cannot be repaired by the memory erasure code, determine that the error type of the memory error is a crash error, which will cause the virtual machine to crash.

Optionally, the first determining module 301 is configured to:

when the memory address belongs to the memory address range corresponding to the virtual machine, determine whether the memory error can be injected into the virtual machine;

when the memory error is not successfully injected into the virtual machine, determine that the error type of the memory error is a crash error, which will cause the virtual machine to crash.
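The two classification paths of the first determining module can be combined in one sketch. It treats the repairability check and the injection result as boolean inputs, because the text does not specify how either is obtained. The non-crash outcome for a successful injection follows Example 6 of this disclosure; raising `ValueError` for an address outside both ranges is an assumption, and all names are hypothetical.

```python
from enum import Enum
from typing import Tuple

# Hypothetical representation: a half-open address range [start, end).
AddressRange = Tuple[int, int]


class ErrorType(Enum):
    NON_CRASH = "non_crash"
    CRASH = "crash"


def classify_memory_error(
    address: int,
    monitor_range: AddressRange,
    vm_range: AddressRange,
    repairable: bool,
    injected: bool,
) -> ErrorType:
    """Classify a memory error by the address range it falls in."""
    if monitor_range[0] <= address < monitor_range[1]:
        # Virtual machine monitor memory: repairable errors are non-crash.
        return ErrorType.NON_CRASH if repairable else ErrorType.CRASH
    if vm_range[0] <= address < vm_range[1]:
        # Virtual machine memory: a failed injection is a crash error.
        return ErrorType.NON_CRASH if injected else ErrorType.CRASH
    # Assumption: other addresses are outside the scope of this sketch.
    raise ValueError(f"address {address:#x} is outside both ranges")
```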

Optionally, the apparatus 300 further comprises:

a sending module, configured to send a memory error notification to the virtual machine monitor through an operating system kernel of the cloud server;

a fourth determining module, configured to determine that a memory error has been monitored when the virtual machine monitor receives the memory error notification.
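The notification path can be modeled in-process as a queue between the kernel and the virtual machine monitor. This is only a toy model under stated assumptions: a `queue.Queue` stands in for whatever channel the kernel actually uses, and `MemoryErrorNotification`, `VirtualMachineMonitor`, and `kernel_report` are hypothetical names. Counting each received notification toward the total number of errors is also an assumption about where the count is kept.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryErrorNotification:
    address: int  # memory address at which the error occurred


class VirtualMachineMonitor:
    def __init__(self, inbox: "queue.Queue[MemoryErrorNotification]") -> None:
        self.inbox = inbox
        self.total_errors = 0
        self.addresses: List[int] = []

    def poll(self) -> bool:
        """Return True if a notification arrived, i.e. an error was monitored."""
        try:
            note = self.inbox.get_nowait()
        except queue.Empty:
            return False
        self.total_errors += 1
        self.addresses.append(note.address)
        return True


def kernel_report(inbox: "queue.Queue[MemoryErrorNotification]", address: int) -> None:
    """Stand-in for the operating system kernel sending a notification."""
    inbox.put(MemoryErrorNotification(address))
```

The monitor treats an error as monitored only once the notification has been received, which matches the division of work between the sending module and the fourth determining module.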

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Based on the same concept, the present disclosure also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processing apparatus, implements the steps of any of the cloud service control methods described above.

Based on the same concept, the present disclosure also provides an electronic device, comprising:

a storage device having a computer program stored thereon;

a processing device, configured to execute the computer program in the storage device to implement the steps of any of the cloud service control methods described above.

Referring now to FIG. 4, a block diagram of an electronic device 400 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a vehicle terminal (e.g., a car navigation terminal), and a stationary terminal such as a digital TV or a desktop computer. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

Generally, the following devices may be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate with other devices, wirelessly or by wire, to exchange data. While FIG. 4 illustrates an electronic device 400 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing device 401, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, communication may be performed using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and devices may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: when a memory error is monitored, determine the memory address at which the memory error occurred, and determine an error type of the memory error according to the memory address, wherein the error type identifies whether the memory error will cause a virtual machine to crash; determine the total number of monitored memory errors; and control a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors, wherein the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used to restart the virtual machine on the original host to which the virtual machine belongs, and the second scheduling service is used to migrate the virtual machine from the original host to another host.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

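Example 1 provides a cloud service control method according to one or more embodiments of the present disclosure, including:

when a memory error is monitored, determining the memory address at which the memory error occurred, and determining an error type of the memory error according to the memory address, wherein the error type identifies whether the memory error will cause a virtual machine to crash;

determining the total number of monitored memory errors;

controlling a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors, wherein the scheduling service library stores a first scheduling service and a second scheduling service, the first scheduling service is used to restart the virtual machine on the original host to which the virtual machine belongs, and the second scheduling service is used to migrate the virtual machine from the original host to another host.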

Example 2 provides the method of example 1, wherein controlling the cloud scheduler to schedule the corresponding scheduling service from the scheduling service library according to the error type of the memory error and the total number of errors includes:

if the error type of the memory error indicates that the memory error will not cause the virtual machine to crash, controlling the cloud scheduler to schedule the second scheduling service from the scheduling service library when the total number of errors reaches a preset threshold.

Example 3 provides the method of example 1, wherein controlling the cloud scheduler to schedule the corresponding scheduling service from the scheduling service library according to the error type of the memory error and the total number of errors includes:

if the error type of the memory error indicates that the memory error will cause the virtual machine to crash, controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library when the total number of errors has not reached a preset threshold, or controlling the cloud scheduler to schedule the second scheduling service from the scheduling service library when the total number of errors has reached the preset threshold.

Example 4 provides the method of example 3, wherein controlling the cloud scheduler to schedule the first scheduling service from the scheduling service library includes:

if the memory address does not belong to the large-page memory address, controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library; or if the memory address belongs to a large-page memory address, allocating a large-page memory to the original host to which the virtual machine belongs, and controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library after the large-page memory is allocated.

Example 5 provides the method of any one of examples 1-4, wherein determining the error type of the memory error from the memory address includes:

if the memory error can be repaired by the memory erasure code, determining that the error type of the memory error is a non-crash error which does not cause the virtual machine to crash, and if the memory error cannot be repaired by the memory erasure code, determining that the error type of the memory error is a crash error which causes the virtual machine to crash.

Example 6 provides the method of any one of examples 1-4, wherein determining the error type of the memory error from the memory address includes:

determining whether the memory error can be injected into the virtual machine under the condition that the memory address belongs to the memory address range corresponding to the virtual machine;

if the memory error is successfully injected into the virtual machine, determining that the error type of the memory error is a non-crash error which cannot cause the virtual machine to crash, and if the memory error is not successfully injected into the virtual machine, determining that the error type of the memory error is a crash error which can cause the virtual machine to crash.

Example 7 provides the method of any one of examples 1-4, further comprising, in accordance with one or more embodiments of the present disclosure:

sending a memory error notification to a virtual machine monitor through an operating system kernel of a cloud server;

when the virtual machine monitor receives the memory error notification, determining that a memory error is monitored.

Example 8 provides, in accordance with one or more embodiments of the present disclosure, a cloud service control apparatus, the apparatus comprising:

Example 9 provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-7, in accordance with one or more embodiments of the present disclosure.

Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-7.

The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. One example is a technical solution formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A cloud service control method, the method comprising:

determining a total number of errors of the monitored memory errors;

2. The method according to claim 1, wherein the controlling a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors comprises:

3. The method according to claim 1, wherein the controlling a cloud scheduler to schedule a corresponding scheduling service from a scheduling service library according to the error type of the memory error and the total number of errors comprises:

4. The method of claim 3, wherein said controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library comprises:

if the memory address does not belong to the large-page memory address, controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library; or

if the memory address belongs to a large-page memory address, allocating the large-page memory to the original host machine to which the virtual machine belongs, and controlling the cloud scheduler to schedule the first scheduling service from a scheduling service library after the large-page memory is allocated.

5. The method according to any of claims 1-4, wherein determining the error type of the memory error based on the memory address comprises:

6. The method according to any of claims 1-4, wherein said determining an error type of said memory error based on said memory address comprises:

7. The method according to any one of claims 1-4, further comprising:

8. A cloud service control apparatus, characterized in that the apparatus comprises:

9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by processing means, carries out the steps of the method according to any one of claims 1 to 7.

10. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.