CN111736948A - Cloud computing platform automation operation and maintenance system and method, terminal device and storage medium - Google Patents

Cloud computing platform automation operation and maintenance system and method, terminal device and storage medium

Publication number
CN111736948A
Authority
CN
China
Prior art keywords
host
memory
cpu
maintenance
nova
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010430955.6A
Other languages
Chinese (zh)
Other versions
CN111736948B (en)
Inventor
王洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inesa R&d Center
Original Assignee
Inesa R&d Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inesa R&d Center
Priority to CN202010430955.6A
Publication of CN111736948A
Application granted
Publication of CN111736948B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to an automated operation and maintenance system and method for a cloud computing platform, a terminal device, and a storage medium. The system comprises an operation and maintenance control subsystem, a nova-computer service module, and a nova-api service module. The operation and maintenance control subsystem selectively kills the processes identified under different conditions and sends a corresponding request to the nova-api service module when some virtual machines need to be live-migrated automatically. The nova-computer service module is deployed on a host of the cloud computing platform; it marks and scores all processes on the host, configures resource overcommit for the host, and executes requests from the nova-api service module. The nova-api service module receives requests from the operation and maintenance control subsystem and notifies the nova-computer service module to execute them. Compared with the prior art, the invention has advantages such as improving the utilization of cloud resources and keeping the system running stably.

Description

Cloud computing platform automation operation and maintenance system and method, terminal device and storage medium
Technical Field
The invention relates to the technical field of cloud computing, in particular to an automatic operation and maintenance system and method for a cloud computing platform, terminal equipment and a storage medium.
Background
In a cloud computing platform, a large number of user virtual machines run on a cluster of hosts. Because most virtual machines have low resource occupancy, the platform often overcommits the CPU and memory resources of a host to save resources; overcommit means scheduling more virtual machine resources than the host physically has. Under overcommit, however, user virtual machines intermittently hit resource consumption peaks and occupy large amounts of CPU and memory. These peaks cause contention for host resources, which often leads to memory errors or CPU performance degradation on the host and disrupts the normal operation of user virtual machines.
The resource occupancy of user virtual machines is often uncontrollable. If the CPU and memory resources of the host are not overcommitted, a large amount of resources sits idle. If they are overcommitted, the host intermittently suffers resource contention whenever too many resources are requested, which degrades the user experience.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an automatic operation and maintenance system, method, terminal equipment and storage medium for a cloud computing platform.
The purpose of the invention can be realized by the following technical scheme:
An automated operation and maintenance system for a cloud computing platform is applied in a cloud computing platform environment and comprises an operation and maintenance control subsystem, a nova-computer service module and a nova-api service module, wherein:
the operation and maintenance control subsystem sends a corresponding request to the nova-api service module when a process identified under the applicable condition needs to be killed and when some of the virtual machines need to be live-migrated automatically;
the nova-computer service module is deployed on a host of the cloud computing platform, marks and scores all processes on the host, configures resource overcommit for the host, and executes requests from the nova-api service module;
the nova-api service module receives requests from the operation and maintenance control subsystem and notifies the nova-computer service module to execute them.
Further, the cloud computing platform is OpenStack.
Further, the operation and maintenance control subsystem further comprises a monitoring subsystem, wherein:
the monitoring subsystem monitors the memory and CPU usage of the host and records the CPU and memory utilization of the host together with the corresponding marks and scores.
The invention also provides an operation and maintenance control method based on the cloud computing platform automation operation and maintenance system, which comprises the following steps:
Step 1: configuring the nova-computer service module on a host of the cloud computing platform to mark all processes of the host and score them according to their importance;
Step 2: configuring the nova-computer service module to overcommit the resources of the host on the cloud computing platform, wherein the CPU and memory overcommit ratios of the host are set according to the historical average load on the cloud computing platform;
Step 3: the monitoring subsystem in the operation and maintenance control subsystem monitors the CPU and memory utilization of the processes on the host and their marks and scores;
Step 4: based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, the operation and maintenance control subsystem selectively kills the process identified under the applicable condition and sends a request to delete the virtual machine on the host to the nova-api service module, and the nova-computer service module executes the deletion task, ensuring the normal operation of the important processes of the host;
Step 5: based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, the operation and maintenance control subsystem automatically live-migrates some of the virtual machines on the host by sending a live migration request to the nova-api service module, and the nova-computer service module executes the live migration request, reducing the system load.
Further, step 3 specifically comprises:
the monitoring subsystem monitors the memory and CPU usage of the host, records the memory and CPU utilization of the host, records the memory and CPU utilization of every process on the host, and then calculates and records the memory determination score and the CPU determination score of each process.
Further, the CPU determination score of each process is given by the formula:
Cn=Sn*RCn
where Cn is the CPU determination score of the process numbered n, Sn is the mark score of the process numbered n, and RCn is the CPU utilization of the process numbered n.
The memory determination score of each process is given by the formula:
Mn=Sn*RMn
where Mn is the memory determination score of the process numbered n, and RMn is the memory utilization of the process numbered n.
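As a minimal sketch of the two formulas above, the following Python computes Cn and Mn for two processes. The `ProcessSample` type, its field names, and the sample values are hypothetical and do not come from the patent; utilization is assumed to be expressed as a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """One monitored process (hypothetical structure, for illustration only)."""
    number: int      # process number n
    mark: float      # Sn: importance mark, low = important
    cpu_rate: float  # RCn: CPU utilization, assumed to be a fraction in [0, 1]
    mem_rate: float  # RMn: memory utilization, assumed to be a fraction in [0, 1]


def cpu_score(p: ProcessSample) -> float:
    """Cn = Sn * RCn."""
    return p.mark * p.cpu_rate


def mem_score(p: ProcessSample) -> float:
    """Mn = Sn * RMn."""
    return p.mark * p.mem_rate


# Hypothetical samples: process 1 is marked important (0), process 2 unimportant (10).
samples = [
    ProcessSample(number=1, mark=0, cpu_rate=0.50, mem_rate=0.25),
    ProcessSample(number=2, mark=10, cpu_rate=0.25, mem_rate=0.50),
]
scores = {p.number: (cpu_score(p), mem_score(p)) for p in samples}
```

A process marked 0 scores 0 whatever its utilization, so it can never rank above a busy process that carries a positive mark.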
Further, in step 4, the selective killing of processes by the operation and maintenance control subsystem, based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, specifically comprises:
the basic operation and maintenance intervention thresholds of the CPU and the memory are set as C and M, and the advanced operation and maintenance intervention thresholds as TC and TM;
if TC > RC > Tc, the process numbered mc is killed on the host, where mc = Max(C1, C2, …, Cn), C1, C2, …, Cn are the 1st to nth CPU basic operation and maintenance intervention thresholds, RC is the CPU resource utilization of the host, and Tc is the CPU operation and maintenance intervention threshold;
if TM > RM > Tm, the process numbered mm is killed on the host, where mm = Max(M1, M2, …, Mn), M1, M2, …, Mn are the 1st to nth memory basic operation and maintenance intervention thresholds, RM is the memory resource utilization of the host, and Tm is the memory operation and maintenance intervention threshold.
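The kill rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: C1 … Cn are read here as the per-process CPU determination scores from the formula Cn=Sn*RCn, Tc as the basic threshold, and mc as the number of the process with the highest score. The function and parameter names are hypothetical. The memory rule has the same shape with Mn, Tm and TM.

```python
from typing import Mapping, Optional


def pick_process_to_kill(
    host_rate: float,
    basic_threshold: float,
    advanced_threshold: float,
    scores: Mapping[int, float],
) -> Optional[int]:
    """Return the process number to kill, or None if no kill is warranted.

    host_rate          -- RC (or RM): host utilization
    basic_threshold    -- Tc (or Tm)
    advanced_threshold -- TC (or TM)
    scores             -- process number -> Cn (or Mn); an assumed interpretation
    """
    # The kill rule only applies inside the band TC > RC > Tc.
    if not (advanced_threshold > host_rate > basic_threshold):
        return None
    if not scores:
        return None
    # mc: the number of the process whose determination score is the maximum.
    return max(scores, key=lambda n: scores[n])


cpu_scores = {1: 0.0, 2: 2.5, 3: 1.2}  # hypothetical values
victim = pick_process_to_kill(0.85, 0.80, 0.95, cpu_scores)
no_action = pick_process_to_kill(0.50, 0.80, 0.95, cpu_scores)
```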
Further, in step 5, the automated live migration of some virtual machines on the host by the operation and maintenance control subsystem, based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, specifically comprises:
the basic operation and maintenance intervention thresholds of the CPU and the memory are set as C and M, and the advanced operation and maintenance intervention thresholds as TC and TM;
if TC < RC, the virtual machine numbered tmc is live-migrated from the host, where tmc = Max(Ct1, Ct2, …, CtN), and Ct1, Ct2, …, CtN are the t1-th to tN-th CPU basic operation and maintenance intervention thresholds;
if TM < RM, the virtual machine numbered tmm is live-migrated from the host, where tmm = Max(Mt1, Mt2, …, MtN), and Mt1, Mt2, …, MtN are the t1-th to tN-th memory basic operation and maintenance intervention thresholds.
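A minimal Python sketch of the migration rule, again under assumptions that are not stated in the patent: Ct1 … CtN are read as the CPU determination scores of the processes t1 … tN that back virtual machines, and tmc as the number of the highest-scoring one. All names and sample values are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def pick_vm_to_migrate(
    host_rate: float,
    advanced_threshold: float,
    scores: Mapping[int, float],
    vm_process_numbers: Iterable[int],
) -> Optional[int]:
    """Return the VM process number to live-migrate, or None.

    host_rate          -- RC (or RM): host utilization
    advanced_threshold -- TC (or TM)
    scores             -- process number -> Cn (or Mn) for all host processes
    vm_process_numbers -- t1 ... tN, the processes that back virtual machines
    """
    # Migration applies only once utilization exceeds the advanced threshold.
    if not (advanced_threshold < host_rate):
        return None
    # Restrict the candidates to processes that back virtual machines.
    candidates = {t: scores[t] for t in vm_process_numbers if t in scores}
    if not candidates:
        return None
    return max(candidates, key=lambda t: candidates[t])


cpu_scores = {1: 9.0, 4: 1.5, 7: 3.0}  # hypothetical; process 1 is not a VM
target = pick_vm_to_migrate(0.97, 0.95, cpu_scores, vm_process_numbers=[4, 7])
```

Process 1 has the highest score in this example but is never chosen, because only the virtual machine processes are candidates for migration.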
The invention also provides a terminal device, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the operation and maintenance control method based on the cloud computing platform automation operation and maintenance system when executing the computer program.
The invention also provides a computer readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the operation and maintenance control method based on the cloud computing platform automation operation and maintenance system.
Compared with the prior art, the invention has the following advantages:
(1) The cloud computing platform automation operation and maintenance system, method, terminal device and storage medium are applied in a cloud computing environment. They configure resource overcommit for the host so that platform resources are not wasted, while avoiding contention for host resources and improving the user experience.
(2) The monitoring subsystem monitors the CPU utilization of the host. When the CPU or memory load is too high, a process with low importance and high utilization is selected and killed according to the CPU and memory utilization of the processes on the host and their marks and scores. The request to delete the virtual machine is sent to the nova-api service and the deletion task is executed by the nova-computer service, which ensures the normal operation of the important processes of the host, in particular the processes of user virtual machines.
(3) When the resource occupancy of user virtual machine processes is too high and far exceeds that of the other processes on the host, the invention automatically live-migrates some of the virtual machines, reducing the system load. The live migration request is sent to the nova-api service and executed by the nova-computer service, which improves the utilization of cloud resources and keeps the system running stably.
Drawings
FIG. 1 is a system architecture diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
The invention aims to provide an automatic operation and maintenance system and method for a cloud computing platform, which can improve the resource utilization rate of the cloud platform, ensure the stable operation of the system and optimize the user experience.
The system architecture of the invention is shown in FIG. 1. Based on this architecture, an automated operation and maintenance control system and method for an OpenStack cloud computing platform are provided and applied in an OpenStack environment, wherein:
the system comprises an operation and maintenance control subsystem, a nova-computer service module and a nova-api service module, wherein:
the operation and maintenance control subsystem sends a corresponding request to the nova-api service module when a process identified under the applicable condition needs to be killed and when some of the virtual machines need to be live-migrated automatically;
the nova-computer service module is deployed on a host of the cloud computing platform, marks and scores all processes on the host, configures resource overcommit for the host, and executes requests from the nova-api service module;
the nova-api service module receives requests from the operation and maintenance control subsystem and notifies the nova-computer service module to execute them;
the operation and maintenance control subsystem further comprises a monitoring subsystem, wherein:
the monitoring subsystem monitors the memory and CPU usage of the host and records the CPU and memory utilization of the host together with the corresponding marks and scores.
The method comprises the following steps:
(1) A nova-computer service is configured on a host of the cloud platform to mark and score all processes of the host. Processes are scored according to their importance: important processes score low and unimportant processes score high.
(2) The nova-computer service is configured to overcommit the resources of the host of the cloud platform; the CPU and memory overcommit ratios can be set according to the historical average load on the platform.
(3) The monitoring subsystem monitors the CPU utilization of the host. When the CPU or memory load is too high, a process with low importance and high utilization is selected and killed according to the CPU and memory utilization of the processes on the host and their marks and scores. The request to delete the virtual machine is sent to the nova-api service and the deletion task is executed by the nova-computer service, which ensures the normal operation of the important processes of the host, in particular the processes of user virtual machines.
(4) When the resource occupancy of user virtual machine processes is too high and far exceeds that of the other processes on the host, some of the virtual machines are live-migrated automatically, reducing the system load. The live migration request is sent to the nova-api service and executed by the nova-computer service.
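For orientation, the two requests mentioned in (3) and (4) can be sketched as OpenStack Compute (Nova) REST calls. The patent does not name the endpoints; mapping "delete the virtual machine" to `DELETE /servers/{server_id}` and "live migration" to the `os-migrateLive` server action is an assumption, and the helper names are hypothetical. The sketch only builds a description of each request and sends nothing. The value `"auto"` for `block_migration` assumes API microversion 2.25 or later.

```python
import json
from typing import Any, Dict, NamedTuple, Optional


class ApiRequest(NamedTuple):
    """A request description (hypothetical helper type; nothing is sent)."""
    method: str
    path: str
    body: Optional[Dict[str, Any]]


def delete_server_request(server_id: str) -> ApiRequest:
    # Assumed mapping of the "delete virtual machine" request.
    return ApiRequest("DELETE", f"/servers/{server_id}", None)


def live_migrate_request(server_id: str, target_host: Optional[str] = None) -> ApiRequest:
    # Assumed mapping of the "live migration" request.
    # host=None leaves the choice of destination host to the scheduler.
    body = {"os-migrateLive": {"host": target_host, "block_migration": "auto"}}
    return ApiRequest("POST", f"/servers/{server_id}/action", body)


req = live_migrate_request("SERVER_ID")  # placeholder id, not a real server
payload = json.dumps(req.body)
```

Keeping request construction separate from sending makes the decision logic testable without a running cloud; an HTTP client or SDK would take the `ApiRequest` and add authentication and the microversion header.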
Detailed Description of Embodiment(s) of the Invention
(1) In a cloud platform environment, a system administrator marks and scores all processes on a host (numbered 1 … N); the mark value of the process numbered n is denoted Sn. Important processes score low, for example 0, and unimportant processes score high, for example 10. In principle, all processes corresponding to user virtual machines should be marked 0, although a virtual machine that the system administrator does not consider important can also be given a high score.
(2) The nova-computer service overcommits the resources of the host of the cloud platform; the CPU and memory overcommit ratios can be set according to the historical average load on the platform. A commonly used memory overcommit ratio is 1.5, and a commonly used CPU overcommit ratio is 3. This improves the utilization of system resources: when scheduling, the cloud platform treats the host as having 1.5 times its actual memory and 3 times its actual CPU capacity, so the host can carry more virtual machine load. Because user virtual machines often do not reach 100% utilization of the resources allocated to them, such as CPU, the normal operation of the host is not affected.
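A short worked example of the ratios in (2), assuming a hypothetical host with 64 physical CPU cores and 256 GiB of memory (these figures are illustrative and not from the patent):

```python
def schedulable_capacity(physical: float, overcommit_ratio: float) -> float:
    """Capacity the scheduler assumes the host has under overcommit."""
    return physical * overcommit_ratio


# Hypothetical host: 64 cores, 256 GiB of RAM.
vcpus = schedulable_capacity(64, 3.0)     # CPU overcommit ratio of 3
mem_gib = schedulable_capacity(256, 1.5)  # memory overcommit ratio of 1.5
```

With these ratios the scheduler would place up to 192 vCPUs and 384 GiB of virtual machine memory on the host. In OpenStack Nova, ratios of this kind are set with the `cpu_allocation_ratio` and `ram_allocation_ratio` configuration options.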
(3) The monitoring subsystem monitors the memory and CPU usage of the host and records the CPU and memory utilization of the host as RC and RM. It also records the CPU and memory utilization of all processes on the host (numbered 1 … N): the CPU utilization of the process numbered n is denoted RCn and its memory utilization RMn. The CPU determination score Cn = Sn * RCn and the memory determination score Mn = Sn * RMn of each process are recorded.
(4) The operation and maintenance control subsystem records the process numbers t1, t2, …, tN corresponding to the virtual machines on the host. The ratio of the CPU consumption of the virtual machines on the host to the total CPU consumption of the host is then:
Figure BDA0002500562500000061
The ratio of the memory consumption of the virtual machines on the host to the total memory consumption of the host is:
Figure BDA0002500562500000062
(5) The basic operation and maintenance intervention thresholds of the CPU and the memory are set as C and M, and the advanced operation and maintenance intervention thresholds as TC and TM;
if TC > RC > Tc, the process numbered mc is killed on the host, where mc = Max(C1, C2, …, Cn), C1, C2, …, Cn are the 1st to nth CPU basic operation and maintenance intervention thresholds, RC is the CPU resource utilization of the host, and Tc is the CPU operation and maintenance intervention threshold;
if TM > RM > Tm, the process numbered mm is killed on the host, where mm = Max(M1, M2, …, Mn), M1, M2, …, Mn are the 1st to nth memory basic operation and maintenance intervention thresholds, RM is the memory resource utilization of the host, and Tm is the memory operation and maintenance intervention threshold;
if TC < RC, the virtual machine numbered tmc is live-migrated from the host, where tmc = Max(Ct1, Ct2, …, CtN), and Ct1, Ct2, …, CtN are the t1-th to tN-th CPU basic operation and maintenance intervention thresholds;
if TM < RM, the virtual machine numbered tmm is live-migrated from the host, where tmm = Max(Mt1, Mt2, …, MtN), and Mt1, Mt2, …, MtN are the t1-th to tN-th memory basic operation and maintenance intervention thresholds.
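Taken together, the four rules in (5) split host utilization into three bands. The Python sketch below classifies a single utilization reading. It assumes Tc < TC (and Tm < TM), reads Tc and Tm as the basic thresholds, and treats the boundary cases RC = Tc and RC = TC, which the rules leave unspecified, as "no action". The names and threshold values are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    KILL_PROCESS = "kill_process"
    LIVE_MIGRATE = "live_migrate"


def classify(host_rate: float, basic: float, advanced: float) -> Action:
    """Map RC (or RM) to an action, given Tc/TC (or Tm/TM)."""
    if host_rate > advanced:          # TC < RC
        return Action.LIVE_MIGRATE
    if advanced > host_rate > basic:  # TC > RC > Tc
        return Action.KILL_PROCESS
    # Below the basic threshold, or exactly on a boundary: do nothing.
    return Action.NONE


actions = [classify(rc, basic=0.80, advanced=0.95) for rc in (0.50, 0.85, 0.97)]
```

Running the classifier once for CPU and once for memory yields two independent decisions per monitoring cycle; how to combine them when they differ is not specified in the text.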
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An automated operation and maintenance system for a cloud computing platform, applied in a cloud computing platform environment, characterized by comprising an operation and maintenance control subsystem, a nova-computer service module and a nova-api service module, wherein:
the operation and maintenance control subsystem sends a corresponding request to the nova-api service module when a process identified under the applicable condition needs to be killed and when some of the virtual machines need to be live-migrated automatically;
the nova-computer service module is deployed on a host of the cloud computing platform, marks and scores all processes on the host, configures resource overcommit for the host, and executes requests from the nova-api service module;
the nova-api service module receives requests from the operation and maintenance control subsystem and notifies the nova-computer service module to execute them.
2. The cloud computing platform automation operation and maintenance system of claim 1, wherein the cloud computing platform is OpenStack.
3. The cloud computing platform automation operation and maintenance system of claim 1, wherein the operation and maintenance control subsystem further comprises a monitoring subsystem, wherein:
the monitoring subsystem monitors the memory and CPU usage of the host and records the CPU and memory utilization of the host together with the corresponding marks and scores.
4. An operation and maintenance control method based on the cloud computing platform automation operation and maintenance system according to claim 1, characterized by comprising the following steps:
Step 1: configuring the nova-computer service module on a host of the cloud computing platform to mark all processes of the host and score them according to their importance;
Step 2: configuring the nova-computer service module to overcommit the resources of the host on the cloud computing platform, wherein the CPU and memory overcommit ratios of the host are set according to the historical average load on the cloud computing platform;
Step 3: the monitoring subsystem in the operation and maintenance control subsystem monitors the CPU and memory utilization of the processes on the host and their marks and scores;
Step 4: based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, the operation and maintenance control subsystem selectively kills the process identified under the applicable condition and sends a request to delete the virtual machine on the host to the nova-api service module, and the nova-computer service module executes the deletion task, ensuring the normal operation of the important processes of the host;
Step 5: based on the monitored CPU and memory utilization of the processes on the host and their marks and scores, the operation and maintenance control subsystem automatically live-migrates some of the virtual machines on the host by sending a live migration request to the nova-api service module, and the nova-computer service module executes the live migration request, reducing the system load.
5. The operation and maintenance control method based on the cloud computing platform automation operation and maintenance system according to claim 4, wherein step 3 specifically comprises:
the monitoring subsystem monitors the memory and CPU usage of the host, records the memory and CPU utilization of the host, records the memory and CPU utilization of every process on the host, and then calculates and records the memory determination score and the CPU determination score of each process.
6. The operation and maintenance control method based on the cloud computing platform automation operation and maintenance system according to claim 5, wherein the CPU determination score of each process is described by the formula:
Cn=Sn*RCn
where Cn is the CPU determination score of the process with process number n, Sn is the score of the process with process number n, and RCn is the CPU utilization rate of the process with process number n; and
the memory determination score of each process is described by the formula:
Mn=Sn*RMn
in the formula, Mn is a memory determination score of a process with a process number n, and RMn is a memory usage rate of the process with the process number n.
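The two formulas above can be illustrated with a short sketch. The class name, the field names, and the choice to express utilization rates as fractions between 0 and 1 are assumptions made for the example; the claims do not fix the units.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """One monitoring record for the process with process number n."""
    pid: int          # process number n
    score: float      # Sn, the marked score of the process
    cpu_rate: float   # RCn, CPU utilization rate (assumed to be a fraction)
    mem_rate: float   # RMn, memory utilization rate (assumed to be a fraction)

    @property
    def cpu_score(self) -> float:
        """Cn = Sn * RCn"""
        return self.score * self.cpu_rate

    @property
    def mem_score(self) -> float:
        """Mn = Sn * RMn"""
        return self.score * self.mem_rate


if __name__ == "__main__":
    sample = ProcessSample(pid=101, score=2.0, cpu_rate=0.25, mem_rate=0.5)
    print(sample.cpu_score)  # 0.5
    print(sample.mem_score)  # 1.0
```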
7. The operation and maintenance control method based on the cloud computing platform automation operation and maintenance system according to claim 4, wherein, in step 4, the selective killing by the operation and maintenance control subsystem of the processes determined under different conditions, based on the monitored CPU and memory utilization rates of the processes on the host and the marked scores, specifically comprises:
setting the basic operation and maintenance intervention thresholds of the CPU and the memory as Tc and Tm, and setting the advanced operation and maintenance intervention thresholds as TC and TM;
if TC > RC > Tc, killing the process numbered mc on the host, wherein mc is the number of the process whose CPU determination score equals Max(C1, C2, …, Cn), C1, C2, …, Cn are the CPU determination scores of the 1st to nth processes, RC is the CPU resource utilization rate of the host, and Tc is the basic CPU operation and maintenance intervention threshold;
if TM > RM > Tm, killing the process numbered mm on the host, wherein mm is the number of the process whose memory determination score equals Max(M1, M2, …, Mn), M1, M2, …, Mn are the memory determination scores of the 1st to nth processes, RM is the memory resource utilization rate of the host, and Tm is the basic memory operation and maintenance intervention threshold.
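The selection rule of this claim can be sketched as below. The sketch assumes that C1 to Cn (or M1 to Mn) are the per-process determination scores, that utilization rates and thresholds share the same unit, and that ties are broken by taking the first process encountered; the function and parameter names are hypothetical.

```python
from typing import Dict, Optional


def pick_process_to_kill(
    host_rate: float,
    basic_threshold: float,
    advanced_threshold: float,
    scores: Dict[int, float],
) -> Optional[int]:
    """Return the number of the process to kill, or None if no kill applies.

    host_rate is RC (or RM), basic_threshold is Tc (or Tm),
    advanced_threshold is TC (or TM), and scores maps each process
    number to its determination score Cn (or Mn).
    """
    # The rule applies only strictly between the two thresholds.
    if not (basic_threshold < host_rate < advanced_threshold):
        return None
    if not scores:
        return None
    # The process with the maximum determination score is selected.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    cpu_scores = {11: 0.1, 12: 0.6, 13: 0.3}
    print(pick_process_to_kill(0.85, 0.80, 0.90, cpu_scores))  # 12
    print(pick_process_to_kill(0.50, 0.80, 0.90, cpu_scores))  # None
```

The same function serves both the CPU rule and the memory rule, since the two differ only in which utilization rate, thresholds, and scores are passed in.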
8. The operation and maintenance control method based on the cloud computing platform automation operation and maintenance system according to claim 4, wherein, in step 5, the automatic live migration by the operation and maintenance control subsystem of some of the virtual machines on the host, based on the monitored CPU and memory utilization rates of the processes on the host and the marked scores, specifically comprises:
setting the basic operation and maintenance intervention thresholds of the CPU and the memory as Tc and Tm, and setting the advanced operation and maintenance intervention thresholds as TC and TM;
if TC < RC, performing live migration of the virtual machine numbered tmc on the host, wherein tmc is the number of the virtual machine whose CPU determination score equals Max(Ct1, Ct2, …, CtN), and Ct1, Ct2, …, CtN are the CPU determination scores of the t1-th to tN-th virtual machines;
and if TM < RM, performing live migration of the virtual machine numbered tmm on the host, wherein tmm is the number of the virtual machine whose memory determination score equals Max(Mt1, Mt2, …, MtN), and Mt1, Mt2, …, MtN are the memory determination scores of the t1-th to tN-th virtual machines.
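Taken together with the kill rule of claim 7, the thresholds define a tiered response, which the following sketch illustrates. It assumes that Ct1 to CtN (or Mt1 to MtN) are the determination scores of the virtual machines on the host, and it takes no action when the utilization rate exactly equals a threshold, because the claims use strict inequalities and leave that case open. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    NONE = "none"
    KILL_PROCESS = "kill_process"
    LIVE_MIGRATE = "live_migrate"


def choose_action(host_rate: float, basic: float, advanced: float) -> Action:
    """Map the host utilization rate RC (or RM) to an intervention tier."""
    if host_rate > advanced:            # TC < RC: live migration (this claim)
        return Action.LIVE_MIGRATE
    if basic < host_rate < advanced:    # TC > RC > Tc: kill a process (claim 7)
        return Action.KILL_PROCESS
    return Action.NONE                  # below Tc, or exactly on a threshold


def pick_vm_to_migrate(
    host_rate: float, advanced: float, vm_scores: Dict[str, float]
) -> Optional[str]:
    """Return the virtual machine with the maximum determination score."""
    if host_rate <= advanced or not vm_scores:
        return None
    return max(vm_scores, key=vm_scores.get)


if __name__ == "__main__":
    # "vm-a" and "vm-b" are placeholder identifiers.
    vm_cpu_scores = {"vm-a": 0.4, "vm-b": 0.7}
    print(choose_action(0.95, 0.80, 0.90))                  # Action.LIVE_MIGRATE
    print(pick_vm_to_migrate(0.95, 0.90, vm_cpu_scores))    # vm-b
```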
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor implements the operation and maintenance control method of the cloud computing platform-based automation operation and maintenance system according to any one of claims 4 to 8 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the steps of the operation and maintenance control method of the cloud computing platform-based automation operation and maintenance system according to any one of claims 4 to 8.
CN202010430955.6A 2020-05-20 2020-05-20 Cloud computing platform automatic operation and maintenance system, method, terminal equipment and storage medium Active CN111736948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430955.6A CN111736948B (en) 2020-05-20 2020-05-20 Cloud computing platform automatic operation and maintenance system, method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010430955.6A CN111736948B (en) 2020-05-20 2020-05-20 Cloud computing platform automatic operation and maintenance system, method, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111736948A (en) 2020-10-02
CN111736948B (en) 2023-10-31

Family

ID=72647438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430955.6A Active CN111736948B (en) 2020-05-20 2020-05-20 Cloud computing platform automatic operation and maintenance system, method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111736948B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732357A (en) * 2021-01-12 2021-04-30 中国科学技术大学 CPU over-partition rate configuration method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106020936A (en) * 2016-06-07 2016-10-12 深圳证券通信有限公司 Virtual machine dispatching method and device for financial cloud platform on basis of operating loads
CN107589994A (en) * 2017-08-16 2018-01-16 深圳市爱培科技术股份有限公司 Method, equipment, system and the storage medium of application process priority management
US20190281112A1 (en) * 2018-03-08 2019-09-12 Nutanix, Inc. System and method for orchestrating cloud platform operations
CN110704167A (en) * 2019-10-09 2020-01-17 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for creating virtual machine

Also Published As

Publication number Publication date
CN111736948B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN109586952B (en) Server capacity expansion method and device
WO2015023191A1 (en) Power balancing to increase workload density and improve energy efficiency
CN104901989B (en) A kind of Site Service offer system and method
CN109491788B (en) Method and device for realizing load balance of virtualization platform
CN111966289B (en) Partition optimization method and system based on Kafka cluster
CN111277640B (en) User request processing method, device, system, computer equipment and storage medium
CN112188551B (en) Computation migration method, computation terminal equipment and edge server equipment
CN110647392A (en) Intelligent elastic expansion method based on container cluster
CN111736948A (en) Cloud computing platform automation operation and maintenance system and method, terminal device and storage medium
CN114867065A (en) Base station computing force load balancing method, equipment and storage medium
WO2021043146A1 (en) Detection method, apparatus and system
CN113672345A (en) IO prediction-based cloud virtualization engine distributed resource scheduling method
CN110445824A (en) NB-IoT data reporting method, device, system and computer readable storage medium
CN105740077B (en) Task allocation method suitable for cloud computing
CN110389958B (en) Method and device for dynamically adjusting aging frequency of hardware table entry and computer storage medium
CN105893150B (en) Interface calling frequency control method and device and interface calling request processing method and device
CN110609758A (en) Queue-based device operating method, computer device and readable storage medium
CN113448747B (en) Data transmission method, device, computer equipment and storage medium
CN110362448A (en) A kind of GPU management-control method and relevant apparatus
CN114153553A (en) High-availability control method and system for virtual machine and related components
CN114911667A (en) Monitoring data acquisition method, system and storage medium
JP5743334B2 (en) Congestion control device
CN111158899A (en) Data acquisition method, data acquisition device, task management center and task management system
CN112328462A (en) Server health state evaluation-based server capacity expansion method and system
CN114007246B (en) Method, apparatus, computer device and medium for reducing network congestion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant