CN112699002A - Method and equipment for multi-cloud resource alarm control - Google Patents

Publication number
CN112699002A
Authority
CN
China
Prior art keywords
information
alarm
cloud
metadata
cloud resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011559612.6A
Other languages
Chinese (zh)
Inventor
赵平
高海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lianwei Panyun Technology Co ltd
Original Assignee
Shanghai Lianwei Panyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a multi-cloud resource alarm control method, which is applied to a user terminal and comprises the following steps: acquiring identity verification information of a user, wherein the identity verification information matches with a plurality of public cloud account identification information corresponding to the identity verification information; respectively sending a plurality of metadata requests to a plurality of cloud end devices based on the public cloud account identification information; receiving a plurality of pieces of metadata sent by the plurality of cloud end devices based on the metadata requests, wherein the plurality of pieces of metadata comprise cloud resource information of corresponding public clouds; carrying out classified management on the cloud resource information; respectively pushing corresponding alarm rules to the plurality of cloud end devices according to the classified cloud resource information; receiving alarm information metadata sent by the plurality of cloud end devices based on the alarm rule, wherein the alarm information metadata comprise alarm information of corresponding public clouds; sequentially executing cleaning and persistent storage operations on the alarm information metadata; and optimizing an alarm rule according to the stored alarm information.

Description

Method and equipment for multi-cloud resource alarm control
Technical Field
The invention relates to the field of cloud computing, in particular to a method and equipment for multi-cloud resource alarm control.
Background
As enterprises adopt multiple public clouds at scale, the types and number of cloud resources they use keep increasing, and each cloud applies its own set of alarm rules. As a result, enterprise IT departments, IT managers, and IT operation and maintenance personnel have to be familiar with alarm rule configuration and management for every cloud resource on every cloud platform. This makes multi-cloud, multi-resource management difficult, makes abnormally used resources hard to capture and locate accurately, and hinders rapid troubleshooting.
Disclosure of Invention
In view of the problems in the prior art, the present invention provides a method for managing and controlling a multi-cloud resource alarm, which is applied to a user terminal, and the method includes:
acquiring identity verification information of a user, wherein the identity verification information matches with a plurality of public cloud account identification information corresponding to the identity verification information;
respectively sending a plurality of metadata requests to a plurality of cloud end devices based on the public cloud account identification information;
receiving a plurality of pieces of metadata sent by the plurality of cloud end devices based on the metadata requests, wherein the plurality of pieces of metadata comprise cloud resource information of corresponding public clouds;
carrying out classified management on the cloud resource information;
respectively pushing corresponding alarm rules to the plurality of cloud end devices according to the classified cloud resource information;
receiving alarm information metadata sent by the plurality of cloud end devices based on the alarm rule, wherein the alarm information metadata comprise alarm information of corresponding public clouds;
sequentially executing cleaning and persistent storage operations on the alarm information metadata;
and optimizing an alarm rule according to the stored alarm information.
Further, the step of sending a plurality of metadata requests to a plurality of cloud devices, respectively, includes:
and respectively sending a plurality of metadata requests to the plurality of cloud end devices based on a preset time interval.
Further, the pushing of the corresponding alarm rules to the plurality of cloud end devices according to the classified cloud resource information includes:
and determining pushed information by adopting online bipartite graph matching, wherein an alarm rule item is used as a known vertex, the classified cloud resource information is used as an online arrived vertex, and the matching quantity of the alarm rule item and the classified cloud resource information is maximized after the online vertex arrives.
Further, the pushing of the corresponding alarm rules to the plurality of cloud end devices according to the classified cloud resource information includes:
and adopting a self-defined pushing mode based on matrix decomposition, wherein the alarm rule items are classified according to characteristics to form a matrix.
Further, the cleaning operation of the alarm information metadata comprises cleaning operations which are sequentially performed according to the sequence of the cloud classification alarm information, the cloud resource classification alarm information and the alarm grade information.
Further, when the cleaning operation is performed on the alarm information metadata, marking alarm information meeting preset conditions; pushing the marked alarm information to the user.
Further, the optimizing alarm rules according to the stored alarm information includes:
and optimizing the alarm rule by a policy gradient method, wherein, according to the stored alarm information, a trajectory of the changes in the corresponding alarm rule item is formed for a specific environment, an estimate of the gradient under the current parameters is obtained from it, and the threshold setting of the corresponding alarm rule item is further optimized.
Further, in response to an operation instruction input by a user at a single interface of the user terminal, executing corresponding operation based on at least one item of cloud resource information and/or alarm information.
The invention also provides a device for multi-cloud resource alarm control, which comprises:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of the above-described method.
The present invention also provides a computer-readable medium storing instructions that, when executed, cause a system to perform the operations of the above-described method.
Compared with the prior art, the multi-cloud resource alarm control method and the equipment classify and persist the alarm information acquired from the multi-cloud resource, continuously optimize the alarm rules, and perform unified alarm information pushing and displaying; the method has the advantages that the enterprise IT personnel can quickly and accurately locate the abnormal alarm condition of the cloud resources in the multi-cloud and multi-account environment, the workload is reduced, and the working efficiency is improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a system architecture of one embodiment of the present invention;
FIG. 2 shows the flow of a method for multi-cloud resource alarm management and control according to an embodiment of the present invention;
FIG. 3 is an illustration of a task queue in one embodiment of the invention;
FIG. 4 is an illustration of a current task in the task queue of FIG. 3;
FIG. 5 is a diagram illustrating the transition of task execution states according to one embodiment of the present invention;
FIG. 6 is a flow diagram illustrating the execution of queue tasks in one embodiment of the invention;
FIG. 7 is a flow diagram illustrating pushing of self-matching alarm rules, in accordance with an embodiment of the present invention;
FIG. 8 is a flow diagram of a custom push alert rule in one embodiment of the invention;
FIG. 9 is a schematic flow diagram of alarm information management in one embodiment of the present invention;
FIG. 10 is a schematic illustration of alert information reception in accordance with an embodiment of the present invention;
FIG. 11 is a schematic flow diagram of alarm information cleaning in one embodiment of the present invention;
FIG. 12 illustrates functional modules of an exemplary system that may be used in various embodiments of the invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
In a typical configuration of the invention, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The Memory may include forms of volatile Memory, Random Access Memory (RAM), and/or non-volatile Memory in a computer-readable medium, such as Read Only Memory (ROM) or Flash Memory. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PCM), Programmable Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, and magnetic tape or other magnetic or non-magnetic storage devices, any of which may be used to store information that can be accessed by a computing device.
The device referred to in the present invention includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, which is a kind of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. The network includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, the touch terminal, or the network device and the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use with the present invention, are also within the scope of the present invention and are hereby incorporated by reference.
In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
The embodiment firstly provides a system architecture for multi-cloud resource alarm management and control. As shown in fig. 1, the system adopts a B/S mode and a micro-service architecture, and the overall design is divided into four layers of structures, namely a user layer, a middle layer, a data layer and a cloud layer, wherein:
-a user layer: the user accesses the system through a PC or a third-party system.
-a middle layer: for the sake of scalability, the front end and back end are designed separately so that distributed deployment can be carried out rapidly; the front-end pages are deployed independently to a Web server, and the back-end applications are deployed independently to an application server. The application services can be clustered and include unified security authentication, statistical analysis, query, visualization, database access, configuration, timed task, and calculation services. To prevent third parties from affecting the business system during interaction, an interface server provides separate services for third parties. The user layer exchanges data requests with the middle layer through Web services or RESTful interfaces.
-a data layer: the database server can be configured with dual-machine hot standby, master-slave operation, and the like; an independent cache server is added to cache pages and frequently used data, which relieves the pressure on the database, removes the database read/write bottleneck, and keeps the database running normally.
-cloud layer: according to the different cloud account information, timed tasks are customized; APIs (application program interfaces) or SDKs (software development kits) are called at regular intervals to synchronize resources and receive native alarm data from the clouds (Alibaba Cloud, Azure, AWS, and Tencent Cloud), and the corresponding alarm rules are automatically pushed according to the characteristics of each cloud.
Based on the above framework, in particular, the embodiment provides a method for multi-cloud resource alarm management and control. The method is applied to a user terminal and is supported by corresponding network equipment (such as a cloud server). Referring to fig. 2, the method includes step S100, step S200, step S300, step S400, step S500, step S600, step S700, step S800, and step S900. The following describes a specific implementation of the present embodiment by taking a user terminal as an example.
Specifically, in step S100, the user terminal acquires authentication information of the user. For example, a user inputs his or her user identification (e.g., system account name) and authentication information (e.g., account password) at a user terminal.
In step S200, the user terminal matches a plurality of pieces of public cloud account identification information corresponding to the authentication information, where each piece of public cloud account identification information corresponds to a public cloud account. For example, a user account logged in by an administrator corresponds to a number of public cloud accounts managed by the administrator; in some cases, different administrators may manage different public cloud accounts for the same set of systems.
In step S300, the user terminal sends a plurality of metadata requests to the plurality of cloud devices, respectively, where each metadata request includes public cloud account identification information, and the public cloud account identification information is used to determine the user's access right to the corresponding public cloud account. For example, the plurality of cloud devices respectively correspond to a plurality of different cloud platforms. In some embodiments, the user's access right to a cloud account is determined by the account information the user provides; for example, Alibaba Cloud requires the accessKeyId and accessSecret fields to be entered, and Azure (the cloud service platform provided by Microsoft) requires the subscribentid and clientSecret fields to be entered. After the information is entered successfully, the system verifies whether the entered account is available.
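The credential check that precedes the metadata requests might be sketched as follows. This is a minimal illustration only: the provider keys, the `missing_fields` and `build_metadata_requests` helpers, and the request dictionary layout are hypothetical; only the four field names are taken from the description above.

```python
# Hypothetical sketch: provider keys, helper names and the request layout are
# illustrative assumptions; only the field names come from the description.
REQUIRED_FIELDS = {
    "alibaba_cloud": ("accessKeyId", "accessSecret"),
    "azure": ("subscribentid", "clientSecret"),
}


def missing_fields(provider, entered):
    """Return the required fields that are absent or empty for a provider."""
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported provider: {provider}")
    return [name for name in REQUIRED_FIELDS[provider] if not entered.get(name)]


def build_metadata_requests(accounts):
    """Build one metadata request per cloud account whose fields are complete."""
    requests = []
    for account in accounts:
        if missing_fields(account["provider"], account["fields"]):
            continue  # incomplete entry: no request is sent for this account
        requests.append({
            "provider": account["provider"],
            "account_id": account["account_id"],
            "action": "list_resources",  # assumed request type
        })
    return requests
```

An account with an incomplete entry simply produces no request in this sketch; a real system would more likely report the missing fields back to the user.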
In step S400, the user terminal receives a plurality of pieces of metadata sent by the plurality of cloud end devices based on the metadata request. Wherein the plurality of pieces of metadata include cloud resource information of the corresponding public cloud. In some embodiments, resource monitoring data synchronized by multiple cloud accounts is persisted and used as basic data for optimizing conservation analysis and calculation.
In step S500, the user terminal performs classification management on the cloud resource information. In some embodiments, the content of the metadata is first cleaned into a corresponding data structure according to different data cleaning rules of each cloud platform, including checking data consistency, processing invalid values and missing values, and the like.
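The cleaning of metadata into a common data structure can be illustrated with a short sketch. The field maps, the provider keys `cloud_a` and `cloud_b`, and the treatment of invalid values below are assumptions made for illustration and are not taken from the original text.

```python
# Hypothetical per-platform cleaning rules: target field -> source field.
FIELD_MAPS = {
    "cloud_a": {"resource_id": "instance_id", "resource_type": "type", "region": "region"},
    "cloud_b": {"resource_id": "id", "resource_type": "kind", "region": "location"},
}

INVALID_VALUES = (None, "", "null", "N/A")  # assumed markers of missing data


def clean_resource(provider, raw, field_map):
    """Map one raw record into the common structure, or return None if invalid."""
    record = {"provider": provider}
    for target, source in field_map.items():
        value = raw.get(source)
        if isinstance(value, str):
            value = value.strip()
        if value in INVALID_VALUES:
            return None  # missing or invalid value: drop the record
        record[target] = value
    return record


def clean_metadata(provider, raw_records):
    """Clean all records of one platform and drop duplicate resource ids."""
    field_map = FIELD_MAPS[provider]
    cleaned, seen = [], set()
    for raw in raw_records:
        record = clean_resource(provider, raw, field_map)
        if record is None or record["resource_id"] in seen:
            continue  # consistency check: keep each resource once
        seen.add(record["resource_id"])
        cleaned.append(record)
    return cleaned
```

Dropping invalid records is only one possible policy; filling missing values with defaults would fit the description equally well.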
In step S600, the user terminal respectively pushes corresponding alarm rules to the multiple cloud devices according to the classified cloud resource information. In some embodiments, the alarm information rule base is configured for setting, matching, invoking, etc. of alarm rules.
In step S700, the user terminal receives and manages the alarm information sent by the plurality of cloud devices based on the alarm rule, including a cleaning operation on metadata of the alarm information, a persistent storage operation on the alarm information, a pushing operation on the alarm information, and optimization of the alarm rule.
In step S800, in response to an operation instruction input by a user on a single interface of the user terminal, a corresponding operation is performed based on at least one item of cloud resource information and/or alarm information.
Therefore, the user can realize the management and control of the cloud resource alarm of the plurality of cloud accounts only in one single user interface, and does not need to enter each cloud account to perform monitoring management.
In some embodiments, in step S300, the user terminal sends a plurality of metadata requests to the plurality of network devices respectively based on a preset time interval. For example, after the user's authentication information is acquired, the system performs the above operations at regular intervals by itself, so as to reduce the operation burden of the user and improve the real-time performance of the local data.
In some embodiments, the step S300 includes a substep S310, a substep S320, a substep S330, and a substep S340 (not shown). In sub-step S310, the user terminal creates a task queue, where the task queue includes a plurality of metadata request tasks corresponding to the plurality of public cloud account identification information; in substep S320, the user terminal obtains a current task in the task queue and determines an executable state of the current task; in the substep S330, if the executable state of the current task is non-executable, the user terminal moves the current task to the tail of the task queue; in the sub-step S340, if the executable state of the current task is executable, the user terminal executes the current task to send a corresponding metadata request to a corresponding network device, and removes the current task after the current task is executed. In order to automatically execute some tasks and reduce the burden of an administrator, some tasks are provided with a cycle state, and the cycle state is used for representing whether the task needs to be automatically executed again after the task is executed at this time. Accordingly, in some embodiments, in sub-step S340, if the executable state of the current task is executable, the user terminal executes the current task to send a corresponding metadata request to a corresponding network device; if the cycle state of the current task is true, moving the current task to the tail of the task queue after the current task is executed; otherwise, the current task is removed after the current task is executed.
For example, the system manages information synchronization for the resources of the various cloud accounts (basic resources, resource monitoring, and the like). After a synchronization task is successfully created, it is saved in a task queue to await execution, as shown in fig. 3. An account task in the task queue comprises a plurality of resource-information subtasks. Each account task has exactly one state at a time, which is one of: executable, to be executed, in execution, execution completed, or execution error. The task also records its next execution time and a flag marking whether it is a loop task (for example, 0 for no, 1 for yes). Referring to the task execution flow illustrated in fig. 6, the system first obtains the first task in the queue (i.e., the "current task") and determines whether it is executable; fig. 4 illustrates an exemplary logical structure of the current task. If the current task cannot be executed, it is moved to the tail of the queue; otherwise it is set to "in execution" (to avoid conflicts during parallel processing) and placed into the execution thread pool, where its subtasks are executed concurrently as coroutines. After the current task is executed, its state is marked as "completed" and the cycle flag is checked: if the current task is a loop task, it is moved to the tail of the queue again; otherwise it is removed from the queue. For ease of illustration, FIG. 5 shows the transitions between task states.
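The queue handling described above can be sketched roughly as follows. The `AccountTask` class, the `run_once` function, and the `interval` parameter are hypothetical names; the sketch runs tasks synchronously instead of using a thread pool with coroutines, and it assumes that a task is executable once its next execution time has been reached.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AccountTask:
    account_id: str
    next_run: float                 # next execution time recorded on the task
    loop: bool = False              # cycle flag (0 = no, 1 = yes in the text)
    state: str = "to be executed"


def run_once(queue, now, execute, interval=60.0):
    """Handle the task at the head of the queue once and return it."""
    if not queue:
        return None
    task = queue.popleft()
    if task.state == "in execution" or task.next_run > now:
        queue.append(task)          # not executable: move to the tail
        return task
    task.state = "in execution"     # avoids conflicts under parallel processing
    try:
        execute(task)
        task.state = "execution completed"
    except Exception:
        task.state = "execution error"
    if task.loop:                   # loop task: reschedule at the tail
        task.next_run = now + interval
        queue.append(task)
    return task                     # non-loop tasks stay removed
```

How failed tasks are rescheduled is not specified in the text; here they follow the same cycle-flag rule as successful ones, which is an assumption.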
In some embodiments, in step S500, the user terminal classifies and manages the cloud resource information synchronized from Azure, Alibaba Cloud, AWS, and Tencent Cloud; Table 1 provides a cloud resource classification management table that can be used.
TABLE 1
[Table 1 appears as images in the original publication (Figures BDA0002859124740000081 and BDA0002859124740000091) and is not reproduced here.]
In some embodiments, in step S600, the alarm rule is automatically matched by the user terminal and pushed to the multi-cloud resource event center, as shown in fig. 7.
The user terminal pushes the rules by online matching in order to find a maximum matching; specifically, online bipartite graph matching is adopted. For a bipartite graph G(U, V, E), U and V are the two parts of the graph and E is the set of edges in G. The vertices in V are known in advance, while the vertices in U arrive online and are not known in advance; once a vertex in U arrives, its match with a vertex in V must be decided, and that decision cannot be altered once made. The matching goal is to maximize the number of matches after all vertices in U have arrived.
Specifically, initialization: a random permutation σ of V is generated, where V is the set of vertices known in advance.
Online: when a vertex u ∈ U arrives online, let N(u) be the set of neighboring vertices of u that have not yet been matched; if N(u) ≠ ∅, u is matched with the vertex in N(u) that has the lowest rank in the permutation. The algorithm is:
[Equation provided as an image in the original publication (Figure BDA0002859124740000092); not reproduced here.]
where CR (competitive ratio) represents the competitive ratio, which reflects how good a matching is, and e is a probability of the match.
Taking the classified cloud resources in Table 1 as the resources to which alarm rules need to be pushed, these resources form U; taking the alarm rule items listed by way of example in Table 2 as V, a permutation σ is generated according to the resources and the alarm rule items, from which a match between each resource and the alarm rules is obtained. The user terminal then pushes the alarm rule items matched to each resource to the resource event center of each cloud.
TABLE 2
[Table 2 appears as images in the original publication (Figures BDA0002859124740000093 and BDA0002859124740000101) and is not reproduced here.]
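The online matching procedure described above (a random permutation of the known vertices, then matching each arriving vertex to its lowest-ranked unmatched neighbor) can be sketched as follows. The function name `ranking_match`, the `seed` parameter, and the example rule and resource names are hypothetical; the sketch assumes a one-to-one matching, which the original text does not state explicitly.

```python
import random


def ranking_match(rule_items, arrivals, seed=None):
    """Online bipartite matching with a random ranking of the known side.

    rule_items: vertices of V (alarm rule items), known in advance.
    arrivals:   iterable of (u, neighbors) pairs; each u (a classified
                resource) arrives online with the rule items it may take.
    Returns a dict mapping each matched u to its rule item.
    """
    rng = random.Random(seed)
    order = list(rule_items)
    rng.shuffle(order)                        # initialization: random permutation
    rank = {v: i for i, v in enumerate(order)}
    taken = set()                             # rule items already matched
    matching = {}
    for u, neighbors in arrivals:
        free = [v for v in neighbors if v in rank and v not in taken]
        if not free:
            continue                          # N(u) is empty: u stays unmatched
        v = min(free, key=rank.__getitem__)   # lowest-ranked free neighbor
        taken.add(v)                          # the decision is never revised
        matching[u] = v
    return matching
```

For instance, `ranking_match(["cpu_rule", "disk_rule"], [("ecs", ["cpu_rule", "disk_rule"]), ("rds", ["cpu_rule"])])` matches `ecs` to whichever rule ranks lower in the permutation, and `rds` is matched only if `cpu_rule` is still free when it arrives.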
In some embodiments, in step S600, the user terminal employs a custom alarm rule push, as shown in fig. 8.
The custom push of alarm rules is based on a matrix-decomposition recommendation method (eigendecomposition).
In matrix decomposition, eigenvalues and eigenvectors are defined as follows:
Ax = λx
where A is an n × n matrix, x is an n-dimensional vector, λ is an eigenvalue of A, and x is the eigenvector of A corresponding to the eigenvalue λ. The geometric meaning of an eigenvector is that the transformation by the square matrix A only scales x and does not change its direction.
If the n eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of matrix A can be found, a diagonal matrix Σ can be obtained, which expands into the following form:
$$\Sigma = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$
The matrix A can then be represented by the eigendecomposition:
$$A = U \Sigma U^{-1}$$
where U is the n × n matrix whose columns are the n eigenvectors, and Σ is the n × n matrix with the n eigenvalues on its main diagonal.
The n eigenvectors of U are typically normalized so that $U^{-1} = U^{T}$. In this case, the eigendecomposition of A can be further written as:
$$A = U \Sigma U^{T}$$
The above decomposition applies when the matrix is square, that is, when its numbers of rows and columns are equal; when they are not equal, Singular Value Decomposition (SVD) may be employed.
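As a worked illustration of the decomposition above, consider a 2 × 2 symmetric matrix chosen here purely for illustration (it does not come from the original text). For a symmetric matrix the normalized eigenvectors are orthogonal, so the condition that the inverse of U equals its transpose holds.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1

x_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
x_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
A x_1 = 3 x_1, \quad A x_2 = x_2

U = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
U U^{T} = I

U \Sigma U^{T}
= \tfrac{1}{2} \begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix}
               \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \tfrac{1}{2} \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}
= A
```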
In one example, table 3 exemplarily lists alarm rule items classified according to features.
TABLE 3
Alarm rule item                                | Unit     | Feature
-----------------------------------------------|----------|--------
host.cpu.total                                 | -        | cpu
Advance CPU credits                            | count    | cpu
Consumed CPU credits                           | count    | cpu
System disk total read BPS                     | Mbytes/s | disk
Disk read IOPS                                 | count/s  | disk
System disk total write BPS                    | Mbytes/s | disk
System disk write IOPS                         | count/s  | disk
Public network inbound bandwidth               | Mbytes/s | network
Public network outbound bandwidth              | Mbytes/s | network
Public network outbound bandwidth utilization  | -        | network
Intranet inbound bandwidth                     | Mbytes/s | network
Intranet outbound bandwidth                    | Mbytes/s | network
Overdrawn CPU credits                          | count    | cpu
Accumulated CPU credits                        | count    | cpu
Number of simultaneous ECS connections         | count    | network
host.cpu.idle                                  | -        | cpu
host.cpu.other                                 | -        | cpu
host.cpu.system                                | -        | cpu
host.cpu.user                                  | -        | cpu
host.cpu.iawait                                | -        | cpu
host.dist.readybytes                           | Mbytes/s | disk
A matrix A is formed from the alarm rule items classified by feature in Table 3, where λ denotes the feature (such as cpu, disk, and network). The recommendation method applies when the number of rule items corresponding to each feature is consistent; if the numbers are not consistent, the rule items are trimmed according to their utilization rate so that each feature keeps the minimum matching number of rule items.
In some embodiments, in step S700, the user terminal receives alarm information sent by the multiple cloud devices based on the pushed alarm rule, performs a cleaning operation on the received alarm information metadata, persistently stores the cleaned alarm information, and pushes the alarm information that needs to be pushed to the user, as shown in fig. 9.
Specifically, when Alibaba Cloud, Tencent Cloud, Azure, or AWS raises an alarm, it calls the alarm information receiving address provided by the user terminal and pushes the alarm information to the alarm platform of the user terminal through the HTTP/HTTPS interface configured in the rule, as shown for example in fig. 10.
After receiving the alarm information, the user terminal cleans, automatically identifies, classifies, and marks it. Specifically, the raw (uncleaned) cloud alarm information metadata enters an alarm information cleaning queue; the cloud classification alarm information is cleaned first, then the cloud resource classification alarm information, and finally the alarm level information, as shown in fig. 11. If a piece of alarm information meets the conditions for pushing, it is given a push mark.
The cleaned alarm information enters the persistence queue and is persisted in sequence.
The alarm information marked for pushing is then acquired and placed in a message push queue, and is pushed according to the configured push rule; a successful push is marked as completed, and alarm information whose push fails is placed at the tail of the queue. The alarm information to be pushed is pushed uniformly through the configured channels, such as email, SMS, WeChat, DingTalk and mini programs.
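A minimal sketch of the push queue with re-queueing on failure is shown below. The `send` callable, the `channel` field and the `max_attempts` limit are assumptions added for illustration; in particular, the text does not say how many times a failed push is retried, and a limit is introduced here only so that the loop is guaranteed to terminate.

```python
from collections import deque


def push_alarms(alarms, send, max_attempts=3):
    """Push marked alarms; a failed push goes back to the tail of the queue.

    `send` is a hypothetical callable(channel, alarm) -> bool supplied by the
    caller. `max_attempts` is an added assumption so the loop terminates.
    """
    queue = deque((alarm, 1) for alarm in alarms if alarm.get("push"))
    completed, abandoned = [], []
    while queue:
        alarm, attempt = queue.popleft()
        channel = alarm.get("channel", "email")
        if send(channel, alarm):
            completed.append(alarm)  # push succeeded: mark as completed
        elif attempt < max_attempts:
            queue.append((alarm, attempt + 1))  # retry from the tail
        else:
            abandoned.append(alarm)  # give up after max_attempts tries
    return completed, abandoned
```

Because a failed alarm is appended to the tail rather than retried immediately, one unreachable channel does not block the alarms queued behind it.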
In some embodiments, in step S700, the user terminal further performs a unified alarm information rule base optimization calculation on alarm information that has been persisted but has not yet been used in an optimization calculation, so that the calculation model makes the rule base richer, more accurate and more intelligent.
The policy gradient method is used here to optimize the alarm information rule base.
In the policy gradient method, the relation between two consecutive states is $S_{t+1} \sim P(S_{t+1} \mid S_t, a_t)$, where $S_t$ and $S_{t+1}$ are two successive states, $a_t$ is the action taken at step $t$, and $P$ is the distribution of the next state as determined by the environment. The action $a_t$ follows $a_t \sim \pi_\theta(A_t \mid S_t)$, where $\pi_\theta$ is a distribution parameterized by $\theta$, from which $a_t$ is sampled. Thus, in a given environment, the total return function of reinforcement learning is:
Figure BDA0002859124740000121
is determined entirely by $\theta$, where $E(\cdot)$ is the expectation taken over time steps 0 to $T$, $r_t$ is the state score at time $t$, and $z_t$ is a gradient at time $t$. The basic idea of the policy gradient is to optimize $R(\theta)$ directly with the gradient method.
Before the gradient method can be used, the derivative of $R(\theta)$ must be computed. Let $\tau$ be the set of all states and actions from time 0 to $T$ in one run (called a trace); then $R(\theta) = E(r(\tau))$, where the function $r$ computes the score of the trace $\tau$. We have $R(\theta) = E(r(\tau)) = \int p_\theta(\tau)\, r(\tau)\, d\tau$, and therefore,
Figure BDA0002859124740000131
where the last step holds because $p(S_{k+1} \mid a_k, S_k)$ is determined by the environment and is thus independent of $\theta$, and therefore
Figure BDA0002859124740000132
Each trace τ corresponds to a gradient of
Figure BDA0002859124740000133
where $S_k$ and $a_k$ are the state and action at each step of the trace $\tau$. Thus, given a policy $\pi_\theta$, traces can be obtained by simulation; for each trace, the return $r(\tau)$ and the <state, action> pair of each step can be obtained, which yields an estimate of the gradient at the current parameters.
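The equation images referenced above are not reproduced in this text. For orientation, the standard likelihood-ratio derivation that matches the surrounding definitions is given below. It is a reconstruction, not necessarily the exact form used in the filing; the symbol $g(\tau)$ for the per-trace gradient and the index ranges of the products are assumptions.

```latex
\begin{aligned}
p_\theta(\tau) &= p(S_0)\,\prod_{k=0}^{T}\pi_\theta(a_k \mid S_k)\,
                  \prod_{k=0}^{T-1} p(S_{k+1} \mid a_k, S_k),\\[4pt]
\nabla_\theta R(\theta)
  &= \nabla_\theta \int p_\theta(\tau)\, r(\tau)\, d\tau
   = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, r(\tau)\, d\tau\\
  &= E\big(\nabla_\theta \log p_\theta(\tau)\, r(\tau)\big)
   = E\Big(\sum_{k=0}^{T} \nabla_\theta \log \pi_\theta(a_k \mid S_k)\, r(\tau)\Big),\\[4pt]
\nabla_\theta \log p(S_{k+1} \mid a_k, S_k) &= 0,\\[4pt]
g(\tau) &= \sum_{k=0}^{T} \nabla_\theta \log \pi_\theta(a_k \mid S_k)\, r(\tau).
\end{aligned}
```

The second line uses the identity $\nabla_\theta p_\theta(\tau) = p_\theta(\tau)\,\nabla_\theta \log p_\theta(\tau)$. The terms $\log p(S_0)$ and $\log p(S_{k+1} \mid a_k, S_k)$ vanish under $\nabla_\theta$ because they do not depend on $\theta$.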
In some practical applications, the environment P may be a development environment, a test environment, a UAT environment or a production environment. Table 4 shows a test environment P in which the alarm rule items "consumed CPU credits" and "public network ingress bandwidth" serve as the state $S_t$ for optimizing the alarm information rule base.
Table 4
Consumed CPU credits | 1000 | 2000 | 2500 | 3000 | 6000
Public network ingress bandwidth | 10% | 20% | 30% | 50% | 80%
Action a_t taken (marking) | Normal | Normal | Normal | Medium risk | High risk
A trace curve is formed from the values obtained for the initial alarm rule items, from which the gradient is obtained and the optimal range for the alarm item threshold of each resource is reached, so that subsequent alarm rule item configuration becomes more reasonable and accurate.
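The per-trace gradient estimate can be illustrated with the values of Table 4. The sketch below makes several assumptions that are not in the text: the policy is a logistic function of two scaled metrics, the three marks are reduced to two classes (normal vs. at risk), and the trace reward is a constant passed in by the caller. With a fixed recorded trace and a reward of 1, the update reduces to fitting the policy to the recorded marks; a full policy-gradient loop would instead resample traces from the current policy at each iteration.

```python
import math

# States S_t taken from Table 4: (consumed CPU credits, ingress bandwidth in %).
STATES = [(1000, 10), (2000, 20), (2500, 30), (3000, 50), (6000, 80)]
# Actions a_t from Table 4, reduced to two classes for this sketch:
# 0 = normal, 1 = at risk (medium or high).
ACTIONS = [0, 0, 0, 1, 1]


def features(state):
    """Scale both metrics to roughly [0, 1] and append a bias term."""
    credits, bandwidth = state
    return [credits / 6000.0, bandwidth / 100.0, 1.0]


def risk_probability(theta, state):
    """pi_theta(a=1 | S): a logistic policy, an assumption of this sketch."""
    z = sum(t * x for t, x in zip(theta, features(state)))
    return 1.0 / (1.0 + math.exp(-z))


def trace_gradient(theta, states, actions, reward):
    """Return r(tau) * sum_k grad_theta log pi_theta(a_k | S_k) for one trace."""
    grad = [0.0] * len(theta)
    for state, action in zip(states, actions):
        p = risk_probability(theta, state)
        # For a logistic policy, grad log pi(a | S) = (a - p) * features(S).
        for i, x in enumerate(features(state)):
            grad[i] += (action - p) * x
    return [reward * g for g in grad]


def ascend(theta, states, actions, reward, rate=0.5, steps=200):
    """Plain gradient ascent on theta using the single-trace estimate."""
    for _ in range(steps):
        grad = trace_gradient(theta, states, actions, reward)
        theta = [t + rate * g for t, g in zip(theta, grad)]
    return theta
```

After calling `ascend([0.0, 0.0, 0.0], STATES, ACTIONS, 1.0)`, the point where `risk_probability` crosses 0.5 can be read as a candidate threshold between the "normal" and "at risk" rows of Table 4.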
In some embodiments, in step S800, the user terminal detects an operation instruction input by a user at a single interface in a browser application of the user terminal and, in response to the operation instruction, performs a corresponding operation based on at least one item of cloud resource information and/or alarm information. Here, the corresponding operations include, but are not limited to, filtering, displaying, charting, summarizing and exporting the cloud resource information and/or the alarm information. Fig. 10 illustrates an exemplary display interface of a user terminal, in which cloud resource information and/or alarm information can be filtered, displayed in multiple dimensions, and so on, through a selection button or a drop-down menu (corresponding to an operation instruction) in the interface.
The present embodiment also provides a computer program product which, when executed by a computer device, performs the method of any preceding claim.
The present embodiment further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 12 illustrates an exemplary system that can be used to implement the various embodiments described in this disclosure.
As shown in fig. 12, in some embodiments, the system 1000 may be configured as any of the user terminal devices in the various embodiments described herein. In some embodiments, system 1000 may include one or more computer-readable media (e.g., system memory or NVM/storage 1020) having instructions and one or more processors (e.g., processor(s) 1005) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform actions described in this disclosure.
For one embodiment, system control module 1010 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1005 and/or to any suitable device or component in communication with system control module 1010.
The system control module 1010 may include a memory controller module 1030 to provide an interface to the system memory 1015. Memory controller module 1030 may be a hardware module, a software module, and/or a firmware module.
System memory 1015 may be used to load and store data and/or instructions, for example, for system 1000. For one embodiment, system memory 1015 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, system memory 1015 may include double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 1010 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 1020 and communication interface(s) 1025.
For example, NVM/storage 1020 may be used to store data and/or instructions. NVM/storage 1020 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drive(s) (HDD(s)), one or more compact disc (CD) drive(s), and/or one or more digital versatile disc (DVD) drive(s)).
NVM/storage 1020 may include storage resources that are physically part of a device on which system 1000 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 1020 may be accessed over a network via communication interface(s) 1025.
Communication interface(s) 1025 may provide an interface for system 1000 to communicate over one or more networks and/or with any other suitable device. System 1000 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic for one or more controller(s) of the system control module 1010, e.g., memory controller module 1030. For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic for one or more controller(s) of the system control module 1010 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic for one or more controller(s) of the system control module 1010. For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic of one or more controllers of the system control module 1010 to form a system on a chip (SoC).
In various embodiments, system 1000 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 1000 may have more or fewer components and/or different architectures. For example, in some embodiments, system 1000 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, part of the present invention may be implemented as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instructions, or the computer compiles the instructions and then executes the corresponding compiled program, or the computer reads and executes the instructions, or the computer reads and installs the instructions and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or later developed, that can store computer-readable information/data for use by a computer system.
An embodiment according to the invention herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the invention as described above.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (10)

1. A method for multi-cloud resource alarm control is applied to a user terminal, and comprises the following steps:
acquiring identity verification information of a user, wherein the identity verification information matches with a plurality of public cloud account identification information corresponding to the identity verification information;
respectively sending a plurality of metadata requests to a plurality of cloud end devices based on the public cloud account identification information;
receiving a plurality of pieces of metadata sent by the plurality of cloud end devices based on the metadata requests, wherein the plurality of pieces of metadata comprise cloud resource information of corresponding public clouds;
carrying out classified management on the cloud resource information;
respectively pushing corresponding alarm rules to the plurality of cloud end devices according to the classified cloud resource information;
receiving alarm information metadata sent by the plurality of cloud end devices based on the alarm rule, wherein the alarm information metadata comprise alarm information of corresponding public clouds;
sequentially executing cleaning and persistent storage operations on the alarm information metadata;
and optimizing an alarm rule according to the stored alarm information.
2. The method of claim 1, wherein the step of sending the plurality of metadata requests to the plurality of cloud devices respectively comprises:
and respectively sending a plurality of metadata requests to the plurality of cloud end devices based on a preset time interval.
3. The method of claim 1, wherein the pushing the corresponding alarm rules to the plurality of cloud devices according to the classified cloud resource information comprises:
and determining pushed information by adopting online bipartite graph matching, wherein an alarm rule item is used as a known vertex, the classified cloud resource information is used as an online arrived vertex, and the matching quantity of the alarm rule item and the classified cloud resource information is maximized after the online vertex arrives.
4. The method of claim 1, wherein the pushing the corresponding alarm rules to the plurality of cloud devices according to the classified cloud resource information comprises:
and adopting a self-defined pushing mode based on matrix decomposition, wherein the alarm rule items are classified according to characteristics to form a matrix.
5. The method of claim 1, wherein the cleaning operation on the alarm information metadata comprises performing cleaning operations sequentially in the order of cloud classification alarm information, cloud resource classification alarm information, and alarm level information.
6. The method of claim 1, wherein alarm information meeting a preset condition is marked when the cleaning operation is performed on the alarm information metadata, and the marked alarm information is pushed to the user.
7. The method of claim 1, wherein optimizing alarm rules based on stored alarm information comprises:
and optimizing the alarm rule through the policy gradient, wherein a trace curve of the changes of the corresponding alarm rule items is formed for a specific environment according to the stored alarm information, so as to obtain an estimate of the gradient under the current parameters and thereby optimize the threshold setting of the corresponding alarm rule items.
8. The method according to claim 1, wherein in response to an operation instruction input by a user at a single interface of the user terminal, a corresponding operation is performed based on at least one item of cloud resource information and/or alarm information.
9. An apparatus for cloud resource automation operation and maintenance in a multi-cloud environment, the apparatus comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform operations according to the method of any one of claims 1 to 8.
10. A computer-readable medium storing instructions that, when executed, cause a system to perform operations according to the method of any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559612.6A CN112699002A (en) 2020-12-25 2020-12-25 Method and equipment for multi-cloud resource alarm control

Publications (1)

Publication Number Publication Date
CN112699002A true CN112699002A (en) 2021-04-23

Family

ID=75510350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559612.6A Pending CN112699002A (en) 2020-12-25 2020-12-25 Method and equipment for multi-cloud resource alarm control

Country Status (1)

Country Link
CN (1) CN112699002A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108696369A (en) * 2017-04-06 2018-10-23 华为技术有限公司 A kind of warning information processing equipment and method
CN109766247A (en) * 2018-12-19 2019-05-17 平安科技(深圳)有限公司 Alarm setting method and system based on system data monitoring
CN110879774A (en) * 2019-11-27 2020-03-13 北京天元创新科技有限公司 Network element performance data warning method and device
CN111049904A (en) * 2019-12-12 2020-04-21 上海联蔚信息科技有限公司 Method and equipment for monitoring multiple public cloud resources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Qingxiang et al.: "Design and Implementation of a Storm-Based Real-Time Alarm Service", Information Technology, no. 12, 25 December 2016 (2016-12-25), pages 162 - 166 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination