CN117149587A

CN117149587A - Monitoring ledger management method, device, storage medium and equipment

Info

Publication number: CN117149587A
Application number: CN202311095268.3A
Authority: CN
Inventors: 刘昌峻; 王洋; 赵林海; 黎明
Original assignee: Youwei Technology Shenzhen Co ltd; China Merchants Fund Management Co ltd
Current assignee: Youwei Technology Shenzhen Co ltd; China Merchants Fund Management Co ltd
Priority date: 2023-08-28
Filing date: 2023-08-28
Publication date: 2023-12-01
Anticipated expiration: 2043-08-28
Also published as: CN117149587B

Abstract

The invention discloses a monitoring ledger management method, a device, a storage medium and equipment, wherein the method comprises the following steps: creating a monitoring template corresponding to various monitored objects; sequentially deploying the monitoring templates in a development environment, a testing environment and a production environment to form various versions of the monitoring templates in various environments; deploying the monitoring templates of the corresponding versions in the monitored object, and performing alarm simulation; assigning unique event feature IDs to the same type of alarm event of each monitored object, sending the alarm event to a notifier after an alarm occurs, and predicting corresponding alarm scenes according to the combination of different event feature IDs of different monitored objects; and designing a corresponding treatment scheme of the standardized flow according to different alarm scenes, and automatically executing the corresponding treatment scheme when an alarm occurs. According to the invention, the fine management of the monitoring object can be realized, and the cost of an enterprise in the aspect of monitoring management is reduced.

Description

Monitoring ledger management method, device, storage medium and equipment

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for managing a monitoring ledger, a storage medium, and a device.

Background

In recent years, with the rise of cloud native, IT systems are monitored from the infrastructure layer through middleware up to the business system layer so that problems can be discovered in advance or as early as possible. The range, types and indexes of the objects to be monitored keep growing, and the requirements keep getting more complex. Existing monitoring systems have the following five design problems:

First, when a monitoring system is configured for a type of object, it is designed around monitoring indexes rather than monitored objects, so the indexes of a monitored object must be configured one by one, which is inefficient. For example, to configure resource monitoring for a cloud host with four indexes (CPU, memory, hard disk capacity and IO), each index requires filling in a monitoring item name, selecting the range of monitored object instances, setting an alarm threshold, setting the notification recipients, and so on. The process is repeated four times, and N times if N indexes of the object are monitored, which is inefficient and very unfriendly to whoever performs the configuration.

Second, the index types and thresholds currently configured on a monitored object lack version management. When the monitoring indexes and thresholds of an already configured monitored object instance need to be adjusted, the instance must first be removed from the original monitoring policy and then configured with new monitoring items and thresholds. The whole process is cumbersome, and the reason for the initial configuration, the reason for each adjustment, the extent of the adjustment and whether the adjustment was reasonable are not recorded, which raises management cost.

Third, current monitoring systems have no monitoring pre-notification mechanism. In practice, because of problems in the alarm notification configuration, the person who should receive an alarm notification often does not receive it, while people unrelated to the alarm event do, which delays alarm handling and may cause larger problems. The root cause is inaccurate configuration data. If the monitoring system had a pre-notification function, that is, if alarms were sent to the relevant personnel in simulated form before any alarm actually occurred and those personnel confirmed whether the alarm concerned them, the result could be fed back into the configuration data and improve notification accuracy when a real alarm occurs.

Fourth, an event feature ID management mechanism is lacking. Current monitoring platforms assign an event code to each alarm event, but every event code is different: the code is the ID of a single event and only represents one occurrence. The characteristics of the event are not abstracted, so alarm scene analysis cannot be performed on a class of events sharing the same feature, which greatly limits the analytical value of each event occurrence.

Fifth, an automated alarm treatment mechanism based on a standardized flow is lacking. In current practice, because the standardized operation flow for alarm events is not automated, the relevant personnel, after receiving an alarm notification, usually have to carry out a series of standardized checks by hand on the systems they are responsible for before they can analyze the alarm problem further.

Disclosure of Invention

In view of the above technical problems, the invention provides a monitoring ledger management method, a device, a storage medium and equipment, which aim to solve the following problems of current monitoring systems: cumbersome and inefficient index configuration for monitored objects, caused by the lack of monitoring templates oriented to monitored objects; rising monitoring configuration management cost, caused by the lack of version management for the monitoring items and thresholds configured on monitored objects; inaccurate alarm notification, and therefore delayed handling of alarm events, caused by the lack of an advance alarm dial-testing mechanism; impaired root cause analysis, caused by the lack of event feature management, so that event features cannot be fully used for alarm scene analysis; and increased labor cost and longer alarm locating time, caused by the lack of a mechanism for automated treatment driven by event features.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to an aspect of the present invention, there is provided a monitoring ledger management method, the method including:

creating monitoring templates corresponding to various monitored objects, wherein the monitoring template comprises at least one of a first monitoring item set, a first creator, a first creation time, a first creation reason, a first environment type, a first deployment coverage rate, a parent template, first adjustment content and a first notifier, the first monitoring item set comprises monitoring policies related to the monitored object and corresponding thresholds, and the first adjustment content is the content adjusted relative to the parent template;

sequentially deploying the monitoring templates in a development environment, a testing environment and a production environment to form versions of the monitoring templates in the environments;

deploying the monitoring templates of the corresponding versions in the monitored object, and performing alarm simulation;

assigning unique event feature IDs to the same type of alarm event of each monitored object, sending the alarm event to a notifier after an alarm occurs, and predicting corresponding alarm scenes according to the combination of different event feature IDs of different monitored objects;

and designing a treatment scheme with a corresponding standardized flow for each alarm scene, and automatically executing the corresponding treatment scheme when an alarm occurs.

Further, the monitored object at least comprises one of a host, a middleware, a network, a database and an application, and the first monitoring item set at least comprises one of monitoring items of the host, the middleware, the network, the database and the application.

Further, the initial version of the monitoring template in each environment other than the development environment is summarized from the versions of the preceding (upper-level) environment, and any non-initial version of the monitoring template in an environment is obtained by modifying any upper-level version.

Further, the monitoring template of any version records at least one of a second creator, a second creation reason, a second creation time, a parent version, a second monitoring item set, second adjustment content, a second deployment coverage rate, a second notifier and a second environment type, and the second monitoring item set comprises at least one of the monitoring items of a host, middleware, a network, a database and an application.

Further, after the monitoring templates are deployed in the corresponding monitored objects, an alarm list that the notifiers can query is formed for the alarms in the monitoring templates. During the alarm simulation, any monitoring item is adjusted to exceed its corresponding threshold so as to verify whether each alarm is correctly pushed to the corresponding notifier, and when the notifier or the alarm item is found to be wrong, the erroneous information is corrected in the CMDB.

Further, the combinations of different event feature IDs are ordered in time, and the relationship between the event feature IDs in a combination comprises at least one of "and" and "or".

Further, after the corresponding treatment scheme has been automatically executed, any alarm that remains unresolved is handled manually.

According to a second aspect of the present disclosure, there is provided a monitoring ledger administration apparatus comprising:

the monitoring template design module is used for creating monitoring templates corresponding to various monitored objects, wherein the monitoring template comprises at least one of a first monitoring item set, a first creator, a first creation time, a first creation reason, a first environment type, a first deployment coverage rate, a parent template, first adjustment content and a first notifier, the first monitoring item set comprises monitoring policies related to the monitored object and corresponding thresholds, and the first adjustment content is the content adjusted relative to the parent template;

the monitoring template DevOps management module is used for sequentially deploying the monitoring templates in a development environment, a testing environment and a production environment to form versions of the monitoring templates in the various environments;

the monitoring pre-alarm notification management module is used for deploying the monitoring templates of the corresponding versions in the monitored object and performing alarm simulation;

the event feature ID management module is used for assigning a unique event feature ID to the same type of alarm event of each monitored object, sending the event feature ID to a notifier after an alarm occurs, and predicting a corresponding alarm scene according to the combination of different event feature IDs of different monitored objects;

and the alarm automatic treatment module is used for designing a corresponding treatment scheme according to different alarm scenes, and automatically executing the corresponding treatment scheme when an alarm occurs.

According to a third aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements a monitoring ledger administration method as described above.

According to a fourth aspect of the present disclosure, there is provided a monitoring ledger administration apparatus comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the monitoring ledger administration method described above.

The technical scheme of the present disclosure has the following beneficial effects:

1. The monitoring ledger of the present disclosure realizes fine-grained operation management that runs from monitored-object templates, through version DevOps management, to monitoring event feature combinations and automated treatment of monitoring items, and solves the problem of coarse, fragmented management that arises because current monitoring focuses only on the monitoring alarm itself;

2. The efficiency, coverage statistics and traceability of monitoring configuration for monitored objects are improved. Configuring monitoring through a template for each type of monitored object greatly improves efficiency; the coverage rate of each monitoring template can be tracked; for any monitored object, the templates covering it can be looked up in reverse; and the reason each monitoring template was configured can be traced;

3. Accurate and standardized management of monitoring templates is supported. The monitoring template DevOps management method meets the individual and precise requirements that development, testing and operation and maintenance personnel place on monitoring templates. Templates are managed according to the DevOps concept and are deployed progressively from development to testing to production, so that their deployment and use in the production environment are more standardized;

4. The accuracy of monitoring alarm notification is improved and CMDB data are optimized. With the monitoring pre-alarm notification mechanism, alarm recipients are notified by simulating a threshold breach before any alarm event actually occurs, so that whether the alarm is sent to the right target is found out in advance, which ensures accurate notification when an alarm event really occurs. Meanwhile, if the notification is wrong, the data in the CMDB can be corrected at the same time, which feeds back into the accuracy of the CMDB data;

5. Problem locating and root cause analysis when an alarm occurs are improved. With the alarm event feature ID management method of the present disclosure, problems or fault root causes can be associated, based on historical experience, with a group of alarm event feature ID combinations. When a specific combination of alarm event feature IDs occurs, it corresponds to a specific fault scene, so the corresponding problem or fault root cause can be found quickly and analyzed against a historical experience library;

6. Alarm and fault handling efficiency is improved. With the alarm automation treatment method based on a standardized flow, when an alarm event occurs or a fault scene corresponding to an event feature ID combination is triggered, the fault can self-heal automatically, or operations that used to be fixed manual steps are executed automatically, which improves the efficiency of handling alarms and faults.

Drawings

FIG. 1 is a flow chart of a method of monitoring ledger administration in an embodiment of the present disclosure;

FIG. 2 is a monitoring logic diagram of a monitoring ledger in an embodiment of the present disclosure;

FIG. 3 is an exemplary presentation of different monitoring templates in an embodiment of the present description;

FIG. 4 is a DevOps flow chart of a monitoring template in an embodiment of the present disclosure;

FIG. 5 is an iterative exemplary presentation of two versions of a monitoring template in an embodiment of the present disclosure;

FIG. 6 is a diagram showing information recorded after a version of an exemplary newly created monitoring template in an embodiment of the present disclosure;

FIG. 7 is a version list of a monitoring template in an embodiment of the present disclosure;

FIG. 8 is a block diagram of a monitoring ledger management apparatus in an embodiment of the present specification;

FIG. 9 is a terminal device for implementing a method for managing a monitoring ledger in an embodiment of the present disclosure;

FIG. 10 is a computer readable storage medium storing a monitoring ledger administration method according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are only schematic illustrations of the present disclosure. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

As shown in FIG. 1, an embodiment of the present disclosure provides a monitoring ledger management method. The execution subject of the method may be a terminal device, and the terminal device may be a personal computer. The method covers creating a monitoring template for each type of monitored object, DevOps management of template versions, monitoring pre-alarm notification management, mapping combinations of monitoring event feature IDs to alarm scenes, and automatically executing strategies based on those alarm scenes. It specifically comprises the following steps S101 to S105:

In step S101, monitoring templates corresponding to various monitored objects are created, where the monitoring template includes at least one of a first monitoring item set, a first creator, a first creation time, a first creation reason, a first environment type, a first deployment coverage rate, a parent template, first adjustment content and a first notifier; the first monitoring item set includes monitoring policies related to the monitored object and corresponding thresholds, and the first adjustment content is the content adjusted relative to the parent template.

Specifically, each monitored object has a number of monitoring indexes (monitoring items) that reflect its own characteristics. For example, the operating system level has monitoring indexes such as CPU utilization, memory utilization and disk space utilization, and a network switch has monitoring indexes such as total network traffic and per-port status. The monitored objects may include, but are not limited to, hosts, middleware, networks, databases and applications.

For a host, which may comprise an operating system and other software applications, the monitoring items may be: CPU utilization (kernel mode, user mode, overall), memory utilization, actual memory utilization, remaining memory, memory usage, swap partition utilization, swap partition usage, disk space utilization (overall and per mount point), disk space usage (overall and per mount point), total network traffic (ingress, egress, overall), per-port traffic, per-port status, file handle usage, IOPS (read and write), IO throughput (read and write), ioutil, iowait, load 1, load 5, load 15, and the like.

For middleware, the monitoring items may include: JVM GC frequency, JVM memory usage, JVM heap memory usage, JVM non-heap memory usage, middleware queue length, middleware blocking, middleware active-standby switchover, middleware big-key queries, middleware slow request processing, middleware connection status and middleware availability.

For a network, the monitoring items may include: connectivity, latency, packet loss rate, network traffic (overall and per port), number of connections and device load.

For a database, the monitoring items may include: tablespace usage, slow SQL, number of locks and archive space usage.

For an application, the monitoring items may include: application availability, application interface return value, application response time, application interface return content, and the like.

Because monitoring items cannot be listed exhaustively, any monitoring item (monitoring index) of any monitored object is regarded as a monitoring item within the meaning of this embodiment.

The object-oriented monitoring template designed in this embodiment has the following attributes: first, a set of monitoring policies related to the monitored object, comprising the corresponding monitoring items, that is, the monitoring indexes; second, the creator of the template; third, the reason (background) for creating the template; fourth, the environment type to which the template applies; fifth, the actual deployment coverage rate of the template; sixth, the parent template of the template; seventh, the content adjusted in the template compared with the parent template; eighth, the notifiers of the monitoring template; and so on. These attributes can be adjusted as needed, for example added or deleted. Once monitoring templates have been set up for each class of object, a user only needs to select the appropriate monitoring template for each monitored object, which greatly improves monitoring configuration efficiency and reduces labor cost.
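The attributes listed above can be pictured as a single record that is attached to a monitored object in one step, instead of configuring each index separately. The following is only a minimal sketch under assumptions: the class names, field names, item names and threshold values are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class MonitoringItem:
    """One monitoring index and the threshold above which it alarms."""
    name: str
    threshold: float


@dataclass
class MonitoringTemplate:
    """Hypothetical container for the template attributes listed above."""
    items: List[MonitoringItem]          # monitoring policies (items + thresholds)
    creator: str                         # who created the template
    reason: str                          # why it was created (background)
    environment: str                     # applicable environment type
    notifiers: List[str]                 # who is notified when an item alarms
    parent: Optional["MonitoringTemplate"] = None  # parent template, if any
    adjustment: str = ""                 # what changed compared with the parent
    deployed_on: Set[str] = field(default_factory=set)

    def apply_to(self, instance_id: str) -> None:
        """Attach the whole template to one monitored object instance."""
        self.deployed_on.add(instance_id)

    def coverage(self, all_instances: Set[str]) -> float:
        """Share of the given instances that this template covers."""
        if not all_instances:
            return 0.0
        return len(self.deployed_on & all_instances) / len(all_instances)


# Illustrative values only: four host indexes configured once, as a unit.
host_template = MonitoringTemplate(
    items=[
        MonitoringItem("cpu_utilization", 90.0),
        MonitoringItem("memory_utilization", 85.0),
        MonitoringItem("disk_space_utilization", 80.0),
        MonitoringItem("iowait", 30.0),
    ],
    creator="ops-team",
    reason="baseline for general-purpose cloud hosts",
    environment="dev",
    notifiers=["oncall-a"],
)

hosts = {"host-1", "host-2", "host-3", "host-4"}
for host in ("host-1", "host-2"):
    host_template.apply_to(host)

print(host_template.coverage(hosts))  # 0.5
```

In this sketch the deployment coverage rate is derived from the set of instances the template is attached to, which is one possible way of keeping the coverage attribute traceable.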

In step S102, the monitoring templates are deployed sequentially in the development environment, the test environment and the production environment to form the versions of the monitoring templates in each environment. The initial version of the monitoring template in each environment other than the development environment is summarized from the versions of the preceding (upper-level) environment, and any non-initial version of the monitoring template in an environment is obtained by modifying any upper-level version. The monitoring template of any version records at least one of a second creator, a second creation reason, a second creation time, a parent version, a second monitoring item set, second adjustment content, a second deployment coverage rate, a second notifier and a second environment type, and the second monitoring item set comprises at least one of the monitoring items of a host, middleware, a network, a database and an application.

The monitoring templates are deployed in stages. After a monitoring template is created, it must go through deployment to the development environment, the test environment and the production environment in turn, and no stage may be skipped: a template cannot be deployed directly to the test environment or the production environment, and after deployment to the development environment it cannot go to the production environment without first entering the test environment. This ensures that the monitoring template is properly optimized. The names development environment, test environment and production environment refer to stages of software development and may differ from one setting to another. For example, the development environment is also called the development stage, development area, development server, development instance or development mode; the test environment is also called the test stage, test area, test server, test instance, QA environment, SIT environment, and so on; and the production environment is also called the production stage, production area, production server, production instance, formal environment, PROD environment, and so on. The scope of the invention should not be limited by these names.
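The no-skipping rule described above can be expressed as a simple guard. This is a sketch under assumptions: the stage labels and the function name `can_deploy` are hypothetical, and the disclosure does not specify how the check is implemented.

```python
from typing import List

# Fixed promotion order: development, then test, then production.
STAGES = ["dev", "test", "prod"]


def can_deploy(history: List[str], target: str) -> bool:
    """Return True only if every stage before `target` has already been deployed."""
    if target not in STAGES:
        raise ValueError(f"unknown stage: {target}")
    required = STAGES[:STAGES.index(target)]
    return all(stage in history for stage in required)


print(can_deploy([], "dev"))               # True
print(can_deploy([], "test"))              # False: development was skipped
print(can_deploy(["dev"], "prod"))         # False: test was skipped
print(can_deploy(["dev", "test"], "prod")) # True
```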

Each deployment of the monitoring template in an environment is called an instantiation of the template. During instantiation the thresholds of the policies in the template are initialized, forming the initial version of the monitoring template in that environment. The initial versions are dev-0 for the development environment, test-0 for the test environment and prod-0 for the production environment. In the development and test environments, developers and testers can create new versions based on the initial version or on any existing version in that environment's version list. A newly created version is added to the environment's version list, where the developers and testers concerned can use it directly or modify it again. Each version records its second creator, second creation reason, second creation time, parent version, second monitoring item set, second adjustment content compared with the parent version, and so on, and each version of the monitoring template can also record its coverage in actual deployment. For example, developer A may generate a new version dev-1 based on dev-0, tester A may generate a new version test-1 based on test-0, developer B may generate a new version dev-2 based on dev-0 and may also derive a new version dev-3 based on dev-1, and tester B may generate a new version test-2 based on test-1.

The initial version dev-0 of the development environment is generally produced by operation and maintenance personnel from their practical working experience. The initial version test-0 of the test environment is formed by operation and maintenance personnel after the versions in the development environment have run for a period of time, by summarizing and extracting the strengths of each dev-n version, and is deployed to the test environment by the operation and maintenance personnel. Likewise, the initial version prod-0 of the production environment is formed after the versions in the test environment have run for a period of time, by summarizing and extracting the strengths of each test-n version, and is deployed to the production environment by the operation and maintenance personnel.

In summary, a non-initial version of the monitoring template in any environment is obtained by modifying any upper-level (earlier) version; for example, test-3 may be modified from any one of test-2, test-1 and test-0 in the same environment.
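The version list of one environment can be sketched as a small tree in which every non-initial version points at its parent version, so that the reason for each adjustment stays traceable. The class and field names below are hypothetical, and the naming scheme (environment name plus a running number) simply mirrors the dev-0, dev-1 examples in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TemplateVersion:
    name: str
    creator: str
    reason: str
    parent: Optional[str]   # None only for the initial version
    adjustment: str         # what changed compared with the parent version


class VersionList:
    """Hypothetical version list of one environment (for example "dev")."""

    def __init__(self, env: str) -> None:
        self.env = env
        self.versions: Dict[str, TemplateVersion] = {}

    def create(self, creator: str, reason: str, adjustment: str = "",
               parent: Optional[str] = None) -> TemplateVersion:
        if parent is None and self.versions:
            raise ValueError("only the initial version may have no parent")
        if parent is not None and parent not in self.versions:
            raise KeyError(f"unknown parent version: {parent}")
        name = f"{self.env}-{len(self.versions)}"
        version = TemplateVersion(name, creator, reason, parent, adjustment)
        self.versions[name] = version
        return version

    def lineage(self, name: str) -> List[str]:
        """Walk parent links back to the initial version."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent
        return chain


dev = VersionList("dev")
dev.create("ops", "initial version from operating experience")           # dev-0
dev.create("developer A", "tighter CPU threshold", "cpu 90 -> 80", "dev-0")  # dev-1
dev.create("developer B", "add iowait item", "+iowait", "dev-0")          # dev-2
dev.create("developer B", "relax disk threshold", "disk 80 -> 85", "dev-1")  # dev-3

print(dev.lineage("dev-3"))  # ['dev-3', 'dev-1', 'dev-0']
```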

In step S103, the monitoring templates of the corresponding versions are deployed in the monitored object, and alarm simulation is performed.

Specifically, after the monitoring templates are deployed in the corresponding monitored objects, an alarm list that the notifiers can query is formed for the alarms in the monitoring templates, so that each notifier can look up which alarm information they will receive. Alarm simulation is then performed: any monitoring item is adjusted to exceed its corresponding threshold in order to verify whether each alarm is correctly pushed to the corresponding notifier. The notifier confirms whether the received alarm information concerns them, which avoids the situation where an alarm that should be received is not received. When the notifier or the alarm item is found to be wrong, the erroneous information is corrected in the CMDB.

In addition, CMDB is an abbreviation for Configuration Management Database, a database for managing and tracking IT infrastructure and its associated configuration information. The CMDB records details of an organization's hardware, software, network devices, services and other configuration items, including their attributes, relationships and states.
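The alarm simulation of step S103 can be sketched as a dry run: each monitoring item is pushed past its threshold, the recipients recorded in the CMDB are compared with the people who confirm that the alarm concerns them, and any difference is reported as a candidate correction for the CMDB. The function names, the dictionary layout and the sample names are assumptions made for illustration only.

```python
from typing import Dict, List


def alarm_fires(value: float, threshold: float) -> bool:
    """An item alarms when the observed value exceeds its threshold."""
    return value > threshold


def simulate_alarms(thresholds: Dict[str, float],
                    cmdb_routing: Dict[str, List[str]],
                    confirmed_owners: Dict[str, List[str]]) -> Dict[str, dict]:
    """Return, per item, who is missing from or wrongly present in the routing."""
    mismatches: Dict[str, dict] = {}
    for item, threshold in thresholds.items():
        simulated_value = threshold + 1  # push the item just past its threshold
        if not alarm_fires(simulated_value, threshold):
            continue
        notified = set(cmdb_routing.get(item, []))
        expected = set(confirmed_owners.get(item, []))
        if notified != expected:
            mismatches[item] = {
                "missing": sorted(expected - notified),
                "unexpected": sorted(notified - expected),
            }
    return mismatches


thresholds = {"cpu_utilization": 90.0, "disk_space_utilization": 85.0}
cmdb_routing = {"cpu_utilization": ["alice"], "disk_space_utilization": ["bob"]}
confirmed_owners = {"cpu_utilization": ["alice"], "disk_space_utilization": ["carol"]}

corrections = simulate_alarms(thresholds, cmdb_routing, confirmed_owners)
print(corrections)
# {'disk_space_utilization': {'missing': ['carol'], 'unexpected': ['bob']}}
```

Here the returned dictionary plays the role of the feedback to the CMDB: the disk space alarm should be routed to carol rather than bob.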

In step S104, a unique event feature ID is assigned to the same type of alarm event of each monitored object; after an alarm occurs, it is sent to a notifier, and the corresponding alarm scene is predicted from the combination of different event feature IDs of different monitored objects. The combinations of different event feature IDs are ordered in time, and the relationship between the event feature IDs in a combination includes at least one of "and" and "or".

By way of explanation, when a cloud host raises a disk space alarm several times, the event feature ID is the same every time, because it identifies the type of alarm rather than the individual occurrence. In real production settings, because business systems and infrastructure differ from company to company, after-the-fact analysis of an alarm or fault often reveals that certain combinations of events occurred beforehand. These events follow one another in time and are related by "and" or "or" in space. The corresponding alarm or fault scene can therefore largely be predicted from the event feature ID combination, and a corresponding treatment scheme can be designed for it, which enables fault root cause analysis and fault recovery schemes based on event feature ID combinations.
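One way to read the time-ordered "and"/"or" combinations described above is as a sequence of steps, where each step is a group of event feature IDs that must all occur ("and") or of which any one suffices ("or"), and the steps must be satisfied in chronological order. The sketch below follows that reading; the representation, the function name and the feature IDs are hypothetical and are not taken from the disclosure.

```python
from typing import List, Set, Tuple

# A step is ("and" | "or", set of event feature IDs).
Step = Tuple[str, Set[str]]


def matches(scenario: List[Step], events: List[Tuple[float, str]]) -> bool:
    """Check whether timestamped events satisfy the scenario's steps in time order."""
    ordered = [feature_id for _, feature_id in sorted(events)]
    pos = 0
    for mode, ids in scenario:
        pending = set(ids)
        satisfied = False
        while pos < len(ordered) and not satisfied:
            feature_id = ordered[pos]
            pos += 1
            if feature_id in pending:
                pending.discard(feature_id)
                # "or": one hit is enough; "and": every ID must have been seen.
                satisfied = (mode == "or") or not pending
        if not satisfied:
            return False
    return True


# Hypothetical scene: storage pressure first, then slow SQL and a slow application.
storage_exhaustion = [
    ("or", {"HOST_DISK_FULL", "DB_ARCHIVE_SPACE_HIGH"}),
    ("and", {"DB_SLOW_SQL", "APP_RESPONSE_SLOW"}),
]

in_order = [(1.0, "HOST_DISK_FULL"), (2.0, "APP_RESPONSE_SLOW"), (3.0, "DB_SLOW_SQL")]
reversed_order = [(1.0, "DB_SLOW_SQL"), (2.0, "APP_RESPONSE_SLOW"), (3.0, "HOST_DISK_FULL")]

print(matches(storage_exhaustion, in_order))        # True
print(matches(storage_exhaustion, reversed_order))  # False
```

Because the feature ID names the type of alarm, repeated occurrences of the same alarm map to the same ID, which is what allows a scene definition like the one above to be reused across incidents.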

In step S105, a treatment scheme with a corresponding standardized flow is designed for each alarm scene; when an alarm occurs, the corresponding treatment scheme is executed automatically, and any alarm that remains unresolved after the treatment scheme has run is handled manually.

When an alarm event occurs, or a fault scene corresponding to an event feature ID combination is triggered, a pre-designed automated flow is invoked to execute the corresponding standardized treatment scheme. For some fault scenes the automated flow achieves fault self-healing; for others, a process that originally required manual troubleshooting becomes a standardized action, which improves troubleshooting efficiency and reduces labor cost. If no automated treatment scheme is configured for an alarm or fault, the notifier is notified to handle it; likewise, if a treatment scheme has been executed but the fault is not resolved, the notifier is notified to handle it.
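The fallback logic just described (run the automated scheme if one exists, otherwise or on failure hand over to a person) can be sketched as a small dispatcher. The function name, the scene names and the idea of a treatment scheme returning True when the fault is resolved are assumptions made for this illustration.

```python
from typing import Callable, Dict, List


def handle_alarm(scene: str,
                 schemes: Dict[str, Callable[[], bool]],
                 notify: Callable[[str], None]) -> str:
    """Run the treatment scheme for a scene; fall back to manual handling."""
    scheme = schemes.get(scene)
    if scheme is None:
        notify(f"{scene}: no automated treatment scheme configured")
        return "manual"
    if scheme():  # a scheme returns True when the fault is resolved
        return "self-healed"
    notify(f"{scene}: treatment scheme executed but fault not resolved")
    return "manual"


messages: List[str] = []

# Hypothetical schemes: one that heals the fault and one that does not.
schemes = {
    "disk_full": lambda: True,         # e.g. log cleanup succeeded
    "middleware_down": lambda: False,  # e.g. restart did not help
}

print(handle_alarm("disk_full", schemes, messages.append))        # self-healed
print(handle_alarm("middleware_down", schemes, messages.append))  # manual
print(handle_alarm("unknown_scene", schemes, messages.append))    # manual
print(len(messages))                                              # 2
```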

Hereinafter, with respect to the above disclosure, explanation will be made in connection with examples.

As shown in fig. 2, the monitoring ledger covers multiple types of monitored objects, such as cloud hosts, applications, middleware, databases and switches. This embodiment takes the cloud host and the application as examples. Because cloud hosts come in different types, each type has its own monitoring templates. Each monitoring template follows the DevOps management method and carries the prediction conditions of the template, the combination conditions of the cloud host monitoring event feature IDs, the self-healing and automatic investigation strategies, and so on.

When a monitoring template is selected, different cloud host monitoring templates can be generated for different types of cloud hosts and for the different monitoring requirements placed on them. As shown in fig. 3, N monitoring templates may be generated for the cloud host as a monitored object, each meeting a different type of monitoring requirement. With these templates in place, monitoring a newly built cloud host only requires selecting the corresponding template, which greatly improves the efficiency of configuring monitoring for cloud hosts.

Monitoring templates are managed with DevOps. In general, DevOps is a methodology for software development and operations that aims to achieve rapid, reliable software delivery and continuous deployment by improving collaboration and communication between development teams and operations teams. The core idea of DevOps is to treat development and operations as a whole, emphasizing collaboration, automation and continuous improvement. In this embodiment, the monitoring templates are instantiated through DevOps. As shown in fig. 4, operation and maintenance personnel deploy an initial version dev-0 of the monitoring template in the development environment of the cloud host, and developers create further versions from dev-0, or from non-initial versions, as the actual situation requires. Drawing on the strengths of the versions in the development environment, operation and maintenance personnel then form an initial version test-0 in the test environment of the cloud host, consistent with the development environment, and testers create further versions in the test environment. Finally, based on the strengths of the versions in the test environment, operation and maintenance personnel form an initial version prod-0 of the monitoring template in the production environment and derive from prod-0 the versions suited to each situation.
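The promotion of template versions from development to testing to production can be sketched as follows. The `TemplateLine` class, the `env-N` naming helper and the rule that test-0 and prod-0 must derive from a version of the preceding environment are assumptions drawn from the dev-0, test-0, prod-0 example; other rules are possible.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ENVIRONMENTS = ["dev", "test", "prod"]  # promotion order described in the text


@dataclass
class Version:
    name: str              # e.g. "dev-0"
    env: str
    parent: Optional[str]  # name of the version this one derives from


class TemplateLine:
    """All versions of one monitoring template across environments."""

    def __init__(self) -> None:
        self.versions: Dict[str, Version] = {}
        self._counters: Dict[str, int] = {env: 0 for env in ENVIRONMENTS}

    def create(self, env: str, parent: Optional[str] = None) -> Version:
        if parent is not None and parent not in self.versions:
            raise KeyError(parent)
        n = self._counters[env]
        if n == 0 and env != "dev":
            # Assumed rule: the initial version of test/prod is formed
            # from a version in the preceding environment.
            prev = ENVIRONMENTS[ENVIRONMENTS.index(env) - 1]
            if parent is None or self.versions[parent].env != prev:
                raise ValueError(f"{env}-0 must derive from a {prev} version")
        version = Version(f"{env}-{n}", env, parent)
        self.versions[version.name] = version
        self._counters[env] = n + 1
        return version
```

For example, `create("dev")` yields dev-0, `create("dev", parent="dev-0")` yields dev-1, and `create("test", parent="dev-1")` yields test-0, while an attempt to form prod-0 directly from a dev version is rejected.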

In order to promote monitoring templates progressively from development to testing to production, and so make the deployment of monitoring templates in the production environment more standardized, the iteration of every version is recorded in each environment. As shown in fig. 5, the initial version dev-0 records the creation time, creator, creation reason, monitoring item set (policy set), parent template (parent version), alarm notifier and other information, where the monitoring item set covers CPU usage, memory usage, disk usage and IO port traffic. When another staff member modifies the initial version dev-0, the corresponding creation time, creator, creation reason, monitoring item set (policy set), parent template (parent version), alarm notifier and other information are recorded again, and the modifications are presented as the second adjustment content, for example adjusting the CPU threshold, removing IO monitoring, and adding monitoring of network traffic and file handle usage. Every version of the monitoring template is thus clearly documented, which makes the monitoring ledger easier for staff to manage.
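One way to derive the adjustment content from a version and its parent is sketched below. The `TemplateVersion` fields are a subset of those listed above, and the item names and threshold values used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TemplateVersion:
    name: str
    creator: str
    reason: str
    items: Dict[str, float]  # monitoring item -> alarm threshold (assumed shape)
    notifiers: List[str]
    parent: Optional["TemplateVersion"] = None

    def adjustments(self) -> List[str]:
        """Describe how this version's item set differs from its parent's."""
        if self.parent is None:
            return []
        old, new = self.parent.items, self.items
        notes = [f"remove {k}" for k in old if k not in new]
        notes += [f"add {k}" for k in new if k not in old]
        notes += [f"adjust {k} threshold {old[k]} -> {new[k]}"
                  for k in new if k in old and old[k] != new[k]]
        return notes
```

If dev-0 monitors `cpu`, `memory`, `disk` and `io`, and dev-1 lowers the `cpu` threshold, drops `io` and adds `network`, then `adjustments()` on dev-1 lists one removal, one addition and one threshold change, which mirrors the kind of adjustment content recorded in fig. 5.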

In addition, when the monitoring template is designed, the content recorded in the monitoring template is the first content (i.e., the first monitoring item set, the first creator, … …, the first notifier) designed in step S101, such as the initial version of the development environment. When the version is updated iteratively, the content recorded therein is changed, so that the content is the second content (i.e., the second monitoring item set, the second creator, … …, the second notifier) in step S102.

Fig. 6 simplifies fig. 5 and shows the recorded content of the monitoring template for two successive versions of the iteration.

As shown in fig. 7, each newly built monitoring template of a cloud host is stored in the monitoring template version list of the cloud host, so that a user can select, from the existing templates, one that meets the requirement and is under standardized management, and deploy it quickly. This improves both the efficiency and the standardization of monitoring template deployment.

For monitoring pre-alarm notification management, assume that cloud host A is deployed with the dev-0 monitoring template. The CPU utilization can be made to appear to exceed 90% by technical means, and the recipients of the resulting alarm notification are then checked. For example, if Zhang San and Li Si should be notified but Zhang San and Wang Wu are actually notified, a notification error has occurred. Investigation then shows that the cloud host owner recorded in the CMDB is wrong, so the CMDB record can be corrected, and the simulation thereby feeds back into the accuracy of the CMDB data.
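The comparison made during alarm simulation reduces to two set differences, sketched here. The function name and the result keys are illustrative assumptions, and the recipient names in the usage note are placeholders.

```python
from typing import Dict, Set


def check_notification(expected: Set[str], actual: Set[str]) -> Dict[str, Set[str]]:
    """Compare who should have been notified with who actually was."""
    return {
        "missing": expected - actual,     # should have been notified, were not
        "unexpected": actual - expected,  # were notified, should not have been
    }
```

Calling `check_notification({"zhang", "li"}, {"zhang", "wang"})` reports `li` as missing and `wang` as unexpected; either set being non-empty signals that the owner record behind the notification, for instance in the CMDB, should be reviewed.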

For event feature ID combinations, assume that the event feature ID for a cloud host agent abnormality (which indicates that the cloud host is hung, with some components or functions unavailable) is ID1, and that the event feature ID for the cloud host responding to ping is ID2. When the ID1 and ID2 events occur at the same time, the cloud host is reachable over the network but a problem inside the host is causing the agent abnormality, so in this scene network problems can be ruled out and the fault can be located directly inside the cloud host. Similarly, assume that service A is dial-tested (probed) from inside the cloud host, with ID3 as the event ID for a normal result and ID5 for an abnormal result, and that service A is also dial-tested from outside the cloud host, with ID4 for a normal result and ID6 for an abnormal result. When ID3 and ID6 occur at the same time, the service itself is healthy but cannot be reached from outside the cloud host, possibly because of a network problem.
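The two example scenes above can be written as a small rule table. The table layout, the `predict_scenes` function and the wording of each diagnosis are illustrative assumptions; only the ID pairs and their meaning come from the text.

```python
from typing import Dict, FrozenSet, List, Set

# Rules taken from the ID1+ID2 and ID3+ID6 examples; diagnosis text is illustrative.
SCENES: Dict[FrozenSet[str], str] = {
    frozenset({"ID1", "ID2"}):
        "host reachable but agent abnormal: fault inside the cloud host",
    frozenset({"ID3", "ID6"}):
        "service healthy internally but unreachable externally: likely network fault",
}


def predict_scenes(active_ids: Set[str]) -> List[str]:
    """Return the diagnosis of every rule whose IDs are all currently active."""
    return [diagnosis for ids, diagnosis in SCENES.items() if ids <= active_ids]
```

Because each rule is a subset test, unrelated active events (such as ID4) do not prevent a match, and several scenes can be reported at once.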

For automated alarm treatment, an automated treatment flow can be triggered by a combination of alarm event feature IDs. For example, with the combination described above, when the ID1 and ID2 events occur at the same time, the cloud host can be restarted automatically; after the restart the inside of the cloud host will very probably return to normal, and the cloud host can then be logged in to for further analysis. Alternatively, a single specific event can trigger an automated treatment flow: for example, when a cloud host raises a disk space alarm, an automated flow can be triggered that cleans up the logs under the application log path in order to clear the alarm.
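A log clean-up treatment of the kind mentioned for disk space alarms might look like the sketch below. The text does not say which logs are removed; the `*.log` pattern and the age-based policy controlled by `max_age_seconds` are assumptions, as is the `clean_logs` name.

```python
import time
from pathlib import Path


def clean_logs(log_dir: Path, max_age_seconds: float) -> int:
    """Delete *.log files older than max_age_seconds; return bytes freed.

    The age-based policy is an assumption; a real treatment scheme would
    follow the retention rules of the application concerned.
    """
    cutoff = time.time() - max_age_seconds
    freed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(log_dir.glob("*.log")):
        if not path.is_file():
            continue
        stat = path.stat()
        if stat.st_mtime < cutoff:
            freed += stat.st_size
            path.unlink()
    return freed
```

Returning the number of bytes freed lets the calling flow decide whether the disk space alarm is likely to clear or whether the notifier still needs to be involved.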

Based on the same idea, as shown in fig. 8, a monitoring ledger management apparatus is provided, including: a monitoring template design module 801, configured to create monitoring templates corresponding to the various monitored objects, where the monitoring template includes at least one of a first monitoring item set, a first creator, a first creation time, a first creation reason, a first environment type, a first deployment coverage rate, a parent template, a first adjustment content, and a first notifier, the first monitoring item set includes the monitoring policies related to the monitored object and their corresponding thresholds, and the first adjustment content is the content adjusted relative to the parent template; a monitoring template DevOps management module 802, configured to deploy the monitoring templates sequentially in a development environment, a testing environment and a production environment to form the versions of the monitoring templates in each environment; a monitoring pre-alarm notification management module 803, configured to deploy the monitoring template of the corresponding version on the monitored object and perform alarm simulation; an event feature ID management module 804, configured to assign a unique event feature ID to the same type of alarm event of each monitored object, send the alarm to a notifier after it occurs, and predict the corresponding alarm scene from the combination of different event feature IDs occurring on different monitored objects; and an alarm automation treatment module 805, configured to design a corresponding treatment scheme for each alarm scene and automatically execute the corresponding treatment scheme when an alarm occurs.

By adopting the monitoring ledger management device, fine-grained operational management is realized that runs from monitored object templates, through DevOps management of template versions, to monitoring event feature ID combinations and automated treatment of monitoring items. This solves the coarse management, and the faults resulting from it, found in current monitoring practice that focuses only on the monitoring alarms themselves;

and the efficiency, coverage statistics and traceability of monitoring configuration for monitored objects are improved. In the device, using templates for each type of monitored object greatly improves the efficiency of monitoring configuration, the coverage rate of each monitoring template can be tracked, and, starting from a monitored object, it is possible to trace back which template covers it and why that monitoring template was configured;

and accurate, standardized management of the monitoring template is supported. The monitoring template DevOps management method in the device meets the personalized and precise requirements that developers, testers and operation and maintenance personnel each place on the monitoring template, and manages the monitoring template according to the DevOps concept, so that the monitoring template is promoted progressively from development to testing to production and its deployment and use in the production environment are more standardized;

the accuracy of monitoring alarm notification is improved and the CMDB data is optimized. With the monitoring pre-alarm notification mechanism, alarm recipients are notified by simulating a threshold breach before any alarm event actually occurs, so that an incorrect alarm target is discovered in advance and notifications are accurate when an alarm event actually occurs. Meanwhile, if the notification is wrong, the data in the CMDB can be corrected at the same time, which feeds back into the accuracy of the CMDB data;

and the problem locating and root cause analysis capability when an alarm occurs is improved. Based on the alarm event feature ID management method in the device, problems or fault root causes can be associated, from historical experience, with a group of alarm event feature ID combinations. When a specific alarm event feature ID combination occurs, it corresponds to a specific fault scene, so the corresponding problem or fault root cause can be found quickly and analyzed against a historical experience library, which improves problem locating and root cause analysis when faults occur;

and the alarm fault handling efficiency is improved. According to the alarm automation treatment method based on the standardized flow in the disclosure, when an alarm event occurs or a fault scene corresponding to the event feature ID combination is triggered, automatic fault self-healing can be realized or the original manual fixing operation is changed into automatic execution, so that the efficiency of treating alarm faults is improved.
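The coverage tracking and reverse lookup mentioned among the advantages above can be illustrated with a small sketch. The `deployments` mapping (monitored object to deployed template version) and the `coverage` function are assumptions; the text does not define how coverage rate is computed, and here it is taken as the fraction of all monitored objects on which a given template version is deployed.

```python
from collections import defaultdict
from typing import Dict, List


def coverage(deployments: Dict[str, str], all_objects: List[str]) -> Dict[str, float]:
    """Fraction of monitored objects covered by each template version.

    deployments maps monitored object -> template version deployed on it.
    """
    counts: Dict[str, int] = defaultdict(int)
    for obj in all_objects:
        template = deployments.get(obj)
        if template is not None:
            counts[template] += 1
    total = len(all_objects)
    return {t: n / total for t, n in counts.items()} if total else {}
```

The same mapping answers the reverse question directly: `deployments.get("host-a")` returns the template version covering a given object, and the version record of that template holds the creation reason.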

The specific details of each module/unit in the above apparatus are already described in the method section embodiments, and the details not disclosed may refer to the method section embodiments, so that they will not be described in detail.

Based on the same thought, the embodiment of the present disclosure further provides a monitoring ledger management device, as shown in fig. 9.

The monitoring ledger administration device may be a terminal device or a server provided in the above embodiment.

The monitoring ledger administration device may vary widely in configuration or performance, may include one or more processors 901 and memory 902, and may store one or more applications or data in memory 902. The memory 902 may include, among other things, readable media in the form of volatile memory units, such as Random Access Memory (RAM) units and/or cache memory units, and may further include read-only memory units. The application programs stored in memory 902 may include one or more program modules (not shown) including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Still further, the processor 901 may be arranged to communicate with the memory 902 and execute a series of computer executable instructions in the memory 902 on the monitoring ledger administration device. The monitoring ledger administration device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more I/O interfaces (input output interfaces) 905, one or more external devices 906 (e.g., a keyboard), and may also communicate with one or more devices that enable a user to interact with the device, and/or with any device that enables the device to communicate with one or more other computing devices (e.g., a router, a network switch, etc.). Such communication may occur through the I/O interface 905. Also, the device can communicate with one or more networks (e.g., a Local Area Network (LAN)) via the wired or wireless network interface 904.

In particular, in this embodiment, the monitoring ledger administration apparatus includes a memory, and one or more programs, where the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions for the monitoring ledger administration apparatus, and execution of the one or more programs by the one or more processors includes computer executable instructions for:

Based on the same idea, exemplary embodiments of the present disclosure further provide a computer readable storage medium having stored thereon a program product capable of implementing the method described in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.

Referring to fig. 10, a program product 1000 for implementing the above-described method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, CSS, HTML and the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the exemplary embodiments of the present disclosure.

Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of monitoring ledger administration, the method comprising:

2. The method according to claim 1, wherein the monitored object includes at least one of a host, a middleware, a network, a database, and an application, and the first monitoring item set includes at least one of monitoring items of the host, the middleware, the network, the database, and the application.

3. The monitoring ledger administration method according to claim 1, characterized in that the initial version of the monitoring template of other environments than the development environment is summarized by any version of the upper level environment, and the non-initial version of the monitoring template in any environment is modified by any version of the upper level environment.

4. The method according to claim 3, wherein the arbitrary version of the monitoring template records at least one of a second creator, a second creation reason, a second creation time, a parent version, a second monitoring item set, a second adjustment content, a second deployment coverage, a second notifier, and a second environment type, and the second monitoring item set includes at least one of a host, middleware, a network, a database, and an application monitoring item.

5. The method according to claim 1, wherein after the monitoring templates are deployed in the corresponding monitored objects, an alarm list for a notifier to inquire about is formed for each alarm in the monitoring templates, the alarm simulation is performed, any monitoring item is adjusted to exceed the corresponding threshold value to verify whether each alarm is properly pushed to the corresponding notifier, and error information is adjusted in the CMDB when an error occurs in the notifier or the alarm event.

6. The monitoring ledger administration method according to claim 1, wherein combinations of different said event feature IDs are time ordered, and the relationship of said event feature IDs includes at least one of "and", "or".

7. The monitoring ledger administration method according to claim 1, characterized in that after automatically executing the corresponding treatment scheme, the alarm that is not treated is manually treated.

8. A monitoring ledger administration apparatus, comprising:

9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the monitoring ledger administration method according to any one of claims 1-7.

10. A monitoring ledger administration apparatus, characterized by comprising:

a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to:

And designing a corresponding treatment scheme according to different alarm scenes, and automatically executing the corresponding treatment scheme when an alarm occurs.