CN116204388A - Intelligent monitoring system and method for system service state - Google Patents
- Publication number
- CN116204388A (application number CN202310464795.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- service state
- service
- state
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an intelligent monitoring system and method for system service state, belonging to the technical field of information system operation and maintenance monitoring, and addresses the following technical problem: existing system service state monitoring methods cannot effectively discover system faults and problems. The system comprises: a service definition unit, used for describing the system service state, decomposing the system service state into system component service states, and generating a service state tree according to the system component service states; a data acquisition unit, used for acquiring system service data according to the system component service states; and a data analysis unit, used for analyzing and processing the system service data through a service state tree maintenance algorithm according to the service state tree to obtain a monitoring result of the system service state.
Description
Technical Field
The application relates to the technical field of information system operation and maintenance monitoring, in particular to an intelligent monitoring system and method for a system service state.
Background
Most traditional information systems are monolithic, with a simple deployment structure, so faults are easy to troubleshoot. Modern information systems are mostly built on internet technology: a system is composed of multiple components and runs in a complex environment made up of servers, databases and middleware. In some scenarios there are also hybrid deployment structures that span the internet, wide area networks, or multiple networks. In such a complex operating environment, the service state of the system is influenced by many factors, and once a component has an operational fault, the service of the whole system may be directly interrupted, causing an incident. How to track the service state of the system and discover the causes of faults in time has therefore become a key part of improving operation and maintenance work for modern information systems.
The service state of a system, as used in this specification, refers to the quality of service and the availability that the system provides to users during normal operation. Put simply, the system can provide the required service in the expected manner and can restore the service in time when needed, so that users can use the system conveniently and obtain a satisfactory service experience.
At present, system service state monitoring is mainly carried out for a specific environment or a specific scenario, such as network monitoring, host monitoring, or page response monitoring of an internet system. However, a modern information system consists of many parts, such as the network, servers, databases, middleware and application software, and the causes of faults in these parts are interrelated. Monitoring only one link, or monitoring several links without correlating them, cannot effectively discover the faults and problems of the system.
Disclosure of Invention
The embodiment of the application provides an intelligent monitoring system and method for a system service state, which are used for solving the following technical problems: the existing system service state monitoring method cannot effectively discover faults and problems of the system.
The embodiment of the application adopts the following technical scheme:
In one aspect, an embodiment of the present application provides an intelligent monitoring system for a service state of a system, where the system includes: a service definition unit, used for describing the system service state, decomposing the system service state into system component service states, and generating a service state tree according to the system component service states; a data acquisition unit, used for acquiring system service data according to the system component service states; and a data analysis unit, used for analyzing and processing the system service data through a service state tree maintenance algorithm according to the service state tree to obtain a monitoring result of the system service state.
In a possible implementation manner, the service state tree maintenance algorithm includes a service state transition rule and a service state derivation algorithm; the service state transition rule indicates how a change in the service state of a system component at one level of the service state tree affects the service states of system components at other levels, where the levels comprise nodes, modules, subsystems and systems, and the service states comprise an available state, an unavailable state and an unstable state; the service state derivation algorithm indicates the process of calculating the module service state, the subsystem service state and the system service state from the node service state.
In a possible embodiment, the service state derivation algorithm comprises: in the service state tree, calculating service state scores corresponding to all leaf nodes according to the system service data; determining a superior non-leaf node associated with the leaf node, and obtaining a service state score of the superior non-leaf node by weighted summation according to the service state score of the leaf node; continuing to determine a higher-level non-leaf node associated with the higher-level non-leaf node, and obtaining a service state score of the higher-level non-leaf node by weighted summation according to the service state score of the higher-level non-leaf node; repeatedly executing the process until the service state score of the root node is calculated; and determining a monitoring result of the system service state according to the service state score of the root node.
In a possible embodiment, the data analysis unit includes an analysis task definition component, an analysis task control component, and a data processing channel component; the analysis task definition component is used for constructing a data analysis task according to data to be analyzed in the system service data; the data processing channel component is used for preprocessing the data to be analyzed; the analysis task control component is used for calling a definable analysis plug-in to execute the data analysis task.
In a possible embodiment, the data processing channel component is configured to perform data formatting, data cleansing, data de-duplication and data timestamp conversion on the data to be analyzed.
In a possible implementation manner, the data acquisition unit comprises an acquisition control component, a data caching component, a timer component and a data transmission component; the acquisition control component is used for determining the acquisition mode of the system service data and controlling the acquisition process of the system service data; the data caching component is used for caching the system service data; the timer component is used for controlling the uploading time of the system service data; the data transmission component is used for uploading the system service data after compression processing.
In a possible implementation manner, the collection mode of the system service data includes collection through a service state monitoring probe and collection through a log.
In a possible embodiment, the system further comprises a time synchronization unit for adjusting the timestamps of the components of the system to ensure time synchronization between the components of the system.
In a possible implementation manner, the system further comprises a monitoring unit for displaying the analysis processing result of the system service data in a table form.
On the other hand, the embodiment of the application also provides an intelligent monitoring method for the service state of the system, which is applied to the intelligent monitoring system for the service state of the system, and the method comprises the following steps: determining a system service state to be monitored through a service definition unit, decomposing the system service state into a system component service state, and generating a service state tree according to the system component service state; collecting system service data corresponding to the system component service state through a data collecting unit; and analyzing and processing the system service data by using a service state tree maintenance algorithm according to the service state tree through a data analysis unit to obtain a monitoring result of the system service state.
The intelligent monitoring system and method for system service state provided by the embodiments of the present application have the following beneficial effects. Because the intelligent monitoring system comprises the service definition unit, the data acquisition unit, the data analysis unit and the monitoring unit, the system service state can be taken as the monitoring target, and the service states of the components to be monitored can be sorted out from top to bottom to form the service state tree to be monitored, which constitutes a global definition. The global definition is then issued to the data acquisition unit, which acquires the relevant system service data; the data analysis unit analyzes the data according to the service state tree to form analysis data; and the monitoring unit monitors and displays the results according to the service state tree, which provides a basis for effectively locating system problems and faults. At the same time, information system operation and maintenance staff can conveniently and quickly collect, store, analyze and display the overall service state data of the system, accurately analyze the running state of the platform system, quickly locate operation and maintenance faults, and improve the efficiency of information system operation and maintenance work.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
Fig. 1 is a schematic structural diagram of an intelligent monitoring system for a service state of a system according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a service state tree according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data acquisition unit according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a data analysis unit according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions in the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
For modern internet-based information systems with complex deployment structures and complex composition, the present application provides an intelligent monitoring system that accurately captures the service state of the system and is highly flexible, non-invasive and efficient.
The intelligent monitoring system in the application is explained in detail below through the attached drawings.
Fig. 1 is a schematic structural diagram of an intelligent monitoring system for a service state of a system according to an embodiment of the present application. As shown in fig. 1, the intelligent monitoring system in the embodiment of the present application at least includes: a service definition unit 1, a data acquisition unit 2, a data storage unit 3, a data analysis unit 4, a monitoring unit 5 and a time synchronization unit 6.
As shown in fig. 1, the service definition unit 1 is a core functional module of the intelligent monitoring system, and determines the operations of the data acquisition unit 2, the data analysis unit 4 and the monitoring unit 5: the data acquisition unit 2 needs to acquire data according to a data source in the service definition, the data analysis unit 4 analyzes the data according to a data threshold of the service definition, the service state of the system is determined, and the monitoring unit 5 displays the data according to the service definition. The time synchronization unit 6 is related to the data acquisition unit 2 and the data analysis unit 4. The data storage unit 3 is associated with the data acquisition unit 2 and the data analysis unit 4.
In one or more possible implementations of the embodiments of the present application, the service definition unit is configured to describe a service state of the system or define a service state of the system, and decompose the service state of the system into service states of various components of the system, that is, service states of components of the system. Meanwhile, the service definition unit in the application is also used for constructing a service state tree of the system according to the dependency relationship among the states of all the components of the system.
In one example of the present application, describing the service state of the system may mean taking the running state of the system while it provides a certain service as the monitoring target; that is, defining the service state of the system amounts to defining the monitoring target of the system. The system functions or system data required to implement the service, and the monitoring indicators for each function or each item of data, may also be defined here.
Further, in order to better monitor the service state of the system and accurately locate system problems or failures, the embodiment of the application introduces a service state tree. In the embodiment of the application, the service state tree is a recursively defined tree structure that represents the dependency relationships among all components in the system. Each component has an independent state, which indicates whether the component is working normally; the root node of the tree represents the service state of the system, and the other nodes represent the states of the components. When the state of a component changes, the state of the corresponding node on the tree also changes; within the service state tree, this change affects the states of the components that depend on it and ultimately the root node, which yields the service state of the system. Thus, by analyzing the service state tree, the complete service state of the system can be obtained.
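As an illustration of the recursive structure described above, the following is a minimal sketch rather than the implementation of the application. The class name `StateNode`, its fields, and the rule that an inner node is healthy only when all of its children are healthy are assumptions made for the example; the application itself distinguishes three service states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StateNode:
    """One node of a service state tree (hypothetical names, for illustration)."""
    name: str
    healthy: bool = True  # own state, only consulted for leaf components
    children: List["StateNode"] = field(default_factory=list)

    def is_healthy(self) -> bool:
        # A leaf reports its own state; an inner node depends on all children.
        if not self.children:
            return self.healthy
        return all(child.is_healthy() for child in self.children)


# Build a small tree: system service -> system function 1.1 -> network, host.
network = StateNode("network")
host = StateNode("host")
function_1_1 = StateNode("system function 1.1", children=[network, host])
root = StateNode("system service", children=[function_1_1])

before = root.is_healthy()
host.healthy = False          # a component state changes ...
after = root.is_healthy()     # ... and the change reaches the root node
print(before, after)
```

Changing the state of a single leaf is enough to change what the root reports, which is the dependency behavior the tree is meant to capture.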
In one or more possible implementations of embodiments of the present application, the service state tree is maintained using two methods: firstly, a service state transition rule is predefined, and when the service state changes, the state update is carried out through the rule; second, a service state derivation algorithm is used to recursively traverse through the service state tree to derive the final system service state.
Specifically, the service state transition rule in the embodiment of the present application describes how a change in the service state of a system component at a certain level of the service state tree affects the service states of other system components on the tree, and how that change is reflected in their service states. In one example of the present application, the service states are divided into a basic service state and a detailed service state. The basic service state takes one of three values: available, unavailable, and unstable; the detailed service state may be defined as a vector of indicators, such as an availability indicator, a response time indicator, a capacity indicator, etc. By system scope, the levels are divided into four: node, module, subsystem and system.
For example, when the service state of a node in the service state tree changes from "available" to "unavailable", the state of the module to which the node belongs changes directly to "unavailable", the state of the upper-level subsystem changes to "unstable", and the overall state of the system changes to "unstable".
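The example rule above can be written down as a small lookup table. This is only a sketch under assumptions: the `TRANSITION_RULES` table, the `apply_rule` helper and the dictionary representation of the four levels are hypothetical, and only the single rule stated in the text is encoded.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    UNSTABLE = "unstable"
    UNAVAILABLE = "unavailable"


# Hypothetical rule table: (level of the changed component, its new state)
# maps to the states that the change forces on the levels above it.
TRANSITION_RULES = {
    ("node", State.UNAVAILABLE): {
        "module": State.UNAVAILABLE,
        "subsystem": State.UNSTABLE,
        "system": State.UNSTABLE,
    },
}


def apply_rule(states, level, new_state):
    """Return a copy of `states` after the component at `level` changes state."""
    updated = dict(states)
    updated[level] = new_state
    forced = TRANSITION_RULES.get((level, new_state), {})
    for ancestor, ancestor_state in forced.items():
        updated[ancestor] = ancestor_state
    return updated


initial = {lvl: State.AVAILABLE for lvl in ("node", "module", "subsystem", "system")}
result = apply_rule(initial, "node", State.UNAVAILABLE)
for level_name, state in result.items():
    print(level_name, state.value)
```

Keeping the rules in a table rather than in code makes it possible to add further transitions without changing the update logic.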
Further, the state derivation algorithm in the embodiments of the present application describes how to calculate the states of the modules, subsystems, and systems from the node states based on the state transition rules. That is, it is described how to derive the service state of the entire system from the service state of each node so as to comprehensively evaluate the operation condition of the system. The flow of the algorithm is as follows:
1) Define the weights of the service states: determine the weight of each indicator in the detailed service state using the analytic hierarchy process (AHP), and then calculate the service state score of each leaf node according to the weights and the system service data. It should be noted that AHP is a multi-attribute decision analysis method that can be used to deal with complex decision problems involving multiple attributes and multiple factors.
2) Starting from the leaf nodes, visit in turn the upper-level non-leaf node associated with each leaf node, and calculate the service state score of each such non-leaf node as the weighted sum of the service state scores of its leaf nodes.
3) For the higher-level non-leaf node associated with each non-leaf node, calculate its service state score as the weighted sum of the service state scores of the non-leaf nodes below it.
4) Repeat step 3) until the service state score of the root node is calculated.
5) Classify the service state of the system according to the service state score of the whole system. For example, thresholds may be used to define service state levels, such as excellent, good, general, bad, and so forth.
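Step 1) relies on AHP to weight the indicators of the detailed service state. The text does not say how the weights are computed, so the sketch below assumes the row geometric-mean method, a common approximation of AHP priority weights. The pairwise comparison values, the indicator values and the function name `ahp_weights` are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical pairwise judgments for three indicators, in the order
# availability, response time, capacity. Entry [i][j] states how much more
# important indicator i is than indicator j, and [j][i] is its reciprocal.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical indicator values for one leaf node, already normalized to [0, 1].
indicators = [1.0, 0.8, 0.5]
leaf_score = sum(w * x for w, x in zip(weights, indicators))
print([round(w, 3) for w in weights], round(leaf_score, 3))
```

Because the weights sum to 1 and the indicators are normalized, the leaf score also stays within [0, 1], which keeps the scores of different nodes comparable.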
The above algorithm may further include the following step:
Traverse each node on the service state tree starting from the root node, calculate a comprehensive score for each node according to the indicators in its detailed service state and the system service data, and combine this comprehensive score with the service state score obtained in the above process by weighted summation of the scores of its lower-level nodes to obtain the final service state score of the node. That is, both the node's own score and the scores of its lower-level nodes are taken into account for every node in the service state tree, which ensures that the service state of each node is valid and thereby achieves effective monitoring of the system service state.
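The derivation described above, including the combination of a node's own score with the weighted sum of its lower-level nodes, can be sketched as a recursive traversal from the root. Everything specific here is an assumption: the dictionary layout, the `own_share` blending factor of 0.5, the example weights and scores, and the classification thresholds are hypothetical values chosen for illustration.

```python
def node_score(node, own_share=0.5):
    """Score a node in [0, 1] from its own indicators and its children's scores."""
    children = node.get("children", [])
    if not children:
        return node["own_score"]  # leaf: score comes from collected data
    total_weight = sum(child["weight"] for child in children)
    child_part = sum(
        child["weight"] * node_score(child, own_share) for child in children
    ) / total_weight
    if "own_score" not in node:
        return child_part  # no indicators of its own: pure weighted sum
    return own_share * node["own_score"] + (1 - own_share) * child_part


def classify(score):
    """Map a root score to a service state level (thresholds are assumptions)."""
    if score >= 0.9:
        return "excellent"
    if score >= 0.75:
        return "good"
    if score >= 0.5:
        return "general"
    return "bad"


tree = {
    "name": "system service",
    "children": [
        {"name": "function 1", "weight": 0.6, "own_score": 0.8, "children": [
            {"name": "network", "weight": 0.5, "own_score": 1.0},
            {"name": "host", "weight": 0.5, "own_score": 0.6},
        ]},
        {"name": "function 2", "weight": 0.4, "own_score": 0.9},
    ],
}
root_score = node_score(tree)
level = classify(root_score)
print(round(root_score, 2), level)
```

Each node is visited once, so adding a new subsystem or component only requires adding an entry to the tree, not changing the algorithm.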
It should be noted that the time complexity of the algorithm is mainly determined by the depth of the service state tree and the complexity of calculating the service state score for each node. If the depth of the service state tree is small and the algorithm for calculating the service state score is simple, the algorithm efficiency is high. Meanwhile, because the algorithm is based on a service state tree structure, a new subsystem or service component can be conveniently added so as to evaluate the service state of the system more comprehensively.
Fig. 2 is a schematic structural diagram of a service state tree according to the present embodiment. Traditional operation and maintenance monitoring of an information system's service state focuses on the states of devices such as the network, hosts, storage and databases, that is, on the lowest-layer nodes in Fig. 2, and operation and maintenance staff have to process the scattered monitoring information manually. A modern information system aims at effective and sustainable service, so the present application explicitly defines the system service state as the monitoring target and sorts out the service states of the system components to be monitored from top to bottom, forming the service state tree to be monitored and a global definition. As shown in Fig. 2, the lowest-layer nodes of the service state tree are device-level service states; the level above consists of system sub-function states, each influenced by different device-level states; the level above that consists of system function states, each influenced by system sub-function states; and the top node of the whole service state tree is the system service state, which is influenced by the system function states.
In Fig. 2, monitoring the system service state requires monitoring the corresponding system function 1 state, system function 2 state and system function 3 state; monitoring the system function 1 state requires, for example, monitoring the corresponding system function 1.1 state and system function 1.2 state; and monitoring the system function 1.1 state requires, for example, monitoring the corresponding network state and host state. In this way, the service states of the system components to be monitored are derived from the system service state to be monitored. Introducing the system service state tree allows the running state of the information system to be expressed more intuitively and accurately, and improves the accuracy and effectiveness of operation and maintenance monitoring.
The service definition unit distributes the service state definition and the service state tree to the data acquisition unit, the data analysis unit and the monitoring unit, which carry out data acquisition, data analysis and the presentation of monitoring results, respectively.
In one or more possible implementations of the embodiments of the present application, the data acquisition unit is configured to collect data from each component of the system, including, but not limited to, system logs such as operating system, network, middleware, database and application logs. The monitored information system generally does not need to be specially modified for the data acquisition unit; behavior data can be collected and transmitted to the data storage unit 3 simply by adding the unit to the relevant part of the system.
Further, the data acquisition unit in the embodiment of the application is deployed on each component part of the information system to be monitored, and the running state of the system is obtained through active detection and acquisition of logs and is uploaded to the data storage unit.
Fig. 3 is a schematic structural diagram of a data acquisition unit according to the present embodiment. In Fig. 3, the data acquisition unit controls the acquisition process and the acquisition mode of the system service data through the acquisition control component. The acquisition control component is extensible and supports collecting data through service state monitoring probes and through logs, as well as uploading the service state data. After the system service data is acquired, it is first stored locally by the data caching component; the timer component then controls when the service state data is uploaded, and at the upload time set by the timer component the compressed system service data is uploaded through the data transmission component. Through the data acquisition unit, system service data can be acquired conveniently and flexibly and uploaded efficiently to the data storage unit for subsequent processing.
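The cache, timer and compressed-upload flow described above might look like the following minimal sketch. The class name `Collector`, the `upload` callback, the 60-second interval, and the choice of JSON with zlib compression are all assumptions; the text does not specify a serialization format, a compression algorithm or a transport.

```python
import json
import time
import zlib


class Collector:
    """Caches records locally and uploads them compressed once an interval elapses."""

    def __init__(self, upload, interval_seconds=60.0, clock=time.monotonic):
        self._upload = upload            # hypothetical transport callback (takes bytes)
        self._interval = interval_seconds
        self._clock = clock              # injectable so the timer can be simulated
        self._buffer = []
        self._last_upload = clock()

    def collect(self, record):
        self._buffer.append(record)      # data caching component
        self.maybe_flush()

    def maybe_flush(self):
        # Timer component: upload only when the interval has elapsed.
        if not self._buffer or self._clock() - self._last_upload < self._interval:
            return False
        # Data transmission component: compress, then hand over to the transport.
        payload = zlib.compress(json.dumps(self._buffer).encode("utf-8"))
        self._upload(payload)
        self._buffer = []
        self._last_upload = self._clock()
        return True


sent = []                 # stands in for the data storage unit
fake_time = [0.0]         # simulated clock, in seconds
collector = Collector(sent.append, interval_seconds=60.0, clock=lambda: fake_time[0])
collector.collect({"source": "probe", "response_ms": 120})
fake_time[0] = 61.0
collector.collect({"source": "log", "line": "service started"})
decoded = json.loads(zlib.decompress(sent[0]).decode("utf-8"))
print(len(sent), len(decoded))
```

Batching records between uploads keeps the load on the monitored component low, which fits the non-invasive goal stated for the system.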
In one example of the present application, the service status monitoring probe includes web page endpoint monitoring, response time monitoring, capacity monitoring, process monitoring, and the like.
In one or more possible implementation manners of the embodiments of the present application, the data storage unit is configured to store system service data and analysis data after analysis processing, where the unit may be implemented by using a general storage system, which is not described in detail in the embodiments of the present application.
Further, the data analysis unit may analyze and process the system service data according to the service status tree, and store the processing result in the data storage unit. The data analysis unit has a function of defining and managing data analysis tasks, and is capable of defining and controlling execution of the data analysis tasks. In addition, the data analysis unit in the embodiment of the application is an extensible processing structure, comprises a definable analysis task processing structure and supports concurrent operation.
In one example of the present application, the data analysis unit includes the following components:
1) Data source control: supports retrieving data from various data sources, such as databases, files, APIs, and the like.
2) Data processing: provides efficient data processing capability, and can process a large amount of data and clean, convert and preprocess the data. For example, invalid logs are removed from the service state data, timestamps are converted, formats are converted, and the like.
3) Task scheduling: provides automatic task scheduling and can process data in response to changes in a data source or to time-based triggers.
4) Definable analysis plug-ins: complete the designated analysis task and generate an analysis result, under the control of task scheduling. The analysis plug-in performs the actual state analysis. For example, threshold analysis of the response time determines whether the service state of a node is available or unavailable, or sets the service state to "unstable" when the response time is too long.
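The threshold analysis mentioned for the analysis plug-in can be sketched as a single function. The function name `response_time_plugin`, the thresholds `slow_ms` and `timeout_ms`, and the convention that `None` marks a failed probe are assumptions for the example, not values taken from the application.

```python
def response_time_plugin(samples_ms, slow_ms=500.0, timeout_ms=3000.0):
    """Classify a node from response-time samples; None marks a failed probe."""
    # Keep only probes that were answered within the timeout.
    answered = [s for s in samples_ms if s is not None and s < timeout_ms]
    if not answered:
        return "unavailable"
    average = sum(answered) / len(answered)
    # Some failed probes or a slow average both count as an unstable service.
    if len(answered) < len(samples_ms) or average > slow_ms:
        return "unstable"
    return "available"


print(response_time_plugin([120, 95, 140]))
print(response_time_plugin([800, 950, 700]))
print(response_time_plugin([None, None]))
```

A plug-in of this shape takes collected data in and returns one of the three basic service states, so the analysis task control component can invoke different plug-ins through the same interface.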
Through the combination of the components, the service state data can be conveniently analyzed, and the analysis result is uploaded to the data storage unit.
Fig. 4 is a schematic structural diagram of a data analysis unit provided in this embodiment, where in fig. 4, the data analysis unit includes three related components including an analysis task definition component, an analysis task control component, and a data processing channel component. A new data analysis task can be created by the analysis task definition component based on the data to be analyzed in the system service data, and the data analysis task can be invoked by the analysis task control component for execution. In an example of the present application, the data to be analyzed includes at least a part of system service data, where it has been described in the foregoing description that the system service status is affected by different system functions, and the different system functions are affected by different device-level statuses, so that service status data corresponding to one system service status includes operation data corresponding to multiple system functions or devices, and when one of the functions needs to be monitored, the operation data corresponding to the function in the system service data needs to be determined as the data to be analyzed.
Fig. 4 lists, by way of example, definable analysis plug-ins 1, 2 and 3, which can be invoked by the analysis task control component. These plug-ins perform different analysis tasks, and the analysis task control component carries out the previously defined analysis task by invoking them. Fig. 4 also shows the data processing channel component, a tool module for the preprocessing and subsequent processing associated with an analysis task. It is an extensible processing structure that supports operations such as formatting, cleaning, deduplicating and converting data. This component and the definable analysis plug-ins jointly execute the analysis task, complete the data analysis process, produce a data analysis result, and store the result in the data storage unit.
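The cooperation between the data processing channel and the definable plug-ins can be sketched as follows. This is only an illustrative Python outline: the record fields (`node`, `ts`, `response_ms`), the step functions, the `PLUGINS` registry and the `run_task` helper are hypothetical names chosen for the example, and naive time stamps are assumed to be in UTC.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


def clean(records: List[Record]) -> List[Record]:
    """Drop invalid records: here, those missing a node name or a time stamp."""
    return [r for r in records if r.get("node") and r.get("ts")]


def convert_timestamps(records: List[Record]) -> List[Record]:
    """Convert ISO 8601 strings to UTC epoch seconds."""
    out = []
    for r in records:
        ts = datetime.fromisoformat(str(r["ts"]))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        out.append({**r, "ts": ts.timestamp()})
    return out


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record for each (node, time stamp) pair."""
    seen, out = set(), []
    for r in records:
        key = (r["node"], r["ts"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def run_channel(records: List[Record], steps: List[Step]) -> List[Record]:
    """Data processing channel: apply each step in order."""
    for step in steps:
        records = step(records)
    return records


# Registry of definable analysis plug-ins, keyed by name.
PLUGINS: Dict[str, Callable[[List[Record]], Any]] = {}


def plugin(name: str):
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register


@plugin("mean_response_ms")
def mean_response_ms(records: List[Record]):
    values = [float(r["response_ms"]) for r in records]
    return sum(values) / len(values) if values else None


def run_task(plugin_name: str, raw: List[Record]):
    """Analysis task control: preprocess through the channel, then invoke the plug-in."""
    prepared = run_channel(raw, [clean, convert_timestamps, deduplicate])
    return PLUGINS[plugin_name](prepared)
```

Keeping the channel steps and the plug-ins as separate lists of callables mirrors the extensible structure described above: a new preprocessing step or a new analysis can be added without changing the control logic.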
In one or more possible implementations of the embodiments of the present application, the monitoring unit is configured to display the data analysis result, either as a data list or as a chart, and the monitoring unit further supports exporting the data analysis result.
Further, the time synchronization unit synchronizes the time of all components of the monitored system, ensuring that their event time stamps are consistent, so that all monitored system events can be collected and analyzed on a unified timeline. With the time synchronization unit, the point in time at which a fault occurred can be located accurately, so that the real operation and maintenance problem can be found effectively.
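One simple way to picture the effect of time synchronization is to correct each component's event time stamps by a per-component clock offset and then order all events on a single timeline. The Python sketch below assumes that such offsets have already been measured by some other means; the `Event` type, the `offsets` mapping and the `to_unified_timeline` function are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    component: str
    ts: float      # seconds, as stamped by the component's own clock
    message: str


def to_unified_timeline(events: List[Event], offsets: Dict[str, float]) -> List[Event]:
    """Shift each event onto the reference clock and sort by corrected time.

    offsets[c] is assumed to equal (reference time - local time) for component c;
    components without an entry are treated as already synchronized.
    """
    corrected = [
        replace(e, ts=e.ts + offsets.get(e.component, 0.0)) for e in events
    ]
    return sorted(corrected, key=lambda e: e.ts)
```

With a database clock running 2 seconds fast, for example, a database event stamped 100.0 is moved to 98.0 and therefore sorts before an API event stamped 98.5, which is the ordering needed to locate the original fault.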
Each unit of the intelligent monitoring system according to the embodiment of the present application has been explained in detail above. To make the system easier to understand, the following supplementary description is provided.
Fig. 1 shows the connections between the units of the intelligent monitoring system. As can be seen from the foregoing description and Fig. 1, in this embodiment of the present application the service definition unit 1 defines the system service state and generates the service state tree, then sends the defined service state and the service state tree to the data acquisition unit 2. The data acquisition unit 2 collects system service data according to the components involved in the service state tree and stores the collected data in the data storage unit 3. The data analysis unit 4 analyzes the service state data in the data storage unit 3 according to the service state tree generated by the service definition unit 1. After the analysis is completed, the analysis result data is also stored in the data storage unit 3, and the monitoring unit 5 can display the data analysis result held there. Throughout this process, the time synchronization unit 6 acts on the data acquisition unit 2 and the data analysis unit 4 to keep the time stamps of the components synchronized.
In summary, the present application takes the operational service state of the system as the monitoring target and works through the components to be monitored from top to bottom, forming the chain of service states to be monitored and thus a global definition. This definition is then issued to the data acquisition unit so that it collects the relevant monitoring state data. The data analysis unit analyzes the data according to the service state tree to produce analysis data, and the monitoring unit monitors and displays according to the service state tree.
In addition, the embodiment of the application also provides an intelligent monitoring method for the system service state, which is applied to the intelligent monitoring system for the system service state, and specifically comprises the following steps:
determining, by the service definition unit, the system service state to be monitored, decomposing the system service state into system component service states, and generating a service state tree according to the system component service states; collecting, by the data acquisition unit, system service data corresponding to the system component service states; and analyzing and processing, by the data analysis unit, the system service data according to the service state tree using a service state tree maintenance algorithm, to obtain a monitoring result of the system service state.
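The analysis step can be illustrated with the weighted-summation derivation recited in the claims, in which leaf scores are combined level by level until the root score is obtained. The Python sketch below is one possible reading under stated assumptions: scores lie in [0, 1], the weights of the children of each node sum to 1, and the cut-offs in `to_state` are invented for the example. The `StateNode` type and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateNode:
    name: str
    weight: float = 1.0             # weight of this node within its parent
    score: Optional[float] = None   # set directly on leaf nodes only
    children: List["StateNode"] = field(default_factory=list)


def derive_score(node: StateNode) -> float:
    """Return the node's service state score, deriving it from its children.

    Leaves must already carry a score computed from the system service data;
    every non-leaf score is the weighted sum of its children's scores.
    """
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf {node.name!r} has no score")
        return node.score
    node.score = sum(c.weight * derive_score(c) for c in node.children)
    return node.score


def to_state(score: float) -> str:
    """Map a root score to a monitoring result; cut-offs are assumed values."""
    if score >= 0.8:
        return "available"
    if score >= 0.5:
        return "unstable"
    return "unavailable"
```

The recursion visits the tree from the root, but each score is only computed after the scores beneath it, so the effective order is the bottom-up one described in the claims.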
All embodiments in the application are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the method embodiments are substantially similar to the system embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the system embodiments.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the embodiments of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. An intelligent monitoring system for a service status of a system, the system comprising:
the service definition unit is used for describing the system service state, decomposing the system service state into system component service states and generating a service state tree according to the system component service states;
the data acquisition unit is used for acquiring system service data according to the system component service state;
and the data analysis unit is used for analyzing and processing the system service data through a service state tree maintenance algorithm according to the service state tree to obtain a monitoring result of the system service state.
2. The intelligent monitoring system of claim 1, wherein the service state tree maintenance algorithm comprises a service state transition rule and a service state derivation algorithm;
the service state transition rule is used for indicating the influence on the service states of the system components of other levels after the service state of the system components of one level in the service state tree is changed, the levels comprise nodes, modules, subsystems and systems, and the service states comprise an available state, an unavailable state and an unstable state;
the service state derivation algorithm is used for indicating the realization process of calculating the module service state, the subsystem service state and the system service state from the node service state.
3. The intelligent monitoring system of a system service state according to claim 2, wherein the service state derivation algorithm comprises:
in the service state tree, calculating service state scores corresponding to all leaf nodes according to the system service data;
determining a superior non-leaf node associated with the leaf node, and obtaining a service state score of the superior non-leaf node by weighted summation according to the service state score of the leaf node;
continuing to determine a higher-level non-leaf node associated with the higher-level non-leaf node, and obtaining a service state score of the higher-level non-leaf node by weighted summation according to the service state score of the higher-level non-leaf node;
repeatedly executing the process until the service state score of the root node is calculated;
and determining a monitoring result of the system service state according to the service state score of the root node.
4. The intelligent monitoring system of claim 1, wherein the data analysis unit comprises an analysis task definition component, an analysis task control component, and a data processing channel component; wherein,
the analysis task definition component is used for constructing a data analysis task according to data to be analyzed in the system service data;
the data processing channel component is used for preprocessing the data to be analyzed;
the analysis task control component is used for calling a definable analysis plug-in to execute the data analysis task.
5. The intelligent monitoring system according to claim 4, wherein the data processing channel component is configured to perform data formatting, data cleansing, data de-duplication and data time stamp conversion on the data to be analyzed.
6. The intelligent monitoring system of claim 1, wherein the data acquisition unit comprises an acquisition control component, a data caching component, a timer component and a data transmission component; wherein,
the acquisition control component is used for determining the acquisition mode of the system service data and controlling the acquisition process of the system service data;
the data caching component is used for caching the system service data;
the timer component is used for controlling the uploading time of the system service data;
the data transmission component is used for uploading the system service data after compression processing.
7. The intelligent monitoring system of claim 6, wherein the system service data collection mode includes collection by a service status monitoring probe and collection by a log.
8. The intelligent monitoring system of claim 1, further comprising a time synchronization unit for adjusting time stamps of components of the system to ensure time synchronization between the components of the system.
9. The intelligent monitoring system according to claim 1, further comprising a monitoring unit for displaying the analysis processing result of the system service data in the form of a table.
10. An intelligent monitoring method for a system service state, applying the intelligent monitoring system for a system service state according to any one of claims 1-9, characterized in that the method comprises:
determining a system service state to be monitored through a service definition unit, decomposing the system service state into a system component service state, and generating a service state tree according to the system component service state;
collecting system service data corresponding to the system component service state through a data acquisition unit;
and analyzing and processing the system service data by using a service state tree maintenance algorithm according to the service state tree through a data analysis unit to obtain a monitoring result of the system service state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310464795.0A CN116204388A (en) | 2023-04-27 | 2023-04-27 | Intelligent monitoring system and method for system service state |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116204388A (en) | 2023-06-02 |
Family
ID=86513178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310464795.0A Pending CN116204388A (en) | 2023-04-27 | 2023-04-27 | Intelligent monitoring system and method for system service state |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116204388A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101778017A (en) * | 2010-01-05 | 2010-07-14 | 中国工商银行股份有限公司 | Method and server for processing on-line transaction fault event of mainframe |
CN103532739A (en) * | 2013-09-25 | 2014-01-22 | 上海斐讯数据通信技术有限公司 | Monitoring analysis system based on network service and application |
CN104683446A (en) * | 2015-01-29 | 2015-06-03 | 广州杰赛科技股份有限公司 | Method and system for monitoring service states of cloud storage cluster nodes in real time |
CN110855473A (en) * | 2019-10-16 | 2020-02-28 | 平安科技(深圳)有限公司 | Monitoring method, device, server and storage medium |
US20210096911A1 (en) * | 2020-08-17 | 2021-04-01 | Essence Information Technology Co., Ltd | Fine granularity real-time supervision system based on edge computing |
CN112751729A (en) * | 2020-12-30 | 2021-05-04 | 平安证券股份有限公司 | Log monitoring method, device, medium and electronic equipment |
CN114827678A (en) * | 2022-04-29 | 2022-07-29 | 广东省广播电视网络股份有限公司中山分公司 | Operation and maintenance monitoring and analyzing system for digital television front-end platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230602 |