CN110633189A - Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system - Google Patents

Publication number
CN110633189A
Authority
CN
China
Prior art keywords
model
maintenance
intelligent
task
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910894961.4A
Other languages
Chinese (zh)
Other versions
CN110633189B (en
Inventor
李晓林
曾维朝
刘亮
刘祖福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Guangtong Software Co Ltd
Original Assignee
Shenzhen Guangtong Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/008Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/20Administration of product repair or maintenance
    • G06Q50/40
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

Provided are an intelligent operation and maintenance monitoring method for an IT system and a corresponding intelligent operation and maintenance monitoring system. The method comprises the following steps: acquiring service information of the IT system; determining an intelligent model from a preset knowledge system according to the service information; inputting the service information into the intelligent model to create a robot processing model, wherein the robot processing model comprises an operation and maintenance operation task corresponding to the service information; and using the robot processing model to drive a preset model execution engine to execute the operation and maintenance operation task, so as to monitor the IT system. Because the knowledge system is constructed in advance, the service information can be matched accurately to an intelligent model in the knowledge system, and the robot processing model required by the operation and maintenance operation task can be created quickly by inputting the service information into the matched intelligent model.

Description

Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system
Technical Field
The invention relates to the technical field of computer operation and maintenance, in particular to an intelligent operation and maintenance monitoring method and an intelligent operation and maintenance monitoring system of an IT system.
Background
With the continued development of informatization, IT systems have increasingly become key infrastructure for core business processing. To ensure the normal operation of IT resources such as networks, servers and databases, these resources must be maintained: when the system becomes abnormal, an alarm should be generated promptly and the operation and maintenance personnel notified, so that they can locate and diagnose the abnormality from the alarm and complete the corresponding maintenance operation.
At present, most enterprises adopt relatively rigid operation and maintenance strategies. A conventional operation and maintenance system based on manual monitoring and manual handling struggles to keep up with dynamic changes in an enterprise's actual business, in terms of both the accuracy and the efficiency with which the operating state of the IT systems is judged, because the skill levels of operation and maintenance personnel differ and there is no reasonable standard for how intensively different IT systems should be monitored. Such a system depends heavily on personnel and wastes operation and maintenance resources.
In operation and maintenance management, many managed objects differ from one another, so the indexes to be monitored, the monitoring density, the criteria for judging whether an index is normal, and the analysis and handling performed after an index is found to deviate all differ as well; yet many managed objects are the same or similar. At present, operation and maintenance management usually defines entirely separate monitoring indexes, monitoring densities, judgment criteria, analyses and handling operations for every managed object. As a result, the same work is often repeated for identical or similar objects, which complicates operation and maintenance management.
Some technicians have attempted to combine the operation and maintenance management of IT systems with database (DB) technology and artificial intelligence (AI) technology in order to make operation and maintenance management more intelligent. Although database technology and artificial intelligence technology are both important branches of computer science that have achieved notable results and been applied in related fields, significant problems remain when the two are combined for intelligent operation and maintenance management. On the one hand, existing AI systems (e.g., expert systems) can perform heuristic search and inference over hundreds or thousands of rules, but cannot efficiently retrieve from and access existing databases or manage large amounts of data; on the other hand, current database management systems are optimized to handle massive amounts of data and transactions, but cannot express or process rule-based knowledge. Consequently, IT systems are difficult to combine with databases and artificial intelligence, the intelligent operation and maintenance level of IT systems cannot be raised further, and the user experience of the IT system suffers.
Disclosure of Invention
The invention mainly solves the technical problem of how to improve the intelligent operation and maintenance level of the existing IT system. In order to solve the technical problem, the present application provides an intelligent operation and maintenance monitoring method of an IT system and an intelligent operation and maintenance monitoring system thereof.
According to a first aspect, an embodiment provides an intelligent operation and maintenance monitoring method for an IT system, including: acquiring service information of an IT system; determining an intelligent model from a preset knowledge system according to the service information, wherein the intelligent model is one or the combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model; inputting the service information into the intelligent model, and creating a robot processing model, wherein the robot processing model comprises an operation and maintenance operation task corresponding to the service information; and driving a preset model execution engine by using the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system.
Acquiring the service information of the IT system comprises: receiving a task instruction, an operation instruction and/or a behavior instruction issued to the IT system by operation and maintenance personnel to obtain the service information; or monitoring event behaviors and/or abnormal behaviors of each node in the IT system to obtain the service information.
Determining an intelligent model from a preset knowledge system according to the service information comprises: performing instruction analysis or behavior analysis on the service information to obtain an operation-service-node list and a time-task result; and comparing the operation-service-node list and the time-task result, one by one, with the service model, mathematical model, fuzzy recognition model, pattern matching model, rule model, strategy model and standard specification model in the preset knowledge system, so as to confirm a matched intelligent model from the knowledge system.
For each of the service model, mathematical model, fuzzy recognition model, pattern matching model, rule model, strategy model and standard specification model in the knowledge system, the process of constructing the model comprises: for new service information, summarizing the new instruction, new event behavior or new abnormal behavior contained in the service information into summary information, and defining and classifying the service information according to the summary information to obtain the corresponding matching service and rule strategy; compiling a model framework for the new service information with a preset model designer according to the matching service and the rule strategy; converting the model framework with a preset modeling language to obtain a model corresponding to the new service information; and loading the model corresponding to the new service information into the knowledge system so as to update the knowledge system.
Inputting the service information into the intelligent model and creating a robot processing model comprises: for service information about task instructions or operation instructions sent by operation and maintenance personnel, inputting the service information into the matched intelligent model in the knowledge system and creating a task-type robot processing model, wherein the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system; for service information about behavior instructions sent by operation and maintenance personnel, inputting the service information into the matched intelligent model in the knowledge system and creating a monitoring-type robot processing model, wherein the monitoring-type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system; and for service information about the event behaviors or abnormal behaviors of each node in the IT system, inputting the service information into the matched intelligent model in the knowledge system and creating a perception-type robot processing model, wherein the perception-type robot processing model is used for performing operation and maintenance operations of node detection, anomaly perception and/or anomaly repair on the IT system.
Using the robot processing model to drive a preset model execution engine to execute the operation and maintenance operation task so as to monitor the IT system comprises: querying the time-task result according to the clock signal of the IT system, and scheduling the corresponding operation and maintenance operation task once its trigger time is reached; driving the preset model execution engine with the robot processing model, executing the scheduled operation and maintenance operation task according to the operation-service-node list, and monitoring the IT system during execution; the monitored content comprises one or more of data integration, data analysis, report generation, routine inspection of software and hardware, alarm analysis, fault early warning, node detection, anomaly perception and anomaly repair.
After the operation and maintenance operation task is executed by driving a preset model execution engine by the robot processing model, the method further comprises the following steps: feeding back an execution result of the operation and maintenance operation task, wherein the execution result comprises one or more of operation log information, error report log information and communication service information; and generating an intelligent operation and maintenance report of the IT system according to the execution result, wherein the intelligent operation and maintenance report is used for filing operation and maintenance conditions of the IT system and checking by a user.
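As an illustration only, the feedback and report steps above might be sketched in Python as follows. The names `ExecutionResult` and `build_report`, and the one-line-per-task report layout, are assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExecutionResult:
    """Result fed back after one operation and maintenance task (hypothetical shape)."""
    task: str
    operation_log: List[str] = field(default_factory=list)
    error_log: List[str] = field(default_factory=list)


def build_report(results: List[ExecutionResult]) -> str:
    """Summarize execution results as plain text, one line per task."""
    lines = []
    for r in results:
        # A task counts as failed in this sketch if it produced any error log entry.
        status = "FAILED" if r.error_log else "OK"
        lines.append(
            f"{r.task}: {status} "
            f"({len(r.operation_log)} log lines, {len(r.error_log)} errors)"
        )
    return "\n".join(lines)
```

Such a report could be archived as is or rendered for review by a user; communication service information is omitted from the sketch for brevity.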
According to a second aspect, an embodiment provides an intelligent operation and maintenance monitoring system of an IT system, including: an acquisition unit for acquiring the service information of the IT system; a confirmation unit for determining an intelligent model from a preset knowledge system according to the service information, wherein the intelligent model is one or a combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model; a robot unit for inputting the service information into the intelligent model and creating a robot processing model, wherein the robot processing model comprises an operation and maintenance operation task corresponding to the service information; and an execution unit for driving a preset model execution engine with the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system. The robot processing model created by the robot unit is a task-type robot processing model, a monitoring-type robot processing model, or a perception-type robot processing model; the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system, the monitoring-type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system, and the perception-type robot processing model is used for performing operation and maintenance operations of node detection, anomaly perception and/or anomaly repair on the IT system.
According to a third aspect, an embodiment provides a computer-readable storage medium, which includes a program, where the program is executable by a processor to implement the intelligent operation and maintenance monitoring method described in the first aspect.
The beneficial effects of the present application are as follows:
According to the above embodiments, an intelligent operation and maintenance monitoring method for an IT system and a corresponding intelligent operation and maintenance monitoring system are provided, wherein the method comprises the following steps: acquiring service information of the IT system; determining an intelligent model from a preset knowledge system according to the service information; inputting the service information into the intelligent model and creating a robot processing model, wherein the robot processing model comprises an operation and maintenance operation task corresponding to the service information; and driving a preset model execution engine with the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system. First, because the knowledge system is constructed in advance, the service information can be matched accurately to an intelligent model in the knowledge system, and the robot processing model required by the operation and maintenance operation task can be created quickly by inputting the service information into the matched intelligent model. Second, because the created robot processing model contains all matching services and rule strategies related to the service information, it has strong knowledge search and reasoning capabilities as well as efficient retrieval, access and mass data management capabilities, which helps to execute the related operation and maintenance operation tasks in the IT system quickly while raising the intelligent operation and maintenance level of the IT system. Third, the system obtained according to the method mainly comprises an acquisition unit, a confirmation unit, a robot unit and an execution unit, so that the system can learn and comprehensively analyze, in real time, the operation and maintenance conditions of monitored objects such as software, hardware and network equipment, and can provide a basis for platform optimization and operation and maintenance planning of the IT system through the generated log information or early-warning information. Fourth, the intelligent operation and maintenance monitoring system can not only perform intelligent operation and maintenance operations for repetitive tasks to meet the daily business needs of operation and maintenance personnel, but also sense the event behaviors and abnormal behaviors of each node in real time, so that events are processed or anomalies are repaired automatically according to an intelligent model in the knowledge system. This realizes intelligent management of anomalies and events, reduces the degree of intervention required from operation and maintenance personnel, and helps to improve the user experience.
Drawings
FIG. 1 is a flowchart of an intelligent operation and maintenance monitoring method according to the present application;
FIG. 2 is a detailed flowchart of the intelligent operation and maintenance monitoring method;
FIG. 3 is a flow diagram of any model building process in the knowledge system;
FIG. 4 is a flow diagram of generating an intelligent operation and maintenance report for an IT system;
FIG. 5 is a schematic diagram illustrating the working principle of the intelligent operation and maintenance monitoring method according to the present application;
FIG. 6 is a schematic diagram of the working principle of the task-type and monitoring-type robot processing models;
FIG. 7 is a schematic diagram of the working principle of the perception-type robot processing model;
FIG. 8 is a schematic structural diagram of the intelligent operation and maintenance monitoring system of the present application;
FIG. 9 is a schematic diagram of the class structure and relationships of the knowledge system;
FIG. 10 is a schematic diagram of the class structure and relationships of the task-type robot processing model;
FIG. 11 is a schematic diagram of the class structure and relationships of the perception-type robot processing model.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Like elements in different embodiments are given like reference numerals. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail, so that the core of the present application is not obscured by excessive description; a detailed description of these operations is unnecessary, because those skilled in the art can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the steps or actions in the method descriptions may be reordered or interchanged, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence, unless it is otherwise indicated that a particular sequence must be followed.
Ordinal terms applied to components herein, e.g., "first" and "second", are used only to distinguish the objects described and do not have any sequential or technical meaning. The terms "connected" and "coupled", when used in this application, include both direct and indirect connections (couplings) unless otherwise indicated.
Embodiment One
Referring to fig. 1, the present application discloses an intelligent operation and maintenance monitoring method for an IT system, which mainly includes steps S100-S400, which are described below.
Step S100, acquiring the service information of the IT system. In a specific embodiment, acquiring the service information of the IT system includes: receiving a task instruction, an operation instruction and/or a behavior instruction issued to the IT system by operation and maintenance personnel to obtain the service information; or monitoring event behaviors and/or abnormal behaviors of each node in the IT system to obtain the service information.
It should be noted that the operation and maintenance personnel in this embodiment are the operation and maintenance engineers responsible for managing the IT system. Their responsibility is to keep the system running normally, and in doing so they often send various instructions to the IT system as needed, such as a task instruction for integrating and classifying data, an operation instruction for generating a daily report, and a task instruction for checking disk or memory usage.
It should be noted that a node of the IT system refers to middleware deployed on the system, such as an MQ/Kafka queue or Tomcat on a server. In addition, the event behavior or abnormal behavior of a node in the IT system can be monitored by means of a sensor on the node, where "sensor" refers to a dedicated service component located at the node that senses or collects events or abnormal states at that node.
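The two acquisition paths of step S100 can be pictured with a minimal Python sketch. The names `Source`, `ServiceInfo`, `from_instruction` and `from_sensor` are hypothetical and chosen for illustration; the application does not define a concrete data format for service information.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    """Origin of a piece of service information (hypothetical labels)."""
    TASK_INSTRUCTION = "task_instruction"
    OPERATION_INSTRUCTION = "operation_instruction"
    BEHAVIOR_INSTRUCTION = "behavior_instruction"
    NODE_EVENT = "node_event"
    NODE_ANOMALY = "node_anomaly"


@dataclass
class ServiceInfo:
    source: Source
    payload: str
    node: Optional[str] = None  # set only for sensor-reported behavior


def from_instruction(kind: Source, text: str) -> ServiceInfo:
    """Path 1: an instruction issued by operation and maintenance personnel."""
    if kind in (Source.NODE_EVENT, Source.NODE_ANOMALY):
        raise ValueError("an instruction cannot carry a node-behavior source")
    return ServiceInfo(source=kind, payload=text)


def from_sensor(node: str, message: str, is_anomaly: bool) -> ServiceInfo:
    """Path 2: a sensor component on a node reports an event or an anomaly."""
    source = Source.NODE_ANOMALY if is_anomaly else Source.NODE_EVENT
    return ServiceInfo(source=source, payload=message, node=node)
```

Both paths produce the same record type, so the later steps can treat instructions and sensed behavior uniformly.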
Step S200, determining an intelligent model from a preset knowledge system according to the service information, wherein the intelligent model is one or a combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model. In one embodiment, see FIG. 2, step S200 may include steps S210 and S220, which are described in turn below.
Step S210, performing instruction analysis or behavior analysis on the service information to obtain an operation-service-node list and a time-task result.
In the operation-service-node list, "operation" refers to a related operation task, "service" refers to an application service corresponding to the operation task, and "node list" refers to a server list corresponding to the operation task. In addition, "time" in the time-task result refers to a trigger time for executing the operation task, and "task result" refers to information that needs to be acquired or finally output during execution of the operation task.
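A minimal Python sketch of the two outputs of step S210 follows. The `key=value;...` instruction syntax, the `HH:MM` time format and all names are assumptions made for this example; the application does not specify how instructions are encoded or analyzed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OperationServiceNodes:
    operation: str    # the operation task, e.g. "check_disk_usage"
    service: str      # the application service the task targets
    nodes: List[str]  # the servers on which the task runs


@dataclass
class TimeTaskResult:
    trigger_time: str  # when to run the task, kept as "HH:MM" in this sketch
    task_result: str   # what the task must collect or output


def parse_instruction(text: str) -> Tuple[OperationServiceNodes, TimeTaskResult]:
    """Parse a hypothetical 'key=value;...' instruction string into both records."""
    fields = dict(part.split("=", 1) for part in text.split(";") if part)
    osn = OperationServiceNodes(
        operation=fields["operation"],
        service=fields["service"],
        nodes=fields["nodes"].split(","),
    )
    ttr = TimeTaskResult(trigger_time=fields["time"], task_result=fields["result"])
    return osn, ttr
```

For instance, `parse_instruction("operation=check_disk_usage;service=tomcat;nodes=srv1,srv2;time=02:00;result=usage_percent")` yields a node list of two servers and a trigger time of `02:00`.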
Step S220, comparing the operation-service-node list and the time-task result with a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model in a preset knowledge system one by one so as to confirm a matched intelligent model from the knowledge system.
For example, as shown in FIG. 5, the knowledge system is constructed in advance by operation and maintenance personnel and may include various types of model libraries. The independent service model library, the composite service model library and the arrangement service model library are all used for loading service models of various operation and maintenance operation tasks; the mathematical model library is used for loading mathematical models of various operation and maintenance operation tasks; the standard specification model library is used for loading standard specification models; the rule model library is used for loading rule models; the strategy model library is used for loading strategy models; the pattern matching model library is used for loading pattern matching models; and the fuzzy recognition model library is used for loading fuzzy recognition models of various operation and maintenance operation tasks. In addition, each standard specification model in the standard specification model library has specification items such as messages, XML frameworks, interfaces and protocols; each rule model in the rule model library has rule items such as matching rules, scheduling rules, sequencing rules, message rules (short messages and WS) and protocol rules; each strategy model in the strategy model library has strategy items such as execution strategies and other strategies; each pattern matching model in the pattern matching model library is used for sorting specific event classification information to obtain a corresponding behavior table; and each fuzzy recognition model in the fuzzy recognition model library is used for sorting specific anomaly classification information to obtain a corresponding behavior table.
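One way to picture a knowledge system made of model libraries, and the one-by-one comparison of step S220, is the Python sketch below. The `Model` and `KnowledgeSystem` classes and the predicate-based matching are assumptions made for illustration; the application does not disclose how the comparison is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Model:
    name: str
    kind: str                        # e.g. "service", "rule", "strategy"
    matches: Callable[[dict], bool]  # predicate over the parsed request


class KnowledgeSystem:
    """Model libraries keyed by model kind (a simplified stand-in for FIG. 5)."""

    def __init__(self) -> None:
        self._libraries: Dict[str, List[Model]] = {}

    def load(self, model: Model) -> None:
        """Add a model to the library for its kind, creating the library if needed."""
        self._libraries.setdefault(model.kind, []).append(model)

    def find_matches(self, request: dict) -> List[Model]:
        """Compare the request against every model, library by library."""
        return [
            model
            for library in self._libraries.values()
            for model in library
            if model.matches(request)
        ]
```

Returning a list rather than a single model reflects the statement that the intelligent model may be one model or a combination of models.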
For example, event/exception classification information may be configured in a rule model in advance, in which the matching rule is the keyword JSONException and the classification is "JSON conversion exception". When the rule model is executed, if content containing the keyword JSONException is captured, the task is automatically judged to have encountered a JSON conversion exception, and execution continues according to the rule defined in advance in the rule model (the rule model may specify the processing action to take after a given event or exception occurs).
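The keyword-based rule described above can be sketched in Python as a substring match. The `MatchRule` fields and the `classify` function are hypothetical names for this example, and the follow-up action is represented only by a label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchRule:
    keyword: str         # text to look for in captured output
    classification: str  # event/exception class assigned on a hit
    action: str          # hypothetical label of the follow-up processing action


def classify(captured: str, rules: List[MatchRule]) -> Optional[MatchRule]:
    """Return the first rule whose keyword occurs in the captured text, else None."""
    for rule in rules:
        if rule.keyword in captured:
            return rule
    return None
```

The first matching rule wins in this sketch; a real rule model might instead rank rules or allow several to fire.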
Step S300, inputting the service information into the intelligent model confirmed in step S220, and creating a robot processing model, wherein the robot processing model comprises the operation and maintenance operation task corresponding to the service information. In one embodiment, see FIG. 2, step S300 may include steps S310 to S330, which are described in turn below.
Step S310, for the service information about a task instruction or an operation instruction sent by operation and maintenance personnel, inputting the service information into the matched intelligent model in the knowledge system, and creating a task-type robot processing model, wherein the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system.
It should be noted that the task-type robot processing model is directed to common or repeated tasks or operations, and the execution time strategy and initialization parameters are already included in the model.
Step S320, for the service information about a behavior instruction sent by operation and maintenance personnel, inputting the service information into the matched intelligent model in the knowledge system, and creating a monitoring-type robot processing model, wherein the monitoring-type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system.
It should be noted that the monitoring-type robot processing model provides centralized alarm analysis and fault early warning for the information environment of the IT system and the operating conditions of its service systems, with the aim of helping operation and maintenance personnel locate faults quickly and thereby improve operation and maintenance efficiency. The monitored objects include, but are not limited to, servers, storage, networks, system software, middleware (including application servers, databases, etc.), operating states of application systems, and system resource utilization rates.
It should be noted that the "information environment" here refers to the usage of disk, CPU and memory, and "the operating conditions of each service system" includes, for example, checking whether Tomcat is running normally; in addition, "alarm analysis and fault early warning" means that specific monitoring items and early-warning rules can be configured in the system, and early-warning information is generated automatically when a rule is met.
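A configurable early-warning rule of the kind described above might look like the following Python sketch. The percentage thresholds, the `WarningRule` class and the message format are assumptions made for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WarningRule:
    item: str         # monitored item, e.g. "disk", "cpu" or "memory"
    threshold: float  # usage percentage above which a warning is raised


def evaluate(readings: Dict[str, float], rules: List[WarningRule]) -> List[str]:
    """Return one warning message per rule whose item exceeds its threshold."""
    warnings = []
    for rule in rules:
        value = readings.get(rule.item)
        # Items with no reading are skipped rather than treated as failures.
        if value is not None and value > rule.threshold:
            warnings.append(
                f"{rule.item} usage {value:.1f}% exceeds {rule.threshold:.1f}%"
            )
    return warnings
```

With rules for disk at 90% and CPU at 80%, a reading of 93.5% disk and 40% CPU produces a single disk warning.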
Step S330, inputting the business information of the event behaviors or the abnormal behaviors of each node in the IT system into the matched intelligent model in the knowledge system, and creating a perception type robot processing model, wherein the perception type robot processing model is used for carrying out operation and maintenance operations of node detection, abnormal perception and/or abnormal restoration on the IT system.
It should be noted that the perception-type robot processing model targets the events and abnormalities of the nodes of the system and perceives the operating condition of each node. When a specific event behavior or abnormal behavior is received, the next processing action is executed according to the event matching rule or abnormality matching rule in the model, so that the event is recorded or the abnormality is repaired.
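One way to picture the event matching rules and abnormality matching rules is a list of pattern and action pairs that is scanned when a behavior arrives. The sketch below is a hypothetical illustration; the patterns, the two actions, and the function names are assumptions, not the patent's actual rule format.

```python
import re
from typing import Callable, List, Optional, Tuple


def record_event(behavior: str) -> str:
    """Stand-in for the 'record the event' processing action."""
    return f"recorded: {behavior}"


def restart_service(behavior: str) -> str:
    """Stand-in for a 'repair the abnormality' processing action."""
    return "repair: restart service"


# (pattern, action) pairs standing in for the model's matching rules.
RULES: List[Tuple[re.Pattern, Callable[[str], str]]] = [
    (re.compile(r"OutOfMemoryError"), restart_service),
    (re.compile(r"login|logout"), record_event),
]


def handle_behavior(behavior: str) -> Optional[str]:
    """Run the action of the first matching rule; None means no rule matched."""
    for pattern, action in RULES:
        if pattern.search(behavior):
            return action(behavior)
    return None
```

A `None` result corresponds to a behavior that no existing rule covers.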
In this embodiment, steps S310, S320, S330 are in a parallel relationship, and the purpose is to construct different types of robot processing models for different business information.
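The mapping from the kind of business information to the three model types in steps S310 to S330 can be sketched as a simple lookup. This is a minimal Python illustration; the enum, the string labels, and the function name are hypothetical.

```python
from enum import Enum


class RobotModelType(Enum):
    TASK = "task"              # data integration, data analysis, report generation
    MONITORING = "monitoring"  # routine inspection, alarm analysis, fault early warning
    PERCEPTION = "perception"  # node detection, anomaly perception, anomaly repair


# Hypothetical labels for the kinds of business information named in S310-S330.
SOURCE_TO_TYPE = {
    "task_instruction": RobotModelType.TASK,
    "operation_instruction": RobotModelType.TASK,
    "behavior_instruction": RobotModelType.MONITORING,
    "event_behavior": RobotModelType.PERCEPTION,
    "abnormal_behavior": RobotModelType.PERCEPTION,
}


def select_model_type(source: str) -> RobotModelType:
    """Pick the robot processing model type for one kind of business information."""
    try:
        return SOURCE_TO_TYPE[source]
    except KeyError:
        raise ValueError(f"unknown business information source: {source}") from None
```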
Step S400, driving a preset model execution engine with the created robot processing model to execute the operation and maintenance operation task so as to monitor the IT system. In one embodiment, referring to FIG. 2, step S400 may include steps S410-S420, which are described below.
Step S410, querying the time-task result according to a clock signal of the IT system, and scheduling the operation and maintenance operation task corresponding to a time once that time is reached.
Step S420, driving the preset model execution engine with the robot processing model, executing the operation and maintenance operation task corresponding to the time according to the operation-service-node list, and monitoring the IT system during execution; the monitored content comprises one or more of data integration, data analysis, report generation, routine inspection of software and hardware, alarm analysis, fault early warning, node detection, anomaly perception and anomaly repair.
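Steps S410 and S420 can be sketched as a clock-driven lookup followed by a fan-out over the operation-service-node list. The sketch below is an assumption-laden illustration: the two dictionaries stand in for the time-task result and the operation-service-node list, and the formatted string stands in for the call into the model execution engine.

```python
from datetime import datetime
from typing import Dict, List, Set, Tuple

# Hypothetical time-task result: scheduled time -> task name.
TIME_TASK: Dict[datetime, str] = {
    datetime(2019, 9, 20, 2, 0): "routine_inspection",
    datetime(2019, 9, 20, 6, 0): "report_generation",
}

# Hypothetical operation-service-node list: task name -> (service, node) pairs.
OP_SERVICE_NODES: Dict[str, List[Tuple[str, str]]] = {
    "routine_inspection": [("tomcat", "node-1"), ("mysql", "node-2")],
    "report_generation": [("reporting", "node-1")],
}


def run_due_tasks(now: datetime, done: Set[str]) -> List[str]:
    """Execute every task whose time has been reached and that has not run yet."""
    executed = []
    for when, task in sorted(TIME_TASK.items()):
        if when <= now and task not in done:
            for service, node in OP_SERVICE_NODES.get(task, []):
                # Stand-in for driving the executor on this node.
                executed.append(f"{task}:{service}@{node}")
            done.add(task)
    return executed
```

Calling `run_due_tasks` on each clock tick with the same `done` set ensures that a task is dispatched once after its time is reached.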
It should be noted that, in this embodiment, the preset model execution engine (e.g., Powerrich mathmatic Modeling Engine) refers to the executor of each node in the IT system. The executor may be a conventional functional module, such as a functional component of an external DSL (Domain-Specific Language) that parses source code, builds a parse/syntax tree, and generates and logically executes a series of processing actions from it. Each executor has its own specific processing actions, which are not described in detail here.
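As a rough illustration of an executor that parses DSL source into a tree and then executes processing actions from it, the following sketch uses Python's standard `xml.etree.ElementTree` (the description elsewhere notes that the modeling language's source code is based on standard XML). The element names, attributes, and handlers are invented for this example; the actual schema of the Powerrich language is not given in the text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical model source; the real DSL schema is not specified in the text.
SOURCE = """
<model name="disk-check">
  <action type="check" target="disk"/>
  <action type="notify" target="ops"/>
</model>
"""

# One handler per action type, standing in for the executor's processing actions.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "check": lambda target: f"checked {target}",
    "notify": lambda target: f"notified {target}",
}


def execute(source: str) -> List[str]:
    """Parse the source into an element tree and run each action in document order."""
    root = ET.fromstring(source.strip())
    results = []
    for node in root.iter("action"):
        handler = HANDLERS[node.get("type")]
        results.append(handler(node.get("target")))
    return results
```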
In the embodiment of the present application, the process of constructing any one of the service model, the mathematical model, the fuzzy recognition model, the pattern matching model, the rule model, the policy model and the standard specification model in the knowledge system includes steps A01-A05, which are described below.
Step A01, for a piece of new service information, forming summary information from the new instructions, new event behaviors or new abnormal behaviors contained in the service information.
It should be noted that, if an abnormal behavior (or an event behavior) of a node of the IT system is received and, after comparison, is found not to match any intelligent model in the knowledge system, it may be defined as a new abnormal behavior (or a new event behavior).
For example, summary information formed from a new abnormal behavior may include the exception stack content (usually error log information, specifically the time, trigger condition, error content, and operating condition of the server).
Step A02, defining and classifying the service information according to the abstract information to obtain the corresponding matching service and rule strategy.
Step A03, compiling a model framework for the new service information through a preset model designer according to the matching service and rule strategy.
It should be noted that the preset model designer may be existing mathematical model programming software in which operation function modules can be freely combined in a graphical manner to define the operation and maintenance operation task, service, node list, time, and task result. Since such model designers already exist in the prior art (e.g., Powerrich Modeling Designer), they are not described in detail here.
Step A04, converting the compiled model framework by using a preset modeling language to obtain a model corresponding to the new service information.
It should be noted that the preset modeling language refers to a computer programming language. Different model designers use different programming languages depending on the actual situation, so the modeling language is not limited here. For example, the Powerrich mathmatic Modeling Language can be used; it is an external DSL (Domain-Specific Language) whose source code is written in standard XML and which is used specifically for converting data models.
Step A05, loading the model corresponding to the new service information into the knowledge system so as to update the knowledge system.
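Steps A01 to A05 can be read as a small pipeline that ends with the knowledge system being updated. The sketch below is a simplified illustration under stated assumptions: the knowledge system is reduced to a dictionary keyed by an error signature, the classification rule is a naive suffix check, and `build_model` is a placeholder for the model designer and modeling-language conversion, none of which are specified in this form by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KnowledgeSystem:
    """Toy knowledge system: models keyed by a behavior signature (assumption)."""
    models: Dict[str, dict] = field(default_factory=dict)

    def match(self, signature: str) -> Optional[dict]:
        return self.models.get(signature)

    def load(self, signature: str, model: dict) -> None:
        self.models[signature] = model  # A05: update the knowledge system


def summarize(behavior: dict) -> dict:
    """A01: form summary information from the error log of a new behavior."""
    signature = behavior["error"].split(":", 1)[0].strip()
    return {"signature": signature, "time": behavior["time"], "detail": behavior["error"]}


def classify(summary: dict) -> dict:
    """A02: define and classify, yielding a matching service and rule strategy."""
    abnormal = summary["signature"].endswith(("Exception", "Error"))
    return {"category": "abnormal" if abnormal else "event",
            "rule": "repair" if abnormal else "record"}


def build_model(summary: dict, policy: dict) -> dict:
    """A03 + A04: placeholder for the model designer and modeling-language steps."""
    return {"signature": summary["signature"], **policy}


def learn(ks: KnowledgeSystem, behavior: dict) -> bool:
    """Add a model for the behavior if none matches; return True when one was added."""
    summary = summarize(behavior)
    if ks.match(summary["signature"]) is not None:
        return False
    ks.load(summary["signature"], build_model(summary, classify(summary)))
    return True
```

Once a model has been loaded, the same behavior matches on the next occurrence and is no longer treated as new.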
Further, the intelligent operation and maintenance monitoring method claimed in the present application may further include steps S500-S600 after step S400, which are respectively described as follows.
Step S500, after the preset model execution engine is driven by the robot processing model to execute the operation and maintenance operation task, feeding back an execution result of the operation and maintenance operation task, wherein the execution result comprises one or more of running log information, error log information and communication service information.
It should be noted that the "communication service information" refers to information transmitted over an external application interface, for example, specific information that is needed during an operation and maintenance operation task and has to be obtained through the external interface.
Step S600, generating an intelligent operation and maintenance report of the IT system according to the execution result, wherein the intelligent operation and maintenance report is used for filing the operation and maintenance condition of the IT system and for checking by a user.
It should be noted that the user can check the report through e-mails/short messages sent to the operation and maintenance personnel. The contact information (such as an e-mail address or a short message number) of the contact person is entered into the intelligent model in advance, and a process node is designated to send the e-mail/short message.
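Steps S500 and S600 can be illustrated by turning a list of execution results into a plain-text report and wrapping it in an e-mail message. The sketch below only builds the message with the standard `email.message.EmailMessage` class and does not send it; the result fields, the report layout, and the recipient address are assumptions made for this example.

```python
from email.message import EmailMessage
from typing import Dict, List


def build_report(results: List[Dict[str, str]]) -> str:
    """Format execution results (running log, error log, ...) as a text report."""
    lines = ["Intelligent operation and maintenance report"]
    for result in results:
        lines.append(f"[{result['kind']}] {result['task']}: {result['detail']}")
    return "\n".join(lines)


def build_mail(report: str, recipient: str) -> EmailMessage:
    """Wrap the report in an e-mail addressed to the pre-configured contact."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "IT system operation and maintenance report"
    msg.set_content(report)
    return msg


# Hypothetical execution results fed back by the executor.
results = [
    {"kind": "run", "task": "routine_inspection", "detail": "disk usage 42%"},
    {"kind": "error", "task": "routine_inspection", "detail": "tomcat not responding"},
]
mail = build_mail(build_report(results), "ops@example.com")
```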
For clear understanding of the technical solutions disclosed in the embodiments, the intelligent operation and maintenance monitoring method will be described herein with reference to schematic operation principle diagrams shown in fig. 5 to 7.
Referring to fig. 5, the operation and maintenance staff assist in constructing a knowledge system in advance according to the business needs of the IT system, so that the knowledge system includes an independent service model library, a composite service model library, an arrangement service model library, a mathematical model library, a standard specification model library, a rule model library, a policy model library, a pattern matching model library, and a fuzzy recognition model library, thereby being capable of conveniently calling various models in the knowledge system to form an intelligent model matched with any business information. Referring to fig. 9, the definition contents of the operation information and the behavior information, the classification contents of the event classification and the anomaly classification, and the definition contents of various models can be known.
In the first aspect, referring to fig. 5 and 6, operation and maintenance personnel issue a task instruction, an operation instruction and/or a behavior instruction to the IT system to form a task allocation table (i.e., service information), and the system performs task analysis on the task allocation table through the relationship data of the task definition, behavior definition and operation definition to obtain an operation-service-node list and a time-task result. The system identifies an intelligent model matching the service information from the knowledge system, inputs the service information into the intelligent model, and creates a task-type robot processing model (or a monitoring-type robot processing model). The system queries the time-task result corresponding to the service information according to the clock signal and schedules the operation and maintenance operation task corresponding to a time once that time is reached, so that the created robot processing model drives the model execution engine (i.e., the executor) in the node to execute the corresponding operation and maintenance operation task. During execution, the execution result (running log information, error log information or communication service information) of the operation and maintenance operation task is fed back, and the communication service information is sent to external equipment; meanwhile, the system receives communication service information from the external equipment, generates an intelligent operation and maintenance report of the IT system by recording the execution result of the task, and files the report or sends it to the operation and maintenance personnel by e-mail/short message for the user to check. Referring to fig. 10, the specific contents of task assignment, notification and log service, the management contents of the robot processing model, and the definition contents of the various models can be seen.
In the second aspect, referring to fig. 5 and 7, the system monitors the event behavior or abnormal behavior of each node through a sensor (a specific service component) located at each node to form service information, identifies the service information by means of the knowledge system, and performs behavior analysis on the service information through the relationship data of the event behavior and the abnormal behavior, thereby obtaining an operation-service-node list and a time-task result. The system identifies an intelligent model matching the service information from the knowledge system, inputs the service information into the intelligent model, and creates a perception-type robot processing model. The system queries the time-task result corresponding to the service information according to the clock signal and schedules the operation and maintenance operation task corresponding to a time once that time is reached, so that the created robot processing model drives the model execution engine (i.e., the executor) in the node to execute the corresponding operation and maintenance operation task. During execution, the execution result (running log information, error log information or communication service information) of the operation and maintenance operation task is fed back, and the communication service information is sent to external equipment; meanwhile, the system receives communication service information from the external equipment, generates an intelligent operation and maintenance report of the IT system by recording the execution result of the task, and files the report or sends it to the operation and maintenance personnel by e-mail/short message for the user to check. Referring to fig. 11, the classification contents of event classification and abnormality classification, the specific contents of notification and log service, the management contents of the robot processing model, and the definition contents of the various models can be seen.
Example II,
Referring to fig. 8, on the basis of the intelligent operation and maintenance monitoring method disclosed in the first embodiment, the present application further discloses an intelligent operation and maintenance monitoring system of an IT system, which includes an acquiring unit 11, a confirming unit 12, a robot unit 13, and an execution unit 14, which are respectively described below.
The acquiring unit 11 is configured to acquire service information of an IT system. In a specific embodiment, the acquiring unit 11 receives a task instruction, an operation instruction and/or a behavior instruction issued by operation and maintenance personnel to the IT system to obtain the service information; alternatively, the acquiring unit 11 monitors event behaviors and/or abnormal behaviors of each node in the IT system to obtain the service information.
The confirming unit 12 is configured to determine an intelligent model from a preset knowledge system according to the service information, where the intelligent model is one or a combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a policy model, and a standard specification model. In a specific embodiment, the confirming unit 12 performs instruction parsing or behavior parsing on the service information to obtain an operation-service-node list and a time-task result; in addition, the confirming unit 12 compares the operation-service-node list and the time-task result with the service model, mathematical model, fuzzy recognition model, pattern matching model, rule model, policy model and standard specification model in the preset knowledge system one by one to confirm a matched intelligent model from the knowledge system.
The robot unit 13 is configured to input the service information into the confirmed intelligent model and create a robot processing model, where the robot processing model includes an operation and maintenance operation task corresponding to the service information. In a specific embodiment, the robot processing model created by the robot unit 13 comprises: a task-type robot processing model, a monitoring-type robot processing model, or a perception-type robot processing model; the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system, the monitoring-type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system, and the perception-type robot processing model is used for performing operation and maintenance operations of node detection, abnormality perception and/or abnormality repair on the IT system.
The execution unit 14 is configured to drive a preset model execution engine by using the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system. In one embodiment, the execution unit 14 queries the time-task result according to the clock signal of the IT system and schedules the operation and maintenance operation task corresponding to a time once that time is reached; the execution unit 14 drives the preset model execution engine by using the robot processing model, executes the operation and maintenance operation task corresponding to the time according to the operation-service-node list, and monitors the IT system during execution; the monitored content comprises one or more of data integration, data analysis, report generation, routine inspection of software and hardware, alarm analysis, fault early warning, node detection, anomaly perception and anomaly repair.
Further, the intelligent operation and maintenance monitoring system disclosed in this embodiment further includes a communication service unit 15, which is connected to the execution unit 14, and configured to feed back an execution result of the operation and maintenance operation task, where the execution result includes one or more of operation log information, error report log information, and communication service information; and the communication service unit 15 generates an intelligent operation and maintenance report of the IT system according to the execution result, wherein the intelligent operation and maintenance report is used for filing the operation and maintenance condition of the IT system and checking by a user.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. An intelligent operation and maintenance monitoring method of an IT system is characterized by comprising the following steps:
acquiring service information of an IT system;
determining an intelligent model from a preset knowledge system according to the service information, wherein the intelligent model is one or the combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model;
inputting the service information into the intelligent model, and creating a robot processing model, wherein the robot processing model comprises an operation and maintenance operation task corresponding to the service information;
and driving a preset model execution engine by using the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system.
2. The intelligent operation and maintenance monitoring method according to claim 1, wherein the acquiring the service information of the IT system comprises:
receiving a task instruction, an operation instruction and/or a behavior instruction of an operation and maintenance worker to the IT system to obtain the service information;
or monitoring event behaviors and/or abnormal behaviors of each node in the IT system to obtain the service information.
3. The intelligent operation and maintenance monitoring method according to claim 2, wherein the determining an intelligent model from a preset knowledge system according to the service information comprises:
performing instruction analysis or behavior analysis on the service information to obtain an operation-service-node list and a time-task result;
and comparing the operation-service-node list and the time-task result with a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model in a preset knowledge system one by one to confirm a matched intelligent model from the knowledge system.
4. The intelligent operation and maintenance monitoring method according to claim 3, wherein the process of constructing any one of the service model, the mathematical model, the fuzzy recognition model, the pattern matching model, the rule model, the policy model and the standard specification model in the knowledge system comprises:
for a piece of new service information, forming a new instruction, a new event behavior or a new abnormal behavior contained in the service information into summary information, and defining and classifying the service information according to the summary information to obtain a corresponding matching service and a rule strategy;
compiling a model framework of the new business information through a preset model designer according to the matching service and the rule strategy;
converting the model framework by using a preset modeling language to obtain a model corresponding to the new service information;
and loading the model corresponding to the new service information into the knowledge system so as to update the knowledge system.
5. The intelligent operation and maintenance monitoring method according to claim 3, wherein the inputting the service information into the intelligent model and creating a robot processing model comprises:
for business information about task instructions or operation instructions sent by operation and maintenance personnel, inputting the business information into a matched intelligent model in the knowledge system, and creating a task-type robot processing model, wherein the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system;
for business information about behavior instructions sent by operation and maintenance personnel, inputting the business information into a matched intelligent model in a knowledge system, and creating a monitoring type robot processing model, wherein the monitoring type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system;
and for the business information of the event behaviors or the abnormal behaviors of each node in the IT system, inputting the business information into the matched intelligent model in the knowledge system, and creating a perception type robot processing model, wherein the perception type robot processing model is used for carrying out operation and maintenance operations of node detection, abnormal perception and/or abnormal restoration on the IT system.
6. The intelligent operation and maintenance monitoring method according to claim 5, wherein the driving of the preset model execution engine by the robot processing model to execute the operation and maintenance task to monitor the IT system comprises:
querying the time-task result according to the clock signal of the IT system, and scheduling the operation and maintenance operation task corresponding to a time once that time is reached;
driving a preset model execution engine by using the robot processing model, executing the operation and maintenance operation task corresponding to the time according to the operation-service-node list, and monitoring the IT system in the execution process; the monitored content comprises one or more of data integration, data analysis, report generation, routine inspection of software and hardware, alarm analysis, fault early warning, node detection, anomaly perception and anomaly repair.
7. The intelligent operation and maintenance monitoring method according to any one of claims 1-6, further comprising, after driving a preset model execution engine with the robot processing model to execute the operation and maintenance operation task:
feeding back an execution result of the operation and maintenance operation task, wherein the execution result comprises one or more of operation log information, error report log information and communication service information;
and generating an intelligent operation and maintenance report of the IT system according to the execution result, wherein the intelligent operation and maintenance report is used for filing operation and maintenance conditions of the IT system and checking by a user.
8. An intelligent operation and maintenance monitoring system of an IT system, comprising:
the acquiring unit is used for acquiring the service information of the IT system;
the confirming unit is used for confirming an intelligent model from a preset knowledge system according to the service information, wherein the intelligent model is one or the combination of a service model, a mathematical model, a fuzzy recognition model, a pattern matching model, a rule model, a strategy model and a standard specification model;
the robot unit is used for inputting the service information into the intelligent model and creating a robot processing model, and the robot processing model comprises an operation and maintenance operation task corresponding to the service information;
and the execution unit is used for driving a preset model execution engine by using the robot processing model to execute the operation and maintenance operation task so as to monitor the IT system.
9. The intelligent operation and maintenance monitoring system of claim 8,
the robot processing model created by the robot unit includes: a task-type robot processing model, a monitoring-type robot processing model, or a perception-type robot processing model;
the task-type robot processing model is used for performing operation and maintenance operations of data integration, data analysis and/or report generation on the IT system, the monitoring-type robot processing model is used for performing operation and maintenance operations of software and hardware routine inspection, alarm analysis and/or fault early warning on the IT system, and the perception-type robot processing model is used for performing operation and maintenance operations of node detection, abnormal perception and/or abnormal restoration on the IT system.
10. A computer-readable storage medium, comprising a program executable by a processor to implement the intelligent operation and maintenance monitoring method according to any one of claims 1-7.
CN201910894961.4A 2019-09-20 2019-09-20 Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system Active CN110633189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910894961.4A CN110633189B (en) 2019-09-20 2019-09-20 Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910894961.4A CN110633189B (en) 2019-09-20 2019-09-20 Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system

Publications (2)

Publication Number Publication Date
CN110633189A true CN110633189A (en) 2019-12-31
CN110633189B CN110633189B (en) 2023-04-07

Family

ID=68972127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910894961.4A Active CN110633189B (en) 2019-09-20 2019-09-20 Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system

Country Status (1)

Country Link
CN (1) CN110633189B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111240874A (en) * 2020-01-03 2020-06-05 联想(北京)有限公司 Data processing method and electronic equipment
CN111522705A (en) * 2020-03-23 2020-08-11 广东工业大学 Intelligent operation and maintenance solution method for industrial big data
CN112365004A (en) * 2020-11-27 2021-02-12 广东省科学院智能制造研究所 Robot autonomous anomaly restoration skill learning method and system
CN112711508A (en) * 2020-12-21 2021-04-27 航天信息股份有限公司 Intelligent operation and maintenance service system facing large-scale client system
CN112817825A (en) * 2021-02-26 2021-05-18 上海德衡数据科技有限公司 Operation and maintenance early warning and prevention system based on multi-sensor information fusion
CN112990744A (en) * 2021-03-30 2021-06-18 杭州东方通信软件技术有限公司 Automatic operation and maintenance method and device for massive million-level cloud equipment
WO2024061081A1 (en) * 2022-09-23 2024-03-28 中兴通讯股份有限公司 Autonomous operation and maintenance method and apparatus, and computer-readable storage medium and electronic apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1684170A2 (en) * 2005-01-21 2006-07-26 Outsystems, software em redes, S. A. Software development system and method
CN102195813A (en) * 2011-05-04 2011-09-21 成都勤智数码科技有限公司 Method and device for intelligently creating operation and maintenance worksheet
CN103888287A (en) * 2013-12-18 2014-06-25 北京首都国际机场股份有限公司 Information system integrated operation and maintenance monitoring service early warning platform and realization method thereof
CN108665237A (en) * 2018-05-03 2018-10-16 广州供电局有限公司 Method for establishing automatic inspection model and positioning abnormity based on business system
CN110209104A (en) * 2019-05-24 2019-09-06 武汉烽火技术服务有限公司 The system and method for intelligent monitoring and operation and maintenance

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111240874A (en) * 2020-01-03 2020-06-05 联想(北京)有限公司 Data processing method and electronic equipment
CN111240874B (en) * 2020-01-03 2022-01-14 联想(北京)有限公司 Data processing method and electronic equipment
CN111522705A (en) * 2020-03-23 2020-08-11 广东工业大学 Intelligent operation and maintenance solution method for industrial big data
CN112365004A (en) * 2020-11-27 2021-02-12 广东省科学院智能制造研究所 Robot autonomous anomaly restoration skill learning method and system
CN112711508A (en) * 2020-12-21 2021-04-27 航天信息股份有限公司 Intelligent operation and maintenance service system facing large-scale client system
CN112817825A (en) * 2021-02-26 2021-05-18 上海德衡数据科技有限公司 Operation and maintenance early warning and prevention system based on multi-sensor information fusion
CN112817825B (en) * 2021-02-26 2022-09-20 上海德衡数据科技有限公司 Operation and maintenance early warning and prevention system based on multi-sensor information fusion
CN112990744A (en) * 2021-03-30 2021-06-18 杭州东方通信软件技术有限公司 Automatic operation and maintenance method and device for massive million-level cloud equipment
WO2024061081A1 (en) * 2022-09-23 2024-03-28 中兴通讯股份有限公司 Autonomous operation and maintenance method and apparatus, and computer-readable storage medium and electronic apparatus

Also Published As

Publication number Publication date
CN110633189B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110633189B (en) Intelligent operation and maintenance monitoring method and intelligent operation and maintenance monitoring system of IT system
US10901727B2 (en) Monitoring code sensitivity to cause software build breaks during software project development
CN107886238B (en) Business process management system and method based on mass data analysis
Rausch et al. An empirical analysis of build failures in the continuous integration workflows of Java-based open-source software
CN112394922B (en) Decision configuration method, business decision method and decision engine system
Panjer Predicting Eclipse bug lifetimes
De Medeiros et al. An outlook on semantic business process mining and monitoring
US8027946B1 (en) Higher order logic applied to expert systems for alarm analysis, filtering, correlation and root cause
US20180129483A1 (en) Developing software project plans based on developer sensitivity ratings detected from monitoring developer error patterns
KR20180108446A (en) System and method for management of ICT infra
Aggarwal et al. Test case generation from UML state machine diagram: A survey
CN112446511A (en) Fault handling method, device, medium and equipment
CN111858251A (en) Big data computing technology-based data security audit method and system
US8635601B2 (en) Method of calculating key performance indicators in a manufacturing execution system
Nogueira et al. Monitoring a CI/CD workflow using process mining
Xia et al. Dependability prediction of WS-BPEL service compositions using Petri net and time series models
CN116611813B (en) Intelligent operation and maintenance management method and system based on knowledge graph
Patiniotakis et al. Assessing flexibility in event-driven process adaptation
JP2009245154A (en) Computer system, method, and computer program for evaluating symptom
Tang Towards automation in software test life cycle based on multi-agent
Conforti et al. A software framework for risk-aware business process management
Adam et al. A fuzzy based approach to the improvement of business processes
Leopold Business process management
AU2021287457B2 (en) "Log Data Compliance"
Dávid A model-driven approach for processing complex events

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant