CN109495308B - Automatic operation and maintenance system based on management information system - Google Patents

Automatic operation and maintenance system based on management information system

Info

Publication number
CN109495308B
Authority
CN
China
Prior art keywords
software
maintenance
service
database
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811427415.1A
Other languages
Chinese (zh)
Other versions
CN109495308A (en
Inventor
李奕彤
聂帅
张永伟
宿晓丹
杨明凯
朱健
陆彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN201811427415.1A priority Critical patent/CN109495308B/en
Publication of CN109495308A publication Critical patent/CN109495308A/en
Application granted granted Critical
Publication of CN109495308B publication Critical patent/CN109495308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/082 Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0876 Aspects of the degree of configuration automation
    • H04L41/0886 Fully automatic configuration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an automatic operation and maintenance system based on a management information system, which mainly addresses the high labor cost, operational difficulty and scattered system configuration found in the operation and maintenance of management information systems. Relying on a client agent and a client plug-in, the automatic operation and maintenance system provides resource management, system monitoring, system configuration, an operation and maintenance knowledge base, alarm management, operation audit and other functions. It achieves centralized management and remote monitoring of resources in the management information system, centralized system configuration, rapid elimination of software faults, and periodic system detection and alarming. Operation and maintenance efficiency is improved in practice, and system operation and maintenance cost is reduced.

Description

Automatic operation and maintenance system based on management information system
Technical Field
The invention belongs to the field of business management, and particularly relates to an automatic operation and maintenance system based on a management information system.
Background
In recent years, China's information industry has developed rapidly. To improve working efficiency and the level of business management, more and more management information systems are being developed and deployed in office environments. Ensuring safe and reliable operation of these business systems requires a standardized operation and maintenance environment and mechanism, which places higher demands on system operation and maintenance work. Meanwhile, the demand for domestically developed, independently controllable software and hardware for management information systems is becoming increasingly urgent.
Management information systems are currently operated and maintained mainly by hand: a large number of operation and maintenance personnel must travel to the engineering site to perform software deployment, service start and stop, system configuration, troubleshooting and other work, and where necessary must remain on site to handle emergencies at any time. This approach suffers from high labor cost, low efficiency, a high learning cost for operation and maintenance work, and operation and maintenance personnel who are hard to replace. As management information systems continue to grow in scale and number, the traditional manual mode will struggle to keep up, so it urgently needs to be replaced by automated operation and maintenance.
At present, major software and internet companies are actively researching automated operation and maintenance technology, and open-source operation and maintenance software such as Nagios and cloudsight is available. Such software can monitor the state of hosts and of network equipment such as switches and routers, but cannot yet adapt to domestic hardware environments. Providing an automated operation and maintenance technology for management information systems based on a domestic environment is therefore one of the key steps toward automating their operation and maintenance.
Disclosure of Invention
The invention aims to solve the problems in the operation and maintenance work of management information systems. By developing an automatic operation and maintenance system, it provides system monitoring, resource management, system configuration, an operation and maintenance knowledge base, alarm management, operation audit and other functions for the management information system, so as to automate operation and maintenance as far as possible, improve operation and maintenance efficiency and reduce operation and maintenance cost. The automatic operation and maintenance system comprises a system monitoring module, a resource management module, a system configuration module, an operation and maintenance knowledge base module, an alarm management module, an operation audit module, an operation and maintenance database, a client agent and a client plug-in;
the system monitoring module processes operation requests issued by a user in the foreground, converts the user's request parameters into the parameter format accepted by the client agent, and calls the client agent to execute the remote operation; the user's operation requests include starting, stopping and heartbeat detection of business software, databases and system services in the management information system, and updating and upgrading of business software;
the client agent sends the instruction to the client plug-in, which executes the specific operation, and feeds the execution result back, line by line as character strings, to the business thread that called the client plug-in; depending on the user's operation, the business thread can define different string processing functions to parse the strings and extract key information, and the extracted information is persisted to the operation and maintenance database as needed;
the client plug-in assists the management information system in completing system deployment; when deployment is complete, the client plug-in generates system information about the management information system and stores it in text files, namely a service.list file, a database.list file and a service_db file, wherein the service.list file stores the identifiers, IP addresses and port numbers of all business software in the management information system, the database.list file stores the identifiers, IP addresses and port numbers of all databases in the management information system, and the service_db file records the correspondence between business software identifiers and database identifiers; the resource management module reads these text files by means of the client agent, parses the textual information, generates the corresponding resource entities, and stores the entity information and the relationship data among the entities in the operation and maintenance database, thereby automatically building the association between business software and database services;
the resource management module reads, through the client agent, the system information of the management information system generated by the client plug-in, discovers the resources in the management information system by consolidating that system information, and automatically builds the association between business software and database services from it;
the system configuration module provides synchronization of personnel and organizations and configuration of system parameters;
the operation and maintenance knowledge base module records the experience that operation and maintenance personnel accumulate in daily work, which a user can view in the foreground;
the alarm management module lets a user define alarm rules in the foreground and stores them;
the operation audit module configures around advice or after advice for the methods that carry out key operations, tracks the execution of these methods, and persists key information to the operation and maintenance database.
The system monitoring module comprises a software service interface, SoftwareService, which defines an operateSoftware() method that converts foreground request parameters into parameters accepted by the client agent, so as to implement software start-stop, heartbeat detection and upgrade operations;
the SoftwareService interface is implemented by the SoftwareServiceImpl class, whose objects hold a processor object of type CommandProcessor for calling the client agent, and a SoftwareDao object, implementing the SoftwareDao interface, for persisting the generated business software objects.
The client agent sends an instruction to the client plug-in, which executes the specific operation, and feeds the execution result back, line by line as character strings, to the business thread that called the client plug-in. Specifically, a class TextDealerImpl implementing the TextDealer interface is defined; its dealText() method scans the feedback line by line and compares it with predefined keywords. When service start information is detected, the service has started successfully: the state of the business software is set to running and persisted to the operation and maintenance database.
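The line-by-line keyword scan can be illustrated with a minimal Python sketch. The source describes a Java class, so this is only an analog: the keyword strings, the `Software` dataclass and the `saved` list (a stand-in for the operation and maintenance database) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical keywords; the actual plug-in output format is not given in the text.
START_KEYWORDS = ("started successfully", "service started")


@dataclass
class Software:
    identity: str
    status: str = "stopped"


def deal_text(lines: Iterable[str], software: Software,
              saved: List[Tuple[str, str]]) -> None:
    """Scan feedback line by line; mark the software as running on a start keyword."""
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in START_KEYWORDS):
            software.status = "running"
            # Stand-in for persisting the new state to the database.
            saved.append((software.identity, software.status))
            return
```

If no keyword is seen, the state is left unchanged, which is one possible way to treat a failed start.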
The resources in the system include business software, database services and system services.
The resource management module supports maintenance of resource information in the management information system and supports import and export as Excel spreadsheets.
The system monitoring module lets a user view, in real time in the foreground, the result of instruction execution or the real-time log of business software.
Following the specific business logic, the system configuration module calls in sequence, through REST interfaces, the business software services that provide personnel and organization management functions, and processes the returned data to synchronize personnel and organizations.
When defining an alarm rule through the alarm management module, a user can specify business software that the rule should disregard; during timed alarm detection under that rule, the automatic operation and maintenance system automatically ignores the detection results for that software. The user can enable any one alarm rule for timed alarm detection; if the detection information contains an abnormal condition defined in the rule, alarm information is output and displayed to the user. The user can start, stop or switch alarm rules in real time.
The operation audit module also records audit information, including the IP address from which an operation was issued, the login user name, the operation object, the start and end time of the operation, the execution result and the like. The operations audited by the automatic operation and maintenance system include starting, stopping, detecting and upgrading services (business software, databases and system services), updating and uninstalling system plug-ins, setting timed heartbeat detection, troubleshooting software faults, starting and stopping timed alarm detection, switching alarm rules, remotely restarting servers and other important operations. The user can view the audit records and export them as an Excel spreadsheet.
The automatic operation and maintenance system troubleshoots software faults in the management information system by executing the following steps:
step 1, for a specific software fault, the automatic operation and maintenance system finds the business software associated with the fault;
step 2, the automatic operation and maintenance system searches the management information system for the databases associated with the business software found in step 1;
step 3, performing heartbeat detection on the associated databases in sequence and restarting those that are stopped;
step 4, searching for all business software associated with the databases that were found stopped, and restarting that business software in parallel;
step 5, restarting the business software associated with the software fault.
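The five steps above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the mapping arguments and the `is_alive` and `restart` callbacks are hypothetical, and skipping the faulty software in step 4 (so that it restarts only once, in step 5) is an assumption the text does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def troubleshoot(fault: str,
                 fault_to_software: Dict[str, List[str]],
                 software_to_dbs: Dict[str, List[str]],
                 is_alive: Callable[[str], bool],
                 restart: Callable[[str], None]) -> List[str]:
    """Run steps 1-5 and return the restarted resources in order."""
    # Step 1: business software associated with the fault.
    faulty = fault_to_software.get(fault, [])
    # Step 2: databases associated with that software (order kept, no duplicates).
    databases: List[str] = []
    for software in faulty:
        for db in software_to_dbs.get(software, []):
            if db not in databases:
                databases.append(db)
    # Step 3: heartbeat-check each database in sequence; restart the stopped ones.
    stopped: List[str] = []
    for db in databases:
        if not is_alive(db):
            restart(db)
            stopped.append(db)
    # Step 4: restart, in parallel, other software that uses a stopped database.
    dependents = sorted(
        sw for sw, dbs in software_to_dbs.items()
        if sw not in faulty and any(db in stopped for db in dbs)
    )
    if dependents:
        with ThreadPoolExecutor(max_workers=len(dependents)) as pool:
            list(pool.map(restart, dependents))
    # Step 5: restart the software associated with the fault itself.
    for software in faulty:
        restart(software)
    return stopped + dependents + list(faulty)
```

Restarting databases before their dependents means the business software reconnects to a database that is already running.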
The user can add software faults in the foreground and dynamically associate software in the management information system with them; when a corresponding fault occurs, the user can clear it in the foreground with one click.
The automatic operation and maintenance system can dynamically configure and manage all system parameters and personnel and organization information in the management information system.
The configuration work does not depend on the framework of the management information system. Following the specific business logic, the automatic operation and maintenance system exchanges data in sequence, through interfaces, with the relevant software and databases in the system, so that configuration is completed in one stop and the user does not need a deep understanding of the relationships among the software in the management information system.
The resource management function may maintain basic information of business software, database services, system services, and the like. When the system deployment is completed, the client plug-in will generate the deployment information of the service. The operation and maintenance server can read the deployment information of the service through the client agent, analyze the information, generate structured data and store the structured data in the operation and maintenance database. For a specific database service, the operation and maintenance server can also read the service software information depending on the database service through the client agent and persist the dependency relationship to the operation and maintenance database.
The system monitoring function of the invention comprises business software monitoring, database service monitoring, system service monitoring and server monitoring. Through the browser, the user can start, stop, detect and upgrade the selected service entity. System monitoring is implemented with a structure of browser, operation and maintenance server, operation and maintenance database, client agent and client plug-in. The client agent is bound to the operation and maintenance server and deployed together with it. The client plug-in is deployed on the server where the monitored object resides. A user operation generates corresponding parameters, which are first sent from the browser to the operation and maintenance server in an ajax request; the operation and maintenance server then calls the client agent and passes the parameters on. The client agent remotely calls the client plug-in on the server where the monitored object resides, passing the parameters, and the client plug-in finally carries out the user's remote monitoring of the monitored object. The client plug-in's real-time operation log can be fed back to the operation and maintenance server, which parses it into structured data and persists the data to the operation and maintenance database.
Through the client (browser), the user can view in real time both the execution status of operations on a service entity and the real-time log the entity generates; the real-time log is pushed to the user using WebSocket full-duplex real-time communication and the producer-consumer pattern.
Timed heartbeat detection of monitored objects is supported. When a detection completes, a message is pushed to all online clients; on receiving it, each client sends a request to the operation and maintenance server to refresh the state of the detected objects.
System configuration functions include synchronization of personnel and organizations in the management information system, and configuration of system parameters. Through the browser, the user can configure the management information system in a unified way, including adding organizations and personnel and editing menus and permissions.
The operation and maintenance knowledge base function lets a user enter operation and maintenance faults and their remedies into the knowledge base through a browser. For software faults, the user can configure troubleshooting steps and clear the fault with one click.
The alarm management function lets the user define alarm rules: the user can set the thresholds of the indicators checked by alarm detection, as well as the alarm detection period and the repeated-alarm period. After the user enables an alarm rule, the operation and maintenance server schedules an alarm detection task in the background at the detection period the user set, and generates alarm information from the detection log obtained by the client agent and the alarm rule.
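One way to picture an alarm rule with thresholds, a repeated-alarm period and a list of disregarded software is the Python sketch below. The class, its field names and the shape of the sample data are assumptions made for illustration. The source does not specify how rules or detection logs are represented, and treating each threshold as an upper limit is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AlarmRule:
    thresholds: Dict[str, float]      # indicator name -> upper limit (assumed)
    repeat_period: float              # seconds before the same alarm may repeat
    ignored: Set[str] = field(default_factory=set)  # software the rule disregards
    _last_alarm: Dict[Tuple[str, str], float] = field(
        default_factory=dict, init=False, repr=False)

    def evaluate(self, now: float,
                 samples: Dict[str, Dict[str, float]]) -> List[str]:
        """Return alarm messages for one detection run at time `now`."""
        alarms: List[str] = []
        for software in sorted(samples):
            if software in self.ignored:
                continue
            for indicator, value in sorted(samples[software].items()):
                limit = self.thresholds.get(indicator)
                if limit is None or value <= limit:
                    continue
                key = (software, indicator)
                last = self._last_alarm.get(key)
                # Suppress repeats that fall inside the repeated-alarm period.
                if last is not None and now - last < self.repeat_period:
                    continue
                self._last_alarm[key] = now
                alarms.append(f"{software}: {indicator}={value} exceeds {limit}")
        return alarms
```

A scheduler would call `evaluate` once per alarm detection period; the repeated-alarm period then controls how often an unchanged abnormal condition is reported again.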
The operation audit function uses aspect-oriented programming to record the operation object, operation time, operation result and other information for a user's important operations, ensuring that those operations are traceable.
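The audit recording described above resembles around advice in aspect-oriented programming, and a Python decorator is a rough analog. The sketch below is not the Spring-based mechanism the system itself would use: `AUDIT_LOG` stands in for the operation and maintenance database, and the `restart_service` function and its argument order are hypothetical.

```python
import functools
import time

AUDIT_LOG = []  # stand-in for the operation and maintenance database


def audited(operation):
    """Wrap a key operation so every call leaves an audit record."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, ip, target, *args, **kwargs):
            record = {"operation": operation, "user": user, "ip": ip,
                      "target": target, "start": time.time()}
            try:
                result = func(user, ip, target, *args, **kwargs)
                record["result"] = "success"
                return result
            except Exception as exc:
                record["result"] = f"failed: {exc}"
                raise
            finally:
                # Runs on success and on failure, like around advice.
                record["end"] = time.time()
                AUDIT_LOG.append(record)
        return wrapper
    return decorator


@audited("restart service")
def restart_service(user, ip, target):
    # Hypothetical key operation; a real one would call the client agent.
    return f"{target} restarted"
```

Keeping the recording in a wrapper means the business method itself contains no audit code, which is the point of the aspect-oriented approach.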
Beneficial effects:
The invention changes how management information systems are operated and maintained in a domestic environment and has the following notable advantages over the traditional mode: (1) the user does not need to be familiar with complex Linux commands and can operate and maintain the system through the automatic operation and maintenance system alone, which greatly reduces the learning cost of operation and maintenance work; (2) resources in the system are managed centrally, so the user can readily keep track of system resource information; (3) the system is configured centrally, which simplifies the configuration steps and removes most of the repetitive work of operation and maintenance personnel; (4) operation and maintenance knowledge is maintained centrally, which helps the user locate problems quickly, and one-click troubleshooting of software faults greatly improves troubleshooting efficiency; (5) the system is checked and alarms are raised on a schedule, so operation and maintenance personnel need not stay on site for long periods; (6) important operations on the system are recorded, which ensures their traceability.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a block diagram of an automated operation and maintenance system corresponding to the technology;
FIG. 2 is a functional block diagram of an operation and maintenance management system corresponding to the technology;
FIG. 3 is a class diagram identifying the major entities and entity relationships of the automated operation and maintenance system;
FIG. 4 is a system monitoring module class diagram of the automated operation and maintenance system;
FIG. 5 is a system parameter configuration implementation class diagram of the automated operation and maintenance system;
FIG. 6 is a general software troubleshooting program flow diagram of an automated operation and maintenance system;
FIG. 7 is an alarm management module class diagram of the automated operation and maintenance system;
FIG. 8 is an illustration of an automated operation and maintenance system.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The invention is realized by developing an automatic operation and maintenance system that provides remote monitoring, online upgrading, real-time log viewing and other functions for each business software, database and system service in the management information system. It also implements resource management, system configuration, an operation and maintenance knowledge base, alarm management and the like, and audits important user operations. The system block diagram is shown in fig. 1. The automatic operation and maintenance system adopts a B/S (browser/server) architecture based on J2EE technology: the access control layer is based on SpringMVC, the data access layer on Hibernate, and the client agent and client plug-in on shell scripts.
The management information system mainly comprises business software, system services and databases, which can be deployed on different application servers in different combinations. The automatic operation and maintenance system consists of operation and maintenance management software, an operation and maintenance database, a client agent and a client plug-in; the operation and maintenance management software, the operation and maintenance database and the client agent are deployed on an operation and maintenance server, and the client plug-in is installed on each application server of the management information system. The client agent and the client plug-in establish a network connection over the SSH protocol, and the operation and maintenance management software sends operation instructions to the client plug-in by calling the client agent, thereby achieving remote monitoring of the management information system. The operation and maintenance management software can also exchange data directly with some of the business software in the management information system over HTTP and REST interfaces, to assist in configuring the management information system.
The present invention will be described in further detail with reference to the following drawings in terms of the module design, deployment form and usage mode of the system.
Software module design:
The operation and maintenance management software of the automatic operation and maintenance system is shown in fig. 2 and mainly comprises the system monitoring, resource management, system configuration, operation and maintenance knowledge base, alarm management and operation audit modules.
1) Resource management
This module supports maintenance of resource information in the management information system and supports import and export as Excel spreadsheets. FIG. 3 identifies the main resource entities maintained in the automatic operation and maintenance system, namely servers, business software, databases, system services and operation and maintenance knowledge, and the relationships between them. The Server class represents a server; its number, name, system and ip attributes hold the server's number, name, operating system and IP address. The SystemService class represents a system service; its name, ip and identity attributes hold the name, IP address and identifier. The Knowledge class represents operation and maintenance knowledge; its description, level, type and solution attributes hold the description, importance level, knowledge type and solution. The Software class represents business software; its name, ip, port and identity attributes hold the name, IP address, port number and identifier. The Database class represents a database; its name, ip, port and identity attributes hold the name, IP address, port number and identifier. The connecting lines in the diagram show the relationships among the entities: a server has a one-to-many relationship with system services, business software and databases; operation and maintenance knowledge has a one-to-many relationship with business software; and a database has a one-to-many relationship with business software.
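The entities and one-to-many relationships of FIG. 3 can be sketched as Python dataclasses. The attribute names follow the text; representing each one-to-many relationship as a list on the "one" side is an assumption, since the actual Hibernate mappings are not given.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Software:
    name: str
    ip: str
    port: int
    identity: str


@dataclass
class Database:
    name: str
    ip: str
    port: int
    identity: str
    software: List[Software] = field(default_factory=list)  # one database, many software


@dataclass
class SystemService:
    name: str
    ip: str
    identity: str


@dataclass
class Knowledge:
    description: str
    level: str
    type: str
    solution: str
    software: List[Software] = field(default_factory=list)  # one entry, many software


@dataclass
class Server:
    number: str
    name: str
    system: str
    ip: str
    software: List[Software] = field(default_factory=list)
    databases: List[Database] = field(default_factory=list)
    system_services: List[SystemService] = field(default_factory=list)
```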
The resource management module can read the system information generated by the client plug-in through the client agent, can discover resources in the system including business software, database service and system service by integrating the system information, and can automatically construct the incidence relation between the business software and the database service according to the system information.
Specifically, when the client plug-in assists the management information system in completing system deployment, it generates system information about the management information system and stores it in text files. For example, the service.list file stores the identifiers, IP addresses and port numbers of all business software in the management information system, the database.list file stores the identifiers, IP addresses and port numbers of all databases in the management information system, and the service_db file records the correspondence between business software identifiers and database identifiers. The resource management module reads these files by means of the client agent, parses the textual information, generates the corresponding resource entities in fig. 3 (business software and databases), and stores the entity information and the relationship data between entities (the correspondence between business software identifiers and database identifiers) in the database of the operation and maintenance management system. The association between business software and database services is thereby built automatically.
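A minimal Python sketch of the parsing step follows. The source does not give the layout of the three text files, so the format used here is purely an assumption: one whitespace-separated record per line, "identifier ip port" for the two list files and "software-identifier database-identifier" for the relation file. The function names are hypothetical as well.

```python
from typing import Dict, List, Tuple


def parse_resource_list(text: str) -> Dict[str, Tuple[str, int]]:
    """Parse lines of 'identifier ip port' into {identifier: (ip, port)}."""
    resources: Dict[str, Tuple[str, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        identity, ip, port = line.split()
        resources[identity] = (ip, int(port))
    return resources


def parse_relations(text: str,
                    software: Dict[str, Tuple[str, int]],
                    databases: Dict[str, Tuple[str, int]]) -> Dict[str, List[str]]:
    """Parse lines of 'software-id database-id' into {database-id: [software-id, ...]}."""
    relations: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        software_id, database_id = parts
        # Keep only relations whose two ends were actually discovered.
        if software_id in software and database_id in databases:
            relations.setdefault(database_id, []).append(software_id)
    return relations
```

Grouping by database identifier mirrors the one-to-many relationship from a database to the business software that uses it.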
2) System monitoring
A typical implementation class diagram of the system monitoring module is shown in fig. 4. The software service is a software service interface, and defines an operatesoftware () method, which can convert foreground request parameters into parameters received by a client agent so as to realize the operations of starting and stopping, heartbeat detection, upgrading and the like of software; the generateInfo () method is defined to generate the Software object of the Software service in figure 3. The SofteareServiceImpl class implements the SofteareService interface, whose objects hold an object processor of the CommandProcessor type for calling the client agent, and a SoftwareDao object implementing the softdao interface for persisting (storing to the database of the operation and maintenance system) the generated business software object. The CommandProcessor class is used for interacting with a client agent, an object of the CommandProcessProcessor class has an object textDeler realizing a textDeler interface and is used for processing feedback information for calling the client agent, a poison attribute of the CommandProcessProcessEx class defines an identifier for finishing instruction execution pushing, a ProduceInfo () method analyzes and splices parameters and prepares for calling the client agent, and an executeCommand () method specifically calls the client agent, namely executes a shell script in the client agent and receives the feedback information. The TextDealer interface defines a unique method dealText () for outputting the execution information of the client agent and extracting keywords, the implementation class TextDealerImpl realizes specific processing logic, and the object is dynamically generated by the operateSofteware () method of SofteareServiceImpl. BlockingQueue is a blocking queue built in Java, and put () method writes data into the queue and take () method takes data out of the queue. 
The WebSocket class provides full-duplex real-time communication with the front-end browser. Its pageId attribute identifies the front-end page that has established a connection with it, its session object stores the basic information of the connection, and the class maintains a connection set, connections. The consumeLog() method pushes a message to the connection object of a specific page; the connection is found by traversing connections and matching the pageId passed as a parameter. The broadcast() method pushes a message to all currently connected pages.
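The difference between consumeLog() and broadcast() can be sketched as follows. This is a Python analogue with no real WebSocket library involved: the Connection class, its send callback and the page identifiers are hypothetical stand-ins for the connection objects described above.

```python
class Connection:
    """Stand-in for one front-end connection (hypothetical)."""

    def __init__(self, page_id, send):
        self.page_id = page_id  # identifies the front-end page
        self.send = send        # callable that delivers a message to the page


connections = []  # the maintained connection set


def consume_log(page_id, message):
    """Push a message to the connection of one specific page.

    Returns True if a matching connection was found, False otherwise.
    """
    for conn in connections:
        if conn.page_id == page_id:
            conn.send(message)
            return True
    return False


def broadcast(message):
    """Push a message to every currently connected page."""
    for conn in connections:
        conn.send(message)


# Illustration with two fake pages that collect what they receive.
page_a, page_b = [], []
connections.append(Connection("page-a", page_a.append))
connections.append(Connection("page-b", page_b.append))

found = consume_log("page-a", "service started")
broadcast("heartbeat ok")
```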
The system monitoring module processes operation requests sent by a user in the foreground, converts the user's request parameters into the parameter format accepted by the client agent, and calls the client agent to execute the remote operation. The user's operation requests include starting, stopping and heartbeat detection of the business software, databases and system services in the management information system, as well as updating and upgrading of the business software. The client agent sends the instruction to the client plug-in, which executes the specific operation, and feeds back the execution result line by line, as character strings, to the service thread that called the client agent. Depending on the user's operation, the service thread can define different string processing functions to parse the strings and extract key information, and the extracted information is persisted to the database as required.
For example, to start a piece of service software that has stopped running, the user only needs to select it in the service software list on the page in the foreground browser and click the [Start] button. The browser then sends an AJAX request to the back-end operation and maintenance management software, passing the basic information of the software and the instruction information in JSON format. The system monitoring module determines, from the IP address and port number in the software information, the server of the management information system on which the service software resides, calls the client agent to connect to that server over the SSH protocol, and passes a command parameter to the client plug-in deployed on the server. If the service software occupies port 8088, the parameter has the form 5XX restart 8088, where 5XX is the system code and restart denotes starting (restarting); the command restarts the process on port 8088, which achieves remote start of the service software.
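The conversion of a foreground request into the command parameter can be sketched as follows (Python, for illustration only). The JSON field names ip, port and instruction are assumptions, since the description only states that the software information and the instruction are sent in JSON format. The command layout "system code, action, port" follows the 5XX restart 8088 example above.

```python
import json


def to_agent_call(request_body, system_code="5XX"):
    """Turn a foreground JSON request into (target host, command parameter).

    The field names 'ip', 'port' and 'instruction' are hypothetical.
    """
    request = json.loads(request_body)
    port = int(request["port"])
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    command = f"{system_code} {request['instruction']} {port}"
    return request["ip"], command


# Hypothetical request body sent by the browser.
body = '{"ip": "10.0.0.5", "port": 8088, "instruction": "restart"}'
host, command = to_agent_call(body)
```

The client agent would then connect to the returned host over SSH and pass the command string to the client plug-in; that step is omitted here.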
The client agent feeds back the execution result line by line, as character strings, to the service thread that called it. The service thread defines a dedicated string processing function for the feedback of the start operation, that is, a class TextDealerImpl implementing the TextDealer interface shown in FIG. 4. Its dealText() method scans the feedback information line by line and compares it with the defined keywords. When it detects the message indicating that the service has started, the start is deemed successful, the state of the service software is set to running, and the state of the software is persisted to the database of the operation and maintenance management system.
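A minimal Python sketch of this line-by-line keyword scan is given below. The keyword text "service started" is a placeholder, because the description does not state which strings the client plug-in actually emits. Returning None when no keyword matches (leaving the stored state unchanged) is likewise an assumption.

```python
# Placeholder keywords; the real feedback strings are not given in the text.
START_KEYWORDS = ("service started",)


def deal_text(lines, keywords=START_KEYWORDS):
    """Scan feedback lines and return the new software state, if any.

    Returns "running" as soon as a line contains one of the keywords,
    otherwise None (assumed to mean: leave the stored state unchanged).
    """
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            return "running"
    return None


feedback = ["checking port 8088", "launching process", "Service started on 8088"]
state = deal_text(feedback)
```

A caller would persist the returned state to the operation and maintenance database when it is not None.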
The system monitoring module allows a user to view, in the foreground and in real time, the result of an instruction execution or the real-time log of service software. This function is implemented with the producer-consumer pattern. The business thread obtains instruction execution messages from the client agent and is the producer of messages; the log monitoring thread establishes a WebSocket connection with the foreground and pushes the feedback information to it, and is the consumer of messages. A blocking queue serves as the message buffer between producer and consumer.
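The producer-consumer arrangement can be sketched with Python's standard queue.Queue, which plays the role that BlockingQueue plays in Java (put() blocks when the queue is full, get() blocks when it is empty). The sentinel object below corresponds to the poison marker that signals the end of pushing. The thread functions and the push callback are illustrative names, not taken from the patent.

```python
import queue
import threading

POISON = object()  # sentinel marking the end of the instruction output


def business_thread(feedback_lines, buffer):
    """Producer: forwards each feedback line, then the end marker."""
    for line in feedback_lines:
        buffer.put(line)  # blocks while the buffer is full
    buffer.put(POISON)


def log_monitor_thread(buffer, push):
    """Consumer: pushes each line to the foreground until the end marker."""
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is POISON:
            break
        push(item)


buffer = queue.Queue(maxsize=2)  # small bound to show back-pressure
pushed = []                      # stands in for the WebSocket push

consumer = threading.Thread(target=log_monitor_thread, args=(buffer, pushed.append))
producer = threading.Thread(
    target=business_thread,
    args=(["starting", "loading config", "service started"], buffer),
)
consumer.start()
producer.start()
producer.join()
consumer.join()
```

Because the queue is bounded, a slow consumer throttles the producer instead of letting unsent log lines accumulate without limit.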
3) System configuration
The system configuration module has two main functions: synchronization of personnel and organizations, and system parameter configuration.
The personnel and organization management functions of a management information system are often scattered across different business software, so managing them requires logging in to each piece of software in turn, which is time-consuming and laborious. The resource management module of the operation and maintenance management software manages all software in the system centrally, so the IP address and port number of any piece of software are readily available. Following the specific business logic, the automatic operation and maintenance system calls the different business software services that have personnel and organization management functions in sequence through their REST interfaces and processes the data obtained, which eliminates a large amount of manual editing and achieves centralized, simplified configuration of personnel and organizations.
For example, adding a new person to the management information system requires the following steps:
1. Log in to the personnel and organization management software and enter the information of the new person into the database of the management information system;
2. Log in to the directory service software and synchronize the personnel information in the database to the directory service;
3. Log in to the authority management software and configure the menu for the new person;
4. Synchronize the personnel information of the directory service to the other related business software.
The operation and maintenance personnel must memorize these steps, which are labor-intensive and error-prone.
The system configuration module presents these steps to the operation and maintenance personnel directly as a navigation page. In the first step (database personnel maintenance), the operator enters personnel information directly on the navigation page or imports a personnel Excel spreadsheet, and the system configuration module, in the background, calls the add-personnel interface of the organization management software of the management information system through a REST request to add the personnel to the database. At the same time, a mailbox address and a login account can be generated automatically from the entered name, which avoids the original manual input and reduces the workload of personnel maintenance. The user is then guided to the next step (directory service personnel synchronization): the user selects the personnel in the database of the management information system who have not yet been synchronized and clicks [Synchronize], and the system configuration module, in the background, calls the personnel synchronization interface of the directory service through a REST request to synchronize the personnel to the directory service. The user is then guided to the next step (menu configuration), which links to the authority management software of the management information system for menu configuration. Finally, the user is guided to the last step (business software personnel synchronization): the user clicks the [Synchronize] button, and the system configuration module calls the personnel synchronization interfaces of the related business software in turn, which completes the addition of personnel to the management information system.
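The last step, calling the personnel synchronization interfaces of the related business software in turn, can be sketched as below (Python, illustrative only). The endpoint URLs are hypothetical, the HTTP transport is injected as a plain callable so that the sketch runs without a network, and stopping at the first failed call is an assumption that the description does not state.

```python
def run_sync(endpoints, post):
    """Call each synchronization interface in order.

    endpoints: list of (software name, URL) pairs, in calling order.
    post:      callable taking a URL and returning True on success.
    Returns (names synchronized so far, name that failed or None).
    Stopping at the first failure is an ASSUMPTION of this sketch.
    """
    done = []
    for name, url in endpoints:
        if not post(url):
            return done, name
        done.append(name)
    return done, None


# Hypothetical endpoints built from the IP and port held by resource management.
ENDPOINTS = [
    ("mail", "http://10.0.0.7:8090/personnel/sync"),
    ("workflow", "http://10.0.0.8:8091/personnel/sync"),
]

called = []


def fake_post(url):
    """Stand-in for a real REST request; records the URL and succeeds."""
    called.append(url)
    return True


synced, failed = run_sync(ENDPOINTS, fake_post)
```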
The design of the system configuration module effectively reduces the workload and difficulty of maintaining the organizations and personnel of the management information system.
System parameter configuration is implemented on the basis of a typical strategy pattern, as shown in FIG. 5. SystemConfig is the configuration entity class; its main attributes name, configNo and information represent the name, number and description of a configuration item, respectively. StrategyContext is the strategy container class; its objects hold an object strategy of the ConfigStrategy interface type, the setConfigStrategy() method sets the reference to the strategy object, and the getConfigStrategy() method returns it. The ConfigStrategy interface defines the processConfig() method, which carries the specific configuration logic, and has several implementation classes, ConfigStrategy1, ConfigStrategy2, and so on. The setConfigStrategy() method of StrategyContext selects the ConfigStrategy implementation class that matches the attribute values of the SystemConfig object passed in, thereby applying the corresponding configuration method.
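The strategy pattern of FIG. 5 can be sketched in Python as follows. The class names mirror those in the description; selecting the strategy by the configNo value, and what each concrete strategy returns, are assumptions made only to keep the example runnable.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SystemConfig:
    """Configuration entity: name, number and description of an item."""
    name: str
    config_no: int
    information: str


class ConfigStrategy(ABC):
    """Strategy interface: one way of applying a configuration item."""

    @abstractmethod
    def process_config(self, config):
        ...


class ConfigStrategy1(ConfigStrategy):
    def process_config(self, config):
        # Hypothetical behaviour, for illustration only.
        return f"wrote {config.name} to a configuration file"


class ConfigStrategy2(ConfigStrategy):
    def process_config(self, config):
        # Hypothetical behaviour, for illustration only.
        return f"updated {config.name} through a REST call"


class StrategyContext:
    """Strategy container: picks and holds the strategy for a config item."""

    # ASSUMPTION: the strategy is chosen by the config number.
    _registry = {1: ConfigStrategy1, 2: ConfigStrategy2}

    def __init__(self):
        self._strategy = None

    def set_config_strategy(self, config):
        self._strategy = self._registry[config.config_no]()

    def get_config_strategy(self):
        return self._strategy


context = StrategyContext()
item = SystemConfig("session timeout", 2, "idle minutes before logout")
context.set_config_strategy(item)
result = context.get_config_strategy().process_config(item)
```

Adding a new kind of configuration item then only requires a new ConfigStrategy implementation and a registry entry; the calling code is unchanged.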
4) Operation and maintenance knowledge base
The operation and maintenance knowledge base module records the experiences summarized by the operation and maintenance personnel in daily work, and the user can check and maintain the operation and maintenance knowledge base in the foreground.
For a software knowledge entry, that is, a method for removing a software fault, the user can configure the associated service software through the foreground table: select the software knowledge entry in the foreground table and click the [Configure] button; a list of the service software maintained by the operation and maintenance management software pops up; select the service software related to the entry and click [OK]. The operation and maintenance knowledge base then stores the association between the software knowledge entry and the service software in the database of the operation and maintenance management system, which completes the configuration of the entry.
After the configuration is finished, one-key troubleshooting of the fault becomes possible. Based on the working experience of operation and maintenance personnel, the invention provides a general troubleshooting scheme, shown in FIG. 6. The user selects a software knowledge entry in the foreground table and clicks the [Process] button to execute the flow shown in FIG. 6.
First, the operation and maintenance management software queries the operation and maintenance database, according to the configuration information, for the software associated with the software knowledge entry. In FIG. 3 a database and the business software are in a one-to-many relationship, and this association data is also stored in the operation and maintenance database, so all databases involved in the fault can be found by tracing back from the associated business software. Heartbeat detection is performed on every database involved, and any database that has stopped running is restarted. If a database is restarted, all software associated with that database is looked up through the same one-to-many relationship and restarted; the restarts are performed in parallel using multithreading to shorten the troubleshooting time. After these operations, the system checks whether every piece of business software associated with the software knowledge entry has been restarted, and restarts any that has not.
For example, suppose that when logging in to the management information system a user finds that the menu of the organization to which the user belongs cannot be displayed. The related software knowledge entry is found on the operation and maintenance knowledge base page, and its solution is to restart the directory service. The user selects the entry and clicks [Process] to remove the fault.
The operation and maintenance knowledge base module then finds the associated service software, namely the directory service, according to the preconfigured association between software knowledge and software. It next looks up the database information associated with the directory service in the operation and maintenance database; the database found is the basic database of the management information system. The knowledge base module calls a method of the system monitoring module to perform heartbeat detection on the basic database, finds that it has stopped running, and calls the system monitoring module again to restart it. It then searches the operation and maintenance database for the service software associated with the basic database and finds six pieces of service software, including the directory service and authority management, which the system monitoring module restarts in multiple threads. Because the directory service has now been restarted, the troubleshooting is complete.
After logging in to the management information system again, the menu of the organization to which the user belongs is displayed normally.
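The troubleshooting flow of FIG. 6, as walked through in the example above, can be sketched as follows. This is a Python illustration under stated assumptions: heartbeat detection and restart are injected as plain callables (is_alive and restart are hypothetical names), and the association data is passed in as dictionaries rather than read from the operation and maintenance database.

```python
from concurrent.futures import ThreadPoolExecutor


def troubleshoot(fault_services, service_to_db, db_to_services, is_alive, restart):
    """One-key troubleshooting sketch.

    fault_services: services associated with the software knowledge entry.
    service_to_db:  service id -> database id (each service has one database).
    db_to_services: database id -> list of service ids (one-to-many).
    Returns the set of services that were restarted.
    """
    restarted = set()
    # Trace back from the fault services to the databases involved.
    databases = sorted({service_to_db[s] for s in fault_services if s in service_to_db})
    for db in databases:
        if is_alive(db):        # heartbeat detection
            continue
        restart(db)             # restart the stopped database first
        dependents = [s for s in db_to_services.get(db, []) if s not in restarted]
        with ThreadPoolExecutor() as pool:  # restart dependents in parallel
            list(pool.map(restart, dependents))
        restarted.update(dependents)
    # Finally restart any fault service that has not been restarted yet.
    for service in fault_services:
        if service not in restarted:
            restart(service)
            restarted.add(service)
    return restarted


# Illustration: the basic database is down, two services depend on it.
calls = []
result = troubleshoot(
    fault_services=["directory"],
    service_to_db={"directory": "basedb", "authority": "basedb"},
    db_to_services={"basedb": ["directory", "authority"]},
    is_alive=lambda db: False,
    restart=calls.append,
)
```

If the database is healthy, only the fault service itself is restarted, which matches the final check in the flow.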
5) Alarm management
The alarm management module allows a user to define and store alarm rules in the foreground. When defining an alarm rule, the user can specify business software that the rule should ignore; when performing timed alarm detection according to the rule, the automatic operation and maintenance system automatically ignores the detection results of that business software. The user can start any one alarm rule to carry out timed alarm detection; if the detection information contains an abnormal condition defined in the alarm rule, alarm information is output and displayed to the user. The user may start, stop or switch alarm rules in real time.
The background implementation class diagram of the alarm management module is shown in FIG. 7. TaskService is the timed task control interface; it defines a startTask() method for starting a scheduled task and a stopTask() method for stopping it. SystemCheckTask is an implementation class of the TaskService interface and holds a timer object of Java's built-in Timer type to implement the two methods defined by the TaskService interface. WarnConfigService is the alarm configuration interface; it defines a startWarnTask() method for starting an alarm task, a stopWarnTask() method for stopping an alarm task, and an updateConfigStatus() method for updating the status of the alarm configuration information. The WarnConfigServiceImpl class implements the WarnConfigService interface; its objects hold an object configDao of a class implementing the WarnConfigDao interface, used to persist changed configuration information, and an object warnTask of the SystemCheckTask class, used to help implement the three methods defined in the WarnConfigService interface. The Software class is the same as described for FIG. 3. The WarnConfig class represents the alarm configuration information; its main attributes configNo, name, description and ignoreSoftware represent the number, name, description and ignored business software of the alarm configuration, respectively. TimerTask is the timed task interface, which defines the run() method that performs the specific detection logic. The WarnTimeTask class implements the TimerTask interface; its objects hold an object processor of the WarnProcessor type for implementing the alarm detection logic.
The WarnProcessor class is the alarm processing class; it implements the specific alarm detection logic and generates alarm information. It holds an object warnDao of a class implementing the WarnDao interface, used to persist the generated alarm information, and an object config of the WarnConfig class, which stores the thresholds of the various detection indexes; whether alarm information is generated is decided by comparing these thresholds with the detected index values. It defines a getSoftwareWarnings() method that performs software detection and generates software alarm information, a getHardwareWarnings() method that performs hardware detection and generates hardware alarm information, and a generalWarnMsg() method that generates alarm messages. The run() method of the WarnTimeTask class calls the getSoftwareWarnings() and getHardwareWarnings() methods of the WarnProcessor class asynchronously to carry out timed alarm detection.
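The threshold comparison and the handling of ignored software can be sketched as below (Python, illustrative only). The metric names, the "value greater than threshold" rule, and the shape of the readings dictionary are assumptions; the description only says that thresholds are compared with detected index values and that ignored software is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class WarnConfig:
    """Alarm configuration: thresholds plus software the rule ignores."""
    config_no: int
    name: str
    thresholds: dict                                   # metric -> maximum allowed value
    ignore_software: set = field(default_factory=set)  # software ids to skip


def get_software_warnings(config, readings):
    """Compare detected values with thresholds and build alarm messages.

    readings: software id -> {metric: detected value} (ASSUMED shape).
    A warning is raised when a value exceeds its threshold (ASSUMED rule).
    """
    warnings = []
    for software, metrics in sorted(readings.items()):
        if software in config.ignore_software:
            continue  # the rule does not care about this software
        for metric, value in sorted(metrics.items()):
            limit = config.thresholds.get(metric)
            if limit is not None and value > limit:
                warnings.append(f"{software}: {metric}={value} exceeds {limit}")
    return warnings


# Hypothetical rule and readings.
rule = WarnConfig(1, "default", {"cpu": 80, "memory": 90}, {"authority"})
readings = {
    "directory": {"cpu": 95, "memory": 40},
    "authority": {"cpu": 99, "memory": 99},
}
alarms = get_software_warnings(rule, readings)
```

In the described system this check would run periodically from a timer task; the scheduling is omitted here to keep the sketch deterministic.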
6) Operational audit
The operation auditing module is implemented with Spring AOP. By configuring around advice or after advice for the methods of operations that need auditing, it tracks the execution of those methods and persists audit information to the operation and maintenance database. The audit information includes the IP address of the operation, the login user name, the operation object, the start and end times of the operation, the execution result, and so on. The operations that can be audited include starting, stopping, detecting and upgrading services (business software, databases and system services), updating and uninstalling system plug-ins, setting timed heartbeat detection, troubleshooting software faults, starting and stopping timed alarm detection, switching alarm rules, remotely restarting servers, and other important operations. The user can view the audit records and export them as an Excel spreadsheet.
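The effect of an around advice can be approximated in Python with a decorator, as sketched below. This is an analogue for illustration and not Spring AOP itself. The in-memory AUDIT_LOG list stands in for the operation and maintenance database, and the function signature (user, ip, target) is an assumption chosen so that the audited fields are available to the wrapper.

```python
import functools
import time

AUDIT_LOG = []  # stands in for the operation and maintenance database


def audited(operation):
    """Decorator that records who did what, when, and with what result."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, ip, target, *args, **kwargs):
            record = {
                "operation": operation,
                "user": user,
                "ip": ip,
                "target": target,
                "start": time.time(),
            }
            try:
                result = func(user, ip, target, *args, **kwargs)
                record["result"] = "success"
                return result
            except Exception as exc:
                record["result"] = f"failure: {exc}"
                raise
            finally:
                # Runs on success and on failure, like an around advice.
                record["end"] = time.time()
                AUDIT_LOG.append(record)

        return wrapper

    return decorator


@audited("restart service")
def restart_service(user, ip, target):
    """Hypothetical audited operation."""
    return f"{target} restarted"


message = restart_service("admin", "10.0.0.20", "directory")
```

The business method stays free of audit code; auditing is attached declaratively, which is the property the AOP-based design relies on.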
The system deployment form is as follows:
The automatic operation and maintenance system is designed and implemented on a B/S (browser/server) architecture. It is deployed on an independent operation and maintenance server, and the operation and maintenance database can be deployed together with it or on a separate server. The operation and maintenance server has network connectivity to every application server and database server in the management information system, and the operation of the automatic operation and maintenance system does not depend on the management information system. A client plug-in must be installed on each server in the management information system.
When the automatic operation and maintenance system starts, it reads, through the client agent, the system configuration information obtained by the client plug-in, writes the key configuration into its cache, and starts the timed alarm detection task according to the user's settings.
The automatic operation and maintenance system communicates with the software in the management information system over HTTP (Hypertext Transfer Protocol) through REST (Representational State Transfer) interfaces to obtain the necessary service data, assisting the user in completing centralized configuration of the system. The client agent sends control instructions to the client plug-in over the SSH protocol and receives the feedback on those instructions, assisting the user in completing remote monitoring of the system.
The system use mode is as follows:
The automatic operation and maintenance system is deployed and used independently. A user logs in to the software through a browser and can then carry out operation and maintenance of the management information system through it. FIG. 8 is a use case diagram of the automatic operation and maintenance system, showing the main tasks that a user can perform through the system. The way the software is used is described below with reference to the use case diagram.
1) The user can have the automatic operation and maintenance system discover and display the service software, databases and system services in the system, and can maintain the resources in the system (including servers). On this basis, the user can remotely monitor the business software, databases, system services and servers.
2) The user can maintain the operation and maintenance knowledge base and configure the software associated with any software knowledge entry. On this basis, one-key troubleshooting of specific software faults can be performed.
3) The user can perform centralized configuration on the management information system through the automatic operation and maintenance system, and the centralized configuration comprises the synchronization of personnel and organizations and the configuration of system parameters.
4) The user can perform alarm management on the management information system. The user can carry out self-defined setting on the alarm rule, and after the setting is finished, an alarm task can be started according to the setting, namely, the timing alarm detection is carried out. The user can also view the alarm information generated by the alarm task.
5) The user can check the history of important operations of the operation and maintenance software, such as the starting, stopping, detecting and upgrading of services, the troubleshooting of software faults, the switching of alarm rules and the like.
The present invention provides an automatic operation and maintenance system based on a management information system. There are many methods and ways of implementing this technical solution, and the above description is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with the prior art.

Claims (9)

1. An automatic operation and maintenance system applied to a management information system is characterized by comprising a system monitoring module, a resource management module, a system configuration module, an operation and maintenance knowledge base module, an alarm management module, an operation auditing module, an operation and maintenance database, a client agent and a client plug-in;
the system monitoring module is used for processing an operation request sent by a user in a foreground, converting a request parameter of the user into a parameter format accepted by a client agent and calling the client agent to execute remote operation; the operation request of the user comprises the starting, stopping and heartbeat detection of business software, a database and system services in the management information system and the updating and upgrading of the business software;
the client agent sends the instruction to the client plug-in to execute specific operation, and feeds back the execution result to the service thread for calling the client plug-in line by line in a character string mode; according to different operations of a user, a business thread can define different character string processing functions to analyze the character strings and extract key information, and the extracted information is persisted to an operation and maintenance database according to the situation;
the client plug-in is used for assisting the management information system to complete system deployment, and when the system deployment is completed, the client plug-in can generate system information related to the management information system and store the system information in text files, wherein the text files comprise a service.list file, a database.list file and a service_db file, the service.list file stores the identifiers, IP addresses and port numbers of all service software in the management information system, the database.list file stores the identifiers, IP addresses and port numbers of all databases in the management information system, and the service_db file records the correspondence between the service software identifiers and the database identifiers; the resource management module reads the text files by means of the client agent, parses the information in text form, generates the corresponding resource entities respectively, and stores the entity information and the relationship data among the entities into the operation and maintenance database to realize the automatic construction of the association between the business software and the database services;
the system configuration module has the functions of synchronization of personnel and organizations and system parameter configuration;
the operation and maintenance knowledge base module is used for recording the experiences summarized by the operation and maintenance personnel in daily work, and a user can check the experiences in the foreground;
the alarm management module is used for customizing an alarm rule in a foreground by a user and storing the alarm rule;
the operation auditing module configures surrounding notification or post notification for key operation methods, tracks the execution conditions of the methods and persists key information to the operation and maintenance database;
the system monitoring module comprises a software service interface SoftwareService, which defines an operateSoftware() method that converts foreground request parameters into parameters accepted by the client agent so as to realize software start-stop, heartbeat detection and upgrading operations;
the software service interface SoftwareService is implemented by a SoftwareServiceImpl class, whose objects hold an object processor of the CommandProcessor type for calling the client agent, and a SoftwareDao object implementing the SoftwareDao interface for persisting the generated business software object.
2. The system of claim 1, wherein the client agent sends an instruction to the client plug-in to execute a specific operation, and feeds back the execution result, line by line in the form of character strings, to the service thread calling it; a TextDealerImpl class implementing the TextDealer interface is defined, whose dealText() method scans the feedback information line by line and compares it with the defined keywords, and when information that the service has been started is detected, the service start is deemed successful, the state of the service software is set to running, and the state of the software is persisted to the operation and maintenance database.
3. The system of claim 2, wherein the resources in the system include business software, database services, and system services.
4. The system according to claim 3, wherein the resource management module supports maintenance of resource information in the management information system and supports import and export in the form of Excel spreadsheets.
5. The system of claim 4, wherein the system monitoring module supports real-time viewing of instruction execution results or real-time logs of business software by a user in a foreground.
6. The system of claim 5, wherein the system configuration module sequentially calls different business software services with personnel and organization management functions through the REST interface according to a specific business logic, and performs certain processing on the obtained data to realize synchronization of personnel and organizations.
7. The system of claim 6, wherein when the user defines the alarm rule through the alarm management module, the user can define the service software that is not concerned by the rule, and when the timed alarm detection is performed according to the rule, the automatic operation and maintenance system automatically ignores the detection results of the service software; the user can start any one alarm rule to carry out timing alarm detection, and if the detection information contains the abnormal condition defined in the alarm rule, the alarm information is output and displayed for the user; the user can start, stop or switch the alarm rules in real time.
8. The system of claim 7, wherein the operation auditing module is further configured to audit information, the audited information including an IP address of the operation, a login user name, an operation object, a start-stop time of the operation, and an execution result; the automatic operation and maintenance system can perform auditing operations including starting, stopping, detecting and upgrading of services, updating and unloading of system plug-ins, setting of timing heartbeat detection, troubleshooting of software faults, starting and stopping of timing alarm detection and switching of alarm rules, and remotely restarting a server; the user can view the audit record of the operation and export it as an excel form.
9. The system of claim 8, wherein the automated operation and maintenance system performs troubleshooting of software present in the management information system by performing the steps of:
step 1, according to a specific software fault, an automatic operation and maintenance system can find service software associated with the software fault;
step 2, the automatic operation and maintenance system searches a database which is associated with the service software in the step 1 in the management information system;
step 3, carrying out heartbeat detection on the associated databases in sequence and restarting the databases in the stop state;
step 4, searching all the service software associated with the stop state database, and restarting the associated service software in parallel;
step 5, restarting the service software associated with the software fault.
CN201811427415.1A 2018-11-27 2018-11-27 Automatic operation and maintenance system based on management information system Active CN109495308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811427415.1A CN109495308B (en) 2018-11-27 2018-11-27 Automatic operation and maintenance system based on management information system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811427415.1A CN109495308B (en) 2018-11-27 2018-11-27 Automatic operation and maintenance system based on management information system

Publications (2)

Publication Number Publication Date
CN109495308A CN109495308A (en) 2019-03-19
CN109495308B true CN109495308B (en) 2021-08-06

Family

ID=65697864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811427415.1A Active CN109495308B (en) 2018-11-27 2018-11-27 Automatic operation and maintenance system based on management information system

Country Status (1)

Country Link
CN (1) CN109495308B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413308B (en) * 2019-07-30 2023-06-09 深圳前海微众银行股份有限公司 Software operation and maintenance method, device, equipment and computer storage medium
CN110430073B (en) * 2019-07-30 2022-06-21 中国工程物理研究院计算机应用研究所 Heterogeneous system automatic operation and maintenance method based on abstract service atomic operation
CN110991970A (en) * 2019-12-11 2020-04-10 成都市赛力培物流科技有限公司 Automatic operation and maintenance management method for logistics platform
CN111585867B (en) * 2020-03-31 2022-04-19 北京奇艺世纪科技有限公司 Message processing method and device, electronic equipment and readable storage medium
CN111596952B (en) * 2020-04-29 2023-06-30 西安震有信通科技有限公司 System global name configuration processing method, device, equipment and medium
CN111784277B (en) * 2020-05-22 2023-03-24 贵州电网有限责任公司 IT customer service work order quality inspection analysis method
CN111768079A (en) * 2020-06-01 2020-10-13 国网江苏省电力有限公司 Safe operation and maintenance management system and method for power system
CN111915275A (en) * 2020-07-31 2020-11-10 上海燕汐软件信息科技有限公司 Application operation process management method, device and system
CN112270417A (en) * 2020-10-28 2021-01-26 首都信息发展股份有限公司 Intelligent acquisition method and system for operation and maintenance data of domestic equipment
CN112738212B (en) * 2020-12-23 2022-09-30 高新兴智联科技有限公司 Method and system for operation and maintenance of motor vehicle electronic identification read-write equipment
CN113190367B (en) * 2021-07-02 2021-10-01 成都数联铭品科技有限公司 Cross-system data interaction method and device based on browser and electronic equipment
CN113391827B (en) * 2021-08-17 2021-11-02 湖南省佳策测评信息技术服务有限公司 Application software publishing method and system based on automation script
CN114039873B (en) * 2021-11-09 2023-11-28 北京天融信网络安全技术有限公司 Audit method and operation and maintenance security audit system aiming at client type
CN114202859A (en) * 2021-12-08 2022-03-18 无锡玖千工品供应链管理有限公司 Alarm monitoring system and method for intelligent cabinet
CN114866397B (en) * 2022-03-25 2024-05-07 理工雷科电子(西安)有限公司 Automatic system health state monitoring method based on domestic platform
CN116467014B (en) * 2023-06-19 2023-08-29 南京麦豆健康科技有限公司 Equipment function management system and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8761993B2 (en) * 2008-09-15 2014-06-24 Airbus Operations S.A.S. Method and device for automating procedures for verifying equipment in an aircraft
CN104135389A (en) * 2014-08-14 2014-11-05 华北电力大学句容研究中心 SSH protocol operation and maintenance auditing system and method based on proxy technology
US8972361B1 (en) * 2011-06-22 2015-03-03 Emc Corporation Providing system management services
CN105844543A (en) * 2016-04-07 2016-08-10 国网天津市电力公司 Automation operation management system for power enterprise information system
CN106529783A (en) * 2016-11-02 2017-03-22 国网重庆市电力公司电力科学研究院 Comprehensive operation and maintenance management system and method for power grid scheduling automation system
CN107423191A (en) * 2017-04-28 2017-12-01 红有软件股份有限公司 A kind of constructing system that the automatic O&M of information system is realized based on representation
CN107680194A (en) * 2017-09-22 2018-02-09 国网天津市电力公司 A kind of information system for power enterprise automates cruising inspection system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070288208A1 (en) * 2004-08-20 2007-12-13 Lockheed Martin Corporation Measurable enterprise CBRNE protection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
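"Thoughts on the Construction of an Operational Data System" (关于作战数据体系构建的思考); Mao Jun (毛军) et al.; Proceedings of the 6th China Command and Control Conference, Volume 1 (《第六届中国指挥控制大会论文集(上册)》); 2018-07-02; full text *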

Also Published As

Publication number Publication date
CN109495308A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
CN109495308B (en) Automatic operation and maintenance system based on management information system
CN107317724B (en) Data acquisition system and method based on cloud computing technology
US20200097466A1 (en) Method and system for implementing target model configuration metadata for a log analytics system
CN107291565B (en) Operation and maintenance visual automatic operation platform and implementation method
US9727407B2 (en) Log analytics for problem diagnosis
US9420068B1 (en) Log streaming facilities for computing applications
US7917536B2 (en) Systems, methods and computer program products for managing a plurality of remotely located data storage systems
US8510720B2 (en) System landscape trace
WO2021203979A1 (en) Operation and maintenance processing method and apparatus, and computer device
CN111930355B (en) Web back-end development framework and construction method thereof
US11392873B2 (en) Systems and methods for simulating orders and workflows in an order entry and management system to test order scenarios
JP2003099410A (en) Multiple device management method and system
US20140040916A1 (en) Automatic event correlation in computing environments
Olups Zabbix Network Monitoring
CN111966465B (en) Method, system, equipment and medium for modifying host configuration parameters in real time
CN116048467A (en) Micro-service development platform and business system development method
US10089167B2 (en) Log file reduction according to problem-space network topology
CN113672452A (en) Method and system for monitoring operation of data acquisition task
US8402125B2 (en) Method of managing operations for administration, maintenance and operational upkeep, management entity and corresponding computer program product
US11924284B1 (en) Automated security, orchestration, automation, and response (SOAR) app generation based on application programming interface specification data
CN103457771B (en) The management method of the cluster virtual machine of a kind of HA and equipment
WO2016091141A1 (en) Method and apparatus for information collection
Uytterhoeven et al. Zabbix 4 Network Monitoring: Monitor the performance of your network devices and applications using the all-new Zabbix 4.0
CN112882892B (en) Data processing method and device, electronic equipment and storage medium
CN114756301A (en) Log processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant