CN105897487B - Equipment management method and device for operation and maintenance system - Google Patents

Equipment management method and device for operation and maintenance system

Info

Publication number
CN105897487B
Authority
CN
China
Prior art keywords
maintenance
fault
pool
equipment
delivery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610411090.2A
Other languages
Chinese (zh)
Other versions
CN105897487A (en
Inventor
王洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610411090.2A
Publication of CN105897487A
Application granted
Publication of CN105897487B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a device management method and apparatus for an operation and maintenance system. The operation and maintenance system comprises a resource management system, a fault pool and a delivery pool, wherein the resource management system is used for performing operation and maintenance operations on devices, the fault pool comprises a set of faulty devices, and the delivery pool comprises a set of devices that have completed operation and maintenance. One embodiment of the method comprises: querying the fault pool to determine whether a faulty device exists; in response to determining that a faulty device exists, sending an operation and maintenance request to the resource management system; and receiving an operation and maintenance completion notification message sent by the resource management system, and adding the relevant information of the faulty device that has completed operation and maintenance to the delivery pool. This embodiment enables automatic operation and maintenance of faulty devices, reduces the labor cost of the operation and maintenance system, and improves operation and maintenance efficiency.

Description

Equipment management method and device for operation and maintenance system
Technical Field
The present application relates to the field of computer technologies, specifically to the field of device management and maintenance technologies, and more particularly to a device management method and apparatus for an operation and maintenance system.
Background
In existing device operation and maintenance schemes, after receiving alarm information detected by a machine management system, an administrator must manually confirm the fault type, take the faulty machine offline, and then look up the corresponding server and file a repair report. After operation and maintenance is finished, the administrator must periodically query the delivery pool for machines that have completed operation and maintenance. When such a machine is found, the administrator manually configures the server's running environment, performs the out-of-pool operation through a delivery pool interface, and starts monitoring; only then can the machine provide service again. As the number of devices grows, so does the operation and maintenance workload. This scheme requires a large amount of manual work, which is costly, and manual monitoring cannot respond to faults in time, so operation and maintenance efficiency is poor.
Disclosure of Invention
In order to solve one or more problems in the prior art, the present application provides a device management method and apparatus for an operation and maintenance system.
In a first aspect, the present application provides a device management method for an operation and maintenance system, where the operation and maintenance system includes a resource management system, a fault pool, and a delivery pool; the resource management system is configured to perform operation and maintenance operations on devices, the fault pool includes a set of faulty devices, and the delivery pool includes a set of devices that have completed operation and maintenance. The method comprises: querying the fault pool to determine whether a faulty device exists; in response to determining that a faulty device exists, sending an operation and maintenance request to the resource management system; and receiving an operation and maintenance completion notification message sent by the resource management system, and adding the faulty device that has completed operation and maintenance to the delivery pool.
In some implementations, the method further comprises: monitoring the operation and maintenance progress of the fault equipment; configuring service environment and monitoring information for the fault equipment which completes operation and maintenance; and clearing the configured fault equipment from the delivery pool.
In some implementations, querying the fault pool to determine whether a faulty device exists includes: querying the fault pool through a fault query interface to determine whether a faulty device exists; sending an operation and maintenance request to the resource management system includes: sending the operation and maintenance request to the resource management system through an operation and maintenance initiation interface; and monitoring the operation and maintenance progress of the faulty device includes: querying the operation and maintenance progress of the faulty device through an operation progress query interface.
In some implementations, the method further comprises: detecting whether a device in the delivery pool has taken effect; and clearing the devices that have taken effect from the delivery pool; wherein detecting whether a device in the delivery pool has taken effect comprises: detecting whether a monitoring program for the device in the delivery pool has been run and/or whether the device in the delivery pool has provided service.
In some implementations, querying the failure pool to determine whether a failed device exists includes: inquiring whether equipment meeting preset repair conditions exists in the fault pool or not; and taking the equipment meeting the preset repair condition as fault equipment.
In some implementations, the method further comprises: acquiring fault information of the faulty device; and in response to determining that a faulty device exists, sending the fault information according to a preconfigured repair method.
In some implementations, the method further comprises: querying the fault pool, the delivery pool and the resource management system to determine state information of the faulty device; wherein the state information includes: a fault state, an operation and maintenance state, and a to-be-online state.
In a second aspect, the present application provides a device management apparatus for an operation and maintenance system, where the operation and maintenance system includes a resource management system, a fault pool, and a delivery pool; the resource management system is configured to perform operation and maintenance operations on devices, the fault pool includes a set of faulty devices, and the delivery pool includes a set of devices that have completed operation and maintenance. The apparatus comprises: a query unit for querying the fault pool to determine whether a faulty device exists; a sending unit for sending an operation and maintenance request to the resource management system in response to determining that a faulty device exists; and a processing unit for receiving the operation and maintenance completion notification message sent by the resource management system and adding the faulty device that has completed operation and maintenance to the delivery pool.
In some implementations, the apparatus further comprises: the monitoring unit is used for monitoring the operation and maintenance progress of the fault equipment; the configuration unit is used for configuring the service environment and the monitoring information of the fault equipment which completes operation and maintenance; and the clearing unit is used for clearing the configured fault equipment from the delivery pool.
In some implementations, the query unit is to query the failure pool through the failure query interface to determine whether a failed device exists; the sending unit is used for sending an operation and maintenance request to the resource management system by using the operation and maintenance initiating interface; the monitoring unit is used for inquiring the operation and maintenance progress of the fault equipment through the operation progress inquiry interface.
In some implementations, the apparatus further includes a detection unit to: detect whether a device in the delivery pool has taken effect; and clear the devices that have taken effect from the delivery pool; wherein detecting whether a device in the delivery pool has taken effect comprises: detecting whether a monitoring program for the device in the delivery pool has been run and/or whether the device in the delivery pool has provided service.
In some implementations, the query unit is to query the fault pool to determine whether a faulty device exists as follows: querying whether a device meeting preset repair conditions exists in the fault pool; and taking the device meeting the preset repair conditions as the faulty device.
In some implementations, the apparatus further includes a repair unit to: acquire fault information of the faulty device; and in response to determining that a faulty device exists, send the fault information according to a preconfigured repair method.
In some implementations, the apparatus further includes an update unit to: query the fault pool, the delivery pool and the resource management system to determine state information of the faulty device; wherein the state information includes: a fault state, an operation and maintenance state, and a to-be-online state.
According to the device management method and apparatus for an operation and maintenance system provided by the present application, the fault pool is queried to determine whether a faulty device exists, and an operation and maintenance request is sent to the resource management system when a faulty device is determined to exist, so that the resource management system can perform operation and maintenance operations on the faulty device. The operation and maintenance completion notification message sent by the resource management system is then received, and the faulty device that has completed operation and maintenance is added to the delivery pool. This achieves automatic repair reporting and automatic operation and maintenance of faulty devices, reduces the labor cost of the operation and maintenance system, and improves operation and maintenance efficiency.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is a schematic system architecture diagram of an automated operation and maintenance system to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a device management method for an operation and maintenance system according to the application;
FIG. 3 is a schematic diagram illustrating an application scenario of the device management method for an operation and maintenance system according to the present application;
FIG. 4 is a flow diagram of another embodiment of a device management method for an operation and maintenance system according to the application;
FIG. 5 is a schematic diagram illustrating another application scenario of the device management method for an operation and maintenance system according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a device management apparatus for an operation and maintenance system according to the present application;
FIG. 7 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server according to the embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other provided they do not conflict. The present application will be described in detail below with reference to the attached drawings and in conjunction with the embodiments.
Referring to FIG. 1, there is shown a schematic system architecture diagram of an automated operation and maintenance system to which the present application may be applied.
As shown in FIG. 1, the system architecture 100 includes a terminal device 101, a device management server 102, a resource management server 103, and the like. The terminal device 101 may be connected to the device management server 102 through a wired connection or a wireless connection, and the device management server 102 may be connected to the resource management server 103 through a wired connection or a wireless connection.
The operation and maintenance personnel 110 can use the terminal device 101 to interact with the device management server 102. An operation platform that controls the device management server 102 may be installed on the terminal device 101. The operation and maintenance personnel 110 may perform device management operations on the operation platform, and the terminal device 101 may generate an operation instruction according to those operations and send the operation instruction to the device management server 102. The operation platform may also show the running state and the operation and maintenance progress of a device to the operation and maintenance personnel 110.
The device management server 102 may receive an operation instruction sent by the terminal device 101, generate an operation and maintenance request after parsing the instruction, and send the automated operation and maintenance request to the resource management server 103.
The resource management server 103 may perform the automation operation and maintenance tasks included in the automation operation and maintenance request, such as disk formatting, server offline, and the like.
It should be noted that the device management method for the operation and maintenance system provided in the embodiments of the present application is generally executed by the device management server 102; accordingly, the device management apparatus for the operation and maintenance system is generally disposed in the device management server 102.
It should be understood that the numbers of terminal devices, device management servers, and resource management servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, device management servers, and resource management servers, as required by the implementation.
Referring to FIG. 2, a flowchart of an embodiment of a device management method for an operation and maintenance system according to the present application is shown. In this embodiment, the operation and maintenance system may include a Resource Management System (RMS), a fault pool, and a delivery pool. The resource management system may include the resource management server 103 shown in FIG. 1 and is configured to perform operation and maintenance operations on devices; the fault pool may include a collection of faulty devices, and the delivery pool may include a collection of devices that have completed operation and maintenance. The process 200 of the device management method for the operation and maintenance system includes the following steps:
Step 201, querying the fault pool to determine whether a faulty device exists.
The operation and maintenance system can perform operation and maintenance on electronic devices (including servers and terminals) in the network, for example, a plurality of storage servers of a data center. Electronic devices in the network may be configured with fault monitoring programs, and when an electronic device fails, it may be added to the fault pool. The fault pool may be a collection of faulty devices, and it may store information about the faulty devices, such as their identification information.
In this embodiment, an electronic device (e.g., the device management server 102 shown in fig. 1) on which the device management method for the operation and maintenance system operates may query the failure pool to determine whether a failed device currently exists. When the fault pool contains more than zero devices, it may be determined that a faulty device is present. At this time, the failed device may be identified using the information about the failed device stored in the failure pool.
In some embodiments, the electronic device on which the device management method for the operation and maintenance system runs may query the running state of a predetermined device. For example, whether the predetermined device has failed may be determined by looking up its identifier in the fault pool and checking whether the device is located in the fault pool.
In a further embodiment, the electronic device on which the device management method for the operation and maintenance system operates may configure a fault query interface, and then may query the fault pool through the fault query interface to determine whether a faulty device exists.
Step 202, in response to determining that the faulty device exists, sending an operation and maintenance request to the resource management system.
In this embodiment, if a faulty device is found in the fault pool, the faulty device may be stopped and its service migrated to another device that is running normally; that is, the faulty device may be taken offline. At this point, an operation and maintenance request may be sent to the RMS. The operation and maintenance request may include identification information of the faulty device and fault information. After receiving the operation and maintenance request, the RMS may initiate an operation and maintenance flow to repair the faulty device.
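The offline step and the request contents described above might be sketched as follows. This is only an illustration under stated assumptions: the JSON wire format, the field names `device_id` and `fault_info`, and the helpers `take_offline` and `build_request` are hypothetical and are not specified by the application.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class OpsRequest:
    """Operation and maintenance request sent to the RMS (hypothetical layout)."""
    device_id: str   # identification information of the faulty device
    fault_info: str  # fault information, e.g. a fault type or description


def take_offline(device_id: str, serving: set, standby: set) -> str:
    """Stop serving from the faulty device and hand its service to a healthy one.

    Assumes at least one standby device exists and that any standby device
    can take over. Returns the identifier of the replacement device.
    """
    serving.discard(device_id)
    replacement = sorted(standby)[0]
    standby.remove(replacement)
    serving.add(replacement)
    return replacement


def build_request(device_id: str, fault_info: str) -> str:
    """Serialize the request; JSON is an assumed wire format."""
    return json.dumps(asdict(OpsRequest(device_id, fault_info)), sort_keys=True)


serving = {"srv-01", "srv-02"}
standby = {"srv-09"}
replacement = take_offline("srv-01", serving, standby)
payload = build_request("srv-01", "disk failure")
```

Taking the device offline before the request is built mirrors the order in the text: service is migrated first, then the RMS is asked to repair the device.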
Furthermore, the RMS may forward the received operation and maintenance request to the automation operation and maintenance system, where a task distribution module distributes the request to the corresponding automation task processing server for processing. Specifically, the automation task processing server may perform operation and maintenance operations such as formatting, system reinstallation, reboot, and network address allocation on the faulty device, and feed back status information of these operations to the RMS. After the RMS receives the operation and maintenance request, operation and maintenance personnel can also intervene in the process to repair the faulty device manually.
In a further embodiment, the electronic device on which the device management method for the operation and maintenance system runs may be configured with an operation and maintenance initiation interface, in which case sending the operation and maintenance request to the resource management system may include sending the request to the RMS through the operation and maintenance initiation interface. The operation and maintenance initiation interface may include an interface for communicating with the resource management system, and this interface supports the communication protocol of the resource management system.
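The task distribution described above can be pictured as a lookup from operation name to handler. The sketch below is hypothetical: the handler functions stand in for automation task processing servers, and the operation names and status strings are invented for illustration.

```python
from typing import Callable, Dict, List


# Hypothetical handlers standing in for automation task processing servers.
def format_disk(device_id: str) -> str:
    return f"{device_id}: disk formatted"


def reinstall_system(device_id: str) -> str:
    return f"{device_id}: system reinstalled"


def reboot(device_id: str) -> str:
    return f"{device_id}: rebooted"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "format": format_disk,
    "reinstall": reinstall_system,
    "reboot": reboot,
}


def distribute(device_id: str, operations: List[str]) -> List[str]:
    """Route each requested operation to its handler and collect status messages."""
    statuses = []
    for op in operations:
        handler = HANDLERS.get(op)
        if handler is None:
            # Unknown operations are reported rather than silently dropped.
            statuses.append(f"{device_id}: unsupported operation '{op}'")
        else:
            statuses.append(handler(device_id))
    return statuses


statuses = distribute("srv-01", ["format", "reinstall", "reboot"])
```

The returned status messages play the role of the status information fed back to the RMS.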
Step 203, receiving the operation and maintenance completion notification message sent by the resource management system, and adding the operation and maintenance completed faulty device to the delivery pool.
When the faulty device completes operation and maintenance, the resource management system RMS may send a notification message to the electronic device on which the device management method for the operation and maintenance system runs, informing it that the faulty device has been repaired. After receiving the notification message, the electronic device can add the faulty device that has completed operation and maintenance to the delivery pool. Alternatively, the related information of that device, for example its identification information and resource configuration information, may be stored in the delivery pool.
The delivery pool may be a collection of devices that have completed operation and maintenance, and it may store information about those devices. A faulty device can be added to the delivery pool once its operation and maintenance is finished. A device in the delivery pool goes online to provide service after its service environment has been deployed, either manually by an administrator or automatically according to service requirements.
In this embodiment, the resource management system RMS may send an operation and maintenance state query request to the device for processing the automation operation and maintenance task, and the device for processing the automation operation and maintenance task may feed back the operation and maintenance state to the resource management system RMS after receiving the operation and maintenance state query request. After the resource management system RMS receives the feedback message that the operation and maintenance state is "complete", a notification message that the operation and maintenance is complete may be automatically sent to the electronic device on which the device management method for the operation and maintenance system is running, so that the electronic device may add the device that completes the operation and maintenance to the delivery pool.
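A minimal sketch of this query-then-notify exchange, seen from the RMS side, is given below. The callables `query_worker_status` and `notify` are hypothetical stand-ins for the status query request and the completion notification; the application does not define their signatures.

```python
from typing import Callable, Dict, List


def rms_check_and_notify(
    device_ids: List[str],
    query_worker_status: Callable[[str], str],
    notify: Callable[[str], None],
) -> List[str]:
    """Query the status of each device and notify for those reported "complete"."""
    notified = []
    for device_id in device_ids:
        if query_worker_status(device_id) == "complete":
            notify(device_id)
            notified.append(device_id)
    return notified


# Simulated worker answers; the receiver simply adds notified devices to the delivery pool.
worker_status: Dict[str, str] = {"srv-01": "complete", "srv-02": "running"}
delivery_pool: set = set()

notified = rms_check_and_notify(
    ["srv-01", "srv-02"],
    query_worker_status=worker_status.__getitem__,
    notify=delivery_pool.add,
)
```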
In the above embodiment, the electronic device on which the device management method for the operation and maintenance system runs may query whether a faulty device exists and, when one is determined to exist, access the resource management system to initiate operation and maintenance. After the operation and maintenance flow of the faulty device ends, a notification is sent automatically and the device is added to the delivery pool. It is therefore unnecessary to check manually whether a faulty device exists or whether its operation and maintenance flow has ended, which saves labor cost, shortens the response time for faulty devices, and improves operation and maintenance efficiency.
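Steps 201 to 203 taken together might look like the following sketch. The class name `DeviceManager`, the in-memory dictionaries used as pools, and the `send_to_rms` callable are all hypothetical. Removing a device from the fault pool once its request is sent is an assumption made here to avoid duplicate requests; the application does not state when a device leaves the fault pool.

```python
class DeviceManager:
    """Sketch of steps 201-203; the RMS client is a hypothetical callable."""

    def __init__(self, fault_pool: dict, delivery_pool: dict, send_to_rms):
        self.fault_pool = fault_pool        # device_id -> fault information
        self.delivery_pool = delivery_pool  # device_id -> related information
        self.send_to_rms = send_to_rms
        self.in_progress = {}               # devices under operation and maintenance

    def query_fault_pool(self) -> list:
        """Step 201: a non-empty fault pool means faulty devices exist."""
        return sorted(self.fault_pool)

    def initiate(self) -> list:
        """Step 202: send one operation and maintenance request per faulty device."""
        sent = []
        for device_id in self.query_fault_pool():
            # Assumption: the device leaves the fault pool once its request is sent.
            fault_info = self.fault_pool.pop(device_id)
            self.send_to_rms(device_id, fault_info)
            self.in_progress[device_id] = fault_info
            sent.append(device_id)
        return sent

    def on_complete(self, device_id: str) -> None:
        """Step 203: on the completion notification, add the device to the delivery pool."""
        self.in_progress.pop(device_id)
        self.delivery_pool[device_id] = {"status": "operation and maintenance complete"}


requests = []
manager = DeviceManager(
    {"srv-01": "disk failure"}, {}, lambda d, f: requests.append((d, f))
)
sent = manager.initiate()
manager.on_complete("srv-01")
```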
In some implementations, the device management method for the operation and maintenance system may further include monitoring the operation and maintenance progress of the faulty device. Specifically, the progress of the operation and maintenance flow can be queried through the resource management system. Furthermore, the electronic device on which the method runs may be configured with an operation progress query interface and may communicate with the resource management system RMS through this interface, so that the operation and maintenance progress of the faulty device can be queried through it. Specifically, the RMS may send the operation and maintenance status information obtained from the device that processes the automated operation and maintenance task to the electronic device through the operation progress query interface. Optionally, devices may be monitored by polling, so that configuration of the service environment and monitoring information starts immediately after a device completes operation and maintenance, which brings the device online faster and reduces the time spent on fault repair.
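The polling option mentioned above can be sketched as a bounded loop. The function name `wait_for_completion`, the status string "complete", and the `interval_s` and `max_polls` parameters are illustrative assumptions; `query_progress` stands in for the operation progress query interface.

```python
import time
from typing import Callable


def wait_for_completion(
    device_id: str,
    query_progress: Callable[[str], str],
    interval_s: float = 0.0,
    max_polls: int = 10,
) -> bool:
    """Poll until the progress is "complete" or the poll budget is used up."""
    for _ in range(max_polls):
        if query_progress(device_id) == "complete":
            return True
        time.sleep(interval_s)
    return False


# Simulated RMS answers for the demo.
answers = iter(["formatting", "reinstalling", "complete"])
done = wait_for_completion("srv-01", lambda _id: next(answers))
```

Bounding the number of polls keeps a stuck repair from blocking the caller indefinitely; a caller can then escalate to manual intervention.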
Further, the device management method for the operation and maintenance system may further include: configuring a service environment and monitoring information for the faulty device that has completed operation and maintenance; and clearing the configured device from the delivery pool. After the operation and maintenance flow of the faulty device is finished, the device can be added to the delivery pool. Out-of-pool operations, including configuring the service environment and monitoring information, may then be performed on the devices in the delivery pool. Specifically, the service environment of the device may be configured according to current service requirements, and the running-state monitoring program of the device may be started. The configured device can then be cleared from the delivery pool and brought online again to provide service. In an actual scenario, after operation and maintenance of a device is finished, the product line that the device is to serve can be notified automatically, a service environment deployment and monitoring scheme is applied according to the requirements of that product line, and once this is done the device automatically leaves the pool and goes online to provide service. Compared with the prior-art scheme, in which the service environment is deployed and the out-of-pool operation is performed manually after operation and maintenance ends, the scheme provided by this embodiment further reduces labor cost and brings devices online more efficiently.
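The out-of-pool sequence (configure, start monitoring, clear from the pool) might be sketched as below. The function `out_of_pool` and the injected `configure` and `start_monitor` callables are hypothetical; how the service environment is actually deployed is not specified by the application.

```python
def out_of_pool(device_id, delivery_pool, service_env, configure, start_monitor):
    """Configure the device, start monitoring, then clear it from the delivery pool."""
    if device_id not in delivery_pool:
        raise KeyError(f"{device_id} is not in the delivery pool")
    configure(device_id, service_env)
    start_monitor(device_id)
    # Remove only after both steps succeed, so a failure leaves the device in the pool.
    del delivery_pool[device_id]


log = []
delivery_pool = {"srv-01": {"cpu": 32}}
out_of_pool(
    "srv-01",
    delivery_pool,
    service_env={"product_line": "storage"},
    configure=lambda d, env: log.append(("configure", d, env["product_line"])),
    start_monitor=lambda d: log.append(("monitor", d)),
)
```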
Please refer to FIG. 3, which shows a schematic diagram of an application scenario of the device management method for an operation and maintenance system according to the present application.
As shown in FIG. 3, the operation and maintenance system includes an automation operation and maintenance platform 31 and a device management platform 32. The device management platform 32 may be configured with a plurality of APIs (Application Programming Interfaces), including a fault query API, an operation and maintenance initiation API, an operation progress query API, and a machine out-of-pool API. The device management process of the operation and maintenance system is as follows. In step 301, the fault query API queries whether the fault pool contains a faulty device. If so, the faulty device is taken offline and its service is migrated in step 302, and an operation and maintenance process is then initiated through the operation and maintenance initiation API in step 303. During the operation and maintenance process, the progress can be queried through the operation progress query API in step 304. After the operation and maintenance process is finished, in step 305 the machine out-of-pool API configures and deploys the device's service environment and puts its monitoring program into effect, completing the out-of-pool operation. The device can then go online and provide service in step 306. In an actual scenario, after the device goes online and provides service, the device that has completed the out-of-pool operation can be removed from the delivery pool, and its state is updated from "in operation and maintenance" to "normal operation".
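The sequence of steps 301 to 306 can be sketched as a pipeline over the four APIs. Everything here is illustrative: the function `run_pipeline` and its parameters are hypothetical stand-ins for the APIs named above, and the progress is checked only once per device, whereas a real system would presumably keep querying until completion.

```python
from typing import Callable, Dict, List


def run_pipeline(
    fault_query: Callable[[], List[str]],
    take_offline: Callable[[str], None],
    initiate_ops: Callable[[str], None],
    query_progress: Callable[[str], str],
    out_of_pool: Callable[[str], None],
) -> Dict[str, str]:
    """Drive each faulty device through steps 301-306 and return its final state."""
    states = {}
    for device_id in fault_query():                   # step 301
        take_offline(device_id)                       # step 302
        initiate_ops(device_id)                       # step 303
        states[device_id] = "in operation and maintenance"
        if query_progress(device_id) == "complete":   # step 304 (single check)
            out_of_pool(device_id)                    # step 305
            states[device_id] = "normal operation"    # step 306
    return states


calls = []
states = run_pipeline(
    fault_query=lambda: ["srv-01", "srv-02"],
    take_offline=lambda d: calls.append(("offline", d)),
    initiate_ops=lambda d: calls.append(("initiate", d)),
    query_progress=lambda d: "complete" if d == "srv-01" else "running",
    out_of_pool=lambda d: calls.append(("out_of_pool", d)),
)
```

Passing the APIs in as callables keeps the control flow separate from the transport, so the same pipeline could be exercised against test doubles or a real RMS client.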
In a further embodiment, the device management method for the operation and maintenance system further includes: detecting whether the devices in the delivery pool have been validated, that is, have taken effect, and clearing the validated devices from the delivery pool. Optionally, the relevant information of the validated devices in the delivery pool may be cleared. Specifically, whether a device in the delivery pool has been validated may be detected by checking whether a monitoring program for the device has been run and/or whether the device has provided service. If the monitoring program for the device is running or the device is providing service, it may be determined that the device has been validated.
In this embodiment, detecting whether the devices in the delivery pool have been validated and removing the validated devices from the delivery pool keeps the device availability information in the operation and maintenance system accurate, reduces the load on the delivery pool, and improves operation and maintenance efficiency.
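The validation check described above amounts to a simple filter over the delivery pool. The following Python sketch assumes that the pool and the two signals (monitoring program running, device providing service) are available as sets of device identifiers; the function name and these data structures are illustrative assumptions.

```python
def clear_validated(delivery_pool, monitor_running, serving):
    """Return the delivery pool with validated devices removed.

    A device counts as validated if its monitoring program is running
    or it is already providing service.
    """
    return {
        device
        for device in delivery_pool
        if not (device in monitor_running or device in serving)
    }
```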
With continued reference to fig. 4, a flow diagram of another embodiment of the device management method for an operation and maintenance system according to the present application is shown. In this embodiment, the operation and maintenance system may include a resource management system RMS, a fault pool, and a delivery pool. The resource management system may include the resource management server 103 shown in fig. 1 and is configured to perform operation and maintenance operations on devices; the fault pool may include a collection of faulty devices, and the delivery pool may include a collection of devices that have completed operation and maintenance. The process 400 of the device management method for the operation and maintenance system includes the following steps:
Step 401, inquiring whether equipment meeting preset repair conditions exists in the fault pool, and taking the equipment meeting the preset repair conditions as fault equipment.
The operation and maintenance system can perform operation and maintenance on electronic equipment (including servers and terminals) in the network, for example, a plurality of storage servers of a data center. The failure pool may be a collection of failed devices, which may store information about the failed devices, such as identification information of the failed devices.
In this embodiment, an electronic device (for example, the device management server 102 shown in fig. 1) on which the device management method for the operation and maintenance system operates may query the fault pool, and determine whether a device meeting a preset repair condition exists in the fault pool. Wherein the preset repair condition can be configured by an administrator. The administrator can configure preset repair conditions according to the requirements of the product line, and can automatically initiate repair when the equipment in the product line meets the preset repair conditions. In some optional implementation manners, the electronic device on which the device management method for the operation and maintenance system is run may automatically configure the preset repair condition according to the requirement of the product line, and optionally, the administrator may review the automatically configured preset repair condition, and the preset repair condition becomes effective after the review is passed. The equipment meeting the preset repair condition in the fault pool can be used as the fault equipment to be operated and maintained.
Step 402, acquiring fault information of the fault equipment.
In this embodiment, the electronic device on which the device management method for the operation and maintenance system runs may obtain the fault information of the faulty device that was determined in step 401 to meet the preset repair condition. The fault information may include a fault type, a fault time, a fault reason, and the like. The fault type may be, for example, a hardware type or a software type, and the fault reason may include an excessive amount of data, disk damage, and the like. The electronic device may also obtain information related to the fault by using a monitoring program or a log collection program configured in the faulty device, and determine the fault information of the faulty device from the output of the monitoring program or from the log.
Step 403, in response to determining that the faulty device exists, sending the fault information according to a pre-configured repair reporting mode.
After it is determined that a faulty device meeting the preset repair condition exists, repair reporting can be initiated automatically and the fault information is sent in the pre-configured mode. The mode of sending the fault information may be configured in advance by an administrator, and may include sending an email to a specified mailbox, or generating alarm information and presenting it in the administrator's operation and maintenance management interface. The sent fault information can include the fault type, the fault time, the fault reason, and the like, so that an administrator or the resource management system can quickly and accurately locate the fault and carry out the corresponding operation and maintenance operation according to the fault information.
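Sending fault information in a pre-configured mode can be sketched as a dispatch table keyed by the mode name. In the Python sketch below, the mode names `"email"` and `"alarm"`, the dictionary keys of the fault record, and the function names are assumptions made for illustration. The sketch only builds the message objects (using the standard library `email.message.EmailMessage`); actually delivering them is left out.

```python
from email.message import EmailMessage


def build_email(fault, recipient):
    """Build (but do not send) an email carrying the fault information."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Fault on {fault['device_id']}: {fault['type']}"
    msg.set_content(f"time={fault['time']}\nreason={fault['reason']}")
    return msg


def build_alarm(fault, recipient):
    """Build an alarm record for display in a management interface."""
    return {"target": recipient, "level": "alarm", **fault}


# Hypothetical registry of pre-configured repair reporting modes.
REPORT_MODES = {"email": build_email, "alarm": build_alarm}


def report_fault(fault, mode, recipient):
    """Dispatch the fault information to the builder for the configured mode."""
    try:
        builder = REPORT_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown repair reporting mode: {mode}") from None
    return builder(fault, recipient)
```

Keeping the modes in a table means an administrator-facing configuration only has to store the mode name and the recipient.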
Step 404, in response to determining that a faulty device exists, sending an operation and maintenance request to the resource management system.
In this embodiment, the faulty device may be taken offline and an operation and maintenance request may be sent to the RMS. The operation and maintenance request may include identification information of the faulty device and the fault information. After receiving the operation and maintenance request, the RMS may initiate an operation and maintenance flow to repair the faulty device.
Step 405, receiving an operation and maintenance completion notification message sent by the resource management system, and adding the faulty device that has completed operation and maintenance to the delivery pool.
When the faulty device completes operation and maintenance, the resource management system RMS may send a notification message to the electronic device on which the device management method for the operation and maintenance system runs, to inform it that the faulty device has been repaired. After the electronic device receives the notification message, the faulty device that has completed operation and maintenance can be added to the delivery pool. Optionally, the related information of the faulty device that has completed operation and maintenance, for example its identification information and resource configuration information, may be stored in the delivery pool.
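Steps 404 and 405 form a request and notification exchange, which the following Python sketch illustrates. The classes `ResourceManagementSystem` and `DeviceManager`, the message fields, and the synchronous callback are hypothetical; the application does not specify a message format or transport, and a real RMS would complete the repair asynchronously.

```python
class ResourceManagementSystem:
    """Hypothetical RMS stub that repairs immediately and notifies the caller."""

    def __init__(self):
        self.received = []

    def submit(self, request, on_complete):
        self.received.append(request)
        # A real RMS would run an operation and maintenance flow here.
        on_complete({"device_id": request["device_id"], "status": "completed"})


class DeviceManager:
    """Sends operation and maintenance requests and fills the delivery pool."""

    def __init__(self, rms):
        self.rms = rms
        self.delivery_pool = {}  # device id -> stored device information

    def send_request(self, device_id, fault_info):
        # Step 404: the request carries the device identifier and fault information.
        request = {"device_id": device_id, "fault_info": fault_info}
        self.rms.submit(request, self.handle_notification)

    def handle_notification(self, message):
        # Step 405: on a completion notice, add the device to the delivery pool.
        if message["status"] == "completed":
            self.delivery_pool[message["device_id"]] = {"fault_cleared": True}
```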
In the above implementation flow, steps 404 and 405 are respectively the same as steps 202 and 203 in the foregoing embodiment, and are not described again here.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the process 400 of the device management method for the operation and maintenance system in this embodiment refines the step of determining the faulty device, and adds a step of automatically sending fault information when the preset repair condition is met. The scheme described in this embodiment therefore allows a product line to configure repair reporting rules, and when a device fault matching a repair reporting rule exists, repair reporting can be initiated automatically, which further improves operation and maintenance efficiency.
Please refer to fig. 5, which shows a schematic diagram of an application scenario of automatic repair reporting in the device management method for an operation and maintenance system according to the present application, that is, a specific application scenario of step 401, step 402, and step 403 in the above-mentioned flow 400. As shown in fig. 5, the automation operation and maintenance platform 51 or the administrator 52 may configure an automatic repair reporting rule for the device management platform 53 in step 501, including the fault types and fault reasons for which repair is reported automatically. The administrator may then review the automatic repair reporting rule in step 502, and the rule takes effect after the review is passed in step 503. After discovering a machine that matches the automatic repair reporting rule in step 504, the device management platform 53 may initiate a fault repair reporting process in step 505 and notify the administrator 52 or the automation operation and maintenance platform 51 that the device has a fault; the notification may be sent by mail or by generating and sending a notification message. The administrator 52 or the automation operation and maintenance platform 51 may then repair the faulty device and redeploy the service, after which the device management platform 53 may receive a notification message indicating that operation and maintenance is complete.
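The rule lifecycle of fig. 5 (configure, review, take effect, match, notify) can be sketched in Python as follows. The class name `RepairReportingRule`, the machine record keys, and the `notify` callback are assumptions for illustration; the application states only that a rule covers fault types and fault reasons and takes effect after review.

```python
class RepairReportingRule:
    """Hypothetical automatic repair reporting rule."""

    def __init__(self, fault_types, fault_reasons):
        # Step 501: the rule is configured but not yet in effect.
        self.fault_types = set(fault_types)
        self.fault_reasons = set(fault_reasons)
        self.effective = False

    def review(self, approved):
        # Steps 502 and 503: the rule takes effect only if the review passes.
        self.effective = bool(approved)

    def matches(self, machine):
        # Step 504: a machine matches only an effective rule.
        return (
            self.effective
            and machine["fault_type"] in self.fault_types
            and machine["fault_reason"] in self.fault_reasons
        )


def initiate_repair_reports(rule, machines, notify):
    """Step 505: notify for every matching machine and return their ids."""
    reported = []
    for machine in machines:
        if rule.matches(machine):
            notify(machine)
            reported.append(machine["id"])
    return reported
```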
In some embodiments, the process of the device management method for an operation and maintenance system may further include: querying the fault pool, the delivery pool, and the resource management system to determine state information of the faulty device. The state information includes: a fault state, an operation and maintenance state, and a to-be-online state.
If the relevant information of the device is found in the fault pool, the state of the device can be determined to be the fault state; if the relevant information of the device is found in the delivery pool, the state of the device can be determined to be the to-be-online state; and if the relevant information of the device is found in an operation and maintenance flow of the resource management system, the state of the device can be determined to be the operation and maintenance state. In this way the state of the device can be updated in real time without manual monitoring, which helps the operation and maintenance system run efficiently.
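The state lookup described above maps the location of a device to one of three states. A minimal Python sketch follows; the enum, the function name, and the order in which the three sources are checked are assumptions, since the application does not say what happens if a device appears in more than one place.

```python
from enum import Enum


class DeviceState(Enum):
    FAULT = "fault"
    IN_MAINTENANCE = "operation and maintenance"
    TO_BE_ONLINE = "to be online"


def device_state(device_id, fault_pool, rms_in_progress, delivery_pool):
    """Return the state implied by where the device is found, or None."""
    if device_id in fault_pool:
        return DeviceState.FAULT
    if device_id in rms_in_progress:
        return DeviceState.IN_MAINTENANCE
    if device_id in delivery_pool:
        return DeviceState.TO_BE_ONLINE
    return None  # not tracked by any of the three sources
```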
With further reference to fig. 6, as an implementation of the method shown in the above-mentioned figures, the present application provides an embodiment of a device management apparatus for an operation and maintenance system, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices (for example, in the device management server 102 shown in fig. 1). The operation and maintenance system comprises a resource management system, a fault pool and a delivery pool, wherein the resource management system is used for executing operation and maintenance operations of the equipment, the fault pool comprises a set of fault equipment, and the delivery pool comprises a set of equipment for completing the operation and maintenance.
As shown in fig. 6, the device management apparatus 600 for an operation and maintenance system according to this embodiment includes: a query unit 601, a sending unit 602, and a processing unit 603. The query unit 601 is configured to query the fault pool to determine whether a faulty device exists; the sending unit 602 is configured to send an operation and maintenance request to the resource management system in response to determining that the faulty device exists; the processing unit 603 is configured to receive an operation and maintenance completion notification message sent by the resource management system, and add the faulty device that has completed operation and maintenance to the delivery pool.
In this embodiment, the operation and maintenance system can perform operation and maintenance on a large number of electronic devices (including servers and terminals). These electronic devices may be configured with fault monitoring programs, and an electronic device may be added to the fault pool when its fault monitoring program detects that it has failed. The fault pool may store information about the faulty device, such as identification information of the faulty device. The query unit 601 may query whether the number of devices included in the fault pool is greater than zero, and if so, determine that a faulty device exists. Alternatively, the query unit 601 may query the operating state of a predetermined device. For example, whether a predetermined device has failed may be determined by querying, based on the identifier of the predetermined device, whether the device is located in the fault pool.
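The two query modes of the query unit 601 (checking whether the fault pool is non-empty, and looking up a predetermined device by identifier) can be sketched in Python as follows. The class and method names are hypothetical, and the fault pool is modelled as a set of device identifiers.

```python
class QueryUnit:
    """Hypothetical query unit over a fault pool of device identifiers."""

    def __init__(self, fault_pool):
        self._fault_pool = fault_pool

    def has_faulty_device(self):
        # A faulty device exists when the pool holds more than zero devices.
        return len(self._fault_pool) > 0

    def is_faulty(self, device_id):
        # A predetermined device is faulty if its identifier is in the pool.
        return device_id in self._fault_pool
```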
If the query unit 601 determines that a faulty device exists, the faulty device may be taken offline. At this time, the sending unit 602 may send an operation and maintenance request to the resource management system RMS. The operation and maintenance request may include identification information of the failed device and failure information. After receiving the operation and maintenance request, the resource management system RMS may initiate an operation and maintenance flow to repair the faulty device.
When a faulty device completes operation and maintenance, the resource management system RMS may send a notification message to indicate that the faulty device has been repaired. The processing unit 603 may receive the notification message sent by the resource management system RMS and then add the faulty device that has completed operation and maintenance to the delivery pool. Optionally, information (including identification information, resource configuration information, etc.) about the faulty device that has completed operation and maintenance may be saved in the delivery pool.
In some embodiments, the querying unit 601 is configured to query the failure pool to determine whether there is a failed device as follows: inquiring whether equipment meeting preset repair conditions exists in the fault pool or not; and taking the equipment meeting the preset repair condition as fault equipment. The preset repair condition can be configured by an administrator according to the requirements of the product line, or can be automatically configured by the query unit according to the requirements of the product line. When the equipment in the product line meets the preset repair condition, repair can be automatically initiated.
Further, the apparatus 600 may further include a repair unit configured to: acquire fault information of the faulty device; and, in response to determining that the faulty device exists, send the fault information according to a pre-configured repair reporting mode. The repair unit may acquire the fault information, including the fault type, the fault time, the fault reason, etc., by using a monitoring program or a log collection program configured in the faulty device. The fault information can then be sent according to the pre-configured repair reporting mode, which may be, for example, a mail notification.
In some optional implementations of this embodiment, the apparatus 600 further includes a monitoring unit configured to monitor the operation and maintenance progress of the faulty device. The monitoring unit may obtain the operation and maintenance progress of the faulty device through the resource management system RMS, and the processing unit 603 adds the faulty device to the delivery pool when the monitoring unit detects that the operation and maintenance of the faulty device is completed.
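The interaction between the monitoring unit and the processing unit can be sketched as a bounded polling loop. In the Python sketch below, representing progress as a percentage, the `get_progress` callback, and the poll limit are assumptions; the application does not define how progress is reported. A real implementation would also wait between polls.

```python
def monitor_and_deliver(get_progress, device_id, delivery_pool, max_polls=10):
    """Poll progress and add the device to the delivery pool when complete.

    Returns True if the device was delivered within max_polls polls.
    """
    for _ in range(max_polls):
        if get_progress(device_id) >= 100:  # assumed: 100 means complete
            delivery_pool.add(device_id)
            return True
    return False
```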
Further, the apparatus 600 may further include a configuration unit and a clearing unit. The configuration unit is used for configuring the service environment and the monitoring information of the fault equipment which completes operation and maintenance. After the processing unit 603 adds the faulty device that has completed operation and maintenance to the delivery pool, the configuration unit may deploy the service environment of the device according to the requirements of the product line, and may also start a monitoring program to monitor the service state of the device. The clearing unit is used for clearing the configured fault equipment from the delivery pool. And the configured fault equipment can be brought online again to provide service.
In a further embodiment, the query unit is configured to query the fault pool through the fault query interface to determine whether a faulty device exists, the sending unit is configured to send an operation and maintenance request to the resource management system through the operation and maintenance initiating interface, and the monitoring unit is configured to query the operation and maintenance progress of the faulty device through the operation progress query interface.
In some embodiments, the apparatus 600 further comprises a detection unit for: detecting whether the equipment in the delivery pool is effective; and clearing the validated devices in the delivery pool. The device may be determined to be in effect when the device has been brought online to provide service or a monitor in the device has started running. Detecting whether the device in the delivery pool has validated comprises: it is detected whether a monitoring program for the devices in the delivery pool has been run and/or whether the devices in the delivery pool have provided service. After the device is on-line to provide service, the clearing unit can clear the relevant information of the device in the delivery pool, so that the accuracy of the availability of the device in the operation and maintenance system is improved.
In some embodiments, the apparatus 600 further comprises an updating unit configured to: query the fault pool, the delivery pool, and the resource management system to determine state information of the faulty device. The state information includes: a fault state, an operation and maintenance state, and a to-be-online state. If the relevant information of the device is found in the fault pool, the state of the device can be updated to the fault state; if the relevant information of the device is found in the delivery pool, the state of the device can be updated to the to-be-online state; and if the relevant information of the device is found in an operation and maintenance flow of the resource management system, the state of the device can be updated to the operation and maintenance state. In this way the state of the device can be updated in real time without manual monitoring, which helps the operation and maintenance system run efficiently.
Those skilled in the art will appreciate that the apparatus 600 described above also includes some other well-known structures, such as a processor, memory, etc., which are not shown in fig. 6 in order to not unnecessarily obscure embodiments of the present disclosure.
The device management apparatus 600 for the operation and maintenance system provided by the above embodiment of the application can realize automatic repair reporting and automatic operation and maintenance of faulty devices, reduce the labor cost of the operation and maintenance system, and improve the operation and maintenance efficiency.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing a terminal device or server of an embodiment of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor includes a query unit, a sending unit, and a processing unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the query unit may also be described as a "unit for querying a fault pool to determine whether a faulty device exists".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus in the above-described embodiments; or it may be a non-volatile computer storage medium that exists separately and is not incorporated into the terminal. The non-volatile computer storage medium stores one or more programs, and is applied to an operation and maintenance system, where the operation and maintenance system includes a resource management system, a failure pool, and a delivery pool, the resource management system is used to perform operation and maintenance operations on devices, the failure pool includes a set of failed devices, and the delivery pool includes a set of devices that complete operation and maintenance. The one or more programs, when executed by an apparatus, cause the apparatus to: querying the failure pool to determine if a failed device exists; in response to determining that the faulty equipment exists, sending an operation and maintenance request to the resource management system; and receiving an operation and maintenance completion notification message sent by the resource management system, and adding the fault equipment completing the operation and maintenance to the delivery pool.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A device management method for an operation and maintenance system, characterized in that the operation and maintenance system comprises a resource management system, a fault pool and a delivery pool, wherein the resource management system is used for executing operation and maintenance operations of equipment, the fault pool comprises a set of fault equipment, and the delivery pool comprises a set of equipment which has completed operation and maintenance; the method comprises the following steps:
Querying the fault pool to determine whether a fault device exists, wherein the electronic device is configured with a fault monitoring program and is added to the fault pool when the fault monitoring program monitors that the electronic device fails;
In response to determining that the faulty equipment exists, sending an operation and maintenance request to the resource management system;
Receiving an operation and maintenance completion notification message sent by a resource management system, and adding the fault equipment completing operation and maintenance to the delivery pool;
Monitoring the operation and maintenance progress of the fault equipment;
Configuring service environment and monitoring information for the fault equipment which completes operation and maintenance;
And clearing the configured fault equipment from a delivery pool.
2. The method of claim 1, wherein said querying the failure pool to determine if a failed device exists comprises:
Querying the failure pool through a failure query interface to determine whether a failed device exists;
The sending the operation and maintenance request to the resource management system includes:
Sending an operation and maintenance request to the resource management system by using an operation and maintenance initiating interface;
The monitoring of the operation and maintenance progress of the fault equipment comprises:
Querying the operation and maintenance progress of the fault equipment through an operation progress query interface.
3. The method of claim 1, further comprising:
Detecting whether equipment in the delivery pool has been validated;
Clearing the validated equipment from the delivery pool;
Wherein detecting whether equipment in the delivery pool has been validated comprises:
Detecting whether a monitoring program of the equipment in the delivery pool has been run and/or whether the equipment in the delivery pool has provided service.
4. The method of claim 1, wherein said querying the failure pool to determine if a failed device exists comprises:
Inquiring whether equipment meeting preset repair conditions exists in the fault pool;
And taking the equipment meeting the preset repair condition as the fault equipment.
5. The method of claim 4, further comprising:
Acquiring fault information of the fault equipment;
In response to determining that the fault equipment exists, sending the fault information according to a pre-configured repair reporting mode.
6. The method according to any one of claims 1-5, further comprising:
Querying the fault pool, the delivery pool, and the resource management system to determine state information of the fault equipment;
Wherein the state information includes: a fault state, an operation and maintenance state, and a to-be-online state.
7. A device management apparatus for an operation and maintenance system, characterized in that the operation and maintenance system comprises a resource management system, a fault pool and a delivery pool, wherein the resource management system is used for executing operation and maintenance operations of devices, the fault pool comprises a set of fault devices, and the delivery pool comprises a set of devices which have completed operation and maintenance; the apparatus comprises:
A query unit, configured to query the fault pool to determine whether a fault device exists, wherein an electronic device is configured with a fault monitoring program and is added to the fault pool when the fault monitoring program detects that the electronic device has failed;
A sending unit, configured to send an operation and maintenance request to the resource management system in response to determining that the fault device exists;
A processing unit, configured to receive an operation and maintenance completion notification message sent by the resource management system and add the fault device that has completed operation and maintenance to the delivery pool;
A monitoring unit, configured to monitor the operation and maintenance progress of the fault device;
A configuration unit, configured to configure the service environment and the monitoring information of the fault device that has completed operation and maintenance;
And a clearing unit, configured to clear the configured fault device from the delivery pool.
8. The apparatus of claim 7, wherein the querying unit is configured to query the failure pool through a failure query interface to determine whether a failed device exists;
The sending unit is used for sending an operation and maintenance request to the resource management system by using an operation and maintenance initiating interface;
And the monitoring unit is used for querying the operation and maintenance progress of the fault equipment through an operation progress query interface.
9. The apparatus of claim 7, further comprising a detection unit configured to:
Detect whether equipment in the delivery pool has been validated; and
Clear the validated equipment from the delivery pool;
Wherein detecting whether equipment in the delivery pool has been validated comprises:
Detecting whether a monitoring program of the equipment in the delivery pool has been run and/or whether the equipment in the delivery pool has provided service.
10. The apparatus of claim 7, wherein the querying unit is configured to query the failure pool to determine whether a failed device exists as follows:
Inquiring whether equipment meeting preset repair conditions exists in the fault pool;
And taking the equipment meeting the preset repair condition as the fault equipment.
11. The apparatus of claim 10, further comprising a repair unit configured to:
Acquire fault information of the fault equipment;
In response to determining that the fault equipment exists, send the fault information according to a pre-configured repair reporting mode.
12. The apparatus according to any one of claims 7-11, wherein the apparatus further comprises an updating unit configured to:
Query the fault pool, the delivery pool, and the resource management system to determine state information of the fault equipment;
Wherein the state information includes: a fault state, an operation and maintenance state, and a to-be-online state.
CN201610411090.2A 2016-06-13 2016-06-13 Equipment management method and device for operation and maintenance system Active CN105897487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610411090.2A CN105897487B (en) 2016-06-13 2016-06-13 Equipment management method and device for operation and maintenance system

Publications (2)

Publication Number Publication Date
CN105897487A CN105897487A (en) 2016-08-24
CN105897487B true CN105897487B (en) 2019-12-10

Family

ID=56730166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610411090.2A Active CN105897487B (en) 2016-06-13 2016-06-13 Equipment management method and device for operation and maintenance system

Country Status (1)

Country Link
CN (1) CN105897487B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681598B (en) * 2018-05-21 2023-06-02 平安科技(深圳)有限公司 Automatic task rerun method, system, computer equipment and storage medium
CN109358572A (en) * 2018-09-26 2019-02-19 深圳壹账通智能科技有限公司 Method, apparatus, computer device and storage medium for machine equipment operation and maintenance
CN110855489B (en) * 2019-11-14 2022-08-12 北京京东尚科信息技术有限公司 Fault processing method and device and fault processing device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103138960A (en) * 2011-11-24 2013-06-05 百度在线网络技术(北京)有限公司 Method and device for processing network failures
CN103684828A (en) * 2012-09-18 2014-03-26 亿阳信通股份有限公司 Method and device for processing faults of telecommunication equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7409318B2 (en) * 2000-02-14 2008-08-05 Nextnine Ltd. Support network
CN105303457B (en) * 2015-10-19 2021-09-21 清华大学 Offshore wind resource assessment method based on Monte Carlo operation maintenance simulation

Also Published As

Publication number Publication date
CN105897487A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
US10152382B2 (en) Method and system for monitoring virtual machine cluster
CN113742031B (en) Node state information acquisition method and device, electronic equipment and readable storage medium
CN109654666B (en) Method, device and equipment for debugging unit
CN105897487B (en) Equipment management method and device for operation and maintenance system
US11544052B2 (en) Tenant declarative deployments with release staggering
CN104360878A (en) Method and device for deploying application software
CN110865835A (en) Configuration file updating method and device, computer equipment and storage medium
CN110932914B (en) Deployment method, deployment device, hybrid cloud system architecture and computer storage medium
CN111459631A (en) Automatic batch processing method and system for server
CN111897697A (en) Server hardware fault repairing method and device
CN111526049A (en) Operation and maintenance system, operation and maintenance method, electronic device and storage medium
CN112036823A (en) Workflow-based work order transfer method, device and equipment
CN104239091A (en) File cleaning method and device and terminal
CN105025179A (en) Method and system for monitoring service agents of call center
CN114363334A (en) Network configuration method, device and equipment for cloud system and cloud desktop virtual machine
CN108241545B (en) Debugging method and device for system fault
CN110795332A (en) Automatic testing method and device
CN103685405B (en) Network service system and the method updating firmware thereof
CN111309456B (en) Task execution method and system
JP5444739B2 (en) Management device, management system, and management program
CN109120433B (en) Method and apparatus for containerized deployment of hosts
CN113746676B (en) Network card management method, device, equipment, medium and product based on container cluster
CN113407403B (en) Cloud host management method and device, computer equipment and storage medium
CN103914339A (en) Server management system and server management method
CN109327529B (en) Distributed scanning method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant