WO2016095716A1 - Fault information processing method and related device - Google Patents

Fault information processing method and related device

Info

Publication number
WO2016095716A1
WO2016095716A1 PCT/CN2015/096567 CN2015096567W WO2016095716A1 WO 2016095716 A1 WO2016095716 A1 WO 2016095716A1 CN 2015096567 W CN2015096567 W CN 2015096567W WO 2016095716 A1 WO2016095716 A1 WO 2016095716A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
fault
time points
managed objects
data center
Prior art date
Application number
PCT/CN2015/096567
Other languages
French (fr)
Chinese (zh)
Inventor
和江涛
王波
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2016095716A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks

Definitions

  • the present invention relates to the field of information, and in particular, to a fault information processing method and related apparatus.
  • the data center is a complex set of facilities, including not only computer systems and other supporting equipment, but also data communication connections, environmental control equipment, monitoring equipment and various security devices.
  • As data center technologies mature, more and more enterprises are building their own data centers and migrating their business to the data center platform.
  • An actual data center has a complex IT system environment. When the data center fails, the fault must be located manually from the massive state management information of the data center. This state management information indicates the running status of the data center and includes system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information of the data center.
  • When a fault occurs, the primary task is to restore the service.
  • By the time the fault is analyzed, the state management information of the data center has changed from what it was at the time of failure. It takes a lot of time to manually search historical state management information and then analyze where the fault occurred. Even so, much of the state management information from the time of failure cannot be retrieved, resulting in inaccurate fault location. Therefore, the fault information processing method of the prior art is time consuming, complicated in operation, and low in reliability.
  • the embodiment of the invention provides a fault information processing method for optimizing fault location.
  • a first aspect of the embodiments of the present invention provides a fault information processing method, which is applicable to a data center, where the data center includes a managed object, and the method includes:
  • the plurality of time points and status information of the N managed objects corresponding to each of the time points are recorded.
  • the recording of the multiple time points and the status information of the N managed objects corresponding to each of the time points also includes:
  • recording the plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points.
  • the state management information of the data center includes:
  • System configuration information and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
  • the determining, according to the state management information, of the state information of the N managed objects of the data center includes:
  • dividing the state management information into state information of the N managed objects according to attributes of the N managed objects in the data center, where the attributes of the managed object include: a device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the username of the managed object.
  • the method further includes:
  • a second aspect of the embodiments of the present invention provides a fault information processing apparatus, which is applicable to a data center, where the data center includes a managed object, and the apparatus includes:
  • An information acquiring module configured to acquire state management information of the data center at multiple time points, where the state management information is used to describe an operating state of the data center;
  • a security determining module configured to determine, according to the state management information, status information of the N managed objects of the data center, where the state information is used to indicate an operating state of the managed object;
  • the information recording module is configured to record the plurality of time points and the state information of the N managed objects corresponding to each of the time points.
  • the first implementation manner of the second aspect of the embodiment of the present invention further includes:
  • An association determining module configured to determine an association relationship between the N managed objects before the information recording module records the plurality of time points and the status information of the N managed objects corresponding to each of the time points;
  • the information recording module is specifically configured to:
  • the state management information of the data center includes:
  • System configuration information and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
  • the security determining module is specifically configured to:
  • divide the state management information into state information of the N managed objects according to attributes of the N managed objects in the data center, where the attributes of the managed object include: a device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the username of the managed object.
  • the fourth implementation manner of the second aspect of the embodiment of the present invention further includes:
  • An instruction receiving module configured to receive a fault finding instruction sent by the client, where the fault finding instruction includes a fault occurrence time;
  • the fault finding module is configured to search for status information of the N managed objects corresponding to the fault occurrence time from the recorded plurality of time points and the status information of the N managed objects corresponding to each of the time points;
  • the fault feedback module is configured to feed back status information of the N managed objects corresponding to the fault occurrence time to the client.
  • the state management information of the data center is acquired at a plurality of time points; the state information of the N managed objects of the data center is determined according to the state management information, where the state information is used to represent the security status of the managed object; and the plurality of time points and the status information of the N managed objects corresponding to each time point are recorded.
  • the method provided by the embodiment of the present invention classifies the state management information of the data center by managed object, so that when locating a fault, the user can go directly to the fault occurrence time in the saved information and use the security status of each managed object at that time to locate the fault accurately, without manually searching or manually analyzing massive state management information. Therefore, the method provided by the embodiment of the invention can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • FIG. 1 is a flowchart of an embodiment of a method for processing fault information according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another embodiment of a method for processing fault information according to an embodiment of the present invention
  • FIG. 3 is a structural diagram of an embodiment of a fault information processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a structural diagram of another embodiment of a fault information processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of another embodiment of a fault information processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a structural diagram of another embodiment of a fault information processing apparatus according to an embodiment of the present invention.
  • the embodiment of the invention provides a method for processing fault information, which is used to reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • the embodiment of the present invention further provides related fault information processing apparatus, which will be separately described below.
  • Referring to FIG. 1, the basic flow of the fault information processing method mainly includes:
  • the fault information processing device acquires state management information of the data center at a plurality of time points, and the state management information is used to describe an operating state of the data center.
  • the plurality of time points may be set manually, or may be set by default by the fault information processing device.
  • For example, the fault information processing device sets a time point every 15 minutes by default.
  • the plurality of time points may also be determined by other means, and is not limited herein.
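As a minimal sketch of the default scheme described above, the following Python generates collection time points at a fixed 15-minute interval. The function name `time_points` and the example date are hypothetical and are not part of the patent text.

```python
from datetime import datetime, timedelta


def time_points(start, end, interval=timedelta(minutes=15)):
    """Return collection time points from start to end (inclusive) at a fixed interval."""
    points = []
    current = start
    while current <= end:
        points.append(current)
        current += interval
    return points


# Hypothetical date; only the 15-minute spacing comes from the text.
points = time_points(datetime(2015, 1, 1, 10, 0), datetime(2015, 1, 1, 10, 30))
print([p.strftime("%H:%M") for p in points])
```

A manually configured schedule could be modeled by passing a different `interval` or by supplying an explicit list of time points instead.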
  • the data center includes at least one managed object, and the data center manages these managed objects.
  • the managed object may be an entity object such as a physical device, or may be a software object such as an operating system, a database, or a middleware, which is not limited in this embodiment.
  • the fault information processing device determines state information of the N managed objects of the data center based on the state management information.
  • the status information is used to indicate the working status of the managed object.
  • the fault information processing device records the plurality of time points and the state information of the N managed objects corresponding to each time point, so that when performing fault location, the user can look up, from the saved time points and the corresponding state information, the security status of each managed object at the time of the failure, and then accurately determine which managed object has failed.
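The recording step can be pictured as a mapping from each time point to the per-object state information. The sketch below assumes a simple in-memory dictionary; the names `record_snapshot` and `states_at`, and the sample states, are illustrative only, and the patent does not prescribe a storage format.

```python
from datetime import datetime

# Hypothetical record store: time point -> {managed object name -> state information}
records = {}


def record_snapshot(records, time_point, states):
    """Save a copy of the state information of every managed object for one time point."""
    records[time_point] = dict(states)


def states_at(records, time_point):
    """Return the saved state information for a time point, or an empty dict if none exists."""
    return records.get(time_point, {})


t = datetime(2015, 1, 1, 10, 15)
record_snapshot(records, t, {"device A": ["power off"], "device B": []})
print(states_at(records, t))
```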
  • This embodiment provides a fault information processing method, in which the fault information processing apparatus acquires state management information of the data center at a plurality of time points; determines state information of the N managed objects of the data center according to the state management information, where the status information is used to indicate the security status of the managed object; and records the plurality of time points and the status information of the N managed objects corresponding to each time point.
  • The method provided in this embodiment classifies the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and locate the fault accurately according to the security status of each managed object at those time points, without manually searching or manually analyzing massive state management information. Therefore, the method provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • FIG. 1 provides a basic flow of a fault information processing method provided by an embodiment of the present invention. A more detailed embodiment is provided below to enable more accurate fault location. Referring to FIG. 2, the basic process includes:
  • the fault information processing device acquires state management information of the data center at a plurality of time points, and the state management information is used to describe an operating state of the data center.
  • the plurality of time points may be set manually, or may be set by default by the fault information processing device.
  • For example, the fault information processing device sets a time point every 15 minutes by default.
  • the plurality of time points may also be determined by other means, and is not limited herein.
  • the fault information processing device may include a configuration management database (CMDB), a network management system, a log system, a complaint guarantee system, a configuration change system, and a work order system.
  • the fault information processing device can actively acquire state management information of the data center from these systems, or passively receive state management information of the data center sent by these systems.
  • the fault information processing device can also obtain the state management information of the data center by other means, which is not limited herein.
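The two acquisition modes described above (actively fetching versus passively receiving) can be sketched as follows. The class `StateCollector`, its method names, and the sample messages are assumptions made for illustration; the patent does not define such an interface.

```python
class StateCollector:
    """Gathers state management information by polling (active) or by receiving pushes (passive)."""

    def __init__(self):
        self.collected = []

    def poll(self, sources):
        # Active mode: ask each source system for its current information.
        for name, fetch in sources.items():
            self.collected.append((name, fetch()))

    def receive(self, name, info):
        # Passive mode: a source system sends its information on its own initiative.
        self.collected.append((name, info))


collector = StateCollector()
collector.poll({"log system": lambda: ["disk latency high"]})
collector.receive("network management system", ["link down on port 3"])
print(collector.collected)
```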
  • the data center state management information may include system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint support information, and/or configuration change information, and/or work order information of the data center system, and may also include other information, which is not limited herein.
  • the data center includes at least one managed object, and the data center manages these managed objects.
  • the managed object may be an entity object such as a physical device, or may be a software object such as an operating system, which is not limited in this embodiment.
  • the fault information processing device determines state information of the N managed objects of the data center based on the state management information.
  • the status information is used to indicate the working status of the managed object.
  • the fault information processing apparatus may divide the state management information acquired in step 201 into state information of the N managed objects according to attributes of the N managed objects in the data center.
  • the attribute of the managed object may include one or more of a device name, an IP address, a device code, and a user name of the managed object, or may be other attributes.
  • For example, according to the IP address of each managed object, the fault information processing apparatus may divide the alarm information, and/or the performance monitoring information, and/or the log information of the data center into alarm information, and/or performance monitoring information, and/or log information of each managed object;
  • the configuration change information and/or the work order information of the data center is divided into configuration change information and/or work order information of each managed object;
  • the system configuration information and/or the complaint guarantee information of the data center is divided into configuration information and/or complaint guarantee information of each managed object.
  • The fault information processing apparatus may also divide the state management information obtained in step 201 into the status information of the N managed objects by other methods, which are not limited herein.
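A minimal sketch of the division step, assuming each piece of state management information carries the attribute (here an IP address) that identifies its managed object. The function `divide_by_attribute`, the dictionary layout, and the sample addresses are hypothetical.

```python
from collections import defaultdict


def divide_by_attribute(entries, managed_objects, attribute):
    """Group entries by the managed object whose attribute value they carry."""
    lookup = {obj[attribute]: obj["name"] for obj in managed_objects}
    per_object = defaultdict(list)
    for entry in entries:
        owner = lookup.get(entry.get(attribute))
        if owner is not None:  # entries from unknown sources are skipped
            per_object[owner].append(entry)
    return dict(per_object)


managed_objects = [
    {"name": "device A", "ip": "10.0.0.1"},
    {"name": "device B", "ip": "10.0.0.2"},
]
alarms = [
    {"ip": "10.0.0.1", "message": "power off"},
    {"ip": "10.0.0.2", "message": "fan speed low"},
    {"ip": "10.0.0.9", "message": "unknown source"},
]
alarms_by_object = divide_by_attribute(alarms, managed_objects, "ip")
print(alarms_by_object)
```

The same function could be called with a different `attribute` (for example a device code) for information types that are keyed differently.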
  • After dividing the state management information acquired in step 201 into the state information of the N managed objects according to the attributes of the N managed objects in the data center, the fault information processing device may, in order to reduce the data to be recorded, further process the status information, for example by deleting invalid data or duplicate data (such as info-level entries in the log).
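One possible way to reduce the recorded data is sketched below: info-level log entries are dropped and repeated entries are kept only once. Treating two entries with the same level and message as duplicates is an assumption of this sketch, as are the function name `clean_log` and the entry layout.

```python
def clean_log(entries):
    """Drop INFO-level entries and keep only the first occurrence of each (level, message) pair."""
    seen = set()
    kept = []
    for entry in entries:
        if entry["level"] == "INFO":
            continue  # treated as not needed for fault location in this sketch
        key = (entry["level"], entry["message"])
        if key in seen:
            continue  # duplicate of an entry already kept
        seen.add(key)
        kept.append(entry)
    return kept


log = [
    {"level": "ERROR", "message": "disk full"},
    {"level": "INFO", "message": "heartbeat ok"},
    {"level": "ERROR", "message": "disk full"},
    {"level": "WARN", "message": "cpu high"},
]
print(clean_log(log))
```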
  • the fault information processing apparatus determines the association relationship between the N managed objects at the plurality of time points described in step 201.
  • the association relationship is used to associate a managed object having information interaction among the N managed objects.
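The association relationship can be modeled as a set of unordered pairs of managed objects that exchange information. This is only one possible representation; the helper names and the sample interaction list are hypothetical.

```python
def build_associations(interactions):
    """Return the set of unordered pairs of managed objects that interact with each other."""
    return {frozenset(pair) for pair in interactions if pair[0] != pair[1]}


def associated_with(associations, name):
    """List the managed objects that have information interaction with the named object."""
    return sorted(
        other
        for pair in associations
        if name in pair
        for other in pair
        if other != name
    )


# Hypothetical observed interactions (sender, receiver); both directions collapse into one pair.
associations = build_associations([("B", "C"), ("C", "B")])
print(associated_with(associations, "B"))
```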
  • the fault information processing device records the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point, so that when the user performs fault location, the security state of each managed object at the time the failure occurred can be looked up and the faulty object can be accurately located.
  • the user may also analyze the association relationship between the N managed objects corresponding to the fault time to determine whether the fault is caused by a managed object itself or by the information interaction channel between managed objects.
  • This embodiment provides a fault information processing method, in which the fault information processing apparatus acquires state management information of the data center at a plurality of time points; determines state information of the N managed objects of the data center according to the state management information, where the status information is used to indicate the security status of the managed object; determines the association relationship between the N managed objects; and records the plurality of time points, the status information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects.
  • The method provided in this embodiment classifies the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and locate the fault accurately according to the security status of each managed object at those time points.
  • the method provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • the association relationship between the N managed objects at multiple time points is also recorded, which provides the user with a further reference for fault location and enables more accurate fault location.
  • The fault information processing device can also receive a fault finding instruction sent by the client, where the fault finding instruction includes a fault occurrence time. From the recorded plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point, the fault information processing device finds the state information of the N managed objects and the association relationship between the N managed objects corresponding to the fault occurrence time, and feeds them back to the client, so that the user can obtain the search result of the fault information processing device through the client.
  • the status information of the N managed objects corresponding to the fault occurrence time may be the status information of the N managed objects stored by the fault information processing device within a preset time period around the fault occurrence time (for example, from 30 minutes before the fault occurrence time to 20 minutes after it).
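The preset-window lookup can be sketched as a filter over the recorded time points. The 30-minute and 20-minute bounds are the example values given in the text; the function name `find_records`, the record layout, and the example date are assumptions.

```python
from datetime import datetime, timedelta


def find_records(records, fault_time,
                 before=timedelta(minutes=30), after=timedelta(minutes=20)):
    """Return the recorded entries whose time point lies within the window around fault_time."""
    start, end = fault_time - before, fault_time + after
    return {t: state for t, state in sorted(records.items()) if start <= t <= end}


day = datetime(2015, 1, 1)  # hypothetical date
records = {
    day.replace(hour=9, minute=45): {"device A": []},
    day.replace(hour=10, minute=0): {"device A": []},
    day.replace(hour=10, minute=15): {"device A": ["power off"]},
    day.replace(hour=10, minute=30): {"device A": ["power off"]},
    day.replace(hour=10, minute=45): {"device A": []},
}
found = find_records(records, day.replace(hour=10, minute=22))
print([t.strftime("%H:%M") for t in found])
```

With a fault occurrence time of 10:22, the window runs from 09:52 to 10:42, so only the records at 10:00, 10:15, and 10:30 are returned.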
  • Every 15 minutes, the fault information processing device acquires the alarm information of the data center from the network management system of the data center, the log information of the data center from the log system of the data center, the configuration change information of the data center from the configuration change system of the data center, and the work order information of the data center from the work order system of the data center.
  • the data center includes three managed objects, namely, network device A, storage device B, and computing device C.
  • the fault information processing device divides the obtained alarm information and log information of the data center, according to the IP addresses of devices A, B, and C, into the alarm information and log information of device A, the alarm information and log information of device B, and the alarm information and log information of device C.
  • Likewise, it divides the configuration change information and work order information of the data center, according to the asset codes of devices A, B, and C, into the configuration change information and work order information of device A, of device B, and of device C.
  • the fault information processing device determines the association relationship between the devices A, B, and C.
  • the device A has information interaction with the device, and the device B and the device C have information interaction.
  • the fault information processing device records the time points and, for each time point, the corresponding alarm information, log information, configuration change information, and work order information of devices A, B, and C, as well as the association relationship between devices A, B, and C.
  • the user uses the client to search the fault information processing device for information corresponding to the fault time. The fault information processing device receives the fault finding instruction sent by the user's client, where the fault finding instruction includes a fault occurrence time of 10:22 am. In the information recorded at 10:00 am, 10:15 am, and 10:30 am, the fault information processing device finds the alarm information, log information, configuration change information, and work order information of devices A, B, and C, and the association relationship between devices A, B, and C.
  • the fault information processing device feeds back the search result to the client.
  • In the search result, the alarm information of device A recorded at 10:15 am indicates that device A is powered off.
  • the user locates the failed managed object as device A according to the alarm information.
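The last step of the scenario, reading the search result to find the failed device, can be illustrated as a scan for the earliest time point at which any device has alarms. The structure of `search_result` is hypothetical, and the alarm shown at 10:30 is an assumed value; the text only states that the 10:15 record shows device A powered off.

```python
# Hypothetical search result: time point -> device -> list of alarms
search_result = {
    "10:00": {"A": [], "B": [], "C": []},
    "10:15": {"A": ["power off"], "B": [], "C": []},
    "10:30": {"A": ["power off"], "B": [], "C": []},  # assumed, not stated in the text
}


def first_alarms(result):
    """Return the earliest time point that has alarms, with the devices that raised them."""
    for time_point in sorted(result):  # zero-padded HH:MM strings sort chronologically
        alarmed = {dev: alarms for dev, alarms in result[time_point].items() if alarms}
        if alarmed:
            return time_point, alarmed
    return None


print(first_alarms(search_result))
```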
  • the above embodiment provides a fault information processing method.
  • the following embodiment provides a fault information processing apparatus for implementing the above method.
  • the basic structure of the apparatus is as shown in FIG. 3, including:
  • the information obtaining module 301 is configured to acquire state management information of the data center at multiple time points, where the state management information is used to describe an operating state of the data center;
  • the security determining module 302 is configured to determine, according to the state management information, state information of the N managed objects of the data center, where the state information is used to indicate an operating state of the managed object.
  • the information recording module 303 is configured to record the plurality of time points and the state information of the N managed objects corresponding to each time point.
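The three modules can be pictured as three small cooperating classes. This is a sketch under stated assumptions: the class and method names mirror the module names in the text, but the constructor arguments, the entry layout, and the IP-based matching are illustrative choices rather than the apparatus as claimed.

```python
class InformationAcquiringModule:
    """Module 301: acquires state management information from a source."""

    def __init__(self, source):
        self.source = source  # hypothetical callable returning raw entries

    def acquire(self):
        return self.source()


class SecurityDeterminingModule:
    """Module 302: determines per-object state information from the raw entries."""

    def __init__(self, ip_to_object):
        self.ip_to_object = ip_to_object

    def determine(self, entries):
        states = {name: [] for name in self.ip_to_object.values()}
        for entry in entries:
            name = self.ip_to_object.get(entry["ip"])
            if name is not None:
                states[name].append(entry["message"])
        return states


class InformationRecordingModule:
    """Module 303: records each time point with the corresponding state information."""

    def __init__(self):
        self.records = {}

    def record(self, time_point, states):
        self.records[time_point] = states


acquiring = InformationAcquiringModule(lambda: [{"ip": "10.0.0.1", "message": "power off"}])
determining = SecurityDeterminingModule({"10.0.0.1": "device A", "10.0.0.2": "device B"})
recording = InformationRecordingModule()
recording.record("10:15", determining.determine(acquiring.acquire()))
print(recording.records)
```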
  • This embodiment provides a fault information processing apparatus, in which the information acquiring module 301 acquires state management information of the data center at a plurality of time points; the security determining module 302 determines the state information of the N managed objects of the data center according to the state management information, where the status information is used to indicate the security status of the managed object; and the information recording module 303 records the plurality of time points and the status information of the N managed objects corresponding to each time point.
  • The device provided in this embodiment classifies the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and locate the fault accurately according to the security status of each managed object at those time points, without manually searching or manually analyzing massive state management information. Therefore, the device provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • FIG. 3 shows the basic structure of the fault information processing apparatus provided by the embodiment of the present invention. A more detailed embodiment is provided below to enable more accurate fault location. Referring to FIG. 4, its basic structure includes:
  • the information obtaining module 401 is configured to acquire state management information of the data center at a plurality of time points, where the state management information is used to describe an operating state of the data center;
  • the security determining module 402 is configured to determine, according to the state management information, state information of the N managed objects of the data center, where the state information is used to indicate an operating state of the managed object.
  • the association determining module 403 is configured to determine an association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each time point;
  • the information recording module 404 is configured to record the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point.
  • the state management information of the data center may include: system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint support information, and/or configuration change information, and/or work order information.
  • the security determining module may be configured to: divide the state management information into state information of the N managed objects according to attributes of the N managed objects in the data center, where the attributes of the managed object include: the device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the username of the managed object.
  • This embodiment provides a fault information processing apparatus, in which the information acquiring module 401 acquires state management information of the data center at a plurality of time points; the security determining module 402 determines the state information of the N managed objects of the data center according to the state management information, where the status information is used to indicate the security status of the managed object; the association determining module 403 determines an association relationship between the N managed objects; and the information recording module 404 records the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects.
  • The device provided in this embodiment classifies the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and locate the fault accurately according to the security status of each managed object at those time points, without manually searching or manually analyzing massive state management information. Therefore, the device provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • the information recording module 404 also records the association relationship between the N managed objects at multiple time points, which provides a further reference for the user to perform fault location, so that the user can perform more accurate fault location.
  • FIG. 4 provides a basic structure of a more detailed fault information processing apparatus according to an embodiment of the present invention.
  • a more detailed fault information processing apparatus is provided below, which can be used with a client.
  • the basic structure includes:
  • the information obtaining module 501 is configured to acquire state management information of the data center at multiple time points, where the state management information is used to describe an operating state of the data center;
  • the security determining module 502 is configured to determine, according to the state management information, state information of the N managed objects of the data center, where the state information is used to indicate an operating state of the managed object.
  • the association determining module 503 is configured to determine an association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each time point;
  • the information recording module 504 is configured to record the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point.
  • the instruction receiving module 505 is configured to receive a fault finding instruction sent by the client, where the fault finding instruction includes a fault occurrence time;
  • the fault finding module 506 is configured to search for status information of the N managed objects corresponding to the fault occurrence time from the recorded plurality of time points and the status information of the N managed objects corresponding to each time point;
  • the fault feedback module 507 is configured to feed back state information of the N managed objects corresponding to the fault occurrence time to the client.
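The interplay of modules 505 to 507 (receive an instruction, look up the records, feed back the result) can be sketched as a single handler. Interpreting "corresponding to the fault occurrence time" as "the latest time point at or before the fault" is an assumption of this sketch, as are the names `FaultFindingInstruction` and `handle_instruction` and the example date.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FaultFindingInstruction:
    """What the client sends: only the fault occurrence time is required by the text."""
    fault_time: datetime


def handle_instruction(records, instruction):
    """Find the latest recorded time point at or before the fault time and return its states."""
    times = sorted(records)
    index = bisect_right(times, instruction.fault_time) - 1
    if index < 0:
        return None  # nothing was recorded before the fault
    time_point = times[index]
    return {"time_point": time_point, "states": records[time_point]}


day = datetime(2015, 1, 1)  # hypothetical date
records = {
    day.replace(hour=10, minute=0): {"device A": []},
    day.replace(hour=10, minute=15): {"device A": ["power off"]},
}
reply = handle_instruction(records, FaultFindingInstruction(day.replace(hour=10, minute=22)))
print(reply)
```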
  • the embodiment provides a fault information processing apparatus, wherein the information acquiring module 501 acquires state management information of the data center at a plurality of time points; the security determining module 502 determines the N managed objects of the data center according to the state management information. Status information, the status information is used to indicate the security status of the managed object; the association determination module 503 determines an association relationship between the N managed objects; the information recording module 504 records a plurality of time points, N corresponding to each time point The relationship between the state information of the managed object and the N managed objects.
  • the device provided in this embodiment classifies and saves the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and accurately locate the fault according to the security state of each managed object at those time points, without manually searching massive state management information and without manually analyzing it. Therefore, the device provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
  • when recording the plurality of time points, the information recording module 504 also records the association relationship between the N managed objects, which provides a further reference for the user to locate the fault and enables the user to perform more accurate fault location.
  • the instruction receiving module 505 can receive the fault finding instruction sent by the client; the fault finding module 506 searches the recorded plurality of time points and the state information of the N managed objects corresponding to each time point for the state information of the N managed objects corresponding to the fault occurrence time; and the fault feedback module 507 feeds that state information back to the client, so that the user can obtain the search result of the fault information processing device through the client.
  • every 15 minutes, the information acquiring module 501 acquires the alarm information of the data center from the network management system of the data center, the log information of the data center from the log system of the data center, the configuration change information of the data center from the configuration change system of the data center, and the work order information of the data center from the work order system of the data center.
  • the data center includes three managed objects, namely, network device A, storage device B, and computing device C.
  • the security determining module 502 divides the acquired alarm information and log information of the data center, according to the IP addresses of devices A, B, and C, into the alarm information and log information of device A, of device B, and of device C; it likewise divides the configuration change information and work order information of the data center, according to the asset codes of devices A, B, and C, into the configuration change information and work order information of each device.
  • the association determination module 503 determines the association relationship between the devices A, B, and C.
  • the device A has information interaction with the device, and the device B and the device C have information interaction.
  • the information recording module 504 records the time points, the alarm information of the devices A, B, and C corresponding to the time points, the log information, the configuration change information, the work order information, and the association relationship between the devices A, B, and C.
  • the user uses the client to search the fault information processing device for information corresponding to the fault time. The instruction receiving module 505 receives the fault finding instruction sent by the user's client, where the fault finding instruction includes a fault occurrence time of 10:22 am; the fault finding module 506 finds, in the recorded information, the alarm information, log information, configuration change information, and work order information of devices A, B, and C, and the association relationship between devices A, B, and C, at 10:00 am, 10:15 am, and 10:30 am; and the fault feedback module 507 feeds the search result back to the client.
  • the search result shows that, at 10:15 am, the alarm information of device A indicates that device A is powered off.
  • the user locates the failed managed object as device A according to the alarm information.
  • the fault information processing apparatus in the embodiment of the present invention is described above from the perspective of the unitized functional entity.
  • the fault information processing apparatus in the embodiment of the present invention is described below from the perspective of hardware processing. Referring to FIG. 6, another embodiment of the fault information processing apparatus 600 in the embodiment of the present invention includes:
  • the input device 601, the output device 602, the processor 603, and the memory 604 (wherein the number of processors 603 in the fault information processing device 600 may be one or more, and one processor 603 is taken as an example in FIG. 6).
  • the input device 601, the output device 602, the processor 603, and the memory 604 may be connected by a bus or other means, wherein the bus connection is taken as an example in FIG.
  • the processor 603 is configured to perform the following steps by calling an operation instruction stored in the memory 604:
  • the plurality of time points and the state information of the N managed objects corresponding to each time point are recorded.
  • the processor 603 further performs the following steps:
  • the state information of the N managed objects corresponding to the plurality of time points and each time point, and the relationship between the N managed objects corresponding to each time point are recorded.
  • the state management information of the data center includes:
  • System configuration information and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
  • the processor 603 further performs the following steps:
  • the state management information is divided into state information of N managed objects according to attributes of the N managed objects in the data center, and the attributes of the managed object include: a device name of the managed object, and/or an IP of the managed object The address, and/or the device code of the managed object, and/or the username of the managed object.
  • the processor 603 further performs the following steps:
  • the relationship between the state information of the N managed objects corresponding to the failure occurrence time and the N managed objects is fed back to the client.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

Abstract

Disclosed in an embodiment of the present invention is a fault information processing method, for optimizing fault positioning in a data center; the method in the embodiment of the present invention comprises: acquiring state management information of a data center at multiple time points (101); determining state information of N managed objects of the data center according to the state management information, the state information being used to represent security states of the managed objects (102); and recording the multiple time points and the state information of the N managed objects corresponding to each of the time points (103). Also provided in an embodiment of the present invention is a related fault information processing device.

Description

Fault information processing method and related device
The present application claims priority to Chinese Patent Application No. 201410784311.1, entitled "A Fault Information Processing Method and Related Device", filed with the Chinese Patent Office on December 16, 2014, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of information, and in particular, to a fault information processing method and a related device.
Background
A data center is a complex set of facilities that includes not only computer systems and their supporting equipment, but also data communication connections, environmental control equipment, monitoring equipment, and various security devices. As data center technologies mature, more and more enterprises are building their own data centers and migrating their services to data center platforms.
A real data center has a complex IT system environment. When the data center fails, the fault must be located manually from the data center's massive state management information. This state management information indicates the running state of the data center and includes the data center's system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
However, when a data center fails, the primary task is to restore service. Once service is restored, the state management information of the data center has changed compared with the time of the fault, so a large amount of manual time is needed to search the historical state management information and then analyze where the fault occurred. Even then, much of the state management information from the time of the fault can no longer be queried, so accurate fault location cannot be achieved. The prior-art fault information processing method is therefore time consuming, complicated to operate, and of low reliability.
Summary of the invention
An embodiment of the present invention provides a fault information processing method for optimizing fault location.
A first aspect of the embodiments of the present invention provides a fault information processing method applicable to a data center, where the data center includes managed objects, and the method includes:
acquiring state management information of the data center at a plurality of time points, where the state management information is used to describe the running state of the data center;
determining, according to the state management information, state information of N managed objects of the data center, where the state information is used to indicate the working state of the managed objects; and
recording the plurality of time points and the state information of the N managed objects corresponding to each of the time points.
With reference to the first aspect of the embodiments of the present invention, in a first implementation of the first aspect, before the recording of the plurality of time points and the state information of the N managed objects corresponding to each of the time points, the method further includes:
determining an association relationship between the N managed objects;
and the recording of the plurality of time points and the state information of the N managed objects corresponding to each of the time points includes:
recording the plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points.
With reference to the first implementation of the first aspect of the embodiments of the present invention, in a second implementation of the first aspect, the state management information of the data center includes:
system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
With reference to the first or second implementation of the first aspect of the embodiments of the present invention, in a third implementation of the first aspect, the determining, according to the state management information, of the state information of the N managed objects of the data center includes:
dividing the state management information into the state information of the N managed objects according to attributes of the N managed objects of the data center, where the attributes of a managed object include: the device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the user name of the managed object.
With reference to the first or second implementation of the first aspect of the embodiments of the present invention, in a fourth implementation of the first aspect, the method further includes:
receiving a fault finding instruction sent by a client, where the fault finding instruction includes a fault occurrence time;
searching the recorded plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points for the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time; and
feeding back, to the client, the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time.
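As an illustration only, the lookup in this fourth implementation could be sketched in Python as follows. The sketch assumes the records are kept in an in-memory dictionary keyed by time point, and that "corresponding to the fault occurrence time" means the nearest recorded time points at or before and at or after that time. The names `snapshots`, `at`, and `find_around`, the sample state values, and the calendar date are hypothetical and do not come from the application.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def at(hour, minute):
    """Build a time point on an arbitrary, made-up calendar date."""
    return datetime(2015, 1, 1, hour, minute)


# Hypothetical record store: one entry per recorded time point, holding the
# per-object state information and the association relationship.
snapshots = {
    at(10, 0): {"state": {"A": ["ok"], "B": ["ok"], "C": ["ok"]},
                "associations": [("B", "C")]},
    at(10, 15): {"state": {"A": ["powered off"], "B": ["ok"], "C": ["ok"]},
                 "associations": [("B", "C")]},
    at(10, 30): {"state": {"A": ["powered off"], "B": ["ok"], "C": ["ok"]},
                 "associations": [("B", "C")]},
}


def find_around(records, fault_time):
    """Return the recorded entries nearest to fault_time on either side."""
    times = sorted(records)
    hits = []
    i = bisect_right(times, fault_time)   # first time point after fault_time
    if i > 0:
        hits.append(times[i - 1])         # latest time point <= fault_time
    j = bisect_left(times, fault_time)    # first time point >= fault_time
    if j < len(times) and times[j] not in hits:
        hits.append(times[j])
    return {t: records[t] for t in hits}


# A fault reported at 10:22 yields the 10:15 and 10:30 entries.
hits = find_around(snapshots, at(10, 22))
```

Returning the time points on both sides of the fault occurrence time is one possible reading; an implementation could equally return a wider window of recorded time points.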
A second aspect of the embodiments of the present invention provides a fault information processing device applicable to a data center, where the data center includes managed objects, and the device includes:
an information acquiring module, configured to acquire state management information of the data center at a plurality of time points, where the state management information is used to describe the running state of the data center;
a security determining module, configured to determine, according to the state management information, state information of N managed objects of the data center, where the state information is used to indicate the working state of the managed objects; and
an information recording module, configured to record the plurality of time points and the state information of the N managed objects corresponding to each of the time points.
With reference to the second aspect of the embodiments of the present invention, a first implementation of the second aspect further includes:
an association determining module, configured to determine an association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each of the time points;
and the information recording module is specifically configured to:
record the plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points.
With reference to the first implementation of the second aspect of the embodiments of the present invention, in a second implementation of the second aspect, the state management information of the data center includes:
system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
With reference to the first or second implementation of the second aspect of the embodiments of the present invention, in a third implementation of the second aspect, the security determining module is specifically configured to:
divide the state management information into the state information of the N managed objects according to attributes of the N managed objects of the data center, where the attributes of a managed object include: the device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the user name of the managed object.
With reference to the first or second implementation of the second aspect of the embodiments of the present invention, a fourth implementation of the second aspect further includes:
an instruction receiving module, configured to receive a fault finding instruction sent by a client, where the fault finding instruction includes a fault occurrence time;
a fault finding module, configured to search the recorded plurality of time points and the state information of the N managed objects corresponding to each of the time points for the state information of the N managed objects corresponding to the fault occurrence time; and
a fault feedback module, configured to feed back, to the client, the state information of the N managed objects corresponding to the fault occurrence time.
In the method provided by the embodiments of the present invention, state management information of the data center is acquired at a plurality of time points; state information of N managed objects of the data center is determined according to the state management information, where the state information is used to indicate the security state of the managed objects; and the plurality of time points and the state information of the N managed objects corresponding to each time point are recorded. The method classifies and saves the state management information of the data center by managed object, so that when locating a fault, the user can go directly to the fault occurrence time in the saved information and accurately locate the fault according to the security state of each managed object at that time, without manually searching massive state management information and without manually analyzing it. Therefore, the method provided by the embodiments of the present invention can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
Brief description of the drawings
FIG. 1 is a flowchart of an embodiment of the fault information processing method in the embodiments of the present invention;
FIG. 2 is a flowchart of another embodiment of the fault information processing method in the embodiments of the present invention;
FIG. 3 is a flowchart of an embodiment of the fault information processing device in the embodiments of the present invention;
FIG. 4 is a flowchart of another embodiment of the fault information processing device in the embodiments of the present invention;
FIG. 5 is a flowchart of another embodiment of the fault information processing device in the embodiments of the present invention;
FIG. 6 is a flowchart of another embodiment of the fault information processing device in the embodiments of the present invention.
Detailed description
An embodiment of the present invention provides a fault information processing method for reducing the duration of fault location, simplifying the operation of fault location, and improving the reliability of fault location. An embodiment of the present invention further provides a related fault information processing device. The two are described separately below.
For the basic flow of the fault information processing method provided by the embodiment of the present invention, refer to FIG. 1. The flow mainly includes:
101. Acquire state management information of the data center at a plurality of time points.
The fault information processing device acquires state management information of the data center at a plurality of time points, where the state management information is used to describe the running state of the data center.
The plurality of time points may be set manually or set by default by the fault information processing device; for example, the fault information processing device sets a time point every 15 min by default. The plurality of time points may also be determined in other ways, which is not limited here.
There are many ways for the fault information processing device to acquire the state management information of the data center. They are detailed in later embodiments and are not limited here.
102. Determine state information of N managed objects of the data center according to the state management information.
The data center includes at least one managed object, and the data center manages these managed objects. A managed object may be a physical object such as a physical device, or a software object such as an operating system, a database, or middleware, which is not limited in this embodiment.
The fault information processing device determines the state information of the N managed objects of the data center according to the state management information, where the state information is used to indicate the working state of the managed objects.
103. Record the plurality of time points and the state information of the N managed objects corresponding to each time point.
The fault information processing device records the plurality of time points and the state information of the N managed objects corresponding to each time point, so that when locating a fault, the user can look up the security state of each managed object at the fault occurrence time in the saved time points and corresponding state information, and thereby accurately determine which managed unit has failed.
This embodiment provides a fault information processing method in which the fault information processing device acquires state management information of the data center at a plurality of time points; determines state information of N managed objects of the data center according to the state management information, where the state information is used to indicate the security state of the managed objects; and records the plurality of time points and the state information of the N managed objects corresponding to each time point. The method classifies and saves the state management information of the data center by managed object, so that when locating a fault, the user can directly look up the information saved at the time points before and after the fault occurred and accurately locate the fault according to the security state of each managed object at those time points, without manually searching massive state management information and without manually analyzing it. Therefore, the method provided in this embodiment can reduce the duration of fault location, simplify the operation of fault location, and improve the reliability of fault location.
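Steps 101 to 103 can be pictured with the following minimal Python sketch. It is not the implementation described in the application: it assumes the acquisition and classification logic are supplied as callables and that records are kept in memory. `FaultInfoRecorder`, `tick`, `by_object`, `fake_acquire`, and the sample records are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Record = Dict[str, str]            # one raw item of state management information
StateInfo = Dict[str, List[str]]   # managed object name -> its state entries


@dataclass
class FaultInfoRecorder:
    acquire: Callable[[], List[Record]]            # step 101
    classify: Callable[[List[Record]], StateInfo]  # step 102
    records: Dict[datetime, StateInfo] = field(default_factory=dict)

    def tick(self, now: datetime) -> None:
        """Run one acquisition cycle for the time point `now`."""
        raw = self.acquire()                    # step 101: acquire
        self.records[now] = self.classify(raw)  # steps 102 and 103


def by_object(raw: List[Record]) -> StateInfo:
    """Group raw records by the managed object they belong to."""
    grouped: StateInfo = {}
    for rec in raw:
        grouped.setdefault(rec["object"], []).append(rec["message"])
    return grouped


def fake_acquire() -> List[Record]:
    """Stand-in for querying the data center's management systems."""
    return [{"object": "A", "message": "link up"},
            {"object": "B", "message": "disk ok"}]


recorder = FaultInfoRecorder(acquire=fake_acquire, classify=by_object)
start = datetime(2015, 1, 1, 10, 0)   # arbitrary start time
for k in range(3):                    # one time point every 15 minutes
    recorder.tick(start + k * timedelta(minutes=15))
```

In a running system, `tick` would be driven by a scheduler at the configured interval rather than by a loop over precomputed times.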
图1所示的实施例给出了本发明实施例提供的故障信息处理方法的基本流程,下面将提供一种更为细化的实施例,用于提供更为精准的故障定位,请参阅图2,其基本流程包括:The embodiment shown in FIG. 1 provides a basic flow of a fault information processing method provided by an embodiment of the present invention. A more detailed embodiment is provided below to provide a more accurate fault location. 2. The basic process includes:
201、在多个时刻点,获取数据中心的状态管理信息; 201. Acquire state management information of the data center at multiple time points;
故障信息处理装置在多个时刻点,获取数据中心的状态管理信息,该状态管理信息用于描述数据中心的运行状态。The fault information processing device acquires state management information of the data center at a plurality of time points, and the state management information is used to describe an operating state of the data center.
其中,该多个时刻点可以为人为设定,也可以为故障信息处理装置默认设定,如故障信息处理装置默认每隔15min设置一个时刻点。该多个时刻点也可以通过其他方式确定,此处不做限定。The plurality of time points may be set manually or may be set by default for the fault information processing device. For example, the fault information processing device sets a time point every 15 minutes by default. The plurality of time points may also be determined by other means, and is not limited herein.
故障信息处理装置获取数据中心的状态管理信息的方法有很多,例如,数据中心可以包括配置库(CMDB,Configuration Management Database)、网管系统、日志系统、投诉保障系统、配置变更系统、工单系统中的一个或几个系统,故障信息处理装置可以从这些系统中主动获取数据中心的状态管理信息,或被动的接收这些系统发送的数据中心的状态管理信息。故障信息处理装置也可以通过其他方式获取数据中心的状态管理信息,此处不做限定。There are many methods for the fault information processing device to obtain state management information of the data center. For example, the data center may include a configuration library (CMDB, Configuration Management Database), a network management system, a log system, a complaint guarantee system, a configuration change system, and a work order system. One or several systems, the fault information processing device can actively acquire state management information of the data center from these systems, or passively receive state management information of the data center sent by these systems. The fault information processing device can also obtain the state management information of the data center by other means, which is not limited herein.
可选的,与数据中心的系统相对应的,数据中心的状态管理信息可以包括系统配置信息、和/或告警信息、和/或性能监控信息、和/或日志信息、和/或投诉保障信息、和/或配置变更信息、和/或工单信息,也可以包括其他信息,此处不做限定。Optionally, the data center state management information may include system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint support information, corresponding to the data center system. And/or configuration change information, and/or work order information, and may also include other information, which is not limited herein.
202. Determine, according to the state management information, state information of N managed objects of the data center.
The data center includes at least one managed object and manages these managed objects. A managed object may be a physical entity such as a physical device, or a software object such as an operating system; this is not limited in this embodiment.
The fault information processing apparatus determines the state information of the N managed objects of the data center according to the state management information. The state information indicates the working state of a managed object.
Optionally, the fault information processing apparatus may divide the state management information obtained in step 201 into the state information of the N managed objects according to attributes of the N managed objects of the data center. The attributes of a managed object may include one or more of its device name, IP address, device code, and user name, and may also be other attributes. For example, the fault information processing apparatus may, according to the IP addresses of the managed objects, divide the alarm information, and/or performance monitoring information, and/or log information of the data center into alarm information, and/or performance monitoring information, and/or log information of each managed object; or, according to the asset codes of the managed objects, divide the configuration change information and/or work order information of the data center into configuration change information and/or work order information of each managed object; or, according to the device names of the managed objects, divide the system configuration information and/or complaint assurance information of the data center into configuration information and/or complaint assurance information of each managed object. The state management information obtained in step 201 may also be divided into the state information of the N managed objects by other methods, which are not limited here.
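The attribute-based division described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the record layout (`kind`, `key`, `payload`), the object names, and the `KEY_ATTRIBUTE` table are all assumptions introduced here, chosen to mirror the example of using IP addresses for alarms and logs and asset codes for configuration changes and work orders.

```python
from collections import defaultdict

# Hypothetical managed objects; the attribute names are illustrative only.
MANAGED_OBJECTS = [
    {"name": "device-A", "ip": "10.0.0.1", "asset_code": "AST-001"},
    {"name": "device-B", "ip": "10.0.0.2", "asset_code": "AST-002"},
]

# Assumed mapping: which attribute identifies the owning object for each
# kind of record (IP address for alarms/logs, asset code for changes/orders).
KEY_ATTRIBUTE = {
    "alarm": "ip",
    "log": "ip",
    "config_change": "asset_code",
    "work_order": "asset_code",
}


def partition_by_object(records, managed_objects=MANAGED_OBJECTS):
    """Group data-center-wide records into per-object state information."""
    # One lookup table per attribute: attribute value -> object name.
    lookup = {
        attr: {obj[attr]: obj["name"] for obj in managed_objects}
        for attr in set(KEY_ATTRIBUTE.values())
    }
    state = defaultdict(lambda: defaultdict(list))
    for record in records:
        attr = KEY_ATTRIBUTE[record["kind"]]
        owner = lookup[attr].get(record["key"])
        if owner is not None:  # records for unknown objects are skipped
            state[owner][record["kind"]].append(record["payload"])
    return {name: dict(kinds) for name, kinds in state.items()}


records = [
    {"kind": "alarm", "key": "10.0.0.1", "payload": "power loss"},
    {"kind": "log", "key": "10.0.0.2", "payload": "disk check ok"},
    {"kind": "work_order", "key": "AST-001", "payload": "replace PSU"},
]
state_info = partition_by_object(records)
```

Keeping the attribute choice in a table rather than in the code path makes it easy to switch, for instance, to device names for system configuration information.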
Optionally, after dividing the state management information obtained in step 201 into the state information of the N managed objects according to the attributes of the N managed objects of the data center, the fault information processing apparatus may further process the state information in order to reduce the amount of data to be recorded, for example by deleting invalid or duplicate data (such as info-level entries in logs). This is not limited here.
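One possible form of this reduction step is sketched below. The `(level, message)` entry layout and the rule that info-level entries are dropped are assumptions made for illustration; the text only says that invalid or duplicate data may be deleted.

```python
def reduce_state_info(entries):
    """Drop info-level log entries and exact duplicates, keeping order."""
    seen = set()
    kept = []
    for level, message in entries:
        if level.lower() == "info":   # treated as discardable in this sketch
            continue
        if (level, message) in seen:  # an identical entry was already kept
            continue
        seen.add((level, message))
        kept.append((level, message))
    return kept


log_entries = [
    ("INFO", "heartbeat ok"),
    ("ERROR", "link down on port 3"),
    ("ERROR", "link down on port 3"),
    ("WARN", "fan speed high"),
]
reduced = reduce_state_info(log_entries)
```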
203. Determine an association relationship between the N managed objects.
At each of the plurality of time points described in step 201, the fault information processing apparatus determines the association relationship between the N managed objects. The association relationship links those of the N managed objects that exchange information with one another.
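An association relationship of this kind can be modelled as a set of unordered pairs of managed objects. The sketch below assumes that interactions are already available as `(source, target)` tuples; how they are observed is not specified in the text, and the object names are hypothetical.

```python
def build_associations(interactions):
    """Return the set of unordered pairs of objects that exchange information."""
    pairs = set()
    for source, target in interactions:
        if source != target:
            pairs.add(frozenset((source, target)))
    return pairs


def neighbours(pairs, obj):
    """Objects that have information interaction with `obj`, sorted by name."""
    return sorted(other for pair in pairs if obj in pair for other in pair - {obj})


# Hypothetical observed interactions; both directions collapse into one pair.
observed = [("obj-2", "obj-3"), ("obj-3", "obj-2"), ("obj-1", "obj-3")]
associations = build_associations(observed)
```

Using `frozenset` pairs makes the relationship symmetric, which fits the description of objects "that exchange information with one another".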
204. Record the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point.
The fault information processing apparatus records the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point. When locating a fault, a user can then look up, from the saved time points and their corresponding state information, the security state of each managed object at the time the fault occurred, and thereby accurately determine which managed object has failed. In particular, a data center fault is sometimes caused not by a failure of a managed object itself but by a failure of the information-exchange channel between two or more managed objects. Therefore, when locating a fault, the user may also analyze the association relationship between the N managed objects corresponding to the fault time, to determine whether the fault lies in a managed object itself or in the information-exchange channel between managed objects.
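The record kept in step 204 can be pictured as one snapshot per time point. The following sketch uses an in-memory list; the class and field names are hypothetical, and a real apparatus would presumably persist the data in some form of storage.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Snapshot:
    time_point: datetime
    state_info: dict          # managed object name -> its state information
    associations: frozenset   # pairs of objects that exchange information


@dataclass
class SnapshotStore:
    snapshots: list = field(default_factory=list)

    def record(self, time_point, state_info, associations):
        """Store one snapshot and keep the list ordered by time point."""
        snap = Snapshot(time_point, dict(state_info), frozenset(associations))
        self.snapshots.append(snap)
        self.snapshots.sort(key=lambda s: s.time_point)
        return snap


store = SnapshotStore()
store.record(
    datetime(2015, 1, 1, 10, 15),
    {"obj-1": {"alarm": ["power loss"]}},
    {("obj-1", "obj-2")},
)
store.record(
    datetime(2015, 1, 1, 10, 0),
    {"obj-1": {"alarm": []}},
    {("obj-1", "obj-2")},
)
```

Storing the association relationship alongside the state information at every time point is what later allows a fault in a channel to be told apart from a fault in an object.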
This embodiment provides a fault information processing method, in which the fault information processing apparatus obtains the state management information of the data center at a plurality of time points; determines, according to the state management information, the state information of the N managed objects of the data center, where the state information indicates the security state of a managed object; determines the association relationship between the N managed objects; and records the plurality of time points together with the state information of the N managed objects and the association relationship between the N managed objects corresponding to each time point. The method provided in this embodiment classifies and stores the state management information of the data center by managed object. When locating a fault, the user can therefore directly look up the information saved around the time of the fault and locate the fault accurately from the security state of each managed object at those times, without manually searching through massive amounts of state management information or manually analyzing it. The method provided in this embodiment can thus shorten fault location, simplify the fault location procedure, and improve the reliability of fault location. This embodiment also records the association relationship between the N managed objects at the plurality of time points, which gives the user a further reference for fault location and enables more precise fault location.
When locating a fault, the user may use a client to look up the information corresponding to the fault time in the fault information processing apparatus. Optionally, therefore, as yet another embodiment of the present invention, after step 204 the fault information processing apparatus may also receive a fault lookup instruction sent by the client, where the fault lookup instruction includes the fault occurrence time. From the recorded time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point, the fault information processing apparatus finds the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time, and feeds them back to the client, so that the user can obtain the lookup result of the fault information processing apparatus through the client. The state information of the N managed objects corresponding to the fault occurrence time may be the state information of the N managed objects saved by the fault information processing apparatus within a preset time period around the fault occurrence time (for example, from 30 minutes before the fault occurrence time to 20 minutes after it).
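The lookup over a preset time period can be sketched as a simple window filter. The record layout and function name below are hypothetical, the default window uses the 30-minute and 20-minute figures given as an example in the text, and treating both window bounds as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def find_fault_records(records, fault_time,
                       before=timedelta(minutes=30),
                       after=timedelta(minutes=20)):
    """Return recorded entries whose time point lies in the window
    [fault_time - before, fault_time + after], oldest first."""
    start, end = fault_time - before, fault_time + after
    hits = [r for r in records if start <= r["time_point"] <= end]
    return sorted(hits, key=lambda r: r["time_point"])


records = [
    {"time_point": datetime(2015, 1, 1, 9, 0), "state_info": {}, "associations": []},
    {"time_point": datetime(2015, 1, 1, 9, 45), "state_info": {}, "associations": []},
    {"time_point": datetime(2015, 1, 1, 10, 30), "state_info": {}, "associations": []},
]
# Fault at 10:00 gives the window 09:30 to 10:20.
result = find_fault_records(records, datetime(2015, 1, 1, 10, 0))
```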
To make the above embodiment easier to understand, a specific application scenario of the embodiment is described below as an example.
Every 15 minutes, the fault information processing apparatus obtains the alarm information of the data center from the network management system of the data center, the log information of the data center from the log system, the configuration change information of the data center from the configuration change system, and the work order information of the data center from the work order system.
The data center includes three managed objects: network device A, storage device B, and computing device C. The fault information processing apparatus divides the obtained alarm information and log information of the data center according to the IP addresses of devices A, B, and C into the alarm information and log information of device A, of device B, and of device C. It divides the obtained configuration change information and work order information of the data center according to the asset codes of devices A, B, and C into the configuration change information and work order information of device A, of device B, and of device C.
The fault information processing apparatus determines the association relationship between devices A, B, and C, in which device A exchanges information with another of the devices, and device B exchanges information with device C.
The fault information processing apparatus records these time points together with the alarm information, log information, configuration change information, and work order information of devices A, B, and C and the association relationship between devices A, B, and C corresponding to these time points.
The user uses the client to look up the information corresponding to the fault time in the fault information processing apparatus. The fault information processing apparatus receives the fault lookup instruction sent by the user's client, which gives the fault occurrence time as 10:22 am. From the recorded information, the fault information processing apparatus finds the alarm information, log information, configuration change information, and work order information of devices A, B, and C and the association relationship between devices A, B, and C at 10:00 am, 10:15 am, and 10:30 am, and feeds the lookup result back to the client. The lookup result shows that at 10:15 am the alarm information of device A indicates that device A lost power. Based on this alarm information, the user identifies device A as the failed managed object.
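The time points in this scenario can be checked with a short calculation. The sketch assumes that collection runs on the quarter hour and that the lookup uses the window of 30 minutes before to 20 minutes after the fault mentioned earlier as an example; the date, the alarm contents at 10:00 and 10:30, and all variable names are hypothetical.

```python
from datetime import datetime, timedelta

day = datetime(2015, 1, 1)               # arbitrary date, for illustration only
poll_interval = timedelta(minutes=15)
# Collection time points from 09:00 to 11:00 inclusive.
time_points = [day.replace(hour=9) + i * poll_interval for i in range(9)]

fault_time = day.replace(hour=10, minute=22)
window_start = fault_time - timedelta(minutes=30)   # 09:52
window_end = fault_time + timedelta(minutes=20)     # 10:42

selected = [t for t in time_points if window_start <= t <= window_end]

# Hypothetical alarm records for device A at the selected time points.
alarms_a = {"10:00": [], "10:15": ["power loss"], "10:30": ["power loss"]}
first_alarm = next(
    t.strftime("%H:%M") for t in selected if alarms_a[t.strftime("%H:%M")]
)
```

Under these assumptions the window 09:52 to 10:42 selects exactly the 10:00, 10:15, and 10:30 time points named in the scenario, and the earliest one carrying an alarm for device A is 10:15.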
The above embodiments provide a fault information processing method. The following embodiment provides a fault information processing apparatus for implementing the above method. Referring to FIG. 3, its basic structure includes:
an information obtaining module 301, configured to obtain state management information of the data center at a plurality of time points, where the state management information describes the running state of the data center;
a security determining module 302, configured to determine, according to the state management information, state information of N managed objects of the data center, where the state information indicates the working state of a managed object;
an information recording module 303, configured to record the plurality of time points and the state information of the N managed objects corresponding to each time point.
This embodiment provides a fault information processing apparatus, in which the information obtaining module 301 obtains the state management information of the data center at a plurality of time points; the security determining module 302 determines, according to the state management information, the state information of the N managed objects of the data center, where the state information indicates the security state of a managed object; and the information recording module 303 records the plurality of time points and the state information of the N managed objects corresponding to each time point. The apparatus provided in this embodiment classifies and stores the state management information of the data center by managed object. When locating a fault, the user can therefore directly look up the information saved around the time of the fault and locate the fault accurately from the security state of each managed object at those times, without manually searching through massive amounts of state management information or manually analyzing it. The apparatus provided in this embodiment can thus shorten fault location, simplify the fault location procedure, and improve the reliability of fault location.
The embodiment shown in FIG. 3 gives the basic structure of the fault information processing apparatus provided by the embodiments of the present invention. A more detailed embodiment, which supports more precise fault location, is provided below. Referring to FIG. 4, its basic structure includes:
an information obtaining module 401, configured to obtain state management information of the data center at a plurality of time points, where the state management information describes the running state of the data center;
a security determining module 402, configured to determine, according to the state management information, state information of N managed objects of the data center, where the state information indicates the working state of a managed object;
an association determining module 403, configured to determine the association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each time point;
an information recording module 404, configured to record the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point.
Optionally, the state management information of the data center may include: system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
Optionally, the security determining module may be specifically configured to divide the state management information into the state information of the N managed objects according to attributes of the N managed objects of the data center, where the attributes of a managed object include: the device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the user name of the managed object.
This embodiment provides a fault information processing apparatus, in which the information obtaining module 401 obtains the state management information of the data center at a plurality of time points; the security determining module 402 determines, according to the state management information, the state information of the N managed objects of the data center, where the state information indicates the security state of a managed object; the association determining module 403 determines the association relationship between the N managed objects; and the information recording module 404 records the plurality of time points together with the state information of the N managed objects and the association relationship between the N managed objects corresponding to each time point. The apparatus provided in this embodiment classifies and stores the state management information of the data center by managed object. When locating a fault, the user can therefore directly look up the information saved around the time of the fault and locate the fault accurately from the security state of each managed object at those times, without manually searching through massive amounts of state management information or manually analyzing it. The apparatus provided in this embodiment can thus shorten fault location, simplify the fault location procedure, and improve the reliability of fault location. In this embodiment the information recording module 404 also records the association relationship between the N managed objects at the plurality of time points, which gives the user a further reference for fault location and enables more precise fault location.
The embodiment shown in FIG. 4 gives the basic structure of a more detailed fault information processing apparatus provided by the embodiments of the present invention. A still more detailed fault information processing apparatus, which can exchange information with a client, is provided below. Referring to FIG. 5, its basic structure includes:
an information obtaining module 501, configured to obtain state management information of the data center at a plurality of time points, where the state management information describes the running state of the data center;
a security determining module 502, configured to determine, according to the state management information, state information of N managed objects of the data center, where the state information indicates the working state of a managed object;
an association determining module 503, configured to determine the association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each time point;
an information recording module 504, configured to record the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point;
an instruction receiving module 505, configured to receive a fault lookup instruction sent by a client, where the fault lookup instruction includes the fault occurrence time;
a fault lookup module 506, configured to find, from the recorded time points and the state information of the N managed objects corresponding to each time point, the state information of the N managed objects corresponding to the fault occurrence time;
a fault feedback module 507, configured to feed back the state information of the N managed objects corresponding to the fault occurrence time to the client.
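The way modules 501 to 507 cooperate can be outlined in code. The sketch below is only one possible arrangement: the class name, the injected callables, and the integer time points are hypothetical, and the lookup rule (return the latest record at or before the fault time) is a simplification chosen here, not the rule stated in the text.

```python
class FaultInfoProcessor:
    """Illustrative composition of modules 501-507; all names are hypothetical."""

    def __init__(self, fetch, partition, associate):
        self.fetch = fetch            # 501: obtain state management information
        self.partition = partition    # 502: derive per-object state information
        self.associate = associate    # 503: determine association relationships
        self.records = {}             # 504: time point -> (state info, associations)

    def collect(self, time_point):
        """Run modules 501-504 once for a given time point."""
        raw = self.fetch(time_point)
        self.records[time_point] = (self.partition(raw), self.associate(raw))

    def handle_query(self, fault_time):
        """505 receives the instruction, 506 looks up, 507 returns the result.

        Simplification: returns the state information of the latest record
        at or before the fault time, or None if there is no such record.
        """
        candidates = [t for t in self.records if t <= fault_time]
        if not candidates:
            return None
        state_info, _associations = self.records[max(candidates)]
        return state_info


processor = FaultInfoProcessor(
    fetch=lambda t: [("obj-1", f"alarm@{t}")],
    partition=lambda raw: {name: [msg] for name, msg in raw},
    associate=lambda raw: set(),
)
for t in (0, 15, 30):   # minutes, standing in for real time points
    processor.collect(t)
answer = processor.handle_query(22)
```

Passing the fetch, partition, and associate steps in as callables reflects the remark later in the description that the division into modules is only a logical one.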
This embodiment provides a fault information processing apparatus, in which the information obtaining module 501 obtains the state management information of the data center at a plurality of time points; the security determining module 502 determines, according to the state management information, the state information of the N managed objects of the data center, where the state information indicates the security state of a managed object; the association determining module 503 determines the association relationship between the N managed objects; and the information recording module 504 records the plurality of time points together with the state information of the N managed objects and the association relationship between the N managed objects corresponding to each time point. The apparatus provided in this embodiment classifies and stores the state management information of the data center by managed object. When locating a fault, the user can therefore directly look up the information saved around the time of the fault and locate the fault accurately from the security state of each managed object at those times, without manually searching through massive amounts of state management information or manually analyzing it. The apparatus provided in this embodiment can thus shorten fault location, simplify the fault location procedure, and improve the reliability of fault location. In this embodiment the information recording module 504 also records the association relationship between the N managed objects at the plurality of time points, which gives the user a further reference for fault location and enables more precise fault location. In addition, the instruction receiving module 505 can receive a fault lookup instruction sent by the client; the fault lookup module 506 finds, from the recorded time points and the state information of the N managed objects corresponding to each time point, the state information of the N managed objects corresponding to the fault occurrence time; and the fault feedback module 507 feeds that state information back to the client, so that the user can obtain the lookup result of the fault information processing apparatus through the client.
To make the above embodiment easier to understand, a specific application scenario of the embodiment is described below as an example.
Every 15 minutes, the information obtaining module 501 obtains the alarm information of the data center from the network management system of the data center, the log information of the data center from the log system, the configuration change information of the data center from the configuration change system, and the work order information of the data center from the work order system.
The data center includes three managed objects: network device A, storage device B, and computing device C. The security determining module 502 divides the obtained alarm information and log information of the data center according to the IP addresses of devices A, B, and C into the alarm information and log information of device A, of device B, and of device C. It divides the obtained configuration change information and work order information of the data center according to the asset codes of devices A, B, and C into the configuration change information and work order information of device A, of device B, and of device C.
The association determining module 503 determines the association relationship between devices A, B, and C, in which device A exchanges information with another of the devices, and device B exchanges information with device C.
The information recording module 504 records these time points together with the alarm information, log information, configuration change information, and work order information of devices A, B, and C and the association relationship between devices A, B, and C corresponding to these time points.
The user uses the client to look up the information corresponding to the fault time in the fault information processing apparatus. The instruction receiving module 505 receives the fault lookup instruction sent by the user's client, which gives the fault occurrence time as 10:22 am. From the recorded information, the fault lookup module 506 finds the alarm information, log information, configuration change information, and work order information of devices A, B, and C and the association relationship between devices A, B, and C at 10:00 am, 10:15 am, and 10:30 am, and the fault feedback module 507 feeds the lookup result back to the client. The lookup result shows that at 10:15 am the alarm information of device A indicates that device A lost power. Based on this alarm information, the user identifies device A as the failed managed object.
The fault information processing apparatus in the embodiments of the present invention has been described above from the perspective of modular functional entities; it is described below from the perspective of hardware processing. Referring to FIG. 6, another embodiment of the fault information processing apparatus 600 in the embodiments of the present invention includes:
an input device 601, an output device 602, a processor 603, and a memory 604 (the fault information processing apparatus 600 may include one or more processors 603; one processor 603 is taken as an example in FIG. 6). In some embodiments of the present invention, the input device 601, the output device 602, the processor 603, and the memory 604 may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 6.
By invoking operation instructions stored in the memory 604, the processor 603 is configured to perform the following steps:
obtaining state management information of the data center at a plurality of time points, where the state management information describes the running state of the data center;
determining, according to the state management information, state information of N managed objects of the data center, where the state information indicates the working state of a managed object;
recording the plurality of time points and the state information of the N managed objects corresponding to each time point.
In some embodiments of the present invention, the processor 603 further performs the following steps:
determining the association relationship between the N managed objects before recording the plurality of time points and the state information of the N managed objects corresponding to each time point;
recording the plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point.
In some embodiments of the present invention, the state management information of the data center includes:
system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
In some embodiments of the present invention, the processor 603 further performs the following steps:
dividing the state management information into the state information of the N managed objects according to the attributes of the N managed objects of the data center, where the attributes of a managed object include: the device name of the managed object, and/or the IP address of the managed object, and/or the device code of the managed object, and/or the user name of the managed object.
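One way to picture this division is the sketch below. It assumes that each item of state management information is a dictionary that may carry any of the four attributes, and that an item belongs to the first managed object with a matching attribute value. The key names (`device_name`, `ip_address`, `device_code`, `user_name`), the function `partition_by_object`, and the matching rule are illustrative assumptions; the text does not specify them.

```python
from typing import Dict, List

# Hypothetical attribute keys corresponding to device name, IP address,
# device code, and user name of a managed object.
ATTRIBUTE_KEYS = ("device_name", "ip_address", "device_code", "user_name")


def partition_by_object(records: List[dict],
                        managed_objects: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Assign each record to the first managed object sharing an attribute value."""
    result: Dict[str, List[dict]] = {obj_id: [] for obj_id in managed_objects}
    for record in records:
        for obj_id, attrs in managed_objects.items():
            matched = any(
                key in record and key in attrs and record[key] == attrs[key]
                for key in ATTRIBUTE_KEYS
            )
            if matched:
                result[obj_id].append(record)
                break  # records that match no object are left unassigned
    return result


managed_objects = {
    "host-1": {"device_name": "host-1", "ip_address": "10.0.0.1"},
    "vm-1": {"device_name": "vm-1", "ip_address": "10.0.0.2"},
}
records = [
    {"ip_address": "10.0.0.2", "type": "alarm", "text": "CPU high"},
    {"device_name": "host-1", "type": "log", "text": "link up"},
    {"ip_address": "10.0.0.9", "type": "log", "text": "unmatched"},
]
partitioned = partition_by_object(records, managed_objects)
```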
In some embodiments of the present invention, the processor 603 further performs the following steps:
receiving a fault finding instruction sent by a client, where the fault finding instruction includes a fault occurrence time;
finding, from the recorded plurality of time points, the state information of the N managed objects corresponding to each time point, and the association relationship between the N managed objects corresponding to each time point, the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time;
feeding back to the client the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time.
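The lookup by fault occurrence time can be sketched as a binary search over the recorded time points. The text does not state how a fault occurrence time that falls between two recorded time points is resolved; the sketch assumes that the most recent record at or before the fault occurrence time is returned. The function name `find_snapshot` and the dictionary layout of a record are hypothetical.

```python
import bisect
from typing import List, Optional


def find_snapshot(time_points: List[float],
                  snapshots: List[dict],
                  fault_time: float) -> Optional[dict]:
    """Return the record for the latest time point <= fault_time, or None.

    time_points must be sorted in ascending order, and snapshots[i] must be
    the record taken at time_points[i].
    """
    index = bisect.bisect_right(time_points, fault_time) - 1
    if index < 0:
        return None  # the fault occurred before the first recorded time point
    return snapshots[index]


time_points = [100.0, 160.0, 220.0]
snapshots = [
    {"states": {"vm-1": "normal"}, "associations": [("vm-1", "host-1")]},
    {"states": {"vm-1": "alarm"}, "associations": [("vm-1", "host-1")]},
    {"states": {"vm-1": "down"}, "associations": []},
]
found = find_snapshot(time_points, snapshots, fault_time=170.0)
```

In this example `found` is the record taken at time point 160.0, which is what would be fed back to the client under the stated assumption.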
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
As described above, the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A fault information processing method, applicable to a data center, wherein the data center comprises managed objects, and the method comprises:
    obtaining, at a plurality of time points, state management information of the data center, wherein the state management information is used to describe an operating state of the data center;
    determining, according to the state management information, state information of N managed objects of the data center, wherein the state information is used to indicate a working state of the managed objects;
    recording the plurality of time points and the state information of the N managed objects corresponding to each of the time points.
  2. The fault information processing method according to claim 1, wherein, before the recording of the plurality of time points and the state information of the N managed objects corresponding to each of the time points, the method further comprises:
    determining an association relationship between the N managed objects;
    and the recording of the plurality of time points and the state information of the N managed objects corresponding to each of the time points comprises:
    recording the plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points.
  3. The fault information processing method according to claim 2, wherein the state management information of the data center comprises:
    system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
  4. The fault information processing method according to claim 2 or 3, wherein the determining, according to the state management information, of the state information of the N managed objects of the data center comprises:
    dividing the state management information into the state information of the N managed objects according to attributes of the N managed objects of the data center, wherein the attributes of a managed object comprise: a device name of the managed object, and/or an IP address of the managed object, and/or a device code of the managed object, and/or a user name of the managed object.
  5. The fault information processing method according to claim 2 or 3, wherein the method further comprises:
    receiving a fault finding instruction sent by a client, wherein the fault finding instruction comprises a fault occurrence time;
    finding, from the recorded plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points, the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time;
    feeding back to the client the state information of the N managed objects and the association relationship between the N managed objects that correspond to the fault occurrence time.
  6. A fault information processing apparatus, applicable to a data center, wherein the data center comprises managed objects, and the apparatus comprises:
    an information acquiring module, configured to obtain, at a plurality of time points, state management information of the data center, wherein the state management information is used to describe an operating state of the data center;
    a security determining module, configured to determine, according to the state management information, state information of N managed objects of the data center, wherein the state information is used to indicate a working state of the managed objects;
    an information recording module, configured to record the plurality of time points and the state information of the N managed objects corresponding to each of the time points.
  7. The fault information processing apparatus according to claim 6, wherein the apparatus further comprises:
    an association determining module, configured to determine an association relationship between the N managed objects before the information recording module records the plurality of time points and the state information of the N managed objects corresponding to each of the time points;
    wherein the information recording module is specifically configured to:
    record the plurality of time points, the state information of the N managed objects corresponding to each of the time points, and the association relationship between the N managed objects corresponding to each of the time points.
  8. The fault information processing apparatus according to claim 7, wherein the state management information of the data center comprises:
    system configuration information, and/or alarm information, and/or performance monitoring information, and/or log information, and/or complaint assurance information, and/or configuration change information, and/or work order information.
  9. The fault information processing apparatus according to claim 7 or 8, wherein the security determining module is specifically configured to:
    divide the state management information into the state information of the N managed objects according to attributes of the N managed objects of the data center, wherein the attributes of a managed object comprise: a device name of the managed object, and/or an IP address of the managed object, and/or a device code of the managed object, and/or a user name of the managed object.
  10. The fault information processing apparatus according to claim 7 or 8, wherein the apparatus further comprises:
    an instruction receiving module, configured to receive a fault finding instruction sent by a client, wherein the fault finding instruction comprises a fault occurrence time;
    a fault finding module, configured to find, from the recorded plurality of time points and the state information of the N managed objects corresponding to each of the time points, the state information of the N managed objects corresponding to the fault occurrence time;
    a fault feedback module, configured to feed back to the client the state information of the N managed objects corresponding to the fault occurrence time.
PCT/CN2015/096567 2014-12-16 2015-12-07 Fault information processing method and related device WO2016095716A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410784311.1 2014-12-16
CN201410784311.1A CN104539449B (en) 2014-12-16 2014-12-16 A kind of failure information processing method and relevant apparatus

Publications (1)

Publication Number Publication Date
WO2016095716A1 true WO2016095716A1 (en) 2016-06-23

Family

ID=52854918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096567 WO2016095716A1 (en) 2014-12-16 2015-12-07 Fault information processing method and related device

Country Status (2)

Country Link
CN (2) CN104539449B (en)
WO (1) WO2016095716A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104539449B (en) * 2014-12-16 2019-02-19 华为技术有限公司 A kind of failure information processing method and relevant apparatus
CN106909550A (en) * 2015-12-22 2017-06-30 中国移动通信集团吉林有限公司 A kind of data handling system and method
CN111401577A (en) * 2020-02-14 2020-07-10 上海电气分布式能源科技有限公司 Device management method, device and storage medium
CN111782437B (en) * 2020-07-10 2023-08-11 中国工商银行股份有限公司 Fault positioning method, device, computing equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150718A1 (en) * 2007-12-11 2009-06-11 Choon-Seo Park Large-scale cluster monitoring system, and method of automatically building/restoring the same
CN102546274A (en) * 2010-12-20 2012-07-04 中国移动通信集团广西有限公司 Alarm monitoring method and alarm monitoring equipment in communication service
US20140189086A1 (en) * 2013-01-03 2014-07-03 Microsoft Corporation Comparing node states to detect anomalies
CN104539449A (en) * 2014-12-16 2015-04-22 华为技术有限公司 Handling method and related device for fault information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0272742A (en) * 1988-09-07 1990-03-13 Nec Corp Data error location detecting system
CN101304340B (en) * 2007-05-09 2011-09-21 华为技术有限公司 Method and apparatus for monitoring resource condition as well as communication network
CN102739415A (en) * 2011-03-31 2012-10-17 华为技术有限公司 Method and device for determining network failure data and recording network instantaneous state data
CN104184826A (en) * 2014-09-05 2014-12-03 浪潮(北京)电子信息产业有限公司 Multi-data-center storage environment managing method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WAN, RUIHUA. ET AL.: "To Take Full Advantage of N2000 Network Management and Improve the Comprehensive Control of Telecommunication Networks", JOURNAL OF XI'AN UNIVERSITY OF POST AND TELECOMMUNICATIONS, vol. 12, no. 03, 31 May 2007 (2007-05-31) *
ZHANG, YANLING ET AL.: "To Design Exception Handling Module of the Digital Microwave Monitor and Control System", MICROCOMPUTER INFORMATION, vol. 19, no. 06, 30 June 2003 (2003-06-30) *

Also Published As

Publication number Publication date
CN104539449B (en) 2019-02-19
CN109921920A (en) 2019-06-21
CN104539449A (en) 2015-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15869224

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15869224

Country of ref document: EP

Kind code of ref document: A1