CN110535701B - Problem positioning method and device

Info

Publication number: CN110535701B
Application number: CN201910816165.9A
Authority: CN
Inventors: 刘浩
Original assignee: New H3C Big Data Technologies Co Ltd
Current assignee: New H3C Big Data Technologies Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2022-07-01
Anticipated expiration: 2039-08-30
Also published as: CN110535701A

Abstract

The invention provides a problem location method and a problem location device. The method comprises the following steps: acquiring operation and maintenance data from technical domain tools; allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data; and maintaining a unified operation and maintenance interface according to the operation and maintenance data and its identification information, and locating problems based on the unified operation and maintenance interface, wherein the unified operation and maintenance interface records alarms of different technical domains. Embodiments of the invention enable unified operation and maintenance in multi-technical-domain scenarios.

Description

Problem positioning method and device

Technical Field

The present invention relates to the field of network communication technologies, and in particular, to a problem location method and apparatus.

Background

Traditional operation and maintenance diagnosis relies mainly on the experience of operation and maintenance personnel, who manually analyze the data and capabilities provided by a number of scattered operation and maintenance tools (monitoring, traffic, automation, and the like) in an attempt to solve the various problems encountered on site.

Operation and maintenance has now entered the cloud era. With the development of containerization technology, traditional single-domain operation and maintenance can hardly meet today's complex multi-domain requirements, and how to realize unified multi-domain operation and maintenance has become a technical problem to be solved urgently.

Disclosure of Invention

In view of this, the present invention provides a problem location method and apparatus, so as to solve the problem that the prior art cannot implement multi-domain unified operation and maintenance.

According to a first aspect of the embodiments of the present invention, there is provided a problem location method applied to a unified operation and maintenance system, the method including:

acquiring operation and maintenance data from technical domain tools, the technical domain tools comprising a plurality of technical domain tools for different technical domains;

allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data, wherein operation and maintenance data with the same key attribute has the same identification information, and operation and maintenance data with different key attributes has different identification information;

maintaining a unified operation and maintenance interface according to the operation and maintenance data and the identification information of the operation and maintenance data, and locating problems based on the unified operation and maintenance interface, wherein the unified operation and maintenance interface records alarms of different technical domains.

According to a second aspect of the embodiments of the present invention, there is provided a problem location apparatus, applied to a unified operation and maintenance system, the apparatus including:

an acquisition unit, configured to acquire operation and maintenance data from technical domain tools, the technical domain tools comprising a plurality of technical domain tools for different technical domains;

an allocation unit, configured to allocate identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data, wherein operation and maintenance data with the same key attribute has the same identification information, and operation and maintenance data with different key attributes has different identification information;

a maintenance unit, configured to maintain a unified operation and maintenance interface according to the operation and maintenance data and the identification information of the operation and maintenance data, wherein alarms of different technical domains are recorded in the unified operation and maintenance interface;

and a locating unit, configured to locate problems based on the unified operation and maintenance interface.

According to a third aspect of embodiments of the present invention, there is provided an electronic device comprising a processor and a machine-readable storage medium storing machine-readable instructions executable by the processor, the processor being caused by the machine-readable instructions to perform the problem locating method described above.

According to a fourth aspect of embodiments of the present invention, there is provided a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the problem location method described above.

By applying the embodiments of the present invention, operation and maintenance data is acquired from a plurality of technical domain tools, and identification information is allocated to the operation and maintenance data according to its key attributes, thereby unifying the operation and maintenance data. A unified operation and maintenance interface is then maintained according to the operation and maintenance data and its identification information, and problems are located based on that interface, so that unified operation and maintenance is achieved in multi-technical-domain scenarios.

Drawings

FIG. 1 is a flow chart of a problem location method according to an embodiment of the present invention;

FIG. 2A is a schematic diagram of a main interface of a unified operation and maintenance system according to an embodiment of the present invention;

FIG. 2B is a diagram illustrating a display of operation and maintenance data related to problem location according to an embodiment of the present invention;

FIG. 2C is a schematic diagram of an influence association tag page according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a problem location device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another problem location device according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, some technical terms mentioned in the embodiments of the present invention will be briefly described below.

Technical domain tool: an operation and maintenance tool that focuses on the operation and maintenance functions of a single technical domain. The technical domain may include, but is not limited to, an application domain, a host domain, a network domain, or a terminal domain.

In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings.

Referring to FIG. 1, which is a schematic flow chart of a problem location method according to an embodiment of the present invention, the problem location method may include the following steps:

Step 101, acquiring operation and maintenance data from technical domain tools, wherein the technical domain tools comprise a plurality of technical domain tools for different technical domains.

Step 102, allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data; operation and maintenance data with the same key attribute has the same identification information, and operation and maintenance data with different key attributes has different identification information.

In the embodiment of the invention, unified operation and maintenance first requires data unification, that is, unifying the operation and maintenance data acquired from the technical domain tools of different technical domains.

The operation and maintenance data includes, but is not limited to, basic data (such as resource data, topology data, and the like) of system networking and data (such as monitoring data, alarm data, and the like) generated in the system networking operation process, and is used for providing data support for maintenance of system networking.

Illustratively, the operation and maintenance data may include, but is not limited to, resource data, monitoring data, alarm data, or topology data.

The resource data is resource information of nodes in the actual networking (such as switches and servers, which may be referred to as monitored objects in an operation and maintenance system), for example hard disk size, memory size, and the number of Central Processing Unit (CPU) cores. The monitoring data is performance monitoring information of nodes in the actual networking, such as resource usage information (memory usage rate, CPU utilization, and the like) and/or application experience information such as application response time (for example, the response time for opening an application, or the response time for an application to access a specified URL (Uniform Resource Locator)). The alarm data is an alarm generated by a node in the actual networking according to a preset alarm rule, or an alarm generated by a technical domain tool from monitoring data according to a preset alarm rule, for example when the memory usage rate exceeds a preset threshold or the application access response time exceeds a preset time threshold. The topology data is topology information between nodes in the actual networking, such as a connection between node A and node B.
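As a purely illustrative sketch, and not part of the described embodiments, the four kinds of operation and maintenance data and a threshold-style alarm rule could be modeled as follows. The names `DataKind`, `OpsRecord`, and `apply_threshold_rule`, as well as the attribute names, are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class DataKind(Enum):
    RESOURCE = "resource"      # e.g. hard disk size, memory size, CPU cores
    MONITORING = "monitoring"  # e.g. memory usage rate, response time
    ALARM = "alarm"            # produced when a preset alarm rule is hit
    TOPOLOGY = "topology"      # e.g. a connection between two nodes


@dataclass
class OpsRecord:
    source_tool: str           # hypothetical name of the technical domain tool
    kind: DataKind
    attributes: Dict[str, Any] = field(default_factory=dict)


def apply_threshold_rule(
    record: OpsRecord, metric: str, threshold: float
) -> Optional[OpsRecord]:
    """Return an alarm record if `metric` in a monitoring record exceeds `threshold`."""
    if record.kind is not DataKind.MONITORING:
        return None
    value = record.attributes.get(metric)
    if value is None or value <= threshold:
        return None
    return OpsRecord(
        source_tool=record.source_tool,
        kind=DataKind.ALARM,
        attributes={"metric": metric, "value": value, "threshold": threshold},
    )


# A monitoring sample whose memory usage exceeds a 90% threshold.
sample = OpsRecord("host-tool", DataKind.MONITORING, {"memory_usage": 0.93})
alarm = apply_threshold_rule(sample, "memory_usage", 0.90)
```

Here the alarm is derived from monitoring data by the tool side; alarms reported directly by a node would simply arrive as records of kind `ALARM`.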

Different types of operation and maintenance data for the same monitored object may come from different technical domain tools, and the ID (identifier) of the same monitored object may differ from one technical domain to another, so the per-domain IDs of monitored objects cannot be used to unify the data. Matching the operation and maintenance data that different technical domain tools report for the same monitored object is therefore the key to data unification.

Correspondingly, in the embodiment of the invention, for the operation and maintenance data obtained from the multiple technical domain tools, the identification information can be allocated to the operation and maintenance data according to the key attribute of the operation and maintenance data.

The key attributes are used for uniquely identifying the monitored objects, and the identification information corresponds to the key attributes one to one.

It should be noted that, in the embodiment of the present invention, the key attribute includes a plurality of attributes, the plurality of attributes uniquely identify the monitoring object, and the identification information corresponds to the attribute group (or referred to as attribute set) formed by the plurality of attributes one to one.

Step 103, maintaining a unified operation and maintenance interface based on the operation and maintenance data and the identification information of the operation and maintenance data, and locating problems based on the unified operation and maintenance interface, wherein alarms of different technical domains are recorded in the unified operation and maintenance interface.

In the embodiment of the invention, after the identification information is allocated to the operation and maintenance data according to the key attribute of the acquired operation and maintenance data, a unified operation and maintenance interface can be maintained according to the acquired operation and maintenance data and the identification information allocated to the operation and maintenance data.

The unified operation and maintenance interface may record alarms of different technical domains, for example, the alarms of different technical domains may be recorded in different tab pages.

Illustratively, the technical domains may include an application class, a host class, a network class, or a terminal class.

In one example, the alarms for the same technology domain may be further classified.

For example, problems of the application class can be further classified into: application level (e.g., access experience, database connections), hardware level (e.g., CPU performance, memory overflow), code level (e.g., process crash, exception trap), etc.

In the embodiment of the invention, the problem positioning can be carried out based on the unified interface maintained according to the operation and maintenance data and the identification of the operation and maintenance data.

As can be seen, in the method flow shown in FIG. 1, operation and maintenance data is acquired from a plurality of technical domain tools, and identification information is allocated to the operation and maintenance data according to its key attributes, thereby unifying the operation and maintenance data. A unified operation and maintenance interface is then maintained according to the operation and maintenance data and its identification information, and problem location is performed based on that interface, so that unified operation and maintenance is achieved in multi-technical-domain scenarios.

Optionally, in an embodiment of the present invention, the allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data includes:

when a piece of operation and maintenance data is acquired, querying, according to the key attribute of the operation and maintenance data, whether target operation and maintenance data with the same key attribute exists, the target operation and maintenance data being operation and maintenance data to which identification information has already been allocated;

if so, allocating the identification information of the target operation and maintenance data to the operation and maintenance data;

otherwise, allocating new identification information to the operation and maintenance data.

In this embodiment, in order to unify the data, while operation and maintenance data is being acquired from the multiple technical domain tools, each acquired piece of operation and maintenance data needs to be allocated identification information that is unique to its key attribute.

Accordingly, for each piece of operation and maintenance data acquired from the multiple technical domain tools, the operation and maintenance data to which identification information has already been allocated (referred to herein as target operation and maintenance data) may be queried according to the key attribute of the newly acquired data, to determine whether any target operation and maintenance data has the same key attribute as the data awaiting identification information.

If such target operation and maintenance data exists, its identification information is allocated to the operation and maintenance data, and no new identification information needs to be allocated;

if no such target operation and maintenance data exists, new identification information is allocated to the operation and maintenance data.
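The allocation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not name concrete key attributes, so `ip` and `hostname` are hypothetical, as are the class name `IdAllocator` and the use of sequential integers as identification information.

```python
import itertools
from typing import Any, Dict, Mapping, Sequence, Tuple

# Hypothetical key attributes; the source does not specify which attributes are used.
KEY_ATTRIBUTES = ("ip", "hostname")


class IdAllocator:
    """Maps each distinct key-attribute group to one identifier."""

    def __init__(self, key_attributes: Sequence[str] = KEY_ATTRIBUTES) -> None:
        self._key_attributes = tuple(key_attributes)
        self._ids: Dict[Tuple[Any, ...], int] = {}
        self._counter = itertools.count(1)

    def key_of(self, record: Mapping[str, Any]) -> Tuple[Any, ...]:
        # The attribute group that uniquely identifies the monitored object.
        return tuple(record[name] for name in self._key_attributes)

    def allocate(self, record: Mapping[str, Any]) -> int:
        key = self.key_of(record)
        if key not in self._ids:
            # No already-identified data shares this key attribute: allocate a new ID.
            self._ids[key] = next(self._counter)
        # Otherwise reuse the ID of the data with the same key attribute.
        return self._ids[key]


allocator = IdAllocator()
id_a = allocator.allocate({"ip": "10.0.0.1", "hostname": "web-1", "cpu_cores": 4})
id_b = allocator.allocate({"ip": "10.0.0.1", "hostname": "web-1", "memory_usage": 0.7})
id_c = allocator.allocate({"ip": "10.0.0.2", "hostname": "db-1"})
```

In this sketch `id_a` and `id_b` are equal because both records describe the same monitored object, even though they could come from different technical domain tools, while `id_c` is a new identifier.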

In an example, if there is target operation and maintenance data including the same key attribute, the problem location method may further include:

and when the target operation and maintenance data conflicts with the operation and maintenance data, generating an audit task, wherein the audit task is used for prompting that the target operation and maintenance data conflicts with the operation and maintenance data.

In this example, when the target operation and maintenance data with the same key attribute as the key attribute of the operation and maintenance data is queried, it may be further determined whether a conflict exists between the target operation and maintenance data and the operation and maintenance data.

The operation and maintenance data conflict means that the key attributes of two pieces of operation and maintenance data are the same, but the other non-key attributes have a conflict (the attribute values of the non-key attributes of the same type are different).

For example, taking resource data as an example, assume that operation and maintenance data 1 and operation and maintenance data 2 have the same key attributes, but the number of CPU cores is 4 in operation and maintenance data 1 and 6 in operation and maintenance data 2. Because the same monitored object cannot have both 4 cores and 6 cores, an audit task may be generated to notify operation and maintenance personnel that operation and maintenance data 1 and operation and maintenance data 2 conflict, and the operation and maintenance personnel then audit the two pieces of data.
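The conflict check in this example can be sketched as below. The function names, the dictionary layout of the audit task, and the key attribute names are assumptions made for illustration; the source only states that an audit task is generated to prompt that the two pieces of data conflict.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple


def find_conflicts(
    existing: Mapping[str, Any],
    incoming: Mapping[str, Any],
    key_attributes: Sequence[str],
) -> Dict[str, Tuple[Any, Any]]:
    """Return non-key attributes present in both records whose values differ."""
    conflicts: Dict[str, Tuple[Any, Any]] = {}
    for name, new_value in incoming.items():
        if name in key_attributes or name not in existing:
            continue
        if existing[name] != new_value:
            conflicts[name] = (existing[name], new_value)
    return conflicts


def make_audit_task(
    existing: Mapping[str, Any],
    incoming: Mapping[str, Any],
    key_attributes: Sequence[str],
) -> Optional[Dict[str, Any]]:
    """Build a (hypothetical) audit task when two records with the same key conflict."""
    conflicts = find_conflicts(existing, incoming, key_attributes)
    if not conflicts:
        return None
    return {"type": "audit", "conflicts": conflicts}


keys = ("ip", "hostname")  # hypothetical key attributes
data_1 = {"ip": "10.0.0.1", "hostname": "web-1", "cpu_cores": 4}
data_2 = {"ip": "10.0.0.1", "hostname": "web-1", "cpu_cores": 6}
task = make_audit_task(data_1, data_2, keys)
```

The sketch compares every shared non-key attribute. In practice such a check would presumably be limited to attributes that are expected to be stable, as in the resource data example, since monitoring values legitimately change over time.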

Optionally, in an embodiment of the present invention, the problem location based on the unified operation and maintenance interface includes:

when a selection instruction aiming at a target alarm in the unified operation and maintenance interface is detected, displaying a target operation and maintenance sub-interface matched with a target technical domain to which the target alarm belongs;

and carrying out problem positioning on the target alarm based on the target operation and maintenance sub-interface.

In this embodiment, the unified operation and maintenance interface displays the alarms of each technical domain. When a selection instruction for any alarm in the unified operation and maintenance interface (referred to herein as the target alarm) is detected, a jump may be made, according to the technical domain to which the target alarm belongs (referred to herein as the target technical domain), to the operation and maintenance sub-interface matched with the target technical domain (referred to herein as the target operation and maintenance sub-interface). Problem location is then performed on the target alarm based on the target operation and maintenance sub-interface, that is, the cause of the target alarm is located.

In an example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:
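The jump from a selected alarm to the sub-interface of its technical domain amounts to a lookup keyed by domain. The following sketch assumes a simple dictionary-based alarm and hypothetical sub-interface names; it is not taken from the embodiments.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping from technical domain to its operation and maintenance sub-interface.
SUB_INTERFACES: Dict[str, str] = {
    "application": "application-sub-interface",
    "host": "host-sub-interface",
    "network": "network-sub-interface",
    "terminal": "terminal-sub-interface",
}


def select_sub_interface(target_alarm: Mapping[str, Any]) -> str:
    """Return the sub-interface matching the technical domain of the selected alarm."""
    domain = target_alarm["domain"]
    if domain not in SUB_INTERFACES:
        raise ValueError(f"no sub-interface registered for domain {domain!r}")
    return SUB_INTERFACES[domain]


selected = {"name": "website access experience is slow", "domain": "application"}
sub_interface = select_sub_interface(selected)
```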

when a selection instruction for a historical statistical label in a target operation and maintenance sub-interface is detected, displaying a historical statistical label page of the target operation and maintenance sub-interface, wherein monitoring data matched with target identification information is displayed in the historical statistical label page, and the target identification information is identification information associated with a target alarm;

and carrying out problem positioning on the target alarm based on the historical statistical label page.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include a historical statistics tag, and a historical statistics tag page of the corresponding technology domain may be entered through the historical statistics tag, where the historical statistics tag page includes monitoring data that matches an alarm to be located in the technology domain.

Accordingly, when a selection instruction for a historical statistics tab in the target operation and maintenance sub-interface is detected, a transition may be made to a historical statistics tab page of the target operation and maintenance sub-interface, in which monitoring data matching identification information associated with a target alarm (referred to herein as target identification information) is displayed.

For example, taking as an example a target alarm "website access experience is slow" in the application-class technical domain, monitoring data associated with page access of the website may be displayed in the corresponding historical statistics tab page, such as the white-screen time, first-screen time, DOM (Document Object Model) load time, full load time, and the like when different browsers access the site.

Further, problem location of target alarms may be performed based on the data presented in the historical statistics tab.

In another example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:

when a selection instruction for an alarm list label in the target operation and maintenance sub-interface is detected, displaying an alarm list label page of the target operation and maintenance sub-interface, wherein the alarm list label page displays alarms associated with the target alarm;

and carrying out problem positioning on the target alarm based on the alarm list tab.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include an alarm list tag, and an alarm list tag page of the corresponding technology domain may be entered through the alarm list tag, where the alarm list tag page includes an associated alarm of the alarm to be located.

The association relationship between alarm data may be determined based on the identification information of the alarm data and the indicator that generated the alarm; a specific implementation is described below with reference to specific examples.

Correspondingly, when a selection instruction for an alarm list label in the target operation and maintenance sub-interface is detected, the operation and maintenance sub-interface can jump to an alarm list label page of the target operation and maintenance sub-interface, the alarm list label page displays the associated alarm of the target alarm, and then the problem positioning can be carried out on the target alarm based on the associated alarm of the target alarm.
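One plausible reading of this association, offered only as a sketch, is that alarms sharing the identification information of the target alarm are treated as associated, with alarms raised on the same indicator listed first. Both the matching rule and the field names `object_id` and `indicator` are assumptions; the exact association rule is left to the specific examples of the embodiments.

```python
from typing import Any, Dict, List, Sequence


def associated_alarms(
    target: Dict[str, Any], alarms: Sequence[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Assumed rule: associated alarms share the target's identification information."""
    related = [
        alarm
        for alarm in alarms
        if alarm is not target and alarm["object_id"] == target["object_id"]
    ]
    # Stable sort: alarms on the same indicator as the target come first.
    return sorted(related, key=lambda alarm: alarm["indicator"] != target["indicator"])


all_alarms = [
    {"name": "app response slow", "object_id": 1, "indicator": "response_time"},
    {"name": "memory usage high", "object_id": 1, "indicator": "memory_usage"},
    {"name": "link down", "object_id": 2, "indicator": "port_status"},
    {"name": "app response very slow", "object_id": 1, "indicator": "response_time"},
]
related_names = [a["name"] for a in associated_alarms(all_alarms[0], all_alarms)]
```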

In another example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:

when a selection instruction for an alarm distribution label in a target operation and maintenance sub-interface is detected, displaying an alarm distribution label page of the target operation and maintenance sub-interface, wherein historical distribution statistical information of an alarm related to the target alarm is displayed in the alarm distribution label page;

and carrying out problem positioning on the target alarm based on the alarm distribution label page.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include an alarm distribution tag, through which the alarm distribution tag page of the corresponding technology domain may be entered, and the alarm distribution tag page includes historical distribution statistical information of alarms related to the alarm to be located.

Correspondingly, when a selection instruction for an alarm distribution tag in the target operation and maintenance sub-interface is detected, the operation and maintenance sub-interface can jump to an alarm distribution tag page of the target operation and maintenance sub-interface, historical distribution statistical information of an alarm related to the target alarm is displayed in the alarm distribution tag page, and then the target alarm can be positioned according to the historical distribution statistical information of the alarm related to the target alarm.

In another example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:

when a selection instruction for an influence association label in the target operation and maintenance sub-interface is detected, displaying an influence association label page of the target operation and maintenance sub-interface, wherein associated topology information corresponding to the target alarm is displayed in the influence association label page;

and carrying out problem positioning on the target alarm based on the influence association label page.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include an influence association tag, and an influence association tag page of the corresponding technology domain may be entered through the influence association tag, where the influence association tag page includes association topology information corresponding to the alarm to be located.

Correspondingly, when a selection instruction for the influence association label in the target operation and maintenance sub-interface is detected, a jump may be made to the influence association label page of the target operation and maintenance sub-interface, in which the associated topology information corresponding to the target alarm is displayed, so that problem location can be performed on the target alarm based on that topology information. A specific implementation is described below with reference to a specific example.

In another example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:

when a selection instruction for a trend prediction label in the target operation and maintenance sub-interface is detected, displaying a trend prediction label page of the target operation and maintenance sub-interface, wherein trend prediction information associated with the target alarm is displayed in the trend prediction label page;

and carrying out problem positioning on the target alarm based on the trend prediction label page.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include a trend prediction tag, and the trend prediction tag may be used to enter a trend prediction tag page of the corresponding technology domain, where the trend prediction tag page includes trend prediction information associated with an alarm to be located.

Accordingly, when a selection instruction for the trend prediction tag in the target operation and maintenance sub-interface is detected, a jump may be made to the trend prediction tag page of the target operation and maintenance sub-interface, in which trend prediction information associated with the target alarm is displayed.

For example, assuming that the target alarm is "XX application access is slow" and the "XX application" is deployed on monitored object A, the trend prediction information associated with the target alarm may include trend predictions of the resource occupancy of monitored object A, such as hard disk occupancy, memory occupancy, and/or CPU utilization, as well as region prediction of the access response time of the "XX application", and the like.

In this example, problem location may be performed on the target alarm based on the trend prediction information associated with the target alarm that is shown in the trend prediction tag page; a specific implementation is described below with reference to a specific example.

In another example, performing problem location on the target alarm based on the target operation and maintenance sub-interface includes:

when a selection instruction for a knowledge suggestion tag in the target operation and maintenance sub-interface is detected, displaying a knowledge suggestion tag page of the target operation and maintenance sub-interface, wherein technical materials related to the target alarm are displayed in the knowledge suggestion tag page;

and performing problem positioning on the target alarm based on the knowledge suggestion tag page.

In this example, the operation and maintenance sub-interface corresponding to each technology domain may include a knowledge suggestion tag, through which the knowledge suggestion tag page of the corresponding technology domain may be entered, where the knowledge suggestion tag page includes technical materials associated with the alarm to be located.

Accordingly, when a selection instruction for the knowledge suggestion tag in the target operation and maintenance sub-interface is detected, the method can jump to a knowledge suggestion tag page of the target operation and maintenance sub-interface, wherein technical materials related to the target alarm are displayed in the knowledge suggestion tag page.

Illustratively, the association between technical materials and alarms may be implemented based on keywords.

In this example, problem location may be performed on the target alarm based on the technical materials associated with the target alarm that are displayed in the knowledge suggestion tag page; a specific implementation is described below with reference to a specific example.
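A keyword-based association between technical materials and an alarm could look like the following sketch. The matching strategy (case-insensitive substring match, ranked by number of matched keywords), the function name, and the sample material titles are all assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def match_materials(alarm_text: str, materials: Dict[str, Sequence[str]]) -> List[str]:
    """Return titles of materials whose keywords occur in the alarm text, best match first."""
    text = alarm_text.lower()
    hits: List[Tuple[str, int]] = []
    for title, keywords in materials.items():
        matched = [kw for kw in keywords if kw.lower() in text]
        if matched:
            hits.append((title, len(matched)))
    # Stable sort by number of matched keywords, descending.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [title for title, _ in hits]


# Hypothetical knowledge base: material title -> keywords.
knowledge_base = {
    "Tuning slow web page loads": ["access", "slow"],
    "Replacing a failed disk": ["disk", "failure"],
    "Database connection pool guide": ["database", "slow"],
}
suggestions = match_materials("Website access experience is slow", knowledge_base)
```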

Further, in the embodiment of the present invention, unified operation and maintenance is implemented on top of multiple technical domain tools, and during problem location a specific function of one of the technical domain tools may need to be invoked, for example a security risk handling function provided by security software, and invoking such a function may involve authority authentication. Similarly, invoking specific functions of other technical domain tools may also involve authority authentication. If authority authentication were performed separately each time a function of a technical domain tool is invoked, the operation would be very cumbersome and inefficient.

Based on this, in the embodiment of the invention, unified authority and single sign-on can be realized by connecting a unified authentication server and a user database, so that unified authority authentication is realized.

Accordingly, in one embodiment of the present invention, the above problem locating method further includes:

receiving a login request aiming at the unified operation and maintenance interface, wherein the login request carries login verification information;

verifying the login verification information;

when the authentication is passed, allocating role ID for the login requester;

and performing authority control on the login requester according to the recorded corresponding relation among the role ID, the operation ID and the identification information of the operation and maintenance data.

In this embodiment, login verification for each technical domain tool is unified into login verification for the unified operation and maintenance interface.

When a login request aiming at the unified operation and maintenance interface is received, login authentication information carried in the login request, such as a user name, a password and the like, can be obtained, and the login authentication information is authenticated.

When the authentication is passed, a role ID may be assigned to the login requester, such as a role ID assigned to the login requester according to the user name of the login requester.

For example, different role IDs correspond to different operation permissions of different operation and maintenance data, which may be defined by a correspondence relationship between the role ID, the operation ID, and identification information of the operation and maintenance data.

Furthermore, the authority of the login requester can be controlled based on the corresponding relationship among the role ID, the operation ID and the identification information of the operation and maintenance data, that is, when the operation instruction of the login requester is detected, whether the login requester has the operation authority or not is determined according to the operation ID corresponding to the operation instruction of the login requester and the identification information of the operation and maintenance data for which the operation is directed, and if so, the operation is allowed; otherwise, the operation is denied.

It should be noted that, in the embodiment of the present invention, when verification of the login authentication information of the login requester fails, it is determined that the login of the login requester fails. In this case, any operation on any operation and maintenance data by the login requester may be rejected, or only specific operations on specific operation and maintenance data (both of which may be set according to the actual scenario) may be allowed, such as a lookup operation on public data.
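The login and authority-control steps above can be illustrated with a minimal sketch. All names in it (`PermissionTable`, `login`, the sample user, roles and operation IDs) are hypothetical and are not part of the embodiment. The sketch assumes that the recorded correspondence is a set of (role ID, operation ID, identification information) triples, and that a failed login falls back to a restricted guest role, which is one of the two options described above.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionTable:
    """Recorded correspondence among role ID, operation ID and data UUID."""
    allowed: set = field(default_factory=set)

    def grant(self, role_id: str, operation_id: str, data_uuid: str) -> None:
        self.allowed.add((role_id, operation_id, data_uuid))

    def is_allowed(self, role_id: str, operation_id: str, data_uuid: str) -> bool:
        return (role_id, operation_id, data_uuid) in self.allowed


# Hypothetical user store; a real deployment would query the unified
# authentication server instead of comparing plain-text passwords.
USERS = {"alice": "s3cret"}
ROLE_BY_USER = {"alice": "role-ops"}
GUEST_ROLE = "role-guest"  # assumed role for requesters whose login fails


def login(username: str, password: str) -> str:
    """Verify the login information and return the role ID to use."""
    if username in USERS and USERS[username] == password:
        return ROLE_BY_USER[username]
    return GUEST_ROLE


table = PermissionTable()
table.grant("role-ops", "op-restart", "uuid-switch-1")
table.grant("role-guest", "op-view", "uuid-public")

role = login("alice", "s3cret")
print(table.is_allowed(role, "op-restart", "uuid-switch-1"))
print(table.is_allowed(login("alice", "wrong"), "op-restart", "uuid-switch-1"))
```

Keeping the check as a single set lookup means that both the operation-level and the data-level decision are made in one place; how the triples are stored and loaded is left open here.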

In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present invention, the technical solutions provided by the embodiments of the present invention are described below with reference to specific examples.

In this embodiment, the implementation of the unified operation and maintenance mainly includes the following unifications:

1. operation and maintenance data unification

i) The unified operation and maintenance system can interface with each technical domain tool through a southbound interface of a CMDB (Configuration Management Database), pull or receive resource data of each technical domain tool, form unified resource data Management, and facilitate authority authentication;

ii) The unified operation and maintenance system can interface with each technical domain tool through the southbound interface of the IOM (Infrastructure operation and maintenance monitoring), pull or receive the monitoring data of each technical domain tool, and collect the alarm data and the topology data;

and iii) outputting the resource data, the monitoring data, the alarm data and the topology data to a big data platform, realizing the unification of the operation and maintenance data, and performing deep data model calculation and data analysis based on the unified operation and maintenance data.

In this embodiment, since the resource data, the monitoring data, the alarm data, and the topology data are obtained from different interfaces of different systems, and in addition, different types of operation and maintenance data of the same monitored object may come from different technical domain tools, that is, an ID inside a technical domain tool does not have a function of a Unique Identifier, the different types of operation and maintenance data need to be matched by a Unique Identifier (UUID (Universally Unique Identifier)), so as to achieve unification of the operation and maintenance data.

Illustratively, the uniqueness of the UUID can be realized by a unique resource reconciliation service, the key of the resource reconciliation is a key attribute, and a monitoring object is uniquely identified by a plurality of key attributes.

For example, assume that the unified operation and maintenance system includes 6 technology domain tools: A, B, C, D, E and F, where the 5 technology domain tools A, B, D, E and F have resource management capability and the 4 technology domain tools C, D, E and F have performance monitoring capability. In addition, the actual networking includes 2 sites (site IDs 1 and 2, respectively) with 4 devices in total (assume 2 switches and 2 terminals running Linux, with device types Switch and Linux, respectively).

In the following, the unification of the resource data and the monitoring data is taken as an example to match and identify the resource data and the monitoring data through the UUID.

In this example, assuming that the key attributes include site ID, IP address, and device type, the resource data obtained by the CMDB system from the technology domain tools is shown in Table 1, and the performance monitoring data obtained by the IOM system from the technology domain tools is shown in Table 2:

TABLE 1

TABLE 2

Data source	Site ID	IP address	Device type	CPU utilization
a	1	192.168.1.1	Switch	20%
b	1	192.168.1.2	Linux	60%
C	2	192.168.1.1	Switch	40%
D	2	192.168.1.2	Linux	80%

In this embodiment, since the CMDB system and the IOM system do not obtain the resource data and the monitoring data from each technology domain tool at the same time, it is assumed that the CMDB system first obtains a piece of resource data from technology domain tool A: the 192.168.1.1 switch device of site 1 (as shown in row 1 of Table 1). No operation and maintenance data with an assigned UUID is found according to the key attributes of this resource data, and therefore a UUID is assigned to the resource data (assume e7295fca-5c50-11e9-8647-d663bd873d93).

The CMDB system also obtains a piece of resource data from technology domain tool B: 192.168.1.2 Linux at site 1 (as shown in row 2 of Table 1). No operation and maintenance data with an assigned UUID is found according to the key attributes of this resource data, and therefore a UUID is assigned to the resource data (assume 2291669c-5c52-11e9-8647-d663bd873d93).

The CMDB system also obtains a piece of resource data from technology domain tool D: 192.168.1.2 Linux at site 1. Operation and maintenance data with an assigned UUID (namely, the resource data acquired from technology domain tool B) is found according to the key attributes of this resource data, but the number of CPU cores (5) in this piece of resource data differs from the number of CPU cores (6) in the resource data acquired from technology domain tool B, that is, the two conflict. In this case, 2291669c-5c52-11e9-8647-d663bd873d93 can still be used as the UUID of this piece of resource data, and an audit task is generated to prompt that a resource data conflict exists, which is resolved manually by a user (such as an administrator or an operation and maintenance person).

The CMDB system also obtains a piece of resource data from technology domain tool E: the 192.168.1.1 switch device at site 2. No operation and maintenance data with an assigned UUID is found according to the key attributes of this resource data, and a UUID is therefore assigned to it (assumed to be 52c144ca-5c54-11e9-8647-d663bd873d93).

The CMDB system also obtains a piece of resource data from technology domain tool F: 192.168.1.3 Linux at site 2. No operation and maintenance data with an assigned UUID is found according to the key attributes of this resource data, and a UUID is therefore assigned to it (assumed to be 6a4a8d0e-5c54-11e9-8647-d663bd873d93).
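The allocation procedure walked through above can be sketched as follows. This is only an illustration under stated assumptions: the class name `ResourceReconciler`, the field names `site_id`, `ip`, `device_type` and `cpu_cores`, and the shape of the audit task are hypothetical, and `uuid.uuid4()` is used because the embodiment does not specify how the UUID values are generated.

```python
import uuid


class ResourceReconciler:
    """Assigns one UUID per key-attribute group (site ID, IP address, device type)."""

    KEY_FIELDS = ("site_id", "ip", "device_type")  # assumed field names

    def __init__(self):
        self._uuid_by_key = {}    # key-attribute tuple -> UUID string
        self._first_by_key = {}   # key-attribute tuple -> (source, record) seen first
        self.audit_tasks = []     # conflicts awaiting a manual decision

    def assign(self, source, record):
        key = tuple(record[f] for f in self.KEY_FIELDS)
        if key not in self._uuid_by_key:
            # No data with an assigned UUID shares these key attributes.
            self._uuid_by_key[key] = str(uuid.uuid4())
            self._first_by_key[key] = (source, record)
            return self._uuid_by_key[key]
        # Same key attributes: reuse the UUID and report differing attributes.
        known_source, known = self._first_by_key[key]
        differing = sorted(
            f for f in record.keys() & known.keys() if record[f] != known[f]
        )
        if differing:
            self.audit_tasks.append({
                "uuid": self._uuid_by_key[key],
                "sources": (known_source, source),
                "fields": differing,
            })
        return self._uuid_by_key[key]


reconciler = ResourceReconciler()
uuid_a = reconciler.assign("A", {"site_id": 1, "ip": "192.168.1.1", "device_type": "Switch"})
uuid_b = reconciler.assign("B", {"site_id": 1, "ip": "192.168.1.2", "device_type": "Linux", "cpu_cores": 6})
uuid_d = reconciler.assign("D", {"site_id": 1, "ip": "192.168.1.2", "device_type": "Linux", "cpu_cores": 5})
uuid_e = reconciler.assign("E", {"site_id": 2, "ip": "192.168.1.1", "device_type": "Switch"})
print(uuid_b == uuid_d, reconciler.audit_tasks)
```

The same `assign` call could be used for monitoring data, since only the key attributes decide whether an existing UUID is reused.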

In this embodiment, the UUID assignment of each resource data may be as shown in table 3:

TABLE 3

Similarly, for the monitoring data acquired by the IOM system from each technical domain tool, UUID allocation can be performed according to the key attribute in the above manner.

For the monitoring data shown in Table 2, the first 3 pieces match resource data that already has an allocated UUID, so no new UUID is allocated and the UUID of the corresponding resource data is used directly; the last piece matches no operation and maintenance data with an allocated UUID, so a new UUID needs to be allocated (assumed to be 07b2ee4c-5c55-11e9-8647-d663bd873d93). The UUID allocation of each piece of monitoring data may be as shown in Table 4:

TABLE 4

2. Unifying the authority: unified authentication and authority control of the multiple technology domain tools are realized by interfacing with a unified authentication server, for example, a CAS (Central Authentication Service) Server plus LDAP (Lightweight Directory Access Protocol).

It should be noted that the implementation of unified authentication and permission control of the multi-technology domain tool is not limited to the manner of using LDAP, but may also be implemented in other manners, such as a manner of using a local database, and specific implementation thereof is not described herein.

Illustratively, the user system interfaces the CAS Server and the LDAP to complete unified authentication and permission control, wherein the permission control comprises data-level permission control and operation-level permission control.

Because the uniqueness of the UUID ensures the unified identification of the operation and maintenance data, the data-level authority can be distinguished based on the resource UUID.

The operation level authority is function association, and for specific functions of each technical domain tool, the unified operation and maintenance system can perform function mounting association in a menu or button mode.

For example, the security risk processing function is provided by professional security software. The unified operation and maintenance system can mount a security risk menu function and set the authority of the menu for the user, and a click on the menu drills down to the security risk processing page of that technical domain tool. Because the unified operation and maintenance system and each technical domain tool share unified authentication and authority control, single sign-on can be realized, and the corresponding page can be entered directly through the menu function mounted by the unified operation and maintenance system without repeated sign-on authentication.

3. Unified application: through a BSM (Business Service Management) system, the operation and maintenance data of each technical domain is organized from a business view, data support is provided for business health, and the health of each business is calculated by setting different KQI (Key Quality Indicators) models for different applications.

4. Unifying the flow: service flow information records (which can be called service worksheets) of all technical domain tools are connected end to end through the flow architecture of an ITSM (Information Technology Service Management) system, and the unified operation and maintenance requirements of a multi-cloud multi-organization scene are met.

As shown in fig. 2A, in this embodiment, in a main interface (which may be referred to as a unified operation and maintenance interface) of the unified operation and maintenance system, alarm classification may be performed for a characteristic feature of each type of technology domain. For example, the alarm of the application class technology domain, the classification of which may include: application level (access experience, database connections), hardware level (CPU performance, memory overflow), code level (process crash, exception trap), etc.

When the unified operation and maintenance system acquires alarm data from any technical domain tool, the alarm Tag can be determined, the class of the alarm Tag in the corresponding technical domain is determined based on the alarm Tag, and the class is recorded into the corresponding class of the corresponding technical domain.

Illustratively, the Tag of the alarm may be carried in the alarm when the alarm is generated, or may be dynamically generated.

For example, assuming that the alarm data is "×× access time exceeds 2-level threshold 12 seconds", the Tag of the alarm is slow to respond, and the corresponding classification is application access experience under the application class technology domain; therefore, the alarm can be added to "application class → application access experience → slow to respond".

Wherein, clicking the number corresponding to a Tag in a category under a certain technical domain can enter an alarm list; and clicking the alarm in the alarm list to check the specific information of the alarm.

For another example, assume that the alarm data is "×× index exceeds the 2-level threshold, and the value is: ××". The Tag of this alarm depends on the index: if the index is the CPU utilization rate, the Tag can be attributed to CPU performance; if the index is the URL connection time, the Tag belongs to the application access experience.
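The Tag-based classification described above can be sketched as follows. The rule table `TAG_RULES`, the alarm fields `tag` and `index`, and the class `AlarmBoard` are hypothetical; the sketch assumes that a Tag carried in the alarm is used as-is and that an alarm without a Tag gets one generated from the index it refers to.

```python
from collections import defaultdict

# Hypothetical rule table: Tag -> (technology domain, category).
TAG_RULES = {
    "slow to respond": ("application class", "application access experience"),
    "CPU utilization": ("application class", "CPU performance"),
    "URL connection time": ("application class", "application access experience"),
}


def resolve_tag(alarm):
    """Use the Tag carried in the alarm, or generate one from its index."""
    if "tag" in alarm:
        return alarm["tag"]
    return alarm["index"]


class AlarmBoard:
    """Records alarms under domain -> category -> Tag, as on the unified interface."""

    def __init__(self):
        self._alarms = defaultdict(list)  # (domain, category, tag) -> alarms

    def record(self, alarm):
        tag = resolve_tag(alarm)
        domain, category = TAG_RULES[tag]
        self._alarms[(domain, category, tag)].append(alarm)
        return domain, category, tag

    def count(self, domain, category, tag):
        # The number shown next to a Tag; clicking it would open the alarm list.
        return len(self._alarms.get((domain, category, tag), []))


board = AlarmBoard()
path = board.record({"text": "access time exceeds 2-level threshold 12 seconds",
                     "tag": "slow to respond"})
board.record({"text": "index exceeds the 2-level threshold", "index": "CPU utilization"})
print(path, board.count(*path))
```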

In the embodiment, each type of technical domain in the unified operation and maintenance interface corresponds to an operation and maintenance sub-interface, and each operation and maintenance sub-interface can provide labels such as historical statistics, alarm lists, alarm distribution, influence association, trend prediction, knowledge suggestion and the like to realize problem positioning.

The actual contents of the 6 tags are different for different types of technical domains, and different data supports are provided for different operation and maintenance scenes.

For example, each label in the operation and maintenance sub-interface corresponding to the application technology field is taken as an example, where:

1. Historical statistics: the historical statistics tab page can comprise a plurality of widgets (cards); each widget is provided with a Tag and can belong to different categories. In addition, the data of the corresponding application is displayed based on the UUID of the resource data associated with the problem to be located.

For example, the widget may have a plurality of tags, each Tag is provided with a priority, and the display order of the widget in the history tab page is determined based on the priority.

The layout of the widget can be manually adjusted, and a default layout can also be adopted.

For example, referring to fig. 2B, taking the alarm data "×× website access experience is slow" as an example, the core of problem location for this alarm lies in performance analysis of page access: by breaking down the performance parameters of each stage, the stage in which the access is mainly slow can be identified, and detailed data analysis can then be performed for that stage. At the same time, attention is focused on the 3 core indexes, namely Top user behavior (the user behavior with the largest number of occurrences), Top JS error (the JS error with the largest number of occurrences) and Top web request (the web request with the largest number of occurrences), to assist in locating the cause of the problem.

2. Alarm list: and the alarm list tab is used for displaying the alarm to be positioned and the associated alarm of the alarm to be positioned. The association between alarms may be based on the CI relationship of the CMDB system (i.e. alarm association is made based on the UUID of the resource data to which the alarm data relates and the index involved).

For example, the alarm list tab may be linked to the impact association tab implementation, as described below.
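One possible way to derive the associated alarms from CI relationships is sketched below. It assumes that the CI relationships are available as an adjacency mapping between resource UUIDs and that alarms within a fixed number of hops count as associated; the names `related_uuids`, `associated_alarms`, `max_hops` and the short sample identifiers are hypothetical, and filtering by the index involved is omitted for brevity.

```python
from collections import deque


def related_uuids(start, ci_links, max_hops):
    """Breadth-first search over CI relationships, limited to max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in ci_links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen


def associated_alarms(target, alarms, ci_links, max_hops=2):
    """Alarms whose resource UUID is reachable from the target alarm's resource."""
    reachable = related_uuids(target["uuid"], ci_links, max_hops)
    return [a for a in alarms if a is not target and a["uuid"] in reachable]


# Sample topology: an application deployed on one host, using a database on another.
ci_links = {
    "app": {"host-win", "db"},
    "host-win": {"app"},
    "db": {"app", "host-linux"},
    "host-linux": {"db"},
}
target = {"uuid": "app", "text": "access slow"}
alarms = [
    target,
    {"uuid": "host-win", "text": "disk occupancy high"},
    {"uuid": "host-linux", "text": "CPU utilization high"},
    {"uuid": "db", "text": "slow SQL"},
    {"uuid": "printer", "text": "out of paper"},
]
print([a["text"] for a in associated_alarms(target, alarms, ci_links)])
```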

3. Alarm distribution: and the alarm distribution label page is used for displaying historical distribution statistical information of the alarm related to the alarm to be positioned.

In addition, the alarm distribution tab page may also display information such as the number of closed alarms (i.e. alarms that have completed processing or exceeded the processing deadline)/the number of pending alarms/the number of affected users/the longest duration of alarms/dissatisfaction rate.

4. Influence association: the influence association tab is used for showing the topological relation of the resource data, and the schematic diagram can be as shown in fig. 2C.

Illustratively, when any monitoring object in the topological relation displayed in the associated tab is affected, core index data of the monitoring object, such as CPU utilization, memory occupancy, hard disk utilization, and the like, may be displayed.

5. Trend prediction: the trend prediction tab page is used for displaying the trend prediction, determined based on historical data, of indexes (such as CPU utilization rate, memory occupancy rate and the like) associated with the alarm to be located, so as to determine whether the alarm to be located will bring more serious problems.
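The embodiment does not specify how the trend is predicted from historical data. As one simple possibility, the sketch below fits a least-squares straight line to equally spaced historical samples and extrapolates it; the function names `linear_forecast` and `is_sudden_increase` and the `tolerance` parameter are hypothetical. Comparing a current value against the extrapolated trend is one way to tell a gradual increase from a sudden one.

```python
def linear_forecast(samples, steps_ahead=1):
    """Extrapolate a least-squares straight line fitted to equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last sample sits at x = n - 1, so the forecast point is steps_ahead later.
    return intercept + slope * (n - 1 + steps_ahead)


def is_sudden_increase(history, current, tolerance):
    """True when the current value exceeds the predicted trend by more than tolerance."""
    return current - linear_forecast(history) > tolerance


cpu_history = [30, 31, 29, 30, 30]  # illustrative CPU utilization samples, in percent
print(linear_forecast(cpu_history), is_sudden_increase(cpu_history, 96, tolerance=20))
```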

6. Knowledge suggestion: the knowledge suggestion tab page is used for displaying technical data associated with the alarm to be located, and the association between the technical data and the alarm can be realized based on keywords.

For example, a knowledge suggestion tab may provide keyword-based query functionality.
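A keyword-based lookup of technical data could look like the following minimal sketch. The article structure (`title`, `keywords`), the function name `search_knowledge` and the ranking by number of matching keywords are assumptions made for illustration only.

```python
def search_knowledge(articles, query):
    """Return article titles ranked by how many query terms match their keywords."""
    terms = {term.lower() for term in query.split()}
    scored = []
    for article in articles:
        keywords = {k.lower() for k in article["keywords"]}
        hits = len(terms & keywords)
        if hits:
            scored.append((hits, article["title"]))
    # Most matches first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


articles = [
    {"title": "Tuning slow SQL in Oracle", "keywords": ["Oracle", "slow", "SQL"]},
    {"title": "Linux CPU troubleshooting", "keywords": ["Linux", "CPU"]},
    {"title": "Switch port flapping", "keywords": ["switch", "port"]},
]
print(search_knowledge(articles, "oracle slow sql"))
```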

The following alarm data "financial System Access Slow, over 12 seconds" problem location is an example.

1. The unified operation and maintenance system displays the alarm of 'slow financial system access, over 12 seconds' in a real-time alarm station, when a selection instruction for the alarm in the real-time alarm station is detected, the unified operation and maintenance interface is jumped to, the alarm is included under the classification of slow application access experience of the application technology field of the unified operation and maintenance interface, and when the selection instruction for the alarm in the unified operation and maintenance interface is received, the operation and maintenance sub-interface corresponding to the application technology field is jumped to;

the operation and maintenance sub-interface comprises historical statistics, an alarm list, alarm distribution, influence association, trend prediction, knowledge suggestion and other labels.

2. When a selection instruction for the historical statistics tag is received, the system jumps to the historical statistics tab page, in which a widget related to performance analysis of financial system access is displayed. The widget records historical data of financial system access, which can be used to determine whether the slow financial system access is a front-end problem or a back-end problem.

Assume that there is no problem with the front end in this example, but invoking the database returns results slowly.

3. When a selection instruction for the alarm list tag is received, the system jumps to the alarm list tab page, which displays the alarms associated with "financial system access slow, over 12 seconds".

Assume that there are several associated alarms as follows:

"D disk occupancy of Windows exceeds 2-level threshold 95%"

"the CPU utilization rate of Linux exceeds 2-level threshold 95%"

"Oracle Slow SQL"

4. When a selection instruction for the influence association tag is detected, the system jumps to the influence association tab page, and it is determined based on the topology data that the financial system is deployed on Windows and that the Oracle database it connects to is deployed on another Linux host.

Since there is a problem with slow database return and the slow SQL of Oracle in the correlation alarm includes the SQL of the financial system, the SQL of the financial system can be checked to determine if it is a problem with the SQL of the financial system that results in "slow financial system access".

Assuming that the SQL of the financial system does not detect the problem, the information of the key indexes of the Oracle nodes on the associated topology is further detected to determine whether the problem of the Linux where Oracle is located causes 'slow access to the financial system'.

The detection result is assumed that the key index of the Oracle node does not exceed the threshold value, but is close to the threshold value.

5. When a selection instruction for a tag of an alarm distribution is received, jumping to an alarm distribution tab, which displays historical distribution statistical information of alarms related to 'slow access to financial system'.

Assume that the number of financial system access associated alarms is determined to be below a preset threshold based on the historical distribution statistics.

6. When a selection instruction aiming at the trend prediction tag is detected, jumping to a trend prediction tag page, wherein the trend prediction tag page displays key indexes related to a financial system, such as the D disk occupancy rate of Windows and the CPU utilization rate of Linux, so as to determine the prediction trend of the related indexes.

Assuming that the predicted trend does not include an increase in CPU utilization of Linux, it is determined that the CPU utilization of Linux is a sudden increase due to an abnormal cause.

7. The process occupation of the Linux host is detected to determine whether a process with abnormal CPU occupancy exists.

Assume that the CPU occupancy of the "aaa" process is found to exceed a preset threshold.

8. When a selection instruction aiming at the knowledge suggestion tag is detected, jumping to a knowledge suggestion tag page, wherein the knowledge suggestion tag page can display technical data related to the Oracle slow SQL and aaa processes, and the Linux system is recovered based on the data.

9. Assuming that the financial system access recovers after the CPU utilization rate of Linux falls back to a normal range, it is determined that "financial system access slow, over 12 seconds" was caused by the excessively high CPU utilization rate of Linux.

As can be seen from the above description, in the technical scheme provided in the embodiment of the present invention, the operation and maintenance data are obtained from multiple technical domain tools, and the identification information is allocated to the operation and maintenance data according to the key attributes of the obtained operation and maintenance data, so that the operation and maintenance data are unified, and further, according to the operation and maintenance data and the identification information of the operation and maintenance data, a unified operation and maintenance interface is maintained, and problem location is performed based on the unified operation and maintenance interface, so that unified operation and maintenance of multiple technical domain scenes are realized.

Fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention. The electronic device may include a processor 301 and a machine-readable storage medium 302 having machine-executable instructions stored thereon. The processor 301 and the machine-readable storage medium 302 may communicate via a system bus 303. By reading and executing the machine-executable instructions in the machine-readable storage medium 302 corresponding to the problem location control logic, the processor 301 may perform the problem location method described above.

The machine-readable storage medium 302 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.

As shown in fig. 4, the problem location control logic may include, functionally divided:

an obtaining unit 410, configured to obtain operation and maintenance data from a technology domain tool; the technology domain tool comprises a plurality of technology domain tools of different technical fields;

an allocating unit 420, configured to allocate identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data; the identification information of the operation and maintenance data with the same key attribute is the same, and the identification information of the operation and maintenance data with different key attributes is different;

the maintenance unit 430 is configured to maintain a unified operation and maintenance interface according to the operation and maintenance data and the identification information of the operation and maintenance data; the unified operation and maintenance interface records alarms of different technical domains;

and the positioning unit 440 is configured to perform problem positioning based on the unified operation and maintenance interface.

In an optional embodiment, the allocating unit 420 is specifically configured to, when the obtaining unit obtains one piece of operation and maintenance data, query whether there is target operation and maintenance data including the same key attribute according to the key attribute of the operation and maintenance data; the target operation and maintenance data are operation and maintenance data which are distributed with identification information;

In an optional embodiment, the allocating unit 420 is further configured to generate an audit task when the target operation and maintenance data conflicts with the operation and maintenance data, where the audit task is used to prompt that the target operation and maintenance data conflicts with the operation and maintenance data.

In an optional embodiment, the positioning unit 440 is specifically configured to, when detecting a selection instruction for a target alarm in the unified operation and maintenance interface, display a target operation and maintenance sub-interface matched with a target technology domain to which the target alarm belongs;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for a historical statistics tag in a target operation and maintenance sub-interface is detected, display a historical statistics tag page of the target operation and maintenance sub-interface, where monitoring data matched with target identification information is displayed in the historical statistics tag page, and the target identification information is identification information associated with the target alarm;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for an alarm list tag in a target operation and maintenance sub-interface is detected, display an alarm list tag page of the target operation and maintenance sub-interface, where an alarm associated with the target alarm is displayed in the alarm list tag page;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for an alarm distribution tag in a target operation and maintenance sub-interface is detected, display an alarm distribution tag page of the target operation and maintenance sub-interface, where historical distribution statistical information of an alarm related to a target alarm is displayed in the alarm distribution tag page;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for an influence association tag in a target operation and maintenance sub-interface is detected, display an influence association tag page of the target operation and maintenance sub-interface, where the influence association tag page displays association topology information corresponding to a target alarm;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for a trend prediction tag in a target operation and maintenance sub-interface is detected, display a trend prediction tag page of the target operation and maintenance sub-interface, where trend prediction information associated with the target alarm is displayed in the trend prediction tag page;

In an optional embodiment, the positioning unit 440 is specifically configured to, when a selection instruction for a knowledge suggestion tag in a target operation and maintenance sub-interface is detected, display a knowledge suggestion tag page of the target operation and maintenance sub-interface, where technical data associated with a target alarm is displayed in the knowledge suggestion tag page;

and carrying out problem positioning on the target alarm based on the knowledge suggestion tab page.

As shown in fig. 5, the problem location control logic further comprises:

the authority control unit 450 is configured to receive a login request for the unified operation and maintenance interface, where the login request carries login authentication information;

verifying the login authentication information;

when the authentication is passed, allocating role ID for the login requester;

and performing authority control on the login requester according to the recorded corresponding relationship among the role ID, the operation ID and the identification information of the operation and maintenance data.

Embodiments of the present invention also provide a machine-readable storage medium, such as the machine-readable storage medium 302 in fig. 3, comprising machine-executable instructions that are executable by the processor 301 in the electronic device to implement the problem location method described above.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.

According to the embodiment, the operation and maintenance data are obtained from the multiple technical domain tools, the identification information is distributed to the operation and maintenance data according to the key attributes of the obtained operation and maintenance data, the operation and maintenance data are unified, then, a unified operation and maintenance interface is maintained according to the operation and maintenance data and the identification information of the operation and maintenance data, problem location is carried out based on the unified operation and maintenance interface, and unified operation and maintenance of multiple technical domain scenes are achieved.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A problem positioning method is applied to a unified operation and maintenance system, and comprises the following steps:

allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data; the identification information of operation and maintenance data with the same key attribute is the same, and the identification information of operation and maintenance data with different key attributes is different; the key attributes comprise a plurality of attributes which uniquely identify the monitored object, and the identification information corresponds one to one to the attribute groups formed by the attributes;

2. The method according to claim 1, wherein the allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data comprises:

3. The method of claim 2, wherein if there is target operation and maintenance data comprising the same key attributes, the method further comprises:

4. The method of claim 1, wherein performing problem location based on the unified operation and maintenance interface comprises:

when a selection instruction for a target alarm in the unified operation and maintenance interface is detected, displaying a target operation and maintenance sub-interface matched with a target technical domain to which the target alarm belongs;

5. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

when a selection instruction for a historical statistical label in a target operation and maintenance sub-interface is detected, displaying a historical statistical label page of the target operation and maintenance sub-interface, wherein monitoring data matched with target identification information is displayed in the historical statistical label page, and the target identification information is identification information associated with the target alarm;

and carrying out problem positioning on the target alarm based on the historical statistical label page.

6. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

when a selection instruction aiming at an alarm list label in a target operation and maintenance sub-interface is detected, displaying an alarm list label page of the target operation and maintenance sub-interface, wherein the alarm list label page displays an associated alarm related to the target alarm;

7. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

when a selection instruction aiming at an alarm distribution label in a target operation and maintenance sub-interface is detected, displaying an alarm distribution label page of the target operation and maintenance sub-interface, wherein historical distribution statistical information of an alarm related to a target alarm is displayed in the alarm distribution label page;

8. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

9. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

when a selection instruction aiming at a trend prediction label in a target operation and maintenance sub-interface is detected, displaying a trend prediction label page of the target operation and maintenance sub-interface, wherein trend prediction information related to the target alarm is displayed in the trend prediction label page;

10. The method of claim 4, wherein the problem locating the target alarm based on the target operation and maintenance sub-interface comprises:

11. The method of claim 1, further comprising:

verifying the login verification information;

when the authentication is passed, allocating role ID for the login requester;

12. A problem positioning device, applied to a unified operation and maintenance system, the device comprising:

the acquisition unit is used for acquiring operation and maintenance data from the technical domain tool; the technical domain tool comprises a plurality of technical domain tools of different technical domains;

the distribution unit is used for allocating identification information to the operation and maintenance data according to the key attribute of the operation and maintenance data; the identification information of operation and maintenance data with the same key attribute is the same, and the identification information of operation and maintenance data with different key attributes is different; the key attributes comprise a plurality of attributes which uniquely identify the monitored object, and the identification information corresponds one to one to the attribute groups formed by the attributes;

the maintenance unit is used for maintaining a unified operation and maintenance interface according to the operation and maintenance data and the identification information of the operation and maintenance data; the unified operation and maintenance interface records alarms of different technical domains;

13. An electronic device comprising a processor and a machine-readable storage medium having stored thereon machine-readable instructions executable by the processor, the processor being caused by the machine-readable instructions to perform the problem positioning method of any one of claims 1-11.

14. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the problem positioning method of any one of claims 1-11.