CN111611132A

CN111611132A - Operation and maintenance analysis method, device, equipment and medium for service

Info

Publication number: CN111611132A
Application number: CN202010427840.1A
Authority: CN
Inventors: 王建宏; 张臻; 徐杨
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2020-09-01
Anticipated expiration: 2040-05-20
Also published as: CN111611132B

Abstract

The embodiments of the invention provide a service-oriented operation and maintenance analysis method, apparatus, device, and medium. The method comprises:
- acquiring tracking information of a service tracked by a preset detection script, where the tracking information comprises first service alarm information;
- performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result; and
- displaying the first alarm merging result.

The embodiments allow the service to be tracked automatically based on the preset detection script. This improves information acquisition efficiency, lets problems in service operation be found in time, improves operation and maintenance efficiency, and ensures continuous operation of the service.

Description

Operation and maintenance analysis method, device, equipment and medium for service

Technical Field

The present invention relates to the field of operation and maintenance technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for service-oriented operation and maintenance analysis.

Background

At present, in the operation and maintenance industry, two needs are generally addressed with application performance management and monitoring (APM) products: monitoring and managing service performance, and helping to locate faults quickly.

However, in traditional operation and maintenance, acquiring service operation data is complex and inefficient. It can hardly meet the demand for extracting large amounts of service operation data, which affects operation and maintenance efficiency. In addition, traditional operation and maintenance usually focuses on hardware-layer and middleware-layer indicators. Operation and maintenance personnel are exhausted by processing massive numbers of discrete alarms and work orders every day, and labor costs grow exponentially as service complexity increases.

Disclosure of Invention

The embodiments of the present invention provide a service-oriented operation and maintenance analysis method, apparatus, device, and computer-readable storage medium. They can automatically track a service based on a preset detection script, improve information acquisition efficiency, find problems in service operation in time, improve operation and maintenance efficiency, and ensure continuous operation of the service.

In a first aspect, an embodiment of the present invention provides a service-oriented operation and maintenance analysis method. The method includes:
- acquiring tracking information of a service tracked by a preset detection script, where the tracking information comprises first service alarm information;
- performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result; and
- displaying the first alarm merging result.

In some implementations of the first aspect, obtaining the tracking information of a service tracked by a preset detection script includes:
- injecting the preset detection script when a service request is generated;
- generating tracking chain information based on the preset detection script; and
- tracking the service based on the tracking chain information and recording the tracking information.

In some implementations of the first aspect, the data structure of the preset probe script includes a task, a trace chain identity (ID), and a trace chain.
- The task represents a remote procedure call (RPC).
- The trace chain ID includes a transaction ID, a task ID, and a parent task ID. The transaction ID identifies a service, and the task ID identifies a task. The parent task ID identifies the parent task that generated the task. Together, the task ID and parent task ID represent the parent-child relationship between tasks.
- The trace chain represents a collection of associated tasks, and the tasks in a trace chain share the same transaction ID.

In some implementations of the first aspect, performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result includes:
- performing alarm merging on the first service alarm information separately with each of multiple preset algorithms to obtain a second alarm merging result for each algorithm;
- determining the weight of each algorithm based on its second alarm merging result; and
- combining the algorithms according to their weights to perform alarm merging on the first service alarm information, obtaining the first alarm merging result.

In some implementations of the first aspect, determining the weight of each algorithm based on the second alarm merging result of each algorithm includes:
- acquiring a preset number of pieces of second service alarm information based on the second alarm merging result of each algorithm;
- authenticating the preset number of pieces of second service alarm information to obtain an authentication result;
- counting, based on the authentication result, a first voting result of each algorithm for the preset number of pieces of second service alarm information, together with the total number of first-type alarm information, where the first-type alarm information is the second service alarm information that is successfully authenticated; and
- determining the weight of each algorithm based on the first voting result and the total number of first-type alarm information.

In some implementations of the first aspect, combining the algorithms according to their weights to perform alarm merging on the first service alarm information and obtain a first alarm merging result includes:
- calculating the sum of the weights of all algorithms from the weight of each algorithm;
- counting a second voting result of each algorithm for the first service alarm information; and
- determining the first alarm merging result based on the weight of each algorithm, the sum of the weights of all algorithms, and the second voting result.

In some implementations of the first aspect, the tracking information further includes at least one of service data information, server information, and middleware information. The method further includes monitoring the service according to the tracking information.

In a second aspect, an embodiment of the present invention provides a service-oriented operation and maintenance analysis apparatus. The apparatus includes:
- an acquisition module configured to acquire tracking information of a service tracked by a preset detection script, where the tracking information comprises first service alarm information;
- a merging module configured to perform alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result; and
- a display module configured to display the first alarm merging result.

In some implementations of the second aspect, the acquisition module is specifically configured to:
- inject the preset detection script when a service request is generated;
- generate tracking chain information based on the preset detection script; and
- track the service based on the tracking chain information and record the tracking information.

In some implementations of the second aspect, the data structure of the preset probe script includes a task, a trace chain identity (ID), and a trace chain.
- The task represents a remote procedure call (RPC).
- The trace chain ID includes a transaction ID, a task ID, and a parent task ID. The transaction ID identifies a service, and the task ID identifies a task. The parent task ID identifies the parent task that generated the task. Together, the task ID and parent task ID represent the parent-child relationship between tasks.
- The trace chain represents a collection of associated tasks, and the tasks in a trace chain share the same transaction ID.

In some implementations of the second aspect, the merging module is specifically configured to:
- perform alarm merging on the first service alarm information separately with each of multiple preset algorithms to obtain a second alarm merging result for each algorithm;
- determine the weight of each algorithm based on its second alarm merging result; and
- combine the algorithms according to their weights to perform alarm merging on the first service alarm information, obtaining the first alarm merging result.

In some implementations of the second aspect, the merging module is specifically configured to:
- acquire a preset number of pieces of second service alarm information based on the second alarm merging result of each algorithm;
- authenticate the preset number of pieces of second service alarm information to obtain an authentication result;
- count, based on the authentication result, a first voting result of each algorithm for the preset number of pieces of second service alarm information, together with the total number of first-type alarm information, where the first-type alarm information is the second service alarm information that is successfully authenticated; and
- determine the weight of each algorithm based on the first voting result and the total number of first-type alarm information.

In some implementations of the second aspect, the merging module is specifically configured to:
- calculate the sum of the weights of all algorithms from the weight of each algorithm;
- count a second voting result of each algorithm for the first service alarm information; and
- determine the first alarm merging result based on the weight of each algorithm, the sum of the weights of all algorithms, and the second voting result.

In some implementations of the second aspect, the tracking information further includes at least one of service data information, server information, and middleware information. The apparatus further includes a monitoring module configured to monitor the service according to the tracking information.

In a third aspect, an embodiment of the present invention provides a service-oriented operation and maintenance analysis device. The device includes a processor and a memory storing computer program instructions. When executing the computer program instructions, the processor implements the service-oriented operation and maintenance analysis method according to the first aspect or any implementation of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions. When executed by a processor, the instructions implement the service-oriented operation and maintenance analysis method according to the first aspect or any implementation of the first aspect.

The embodiments of the present invention provide a service-oriented operation and maintenance analysis method, apparatus, device, and computer-readable storage medium. These track the service through the preset detection script and record the tracking information. They then perform alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result, and display that result.

As a result, the service can be tracked automatically without modifying its source code. Service monitoring deployment efficiency and service information acquisition efficiency are improved, problems in the service operation process are discovered in time, and continuous operation of the service is ensured.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a service-oriented operation and maintenance analysis method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of injecting a preset probe script according to an embodiment of the present invention;

Fig. 3 is a flowchart of a trace chain according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of tracing a service according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a service-oriented operation and maintenance analysis apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a service-oriented operation and maintenance analysis device according to an embodiment of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone.

At present, in traditional operation and maintenance (for example, APM), acquiring service operation data is complex and inefficient. It can hardly meet the demand for extracting large amounts of service operation data, which affects operation and maintenance efficiency. Moreover, traditional operation and maintenance provides no effective means of helping users understand the service architecture, quickly detect anomalies, or accurately locate problems.

In view of the above, embodiments of the present invention provide a service-oriented operation and maintenance analysis method, apparatus, device, and computer-readable storage medium. These track a service through a preset detection script and record tracking information. They then perform alarm merging on first service alarm information in the tracking information to obtain a first alarm merging result, and display that result.

In this way, the service can be tracked automatically without modifying its source code. Service monitoring deployment efficiency and service information acquisition efficiency are improved, problems in the service operation process are discovered in time, and continuous operation of the service is ensured.

The service-oriented operation and maintenance analysis method provided by the embodiments of the invention is described below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of a service-oriented operation and maintenance analysis method according to an embodiment of the present invention. As shown in Fig. 1, the service-oriented operation and maintenance analysis method 100 may include S110 to S130.

S110, obtaining the tracking information of the service tracked by the preset detection script.

Specifically, the service system may be monitored as follows. When the service system generates a service request, the preset detection script is injected and tracking chain information is generated based on it. The service is then tracked based on the tracking chain information, and the tracking information of the service is recorded.

The service may be a transaction and the service request a transaction request. Optionally, the service may be a distributed transaction in a distributed system and the service request a distributed transaction request. As one example, the preset probe script may be a data acquisition probe. The tracking information may include first service alarm information, that is, alarm errors that occur during service execution.

The data structure of the preset detection script may include a task, a trace chain ID, and a trace chain. The trace chain information can be understood as the trace chain ID.

The task represents an RPC and is the basic unit of the trace chain. The trace chain ID includes a transaction ID, a task ID, and a parent task ID.
- The transaction ID identifies the service. It can be understood as the ID under which the service sends and receives messages in the distributed system, and it is globally unique.
- The task ID identifies the task. It is the ID of the job that processes a received RPC and is generated when the RPC arrives.
- The parent task ID identifies the task that generated the current task. If task A is the starting point of the program, task A has no parent task. A specified ID (e.g., -2, -1, or 0) may then be used as its parent task ID to indicate that task A is the root task of the trace chain.

Together, the task ID and parent task ID represent the parent-child relationship between two tasks. For example, if a task is a web request, the task ID may be regarded as the ID of the thread that processes the Hypertext Transfer Protocol (HTTP) request. The parent task ID is then the ID of the task that made the RPC call.

The trace chain represents a collection of associated tasks, that is, tasks with the same transaction ID; all tasks in a trace chain share one transaction ID. The trace chain represents the call flow of the entire service, for example the call flow of a distributed transaction. It can be organized as a hierarchical tree through the task IDs and parent task IDs.
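The following is a minimal Java sketch of this data structure, not the patent's implementation. It models a trace chain ID and assembles the tasks that share one transaction ID into the hierarchical tree described above. The class names `TraceChainId`, `TaskNode` and `TraceTreeBuilder` are hypothetical, and -1 is assumed to be the reserved parent task ID of the root task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical trace chain ID: (transaction ID, task ID, parent task ID). */
record TraceChainId(String transactionId, long taskId, long parentTaskId) {
    // Assumption: -1 is the reserved parent task ID that marks the root task.
    static final long ROOT_PARENT = -1;

    boolean isRoot() {
        return parentTaskId == ROOT_PARENT;
    }
}

/** One node of the hierarchical tree built for a single trace chain. */
final class TaskNode {
    final TraceChainId id;
    final List<TaskNode> children = new ArrayList<>();

    TaskNode(TraceChainId id) {
        this.id = id;
    }
}

final class TraceTreeBuilder {
    /** Links every task to its parent via the parent task ID; returns the root (null if none). */
    static TaskNode build(List<TraceChainId> tasks) {
        Map<Long, TaskNode> byTaskId = new HashMap<>();
        for (TraceChainId t : tasks) {
            byTaskId.put(t.taskId(), new TaskNode(t));
        }
        TaskNode root = null;
        String transactionId = tasks.isEmpty() ? null : tasks.get(0).transactionId();
        for (TraceChainId t : tasks) {
            // All tasks in one trace chain must share the same transaction ID.
            if (!t.transactionId().equals(transactionId)) {
                throw new IllegalArgumentException("tasks belong to different trace chains");
            }
            TaskNode node = byTaskId.get(t.taskId());
            if (t.isRoot()) {
                root = node;
            } else {
                TaskNode parent = byTaskId.get(t.parentTaskId());
                if (parent == null) {
                    throw new IllegalArgumentException("missing parent task " + t.parentTaskId());
                }
                parent.children.add(node);
            }
        }
        return root;
    }
}
```

Fed the four IDs of the Fig. 3 example (task IDs 1 to 4), the builder yields a root with one child (task 2), which in turn has two children (tasks 3 and 4).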

In some embodiments, the transaction ID may consist of an agent ID, the server start time, and a sequence number. The server may be implemented as a virtual machine, such as a Java Virtual Machine (JVM).
- The agent ID is created by the user when the server starts. It must be globally unique across the entire server group in which the probe script is installed. A simple way to make it unique is to use the hostname, since hostnames are typically not duplicated. If multiple servers need to run in the server group, a suffix can be appended to the hostname to avoid duplication.
- The server start time guarantees the uniqueness of the sequence number, which starts from zero. It prevents ID collisions when a user mistakenly creates duplicate agent IDs.
- The sequence number is issued by the agent. Optionally, it starts from zero and is incremented in the order in which messages are issued.
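The following is a minimal sketch of this transaction ID layout. It assumes a hypothetical `TransactionIdGenerator` class and the `^` separator seen in the Fig. 3 example. The agent ID and server start time are fixed per server, and the sequence number is an atomic counter starting from zero.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical generator of transaction IDs of the form agentId^serverStartTime^sequence. */
final class TransactionIdGenerator {
    private final String agentId;            // e.g. the hostname, plus a suffix if needed
    private final long serverStartTime;      // captured once when the server (JVM) starts
    private final AtomicLong sequence = new AtomicLong(0); // starts from zero

    TransactionIdGenerator(String agentId, long serverStartTime) {
        // The separator must not appear inside the agent ID, or IDs could not be split apart.
        if (agentId == null || agentId.isEmpty() || agentId.contains("^")) {
            throw new IllegalArgumentException("agentId must be non-empty and must not contain '^'");
        }
        this.agentId = agentId;
        this.serverStartTime = serverStartTime;
    }

    /** Issues the next transaction ID; the sequence number grows in issuance order. */
    String next() {
        return agentId + "^" + serverStartTime + "^" + sequence.getAndIncrement();
    }
}
```

In a real agent, the start time could plausibly come from `ManagementFactory.getRuntimeMXBean().getStartTime()` and the hostname from `InetAddress.getLocalHost().getHostName()`. The embodiment does not state how these values are obtained.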

Fig. 2 is a schematic flowchart of injecting a preset probe script according to an embodiment of the present invention. The preset probe script is based on bytecode instrumentation technology. As shown in Fig. 2, before the script is injected, the service program may be:

    public void helloWorld() {
        System.out.print("hello");
    }

When the service program is loaded, the preset probe script intervenes in it and is injected to track the service. The resulting service program may be:

    public void helloWorld() {
        Interceptor.before();
        System.out.print("hello");
        Interceptor.after();
    }

Here, Interceptor.before() and Interceptor.after() are the calls injected by the preset probe script. They execute before and after the original method body, respectively. The service can therefore be tracked automatically without modifying the service source code, which improves service monitoring deployment efficiency and service information acquisition efficiency.

If the database analysis or the preset probe script runs into problems, the source code does not need to be modified. Only the configuration data related to the preset probe script needs to be deleted, which is simple and convenient. Meanwhile, the data collected about the various RPCs is transmitted asynchronously by a separate thread, so the preset probe script does not interfere with the service thread.
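The following sketch shows, under stated assumptions, what the injected `Interceptor.before()` and `Interceptor.after()` calls might do. They record timing per thread and hand the collected data to a separate sender thread, so the service thread is not blocked.

All of the following are hypothetical:
- the `Interceptor` internals;
- the extra `methodName` argument to `after()`;
- the `COLLECTED` list standing in for the data collection module;
- the `flush()` helper.

Real bytecode injection (for example, through a `java.lang.instrument` agent) is not shown, so the instrumented method is written out by hand.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical interceptor whose calls the probe script would inject around a method body. */
final class Interceptor {
    // Start time of the intercepted call, kept per service thread.
    private static final ThreadLocal<Long> START = new ThreadLocal<>();
    // Single daemon thread that ships collected data, so service threads never block on I/O.
    private static final ExecutorService SENDER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "probe-sender");
        t.setDaemon(true);
        return t;
    });
    // Stand-in for the data collection module; a real agent would send over the network.
    static final List<String> COLLECTED = new CopyOnWriteArrayList<>();

    static void before() {
        START.set(System.nanoTime());
    }

    static void after(String methodName) {
        Long start = START.get();
        START.remove();
        if (start == null) {
            return; // before() was not called on this thread; nothing to record
        }
        long elapsedNanos = System.nanoTime() - start;
        // Hand the record to the sender thread instead of transmitting it inline.
        SENDER.execute(() -> COLLECTED.add(methodName + " took " + elapsedNanos + " ns"));
    }

    /** Stops the sender and waits briefly for queued records (for demos and tests only). */
    static void flush() {
        SENDER.shutdown();
        try {
            SENDER.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class HelloService {
    // The method as it would look after injection; the original body is unchanged.
    public void helloWorld() {
        Interceptor.before();
        try {
            System.out.print("hello");
        } finally {
            Interceptor.after("helloWorld");
        }
    }
}
```

Wrapping the original body in `try`/`finally` is a design choice of this sketch, so that `after()` also runs when the service method throws. The listing in the text does not show how exceptions are handled.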

In some embodiments, the service request may be the first RPC in the trace chain. Based on the preset probe script, trace chain information for the first RPC may be generated when the service request (that is, the first RPC) arrives. For each subsequent RPC, the script intervenes before the RPC is sent and generates trace chain information in it. The service is then traced using this trace chain information. If an arriving RPC already contains trace chain information, it has already been processed by the preset probe script.

Fig. 3 is a flowchart of a trace chain according to an embodiment of the present invention. As shown in Fig. 3, the servers may be JVMs, and each trace chain ID below is generated based on the preset probe script:
- JVM1 receives a service request and obtains trace chain ID1 (transaction ID: JVM1^start time^1, task ID: 1, parent task ID: -1). The running data in JVM1 is recorded based on trace chain ID1.
- RPC1 in JVM1 is intervened in, and trace chain ID2 (transaction ID: JVM1^start time^1, task ID: 2, parent task ID: 1) is generated in RPC1. The running data in JVM2 is recorded based on trace chain ID2.
- RPC2 in JVM2 is intervened in, and trace chain ID3 (transaction ID: JVM1^start time^1, task ID: 3, parent task ID: 2) is generated in RPC2. The running data in JVM3 is recorded based on trace chain ID3.
- RPC2 in JVM2 is likewise intervened in, and trace chain ID4 (transaction ID: JVM1^start time^1, task ID: 4, parent task ID: 2) is generated in RPC2. The running data in JVM4 is recorded based on trace chain ID4.

In this way, the service is tracked.
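The following is a minimal sketch that reproduces the Fig. 3 IDs. It assumes that task IDs are issued sequentially per transaction, as in the figure, and that -1 marks the root. `TraceId` and `TraceChain` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical trace chain ID as in Fig. 3: (transaction ID, task ID, parent task ID). */
record TraceId(String transactionId, long taskId, long parentTaskId) {
    @Override
    public String toString() {
        return "(" + transactionId + ", " + taskId + ", " + parentTaskId + ")";
    }
}

/** Issues trace chain IDs for one transaction; sequential task IDs are an assumption. */
final class TraceChain {
    private final String transactionId;
    private final AtomicLong nextTaskId = new AtomicLong(1);

    TraceChain(String transactionId) {
        this.transactionId = transactionId;
    }

    /** When the service request first arrives: the root task gets parent task ID -1. */
    TraceId root() {
        return new TraceId(transactionId, nextTaskId.getAndIncrement(), -1);
    }

    /** When the probe intervenes in an outgoing RPC: the calling task becomes the parent. */
    TraceId child(TraceId caller) {
        return new TraceId(transactionId, nextTaskId.getAndIncrement(), caller.taskId());
    }
}
```

A single in-process counter only works when all IDs are issued in one process. Across JVM1 to JVM4, each agent would need task IDs that are unique without coordination (for example, random 64-bit values), which this sketch does not attempt.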

Fig. 4 is a schematic flowchart of tracing a service according to an embodiment of the present invention. As shown in Fig. 4, taking the trace chain of a single service as an example, tracing the service based on the preset probe script may include the following steps:

Step 1, when the service request reaches server A, tracking chain ID1 is marked based on the preset detection script (for example, transaction ID: Service A^TIME^1, task ID: 10, parent task ID: -1); then step 2 is executed.

Step 2, recording data A from program A in server A based on tracking chain ID1, wherein data A corresponds to tracking chain ID1, and then executing step 7 and step 3.

Step 3, service request A sent by server A is intervened in and marked with tracking chain ID2 (for example, transaction ID: Service A^TIME^1, task ID: 20, parent task ID: 10), where the service request represents an RPC; then step 4 is executed.

Specifically, tracking chain ID2 is created as a child tracking chain ID of tracking chain ID1 and is placed in the request header of service request A.

Step 4, the marked service request a is transmitted to the server B, and then step 5 is executed.

Specifically, server B checks the request header in the received service request a to obtain the tracking chain ID 2.

Step 5, recording data B from program B in server B based on tracking chain ID2, wherein data B corresponds to tracking chain ID2, and then performing step 6.

Step 6, the service call of server B terminates and service request A from server A is completed. Data B of the tracking chain is sent to the data collection module based on the preset detection script and stored in the database in correspondence with tracking chain ID2. Because tracking chain ID2 is a child of tracking chain ID1, which identifies data A, data B is thereby linked to data A.

Step 7, the service call of server A terminates. Data A of the tracking chain is sent to the data collection module based on the preset detection script and stored in the database in correspondence with tracking chain ID1. It is thereby linked to data B, which is identified by the child tracking chain ID2.

Therefore, the tracking chain ID is generated based on the preset detection script, the operation data of the service can be recorded based on the tracking chain ID, and the tracking information of the service can be further acquired.
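The Fig. 4 steps can be sketched as below, with everything running in one process for illustration. The following are assumptions, not part of the described system:
- the header names (`X-Trace-Transaction-Id` and so on);
- the `Fig4Flow` class;
- the in-memory `DATABASE` list standing in for the data collection module and database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-process walk-through of the Fig. 4 steps. */
final class Fig4Flow {
    // Illustrative header names; the embodiment does not name them.
    static final String TX = "X-Trace-Transaction-Id";
    static final String TASK = "X-Trace-Task-Id";
    static final String PARENT = "X-Trace-Parent-Task-Id";

    record TraceId(String transactionId, long taskId, long parentTaskId) {}

    record StoredData(TraceId traceId, String data) {}

    /** Stand-in for the data collection module and database. */
    static final List<StoredData> DATABASE = new ArrayList<>();

    /** Step 3: mark the outgoing request with the child tracking chain ID in its header. */
    static Map<String, String> markRequest(TraceId child) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TX, child.transactionId());
        headers.put(TASK, Long.toString(child.taskId()));
        headers.put(PARENT, Long.toString(child.parentTaskId()));
        return headers;
    }

    /** Step 4: the receiving server reads the tracking chain ID back from the header. */
    static TraceId readRequest(Map<String, String> headers) {
        return new TraceId(headers.get(TX),
                Long.parseLong(headers.get(TASK)),
                Long.parseLong(headers.get(PARENT)));
    }

    static void run() {
        TraceId id1 = new TraceId("ServiceA^TIME^1", 10, -1);           // Step 1 (server A)
        String dataA = "data A from program A";                         // Step 2
        TraceId id2 = new TraceId(id1.transactionId(), 20, id1.taskId());
        Map<String, String> request = markRequest(id2);                 // Step 3
        TraceId received = readRequest(request);                        // Step 4 (server B)
        String dataB = "data B from program B";                         // Step 5
        DATABASE.add(new StoredData(received, dataB));                  // Step 6: B completes first
        DATABASE.add(new StoredData(id1, dataA));                       // Step 7: then A completes
    }
}
```

Because both stored records carry the same transaction ID and task ID 20 names task 10 as its parent, data A and data B can later be joined into one call tree.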

S120, performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result.

Specifically, alarm merging proceeds in three stages:
- alarm merging is performed on the first service alarm information with each of multiple preset algorithms, giving a second alarm merging result for each algorithm;
- the weight of each algorithm is determined based on its second alarm merging result; and
- the algorithms are combined according to their weights to perform alarm merging on the first service alarm information, giving the first alarm merging result.

As an example, alarm merging may be performed on the first service alarm information with each of a plurality of preset time-series algorithms and a plurality of preset homologous (same-source) analysis algorithms. This merges and optimizes the alarm information and reduces the number of alarms.
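The embodiment does not disclose the internals of the preset time-series and homologous analysis algorithms. The sketch below is purely an illustrative stand-in for one such algorithm. It merges alarms that come from the same source (a crude homologous rule) and whose gap to the previous alarm from that source falls within a fixed window (a crude time-series rule). `Alarm`, `MergedAlarm`, `WindowMerger` and the window parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical raw alarm: its source and when it fired (epoch milliseconds). */
record Alarm(String source, long timestampMillis, String message) {}

/** A merged alarm: the time span it covers and how many raw alarms it absorbed. */
record MergedAlarm(String source, long firstMillis, long lastMillis, int count) {}

final class WindowMerger {
    /**
     * Illustrative rule, not the patent's algorithm: alarms from the same source whose gap
     * to the previous alarm from that source is at most windowMillis are folded together.
     */
    static List<MergedAlarm> merge(List<Alarm> alarms, long windowMillis) {
        Map<String, List<Alarm>> bySource = new LinkedHashMap<>();
        for (Alarm a : alarms) {
            bySource.computeIfAbsent(a.source(), k -> new ArrayList<>()).add(a);
        }
        List<MergedAlarm> result = new ArrayList<>();
        for (List<Alarm> group : bySource.values()) {
            group.sort(Comparator.comparingLong(Alarm::timestampMillis));
            String source = group.get(0).source();
            long first = group.get(0).timestampMillis();
            long last = first;
            int count = 1;
            for (int i = 1; i < group.size(); i++) {
                long t = group.get(i).timestampMillis();
                if (t - last <= windowMillis) {
                    last = t;          // still within the window: absorb the alarm
                    count++;
                } else {
                    result.add(new MergedAlarm(source, first, last, count));
                    first = t;         // gap too large: start a new merged alarm
                    last = t;
                    count = 1;
                }
            }
            result.add(new MergedAlarm(source, first, last, count));
        }
        return result;
    }
}
```

In the ensemble described in the text, several such algorithms would each produce their own merging result, and their votes would then be weighted.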

In some embodiments, a preset number of pieces of second service alarm information may be obtained based on the second alarm merging result of each algorithm and authenticated to obtain an authentication result. Optionally, operation and maintenance personnel may authenticate the second service alarm information by judging whether it is accurate.

Based on the authentication result, two quantities may be counted: the first voting result of each algorithm for the preset number of pieces of second service alarm information, and the total number of first-type alarm information. The weight of each algorithm is determined from these two quantities. The first voting result may comprise the approval or disapproval vote cast by each algorithm for each piece of second service alarm information. The first-type alarm information is the second service alarm information that is successfully authenticated.

Then, the sum of the weights of all algorithms is calculated from the weight of each algorithm, and the second voting result of each algorithm for the first service alarm information is counted. The first alarm merging result is determined based on the weight of each algorithm, the sum of the weights of all algorithms, and the second voting result. The second voting result may comprise the approval or disapproval vote cast by each algorithm for each piece of first service alarm information.

As a specific example, first, 500 pieces of service alarm information that have received manual feedback are obtained. The number can be adjusted flexibly according to actual needs, and if fewer than 500 pieces are available, all available pieces are used. In other words, the service alarm information is authenticated.

Second, the votes cast by each algorithm for the manually fed-back service alarm information are obtained and counted. When an alarm is manually fed back as accurate, the approval votes cast by the algorithms for that alarm are counted. When an alarm is manually fed back as inaccurate, the disapproval votes cast by the algorithms for that alarm are counted.

Third, the weight of each algorithm and the sum of the weights of all algorithms are calculated. The weight of each algorithm may be its accuracy rate, that is, the number of votes counted for the algorithm divided by the total number of actually acquired alarms. Here, the total number of actually acquired alarms is the total number of alarms that manual feedback confirmed as accurate. The sum of the weights of all algorithms is obtained by adding up the weights of the individual algorithms.
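The same calculation can be written as formulas, using notation introduced here for illustration:
- K is the number of algorithms;
- c_i is the number of approval votes counted for algorithm i on alarms that manual feedback confirmed as accurate;
- N is the total number of such confirmed alarms.

The weight of algorithm i and the sum of all weights are then:

```latex
w_i = \frac{c_i}{N}, \qquad W = \sum_{i=1}^{K} w_i
```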

Next, the adjusted algorithm weights are applied. When an algorithm casts a vote of 1 for a data point (that is, it considers the data point abnormal), the vote is multiplied by that algorithm's weight to give the algorithm's weighted vote. This is done for every algorithm.

Then, the detection result of the data point may be the sum of the weighted votes of all algorithms divided by the sum of the weights of all algorithms. A preset threshold is set according to actual needs. If the detection result is greater than the preset threshold, the data point is determined to be an abnormal point.
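The following minimal Java sketch combines the weight calculation and the weighted vote. It assumes that:
- votes are encoded as 0/1 integers;
- the detection result is the sum of weighted votes divided by the sum of weights, compared against a threshold.

The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical weighted-voting combiner for K alarm-merging algorithms.
 * A vote of 1 means the algorithm considers the alarm or data point abnormal.
 */
final class WeightedVoting {
    /**
     * feedbackVotes[i][j] is algorithm i's vote on manually fed-back alarm j.
     * Weight of algorithm i = its approval votes on alarms confirmed accurate / number confirmed.
     */
    static double[] weights(int[][] feedbackVotes, boolean[] confirmedAccurate) {
        int confirmed = 0;
        for (boolean ok : confirmedAccurate) {
            if (ok) confirmed++;
        }
        double[] w = new double[feedbackVotes.length];
        if (confirmed == 0) {
            return w; // no confirmed alarms yet: every weight stays 0
        }
        for (int i = 0; i < feedbackVotes.length; i++) {
            int approvals = 0;
            for (int j = 0; j < confirmedAccurate.length; j++) {
                if (confirmedAccurate[j] && feedbackVotes[i][j] == 1) approvals++;
            }
            w[i] = (double) approvals / confirmed;
        }
        return w;
    }

    /** Detection result of one data point: sum of weighted votes divided by sum of weights. */
    static double score(int[] votesForPoint, double[] w) {
        double weighted = 0;
        double total = Arrays.stream(w).sum();
        for (int i = 0; i < w.length; i++) {
            weighted += votesForPoint[i] * w[i];
        }
        return total == 0 ? 0 : weighted / total;
    }

    /** The data point is abnormal when its detection result exceeds the preset threshold. */
    static boolean isAbnormal(int[] votesForPoint, double[] w, double threshold) {
        return score(votesForPoint, w) > threshold;
    }
}
```

For example, suppose the weights are 1, 2/3 and 0. A point receiving votes (1, 0, 1) then scores 1 / (5/3) = 0.6 and would be flagged with a threshold of 0.5.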

Thus, an alarm feedback mechanism can be introduced continuously on top of two kinds of processing: alarm processing based on time-series algorithms, and associated-alarm processing based on homologous analysis algorithms. The weights of the multiple algorithms are then automatically adjusted and optimized according to manual feedback. This avoids the one-sided view that any single algorithm has of the data characteristics and improves the adaptability of the intelligent analysis model.

S130, displaying the first alarm merging result.

Specifically, the first alarm merging result can be displayed on a visual interface, so that operation and maintenance personnel can monitor the service conveniently.

The service-oriented operation and maintenance analysis method provided by the embodiments of the invention works as follows. The service is tracked through the preset detection script and the tracking information is recorded. Alarm merging is then performed on the first service alarm information in the tracking information to obtain the first alarm merging result, which is displayed.

In this way, the service can be tracked automatically without modifying its source code. Service monitoring deployment efficiency and service information acquisition efficiency are improved, problems in the service operation process are found in time, and continuous operation of the service is ensured.

In some embodiments, the tracking information further comprises at least one of the following: service data information, server information, and middleware information. The server information may be virtual machine information, such as Java Virtual Machine (JVM) information.

In some embodiments, the service may be monitored based on the tracking information. Specifically, the service topology, service performance monitoring data, data report statistics and other information can be derived from the tracking information and displayed on the visual interface. This provides global monitoring at both the system layer and the service layer, visualizes the entire service chain with real-time dynamic refreshing, helps operation and maintenance personnel understand the overall service system architecture more comprehensively, and allows the load and health of the services to be monitored in time.
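As one possible way to derive a service topology from the tracking information, the sketch below aggregates caller-to-callee service edges with call counts and mean latency. The record layout (`TrackingRecord` with `service` and `duration_ms` fields) and the function name `build_topology` are hypothetical; the source does not specify how the topology is computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TrackingRecord:
    # Hypothetical record layout; field names are not taken from the source.
    transaction_id: str
    task_id: str
    parent_task_id: Optional[str]
    service: str          # name of the service that executed the task
    duration_ms: float    # elapsed time of the task


def build_topology(records: List[TrackingRecord]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Aggregate caller -> callee service edges with call count and mean latency."""
    # Index tasks by (transaction ID, task ID) so parents are looked up within one chain.
    by_id = {(r.transaction_id, r.task_id): r for r in records}
    durations: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for r in records:
        if r.parent_task_id is None:
            continue  # root task: no caller inside the traced system
        parent = by_id.get((r.transaction_id, r.parent_task_id))
        if parent is None:
            continue  # parent task was not recorded
        durations[(parent.service, r.service)].append(r.duration_ms)
    return {
        edge: {"calls": len(d), "avg_ms": sum(d) / len(d)}
        for edge, d in durations.items()
    }
```

Each resulting edge could feed the topology view, while `calls` and `avg_ms` serve as simple per-link load and health indicators.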

Fig. 5 is a schematic structural diagram of a service-oriented operation and maintenance analysis apparatus provided in an embodiment of the present invention. As shown in Fig. 5, the service-oriented operation and maintenance analysis apparatus 200 may include an acquisition module 210, a merging module 220, and a display module 230.

The acquisition module 210 is configured to obtain tracking information of a service tracked by a preset detection script, where the tracking information includes first service alarm information. The merging module 220 is configured to perform alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result. The display module 230 is configured to display the first alarm merging result.
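The module split of apparatus 200 can be sketched as three small classes wired into a pipeline. Everything here is hypothetical: the class names, the record format, the `alarm_key` field and the count-based merge rule are placeholders, and the weighted-voting merge of the embodiment would take the place of `MergingModule.merge`.

```python
from typing import Callable, Dict, Iterable, List


class AcquisitionModule:
    """Module 210: fetches tracking information produced by the probe."""

    def __init__(self, source: Callable[[], Iterable[dict]]):
        self._source = source  # hypothetical callable returning tracking records

    def acquire(self) -> List[dict]:
        return list(self._source())


class MergingModule:
    """Module 220: merges the alarm entries found in the tracking information."""

    def merge(self, tracking_info: List[dict]) -> Dict[str, int]:
        # Placeholder rule: group alarms by a hypothetical "alarm_key" and count them.
        merged: Dict[str, int] = {}
        for record in tracking_info:
            key = record.get("alarm_key")
            if key is not None:
                merged[key] = merged.get(key, 0) + 1
        return merged


class DisplayModule:
    """Module 230: renders the first alarm merging result as text lines."""

    def render(self, result: Dict[str, int]) -> str:
        return "\n".join(f"{key}: {count}" for key, count in sorted(result.items()))


class OpsAnalysisApparatus:
    """Apparatus 200: runs acquisition -> merging -> display in order."""

    def __init__(self, acquisition: AcquisitionModule, merging: MergingModule,
                 display: DisplayModule):
        self.acquisition = acquisition
        self.merging = merging
        self.display = display

    def run(self) -> str:
        tracking_info = self.acquisition.acquire()
        result = self.merging.merge(tracking_info)
        return self.display.render(result)
```

Keeping each module behind its own small interface lets a different merge strategy be swapped in without touching acquisition or display.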

In some embodiments, the acquisition module 210 is specifically configured to: inject the preset detection script when a service request is generated, generate tracking chain information based on the preset detection script, track the service based on the tracking chain information, and record the tracking information.

In some embodiments, the data structure of the preset detection script comprises a task, a tracking chain identity (ID), and a tracking chain. The task represents a remote procedure call (RPC). The tracking chain ID includes a transaction ID, a task ID and a parent task ID: the transaction ID identifies a service, the task ID identifies a task, and the parent task ID identifies the parent task that generated the task, so that the task ID and the parent task ID together represent the parent-child relationship between tasks. The tracking chain represents a collection of associated tasks, and all tasks in a tracking chain share the same transaction ID.
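The data structure described above can be modelled roughly as follows. This is a sketch with hypothetical class and field names (`TraceChainID`, `Task.name`, `group_trace_chains`, `children_of`); it shows how a shared transaction ID groups tasks into one tracking chain and how the parent task ID links rebuild the parent-child relationships.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TraceChainID:
    transaction_id: str            # identifies the service being traced
    task_id: str                   # identifies this task
    parent_task_id: Optional[str]  # task that generated this one; None for the root


@dataclass(frozen=True)
class Task:
    name: str          # e.g. the RPC invoked (illustrative field)
    ids: TraceChainID


def group_trace_chains(tasks: List[Task]) -> Dict[str, List[Task]]:
    """A tracking chain is the set of tasks that share one transaction ID."""
    chains: Dict[str, List[Task]] = defaultdict(list)
    for task in tasks:
        chains[task.ids.transaction_id].append(task)
    return dict(chains)


def children_of(chain: List[Task]) -> Dict[Optional[str], List[str]]:
    """Map each parent task ID to the IDs of the tasks it generated."""
    tree: Dict[Optional[str], List[str]] = defaultdict(list)
    for task in chain:
        tree[task.ids.parent_task_id].append(task.ids.task_id)
    return dict(tree)
```

The `None` key of `children_of` holds the root task of a chain, i.e. the entry point of the service request.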

In some embodiments, the merging module 220 is specifically configured to: perform alarm merging on the first service alarm information separately with each of a plurality of preset algorithms to obtain a second alarm merging result for each algorithm; determine the weight of each algorithm based on its second alarm merging result; and combine the algorithms according to their weights to perform alarm merging on the first service alarm information and obtain the first alarm merging result.
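The per-algorithm step can be pictured as running several detectors over the same metric series, each casting a 0/1 vote per data point. The two detectors below (a static threshold and a z-score rule) are simple stand-ins chosen for illustration, not the time-series and homologous analysis algorithms of the embodiment; `run_each` and the `Detector` alias are hypothetical names.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# An "algorithm" here maps a metric series to one 0/1 vote per data point.
Detector = Callable[[Sequence[float]], List[int]]


def static_threshold(limit: float) -> Detector:
    """Vote 1 for any point above a fixed limit."""
    return lambda series: [1 if x > limit else 0 for x in series]


def z_score(k: float = 3.0) -> Detector:
    """Vote 1 for points more than k population standard deviations from the mean."""
    def detect(series: Sequence[float]) -> List[int]:
        mean = statistics.fmean(series)
        std = statistics.pstdev(series)
        if std == 0:
            return [0] * len(series)  # flat series: nothing stands out
        return [1 if abs(x - mean) / std > k else 0 for x in series]
    return detect


def run_each(detectors: Dict[str, Detector], series: Sequence[float]) -> Dict[str, List[int]]:
    """Per-algorithm votes, i.e. each algorithm's own (second) merging result."""
    return {name: detect(series) for name, detect in detectors.items()}
```

Different detectors can disagree on the same point (with the default `k = 3.0`, a single spike in a short series may not be flagged), which is exactly why the embodiment weights and combines them rather than trusting one.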

In some embodiments, the merging module 220 is specifically configured to: acquire a preset number of items of second service alarm information based on the second alarm merging result of each algorithm, and authenticate them to obtain an authentication result; count, based on the authentication result, the first voting result of each algorithm on the preset number of items of second service alarm information and the total number of items of first type alarm information, where first type alarm information is second service alarm information that has been successfully authenticated; and determine the weight of each algorithm based on the first voting results and the total number of items of first type alarm information.
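One plausible reading of the weight rule, stated as an assumption: an algorithm's weight is the fraction of successfully authenticated (first type) alarms in the sample that the algorithm voted as abnormal. The function name `algorithm_weights` and the equal-weight fallback when no alarm is confirmed are illustrative choices, not part of the source.

```python
from typing import Dict, List


def algorithm_weights(votes: Dict[str, List[int]], confirmed: List[bool]) -> Dict[str, float]:
    """Weight each algorithm by the share of confirmed alarms it voted for.

    votes[name][j] is algorithm `name`'s vote (1 = abnormal) on sampled alarm j;
    confirmed[j] is True when manual authentication confirmed alarm j.
    """
    total_confirmed = sum(confirmed)  # total number of first type alarms
    if total_confirmed == 0:
        # Assumed fallback: with no confirmed alarms, keep all algorithms equal.
        return {name: 1.0 for name in votes}
    weights: Dict[str, float] = {}
    for name, vs in votes.items():
        if len(vs) != len(confirmed):
            raise ValueError(f"{name}: expected {len(confirmed)} votes, got {len(vs)}")
        # Count votes of 1 that landed on alarms confirmed as real.
        hits = sum(1 for v, ok in zip(vs, confirmed) if v == 1 and ok)
        weights[name] = hits / total_confirmed
    return weights
```

This rewards algorithms that catch confirmed alarms; a variant could also penalize votes on alarms that failed authentication if false positives are a concern.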

In some embodiments, the merging module 220 is specifically configured to: calculate the sum of the weights of all algorithms from the weight of each algorithm; count the second voting result of each algorithm on the first service alarm information; and determine the first alarm merging result based on the weight of each algorithm, the sum of the weights of all algorithms, and the second voting results.
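The combination step can be sketched as a weighted vote: each algorithm's 0/1 vote is multiplied by its weight, the products are summed and divided by the sum of all weights, and the point is kept as an alarm when the score exceeds a preset threshold. Dividing by the weight sum is an interpretation of how "the sum of the weights of all algorithms" enters the decision; the function names `merge_alarm` and `merge_alarms` are hypothetical.

```python
from typing import Dict, List


def merge_alarm(weights: Dict[str, float], votes: Dict[str, int], threshold: float) -> bool:
    """Return True if the weighted vote score of one point exceeds the threshold.

    score = sum(weight * vote) / sum(weights); a missing vote counts as 0.
    """
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        return False  # no algorithm carries any weight
    score = sum(weights[name] * votes.get(name, 0) for name in weights) / weight_sum
    return score > threshold


def merge_alarms(weights: Dict[str, float], votes_per_point: List[Dict[str, int]],
                 threshold: float) -> List[int]:
    """Indices of the points judged abnormal, i.e. the merged alarm result."""
    return [i for i, votes in enumerate(votes_per_point)
            if merge_alarm(weights, votes, threshold)]
```

Assuming non-negative weights, normalizing by the weight sum keeps the score in [0, 1] regardless of how many algorithms take part, so a single threshold stays meaningful as algorithms are added or reweighted.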

In some embodiments, the tracking information further comprises at least one of the following: service data information, server information, and middleware information. The apparatus 200 further comprises a monitoring module 240, configured to monitor the service according to the tracking information.

The service-oriented operation and maintenance analysis apparatus provided by the embodiment of the invention tracks the service through the preset detection script, records the tracking information, performs alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result, and displays the first alarm merging result. Service tracking is therefore automated without modifying the service source code, which improves the efficiency of deploying service monitoring and of collecting service information, allows problems in service operation to be found in time, and ensures continuous operation of the service.

It can be understood that the service-oriented operation and maintenance analysis apparatus 200 in the embodiment of the present invention may correspond to the executing entity of the service-oriented operation and maintenance analysis method shown in Fig. 1. For specific details of the operations and/or functions of each module/unit of the apparatus 200, refer to the description of the corresponding part of the method in Fig. 1; they are not repeated here for brevity.

Fig. 6 is a schematic hardware structure diagram of a service-oriented operation and maintenance analysis device according to an embodiment of the present invention.

As shown in Fig. 6, the service-oriented operation and maintenance analysis device 300 in this embodiment includes an input device 301, an input interface 302, a central processing unit 303, a memory 304, an output interface 305, and an output device 306. The input interface 302, the central processing unit 303, the memory 304, and the output interface 305 are connected to each other through a bus 310; the input device 301 and the output device 306 are connected to the bus 310 through the input interface 302 and the output interface 305, respectively, and thereby to the other components of the service-oriented operation and maintenance analysis device 300.

Specifically, the input device 301 receives input information from the outside and transmits it to the central processing unit 303 through the input interface 302; the central processing unit 303 processes the input information based on computer-executable instructions stored in the memory 304 to generate output information, stores the output information temporarily or permanently in the memory 304, and then transmits it to the output device 306 through the output interface 305; the output device 306 outputs the output information outside the service-oriented operation and maintenance analysis device 300 for use by the user.

In one embodiment, the service-oriented operation and maintenance analysis device 300 shown in Fig. 6 includes: a memory 304 for storing a program; and a processor 303 configured to run the program stored in the memory to perform the service-oriented operation and maintenance analysis method provided in the embodiment shown in Fig. 1.

An embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions; when executed by a processor, the computer program instructions implement the service-oriented operation and maintenance analysis method provided by the embodiment shown in Fig. 1.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, and the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.

It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

The above describes only specific embodiments of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. It should be understood that the scope of protection of the present invention is not limited thereto; any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims

1. A service-oriented operation and maintenance analysis method is characterized by comprising the following steps:

acquiring tracking information of a service tracked by a preset detection script, wherein the tracking information comprises first service alarm information;

performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result;

and displaying the first alarm merging result.

2. The method according to claim 1, wherein the acquiring of tracking information of a service tracked by a preset detection script comprises:

when a service request is generated, injecting the preset detection script;

generating tracking chain information based on the preset detection script;

and tracking the service based on the tracking chain information, and recording the tracking information.

3. The method according to claim 1 or 2, wherein the data structure of the preset detection script comprises: a task, a tracking chain identity (ID), and a tracking chain; wherein:

the task represents a Remote Procedure Call (RPC);

the tracking chain ID includes a transaction ID, a task ID and a parent task ID, wherein the transaction ID is used for identifying a service, the task ID is used for identifying a task, the parent task ID is used for identifying the parent task that generated the task, and the task ID and the parent task ID are used for representing the parent-child relationship of the task;

the tracking chain represents a collection of associated tasks, the tasks in the tracking chain sharing the same transaction ID.

4. The method according to claim 1, wherein said performing alarm merging on the first service alarm information in the tracking information to obtain a first alarm merging result comprises:

performing alarm merging on the first service alarm information separately based on each of a plurality of preset algorithms to obtain a second alarm merging result of each algorithm;

determining a weight of each algorithm based on the second alarm merging result of each algorithm;

and combining the algorithms based on the weight of each algorithm, and performing alarm merging on the first service alarm information to obtain the first alarm merging result.

5. The method of claim 4, wherein the determining a weight of each algorithm based on the second alarm merging result of each algorithm comprises:

acquiring a preset number of second service alarm information based on the second alarm merging result of each algorithm;

authenticating the preset number of second service alarm information to obtain an authentication result;

counting a first voting result of each algorithm on the preset number of second service alarm information and the total number of first type alarm information based on the authentication result, wherein the first type alarm information is the second service alarm information which is successfully authenticated;

determining the weight of each algorithm based on the first voting result and the total number of the first type alarm information.

6. The method according to claim 4, wherein the combining the algorithms based on the weight of each algorithm and performing alarm merging on the first service alarm information to obtain the first alarm merging result comprises:

calculating the sum of the weights of all algorithms based on the weight of each algorithm;

counting a second voting result of each algorithm on the first service alarm information;

determining the first alarm merging result based on the weight of each algorithm, the sum of the weights of all algorithms, and the second voting result.

7. The method of claim 1, wherein the tracking information further comprises at least one of: service data information, server information, middleware information;

the method further comprises the following steps:

and monitoring the service according to the tracking information.

8. A service-oriented operation and maintenance analysis device, the device comprising:

the acquisition module is used for acquiring tracking information of a service tracked by a preset detection script, wherein the tracking information comprises first service alarm information;

the merging module is used for merging the alarms of the first service alarm information in the tracking information to obtain a first alarm merging result;

and the display module is used for displaying the first alarm merging result.

9. A service-oriented operation and maintenance analysis device, the device comprising: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the service-oriented operation and maintenance analysis method of any one of claims 1-7.

10. A computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, implement the service-oriented operation and maintenance analysis method according to any one of claims 1 to 7.