EP4070197A1

EP4070197A1 - Device for monitoring a computer network system

Info

Publication number: EP4070197A1
Application number: EP20703010.7A
Authority: EP
Inventors: Jose Manuel NAVARRO CONZALEZ; Dario Rossi
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-01-30
Filing date: 2020-01-30
Publication date: 2022-10-12
Also published as: WO2021151494A1

Abstract

The present disclosure relates to monitoring and anomaly detection in a computer network system. A device for monitoring the computer network system is configured to receive a dataset comprising a set of indicators indicative of a performance of the system, detect an anomaly in the performance based on the indicators, and determine a score for each indicator based on the detected anomaly, the score being indicative of a relationship of the respective indicator with the anomaly. Moreover, the device is configured to obtain an expert factor for each indicator in a subset of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for a previous anomaly, and modify the determined score of each indicator in the subset based on the expert factor.

Description

DEVICE FOR MONITORING A COMPUTER NETWORK SYSTEM

TECHNICAL FIELD

The present disclosure relates to anomaly detection in a computer network system. In particular, embodiments of the invention provide a device for monitoring a computer network system, such as to detect anomalies in the computer network system. Embodiments of the invention also relate to a method for monitoring a computer network system.

BACKGROUND

Due to the fact that networking systems are rapidly expanding, an exponential growth of complexity in networked applications and information services is generated. These large-scale, distributed, networking systems usually comprise a huge variety of components, which work together in a complex and coordinated manner. A central task in running these large scale distributed systems is to automatically monitor the system status, detect anomalies, and diagnose system fault, so as to guarantee stable and high-quality services or outputs.

Anomaly detection relates to the problem of identifying anomalies in a data set, where anomalies correspond to points generated by a different process than the one that normal samples are assumed to be generated from. In most applications, however, statistical anomalies do not always correspond to semantically-meaningful anomalies. For example, in a computer security application, a user may be considered statistically anomalous due to an unusually high amount of copying and printing activity, which in reality has a benign explanation, and hence is not a true anomaly.

Because of this gap between statistics and semantics, an analyst typically investigates the anomalies, in order to decide which ones are likely to be true anomalies and deserve further action. This leads to a lot of time consumed, which experts need to spend on diagnosing detected anomalies.

In particular, given an anomaly, an analyst faces the problem of analysing the data associated with that anomaly, in order to make a judgement about whether it is an anomaly or not. This is rather time consuming. Even when an anomaly is described by just tens of features this the task can be challenging for the expert, especially, when feature interactions are critical to the judgement. In practice, the situation is often much worse, with anomalies being described by thousands of features. In these cases, there is a significant risk that even when the anomaly detector passes a true anomaly to the expert, the expert will not recognize the key properties of the anomaly, due to information overload.

Thus, there is a need for an improved device for monitoring a computer network system, in particular for detecting anomalies in the computer network system.

SUMMARY

In view of the above-mentioned problems and disadvantages, embodiments of the present invention aim to improve the conventional devices for monitoring computer network systems. An object is thereby to provide a device and method for monitoring a computer network system, which allows a more efficient expert analysis of detected anomalies. The device should in this way allow detecting anomalies in the computer network system faster and more reliably.

The object is achieved by the embodiments provided in the enclosed independent claims. Advantageous implementations of the embodiments are further defined in the dependent claims.

According to a first aspect, the disclosure relates to a device for monitoring a computer network system, the device being configured to: receive a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system, detect an anomaly in the performance of the computer network system based on the received set of indicators, determine a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly, wherein the determined score is indicative of a relationship of the respective indicator with the detected anomaly, obtain an expert factor for each indicator in a subset of the set of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system, and modify the determined score of each indicator in the subset of the set of indicators based on the expert factor.

This device provides the advantage that detected anomalies can be analyzed faster and more efficiently, due to the scoring. Given the cost of an expert having to analyze anomalies, this translates into significant cost-saving. Since the time to diagnose anomalies and accordingly to correct faults is reduced, more cases can be tackled in the same amount of time. A computer network system may comprise a plurality of computer network system entities, for instance, computers or routers. An anomaly may be an (unexpected) change or drop in the performance of the computer network system, which may be reflected in the set of indicators. In particular, an anomaly may correspond to a values of an indicator that is different from a “normal” value of the indicator. An expert factor may be a predetermined factor that is associated to a respective indicator. A human expert may have previously determined a level of relevance for one or more indicators with respect to one or more anomalies, which is reflected by the expert factors. Expert factors may be stored in a database and may be obtained by the device from that database.

The set of indicators may comprise one or more indicators. Also the subset of indicators may comprise one or more indicators. Thereby, the subset may comprise all indicators comprised in the set of indicators, or may comprise fewer indicators than comprised in the set of indicators.

In an embodiment, the device is further configured to sort the indicators in the subset of the set of indicators based on a respective modified score and sort the indicators not included in the subset of the set of indicators based on a respective score.

This provides the advantage that the number of indicators that need to be analyzed by experts is significantly reduced, which translates into an increased reliability of the managed system, reduced costs.

In an embodiment, the device is further configured to modify the determined score of each indicator in the subset of the set of indicators based on a weighting factor, wherein the weighting factor is indicative of an adjustable numeric value to be applied to a value of the expert factor of the respective indicator.

This provides the advantage that the device can flexibly accommodate current and past knowledge of the same indicator from different troubleshooting cases, by giving either more importance to the current case (important for new problems) or by trusting more the expert knowledge (very useful for recurring problems), through the tuning of the weighting factor. In an embodiment, the level of relevance of the respective indicator for the at least one previous anomaly comprises the respective indicator being related to the at least one previous anomaly in an expert diagnosis of the computer network system.

In an embodiment, the device is further configured to obtain the expert factor for each indicator in the subset of the set of indicators by querying a database storing an expert diagnosis of the computer network system, and generate a second database based on the expert factor obtained for each indicator in the subset of the set of indicators.

In an embodiment, the device is further configured to update the second database in response to a modification of the expert diagnosis of the computer network system stored in the database.

In an embodiment, each indicator of the set of indicators is indicative of a development of the performance of the computer network system over time.

In an embodiment, the device is configured to sample each indicator of the set of indicators with a same frequency, wherein the indicators of the set of indicators are aligned in time.

In an embodiment, each indicator of the set of indicators comprises an indicator value for each of multiple time slots covered by the dataset.

In an embodiment, the device is configured to determine a duration of the detected anomaly.

In an embodiment, the device is further configured to determine the relationship of an indicator of the set of indicators with the detected anomaly, based on a difference between an average value of the indicator over a duration of the detected anomaly and an average value of the indicator before and/or after the duration of the detected anomaly.

In an embodiment, the device is configured to detect the anomaly by using a machine learning method, in particular an unsupervised machine learning method.

In an embodiment, at least one indicator of the set of indicators is indicative of one or more of: a processing power consumption of the computer network system, a memory consumption in the computer network system, and an amount of traffic routed through the computer network system.

According to a second aspect, the disclosure relates to a method for monitoring a computer network system, the method comprising: receiving a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system, detecting an anomaly in the performance of the computer network system based on the received set of indicators, determining a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly, wherein the determined score is indicative of a relationship of the respective indicator with the detected anomaly, obtaining an expert factor for each indicator in a subset of the set of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system, and modifying the determined score of each indicator in the subset of the set of indicators based on the expert factor.

In an embodiment, the method further comprises sorting the indicators in the subset of the set of indicators based on a respective modified score and sorting the indicators not included in the subset of the set of indicators based on a respective score.

In an embodiment, the method further comprises modifying the determined score of each indicator in the subset of the set of indicators based on a weighting factor, wherein the weighting factor is indicative of an adjustable numeric value to be applied to a value of the expert factor of the respective indicator.

In an embodiment, the level of relevance of the respective indicator for the at least one previous anomaly comprises the respective indicator being related to the at least one previous anomaly in an expert diagnosis of the computer network system.

In an embodiment, the method further comprises: obtaining the expert factor for each indicator in the subset of the set of indicators by querying a database storing an expert diagnosis of the computer network system, and generating a second database based on the expert factor obtained for each indicator in the subset of the set of indicators. The method of the second aspect achieves the same advantages as the device of the first aspect. Moreover, the implementation forms according to the implementation forms of the device of the second aspect achieve the same advantages as the respective implementation forms of the device of the first aspect.

According to a third aspect, the disclosure relates to a computer program comprising a program code for performing the method according to the second aspect or any one of the implementation forms thereof.

According to a fourth aspect, the disclosure relates to a non-transitory storage medium storing executable program code which, when executed by a processor, causes the method according to the second aspect or any of its implementation forms to be performed.

It has to be noted that all devices, elements, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.

BRIEF DESCRIPTION OF DRAWINGS

The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:

FIG. 1 shows a schematic representation of a device for monitoring a computer network system according to an embodiment;

FIG. 2 shows a schematic representation of a working scheme of a device for monitoring a computer network system according to an embodiment; FIG. 3 shows a schematic representation of a working scheme of a device for monitoring a computer network system according to an embodiment; and

FIG. 4 shows a schematic representation of a method for monitoring a computer network system according to an embodiment.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 1 shows a schematic representation of a device 101 for monitoring a computer network system 100 according to an embodiment. In particular, the device 101 is configured to detect anomalies that occur in the computer network system. The computer network system 100 may comprise, as exemplarily shown, computer network system entities 102, 103, 103 and 105.

The device 101 is configured to receive a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system 100. For instance, the set of indicators may be indicative of ta performance of the computer network system entities. The set of indicators may be obtained by the device 101 from the computer network system entities, or from a device collecting data from the computer network system entities.

Moreover, the device 101 is configured to detect an anomaly in the performance of the computer network system 100 based on the received set of indicators. The anomaly may, for instance, relate to one or more of the computer network system entities. Further, the device 101 is configured to determine a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly, wherein the determined score is indicative of a relationship of the respective indicator with the detected anomaly.

Furthermore the device 101 is configured to obtain an expert factor for each indicator in a subset of the set of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system 100, and modify the determined score of each indicator in the subset of the set of indicators based on the expert factor. The device 101 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the device 101 described herein. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device 101 to perform, conduct or initiate the operations or methods described herein.

FIG. 2 shows a schematic representation of a working scheme of the device 101 for monitoring a computer network system 100 according to an embodiment, in particular the device 101 shown in FIG. 1.

In particular, in the following, a summary of steps performed by the device 101 is given, according to an embodiment.

Step 0: the device 101 can be configured to accept as an input the dataset, which may comprise, as the set of indicators, a set of multivariate numeric time series representing the state along time of the managed computer network system 100 (e.g. for a router, its CPU consumption, the traffic it is routing, its memory consumption, etc.). Each of these time series is an indicator of the set of indicators, and is specifically referred to in the following as Key Performance Indicator (KPI). That is an indicator may be a KPI, and the set of indicators may be a set of KPIs. The KPIs may be indicative of a development of the performance of the computer network system 100 over time. Moreover, the KPIs can be sampled with equal frequency, and may be aligned in time. Furthermore, each KPI may comprise a value for each of multiple time slots covered by the dataset.

Mathematically speaking, the dataset can be expressed as a matrix X = *1.2 — , XI _V, ... x_{n w}}, where x_(J- corresponds to the value of the KPI j in the moment at time i. The dataset has a total of n points or timeslots, and a total of v KPIs or features. Step 1: the device 101 may be configured to perform anomaly detection and determine a duration of the detected anomaly. In order to do so, the device 101 can be configured to take X as input and output a vector A = {<¾, a₂, a₃, ... , a_n }, wherein:

Any algorithm that is able to perform the transformation described above would be suitable for this step. In one embodiment, machine learning methods, specifically from the family of unsupervised learning (methods that do not need data that has been explicitly labeled by human experts), may be used. For example, isolation forests, robust random cut forests or local outlier factor can be suitable methods for this step. In the case of using a method whose output is numeric (instead of binary), a specific policy to transform the output into a binary value would be needed. The usage of unsupervised learning methods has the advantage that the need of any human interaction is eliminated this step.

Step 2: the device 101 may be configured to determine a score for each KPI. In one embodiment, the device 101 is configured to take X and A as an input and outputs a vector of tuples:

5 = {{kpi₁,s₁}; {kpi₂,s₂}; ..., {kpi_v,s_v}}, wherein S_j corresponds to a numeric score associated to KPI i and is calculated based on the detected anomalous timeslots in step 1 and the dataset used as input. The intuition behind this step is, having pinpointed on the previous one in which moments in time something anomalous happened in the dataset, to identify which KPIs exhibit the largest changes in these moments when compared to the rest of the time. One possible function to use to calculate this score is the absolute value of the difference between an average value of the KPI over a duration of the detected anomaly and an average value of the KPI before and/or after the duration of the detected anomaly. This is given by: sj = I ^ej - ^rj\

After these scores are obtained, the device 101 may be configured to sort the KPIs, in order to prioritize showing those KPIs that are more correlated with the detected anomaly.

Step 3: the device 101 may be configured to add an expert knowledge. While the two previous steps combined already produce a usable solution, they suffer from a specific pitfall that can reduce the effectiveness of the system: when examining KPIs in order to understand what has happened on their managed environment, the experts look for those that point at the cause of an anomaly, not at the effects (e.g., a large number of launched processes, a cause, can empty the available memory in a router, an effect). Following the anomaly detection and the feature scoring scheme, both causes and effects will be assigned a high score by these processes, which will increase the number of non-useful time series that are shown to the expert before the fault can be diagnosed. It is in the expert’s best interest to prioritize the showing of those KPIs that can help in quickly pinpointing an anomaly’s cause. But establishing causality requires, normally, the ability to interact and perform extensive experiments on a system. In order to overcome this limitation, the device 101 may be configured to exploit information, which should be already available in any organization, in which human experts diagnose network faults: previous diagnoses.

The intuition behind this step is as follows: if, for a certain case with KPIs A, B and C that exhibit similar behavior (and, thus, would be scored similarly in Step 2), only KPI A is labeled as anomalous by a human expert in past cases, it can be assumed that only KPI A is the cause of the actual error and the only KPI that the experts are interested in seeing to correctly diagnose the case. Based on the intuition elucidated above, the device 101 may be configured to create an expert knowledge base (EK), a collection of the decisions of previous experts for the KPIs in the system. Assuming that the computer network system 100 has V possible KPIs, the EK can be expressed as a collection of V tuples of the form:

{KPI name, , wherein K_t corresponds to the expert factor that can be then applied to 5, biasing the scores to tune the KPI sorting and make it more similar to one produced by the experts, producing:

S = {{kpi₁, s₁}; {kpi₂, s₂}; ..., {kpi_v,s_v}} wherein s_t = S_j(l + yK₍), y being a weighting factor, real, positive number that allows for the balancing of the scores, giving more importance to past decisions (y » 1) or to the current dataset (y « 1). The definition of K_t is variable and can be altered or adapted to different situations. Some possibilities for it would be the proportion of times a KPI has appeared on a diagnosed case and has been anomalous or more complex calculations involving conditional Bayesian probabilities depending on the presence or absence of other KPIs or other system variables. As more faults happen and more cases need to be diagnosed, the device 101 can read the second database logs to see which decisions were taken by the experts and can update the different expert factors according to them.

In summary, embodiments of the present invention provide the advantage that the device 101 works in a completely unsupervised way so as to significantly reduce the time spent by network experts diagnosing fault cases. Given the cost of these experts’ time, this translates into more reliable systems, as the time to diagnose and correct them is reduced and more cases can be tackled in the same amount of time.

It should be noted that, although FIG. 2 shows an expert diagnosis step, this is a task that may already be done in any organization that uses human experts for network management, and that no explicit interaction with the system may be required. The device 100 according to an embodiment of the invention may be configured to alter an order in which KPIs are shown. Thus, there is no added workload or learning needed for the experts to do to include it in a managed network.

In the following, some further advantages provided by embodiments of the invention are presented.

The device 101 may flexibly accommodate current and past knowledge of the same KPI from different troubleshooting cases, by giving either more importance to the current case (important for new problems) or by trusting more the expert knowledge (very useful for recurring problems), through the tuning of Moreover, the device 101 can start from an empty EK and is capable to learn over time. Furthermore, the device 101 can directly be plugged into a new network (e.g., when EK is transferred) and start working as intended, as long as there are shared KPIs between the old and new system.

Another advantage is that, the device 101 leverages already existing output from the natural interaction of human expert with the current system and requires no human interaction at all to properly function. Moreover, the device 101 naturally and seamlessly interacts with the user interface by proposing minimally intrusive changes (i.e., altering the ordering in which the KPI are presented to the user) that maximize the gain (i.e., reducing the time it takes to solve the ticket by reducing the number of KPIs to inspect).

The concept and process of blending the results of anomaly detection with expert knowledge in order to decide the order in which KPIs, in the form of time series, can be shown to network experts to diagnose a fault in a managed system, employing only resources already available in this kind of systems and without the need of any human interaction to function. This provides the advantage of reducing the number of KPIs needed to be analyzed by experts, which translates into an increased reliability of the managed system, reduced costs and less reliability on human availability and expertise.

FIG. 3 shows a schematic representation of different modules of the device 101 for monitoring the computer network system 100 according to an embodiment. The different modules may be used for implementing the working scheme shown in FIG. 2.

The device 101 can in particular comprise modules 300, 301, and 302. The role of Anomaly Detection (AD(.)) module 300 is to implement step 1 shown in FIG. 2. In particular, it is configured to find anomalous timeslots, i.e. timeslots in which an anomaly is detected. The device 101 may not be configured to make any assumption about the module AD(.), nor constraints the use of a specific AD(.) function. In one embodiment, unsupervised techniques may be used by the module AD(.) or 300, since they do not require previous training and are general and portable across incidents. However, in other embodiments, the use of supervised techniques (e.g. Long Short-Term Memory (LSTM)) is also possible to detect anomalies. Similarly, the module AD(.) can be configured to leverage further data source including, but not limited to, topology information, configuration parameters, alarm time series, etc.

The role of the Feature Scoring (FS(.)) module 301 is to implement step 2 shown in FIG. 2, and thus to reduce human time involved to solve the ticket by prioritizing human attention to the most anomalous values. The scoring and sorting of module 301 can be implemented with parametric and non-parametric functions (see examples in table below).

Non-parametric functions work well experimentally and may generalize across troubleshooting cases. In one embodiment, the absolute difference of the mean feature score abs(E[FS(X\normal )] — E[FS(X\anomaly)])may be implemented in the FS(.) module 301, wherein FS is the feature score and E denotes the expected value.

The role of Expert Knowledge (EK(.)) module 302 is to support step 3 shown in FIG. 3, for instance, to transparently reuse knowledge, if available, gathered by the computer network system 100 during the past cases. The EK(.) module 302 may be configured to give a simple statistical representation of expert knowledge. The EK(.) module 302 may be activated/deactivated at no additional computational cost. Moreover, the EK(.) module 302 can be configured to learn over past solutions by experts. Moreover, the EK(.) module 302 can be configured to work transparently with sorting, by biasing scores computed by the FS(.) module 301 to also take into account past knowledge. The module EK(.) 302 can be configured to calculate a parametric function that for a feature X modifies the KPI scoring as EK(x ) = FS(X)(1+gR(C)), wherein P(X) is a relevance metric for KPI X gathered over past observation, as determined by previous expert, g is an arbitrary factor, independent from X, that allows to tradeoff between the importance of the current KPI values and the importance of previous expert knowledge. In one embodiment, P(X) is an empirical probability that X has been flagged as anomalous in past cases, and g=1 to equally weight current case statistics and previous observations. P(X) = 0 when the system starts, i.e., EK(X ) = FS(X) in case of no previous observations are available in the system, so that EK(.) is hot-plugged.

The KPI order may be given by the most to least anomalous, based on values of KPI of this case determined by the FS(.) module 301 and EK statistics from past cases (hence most likely causes, according to case under investigation and past cases solved by experts).

Human intervention may be assisted by the FS(.) module 301 and EK(.) module 302, but the human-computer interaction (HCI) in the system can be unmodified, i.e. best settings (EK(X)= FS(X)( 1 +gR(C))) may be used. A HCI interface can be enriched with the ability to prioritize current case (g->1) or past case (g»0). In all the above cases, the time spent by the human in the troubleshooting can be reduced, since the device 101 already “searches” the indicators for the most relevant ones for the detected anomaly. The device 101 does not add any burdens to the human experts: when a ticket is closed, the device 101 may be configured to export the expert knowledge by updating the EK(.) module 302 for the features involved in the case, improving its knowledge over time.

In one embodiment, the computer network system 100 keeps two counters for each feature, tracking the total number O(X) that feature X was observed and the number of times it was tracked as anomalous A(X) by experts and the relevance score is empirically computed as P(X)=A(X)/0(X). In an embodiment, A(X) is increased for features flagged by the expert in his report. In another embodiment, O(X) is increased only by features that expert has analyzed but not flagged in his report. Moreover, human intervention can be seamless. The FS module 301 can be configured to reduce human intervention time even without making use of the EK module 302. Furthermore, the device 101 can be configured to automatically update the EK function at any new ticket. Moreover, as the EK module 302 improves with new tickets, the system learns over time and human intervention is further reduced.

Expert knowledge can be a transferable asset, expert knowledge can initially be absent, or can be seen as “added value”, expert knowledge can easily be combined (e.g. by weighted average).

FIG. 4 shows a schematic representation of a method 400 for monitoring a computer network system 100 according to an embodiment. The method 400 may be performed by the device 101.

The method 400 comprises the following steps. In a step 401, the method 400 receives a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system 100. In a further step 402, the method 400 detects an anomaly in the performance of the computer network system 100 based on the received set of indicators. In a further step 403, the method 400 determines a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly. The determined score is indicative of a relationship of the respective indicator with the detected anomaly. In a further step 404, the method 400 obtains an expert factor for each indicator in a subset of the set of indicators. Each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system 100. In a step 405, the method 400 then modifies the determined score of each indicator in the subset of the set of indicators based on the expert factor.

The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in the mutual different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. A device (101) for monitoring a computer network system (100), the device (101) being configured to: receive a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system (100); detect an anomaly in the performance of the computer network system (100) based on the received set of indicators; determine a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly, wherein the determined score is indicative of a relationship of the respective indicator with the detected anomaly; obtain an expert factor for each indicator in a subset of the set of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system (100); and modify the determined score of each indicator in the subset of the set of indicators based on the expert factor.

2. The device (101) of claim 1, further configured to sort the indicators in the subset of the set of indicators based on a respective modified score and sort the indicators not included in the subset of the set of indicators based on a respective score.

3. The device (101) of claim 1 or 2, further configured to modify the determined score of each indicator in the subset of the set of indicators based on a weighting factor, wherein the weighting factor is indicative of an adjustable numeric value to be applied to a value of the expert factor of the respective indicator.

4. The device (101) of one of claims 1 to 3, wherein the level of relevance of the respective indicator for the at least one previous anomaly comprises the respective indicator being related to the at least one previous anomaly in an expert diagnosis of the computer network system (100).

5. The device (101) of one of claims 1 to 4, configured to: obtain the expert factor for each indicator in the subset of the set of indicators by querying a database storing an expert diagnosis of the computer network system (100); and generate a second database based on the expert factor obtained for each indicator in the subset of the set of indicators.

6. The device (101) of claim 5, further configured to update the second database in response to a modification of the expert diagnosis of the computer network system (100) stored in the database.

7. The device (101) of one of claims 1 to 6, wherein each indicator of the set of indicators is indicative of a development of the performance of the computer network system (100) over time.

8. The device (101) of claim 7, configured to sample each indicator of the set of indicators with a same frequency, wherein the indicators of the set of indicators are aligned in time.

9. The device (101) of claim 7 or 8, wherein each indicator of the set of indicators comprises an indicator value for each of multiple time slots covered by the dataset.

10. The device (101) of one of claims 1 to 9, configured to determine a duration of the detected anomaly.

11. The device (101) of one of claims 1 to 10, further configured to determine the relationship of an indicator of the set of indicators with the detected anomaly, based on a difference between an average value of the indicator over a duration of the detected anomaly and an average value of the indicator before and/or after the duration of the detected anomaly.

12. The device (101) of one of claims 1 to 11, configured to detect the anomaly by using a machine learning method, in particular an unsupervised machine learning method.

13. The device (101) of one of claims 1 to 12, wherein at least one indicator of the set of indicators is indicative of one or more of: a processing power consumption of the computer network system (100), a memory consumption in the computer network system (100), and an amount of traffic routed through the computer network system (100).

14. A method (400) for monitoring a computer network system (100), the method (800) comprising: receiving (401) a dataset comprising a set of indicators, wherein each indicator of the set of indicators is indicative of a performance of the computer network system (100); detecting (402) an anomaly in the performance of the computer network system (100) based on the received set of indicators; determining (403) a score for each indicator in the set of indicators, based on the received set of indicators and the detected anomaly, wherein the determined score is indicative of a relationship of the respective indicator with the detected anomaly; obtaining (404) an expert factor for each indicator in a subset of the set of indicators, wherein each expert factor is indicative of a level of relevance of the respective indicator for at least one previous anomaly in the performance of the computer network system (100); and modifying (405) the determined score of each indicator in the subset of the set of indicators based on the expert factor.

15. The method (400) of claim 14, wherein the method (400) further comprises sorting the indicators in the subset of the set of indicators based on a respective modified score and sorting the indicators not included in the subset of the set of indicators based on a respective score.

16. The method (400) of claim 14 or 15, wherein the method (400) further comprises modifying the determined score of each indicator in the subset of the set of indicators based on a weighting factor, wherein the weighting factor is indicative of an adjustable numeric value to be applied to a value of the expert factor of the respective indicator.

17. The method (400) of any one of the preceding claims 14 to 16, wherein the level of relevance of the respective indicator for the at least one previous anomaly comprises the respective indicator being related to the at least one previous anomaly in an expert diagnosis of the computer network system (100).

18. The method (400) of any one of the preceding claims 14 to 17, wherein the method (400) further comprises: obtaining the expert factor for each indicator in the subset of the set of indicators by querying a database storing an expert diagnosis of the computer network system (100); and generating a second database based on the expert factor obtained for each indicator in the subset of the set of indicators.

19. A computer product comprising a computer program for performing the method (400) of one of claims 14 to 18.