WO2022179996A1

WO2022179996A1 - Predicting an imminent occurrence of a malfunction on the basis of a log data analysis

Info

Publication number: WO2022179996A1
Application number: PCT/EP2022/054300
Authority: WO
Inventors: Andreas Wilke; Ilya Komarov; Manfred Paeschke; Julia BAUR
Original assignee: Bundesdruckerei Gmbh
Priority date: 2021-02-26
Filing date: 2022-02-22
Publication date: 2022-09-01
Also published as: EP4298521A1; DE102021104735A1

Abstract

The invention relates to a method for analyzing log data (122, 152, 182, 196) of a computer system (100). The method comprises: logging log data (122, 152, 182, 196); upon an occurrence of a malfunction (110), extracting the log data (122, 152, 182, 196) logged within a time interval (Δt) preceding the malfunction (110); determining, by means of a statistical analysis, a characteristic combination of features (112) which comprises one or more characteristic features of the extracted combination of log data (122, 152, 182, 196); storing an allocation (108) of the determined characteristic combination of features (112) to the malfunction (110); and monitoring the logged log data (122, 152, 182, 196), wherein the monitoring, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic combination of features (112), involves predicting an imminent occurrence of the malfunction (110).

Description

PREDICTING AN IMMINENT OCCURRENCE OF A MALFUNCTION BY ANALYZING LOG DATA

The invention relates to a method for analyzing log data, a computer system for analyzing log data and a distributed computer system which comprises a corresponding computer system for analyzing log data as a server.

In the course of increasing digitization, automation and networking in all areas of life and work, the data processing systems used for this purpose are becoming ever more complex and the data volumes to be processed are ever increasing. As a result, the corresponding systems become more error-prone. Malfunctions that negatively affect the performance of the systems can result from an interaction of different influencing factors and are difficult to reproduce, especially if they only occur sporadically. Consequently, error diagnosis and hence troubleshooting are difficult. Nevertheless, the corresponding errors can have far-reaching consequences for the system if they occur.

The object of the invention is to create an improved method for predicting and avoiding malfunctions.

The object on which the invention is based is achieved in each case with the features of the independent patent claims. Embodiments of the invention are specified in the dependent claims.

Embodiments include a method for analyzing log data of a computer system. The method comprises:

• logging of log data, the logging of the log data comprising storing log data in a database, the log data being stored in each case with a time stamp,

• upon the occurrence of a malfunction, extracting the log data logged within a time interval preceding the malfunction,

• determining a characteristic feature combination comprising one or more characteristic features of the extracted combination of log data using statistical analysis,

• storing an assignment of the determined characteristic feature combination to the malfunction,

• monitoring the logged log data, wherein the monitoring, upon logging of a combination of log data that has the stored characteristic feature combination, includes predicting an impending occurrence of the malfunction.
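By way of illustration only, the steps listed above can be sketched as follows. The sketch assumes that each log entry carries a single event identifier and that a feature is "characteristic" if it occurs in every pre-malfunction window and in no reference window. The names LogEntry, extract_window, characteristic_features and predict, as well as the two threshold parameters, are hypothetical and are not taken from the embodiments.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # every entry is stored with a time stamp
    event: str        # assumption: one event identifier per entry


def extract_window(log, t_fault, dt):
    """Events logged in the interval [t_fault - dt, t_fault)."""
    return [e.event for e in log if t_fault - dt <= e.timestamp < t_fault]


def characteristic_features(fault_windows, reference_windows,
                            min_support=1.0, max_reference=0.0):
    """Events frequent before malfunctions and rare in reference windows.

    A simple frequency comparison stands in for the statistical analysis.
    """
    fault_counts = Counter(ev for w in fault_windows for ev in set(w))
    ref_counts = Counter(ev for w in reference_windows for ev in set(w))
    features = set()
    for ev, n in fault_counts.items():
        support = n / len(fault_windows)
        ref_rate = (ref_counts[ev] / len(reference_windows)
                    if reference_windows else 0.0)
        if support >= min_support and ref_rate <= max_reference:
            features.add(ev)
    return frozenset(features)


def predict(recent_events, assignments):
    """Malfunctions whose stored feature combination occurs in recent events."""
    seen = set(recent_events)
    return [fault for feats, fault in assignments.items()
            if feats and feats <= seen]


# Hypothetical log with malfunctions observed at t=10 and t=30.
log = [LogEntry(1, "login"), LogEntry(2, "cache_miss"),
       LogEntry(6, "disk_retry"), LogEntry(7, "login"),
       LogEntry(8, "queue_full"), LogEntry(15, "login"),
       LogEntry(16, "cache_miss"), LogEntry(26, "queue_full"),
       LogEntry(27, "disk_retry"), LogEntry(28, "cache_miss")]
fault_windows = [extract_window(log, 10, 5), extract_window(log, 30, 5)]
reference_windows = [extract_window(log, 5, 5), extract_window(log, 19, 5)]
features = characteristic_features(fault_windows, reference_windows)
assignments = {features: "io_stall"}  # stored assignment
```

In this sketch, monitoring amounts to calling predict on the most recently logged events; a real implementation would read the events from the database and use a more robust statistical test.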

Embodiments can have the advantage that the statistical analysis makes it possible to determine a combination of one or more characteristic features that is characteristic of the occurrence of the malfunction. In other words, it can be determined which features occur before the malfunction but do not otherwise occur and therefore have a high probability of contributing to the causes of the malfunction. In the course of the statistical analysis, statistical methods are used to analyze the log data. For example, additional log data is extracted from other time intervals as reference data, and statistically conspicuous or significant differences are determined between the log data associated with the malfunction and the reference data. For example, outlier detection is used here in order to find a combination of features that deviates from the reference data. As reference data, for example, time intervals are selected whose log data is similar to the log data related in time to the malfunction, but which are not themselves directly related in time to a malfunction. Log data is not directly related in time to a malfunction if no malfunction has occurred during the time interval in which the corresponding log data was logged and within a further predefined time interval thereafter. For example, pattern recognition can be used to select similar log data.

According to embodiments, a plurality of malfunctions are used for the statistical analysis. The malfunctions used are, for example, identical or similar malfunctions. For each of the malfunctions, log data is extracted that was logged within a time interval preceding the malfunction. A plurality of data sets with log data is thus provided, to which pattern recognition can be applied. Here, for example, matches between the data sets with log data are determined. For the determination of the characteristic feature combination, for example, matches are taken into account which are not or only rarely found in reference data records which are not directly related in terms of time to a malfunction.

Log data is automatically recorded data on all or specific actions of processes on a computer system. For example, all actions that are or could be necessary for later analysis are logged. For example, in addition to the logged action, the corresponding log data includes a time stamp with the date and time of the corresponding action. In log data analysis, the log data of a computer system over a certain period of time is examined according to specific criteria.

The log data log errors, warnings and information, for example. Errors are runtime errors that impede the functioning of an application, or unexpected program errors. Serious errors that lead to an application being terminated are also referred to as "fatals". Warnings include, for example, calls to obsolete interfaces, incorrect calls to interfaces, user errors or unfavorable program states. For example, the characteristic combination of features is a characteristic data pattern, such as a characteristic sequence of specific log data.

The characteristic feature combination is assigned to the malfunction and used for further monitoring of the logged log data. In the course of monitoring the log data, subsequently logged log data is checked to see whether the characteristic combination of features occurs, for example in the form of a characteristic sequence of certain log data. If an occurrence of the characteristic combination of features is detected, this can be used as a trigger for predicting an imminent occurrence of the malfunction.

According to embodiments, an occurrence of a characteristic part of the combination of features can be used as a trigger for predicting the imminent occurrence of the malfunction. For example, in the case of a characteristic sequence of specific log data, the beginning of the sequence can be used as a trigger in order to be able to predict as early as possible the occurrence of the malfunction.
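By way of illustration only, triggering on the beginning of a characteristic sequence can be sketched as follows. The sketch assumes that the characteristic feature combination is an ordered sequence of event identifiers and that other events may be interleaved between them. The function names and the min_prefix parameter are hypothetical.

```python
def matched_prefix_length(events, pattern):
    """Length of the longest prefix of `pattern` found in `events`
    as an ordered (not necessarily contiguous) subsequence."""
    i = 0
    for ev in events:
        if i < len(pattern) and ev == pattern[i]:
            i += 1
    return i


def early_trigger(events, pattern, min_prefix):
    """Predict as soon as the first `min_prefix` pattern elements were seen."""
    return matched_prefix_length(events, pattern) >= min_prefix


pattern = ["disk_retry", "queue_full", "timeout"]  # hypothetical sequence
events = ["login", "disk_retry", "login", "queue_full"]
```

With min_prefix set to 2, the prediction fires before the third element of the sequence is logged, at the cost of a higher risk of false alarms.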

For example, the time interval preceding the malfunction, whose logged log data is used to determine the characteristic combination of features, is varied. For example, a time interval immediately preceding the malfunction is used first. This time interval can then, for example, be lengthened or shortened and/or shifted back in time relative to the malfunction until a characteristic combination of features is found which shows a sufficient difference, for example sufficient statistical significance.
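By way of illustration only, varying the time interval can be sketched as a search over candidate intervals. The sketch assumes log entries of the form (timestamp, event), non-empty lists of malfunction times and reference times, and a simple difference of occurrence rates as a stand-in for statistical significance. The names window, rate and search_interval and the parameter min_difference are hypothetical.

```python
def window(log, t_end, length):
    """Set of events logged in [t_end - length, t_end)."""
    return {ev for t, ev in log if t_end - length <= t < t_end}


def rate(windows, ev):
    """Fraction of windows that contain the event."""
    return sum(ev in w for w in windows) / len(windows)


def search_interval(log, fault_times, reference_times, candidates,
                    min_difference):
    """Try (length, offset) pairs in order until features stand out.

    `offset` shifts the interval back in time relative to the malfunction.
    """
    for length, offset in candidates:
        fault_w = [window(log, t - offset, length) for t in fault_times]
        ref_w = [window(log, t - offset, length) for t in reference_times]
        events = set().union(*fault_w)
        features = {ev for ev in events
                    if rate(fault_w, ev) - rate(ref_w, ev) >= min_difference}
        if features:
            return (length, offset), features
    return None, set()


# Hypothetical data: "warn_a" appears 5 to 10 time units before each fault.
log = [(3, "warn_a"), (8, "login"), (13, "warn_a"),
       (18, "login"), (23, "login"), (28, "login")]
result = search_interval(log, fault_times=[10, 20], reference_times=[30],
                         candidates=[(5, 0), (5, 5)], min_difference=0.9)
```

Here the interval immediately preceding the malfunction yields no distinguishing feature, so the search moves on to the interval shifted back by five time units.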

Embodiments may have the advantage of enabling effective prediction of a malfunction occurrence.

A malfunction is understood here to mean a disruption in the intended operation of a computer system. Intended operation is the operation for which the computer system is technically designed and which it achieves under normal conditions. Operating parameters that describe the intended operation or regular operation of the computer system include, for example, performance parameters such as instructions per cycle, instructions per second, floating point operations per second, data transfer rate, data throughput, response time, response rate, frames per second, processor clock, latency or access time. Furthermore, the operating parameters include available software and hardware as well as physical condition parameters such as temperatures of components. Malfunctions can take very different forms depending on the complexity of the system and include, for example, faults such as software or hardware errors, as well as deviations from the intended operating parameters.

According to embodiments, the malfunction is an error event. According to embodiments, the malfunction involves exceeding or falling below a predefined threshold value. For example, the predefined threshold value defines a minimum value for a performance parameter, which should at least be met during regular operation of the computer system. For example, the predefined threshold value defines a maximum value for a load or capacity utilization of the computer system or individual components of the computer system, which should not be exceeded during regular operation of the computer system. For example, the predefined threshold value defines a maximum value for the temperature of the computer system or individual components of the computer system, which should not be exceeded during regular operation of the computer system.
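By way of illustration only, a threshold-based malfunction check can be sketched as follows. The class Threshold, the function detect_malfunctions and all parameter names and limit values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    parameter: str
    minimum: Optional[float] = None  # value that should at least be met
    maximum: Optional[float] = None  # value that should not be exceeded

    def violated_by(self, value):
        if self.minimum is not None and value < self.minimum:
            return True
        if self.maximum is not None and value > self.maximum:
            return True
        return False


def detect_malfunctions(sample, thresholds):
    """Names of the operating parameters whose threshold is violated."""
    return [t.parameter for t in thresholds
            if t.parameter in sample and t.violated_by(sample[t.parameter])]


thresholds = [
    Threshold("throughput_mb_s", minimum=50.0),  # performance parameter
    Threshold("cpu_load", maximum=0.9),          # load or utilization
    Threshold("cpu_temp_c", maximum=85.0),       # temperature
]
sample = {"throughput_mb_s": 42.0, "cpu_load": 0.95, "cpu_temp_c": 70.0}
```

Each violation detected in this way can serve as the malfunction event that triggers the extraction of the preceding log data.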

According to embodiments, the computer system itself performs the log data analysis. According to embodiments, the computer system itself monitors the log data. According to embodiments, the database is a database of the computer system. Embodiments may have the advantage that the computer system itself logs the log data, analyzes it and performs log data monitoring using the analysis results.

According to embodiments, an analysis computer system, i.e. another computer system, performs the log data analysis. This can be the case, for example, if the computer system is a server of a distributed computer system that includes a plurality of servers. For example, one of the servers, acting as an analysis computer system, carries out the log data analysis for one, several and/or all individual computer systems or servers of the distributed computer system. According to embodiments, the database is a database of the computer system. The log data can be logged in the database, for example, locally on the individual servers by the corresponding servers, with the analysis computer system having access to the locally stored data. The log data can also be logged in one or more central databases to which both the computer system or server logging the log data and the analysis computer system have access. According to embodiments, the database is a database of the analysis computer system.

According to embodiments, the computer system itself monitors the log data. For this purpose, for example, the analysis computer system sends the assignment to the computer system. Embodiments can have the advantage that the monitoring can be done locally. This can, for example, enable a timely local prediction of the upcoming malfunction. If necessary, timely local countermeasures can also be initiated in order to prevent or mitigate the malfunction and/or to prevent or mitigate adverse consequences of the malfunction.

The log data can also be monitored by the analysis computer system or independently by the individual servers. To do this, the computer system executing the analysis needs access to the log data to be analyzed. This access can include, for example, access to the database in which the log data is stored. For example, the computer system sends the extracted log data and/or other logged log data to the analysis computer system for log data analysis. According to embodiments, the analysis computer system monitors the log data of the computer system. For this purpose, for example, the computer system sends the log data to the analysis computer system. To do this, the computer system executing the monitoring needs access to the log data to be monitored. This access can include, for example, access to the database in which the log data is stored in the course of logging. For example, the log data to be monitored is sent to the analysis computer system.

Embodiments may have the advantage that a specifically configured analysis computer system can be used to perform the log data analysis. In a distributed computer system, which comprises a plurality of individual computer systems, such as servers, one of the servers can, for example, perform log data analysis for the distributed computer system as an analysis computer system. The analysis computer system can use log data from several or all servers in the system for log data analysis. This can have the advantage, for example, that a statistical analysis is made possible across a number of servers. Furthermore, when monitoring, characteristic combinations of features across multiple servers can be taken into account and used to predict an imminent malfunction.

According to embodiments, the characteristic feature combination comprises characteristic features from extracted combinations of log data from a number of computer systems, such as servers. Such a characteristic combination of features can, for example, be the result of a statistical analysis of log data from a plurality of servers. Thus, for example, correlations between the log data of several servers can be determined and used in the form of the characteristic combination of features to predict imminent malfunctions. Corresponding correlations can be based, for example, on causal relationships between events that occur on different servers. Corresponding correlations can be based, for example, on a causal connection between the malfunction and events occurring on different servers. For example, the malfunction is based on an interaction of the corresponding events. For example, the characteristic combination of features is a characteristic data pattern across multiple servers, such as a characteristic sequence of specific log data that is recorded on different servers.

According to embodiments, the statistical analysis includes determining one or more statistical parameters. For example, the statistical parameters include a mean value, a variance, a standard deviation, a correlation or a measure of connection and/or a frequency, such as an absolute or relative frequency. For example, the arithmetic, the geometric and the quadratic mean can be calculated as the mean value, which represents a characteristic value for the central tendency of a distribution. The variance or its square root, the standard deviation, is a measure of the spread of a distribution or a probability density around its center of gravity. A correlation or a measure of association, such as the covariance, provides a measure of the strength and possibly the direction of a relationship between two statistical variables.
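By way of illustration only, the statistical parameters named above can be computed with the Python standard library as follows. The functions summarize, covariance, correlation and relative_frequency are hypothetical helpers. The population forms of variance and covariance are used here as an assumption, and the geometric mean requires strictly positive values.

```python
import math
import statistics


def summarize(values):
    """Mean values and spread of a list of positive numbers."""
    return {
        "arithmetic_mean": statistics.fmean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "quadratic_mean": math.sqrt(statistics.fmean([v * v for v in values])),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),  # square root of the variance
    }


def covariance(xs, ys):
    """Population covariance of two equally long series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """Pearson correlation: strength and direction of a linear relationship."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


def relative_frequency(events, event):
    """Share of log entries that carry the given event."""
    return events.count(event) / len(events)


stats = summarize([2.0, 8.0])
```

For the two values 2 and 8, the arithmetic mean is 5, the geometric mean is 4 and the quadratic mean is the square root of 34, which shows how the three mean values differ for the same data.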

According to embodiments, a warning is issued in response to the prediction of the impending malfunction. The warning can be issued, for example, on the computer system that performs the log data analysis, on the computer system on which the malfunction is imminent, and/or on several or all individual computer systems of a distributed computer system. For example, the warning is generated by the computer system that performs the log data analysis and sent to one or more other computer systems for output. The output can be visual or acoustic, for example, via an output device of a user interface. For example, the warning is output visually, for example on a display, or acoustically, for example via a loudspeaker. Embodiments may have the advantage of informing users of the upcoming malfunction. Thus, users can be prevented from being surprised by the occurrence of the malfunction. Rather, they may be enabled to take action to prevent and/or mitigate the malfunction. The users can adapt to the malfunction and its consequences if necessary.

According to embodiments, when the malfunction occurs, countermeasures to be taken to avoid the malfunction are determined. An assignment of the determined characteristic feature combination to the countermeasures is stored together with the assignment of the determined characteristic feature combination to the malfunction. In response to the prediction of the imminent malfunction, the countermeasures are automatically carried out. Embodiments can have the advantage that countermeasures to be carried out automatically can be stored. Automated fault compensation or fault elimination or error compensation or error correction can thus be implemented. For example, data streams can be redirected, instructions diverted or their execution delayed. For example, additional capacities can be added and/or processes can be outsourced. For example, execution of instructions can be blocked. For example, execution of certain instructions can be prioritized while execution of other instructions can be deferred.

According to embodiments, the assignment of the countermeasures to be executed is stored, for example, by the computer system. Embodiments can have the advantage that the countermeasures to be carried out are stored locally and are therefore also available locally for immediate execution if required. According to embodiments, the assignment of the countermeasures to be carried out is stored, for example, by the analysis computer system. The analysis computer system sends the countermeasures, for example, to those computer systems which are to carry them out. Embodiments can be particularly advantageous in the case of a distributed computer system with a plurality of servers, since the analysis computer system can, for example, determine server-specific countermeasures using the stored countermeasures and can send them to one or more of the servers for execution. For example, the stored countermeasures include information about which server has to carry out which countermeasures or specify criteria that can be used to determine which server has to carry out which of the countermeasures. In the event of a malfunction in the course of data transmissions between two or more servers, for example, countermeasures can be stored for sending and/or receiving servers, with it being possible to specify which of the countermeasures are to be carried out by sending servers and which countermeasures are to be carried out by receiving servers.

According to embodiments, the countermeasures to be taken are assigned to the malfunction, via which they are indirectly assigned to the specific combination of characteristic features. Embodiments can have the advantage that, for example, different combinations of characteristic features can lead to the same malfunction. However, the malfunction can, for example, make the same countermeasures necessary in each of these cases. For example, using the impending malfunction, the countermeasures to be taken are identified.

According to embodiments, the countermeasures to be taken are directly assigned to the specific combination of characteristic features. For example, different combinations of characteristic features can lead to the same malfunction. The malfunction can have different causes in different cases, for example, which are each characterized by a different combination of features. For example, different causes may require different countermeasures, although without the countermeasures, the different causes each result in the same malfunction. Embodiments can have the advantage that the countermeasures to be carried out can be identified on the basis of the specific combination of characteristic features. Different countermeasures to be carried out can be identified for different characteristic feature combinations, although the different characteristic feature combinations are assigned the same malfunction.
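By way of illustration only, the indirect assignment (via the malfunction) and the direct assignment (to the feature combination) can be sketched as two lookup tables. Combining both in one lookup, with the direct assignment taking precedence, is an assumption of this sketch and is not prescribed by the embodiments. All names and countermeasure labels are hypothetical.

```python
def countermeasures_for(features, fault_by_features, direct, by_fault):
    """Prefer countermeasures assigned directly to the feature combination;
    otherwise fall back to those assigned to the predicted malfunction."""
    if features in direct:
        return direct[features]
    fault = fault_by_features.get(features)
    return by_fault.get(fault, [])


combo_a = frozenset({"disk_retry", "queue_full"})
combo_b = frozenset({"mem_pressure"})

# Two different feature combinations are assigned the same malfunction.
fault_by_features = {combo_a: "io_stall", combo_b: "io_stall"}
by_fault = {"io_stall": ["redirect_data_stream"]}   # indirect assignment
direct = {combo_b: ["restart_cache_service"]}       # direct assignment
```

In this example, both combinations predict the same malfunction, but the second combination indicates a different cause and therefore selects a different countermeasure.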

According to embodiments, the countermeasures to be executed comprise program instructions to be executed. According to embodiments, the countermeasures to be executed comprise program instructions to be executed by the computer system. In the case of a distributed computer system, the countermeasures to be carried out include, for example, program instructions to be carried out by one or more other computer systems or servers of the distributed computer system. Embodiments can have the advantage that, for example, the stored program instructions are called up and executed for the automatic execution of the countermeasures. These program instructions can provide program routines for automated fault compensation or fault elimination or error compensation or error correction. For example, in the course of executing the corresponding program routines, sources of errors are eliminated and/or dependent processes are stopped.

According to embodiments, a first tolerance range is assigned to the features of the characteristic feature combination. A logged combination of log data has the stored characteristic feature combination if it has the features according to the characteristic feature combination and these features lie within the assigned first tolerance ranges. Embodiments can have the advantage that possible deviations or fluctuations within the characteristic combination of features can also be taken into account, which nevertheless lead to the same malfunction. According to embodiments, a second tolerance range is assigned to the features of the characteristic feature combination. It is assumed that a logged combination of log data has the stored characteristic feature combination if it has a predetermined minimum number of features of the characteristic feature combination and these features are each within the assigned second tolerance ranges. Embodiments can have the advantage that, in the course of monitoring the logged log data, an impending occurrence of the malfunction can also be predicted in the event that the logged log data does not have all the features of the characteristic feature combination, i.e. where deviations or fluctuations occur in the set of features itself.

According to embodiments, the first tolerance range is identical to the second tolerance range for the same feature.

According to embodiments, the first tolerance ranges are respectively larger than the second tolerance ranges for one or more features. According to embodiments, the first tolerance range is greater than the second tolerance range for the same feature. Embodiments may have the advantage that, when fewer characteristic features or indicators are detected in the logged log data, more stringent requirements are set for a positive prediction that a malfunction is imminent than when a greater number of features or indicators of the impending occurrence of the malfunction are detected. If more features or indicators of the impending occurrence of the malfunction are detected, larger tolerance ranges can be selected for them, for example. In other words, given a sufficiently large number of features or indicators, it can be assumed that the malfunction is imminent, even if some of the features or indicators deviate more than others.
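By way of illustration only, matching with two tolerance ranges can be sketched as follows. The sketch assumes numeric features with an expected value and symmetric tolerance ranges, with the first (full-match) range larger than the second (partial-match) range. The class Feature, the function has_combination and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    expected: float
    tol_full: float     # first tolerance range (all features present)
    tol_partial: float  # second tolerance range (minimum number present)


def _within(observed, feature, tolerance):
    return (feature.name in observed
            and abs(observed[feature.name] - feature.expected) <= tolerance)


def has_combination(observed, features, min_features):
    """True if all features lie within their first tolerance range, or at
    least `min_features` lie within their (narrower) second range."""
    if all(_within(observed, f, f.tol_full) for f in features):
        return True
    hits = sum(_within(observed, f, f.tol_partial) for f in features)
    return hits >= min_features


features = [
    Feature("error_rate", expected=10.0, tol_full=5.0, tol_partial=2.0),
    Feature("latency_ms", expected=200.0, tol_full=50.0, tol_partial=20.0),
    Feature("retries", expected=3.0, tol_full=2.0, tol_partial=1.0),
]
```

With a minimum of two features, an observation that contains only two of the three features is accepted only if both lie close to their expected values, whereas a complete observation may deviate more.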

According to embodiments, storing the log data includes normalizing the log data. According to embodiments, the normalizing satisfies sixth normal form. Embodiments can have the advantage that redundancies can be avoided. Embodiments can have the advantage that a chronological classification of the log data is taken into account.

According to embodiments, the log data can be stored in the form of relations or equivalent structures. A relation is understood here in the sense of relational database theory as a set of tuples. A tuple is a set of attribute values. An attribute denotes a data type or a property associated with one or more data items. The number of attributes determines the degree, and the number of tuples determines the cardinality, of a relation.

A normalization, in particular a normalization of a relational data model, is understood to mean a division of attributes into a plurality of relations according to a normalization rule, so that redundancies are reduced or minimized. A relational data model can be implemented, for example, in table-like data structures in which the relations are implemented in the form of tables, the attributes in the form of table columns and the tuples in the form of table rows.

Data redundancies entail the risk that changes to data that are included several times can lead to inconsistencies and anomalies. Furthermore, the memory requirement increases unnecessarily due to redundancies. Such redundancies can be reduced or minimized by normalization. A relational data model can be brought into a normal form, for example, by progressively breaking down the relations of the data schema into simpler relations based on the functional dependencies that apply to the corresponding normal form.

For example, the following normal forms can be distinguished: 1st normal form (1NF), 2nd normal form (2NF), 3rd normal form (3NF), Boyce-Codd normal form (BCNF), 4th normal form (4NF), 5th normal form (5NF), 6th normal form (6NF).

The normalization criteria increase from normal form to normal form and include the normalization criteria of the previous normal forms, i.e.

1NF ⊆ 2NF ⊆ 3NF ⊆ BCNF ⊆ 4NF ⊆ 5NF ⊆ 6NF.

A relation is in first normal form if each attribute of the relation has an atomic value range and the relation is free of repeating groups. Here, atomic is understood to mean the exclusion of composite, set-valued or nested, i.e. relation-valued, value ranges for the attributes. Freedom from repeating groups requires that attributes that contain the same or similar information are moved into separate relations.

A relation is in second normal form if it satisfies the requirements of first normal form and no non-primary attribute depends functionally on a proper subset of a candidate key. A non-primary attribute is an attribute that is not part of a candidate key. This means that each non-primary attribute depends on each candidate key as a whole and not just on a part of a key. Relations in first normal form whose candidate keys are not composite, but each consist of a single attribute, therefore automatically satisfy second normal form. A candidate key is understood here to be a minimal set of attributes that uniquely identifies the tuples of a relation.

A relation is in third normal form if it satisfies the requirements of second normal form and no non-key attribute transitively depends on a candidate key. An attribute is transitively dependent on a candidate key if the corresponding attribute depends on the corresponding candidate key via another attribute.

A relation is in Boyce-Codd normal form if it satisfies the requirements of third normal form and every determinant is a super key. A determinant is understood here as a set of attributes on which other attributes are functionally dependent. A determinant thus describes the dependency between attributes of a relation and determines which sets of attributes determine the value of the other attributes. A super key is a set of attributes in a relation that uniquely identifies the tuples in that relation. Consequently, for any two distinct tuples, the attributes of this set always have different values. A candidate key is therefore a minimal subset of the attributes of a super key that still enables the tuples to be identified.

A relation is in fourth normal form if it satisfies the requirements of Boyce-Codd normal form and has no non-trivial multi-valued dependencies.

A relation is in fifth normal form if it satisfies the requirements of fourth normal form and has no multivalued dependencies that are dependent on each other. Fifth normal form is thus given if every non-trivial join dependency is implied by the key candidates. A join dependency is implied by the candidate keys of the source relation if each relation of the set of relations is a super key of the source relation.

A relation is in sixth normal form if it satisfies the requirements of fifth normal form and has no nontrivial join dependencies.

A relation satisfies a join dependency on a plurality of relations if the relation as the initial relation can be broken down into the corresponding set of relations without loss. The join dependency is trivial if one of the relations of the set of relations has all the attributes of the original relation.

According to embodiments, the database is a multi-model database with a multi-model database management system that uses a plurality of data models to store the log data. For example, the log data is stored in a first, document-oriented data model. A document-oriented data model means that the data model does not make any structural specifications for the data to be stored. Rather, the data is stored in documents or data containers in the form in which it was received. In this sense, the data stored in the document-oriented data model is raw data. Raw data means that the data is stored in the form in which it is received, without any additional data processing by the database management system, in particular without any restructuring of the data. Embodiments can have the advantage that the entire information content of the received data can be retained (almost) completely without assumptions of the database management system being introduced. This means that the original data can be accessed at any time and taken into account in further processing. Based on this data pool of raw data, which the document-oriented data model provides, the data is normalized and an index is generated. This index is, for example, a content-based multi-level index structure. This index represents a second data model, which has the sixth normal form, for example. In this way, all fields and field contents can be transferred without redundancy from the first data model to the normalized second data model, which has, for example, the form of a multidimensional key-value store or a multidimensional key-value database.
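By way of illustration only, the combination of a document-oriented first data model and a normalized, index-like second data model can be sketched as follows. The class MultiModelStore is hypothetical and only loosely mirrors the described structure: raw records are kept as received, and each field content is stored once per field in a two-level index that points back to the records. Field values are assumed to be hashable.

```python
from collections import defaultdict


class MultiModelStore:
    def __init__(self):
        # First data model: raw records, stored as received.
        self.documents = {}
        # Second data model: field -> value -> set of record ids.
        self.index = defaultdict(lambda: defaultdict(set))

    def insert(self, doc_id, record):
        self.documents[doc_id] = record
        for field, value in record.items():
            self.index[field][value].add(doc_id)

    def find(self, field, value):
        """Ids of all records whose `field` has the given `value`."""
        return sorted(self.index.get(field, {}).get(value, set()))


store = MultiModelStore()
store.insert(1, {"host": "srv1", "level": "ERROR", "msg": "disk retry"})
store.insert(2, {"host": "srv2", "level": "ERROR", "msg": "queue full"})
```

A query such as store.find("level", "ERROR") is answered from the index alone, while the unmodified records remain available in store.documents for further processing.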

For example, the transaction time and validity time of the data records are also stored bitemporally. The transaction time indicates the point in time at which a data object in the database is changed. The validity time specifies a point in time or a period of time in which a data object in the modeled image of the real world has the described state. If both validity and transaction time are relevant, it is called bitemporal. For each data record, not only the status of the data record at the last transaction or change is visible, but also its history. In this case one speaks of a bitemporal database, in which both the validity and the transaction time of the data records are taken into account.
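By way of illustration only, bitemporal storage can be sketched as a list of versions, each carrying a validity period and a transaction time. The class Version, the function as_of and the half-open validity interval are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: float   # validity time: start of the described state
    valid_to: float     # validity time: end (exclusive)
    recorded_at: float  # transaction time: when the database was changed


def as_of(versions, valid_time, transaction_time):
    """State valid at `valid_time`, as known to the database at
    `transaction_time`; None if nothing was recorded yet."""
    candidates = [v for v in versions
                  if v.recorded_at <= transaction_time
                  and v.valid_from <= valid_time < v.valid_to]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.recorded_at).value


# Hypothetical history: a correction recorded at t=60 states that the
# component was already degraded from t=50 onwards.
history = [
    Version("ok", valid_from=0, valid_to=100, recorded_at=1),
    Version("degraded", valid_from=50, valid_to=100, recorded_at=60),
]
```

Because earlier versions are never overwritten, the same question about validity time 70 yields "ok" when asked as of transaction time 10 and "degraded" when asked as of transaction time 65.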

A key-value data model enables storage, retrieval and management of associative data fields. Values are uniquely identified using a key. Embodiments can have the advantage that the log data can be stored in both data models and made available for analysis.

According to embodiments, the computer system is a first server of a distributed computer system that includes a plurality of servers. Log data is logged on each of the servers. The logged log data is monitored. Embodiments may have the advantage of being able to predict malfunctions on a distributed computing system. For example, the log data can be monitored locally on the individual servers or centrally.

According to embodiments, the specific combination of characteristic features is assigned to the malfunction by the first server.

According to embodiments, the specific combination of characteristic features is assigned to the malfunction by the first server. The first server also monitors the log data logged by the servers of the server group. Upon logging a combination of log data, which has the stored combination of characteristic features, an imminent occurrence of the malfunction is predicted.

The resulting assignment is forwarded from the first server to a server group. The server group includes one or more other servers of the plurality of servers. The servers of the server group each store the forwarded assignment. Monitoring is done locally on the servers in the server group. The monitoring of log data by the servers of the server group includes a prediction of an impending occurrence of the malfunction by the corresponding server in each case after a combination of log data which has the stored characteristic feature combination is logged.
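Forwarding the assignment and monitoring locally could look roughly like the following sketch. All names (`Assignment`, `GroupServer`, `forward`) are hypothetical, network transport is omitted, and the characteristic feature combination is simplified to an unordered set of log data types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Assignment:
    features: Tuple[str, ...]  # characteristic feature combination
    malfunction: str


@dataclass
class GroupServer:
    name: str
    assignments: List[Assignment] = field(default_factory=list)

    def store(self, assignment: Assignment) -> None:
        # Each server of the server group keeps its own copy.
        self.assignments.append(assignment)

    def check(self, recent_log_types: List[str]) -> List[str]:
        """Malfunctions predicted from this server's own recent log data."""
        seen = set(recent_log_types)
        return [a.malfunction for a in self.assignments if set(a.features) <= seen]


def forward(assignment: Assignment, group: List[GroupServer]) -> None:
    """First server: forward the assignment to every server of the group."""
    for server in group:
        server.store(assignment)
```

In this arrangement each server evaluates only its own log data, so no log data has to leave the server for monitoring.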

According to embodiments, the server group includes all servers of the computer system in addition to the first server.

According to embodiments, when the malfunction occurs in one of the servers of the distributed computer system, log data logged by the servers of the server group are extracted within the time interval preceding the malfunction. The characteristic feature combination is determined by the first server using the extracted combination of log data of the server group and using a statistical analysis across the servers of the server group.

The assignment of the specific characteristic feature combination to the malfunction is forwarded to the servers of the server group. The servers of the server group each store the forwarded assignment. The monitoring of log data by the servers of the server group includes a prediction of an impending occurrence of the malfunction by the corresponding server in each case after a combination of log data which has the stored characteristic feature combination is logged.

According to embodiments, the log data analysis is additionally performed using log data from the first server.

According to embodiments, one or more first identifiers are also determined, which include features of one or more servers on which the malfunction occurs, with an assignment of the specific characteristic feature combination to the first identifiers being stored together with the assignment of the specific characteristic feature combination to the malfunction.

Embodiments can have the advantage that, in addition to the characteristic combination of features, identifiers can be determined, based on which the servers of the distributed system on which the malfunction occurs can be determined. The identifiers can be features of the feature combination, for example, which can be used to determine the corresponding server. For example, using the identifiers, those servers can be identified as servers on which the malfunction is imminent which have a specific feature of the feature combination, i.e. on which specific log data has been logged.

According to embodiments, the first identifiers are assigned to the malfunction, via which they are indirectly assigned to the specific combination of characteristic features. Embodiments can have the advantage that, for example, different combinations of characteristic features can lead to the same malfunction. However, in each of these cases, for example, the malfunction can occur with one or more servers that have the same characteristics or are identified by the same identifiers.

According to embodiments, the first identifiers are assigned directly to the specific combination of characteristic features. Embodiments can have the advantage that, for example, different combinations of characteristic features can lead to the same malfunction. The malfunction can, for example, have different causes in different cases, each of which is characterized by a different combination of features. Different causes can lead, for example, to the malfunction occurring in one or more servers which have different characteristics depending on the respective cause.

According to embodiments, in response to the prediction of the impending malfunction, one or more servers which are experiencing the malfunction are determined using the identifiers and an alert is issued for the specific servers, respectively. For example, the servers are determined centrally and the alerts are sent to the appropriate servers for delivery.
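How the identifiers might be used to select the servers to be alerted is sketched below. The function name `servers_to_alert` and the data layout (a mapping from server name to the set of features logged on that server) are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def servers_to_alert(
    identifiers: Set[str],
    logged_features: Dict[str, Set[str]],
) -> List[str]:
    """Servers on which at least one of the identifying features was logged."""
    return sorted(
        server
        for server, features in logged_features.items()
        if identifiers & features
    )
```

For example, with the identifier "D" and servers that logged the features {"A", "D"}, {"A", "B"} and {"C", "D"}, the first and third server are selected.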

According to embodiments, when the malfunction occurs, countermeasures to be taken to avoid the malfunction are determined for one or more servers. One or more second identifiers are determined, which include features of the corresponding servers on which the countermeasures are to be carried out, with an assignment of the specific characteristic feature combination to the countermeasures and the second identifiers being stored together with the assignment of the specific characteristic feature combination to the malfunction. Upon the prediction of the impending malfunction, the countermeasures are automatically executed on the servers identified by the second identifiers.

Embodiments can have the advantage that additional identifiers can be determined, based on which the servers of the distributed system can be determined on which the countermeasures against the malfunction are to be executed. These servers, which at least partially cause the malfunction or contribute to it, differ, for example, from the servers on which the malfunction to be prevented is imminent. For example, the second identifiers therefore differ from the first identifiers. The identifiers can be features of the feature combination, for example, which can be used to determine the corresponding servers on which the countermeasures are to be carried out. For example, the identifiers can be used to identify those servers that have a specific feature of the feature combination, i.e. on which specific log data was logged.

According to embodiments, the countermeasures to be taken and the second identifiers are assigned to the malfunction, via which they are indirectly assigned to the specific combination of characteristic features. Embodiments can have the advantage that, for example, different characteristic combinations of features can lead to the same malfunction. However, in each of these cases, the malfunction can occur, for example, in one or more servers that have the same characteristics.

According to embodiments, the countermeasures to be taken and identifiers are assigned directly to the specific combination of characteristic features. Embodiments can have the advantage that, for example, different combinations of characteristic features can lead to the same malfunction. However, the malfunction can, for example, have different causes in different cases, which are each characterized by a different combination of features. Different causes can lead, for example, to the fact that the malfunction occurs in one or more servers, which have different characteristics depending on the respective cause.

According to embodiments, countermeasures are to be performed on the servers where the malfunction occurs, which is why the second identifiers are identical to the first identifiers. According to embodiments, countermeasures are to be carried out on servers on which no malfunction occurs but which cause or contribute to the malfunction. In this case, the second identifiers differ from the first identifiers, for example.

Embodiments further include a computer system having a processor and a memory, the memory storing program instructions. Execution of the program instructions by the processor causes the processor to control the computer system such that the computer system performs a method for analyzing log data. The method includes:

• monitoring the logged log data, wherein the monitoring, upon logging of a combination of log data which has the stored characteristic combination of features, includes predicting an imminent occurrence of the malfunction.

According to embodiments, the computer system is configured to execute any of the previously described embodiments of the method for analyzing log data.

According to embodiments, the log data is log data from the computer system itself. According to embodiments, the log data is log data from another computer system, which the computer system receives or to which the computer system has access and which the computer system analyzes. For example, the malfunction occurs on the computer system.

For example, the malfunction occurs on another computer system that is connected to the computer system. If an imminent occurrence of the malfunction is predicted, the corresponding prediction and/or a warning is sent to the further computer system.

Embodiments further include a distributed computing system including a plurality of servers. A first server of the plurality of servers is the computer system of one of the previously described embodiments. Log data is logged on each of the servers and the logged log data is monitored.

According to embodiments, the specific characteristic combination of features is assigned to the malfunction by the first server and the assignment is forwarded by the first server to a server group with one or more other servers from the plurality of servers. The servers of the server group each store the forwarded assignment. The monitoring of log data by the servers of the server group includes, in each case upon logging of a combination of log data which has the stored characteristic combination of features, a prediction of an impending occurrence of the malfunction by the corresponding server.

According to embodiments, the specific characteristic feature combination is assigned to the malfunction by the first server, and the log data of the servers of the server group is likewise monitored by the first server. When a combination of log data which has the stored characteristic feature combination is logged, an impending occurrence of the malfunction is predicted by the first server. For example, the prediction is sent to the servers that are about to experience the malfunction.

According to embodiments, when the malfunction occurs in one of the servers of the distributed computer system, log data logged by the servers of the server group are extracted within the time interval preceding the malfunction. The characteristic feature combination is determined by the first server using the extracted combination of log data from the server group. The characteristic feature combination is determined using a statistical analysis across the servers of the server group. The assignment of the specific characteristic combination of features to the malfunction is forwarded to the servers of the server group. The servers of the server group each store the forwarded assignment. The monitoring of log data by the servers of the server group includes a prediction of an impending occurrence of the malfunction by the corresponding server in each case after a combination of log data which has the stored characteristic feature combination is logged.

According to embodiments, when the malfunction occurs in one of the servers of the distributed computer system, log data logged by the servers of the server group within the time interval preceding the malfunction is extracted, for example by the first server. The first server determines the characteristic feature combination using the extracted combination of log data of the server group. The characteristic feature combination is determined using a statistical analysis across the servers of the server group. The assignment of the specific combination of characteristic features to the malfunction is stored by the first server. The monitoring of the log data of the servers of the server group by the first server comprises, in each case upon logging of a combination of log data which has the stored characteristic feature combination, a prediction of an impending occurrence of the malfunction by the first server. For example, the first server has access to the log data logged by the servers in the server group. For example, in the course of logging, the log data is stored in databases to which the first server has access and/or is sent from the servers in the server group to the first server.

A "database" is understood here as a stored set of data. The set of data can be structured, for example according to a structure specified for the database. Furthermore, a "database management system" or data management software can be provided for managing the data in the database. A "database management system" is understood here to mean data management software running on a computer system for storing and retrieving data in a database. For example, the database management system specifies the structure to be used for storing the data. Depending on the data management software used, the data can be stored in different forms or using different structures. For example, the data is stored in data records each consisting of a number of data fields.

A "processor" is understood here and in the following to mean a logic circuit that is used to execute program instructions. The logic circuit can be implemented on one or more discrete components, in particular on a chip. A processor includes, for example, an arithmetic unit, a control unit, registers and data lines for communication with other components. In particular, a "processor" is understood to mean a microprocessor or a microprocessor system made up of a number of processor cores and/or a number of microprocessors.

A "memory" is understood here to mean both volatile and non-volatile electronic memories or digital storage media.

A "non-volatile memory" is understood here as an electronic memory for the permanent storage of data, in particular static cryptographic keys, attributes or identifiers. A non-volatile memory can be configured as a non-modifiable memory, which is also referred to as read-only memory (ROM), or as a modifiable memory, which is also referred to as non-volatile memory (NVM). In particular, this can be an EEPROM, for example a flash EEPROM, referred to as flash for short. A non-volatile memory is characterized in that the data stored on it are retained even after the power supply has been switched off.

An “interface” or “communication interface” is understood here to mean an interface via which data can be received and sent, with the communication interface being able to be configured as contact-based or contactless. A communication interface can, for example, enable communication via a network. Depending on the configuration, a communication interface can, for example, provide wireless communication based on a mobile radio standard, Bluetooth, RFID, WiFi and/or NFC standard. Depending on the configuration, a communication interface can provide cable-based communication, for example.

Communication can take place, for example, via a network. A "network" is understood here to mean any transmission medium with a connection for communication, in particular a local connection or a local network, in particular a local area network (LAN), a private network, in particular an intranet, or a virtual private network (VPN). For example, a computer system can have a standard wireless interface for connecting to a WLAN. The network can also be a public network, such as the Internet. Depending on the embodiment, this connection can also be established via a mobile network.

In the following, embodiments of the invention are explained in more detail with reference to the drawings, in which:

FIG. 1 shows a schematic diagram of a computer system for analyzing log data,

FIG. 2 shows a schematic diagram of a distributed computer system with a server for analyzing log data,

FIG. 3 shows a schematic diagram of a distributed computer system with a server for analyzing log data,

FIGS. 4A to 4C show schematic diagrams of a log data analysis,

FIG. 5 shows a flowchart of an exemplary method for analyzing log data, and

FIG. 6 shows a flowchart of an exemplary method for monitoring log data.

Elements of the following embodiments that correspond to one another are identified with the same reference numbers.

Figure 1 shows a computer system 100 for analyzing log data 122. The computer system 100 includes a processor 102, a memory 106 and a communication interface 118. The processor 102 is configured to execute program instructions 104 in order to control the computer system 100 to analyze log data 122. The computer system 100 logs log data 122. The log data 122 is stored in a database 120. The computer system 100 has access to the database 120. For example, the computer system 100 includes the database 120. For example, the database 120 is an external or remote database. The logged log data 122 can be log data of the computer system 100 and/or log data from one or more other computer systems, such as servers. The log data 122 records, for example, errors, warnings and information which are captured, for example, by an operating system and/or an analysis program of the computer system logging the log data 122. For example, the log data 122 are each logged with a time stamp. Furthermore, the log data 122 may include data collected using one or more sensors 116 of the computer system 100 to monitor the operation of the computer system 100. The sensors 116 may be configured to sense temperatures, voltages or currents, for example.

Upon the occurrence of a malfunction, the computer system 100 extracts from the database 120 those log data which were logged within a time interval Δt preceding the malfunction. The corresponding log data 122 within the time interval Δt is identified, for example, using its time stamps. The computer system 100 determines a characteristic feature combination 112, which comprises one or more characteristic features of the extracted combination of log data. The characteristic feature combination 112 includes, for example, a combination and/or sequence of specific log data from the extracted combination of log data which is characteristic of the extracted combination of log data. These characteristic log data form, for example, the characteristic features of the characteristic feature combination 112. For example, the order or chronological sequence of the characteristic log data can also be characteristic of the extracted combination of log data. For example, the log data of the characteristic combination of features 112 only become characteristic based on their order or chronological sequence. The characteristic feature combination 112 is determined, for example, using a statistical analysis. The statistical analysis can be used, for example, to determine which log data or sequence of log data contained in the extracted combination of log data does not appear in the otherwise logged log data. If log data or a sequence of log data deviates in a statistically significant manner from the log data that has been logged up to now without a malfunction having occurred, and which is therefore to be expected for the regular operation of the computer system 100, there is a high probability of a connection between the deviating log data or the deviating sequence of log data and the malfunction that occurs.
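The extraction by time stamp and the comparison against regular operation might be sketched as follows. The application does not specify the statistical analysis; the frequency ratio with add-one smoothing used here is merely a simple stand-in, and the names `extract_window`, `characteristic_features` and the threshold `min_ratio` are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LogEntry = Tuple[float, str]  # (time stamp in seconds, log data type)


def extract_window(log: Sequence[LogEntry], t_fault: float, dt: float) -> List[str]:
    """Log data types logged within the interval [t_fault - dt, t_fault)."""
    return [kind for ts, kind in log if t_fault - dt <= ts < t_fault]


def characteristic_features(
    window: Sequence[str],
    baseline: Sequence[str],
    min_ratio: float = 5.0,
) -> List[str]:
    """Log data types whose relative frequency in the window exceeds their
    frequency in regular operation by at least the factor `min_ratio`."""
    win, base = Counter(window), Counter(baseline)
    n_kinds = len(set(win) | set(base))
    result = []
    for kind in sorted(win):
        # Add-one smoothing keeps the ratio defined for types never seen before.
        p_win = (win[kind] + 1) / (len(window) + n_kinds)
        p_base = (base[kind] + 1) / (len(baseline) + n_kinds)
        if p_win / p_base >= min_ratio:
            result.append(kind)
    return result
```

A production system would more likely use a proper significance test and take the order of the log data into account; the sketch only shows the principle of contrasting the extracted interval with regular operation.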

The computer system 100 creates an assignment 108 between the specific combination of features 112 and the malfunction 110 that has occurred. The assignment 108 is stored and used as a comparison data set for predicting an impending reoccurrence of the malfunction 110. The assignment 108 can be stored in the memory 106 of the computer system 100 or in the database 120, for example. Further log data 122, which are logged in the database 120, are continuously monitored by the computer system 100 or by another computer system which has the assignment 108 and/or which has access to the assignment 108. When the characteristic combination of features 112 occurs in the logged log data 122, an imminent occurrence of the associated malfunction 110 is predicted. For example, upon the prediction of the impending malfunction, a warning is output via the communication interface 118 of the computer system 100 to other computer systems which are directly affected by the malfunction or indirectly affected when the malfunction occurs on the computer system 100. For example, the other computer system is an admin computer system that is assigned to an administrator of the computer system 100. For example, the warning is output via a user interface of the computer system 100 to an output device of the computer system 100, such as a display.

Furthermore, upon the occurrence of the malfunction 110, countermeasures 114 which are to be carried out in order to avoid or limit the malfunction 110 can be defined and added to the assignment 108. For example, the countermeasures include executable program instructions that are to be executed to avoid or limit the malfunction 110. The countermeasures 114 are carried out automatically, for example by the computer system 100 and/or other computer systems, in response to the prediction of the imminent malfunction 110. The countermeasures include, for example, blocking the execution of expected and potentially problematic instructions, delaying the execution of the corresponding instructions and/or outsourcing the execution of the corresponding instructions to an alternative component of the computer system 100 or to an alternative computer system.
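Storing countermeasures with the assignment and executing them automatically upon a prediction could be sketched as follows. The class `CountermeasureRegistry` and its methods are hypothetical; countermeasures are modeled as plain callables that return a short status text.

```python
from typing import Callable, Dict, List

Countermeasure = Callable[[], str]


class CountermeasureRegistry:
    """Hypothetical store of countermeasures per malfunction."""

    def __init__(self) -> None:
        self._by_malfunction: Dict[str, List[Countermeasure]] = {}

    def define(self, malfunction: str, action: Countermeasure) -> None:
        # Countermeasures are kept in the order in which they were defined.
        self._by_malfunction.setdefault(malfunction, []).append(action)

    def on_prediction(self, malfunction: str) -> List[str]:
        """Run all countermeasures stored for the predicted malfunction."""
        return [action() for action in self._by_malfunction.get(malfunction, [])]
```

Blocking, delaying or outsourcing an instruction would each be one such callable; a malfunction without stored countermeasures simply yields an empty result.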

FIG. 2 shows a distributed computer system 198 with a server 100 for analyzing log data 152, 182. The server 100 is, for example, the computer system from FIG. 1. For example, the computer system 100 itself does not record any log data. The analyzed log data 152, 182 is log data from servers 130, 160 of a server group 190 with a plurality of N servers in the distributed computer system 198, where N is a natural number greater than 1. The servers 130, 160 of the server group 190 each include, for example, a processor 132, 162 for executing program instructions 134, 164, a memory 136, 166, and a communication interface 140, 170. The servers 130, 160 are configured, for example, to log log data 152, 182 in a database 150, 180. To capture the log data 152, 182, the servers 130, 160 can additionally include one or more sensors 138, 168, for example.

The servers 130, 160 of the server group 190 communicate, for example, via a network 192 with one another and with the first server 100. The network is, for example, a public network, such as the Internet, or a private network, such as an intranet and/or an internal communication network of the distributed computer system 198.

If a malfunction occurs on one or more of the servers 130, 160 of the server group 190, an error message is sent to the first server 100, for example. The error message indicates, for example, the type and time of the malfunction that occurred and the server or servers affected by the malfunction. Upon receipt of the error message, the first server 100 requests extraction of the log data from the databases 150, 180 which were logged within a time interval Δt preceding the malfunction. In response to its request, the first server 100 receives the extracted log data and determines a characteristic combination of features 112. For this purpose, the first server 100 applies, for example, a statistical analysis across the servers 130, 160 of the server group 190 or the log data recorded by them. For example, further data sets with log data of the servers 130, 160 of the server group 190 are used for the statistical evaluation in order to determine the log data to be expected when the servers 130, 160 are in regular operation. For example, the information stored is updated regularly. The first server 100 creates, for example, an assignment 108 between the specific characteristic feature combination 112 and the malfunction 110 that has occurred. Furthermore, countermeasures against the malfunction 110 that has occurred can be defined and added to the assignment 108, for example.

The first server 100 sends the assignment 108, for example, to the servers 130, 160 of the server group 190, which use the characteristic combination of features 112 to monitor the log data 152, 182 logged by them. If the characteristic combination of features 112 occurs in the logged log data 152, 182, an impending occurrence of the malfunction 110 is predicted. For example, the server 130, 160 predicting the malfunction 110 sends a warning about the impending malfunction 110 to the other servers in the server group 190 and/or to the first server 100. The server 130, 160 predicting the malfunction 110 also carries out, for example, one or more countermeasures 114 defined by the assignment 108. Additionally, one or more of the servers of the server group 190 receiving the warning and/or the first server 100 may also execute one or more countermeasures 114 defined by the assignment 108.

FIG. 3 shows a distributed computer system 198 with a server 100 for analyzing log data 152, 182, the structure and function of which is analogous to the distributed computer system 198 of FIG. 2. The difference from the distributed computer system 198 in FIG. 2 is that the log data 152, 182 of the servers 130, 160 of the server group 190 are stored in a central database 194, to which the first server 100, for example, has access. Upon receipt of an error message from one of the servers 130, 160 of the server group 190, the first server 100 can therefore extract from the central database 194 the log data of the individual servers 130, 160 which were logged within a time interval Δt preceding the malfunction. The first server 100 determines the characteristic combination of features 112 using the extracted log data and creates the assignment 108 between the characteristic combination of features 112, the malfunction 110 and, if necessary, countermeasures 114 against the malfunction 110. Furthermore, for example, the first server 100 monitors the log data 152, 182 logged in the central database 194. If the characteristic combination of features 112 occurs in the logged log data 152, 182, an impending occurrence of the malfunction 110 is predicted by the first server 100. For example, the first server 100 sends a warning about the impending malfunction 110 to the servers 130, 160 of the server group 190. Furthermore, the first server 100 causes, for example, one or more countermeasures 114 defined by the assignment 108 to be carried out by one or more servers 130, 160 of the server group 190 and/or by the server 100.

Figures 4A through 4C show an exemplary log data analysis. The upper diagram of FIG. 4A shows a chronological sequence of logged log data 196 of types "A", "B", "C" and "D". For example, time is plotted on the x-axis, while the types of log data are plotted on the y-axis. For example, a sequence "BABADCDBA" is logged, following which the occurrence of a malfunction 110 at time ts is recorded or logged. The log data logged within the time interval Δt preceding the malfunction 110 is extracted. The extracted log data of the time interval Δt are shown as an example in FIG. 4. For example, the log data of type "A", "B" is log data that occurs or is logged frequently without a malfunction occurring. FIG. 4C shows an exemplary sequence of log data 196 of type "A", "B", such as frequently occurs in the logged log data within a time interval Δt. This frequently occurring sequence of log data is thus, for example, not characteristic of the extracted log data. Rather, the remaining sequence of log data of type "D", "C" is characteristic of the extracted log data. As shown in FIG. 4A, this sequence is determined as a characteristic combination of features 112 with the sequence "DCD". If logged log data has this characteristic combination of features 112, a malfunction 110 may be imminent. For example, countermeasures can be assigned to the characteristic feature combination 112. For example, it can be specified that an impending occurrence of the malfunction 110 is already predicted when a log data sequence "DC" is present, and that the countermeasures block, delay and/or outsource to another system component the execution of the action identified by the log data of type "D".
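The example with the sequence "BABADCDBA" can be retraced with a small sketch. It assumes, purely for illustration, that the extracted interval covers the entries "DCDBA" and that regular operation consists only of types "A" and "B"; the function names are hypothetical.

```python
def characteristic_sequence(extracted: str, regular: str) -> str:
    """Drop every log data type that also occurs in regular operation and
    keep the remaining types in their logged order."""
    frequent = set(regular)
    # Note: remaining entries are joined even if they were not adjacent.
    return "".join(kind for kind in extracted if kind not in frequent)


def predicts_malfunction(recent: str, pattern: str) -> bool:
    """True if the recently logged sequence contains the stored pattern."""
    return pattern in recent


pattern = characteristic_sequence("DCDBA", "BABA")  # yields "DCD"
early_pattern = pattern[:2]                          # "DC" for an earlier prediction
```

Using the shortened pattern "DC" corresponds to predicting the malfunction before the second entry of type "D" is logged, which leaves time to block, delay or outsource the corresponding action.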

Figure 5 shows an exemplary method for analyzing log data. In block 200, log data is logged. In block 202, a malfunction is detected, upon the detection of which, in block 204, log data which was logged within a time interval Δt preceding the malfunction is extracted from the logged log data. In block 206, a feature combination characteristic of the occurrence of the malfunction is determined in the extracted log data, and in block 208 an assignment of the characteristic feature combination to the malfunction detected in block 202 is created. A statistical analysis, for example, is used to determine the characteristic combination of features. In block 210, the assignment created is stored for monitoring future logged log data. In block 212, logged log data is monitored.

FIG. 6 shows an exemplary method for monitoring log data using an assignment created by a log data analysis method such as the method shown in FIG. 5. In block 300, logged log data is monitored. In block 302, it is checked whether the logged log data includes the characteristic combination of features according to the assignment provided and/or according to an assignment from a plurality of assignments provided. If the characteristic combination of features is not detected, the monitoring of the log data in block 300 continues unchanged. If the characteristic combination of features is detected, the method continues in block 304. In block 304, an impending occurrence of the malfunction which is assigned to the detected combination of characteristic features is predicted.
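The loop formed by blocks 300 to 304 might be sketched as a generator over a stream of log data types. The function name `monitor`, the sliding window of fixed size and the representation of assignments as a mapping from ordered sequences to malfunction names are assumptions made for this illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def monitor(
    stream: Iterable[str],
    assignments: Dict[Tuple[str, ...], str],
    window_size: int = 10,
) -> Iterator[Tuple[int, str]]:
    """Yield (position, malfunction) whenever the most recent log entries
    end with a stored characteristic sequence."""
    recent: Deque[str] = deque(maxlen=window_size)
    for position, kind in enumerate(stream):
        recent.append(kind)                       # block 300: monitor logged data
        tail = tuple(recent)
        for pattern, malfunction in assignments.items():
            if tail[-len(pattern):] == pattern:   # block 302: combination present?
                yield position, malfunction       # block 304: predict malfunction
```

Sequences longer than `window_size` can never match in this sketch, so the window would have to be chosen at least as long as the longest stored sequence.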

In block 306, for example, a warning about the impending malfunction is issued. In block 306, for example, stored countermeasures are executed, which are also assigned to the characteristic combination of features and/or the predicted malfunction.

List of reference numerals

100 computer system

102 processor

104 program instructions

106 memory

108 assignment

110 malfunction

112 combination of features

114 countermeasures

116 sensors

118 communication interface

120 database

122 log data

130 server

132 processor

134 program instructions

136 memory

138 sensors

140 communication interface

150 database

152 log data

160 server

162 processor

164 program instructions

166 memory

168 sensors

170 communication interface

180 database

182 log data

190 server group

192 network

194 database

196 log data

198 distributed computer system

Claims

1. A method for analyzing log data (122, 152, 182, 196) of a computer system (100), the method comprising:

• logging log data (122, 152, 182, 196), wherein the logging of the log data (122, 152, 182, 196) includes storing the log data (122, 152, 182, 196) in a database (120, 150, 180, 194), wherein the log data (122, 152, 182, 196) are each stored with a time stamp,

• upon the occurrence of a malfunction (110), extracting the log data (122, 152, 182, 196) logged within a time interval (Δt) preceding the malfunction (110),

• determining a characteristic feature combination (112) comprising one or more characteristic features of the extracted combination of log data (122, 152, 182, 196) using a statistical analysis,

• storing an assignment (108) of the specific characteristic feature combination (112) to the malfunction (110),

• monitoring the logged log data (122, 152, 182, 196), wherein the monitoring, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic feature combination (112), comprises predicting an impending occurrence of the malfunction (110).

2. The method as claimed in claim 1, wherein a warning is issued in response to the prediction of the impending malfunction (110).

3. The method according to any one of the preceding claims, wherein countermeasures (114), which are to be carried out when the malfunction (110) occurs in order to avoid the malfunction (110), are defined, wherein, together with the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110), an assignment of the determined characteristic feature combination (112) to the countermeasures (114) is stored, the countermeasures (114) being carried out automatically upon the prediction of the imminent malfunction (110).

4. The method according to claim 3, wherein the countermeasures (114) to be carried out are assigned to the malfunction (110), via which they are indirectly assigned to the determined characteristic feature combination (112).

5. The method according to claim 3, wherein the countermeasures (114) to be carried out are assigned directly to the determined characteristic feature combination (112).
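
For illustration only, the indirect assignment of claim 4 and the direct assignment of claim 5 can be sketched as two lookup paths. This is a minimal sketch; the dictionaries, feature names and countermeasure names are hypothetical and not taken from the application.

```python
# Hypothetical stored feature combination and its assigned malfunction.
combo = frozenset({"disk_latency_high", "queue_full"})
assignment = {combo: "db_timeout"}

# Indirect path (claim 4 style): countermeasures are assigned to the malfunction.
countermeasures_by_malfunction = {"db_timeout": ["restart_db_pool"]}

# Direct path (claim 5 style): countermeasures are assigned to the combination.
countermeasures_by_combo = {combo: ["throttle_ingest"]}


def countermeasures_for(feature_combo):
    """Collect countermeasures reachable directly and via the malfunction."""
    direct = countermeasures_by_combo.get(feature_combo, [])
    malfunction = assignment.get(feature_combo)
    indirect = countermeasures_by_malfunction.get(malfunction, [])
    return direct + indirect
```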

6. The method according to any one of the preceding claims, wherein a first tolerance range is assigned to each of the features of the characteristic feature combination (112), wherein a logged combination of log data (122, 152, 182, 196) contains the stored characteristic feature combination (112) if it has the features of the characteristic feature combination (112) and these features each lie within the assigned first tolerance ranges.

7. The method according to any one of the preceding claims, wherein features of the characteristic feature combination (112) are each assigned a second tolerance range, it being assumed that a logged combination of log data (122, 152, 182, 196) contains the stored characteristic feature combination (112) if it has a predetermined minimum number of features of the characteristic feature combination (112) and these features are each within the associated second tolerance ranges.

8. The method according to claim 7, wherein the first tolerance range is identical to the second tolerance range for the same feature, or wherein the first tolerance range is greater than the second tolerance range for the same feature.
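
For illustration only, the two matching rules of claims 6 and 7 can be sketched as follows. This is a minimal sketch which assumes that each feature is a numeric value and that a tolerance range is a symmetric interval around a stored target value; the function and parameter names are hypothetical.

```python
def within(value, target, tolerance):
    """True if value lies in the assumed symmetric range target +/- tolerance."""
    return abs(value - target) <= tolerance


def matches_all(observed, combination, first_tolerances):
    """Claim 6 style: every feature is present and within its first tolerance range."""
    return all(
        name in observed and within(observed[name], target, first_tolerances[name])
        for name, target in combination.items()
    )


def matches_min_count(observed, combination, second_tolerances, min_features):
    """Claim 7 style: at least min_features features are within their second tolerance ranges."""
    hits = sum(
        1
        for name, target in combination.items()
        if name in observed and within(observed[name], target, second_tolerances[name])
    )
    return hits >= min_features
```

With narrower second tolerance ranges, as permitted by claim 8, a partial match has to be closer to the stored values than a complete match.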

9. The method according to any one of the preceding claims, wherein the malfunction (110) is an error event.

10. The method according to any one of the preceding claims, wherein the malfunction (110) is an exceeding or undershooting of a predefined threshold value.

11. The method of any preceding claim, wherein storing the log data (122, 152, 182, 196) comprises normalizing the log data (122, 152, 182, 196).

12. The method of claim 11, wherein the normalized log data (122, 152, 182, 196) satisfy the sixth normal form.

13. The method according to any one of the preceding claims, wherein the computer system (100) is a first server of a distributed computer system (198) comprising a plurality of servers (100, 130, 160), wherein log data (122, 152, 182, 196) are logged on each of the servers (100, 130, 160) and the logged log data (122, 152, 182, 196) are monitored.

14. The method according to claim 13, wherein the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110) is made by the first server (100) and forwarded from the first server (100) to a server group (190) comprising one or more further servers (130, 160) of the plurality of servers (100, 130, 160), the servers (130, 160) of the server group (190) each storing the forwarded assignment (108), wherein the monitoring of log data (152, 182, 196) by the servers (130, 160) of the server group (190) in each case comprises, upon logging of a combination of log data (152, 182, 196) which has the stored characteristic feature combination (112), predicting, by the corresponding server (130, 160), an imminent occurrence of the malfunction (110).

15. The method according to claim 13, wherein, upon the occurrence of the malfunction (110) in one of the servers (100, 130, 160) of the distributed computer system (198), the log data (122, 152, 182, 196) logged by the servers (100, 130, 160) of the server group (190) within the time interval (Δt) preceding the malfunction (110) are extracted, wherein the characteristic feature combination (112) is determined using the extracted combination of log data (152, 182, 196) of the server group (190), the characteristic feature combination (112) being determined using a statistical analysis across the servers (130, 160) of the server group (190), wherein the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110) is forwarded to the servers (130, 160) of the server group (190), the servers (130, 160) of the server group (190) each storing the forwarded assignment (108), wherein the monitoring of log data (152, 182, 196) by the servers (130, 160) of the server group (190) in each case comprises, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic feature combination (112), predicting, by the corresponding server (130, 160), an imminent occurrence of the malfunction (110).

16. The method as claimed in claim 15, wherein the log data analysis is additionally carried out using log data (122) from the first server (100).

17. The method according to any one of claims 15 to 16, wherein one or more first identifiers are also determined, which comprise features of one or more servers (130, 160) on which the malfunction (110) occurs, wherein, together with the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110), an assignment (108) of the determined characteristic feature combination (112) to the first identifiers is stored.

18. The method according to claim 17, wherein, upon the prediction of the imminent malfunction (110), one or more servers (130, 160) on which the malfunction (110) occurs are determined using the first identifiers, and a warning is issued for each of the determined servers (130, 160).

19. The method according to any one of claims 15 to 18, wherein countermeasures (114), which are to be carried out for one or more servers (130, 160) when the malfunction (110) occurs in order to avoid the malfunction (110), are defined, wherein one or more second identifiers are determined, which comprise features of the corresponding servers (130, 160) on which the countermeasures (114) are to be carried out, wherein, together with the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110), an assignment (108) of the determined characteristic feature combination (112) to the countermeasures (114) and the second identifiers is stored, and wherein, upon the prediction of the imminent malfunction (110), the countermeasures (114) are executed automatically on the servers (130, 160) identified by the second identifiers.

20. A computer system (100) with a processor (102) and a memory (106), wherein program instructions (104) are stored in the memory (106), wherein execution of the program instructions (104) by the processor (102) causes the processor (102) to control the computer system (100) such that the computer system (100) performs a method for analyzing log data (122, 152, 182, 196), the method comprising:

• logging of log data (122, 152, 182, 196), the logging of the log data (122, 152, 182, 196) including storing log data (122, 152, 182, 196) in a database (120, 150, 180, 194), wherein the log data (122, 152, 182, 196) are each stored with a time stamp,

• upon occurrence of a malfunction (110), extracting the log data (122, 152, 182, 196) logged within a time interval (Δt) preceding the malfunction (110),

• storing an assignment (108) of the determined characteristic feature combination (112) to the malfunction (110),

• monitoring the logged log data (122, 152, 182, 196), wherein the monitoring comprises, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic feature combination (112), predicting an imminent occurrence of the malfunction (110).

21. A distributed computer system (198) comprising a plurality of servers (100, 130, 160), wherein a first server (100) of the plurality of servers (100, 130, 160) is the computer system (100) according to claim 20, wherein log data (122, 152, 182, 196) are logged on each of the servers (100, 130, 160) and the logged log data (122, 152, 182, 196) are monitored.

22. The distributed computer system (198) according to claim 21, wherein the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110) is made by the first server (100) and forwarded from the first server (100) to a server group (190) comprising one or more further servers (130, 160) of the plurality of servers (100, 130, 160), the servers (130, 160) of the server group (190) each storing the forwarded assignment (108), wherein the monitoring of log data (152, 182, 196) by the servers (130, 160) of the server group (190) in each case comprises, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic feature combination (112), predicting, by the corresponding server (130, 160), an imminent occurrence of the malfunction (110).

23. The distributed computer system (198) of claim 21, wherein, upon an occurrence of the malfunction (110) in one of the servers (130, 160) of the distributed computer system (198), log data (122, 152, 182, 196) are extracted from the servers (130, 160) of the server group (190), the characteristic feature combination (112) being determined using the extracted combination of log data (152, 182, 196) of the server group (190), the characteristic feature combination (112) being determined using a statistical analysis across the servers (130, 160) of the server group (190), the assignment (108) of the determined characteristic feature combination (112) to the malfunction (110) being forwarded to the servers (130, 160) of the server group (190), wherein the monitoring of log data by the servers (130, 160) of the server group (190) in each case comprises, upon logging of a combination of log data (122, 152, 182, 196) which has the stored characteristic feature combination (112), predicting, by the corresponding server (130, 160), an imminent occurrence of the malfunction (110).