WO2018182442A1 - Machine learning system and method for generating a decision stream and autonomously operating device using the decision stream - Google Patents


Info

Publication number
WO2018182442A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
pair
samples
similar
groups
Prior art date
Application number
PCT/RU2017/000171
Other languages
French (fr)
Inventor
Dmitry Yurievich IGNATOV
Alexander Nikolaevich Filippov
Xuecang ZHANG
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/RU2017/000171 priority Critical patent/WO2018182442A1/en
Publication of WO2018182442A1 publication Critical patent/WO2018182442A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention is directed to a machine learning system and to a machine learning method, both arranged to generate a decision stream to be used by an autonomously decision determining computing device. Additionally, the present invention is directed to the autonomously decision determining computing device using the generated decision stream.
  • machine learning has a great impact on all spheres of the modern world.
  • the idea and goal behind machine learning is generating a machine, device and/or system that is/are capable of working autonomously.
  • This autonomous work comprises making of decisions in different situations, i.e. with regard to different conditions, environments, features etc.
  • a machine, device and/or system that works/work autonomously has/have to have a kind of knowledge and a framework, which allows the machine, device and/or system to use the knowledge for autonomously acting (e.g., making decisions) in the different situations.
  • An explicit programming of a machine, device and/or system has limitations, particularly, because it is nearly impossible to cover all imaginable situations by a program operating the machine, device and/or system.
  • the advantage of machine learning is the creation of machines, devices and/or systems that operate autonomously and may make decisions flexibly with regard to different situations.
  • the creation of the framework which allows the machine, device and/or system to use the knowledge for autonomously acting, is a crucial and challenging task.
  • the knowledge comprises, e.g., rules, conditions, features, possible decisions in view of the rules, conditions, features, etc.
  • the machine, device and/or system that will use the knowledge for autonomously acting has/have to be able to understand the knowledge and the framework.
  • the machine, device and/or system that establishes the knowledge and framework which will be used by a further machine, device and/or system for autonomous operation will generally be referred to as "machine learning system".
  • the machine learning system may be a particular device or an arrangement, i.e. a system of different particular interconnected devices.
  • the machine learning system is a hardware and software arrangement, i.e. a combination of both hardware and software.
  • This non-parametric supervised learning method builds a prediction model, which takes as input features of an object with regard to which predictions are made, and returns a label of the object (e.g., a class of an object, predicted parameter, etc.). Additionally, a probability of prediction correctness may be returned.
  • Figures 1a and 1b show two exemplary decision trees.
  • Fig. 1a shows a decision tree with a binary branch splitting.
  • Fig. 1b shows a decision tree with a multi-branch splitting.
  • the upper common node of a decision tree is the root node, to which, during the prediction, the data/information (e.g., features) is provided that describes the situation with regard to which the prediction or decision is made.
  • At each node of the decision tree, at least one rule is pre-defined and defines which decision branch has to be taken with regard to the given data/information (e.g., features).
  • the rule is also referred to as "splitting rule" or "decision rule" because it determines the direction or branch to be taken when providing the decision or prediction in the decision tree.
  • the depth of decision trees varies among the different decision trees, wherein the depth specifies the number of levels of the decision tree.
  • the decision tree of Fig. 1a has four node levels (starting from the root node (upper node of the tree) to the leaf nodes (bottom nodes of the tree)).
  • the decision tree of Fig. 1a therefore has a depth of 4.
  • the decision tree of Fig. 1b has three node levels and, thus, a depth of 3.
  • a decision or prediction is found when the decision tree has been traversed by starting at the root node and ending at a leaf node.
  • each leaf node indicates the answer/result of the prediction or decision respectively. According to Figures 1a and 1b, the answer/result comprises a decision class and the probability for the correctness of the answer/result.
  • the decision trees of Figures 1a and 1b differ in that the decision tree of Fig. 1a has a binary branch splitting, i.e. each node of the decision tree which is not a leaf node leads to two further nodes of the decision tree, whereas the decision tree of Fig. 1b has a multi-branch splitting, i.e. each node of the decision tree which is not a leaf node leads to two or more further nodes of the decision tree.
  • the answer/result of a decision or prediction is refined in view of given data/information (e.g., features). It has to be noted that decision trees as such are generally known. Therefore, the structure of a decision tree is not considered in more detail in the following but is considered as being well known.
  • the family of decision tree technologies includes well known methods such as Iterative Dichotomiser 3 and its successor C4.5, classification and regression tree, chi-squared automatic interaction detector and conditional inference tree, as well as enhanced and/or hybrid solutions of said methods, which are used to generate a decision tree from a dataset.
  • the main drawbacks of the decision tree learning techniques are, for example:
  • the functionality of decision tree learning techniques is usually based on heuristic approaches, permutation tests and/or statistics such as Chi-squared test and F-test.
  • the last two statistics, i.e. the Chi-squared test and the F-test, have a strong theoretical background, but are also burdened with the following restrictions.
  • the Chi-squared test is effective only for classification on the basis of categorical features.
  • the F-test is applicable only to data samples with an F-distribution.
  • There are more parametric and non-parametric tests which take into account variance and/or mean values of the data samples analyzed in processes of decision tree training. The usage of these approaches can give better accuracy of estimation, especially in the case of continuous values.
  • a prediction method is required that uses the full power of statistics to overcome overfitting and increase the accuracy of prediction, as well as to speed up the prediction process.
  • further, an improved approach is required for fast selection of a feature to be used for splitting at a decision node when more branches are required.
  • the feature selection and splitting process has to enable a quick and efficient splitting of data samples without loss of prediction accuracy. It is desired that the splitting into further branches at a node allows dividing the data samples considered at that node into multiple sub-samples (each corresponding to a further node) which have significant differences between themselves.
  • each leaf node has to represent a strong set of data samples that is significantly statistically different from the data samples represented by other leaf nodes.
  • the known solutions are still not perfect at providing fast and efficient methods for selecting features and rules for splitting into further branches at a decision node. Further, many known solutions perform binary or triple splitting, which leads to a large depth of the prediction model or decision tree. This leads to a long duration of training and prediction. Thus, both the training of a decision tree and the execution of a prediction, or the search for an appropriate decision respectively, require too much time and are not effective. Moreover, for splitting data samples, some existing solutions use one-sample statistical tests, which are less powerful than two-sample tests. Additionally, existing solutions often represent only disjunction of data and do not consider possible conjunctions of data flows, which leads to a fast diminishing of data quantity in leaf nodes during the training process.
  • the known machine learning systems still need improvement. They generate prediction models (e.g., decision trees) that are too complex and/or do not provide the required prediction accuracy. Additionally, the prediction model generation itself lacks efficiency.
  • the complex prediction models, which are not as accurate as possible, in turn lead to a complex and/or inaccurate operation of machines, devices and/or systems that use these prediction models for autonomous operation, e.g. for decisions/predictions determined by use of the prediction models. Consequently, further machine learning methods are required that overcome the above-mentioned drawbacks.
  • machine learning methods have to generate an efficient prediction model to be used by an autonomously decision determining computing device (e.g., machine, system, arrangement, apparatus etc.) in a fast, efficient and accurate way. I.e. the machine learning method has to set up a prediction model that allows the autonomously decision determining computing device to operate in a fast, efficient and accurate way.
  • the object of the present invention is to provide an improved machine learning system and method. Particularly, the object of the present invention is to provide a machine learning system and method that essentially reduce the severity of at least some (preferably all) of the above-mentioned drawbacks.
  • the object of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the respective dependent claims, in the description, and/or in the appended figures.
  • a machine learning system is provided that is arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning system comprises: a selector entity configured to select from a sample set a sample feature of samples of the sample set that has a closest relation to a target variable, wherein the sample set comprises a plurality of samples, based on which the prediction is made, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable; a group merging entity configured to generate a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule; and a splitting entity configured to generate a splitting rule for splitting the sample set into the set of sample groups and to assign the splitting rule to the node of the decision stream.
  • the selector entity is configured to select the sample feature that has a closest relation to the target variable by: calculating, for each sample feature of the one or more sample features that is a continuous feature, a corresponding coefficient of determination between the values of the sample feature and the target variable of the samples in the sample set; calculating, for each sample feature of the one or more sample features that is a categorical feature, a corresponding correlation ratio of the target variable with respect to the categories of the sample feature of the samples in the sample set; and selecting the sample feature from the one or more sample features that has the largest coefficient of determination or the largest correlation ratio, as sketched below.
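By way of illustration, this selection step could be sketched in Python as follows. The helper names are assumptions, the coefficient of determination is taken as the squared Pearson correlation, and comparing the squared correlation ratio against it is an implementation choice the text leaves open:

```python
import numpy as np

def coefficient_of_determination(x, y):
    # R^2 between a continuous feature x and the target y, taken here
    # as the squared Pearson correlation coefficient.
    r = np.corrcoef(x, y)[0, 1]
    return r * r

def correlation_ratio(categories, y):
    # eta: between-category dispersion relative to total dispersion of y.
    y = np.asarray(y, dtype=float)
    categories = np.asarray(categories)
    total = ((y - y.mean()) ** 2).sum()
    if total == 0.0:
        return 0.0
    between = sum(
        (categories == c).sum() * (y[categories == c].mean() - y.mean()) ** 2
        for c in np.unique(categories)
    )
    return float(np.sqrt(between / total))

def select_feature(features, target):
    # features: dict mapping feature name -> (values, kind),
    # where kind is "continuous" or "categorical".
    scores = {}
    for name, (values, kind) in features.items():
        if kind == "continuous":
            scores[name] = coefficient_of_determination(
                np.asarray(values, dtype=float), np.asarray(target, dtype=float))
        else:
            scores[name] = correlation_ratio(values, target) ** 2  # comparable with R^2
    return max(scores, key=scores.get)
```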
  • the group merging entity is configured to: split the samples of the sample set into the at least two sample groups according to categories of the selected sample feature if the selected sample feature is a categorical feature; and split the samples of the sample set into the at least two sample groups according to values of the selected sample feature if the selected sample feature is a continuous feature.
  • the group merging entity is configured to execute the merging of the sample groups that meet the merging rule by: determining, among the at least two sample groups, at least one pair of similar sample groups; and executing a merging process on the pair of similar sample groups.
  • the group merging entity is configured to determine the pair of similar sample groups by determining a group similarity probability for a pair of sample groups, and deciding that the pair of sample groups is a pair of similar sample groups if the probability of similarity between values of target variable in two sample groups is above a predetermined probability threshold.
  • the group merging entity is configured to execute the merging process by: estimating a significance of similarity of values of target variable of samples of the pair of similar sample groups; and merging the pair of similar sample groups if the estimated significance of similarity is above a predetermined threshold.
  • the group merging entity is configured to estimate the significance of similarity by Kolmogorov-Smirnov test statistic method; the group merging entity is configured to estimate the significance of similarity by Mann-Whitney U test statistic method; the group merging entity is configured to estimate the significance of similarity by: determining, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and determining, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in the at least one sample group of the pair of similar sample groups is smaller than 30; and/or the group merging entity is configured to estimate the significance of similarity by: determining, for each sample group of the pair of similar sample groups, a corresponding normality of distribution of the values of target variable of the samples of the sample group.
  • the group merging entity is configured to: determine, for the pair of similar sample groups, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the sample groups of the pair of similar sample groups is determined as non-normal; determine, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the sample groups of the pair of similar sample groups is determined as normal and if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and determine, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the sample groups of the pair of similar sample groups is determined as normal and if a number of samples in the at least one sample group of the pair of similar sample groups is smaller than 30.
  • the group merging entity is configured to merge the pair of similar sample groups by: applying a Bonferroni correction to the significance of similarity of the values of target variable of the pair of similar sample groups with regard to the quantity of compared pairs of sample groups; and, if the corrected significance of similarity is above the predetermined threshold, merging the pair of similar sample groups.
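Combining the preceding rules, a minimal sketch of the test selection and the Bonferroni-corrected merging decision might look as follows; it assumes scipy is available, target values are numeric, the Shapiro-Wilk test serves as the normality check, Welch's t-test stands in for the large-sample Z-test, and the threshold value is a placeholder:

```python
import numpy as np
from scipy import stats

SIMILARITY_THRESHOLD = 0.05  # placeholder; the text requires only "a predetermined threshold"

def two_sample_p(a, b):
    # Test selection as described above: Kolmogorov-Smirnov for non-normal data,
    # otherwise an unpaired two-sample test with the 30-sample cutoff deciding
    # between the Z-test and Student's t-test.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    normal = all(stats.shapiro(g).pvalue > 0.05 for g in (a, b))
    if not normal:
        return stats.ks_2samp(a, b).pvalue
    if min(len(a), len(b)) >= 30:
        # Welch's t-test; for n >= 30 it is practically identical to the Z-test.
        return stats.ttest_ind(a, b, equal_var=False).pvalue
    return stats.ttest_ind(a, b).pvalue  # unpaired two-sample Student's t-test

def should_merge(group_a, group_b, n_compared_pairs):
    # Bonferroni correction: scale the per-pair threshold by the quantity
    # of compared pairs of sample groups.
    return two_sample_p(group_a, group_b) > SIMILARITY_THRESHOLD / n_compared_pairs
```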
  • the machine learning system further comprises: a node merging entity configured to merge leaf nodes of the decision stream; and/or a node generating entity configured to generate new nodes associated with the node of the decision stream, wherein each one of the new nodes is associated with one sample group of the set of sample groups generated by the group merging entity, and wherein each sample group of the set of sample groups is associated with one new node.
  • the node merging entity is configured to merge the leaf nodes of the decision stream by: determining, among the leaf nodes of the decision stream, at least one pair of similar leaf nodes, wherein a pair of leaf nodes is considered as a pair of similar leaf nodes if an average value of target variable in samples of a first leaf node of the pair of leaf nodes is equal to an average value of target variable in samples of a second leaf node of the pair of leaf nodes and/or if a probability of similarity between the values of target variable in the first and the second leaf nodes is above a predetermined leaf node similarity threshold; and merging the pair of similar leaf nodes if an estimated significance of leaf node similarity is above a predetermined leaf node similarity threshold.
  • the node merging entity is configured to estimate the significance of leaf node similarity by Kolmogorov-Smirnov test statistic method; the node merging entity is configured to estimate the significance of leaf node similarity by Mann-Whitney U test statistic method; the node merging entity is configured to estimate the significance of leaf node similarity by: determining, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and determining, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in the at least one leaf node of the pair of similar leaf nodes is smaller than 30; and/or the node merging entity is configured to estimate the significance of leaf node similarity by: determining, for each leaf node of the pair of similar leaf nodes, a corresponding normality of distribution of the values of target variable of the samples of the leaf node.
  • the node merging entity is configured to: determine, for the pair of similar leaf nodes, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as non-normal; determine, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and determine, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in the at least one leaf node of the pair of similar leaf nodes is smaller than 30.
  • the node merging entity is configured to merge the pair of similar leaf nodes by: applying a Bonferroni correction to the significance of leaf node similarity with regard to the quantity of compared pairs of leaf nodes; and, if the corrected significance of leaf node similarity is above the predetermined leaf node similarity threshold, merging the pair of similar leaf nodes.
  • a machine learning method is provided that is arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning method comprises: selecting from a sample set a sample feature of samples of the sample set that has a closest relation to a target variable, wherein the sample set comprises a plurality of samples, based on which the prediction is made, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable; generating a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule; and generating a splitting rule for splitting the sample set into the set of sample groups and assigning the splitting rule to the node of the decision stream.
  • the machine learning method is arranged to execute any one or any combination of steps executed by the machine learning system, described herein.
  • an autonomously operating device is provided that is arranged to autonomously determine a decision by use of a decision stream generated by the machine learning system, described herein, and/or by an execution of the machine learning method, described herein.
  • Fig. 1a shows a decision tree with a binary branch splitting.
  • Fig. 1b shows a decision tree with a multi-branch splitting.
  • Fig. 2 shows an exemplary decision stream used by a machine learning system according to an embodiment of the present invention.
  • Fig. 3a shows a machine learning system arranged according to an embodiment of the present invention.
  • Fig. 3b shows a machine learning system arranged according to an embodiment of the present invention.
  • Fig. 3c shows a machine learning system arranged according to an embodiment of the present invention.
  • Fig. 4a shows steps of a machine learning method according to an embodiment of the present invention.
  • Fig. 4b shows steps of a machine learning method according to an embodiment of the present invention.
  • Fig. 4c shows steps of a machine learning method according to an embodiment of the present invention.
  • Fig. 10 shows a result of the merging of the nodes according to an embodiment of the present invention.
  • a decision stream is used as a prediction model.
  • Fig. 2 shows an exemplary decision stream used by a machine learning system according to an embodiment of the present invention.
  • the structure of a decision stream is similar to the structure of the decision tree.
  • the decision stream is a treelike graph or model, which allows to execute predictions or to make decisions by searching starting from a root node (see the upper node in Fig. 2) until a leaf node has been reached.
  • one or more rules are present that, based on given data and features, allow to decide on the branch to be taken in the decision stream.
  • branches may be brought together by linking a node of one branch in a given level to a node of another branch in a next or farther level below the given level. This is shown exemplarily in Figs. 2 and 10 with regard to leaf nodes. In Fig. 2, every node of one branch in the intermediate level leads to leaf nodes of other branches in the lower terminal level.
  • the decision stream can be defined as a decision tree that allows linking from a first node in one level of the decision tree to a second node in another level of the decision tree, wherein the other level is a level in the decision tree that is directly below, or one/several levels below, the level of the first node, and wherein the first node and the second node belong to different branches of the decision tree; an illustrative data-structure reading of this definition is sketched below.
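Read this way, a decision stream is a rooted directed acyclic graph rather than a tree: a child node may be reachable from several parent branches. The following minimal data structure is an illustrative assumption for exposition, not a structure defined by the patent:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class StreamNode:
    # A splitting rule maps a sample's features to the index of the child
    # branch to follow; leaf nodes carry a prediction instead of a rule.
    rule: Optional[Callable[[dict], int]] = None
    children: list = field(default_factory=list)
    prediction: Optional[float] = None

def predict(node: StreamNode, features: dict) -> Optional[float]:
    # Traverse from the root until a leaf is reached; because a child may be
    # shared by several parents, traversal runs over a DAG, not a tree.
    while node.rule is not None:
        node = node.children[node.rule(features)]
    return node.prediction
```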
  • Fig. 3a shows a machine learning system 3 arranged according to an embodiment of the present invention.
  • the machine learning system 3 is arranged to generate a decision stream to be used by an autonomously decision determining computing device.
  • the machine learning system 3 generates a decision stream that will be used by a further computing device (e.g., system, arrangement, apparatus) for autonomous determination of decisions or predictions respectively.
  • the generated decision stream is arranged as explained above with regard to Fig. 2.
  • the machine learning system 3 of Fig. 3a comprises a selector entity 31, a group merging entity 32, and a splitting entity 33.
  • the selector entity 31 is configured to select from a sample set the one sample feature of samples of the sample set such that the selected feature has the closest relation to a target variable (also called label or predicted value), which represents the prediction or decision that will be made (by the further computing device) by use of the decision stream.
  • the sample set comprises a plurality of samples, based on which the prediction or decision respectively is made.
  • the samples of the sample set are samples considered at a node of the decision stream.
  • the sample set will comprise all samples that are generally relevant for the prediction or decision. With any branch leading from the root node towards leaf nodes, the amount of samples of the sample set of each further node of the decision stream will be reduced in view of the splitting rules applied along that branch.
  • the samples have different features (e.g. one or more sample features) that describe the samples, and one target variable that represents the correct value of prediction or decision.
  • a feature of the samples has to be found that is as close as possible to the subject matter of the prediction or decision, i.e. that is the most relevant and significant characteristic for that prediction or decision. Thus, in the prediction process, the samples have the features without the target variable, and the prediction or decision is made by the decision stream.
  • Fig. 5 shows an exemplary sample set 51 considered at a node of a decision stream according to an embodiment of the present invention.
  • the sample set 51 comprises one or more samples 51_1, ..., 51_n, wherein n is a positive integer.
  • Each one of the samples 51_1, ..., 51_n of the sample set 51 is associated with one value of the target variable 52_0 and with the sample features 52_1, ..., 52_m of the feature set 52, wherein m is a positive integer.
  • the samples 51_1, ..., 51_n are different from each other if the values of the target variable 52_0 in these samples are different.
  • the selector entity 31 selects the one of the sample features 52_1, ..., 52_m that is the significant characteristic most relevant to the given target variable 52_0 representing the correct prediction or decision. An illustrative layout of such a sample set is sketched below.
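As a purely illustrative picture of this layout (column names are assumptions):

```python
import pandas as pd

# One row per sample 51_1, ..., 51_n; one column per feature 52_1, ..., 52_m,
# plus the single target variable 52_0.
sample_set = pd.DataFrame({
    "target_52_0":  [1, 0, 1, 1],          # e.g. class labels
    "feature_52_1": [23, 47, 35, 61],      # e.g. a continuous feature such as "age"
    "feature_52_2": ["A", "B", "A", "C"],  # e.g. a categorical feature
})
```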
  • the group merging entity 32 is configured to generate a set of sample groups by splitting the samples 51_1, ..., 51_n of the sample set 51 into at least two sample groups according to the selected one of the sample features 52_1, ..., 52_m, and by executing a merging of sample groups, according to the values of the target variable 52_0, that meet a merging rule.
  • the different samples 51_1, ..., 51_n have different values for the selected feature, for example feature 52_1, and they are grouped with regard to the values of the feature 52_1.
  • the merging rule is applied to the values of the target variable 52_0 in the groups which are the result of the splitting.
  • a merging rule is configured to determine similarity between two sample groups and to determine whether a merging of the two sample groups should be done. In case the similarity between the values of target variable 52_0 in two sample groups is significant according to the merging rule, the two sample groups will be merged into one sample group.
  • Fig. 6 shows a generation of a set 6 of sample groups according to an embodiment of the present invention.
  • the samples 51_1, ..., 51_n of the sample set 51 are split into groups 6_1, 6_2, ..., 6_k according to the selected sample feature 52_1 (e.g., "age" or any other appropriate feature), wherein k is a positive integer (e.g., larger than or equal to two).
  • the similarity of the split groups 6_1, 6_2, ..., 6_k is determined according to the values of the target variable 52_0.
  • a merging rule is applied that defines under which circumstances (e.g., by one of the test statistics) two or more of the split sample groups 6_1, 6_2, ..., 6_k should be merged.
  • sample groups 6_1 and 6_2 are determined according to the merging rule as two groups that should be merged (e.g., the samples 51_1, ..., 51_n of both sample groups 6_1, 6_2 have similar values of the target variable 52_0, and so the samples 51_1, ..., 51_n of both sample groups 6_1, 6_2 lead to similar prediction/determination results).
  • the merging rule is not restricted to the above example comprising the determination of similarity.
  • the set 6' of sample groups comprises sample groups 6_1', ..., 6_k, wherein sample group 6_1' is a sample group obtained after merging the original split sample groups 6_1, 6_2.
  • the splitting entity 33 is configured to generate a splitting rule for splitting the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k and to assign the splitting rule to the node of the decision stream.
  • the splitting rule will be based on the selected feature (one of the features 52_1, ..., 52_m).
  • the splitting rule will define (e.g., in view of and/or based on the value ranges of the selected feature) a condition or pipeline of conditions that lead(s) to the splitting of the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k, as sketched below.
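A hypothetical Python reading of such a range-based pipeline of conditions, with illustrative names and boundaries (the month ranges correspond to the second use case below):

```python
def make_range_rule(boundaries, feature_name):
    # boundaries: sorted inner cut points of the selected continuous feature,
    # e.g. [3, 7, 10] for the ranges 0-3, 3-7, 7-10 and 10-12 months.
    def rule(sample):
        value = sample[feature_name]
        for index, bound in enumerate(boundaries):
            if value < bound:
                return index  # branch to sample group 6_(index+1)'
        return len(boundaries)
    return rule

rule = make_range_rule([3, 7, 10], "months_since_last_investment")
assert rule({"months_since_last_investment": 8}) == 2  # falls into the 7-10 range
```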
  • the selector entity 31 is configured to select the sample feature from the features 52_1, ..., 52_m that has the closest relation to the target variable 52_0, i.e. to the prediction or decision, by: calculating, for each sample feature of the one or more sample features 52_1, ..., 52_m of the sample feature set 52 that is a continuous feature, a corresponding coefficient of determination between the values of the target variable 52_0 and the sample feature in the samples 51_1, ..., 51_n of the sample set 51; calculating, for each sample feature of the one or more sample features 52_1, ..., 52_m that is a categorical feature, a corresponding correlation ratio of the target variable 52_0 with respect to the categories of the feature in the samples 51_1, ..., 51_n of the sample set 51; and selecting the sample feature from the one or more sample features 52_1, ..., 52_m that has the largest coefficient of determination or the largest correlation ratio.
  • continuous feature refers to a feature 52_1, ..., 52_m that has an infinite number of possible values. Examples of continuous features are age, temperature, pressure, voltage level, etc.
  • categorical feature refers to a feature 52_1, ..., 52_m that has a fixed number of possible values and that enables dividing the samples into categories. Examples of categorical features are manufacturer identity, device type, monitored event type, etc.
  • the terms "coefficient of determination" and "correlation ratio" are well known in different areas including machine learning.
  • the coefficient of determination has its origin in statistics and indicates the proportion of the variance in a dependent variable that is predictable from the independent variable(s).
  • the correlation ratio also has its origin in statistics and is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. This measure is defined as the ratio of two standard deviations representing these types of variation. Coefficients of determination and correlation ratios are comparable. Therefore, it is possible to determine, among features of the two types - continuous features and categorical features - the feature that has the closest relation to the prediction or decision.
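For reference, both measures can be written in their standard statistical form (textbook definitions, not reproduced from the patent text):

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\eta = \sqrt{\frac{\sum_c n_c\,(\bar{y}_c - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}}
```

where y_i are the target values, \hat{y}_i the predicted values, \bar{y} the overall mean, and \bar{y}_c, n_c the mean and size of category c. Here \eta is exactly the ratio of the two standard deviations mentioned above, and \eta^2 is directly comparable with R^2.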
  • the group merging entity 32 is configured to: split the samples 51_1, ..., 51_n of the sample set 51 into the at least two sample groups 6_1, 6_2, ..., 6_k according to categories of the selected sample feature (one of 52_1, ..., 52_m) if the selected sample feature is a categorical feature; and split the samples 51_1, ..., 51_n of the sample set 51 into the at least two sample groups 6_1, 6_2, ..., 6_k according to values of the selected sample feature (one of 52_1, ..., 52_m) if the selected sample feature is a continuous feature.
  • the group merging entity 32 is configured to execute the merging of the sample groups 6_1, 6_2, ..., 6_k that meet the merging rule by: determining, among the at least two sample groups, at least one pair of similar sample groups; and executing a merging process on the pair of similar sample groups.
  • the group merging entity 32 is configured to determine the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k by determining a probability of group similarity for a pair of sample groups, and deciding that the pair of sample groups within the groups 6_1, 6_2, ..., 6_k is a pair of similar sample groups if the probability of similarity between the values of the target variable 52_0 in the two sample groups is above a predetermined probability threshold.
  • the group merging entity 32 is configured to execute the merging process by: estimating a significance of similarity of the values of the target variable 52_0 of the samples 51_1, ..., 51_n of the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k; and merging the pair of similar sample groups if the estimated significance of similarity is above a predetermined threshold.
  • the group merging entity 32 is configured to estimate the significance of similarity by Kolmogorov-Smirnov test statistic method or by Mann-Whitney U test statistic method.
  • the group merging entity 32 is configured to estimate the significance of similarity by: determining, for the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples 51_1, ..., 51_n in every group of the pair of similar sample groups is larger than or equal to 30; and determining, for the pair of similar sample groups within groups 6_1, 6_2, ..., 6_k, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples 51_1, ..., 51_n in the at least one group of the pair of similar sample groups within groups 6_1, 6_2, ..., 6_k is smaller than 30.
  • the group merging entity 32 is configured to estimate the significance of similarity by: determining, for each sample group of the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k, a corresponding normality of distribution of the target variable 52_0 values of the samples of the sample group.
  • the group merging entity 32 is configured to: determine, for the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the sample groups of the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k is determined as non-normal; determine, for the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the sample groups of the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k is determined as normal and if a number of samples 51_1, ..., 51_n in every group of the pair of similar sample groups within groups 6_1, 6_2, ..., 6_k is larger than or equal to 30; and determine, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution is determined as normal and if a number of samples in the at least one sample group of the pair of similar sample groups is smaller than 30.
  • the group merging entity 32 is configured to merge the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k by: applying a Bonferroni correction to the significance of similarity of the target variable 52_0 values of the samples of the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k; and, if the corrected significance of similarity is above the predetermined threshold, merging the pair of similar sample groups from groups 6_1, 6_2, ..., 6_k.
  • Fig. 3b shows a machine learning system 3 arranged according to an embodiment of the present invention. The embodiment of Fig. 3b is based on the embodiment of Fig. 3a and presents a supplemented arrangement of the machine learning system 3 shown in Fig. 3a.
  • the machine learning system 3 further comprises a node generating entity 34 configured to generate new nodes associated with the node of the decision stream, with regard to which the splitting rule has been generated by the splitting entity 33 and with regard to which the group merging entity 32 generated the set 6' of sample groups 6_1', ..., 6_k, wherein each one of the new nodes is associated with one sample group 6_1', ..., 6_k of the set 6' of sample groups 6_1', ..., 6_k generated by the group merging entity 32, and wherein each sample group 6_1', ..., 6_k of the set 6' of sample groups 6_1', ..., 6_k is associated with one new node.
  • the node generating entity 34 uses the results of the computations of the selector entity 31, the group merging entity 32 and the splitting entity 33 and establishes further branches at the decision stream node handled by the selector entity 31, the group merging entity 32 and the splitting entity 33.
  • the new nodes are associated with said decision stream node and refine decisions or predictions to be made by use of the decision stream.
  • the splitting or decision rule, associated by the splitting entity 33 to that node, defines conditions that correspondingly lead to the new nodes during the prediction process.
  • Fig. 7 shows a generation of new nodes 7_1, ..., 7_k at a node 7 according to an embodiment of the present invention.
  • the set 6' of sample groups 6_1', ..., 6_k, generated by the group merging entity 32, is used as the starting point for the new node generation.
  • Each one of the sample groups 6_1', ..., 6_k leads to a generation of a corresponding new node 7_1, ..., 7_k.
  • Fig. 3c shows a machine learning system 3 arranged according to an embodiment of the present invention.
  • the embodiment of Fig. 3c is based on the embodiments of Figs. 3a and 3b and presents a supplemented arrangement of the machine learning system 3 shown in Figs. 3a and 3b.
  • the machine learning system 3 comprises a node merging entity 35 configured to merge leaf nodes of the decision stream.
  • the node merging entity 35 ensures that leaf nodes with data, which are not statistically different, are merged. In this way, statistically representative quantity of data in leaf nodes is supported. Consequently, accurate dividing of data samples in multiple sub- samples, which have significant statistical differences between themselves, is enabled. This improves the efficiency and the accuracy of the decision stream established by the machine learning system 3.
  • the node merging entity 35 is configured to merge the new nodes such as 7_1, ..., 7_k generated by the node generating entity 34 for different branches on the current level of the decision stream, as well as leaf nodes (which are not split) on any of the above levels of the decision stream. During the node merging by the node merging entity 35, leaf nodes of the current level and of any of the above levels in the decision stream are merged.
  • the node merging entity 35 is configured to merge the leaf nodes of the decision stream by: determining, among the leaf nodes of the decision stream, at least one pair of similar leaf nodes, wherein a pair of leaf nodes is considered as a pair of similar leaf nodes if an average value of the target variable 52_0 in the samples 51_1, ..., 51_n of a first leaf node of the pair of leaf nodes is equal to an average value of the target variable 52_0 in the samples 51_1, ..., 51_n of a second leaf node of the pair of leaf nodes and/or if a probability of similarity between the values of the target variable in the first and second leaf nodes is above a predetermined leaf node similarity threshold; and merging the pair of similar leaf nodes if an estimated significance of leaf node similarity is above a predetermined leaf node similarity threshold.
  • the node merging entity 35 is configured to estimate the significance of leaf node similarity by Kolmogorov-Smirnov test statistic method or by Mann-Whitney U test statistic method.
  • the node merging entity 35 is configured to estimate the significance of leaf node similarity by: determining, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples 51_1, ..., 51_n in every node of the pair of similar leaf nodes is larger than or equal to 30; and determining, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples 51_1, ..., 51_n in the at least one leaf node of the pair of similar leaf nodes is smaller than 30.
  • the node merging entity 35 is configured to estimate the significance of leaf node similarity by: determining, for each leaf node of the pair of similar leaf nodes, a corresponding normality of distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of the leaf node;
  • the node merging entity 35 is configured to: determine, for the pair of similar leaf nodes, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as non-normal; determine, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in every leaf node of the pair of similar leaf nodes is larger than or equal to 30; and determine, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples 51_1, ..., 51_n in the at least one leaf node of the pair of similar leaf nodes is smaller than 30.
  • the node merging entity 35 is configured to merge the pair of similar leaf nodes by: applying a Bonferroni correction to the significance of leaf node similarity of target variable 52_0 values of the samples of the pair of similar leaf nodes; and, if the corrected significance of leaf node similarity is above the predetermined leaf node similarity threshold, merging the pair of similar leaf nodes.
  • Fig. 4a shows steps of a machine learning method 4 according to an embodiment of the present invention. In general, the machine learning method 4 comprises steps executed by the machine learning system 3 described herein.
  • the machine learning method 4 comprises a step 41 of selecting from a sample set 51 one sample feature of the features 52_1, ..., 52_m that has a closest relation to a target variable 52_0.
  • the samples 51_1, ..., 51_n of the sample set 51 are samples considered at a node 7 of the decision stream, and the samples 51_1, ..., 51_n of the sample set 51 have one or more (e.g., a set 52 of) sample features 52_1, ..., 52_m and one target variable 52_0.
  • the step 41 of selecting a sample feature 52_1, ..., 52_m is executed, for example, by the selector entity 31 of the machine learning system 3.
  • the step 41 may comprise as its sub-steps any one or any combination of the steps executed by the selector entity 31.
  • the selector entity 31 may be configured to execute any one or any combination of the sub-steps of the step 41.
  • the machine learning method 4 comprises a step 42 of generating a set 6' of sample groups 6_1', ..., 6_k by splitting the samples 51_1, ..., 51_n of the sample set 51 into at least two sample groups 6_1, 6_2, ..., 6_k according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule.
  • the step 42 of generating the set 6' of sample groups 6_1', ..., 6_k is executed, for example, by the group merging entity 32 of the machine learning system 3.
  • the step 42 may comprise as its sub-steps any one or any combination of the steps executed by the group merging entity 32.
  • the group merging entity 32 may be configured to execute any one or any combination of the sub-steps of the step 42.
  • the machine learning method 4 comprises a step 43 of generating a splitting rule for splitting the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k and of assigning the splitting rule to the node 7 of the decision stream.
  • the step 43 of generating of the splitting rule is executed, for example, by the splitting entity 33 of the machine learning system 3.
  • the step 43 may comprise as its sub-steps any one or any combination of the steps executed by the splitting entity 33.
  • the splitting entity 33 may be configured to execute any one or any combination of the sub-steps of the step 43.
  • Fig. 4b shows steps of a machine learning method 4 according to an embodiment of the present invention.
  • the machine learning method 4 comprises steps executed by the machine learning system 3 described herein.
  • the machine learning method 4 is based on the machine learning method 4 of Fig. 4a.
  • the machine learning method 4 further comprises a step 44 of generating new nodes 7_1, ..., 7_k associated with the node 7 of the decision stream, wherein each one of the new nodes 7_1, ..., 7_k is associated with one sample group 6_1', ..., 6_k of the set 6' of sample groups 6_1', ..., 6_k generated by the group merging entity 32 and/or in the step 42, and wherein each sample group 6_1', ..., 6_k of the set 6' of sample groups 6_1', ..., 6_k is associated with one new node 7_1, ..., 7_k.
  • the step 44 of generating the new nodes 7_1, ..., 7_k is executed, for example, by the node generating entity 34 of the machine learning system 3.
  • the step 44 may comprise as its sub-steps any one or any combination of the steps executed by the node generating entity 34.
  • the node generating entity 34 may be configured to execute any one or any combination of the sub-steps of the step 44.
  • Fig. 4c shows steps of a machine learning method 4 according to an embodiment of the present invention.
  • the machine learning method 4 comprises steps executed by the machine learning system 3 described herein.
  • the machine learning method 4 is based on the machine learning method 4 of Fig. 4a and of Fig. 4b.
  • the machine learning method 4 further comprises a step 45 of merging leaf nodes of the decision stream.
  • the step 45 is executed, for example, by the node merging entity 35 of the machine learning system 3.
  • the step 45 may comprise as its sub-steps any one or any combination of the steps executed by the node merging entity 35.
  • the node merging entity 35 may be configured to execute any one or any combination of the sub-steps of the step 45.
  • use cases of the present invention will be explained in the following. The use cases are based on the above-described embodiments. Additionally, it has to be pointed out that any other use case can also be implemented accordingly and that the present invention is not limited to the use cases described in the following. Further, any one or any combination of the features described below can be combined with the above-described embodiments. The below use cases explain the above embodiments in more detail.
  • the first use case is directed to an automatic inspection of a device or system.
  • automatic inspection of motor vehicles will be considered by way of example.
  • the prediction/decision to be made is when an inspection of a motor vehicle should be carried out.
  • Input for the generation of a corresponding decision stream is data on different motor vehicles as samples 51_1, ..., 51_n. This data is, for example, historical data on different motor vehicles 51_1, ..., 51_n inspected once or more than once over a given period of time.
  • Each motor vehicle 51_1, ..., 51_n is described by a set 52 of features 52_1, ..., 52_m and a target variable 52_0.
  • the features 52_1, ..., 52_m comprise continuous features and/or categorical features.
  • the continuous features comprise, for example, coolant temperature, engine oil pressure, accumulator voltage level, mileage after the last inspection etc.
  • the categorical features comprise, for example, date (e.g., year) of manufacturing, emission and engine noise discrete levels, car type, manufacturer identity, etc.
  • the target variable 52_0 comprises the normal status or the damage type of the motor vehicle mechanisms, which is established during repair work.
  • the machine learning system 3 and/or the machine learning method 4 will generate a decision stream for the prediction/decision on when an inspection of a motor vehicle should be done as follows.
  • a feature is selected by 31/in 41 that is closely associated with the prediction/decision value of the target variable 52_0. For this purpose, an association strength is calculated.
  • for continuous features, the association strength is the coefficient of determination.
  • the corresponding coefficient of determination is calculated between the values of the target variable 52_0 and the continuous feature in the samples 51_1, ..., 51_n of the motor vehicle set 51.
  • the coefficient of determination is determined between the status of motor vehicle mechanisms and the mileage of motor vehicle after the last inspection.
  • for categorical features, the association strength is the correlation ratio.
  • the corresponding correlation ratio of the target variable 52_0 with respect to the categories of the categorical feature from the features 52_1, ..., 52_m is determined in the samples 51_1, ..., 51_n of the motor vehicle set 51.
  • the correlation ratio is determined between the status of motor vehicle mechanisms and discrete level of engine noise.
  • any one of the corresponding well known formulas can be used.
  • a corresponding association strength is computed in the selection step 41 and/or by the selector entity 31.
  • the values of the features 52_1, ..., 52_m in the sample/motor vehicle set 51 are considered.
  • Each sample/motor vehicle 51_1, ..., 51_n has a particular value of each feature 52_1, ..., 52_m.
  • the selected feature from the features 52_1, ..., 52_m, i.e. the feature with the maximal value of the association strength, is, for example, the mileage after the last inspection.
  • the selected feature - mileage after the last inspection - contains values in the sample/motor vehicle set 51 that range from 0 to 12·10^4 km (i.e. up to 120,000 km).
  • the square root of 144 is 12, so the sample/motor vehicle set 51 is split into 12 mileage groups, as sketched below.
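A sketch of this initial split, under the assumption that the 12 groups are equal-width mileage intervals (the patent's formula for the number of groups is not reproduced in this text; the data values are illustrative):

```python
import numpy as np

k = 12
edges = np.linspace(0, 120_000, k + 1)  # group boundaries, each interval 10^4 km wide
mileage_km = np.array([0, 15_000, 42_000, 88_000, 119_999])  # illustrative samples
group_index = np.clip(np.searchsorted(edges, mileage_km, side="right") - 1, 0, k - 1)
# group_index -> array([ 0,  1,  4,  8, 11]), i.e. groups 6_1, 6_2, 6_5, 6_9, 6_12
```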
  • a merging rule is applied to the split groups 6_1, 6_2, ..., 6_12.
  • the merging rule defines that groups nearest by the mileage are merged, i.e. beginning with the groups with the nearest mean values of the selected feature (mileage after the last inspection), groups with values of the target variable (status of motor vehicle) that are similar according to an unpaired two-sample statistic are merged.
  • the target variable is categorical, so parametric statistical tests cannot be used; however, the similarity of the target variable can be estimated by the nonparametric Kolmogorov-Smirnov test, which compares the target variable 52_0 values (status of motor vehicle).
  • Fig. 8b visualizes the merging of the split groups 6_1, 6_2, ..., 6_12. As shown in Fig. 8b, 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' are obtained after the merging of the initially split groups 6_1, 6_2, ..., 6_12.
  • the merging can be executed more than once. Generally, the merging is executed until the merging rule indicates that no further merging of the groups is possible; see the sketch after this paragraph. Thus, if the merging rule indicates that at least two of the 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' can be merged, the corresponding groups indicated according to the merging rule as groups to be merged are merged in a further merging step.
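A sketch of this repeated merging loop, assuming scipy, numerically encoded target values, the Kolmogorov-Smirnov statistic as the two-sample test, and a Bonferroni-scaled placeholder threshold as the merging rule:

```python
import numpy as np
from scipy import stats

def merge_until_stable(groups, threshold=0.05):
    # groups: list of 1-D arrays of (numerically encoded) target values.
    groups = [np.asarray(g, dtype=float) for g in groups]
    changed = True
    while changed and len(groups) > 1:
        changed = False
        groups.sort(key=lambda g: g.mean())  # groups nearest by mean become adjacent
        n_pairs = len(groups) - 1
        for i in range(n_pairs):
            p = stats.ks_2samp(groups[i], groups[i + 1]).pvalue
            if p > threshold / n_pairs:  # similar enough under the merging rule
                groups[i] = np.concatenate([groups[i], groups.pop(i + 1)])
                changed = True
                break  # restart: group count and ordering have changed
    return groups
```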
  • a splitting/decision rule is generated in the splitting step 43 and/or by the splitting entity 33 for dividing the sample/motor vehicle set 51 into the 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4'.
  • the generated splitting/decision rule is assigned in the splitting step 43 and/or by the splitting entity 33 to the node 7, at which the sample/motor vehicle set 51 has been considered.
  • in step 44 and/or by the node generating entity 34, corresponding new nodes 7_1, 7_2, 7_3, 7_4 are generated and associated with the corresponding 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4', as shown in Fig. 8c.
  • the sample/motor vehicle set 51 associated with the node 7 that is currently considered during the decision stream generation is divided into the 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' for determining the new nodes 7_1, 7_2, 7_3, 7_4 to be generated in the decision stream.
  • the sample/motor vehicle group 6_1' is then associated with the new node 7_1
  • the sample/motor vehicle group 6_2' is associated with the new node 7_2
  • the sample/motor vehicle group 6_3' is associated with the new node 7_3
  • the sample/motor vehicle group 6_4' is associated with the new node 7_4.
  • the respective associated sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' are considered for executing the selection by 31/in 41 of the sample feature, the generation by 32/in 42 of the set of sample groups, the generation by 33/in 43 of the splitting rule, and subsequently the generation by 34/in 44 of the corresponding new nodes of the decision stream.
  • the new nodes 7_1, 7_2, 7_3, 7_4 are merged for simplifying the prediction model or decision stream.
  • new nodes 7_1, 7_2, 7_3, 7_4 are merged that represent or are associated with similar sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4'. In this way, prediction/decision redundancies are eliminated, and more accurate predictions/decisions are enabled.
  • Fig. 8d shows an exemplary merging of the new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 after their generation by considering nodes 7, 7', and 7" in the previous level of the decision stream. Because the new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 are nodes of the last level of the decision stream at the time of their merging, they represent leaf nodes in the decision stream. The new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 to be merged are connected in Fig. 8d by dashed lines.
  • the node merging is executed in step 45 and/or by the node merging entity 35.
  • Fig. 8e shows an exemplary result of the node merging by 35/in 45, said result comprising new nodes 8_1 to 8_5. If further splitting of the new nodes 8_1 to 8_5 is possible by selecting by 31/in 41 a sample feature, generating by 32/in 42 a set of sample groups, and generating by 33/in 43 a splitting rule, the above-described procedure is continued with regard to the new nodes 8_1 to 8_5. If no further splitting of the new nodes 8_1 to 8_5 is possible, the decision stream can be considered as being generated, i.e. the generation of the decision stream is completed.
  • the second use case is directed to generating of a decision stream for predicting an amount of money, which a client is ready to spend on a product or project.
  • Input for the generation of the decision stream is historical data on investments of different clients. Every data set concerning a particular client, or every sample 51_1, ..., 51_n respectively, contains values of the target variable 52_0 and the features 52_1, ..., 52_m characterizing the particular client or sample 51_1, ..., 51_n respectively.
  • the input set 51 of clients/samples 51_1, ..., 51_n is associated with a set 52 of features 52_1, ..., 52_m and a target variable 52_0, and the features 52_1, ..., 52_m of the feature set 52 together with the target variable 52_0 provide a framework for specifying each one of the clients/samples 51_1, ..., 51_n.
  • the features 52_1 , 52_m comprise, for example, age, gender, marital status, number of previous purchases, sum of previous payment, and time period after last purchase. Every data set is labeled with a sum of the last investment.
  • the decision stream is generated by 3/in 4 as follows.
  • one sample feature from the features 52_1, ..., 52_m is selected by 31/in 41.
  • the feature selected by 31/in 41 from the features 52_1, ..., 52_m is the one that is most closely associated with the prediction or decision 52_0, i.e. the sum of investment.
  • the association strength is calculated for each one of the features 52_1, ..., 52_m of the feature set 52.
  • a respective coefficient of determination is calculated as the association strength.
  • a respective correlation ratio is calculated as the association strength.
  • any of the known calculation formulas can be used.
  • the coefficient of determination is calculated, according to the present embodiment, between the sum of investment 52_0 and the continuous features among 52_1, ..., 52_m (e.g., age, sum of previous payments, and time period after the last purchase).
  • the correlation ratio is calculated, according to the present embodiment, for the sum of investment with respect to the categories of the categorical features among 52_1, ..., 52_m (e.g., gender, marital status, number of previous purchases). It is important that the coefficient of determination and the correlation ratio produce commensurate results, as well as provide a high speed of splitting.
  • the one feature from the features 52_1, ..., 52_m with the maximal value of association strength, for example the time period after the previous investment, is selected by 31/in 41.
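The selection by association strength can be sketched as follows. This is a hypothetical Python illustration with invented example data; the squared correlation ratio is used so that the two measures are commensurate with the coefficient of determination, in line with the statement above.

```python
import numpy as np

def coefficient_of_determination(x, y):
    """R^2 between a continuous feature x and the target variable y."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r

def correlation_ratio(categories, y):
    """Squared correlation ratio of target y w.r.t. a categorical feature."""
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    between = sum(y[categories == c].size
                  * (y[categories == c].mean() - grand_mean) ** 2
                  for c in np.unique(categories))
    total = ((y - grand_mean) ** 2).sum()
    return between / total if total > 0.0 else 0.0

# Invented example: pick the feature most associated with the investment sum.
y = np.array([4.0, 5.5, 6.0, 5.5, 4.2, 5.8])        # target variable 52_0
months = np.array([1.0, 5.0, 8.0, 11.0, 2.0, 9.0])  # continuous feature
gender = np.array(["f", "m", "m", "f", "f", "m"])   # categorical feature
strength = {"time_period": coefficient_of_determination(months, y),
            "gender": correlation_ratio(gender, y)}
selected_feature = max(strength, key=strength.get)  # maximal association strength
```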
  • the samples/customers 51_1, ..., 51_n are divided by 32/in 42 into groups 6_1, ..., 6_k.
  • the time period after the last investment contains values in the range of 0 - 12 months.
  • the number of groups k is determined by the formula k = √n (rounded to an integer), where n is the number of samples/customers considered at the node.
  • the square root of 144 is 12. So, after splitting, 12 sample/customer groups 6_1 to 6_12 are obtained, one group for every month of the period after the previous investment. This is shown in Fig. 9a. Then, a merging of the groups 6_1 to 6_12 that are nearest in time is executed according to the investment sums.
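The splitting just described can be sketched as below; the helper is a hypothetical illustration that cuts the value range of the selected continuous feature into k equal-width intervals, with k taken as the rounded square root of the sample count (144 clients giving the 12 monthly groups of this example).

```python
import numpy as np

def split_into_groups(values):
    """Split sample indices into k = round(sqrt(n)) groups by feature value."""
    values = np.asarray(values, dtype=float)
    k = int(round(np.sqrt(len(values))))             # 144 samples -> 12 groups
    edges = np.linspace(values.min(), values.max(), k + 1)
    # assign every sample to one of the k equal-width intervals
    ids = np.clip(np.digitize(values, edges[1:-1]), 0, k - 1)
    return [np.where(ids == g)[0] for g in range(k)]
```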
  • the merging rule is: beginning from the groups 6_1 to 6_12 with the nearest mean values of the target variable (i.e. prediction/decision), merge groups whose values of the target variable are similar according to an unpaired two-sample statistic.
  • the result of the merging comprises 4 groups 6_1', 6_2', 6_3', 6_4': the first group 6_1' of 0 - 3 months - $ 4.0; the second group 6_2' of 3 - 7 months - $ 5.5; the third group 6_3' of 7 - 10 months - $ 6.0; and the fourth group 6_4' of 10 - 12 months - $ 5.5.
  • This is shown by way of example in Fig. 9b.
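A minimal sketch of this merging rule follows. It repeatedly tests the adjacent pair of groups with the nearest mean target values and merges the pair as long as an unpaired two-sample statistic finds the target values of the two groups similar; the Mann-Whitney U test is only one admissible choice here, and the threshold is an assumed value.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def merge_similar_groups(groups, threshold=0.05):
    """groups: list of 1-D arrays of target values, ordered by feature value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    merged = True
    while merged and len(groups) > 1:
        merged = False
        means = [g.mean() for g in groups]
        # adjacent pairs, those with the nearest mean target values first
        order = sorted(range(len(groups) - 1),
                       key=lambda i: abs(means[i] - means[i + 1]))
        for i in order:
            p = mannwhitneyu(groups[i], groups[i + 1],
                             alternative="two-sided").pvalue
            if p > threshold:            # target values are similar: merge the pair
                groups[i:i + 2] = [np.concatenate(groups[i:i + 2])]
                merged = True
                break
    return groups
```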
  • the sample feature selection by 31/in 41 is repeated, and for each one of the three groups 6_1", 6_2", 6_3" a corresponding sample feature from the features 52_1, ..., 52_m is selected for splitting the samples of the group 6_1", 6_2", 6_3" into sub-groups. For example, for the first group 6_1" the age feature is selected, for the second group 6_2" the marital status feature is selected, and for the third group 6_3" the feature of the number of previous investments is selected.
  • for each one of the groups 6_1", 6_2", 6_3", the respective sub-groups are split by the above-provided formula; then, the distribution of the investment values is estimated by executing the Kolmogorov-Smirnov test, and it is found that in all cases the distribution is not normal.
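The normality estimation mentioned here can be sketched as below. Fitting the normal parameters from the same data is a simplification (a Lilliefors-corrected test would be stricter), and the significance level is an assumed value.

```python
import numpy as np
from scipy.stats import kstest

def is_normal(values, alpha=0.05):
    """Kolmogorov-Smirnov test of values against a fitted normal distribution."""
    values = np.asarray(values, dtype=float)
    p = kstest(values, "norm", args=(values.mean(), values.std())).pvalue
    return p > alpha                     # large p-value: normality is not rejected
```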
  • the sub-groups are merged by applying the merging rule based on the Mann-Whitney U test, and as a result nine sub-groups 9_1 to 9_9 are obtained. The result is shown by way of example in Fig. 9d.
  • the 4 sub-groups 9_1 to 9_4 have average values of investment $ 1, 3, 5 and 7.
  • the 2 sub-groups 9_5, 9_6 have average values of investment $ 5 and 6.
  • the 3 sub-groups 9_7, 9_8, 9_9 have average values of investment $ 4, 6 and 7.
  • this is shown in Fig. 9e, where the sub-groups 9_1 to 9_9 to be merged are connected with dashed lines.
  • the prediction model or the decision stream respectively is simplified, as shown by way of example in Fig. 9f, where the four merged sub-groups 9_1' to 9_4' are shown.
  • the machine learning system 3 and/or method 4 generates a simple and precise prediction model, i.e. a decision stream for the client investment willingness.
  • Fig. 10 shows an exemplary decision stream with leaf nodes in different levels according to an embodiment of the present invention.
  • Fig. 10 visualizes that the splitting possibility can end at different levels of the decision stream.
  • test statistics are applied to the selected feature depending on the type of the feature and/or the nature of the distribution of labels for this feature.
  • a machine learning system 3 that generates a decision stream as a prediction model level by level beginning from the root node.
  • the node generating entity 34 creates the root node for the input group of samples and sends this node and the associated samples to the selector entity 31.
  • the selector entity 31 calculates the coefficient of determination between the sum of payment and the continuous features, and the correlation ratio between the sum of payment and the categorical features. Then the selector entity 31 compares the calculated values and finds the closest association (the maximal value of the coefficient of determination or correlation ratio).
  • the selector entity 31 sends an indication of the selected feature, e.g., identifier of selected feature, samples, and root node to the group merging entity 32.
  • the group merging entity 32 splits samples on the basis of the selected feature and merges resulting groups of samples according to a merging rule.
  • the result of the merging, i.e. the sample groups, and the root node are sent to the splitting entity 33.
  • the splitting entity 33 finds out that the number of groups is higher than one and assigns to the root node the rule of sample splitting into the sample groups.
  • the splitting entity 33 sends every group to the node generating entity 34, receives from node generating entity 34 corresponding new nodes, and associates them with the respective splitting rule. Then, the node generating entity 34 sends the new nodes and the corresponding sample groups to the selector entity 31.
  • the selector entity 31 calculates for all features the strength of association by calculating a coefficient of determination for continuous features and a correlation ratio for categorical features. According to the maximal strength of association, the selector entity 31 selects for each one of the groups a corresponding feature. For every group, the selector entity 31 sends an identifier of the selected feature, the respective samples and nodes to the group merging entity 32.
  • the group merging entity 32 performs the splitting on the basis of the selected feature and performs the merging. For every group, the splitting entity 33 assigns a corresponding splitting rule and the node generating entity 34 generates a corresponding new node.
  • the group merging entity 32 merges every input group into one output group, and so the splitting entity 33 sends all groups to the node merging entity 35.
  • the node merging entity 35 adds to each one of the leaf nodes the respective sample group which cannot be merged, and sends all groups which are merged to the node generating entity 34.
  • a decision stream prediction model is generated.
  • the proposed methodology comprises steps repeated while new nodes are generated, wherein said steps and their different levels of concretization are as follows:
  • A) for every feature of the object/sample, estimate the strength of its association with the target variable (e.g., prediction/decision): for a continuous feature, by calculating a coefficient of determination; for a categorical feature, by calculating a correlation ratio;
  • B) merge the groups (output of step A) while merging is possible.
  • the differences or deviations comprise the use of different statistics for estimating the significance of the similarity of groups in step 2A-b.
  • the following statistics can be used: Kolmogorov-Smirnov test, Mann-Whitney U test, Chi-squared test, ANOVA F-test.
  • One more alternative for the statistics is a pair of tests for significance estimation, wherein the Z-test is used if the size of both groups is greater than or equal to 30, and Student's t-test is used otherwise.
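This pair of tests can be sketched as follows; since an unpaired two-sample Z-test is not packaged in scipy.stats, it is computed directly from the normal distribution, and both branches return a two-sided p-value.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

def similarity_pvalue(a, b):
    """Z-test if both groups have at least 30 samples, Student's t-test otherwise."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if len(a) >= 30 and len(b) >= 30:
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        z = (a.mean() - b.mean()) / se
        return 2.0 * norm.sf(abs(z))     # two-sided unpaired two-sample Z-test
    return ttest_ind(a, b).pvalue        # two-sided Student's t-test
```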
  • after generation of a decision tree, its opportunities for further training are restricted due to the big quantity of branches and the small amount of data in the leaves.
  • the decision stream generated as described herein, in contrast, provides the ability to perform continuous training on the basis of additional data, by generating new nodes and merging them with existing leaves.
  • the prediction model can be adjusted every time the number of accumulated important corrections is significant.
  • the present invention relates also to an autonomously operating device or to an autonomously decision determining computing device respectively arranged to autonomously determine a decision by use of a decision stream generated by the machine learning system 3 and/or by execution of the machine learning method 4.
  • Fig. 11 visualizes an exemplary operation of an autonomously operating device 1 according to an embodiment of the present invention.
  • the embodiment of Fig. 11 relates, by way of example, to the first use case discussed above.
  • the autonomously operating device of Fig. 11 is a system 1 for automatic inspection of motor vehicles. It has to be pointed out that the automatic inspection of motor vehicles is just one of many different possible use cases and that the present invention is not limited by this use case.
  • the system 1 for automatic inspection of motor vehicles uses functional parameters of cars from their built-in sensors.
  • the system 1 takes as input continuous parameters, for example, coolant temperature, engine oil pressure, accumulator voltage level, as well as categorical parameters, for example, emission and engine noise discrete levels, car type and manufacturer identity.
  • the system 1 comprises, according to the present embodiment, one or more diagnostic entities 11, a data entity 12, a decision stream prediction entity 14, and a training entity 15.
  • Each one of the one or more diagnostic entities comprises a sensor parameter reader 111, an output 112 for the automatic diagnostic summary and an input 113 for the updated summary.
  • the sensor parameter reader 111 is arranged to read parameters from built-in sensors of cars of different types.
  • the sensor parameter reader 111 has one or more adapters each configured to receive parameters from one or more cars, wherein the cars may be of different types.
  • the sensor parameter reader 111 sends the car parameters to the data entity 12.
  • the sending of the car parameters is done, for example, via a communication network 13.
  • the one or more diagnostic entities 11, the data entity 12, the decision stream prediction entity 14, and the training entity 15 are connected to the communication network 13 for communication purposes.
  • the data entity 12 saves the received car parameters.
  • the data entity 12 comprises, for example, a database 121.
  • the data entity 12 sends the car parameters to the decision stream prediction entity 14.
  • the decision stream prediction entity 14 holds/comprises a pre-trained decision stream as prediction model, which, on the basis of the car parameters transmitted by the data entity 12 to the decision stream prediction entity 14, generates a prediction summary comprising a type and a probability of defects in the given car with regard to which the prediction is executed.
  • the decision stream prediction entity 14 sends the prediction summary to the output 112 of the diagnostic entity 11, which presents it to specialists.
  • the repair work can be carried out.
  • the result of the car analysis during the repair, e.g. an updated summary of the diagnostics, is obtained.
  • the result of the car analysis is sent to the diagnostic entity 11 via the input 113.
  • the result of the car analysis is then provided by the diagnostic entity 11 to the data entity 12.
  • the data entity 12 saves the updated summary.
  • the data entity 12 stores the updated summary in its database 121.
  • the data entity 12 sends corresponding data on values of features and corrected result of prediction to the decision stream training entity 15.
  • the decision stream training entity 15 reads from the decision stream prediction entity 14 the prediction model, i.e. the decision stream, and performs a further training of the decision stream in the following way.
  • the decision stream training entity 15 marks all leaf nodes of the decision stream as non-leaf for training, passes the data on samples (i.e. the data on cars as samples) through the decision stream, splits the data on samples in the nodes of the decision stream where possible, generates new nodes and merges them according to the above machine learning method 4 for decision stream generation.
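This further training step can be illustrated by the following hypothetical sketch; the Node fields follow the earlier sketch, and route() and resume_generation() are invented stand-ins for passing a sample through the assigned splitting rules and for re-running the generation procedure from a re-opened leaf.

```python
def continue_training(root, new_samples, route, resume_generation):
    """Route new samples to the leaves they reach, then grow from those leaves."""
    reached = {}                            # leaf id -> (leaf, its new samples)
    for sample in new_samples:
        leaf = route(root, sample)          # follow the assigned splitting rules
        reached.setdefault(id(leaf), (leaf, []))[1].append(sample)
    for leaf, samples in reached.values():
        leaf.samples.extend(samples)        # the leaf is re-opened for training
        resume_generation(leaf)             # split, generate and merge again
```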
  • the decision stream training entity 15 sends the newly trained decision stream to the decision stream prediction entity 14.
  • the decision stream prediction entity 14 updates the decision stream and uses the updated decision stream for the next predictions or decisions. If the size of the decision stream reaches a critical value, e.g. if the storage space required for storing the decision stream is larger than a corresponding predetermined size threshold or the inference time of a prediction is longer than a predetermined time threshold, the decision stream training entity 15 is configured to train the decision stream from scratch, i.e. to generate a new decision stream.
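The retraining trigger can be sketched as a simple check; the threshold values and the measured model size and inference time are assumptions for illustration only.

```python
def update_model(stream, samples, model_bytes, inference_seconds,
                 train_from_scratch, continue_training,
                 max_bytes=50_000_000, max_seconds=0.01):
    """Retrain from scratch once the stream is too large or too slow."""
    if model_bytes > max_bytes or inference_seconds > max_seconds:
        return train_from_scratch(samples)      # generate a new decision stream
    return continue_training(stream, samples)   # otherwise keep updating it
```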
  • the machine learning system 3 and the machine learning method 4 not only generate an entirely new decision stream but also correct, supplement, and/or update an existing decision stream.
  • the generation of a decision stream comprises both the generation of a new decision stream and the update of an existing decision stream.
  • the steps of the machine learning method 4 and the actions of the machine learning system 3, as described herein, are executed in the same way in both cases.
  • the present invention relates to a machine learning system and a machine learning method.
  • a sample feature is selected that has a closest relation to target variable representing prediction/decision values.
  • the sample set comprises a plurality of samples, based on which the prediction is met and which are considered at a node of the decision stream.
  • the samples have one target variable and one or more sample features.
  • a set of sample groups is generated by splitting the samples into at least two sample groups according to the selected sample feature, and by executing merging of sample groups that meet a merging rule.
  • a splitting rule is generated for splitting the sample set into the set of sample groups and assigned to the node of the decision stream.


Abstract

The present invention concerns a machine learning system and a machine learning method. At first, from a sample set, a sample feature is selected that has a closest relation to the target variable representing the decision/prediction. The sample set comprises a plurality of samples, based on which the prediction is met and which are considered at a node of the decision stream. The samples have one or more sample features and one target variable. Then, a set of sample groups is generated by splitting the samples into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule. Subsequently, a splitting rule is generated for splitting the sample set into the set of sample groups and assigned to the node of the decision stream. After every phase of node generation, the leaf nodes from the same and/or different levels of the decision stream are merged according to the merging rule. The last operation determines the unique structure and functional properties of the decision stream.

Description

MACHINE LEARNING SYSTEM AND METHOD FOR GENERATING A DECISION STREAM AND AUTONOMOUSLY OPERATING DEVICE USING THE DECISION STREAM

TECHNICAL FIELD
The present invention is directed to a machine learning system and to a machine learning method, both arranged to generate a decision stream to be used by an autonomously decision determining computing device. Additionally, the present invention is directed to the autonomously decision determining computing device using the generated decision stream.
BACKGROUND The machine learning has a great impact on all spheres of the modern world. Generally, the idea and goal behind the machine learning is generating a machine, device and/or system that is/are capable to work autonomously. This autonomous work comprises making of decisions in different situations, i.e. with regard to different conditions, environments, features etc. In machine learning, a machine, device and/or system that works/work autonomously has/have to have a kind of knowledge and a framework, which allows the machine, device and/or system to use the knowledge for autonomously acting (e.g., making decisions) in the different situations. An explicit programming of a machine, device and/or system has limitations, particularly, because it is nearly impossible to cover all imaginable situations by a program operating the machine, device and/or system. Thus, the advantage of machine learning is creation of machines, devices and/or systems that operate autonomously and may meet decisions with regard to different situations flexibly.
The creation of the framework, which allows the machine, device and/or system to use the knowledge for autonomously acting, is a crucial and challenging task. Although the knowledge (e.g., rules, conditions, features, possible decisions in view of the rules, conditions, features, etc.) is provided by a human, its representation and, thus, the framework have to be established by a machine, device and/or system. The machine, device and/or system that will use the knowledge for autonomously acting has/have to be able to understand the knowledge and the framework. In the following, the machine, device and/or system that establish the knowledge and framework, which will be used by a further machine, device and/or system for autonomous operation, will be generally referred to as "machine learning system". The machine learning system may be a particular device or an arrangement, i.e. a system of different particular interconnected devices. The machine learning system is a hardware and software arrangement, i.e. a combination of both hardware and software.
One of the key machine learning technologies for the tasks of classification and regression is a decision tree. This non-parametric supervised learning method builds a prediction model, which takes as input features of an object with regard to which predictions are made, and returns a label of the object (e.g., a class of an object, predicted parameter, etc.). Additionally, a probability of prediction correctness may be returned.
Figures 1a and 1b show two exemplary decision trees. Fig. 1a shows a decision tree with a binary branch splitting, and Fig. 1b shows a decision tree with a multi-branch splitting. The upper common node of a decision tree is a root node, to which, during the prediction, the data/information (e.g., features) is provided that describe the situation with regard to which the prediction, i.e. the decision is done. At each node of the decision tree, at least one rule is pre-defined and defines, to which decision branch it has to be proceeded with regard to the given data/information (e.g., features). The rule is referred to also as "splitting rule" or "decision rule" because it determines the direction or branch to be taken when providing the decisions or prediction in the decision tree.
The depth of decision trees varies among the different decision trees, wherein the depth specifies the number of levels of the decision tree. In Fig. 1a, the decision tree has four node levels (starting from the root node (upper node of the tree) to the leaf nodes (bottom nodes of the tree)). Thus, the decision tree of Fig. 1a has the depth of 4. In Fig. 1b, the decision tree has three node levels and, thus, the depth of 3. A decision or prediction is found when the decision tree has been traversed by starting at the root node and ending at a leaf node. In Figures 1a and 1b, each leaf node indicates the answer/result of the prediction or decision respectively. According to Figures 1a and 1b, the answer/result comprises a decision class and the probability for the correctness of the answer/result. The decision trees of Figures 1a and 1b differ in that the decision tree of Fig. 1a has a binary branch splitting, i.e. each node of the decision tree, which is not a leaf node, leads to further two nodes of the decision tree, and the decision tree of Fig. 1b has a multi-branch splitting, i.e. each node of the decision tree, which is not a leaf node, leads to further two or more nodes of the decision tree. By traversing the decision tree, the answer/result of a decision or prediction is refined in view of given data/information (e.g., features). It has to be noted, that decision trees as such are generally known. Therefore, the structure of a decision tree is not considered in more detail in the following but is considered as being well known.
The family of decision tree technologies includes well known methods such as Iterative Dichotomiser 3 and its successor C4.5, classification and regression tree, chi-squared automatic interaction detector and conditional inference tree, as well as enhanced and/or hybrid solutions of said methods, which are used to generate a decision tree from a dataset. The main drawbacks of the decision tree learning techniques are, for example:
1) overfitting;
2) demand for a large amount of data;
3) big depth of generated prediction models, i.e. decision trees;
4) big number of branches in generated prediction models;
5) slow selection of feature for splitting at a node of the decision tree;
6) manual regulation of parameters of prediction models during the training process.
The functionality of decision tree learning techniques is usually based on heuristic approaches, permutation tests and/or statistics such as the Chi-squared test and the F-test. The last two statistics, i.e. the Chi-squared test and the F-test, have a strong theoretical background, but are also burdened with the following restrictions. The Chi-squared test is effective only for classification on the basis of categorical features. The F-test is applicable to data samples with F-distribution only. There are more parametric and non-parametric tests, which take into account variance and/or mean values of the data samples analyzed in the processes of decision tree training. The usage of these approaches can give better accuracy of estimation, especially in the case of continuous values. Thus, a prediction method is required that uses the full power of statistics to overcome overfitting and increase the accuracy of prediction, as well as to speed up the prediction process. To implement such a supervised learning method, an improved approach is needed for fast selection of a feature to be used for splitting at a decision node when more branches are required. Further, not only the speed and efficiency of the feature selection and splitting process but also the accuracy of the prediction or decision is highly important. The feature selection and splitting process has to enable a quick and efficient splitting of data samples without loss of accuracy of prediction. It is desired that the splitting into further branches at a node allows dividing the data samples, considered at that node, into multiple sub-samples (each corresponding to a further node), which have significant differences between themselves. To provide accurate results, it is further important to support a statistically representative quantity of data in leaf nodes. Thus, the data samples considered at the leaf nodes have to be statistically different, i.e. each leaf node has to represent a strong set of data samples that is significantly statistically different from the data samples represented by other leaf nodes.
The known solutions are still not perfect in providing fast and efficient methods for selecting features and rules for splitting into further branches at a decision node. Further, many known solutions perform binary or triple splitting, which leads to a big depth of the prediction model or decision tree. This leads to a long duration of training and prediction. Thus, both the training with regard to a decision tree and the execution of a prediction or the search for an appropriate decision respectively require too much time and are not effective. Moreover, for splitting data samples, some existing solutions use one-sample statistical tests, which are less powerful than two-sample tests. Additionally, existing solutions often represent only disjunction of data and do not consider possible conjunctions of data flows, which leads to a fast diminishing of data quantity in leaf nodes during the training process.
Thus, generally expressed, the known machine learning systems still need improvement. They generate prediction models (e.g., decision trees) that are too complex and/or do not provide the required prediction accuracy. Additionally, the prediction model generation itself lacks efficiency. The complex prediction models, which are not as accurate as possible, in turn, lead to a complex and/or inaccurate operation of machines, devices and/or systems that use these prediction models for autonomous operation, e.g. for decisions/predictions, determined by use of the prediction models, for operating. Consequently, further machine learning methods are required that overcome the above-mentioned drawbacks. Particularly, machine learning methods have to generate an efficient prediction model to be used by an autonomously decision determining computing device (e.g., machine, system, arrangement, apparatus etc.) in a fast, efficient and accurate way. I.e. the machine learning method has to set up a prediction model that allows the autonomously decision determining computing device to operate in a fast, efficient and accurate way.
SUMMARY

The object of the present invention is to provide an improved machine learning system and method. Particularly, the object of the present invention is to provide a machine learning system and method that essentially reduce the severity of at least some (preferably all) of the above-mentioned drawbacks. The object of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the respective dependent claims, in the description, and/or in the appended figures. According to a first aspect, a machine learning system is provided that is arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning system comprises: a selector entity configured to select from a sample set a sample feature of samples of the sample set that has a closest relation to target variable, wherein the sample set comprises a plurality of samples, based on which the prediction is met, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable; a group merging entity configured to generate a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule; a splitting entity configured to generate a splitting rule for splitting the sample set into the set of sample groups and to assign the splitting rule to the node of the decision stream. In a first possible implementation according to the first aspect, the selector entity is configured to select the sample feature that has a closest relation to target variable by: calculating, for each sample feature of the one or more sample features that is a continuous feature, a corresponding coefficient of determination between values of the sample feature and target variable of samples in the sample set; calculating, for each sample feature of the one or more sample features that is a categorical feature, a corresponding correlation ratio of target variable with respect to categories of the sample feature of samples in the sample set; and selecting the sample feature from the one or more sample features that has the largest coefficient of determination or that has the largest correlation ratio.
In a second possible implementation form according to the first aspect as such or according to the first implementation form of the first aspect, the group merging entity is configured to: split the samples of the sample set into the at least two sample groups according to categories of the selected sample feature if the selected sample feature is a categorical feature; and split the samples of the sample set into the at least two sample groups according to values of the selected sample feature if the selected sample feature is a continuous feature.
In a third possible implementation form according to the first aspect as such or according to the first or second implementation form of the first aspect, the group merging entity is configured to execute the merging of the sample groups that meet the merging rule by: determining, among the at least two sample groups, at least one pair of similar sample groups; and executing a merging process on the pair of similar sample groups. In a fourth possible implementation form according to the third implementation form of the first aspect, the group merging entity is configured to determine the pair of similar sample groups by determining a group similarity probability for a pair of sample groups, and deciding that the pair of sample groups is a pair of similar sample groups if the probability of similarity between values of target variable in two sample groups is above a predetermined probability threshold.
In a fifth possible implementation form according to the third or fourth implementation form of the first aspect, the group merging entity is configured to execute the merging process by: estimating a significance of similarity of values of target variable of samples of the pair of similar sample groups; and merging the pair of similar sample groups if the estimated significance of similarity is above a predetermined threshold. In a sixth possible implementation form according to the fifth implementation form of the first aspect: the group merging entity is configured to estimate the significance of similarity by Kolmogorov-Smirnov test statistic method; the group merging entity is configured to estimate the significance of similarity by Mann-Whitney U test statistic method; the group merging entity is configured to estimate the significance of similarity by: determining, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and determining, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in the at least one sample group of pair of similar sample groups is smaller than 30; and/or the group merging entity is configured to estimate the significance of similarity by: determining, for each sample group of the pair of similar sample groups, a corresponding normality of distribution of the values of target variable of the samples of the sample group; determining, for the pair of similar sample groups, at least one statistical method, to be used for the estimating the significance of similarity, according to the corresponding normalities of distribution of results of predictions executed with regard to the samples of sample groups of the pair of similar sample groups; and estimating the significance of similarity of the values of target variable of the pair of similar sample groups by executing the corresponding determined at least one statistical method.
In a seventh possible implementation form according to the sixth implementation form of the first aspect, the group merging entity is configured to: determine, for the pair of similar sample groups, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as non-normal; determine, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as normal and if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and determine, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as normal and if a number of samples in the at least one sample group of pair of similar sample groups is smaller than 30.
In an eighth possible implementation form according to any one of the fifth to seventh implementation forms of the first aspect, the group merging entity is configured to merge the pair of similar sample groups by: applying a Bonferroni correction to the significance of similarity of the values of target variable of the pair of similar sample groups with regard to the quantity of compared pairs of sample groups; and, if the corrected significance of similarity is above the predetermined threshold, merging the pair of similar sample groups.
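The corrected merge decision of this implementation form can be sketched as follows. Multiplying the p-value by the number of compared pairs is the standard form of the Bonferroni correction; reading a large corrected p-value as a high significance of similarity is an interpretation of the wording above, and the threshold is an assumed value.

```python
def merge_pair(p_value, n_compared_pairs, threshold=0.05):
    """Bonferroni-corrected decision whether a pair of similar groups is merged."""
    corrected = min(1.0, p_value * n_compared_pairs)  # Bonferroni correction
    return corrected > threshold      # merge only if similarity is still significant
```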
In a ninth possible implementation form according to the first aspect as such or according to any one of the first to eighth implementation forms of the first aspect, the machine learning system further comprises: a node merging entity configured to merge leaf nodes of the decision stream; and/or a node generating entity configured to generate new nodes associated with the node of the decision stream, wherein each one of the new nodes is associated with one sample group of the set of sample groups generated by the group merging entity, and wherein each sample group of the set of sample groups is associated with one new node. In a tenth possible implementation form according to the ninth implementation form of the first aspect, the node merging entity is configured to merge the leaf nodes of the decision stream by: determining, among the leaf nodes of the decision stream, at least one pair of similar leaf nodes, wherein a pair of leaf nodes is considered as a pair of similar leaf nodes if an average value of target variable in samples of a first leaf node of the pair of leaf nodes is equal to an average value of target variable in samples of a second leaf node of the pair of leaf nodes and/or if a probability of similarity between the values of target variable in the first and the second leaf nodes is above a predetermined leaf node similarity threshold; and merging the pair of similar leaf nodes if an estimated significance of leaf node similarity is above a predetermined leaf node similarity threshold.
In an eleventh possible implementation form according to the tenth implementation form of the first aspect: the node merging entity is configured to estimate the significance of leaf node similarity by Kolmogorov-Smirnov test statistic method; the node merging entity is configured to estimate the significance of leaf node similarity by Mann-Whitney U test statistic method; the node merging entity is configured to estimate the significance of leaf node similarity by: determining, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and determining, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in the at least one leaf of pair of similar leaf nodes is smaller than 30; and/or the node merging entity is configured to estimate the significance of leaf node similarity by: determining, for each leaf node of the pair of similar leaf nodes, a corresponding normality of distribution of the values of target variable of the samples of the leaf node; determining, for the pair of similar leaf nodes, at least one statistical method, to be used for the estimating the significance of similarity, according to the corresponding normality of distributions of the values of target variable of the samples of leaf nodes of the pair of similar leaf nodes; and estimating the significance of leaf node similarity of the values of target variable of the pair of similar leaf nodes by executing the corresponding determined at least one statistical method.
In a twelfth possible implementation form according to the eleventh implementation form of the first aspect, the node merging entity is configured to: determine, for the pair of similar leaf nodes, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of leaf nodes of the pair of similar leaf nodes is determined as non-normal; determine, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and determine, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the nature of distribution of predictions executed with regard to the samples of at least one of leaf nodes of the pair of similar leaf nodes is normal and if a number of samples in the at least one leaf of pair of similar leaf nodes is smaller than 30. In a thirteenth possible implementation form according to any one of the tenth to twelfth implementation forms of the first aspect, the node merging entity is configured to merge the pair of similar leaf nodes by: applying a Bonferroni correction to the significance of leaf node similarity with regard to the quantity of compared pairs of leaf nodes; and, if the corrected significance of leaf node similarity is above the predetermined leaf node similarity threshold, merging the pair of similar leaf nodes.
According to a second aspect, a machine learning method is provided that is arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning method comprises: selecting from a sample set a sample feature of samples of the sample set that has a closest relation to a target variable, wherein the sample set comprises a plurality of samples, based on which the prediction is met, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable; generating a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule; generating a splitting rule for splitting the sample set into the set of sample groups and to assign the splitting rule to the node of the decision stream. Generally, the machine learning method is arranged to execute any one or any combination of steps executed by the machine learning system, described herein.
According to a third aspect, an autonomously operating device is provided that is arranged to autonomously determine a decision by use of a decision stream generated by the machine learning system, described herein, and/or by an execution of the machine learning method, described herein.
BRIEF DESCRIPTION OF DRAWINGS
The above-described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
Fig. 1a shows a decision tree with a binary branch splitting.
Fig. 1b shows a decision tree with a multi-branch splitting.
Fig. 2 shows an exemplary decision stream used by a machine learning system according to an embodiment of the present invention.
Fig. 3a shows a machine learning system arranged according to an embodiment of the present invention.
Fig. 3b shows a machine learning system arranged according to an embodiment of the present invention.
Fig. 3c shows a machine learning system arranged according to an embodiment of the present invention.
Fig. 4a shows steps of a machine learning method according to an embodiment of the present invention.
Fig. 4b shows steps of a machine learning method according to an embodiment of the present invention.
Fig. 4c shows steps of a machine learning method according to an embodiment of the present invention.
Fig. 5 shows an exemplary sample set at a node of a decision stream associated with one or more sample features according to an embodiment of the present invention.
Fig. 6 shows a generation of a set of sample groups according to an embodiment of the present invention.
Fig. 7 shows a generation of new nodes according to an embodiment of the present invention.
Fig. 8a shows a splitting of samples of a sample set into at least two sample groups according to an embodiment of the present invention.
Fig. 8b shows a merging of the at least two split sample groups according to an embodiment of the present invention.
Fig. 8c shows a generation of new nodes in the decision stream according to an embodiment of the present invention.
Fig. 8d shows a merging of the nodes, being leaf nodes, according to an embodiment of the present invention.
Fig. 8e shows a result of the merging of the nodes according to an embodiment of the present invention.
Fig. 9a shows a splitting of samples of a sample set into at least two sample groups according to an embodiment of the present invention.
Fig. 9b shows a merging of the at least two split sample groups according to an embodiment of the present invention.
Fig. 9c shows a further merging of at least two split sample groups according to an embodiment of the present invention.
Fig. 9d shows a further splitting into sub-groups according to an embodiment of the present invention.
Fig. 9e shows a merging of the sub-groups according to an embodiment of the present invention.
Fig. 9f shows a result of the merging of the sub-groups according to an embodiment of the present invention, said result indicating the nodes of the decision stream.
Fig. 10 shows a decision stream with leaf nodes in different levels according to an embodiment of the present invention.
Fig. 11 shows an exemplary operating of an autonomously operating device according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
Generally, it has to be noted that all arrangements, devices, modules, components, models, elements, units, entities, and means and so forth described in the present application could be implemented by software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionality described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if in the following description of the specific embodiments, a specific functionality or step to be performed by a general entity is not reflected in the description of a specific detailed element of the entity which performs the specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective hardware or software elements, or any kind of combination thereof. Further, the method of the present invention and its various steps are embodied in the functionalities of the various described apparatus elements. Moreover, any of the embodiments and features of any of the embodiments, described herein, may be combined with each other, unless a combination is explicitly excluded.
According to the present invention, a decision stream is used as a prediction model. Fig. 2 shows an exemplary decision stream used by a machine learning system according to an embodiment of the present invention. The structure of a decision stream is similar to the structure of the decision tree. Also the decision stream is a treelike graph or model, which allows to execute predictions or to make decisions by searching starting from a root node (see the upper node in Fig. 2) until a leaf node has been reached. At any of the nodes of the decision stream, which are not leaf nodes, one or more rules (referred to also as "split rules" or "decision rules") are present that, based on given data and features, allow to decide on the branch to be taken in the decision stream. A decision is met or a prediction is made when a leaf node is reached. In comparison to decision trees as shown by way of example in Figs. 1a and 1b, the decision stream allows a merging of different leaf nodes from the same and/or different levels in the prediction model structure. Thus, in the decision stream, branches may be brought together by linking a node of one branch in a given level to a node of another branch in a next or farther level that is below the given level. This is shown by way of example in Figs. 2 and 10 with regard to leaf nodes. In Fig. 2, every node of one branch in the intermediate level leads to leaf nodes of other branches in the lower terminal level.
In view of the aforesaid, the decision stream can be defined as a decision tree that allows linking from a first node in one level of the decision tree to a second node in another level of the decision tree, wherein said other level is a level in the decision tree that is directly or one/several levels below the level of the first node and wherein the first node and the second node belong to different branches of the decision tree.
In this way, by use of the decision stream structure, data storage resources may be saved on the one side and the decision search or predicting respectively may be accelerated on the other side. Thus, according to the present invention, a less complex and a more efficient structure is used as prediction model. Additionally, the accuracy of the predictions or decisions respectively is improved because redundancies at the nodes of the decision stream are reduced and because a clearer differentiation between the decisions is achieved.

Fig. 3a shows a machine learning system 3 arranged according to an embodiment of the present invention. The machine learning system 3 is arranged to generate a decision stream to be used by an autonomously decision determining computing device. Thus, the machine learning system 3 generates a decision stream that will be used by a further computing device (e.g., system, arrangement, apparatus) for autonomous determination of decisions or predictions respectively. The generated decision stream is arranged as explained above with regard to Fig. 2.
For generating a decision stream according to the present invention, the machine learning system 3 of Fig. 3a comprises a selector entity 31, a group merging entity 32, and a splitting entity 33.
The selector entity 31 is configured to select from a sample set the one sample feature of samples of the sample set such that the selected feature has the closest relation to a target variable (also named label or predicted value), which represents the prediction or decision that will be met (by the further computing device) by use of the decision stream. The sample set comprises a plurality of samples, based on which the prediction or decision respectively is met. The samples of the sample set are samples considered at a node of the decision stream. Thus, at a root node, the sample set will comprise all samples that are generally relevant for the prediction or decision. With any branch, leading from the root node towards leaf nodes, the amount of samples of the sample set of each further node of the decision stream will be reduced in view of the requirements, i.e. the split or decision rules to be met at that node in the decision stream. In the training process the samples have different features (e.g. one or more sample features) that describe the samples, and one target variable that represents the correct value of prediction or decision. When a prediction or a decision has to be done, a feature of the samples has to be found that is as close as possible to the subject matter of the prediction or decision, i.e. that is the most relevant and significant characteristic of that prediction or decision. So, in the prediction process, the samples have the features without the target variable, and the prediction or decision is made by the decision stream.
Fig. 5 shows an exemplary sample set 51 considered at a node of a decision stream according to an embodiment of the present invention. The sample set 51 comprises one or more samples 51_1, ..., 51_n, wherein n is a positive integer. Each one of the samples 51_1, ..., 51_n of the sample set 51 is associated with one value of the target variable 52_0 and the sample features 52_1, ..., 52_m of the feature set 52, wherein m is a positive integer. The samples 51_1, ..., 51_n are different from each other if the values of the target variable 52_0 in these samples are different.
Thus, the selector entity 31 selects the one of the sample features 52_1, ..., 52_m that is the significant characteristic most relevant to the given target variable 52_0 representing the correct prediction or decision.
The group merging entity 32 is configured to generate a set of sample groups by splitting the samples 51_1, ..., 51_n of the sample set 51 into at least two sample groups according to the selected one of the features 52_1, ..., 52_m, and by executing a merging of sample groups, according to the values of target variable 52_0, that meet a merging rule. The different samples 51_1, ..., 51_n have different values for the selected feature, for example feature 52_1, and they are grouped with regard to the values of the feature 52_1.
If, for example, the selected feature 52_1 is "age", then the age ranges will be identified and the samples 51_1, ..., 51_n, as well as the values of target variable 52_0 associated with them, will be grouped into the age ranges of that feature 52_1. After the splitting is done, the merging rule is applied to the values of the target variables 52_0 in the groups which are the result of the splitting. For example, a merging rule is configured to determine the similarity between two sample groups and to determine whether a merging of the two sample groups should be done. In case the similarity between the values of target variable 52_0 in two sample groups is significant according to the merging rule, the two sample groups will be merged into one sample group.
Fig. 6 shows a generation of a set 6 of sample groups according to an embodiment of the present invention. As shown on the left side of Fig. 6, at first the samples 51_1, ..., 51_n of the sample set 51 are split into groups 6_1, 6_2, ..., 6_k according to the selected sample feature 52_1 (e.g., "age" or any other appropriate feature), wherein k is a positive integer (e.g., larger than or equal to two). Then, the similarity of the split groups 6_1, 6_2, ..., 6_k is determined according to the values of target variable 52_0. Particularly, a merging rule is applied that defines under which circumstances (e.g., by one of the test statistics) two or more of the split sample groups 6_1, 6_2, ..., 6_k should be merged. According to the exemplary embodiment of Fig. 6, sample groups 6_1 and 6_2 are determined according to the merging rule as two groups that should be merged (e.g., the samples 51_1, ..., 51_n of both sample groups 6_1, 6_2 have similar values of the target variable 52_0 and so the samples 51_1, ..., 51_n of both sample groups 6_1, 6_2 lead to similar prediction/determination results). The merging rule, however, is not restricted to the above example comprising the determination of similarity. Here, different appropriate merging rules may be applied for achieving distinguishable prediction results. Thus, after merging, the set 6' of sample groups comprises sample groups 6_1', ..., 6_k, wherein sample group 6_1' is a sample group obtained after merging the original split sample groups 6_1, 6_2.
The splitting entity 33 is configured to generate a splitting rule for splitting the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k and to assign the splitting rule to the node of the decision stream. Often, the splitting rule will be based on the selected feature (one of the features 52_1, ..., 52_m). For example, the splitting rule will define (e.g., in view of and/or based on the ranges of the selected feature) a condition or a pipeline of conditions that lead(s) to the splitting of the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k.
According to an embodiment, the selector entity 31 is configured to select the sample feature from the features 52_1, ..., 52_m that has the closest relation to the target variable 52_0, i.e. to the prediction or decision, by: calculating, for each sample feature of the one or more sample features 52_1, ..., 52_m, or of the sample feature set 52 respectively, that is a continuous feature, a corresponding coefficient of determination between the values of target variable 52_0 and every sample feature 52_1, ..., 52_m of samples 51_1, ..., 51_n of the sample set 51; calculating, for each sample feature 52_1, ..., 52_m of the one or more sample features 52_1, ..., 52_m that is a categorical feature, a corresponding correlation ratio of target variable 52_0 with respect to categories of the feature of the samples 51_1, ..., 51_n of the sample set 51; and selecting the sample feature from the one or more sample features 52_1, ..., 52_m that has the largest coefficient of determination or that has the largest correlation ratio. The term "continuous feature" refers to a feature 52_1, ..., 52_m that has an infinite number of possible values. Examples of continuous features are age, temperature, pressure, voltage level etc. The term "categorical feature" refers to a feature 52_1, ..., 52_m that has a fixed number of possible values and that enables dividing of samples in categories. Examples of categorical features are manufacturer identity, device type, monitored event type etc.
The terms "coefficient of determination" and "correlation ratio" are well known in different areas including machine learning. The term "coefficient of determination" has its origin in statistics and indicates a proportion of the variance in a dependent variable that is predictable from independent variable(s). Particularly, the coefficient of determination provides a measure of how well observed outcomes are replicated by a prediction model such as the decision stream, based on the proportion of total variation of outcomes explained by the prediction model. The term "correlation ratio" also has its origin in statistics and is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. This measure is defined as the ratio of two standard deviations representing these types of variation. Coefficients of determination and correlation ratios are comparable. Therefore, it is possible to determine among features of the two types - continuous features and categorical features - a feature that has a closest relation to the prediction or decision.
According to an embodiment, the group merging entity 32 is configured to: split the samples 51_1, ..., 51_n of the sample set 51 into the at least two sample groups 6_1, 6_2, ..., 6_k according to categories of the selected sample feature (one of 52_1, ..., 52_m) if the selected sample feature is a categorical feature; and split the samples 51_1, ..., 51_n of the sample set 51 into the at least two sample groups 6_1, 6_2, ..., 6_k according to values of the selected sample feature (one of 52_1, ..., 52_m) if the selected sample feature is a continuous feature.
According to an embodiment, the group merging entity 32 is configured to execute the merging of the sample groups 6_1, 6_2, ..., 6_k that meet the merging rule by: determining, among the at least two sample groups 6_1, 6_2, ..., 6_k, at least one pair of similar sample groups; and executing a merging process on the selected pair(s) of similar sample groups.
According to an embodiment, the group merging entity 32 is configured to determine the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k by determining a probability of group similarity for a pair of sample groups, and deciding that the pair of sample groups within the groups 6_1, 6_2, ..., 6_k is a pair of similar sample groups if the probability of similarity between the values of the target variable 52_0 in the two sample groups is above a predetermined probability threshold.
According to a further embodiment, the group merging entity 32 is configured to execute the merging process by: estimating a significance of similarity of the target variables 52_0 of the samples 51_1, ..., 51_n of the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k; and merging the pair of similar sample groups if the estimated significance of similarity is above a predetermined threshold.
According to an embodiment, the group merging entity 32 is configured to estimate the significance of similarity by the Kolmogorov-Smirnov test statistic method or by the Mann-Whitney U test statistic method.
According to an alternative embodiment, the group merging entity 32 is configured to estimate the significance of similarity by: determining, for the pair of similar sample groups from the groups 6_1, 6_2, ..., 6_k, the unpaired two-sample Z-test statistic method as the statistical method to be used if the number of samples 51_1, ..., 51_n in every group of the pair of similar sample groups is larger than or equal to 30; and determining, for the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k, the unpaired two-sample Student's t-test statistic method as the statistical method to be used if the number of samples 51_1, ..., 51_n in at least one group of the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k is smaller than 30.
According to a further alternative embodiment, the group merging entity 32 is configured to estimate the significance of similarity by: determining, for each sample group 6_1, 6_2, ..., 6_k of the pair of similar sample groups, a corresponding normality of the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of the sample group 6_1, 6_2, ..., 6_k; determining, for the pair of similar sample groups, at least one statistical method, to be used for estimating the significance of similarity, according to the corresponding normality of the distributions of the target variable 52_0 values of the samples 51_1, ..., 51_n of the pair of similar sample groups from the groups 6_1, 6_2, ..., 6_k; and estimating the significance of similarity of the target variable 52_0 values of the samples 51_1, ..., 51_n of the pair of similar sample groups within the groups 6_1, 6_2, ..., 6_k by executing the correspondingly determined at least one statistical method.

Here, according to a further embodiment, the group merging entity 32 is configured to: determine, for the pair of similar sample groups from the groups 6_1, 6_2, ..., 6_k, the Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the sample groups of the pair of similar sample groups is determined as non-normal; determine, for the pair of similar sample groups, the unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the sample groups of the pair of similar sample groups is determined as normal and if the number of samples 51_1, ..., 51_n in every group of the pair of similar sample groups is larger than or equal to 30; and determine, for the pair of similar sample groups, the unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the sample groups of the pair of similar sample groups is determined as normal and if the number of samples 51_1, ..., 51_n in at least one group of the pair of similar sample groups is smaller than 30.
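By way of illustration only, the following Python sketch mirrors this normality-driven selection of the test statistic using SciPy and statsmodels. The Shapiro-Wilk test stands in for the normality check (the methodology further below names the Kolmogorov-Smirnov test as one option), and the 0.05 level and function name are assumptions.

```python
from scipy import stats
from statsmodels.stats.weightstats import ztest

def similarity_p_value(a, b, normality_alpha=0.05):
    """p-value that the target values of groups a and b come from one population."""
    # normality of the target distribution in each group (Shapiro-Wilk here)
    if any(stats.shapiro(g).pvalue <= normality_alpha for g in (a, b)):
        # at least one non-normal distribution -> Kolmogorov-Smirnov test
        return stats.ks_2samp(a, b).pvalue
    if min(len(a), len(b)) >= 30:
        return ztest(a, b)[1]            # normal, both groups >= 30 -> Z-test
    return stats.ttest_ind(a, b).pvalue  # normal, one group < 30 -> t-test
```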
Further, according to an embodiment, the group merging entity 32 is configured to merge the pair of similar sample groups from the groups 6_1, 6_2, ..., 6_k by: applying a Bonferroni correction to the significance of similarity of the target variable 52_0 values of the samples of the pair of similar sample groups; and, if the corrected significance of similarity is above the predetermined threshold, merging the pair of similar sample groups.

Fig. 3b shows a machine learning system 3 arranged according to an embodiment of the present invention. The embodiment of Fig. 3b is based on the embodiment of Fig. 3a and presents a supplemented arrangement of the machine learning system 3 shown in Fig. 3a.
According to the embodiment of Fig. 3b, the machine learning system 3 further comprises a node generating entity 34 configured to generate new nodes associated with the node of the decision stream with regard to which the splitting rule has been generated by the splitting entity 33 and with regard to which the group merging entity 32 generated the set 6' of sample groups 6_1', ..., 6_k, wherein each one of the new nodes is associated with one sample group 6_1', ..., 6_k of the set 6' of sample groups generated by the group merging entity 32, and wherein each sample group 6_1', ..., 6_k of the set 6' of sample groups is associated with one new node.
The node generating entity 34 uses the results of the computations of the selector entity 31, the group merging entity 32 and the splitting entity 33 and establishes further branches at the decision stream node handled by the selector entity 31, the group merging entity 32 and the splitting entity 33. The new nodes are associated with said decision stream node and refine the decisions or predictions to be made by use of the decision stream. The splitting or decision rule, assigned by the splitting entity 33 to that node, defines the conditions that correspondingly lead to the new nodes during the prediction process.
Fig. 7 shows a generation of new nodes 7_1, ..., 7_k at a node 7 according to an embodiment of the present invention. As exemplarily shown in Fig. 7, the set 6' of sample groups 6_1', ..., 6_k, generated by the group merging entity 32, is used as the starting point for the new node generation. Each one of the sample groups 6_1', ..., 6_k leads to the generation of a corresponding new node 7_1, ..., 7_k.

Fig. 3c shows a machine learning system 3 arranged according to an embodiment of the present invention. The embodiment of Fig. 3c is based on the embodiments of Figs. 3a and 3b and presents a supplemented arrangement of the machine learning system 3 shown in Figs. 3a and 3b. According to the embodiment of Fig. 3c, the machine learning system 3 comprises a node merging entity 35 configured to merge leaf nodes of the decision stream. The node merging entity 35 ensures that leaf nodes with data which are not statistically different are merged. In this way, a statistically representative quantity of data in the leaf nodes is supported. Consequently, accurate dividing of data samples into multiple sub-samples, which have significant statistical differences between themselves, is enabled. This improves the efficiency and the accuracy of the decision stream established by the machine learning system 3.

According to an embodiment, the node merging entity 35 is configured to merge the new nodes such as 7_1, ..., 7_k generated by the node generating entity 34 for different branches on the current level of the decision stream as well as leaf nodes (which are not split) on any of the above levels of the decision stream. During the node merging by the node merging entity 35, leaf nodes of the current level and of any of the above levels in the decision stream are merged.
According to an embodiment, the node merging entity 35 is configured to merge the leaf nodes of the decision stream by: determining, among the leaf nodes of the decision stream, at least one pair of similar leaf nodes, wherein a pair of leaf nodes is considered as a pair of similar leaf nodes if an average value of the target variable 52_0 in the samples 51_1, ..., 51_n of a first leaf node of the pair of leaf nodes is equal to an average value of the target variable 52_0 in the samples 51_1, ..., 51_n of a second leaf node of the pair of leaf nodes and/or if a probability of similarity between the values of the target variable in the first and second leaf nodes is above a predetermined leaf node similarity threshold; and merging the pair of similar leaf nodes if an estimated significance of leaf node similarity is above a predetermined leaf node similarity threshold.
According to an embodiment, the node merging entity 35 is configured to estimate the significance of leaf node similarity by the Kolmogorov-Smirnov test statistic method or by the Mann-Whitney U test statistic method.
According to an alternative embodiment, the node merging entity 35 is configured to estimate the significance of leaf node similarity by: determining, for the pair of similar leaf nodes, the unpaired two-sample Z-test statistic method as the statistical method to be used if the number of samples 51_1, ..., 51_n in every node of the pair of similar leaf nodes is larger than or equal to 30; and determining, for the pair of similar leaf nodes, the unpaired two-sample Student's t-test statistic method as the statistical method to be used if the number of samples 51_1, ..., 51_n in at least one leaf node of the pair of similar leaf nodes is smaller than 30.
According to another alternative embodiment, the node merging entity 35 is configured to estimate the significance of leaf node similarity by: determining, for each leaf node of the pair of similar leaf nodes, a corresponding normality of the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of the leaf node; determining, for the pair of similar leaf nodes, at least one statistical method, to be used for estimating the significance of similarity, according to the corresponding normality of the distributions of the target variable 52_0 values of the samples 51_1, ..., 51_n of the leaf nodes of the pair of similar leaf nodes; and estimating the significance of leaf node similarity of the target variable 52_0 values of the samples of the pair of similar leaf nodes by executing the correspondingly determined at least one statistical method.

Here, according to an embodiment, the node merging entity 35 is configured to: determine, for the pair of similar leaf nodes, the Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as non-normal; determine, for the pair of similar leaf nodes, the unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if the number of samples in every leaf node of the pair of similar leaf nodes is larger than or equal to 30; and determine, for the pair of similar leaf nodes, the unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the target variable 52_0 values of the samples 51_1, ..., 51_n of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if the number of samples 51_1, ..., 51_n in at least one leaf node of the pair of similar leaf nodes is smaller than 30.
According to an embodiment, the node merging entity 35 is configured to merge the pair of similar leaf nodes by: applying a Bonferroni correction to the significance of leaf node similarity of the target variable 52_0 values of the samples of the pair of similar leaf nodes; and, if the corrected significance of leaf node similarity is above the predetermined leaf node similarity threshold, merging the pair of similar leaf nodes.

Fig. 4a shows steps of a machine learning method 4 according to an embodiment of the present invention. In general, the machine learning method 4 comprises steps executed by the machine learning system 3 described herein.
According to the embodiment of Fig. 4a, the machine learning method 4 comprises a step 41 of selecting from a sample set 51 one sample feature of the features 52_1, ..., 52_m of the samples 51_1, ..., 51_n of the sample set 51 that has the closest relation to a target variable 52_0, wherein the sample set 51 comprises a plurality of samples 51_1, ..., 51_n, based on which the prediction is met, wherein the samples 51_1, ..., 51_n of the sample set 51 are samples considered at a node 7 of the decision stream, and wherein the samples 51_1, ..., 51_n of the sample set 51 have one or more (e.g., a set 52 of) sample features 52_1, ..., 52_m and one target variable 52_0. The step 41 of selecting a sample feature 52_1, ..., 52_m is executed, for example, by the selector entity 31 of the machine learning system 3. Thus, the step 41 may comprise as its sub-steps any one or any combination of the steps executed by the selector entity 31. Vice versa, the selector entity 31 may be configured to execute any one or any combination of the sub-steps of the step 41.
Further, according to the embodiment of Fig. 4a, the machine learning method 4 comprises a step 42 of generating a set 6' of sample groups 6_1', ..., 6_k by splitting the samples 51_1, ..., 51_n of the sample set 51 into at least two sample groups 6_1, 6_2, ..., 6_k according to the selected sample feature from the features 52_1, ..., 52_m, and by executing a merging of sample groups 6_1, 6_2, ..., 6_k that meet a merging rule. The step 42 of generating the set 6' of sample groups 6_1', ..., 6_k is executed, for example, by the group merging entity 32 of the machine learning system 3. Thus, the step 42 may comprise as its sub-steps any one or any combination of the steps executed by the group merging entity 32. Vice versa, the group merging entity 32 may be configured to execute any one or any combination of the sub-steps of the step 42.

Additionally, according to the embodiment of Fig. 4a, the machine learning method 4 comprises a step 43 of generating a splitting rule for splitting the sample set 51 into the set 6' of sample groups 6_1', ..., 6_k and of assigning the splitting rule to the node 7 of the decision stream. The step 43 of generating the splitting rule is executed, for example, by the splitting entity 33 of the machine learning system 3. Thus, the step 43 may comprise as its sub-steps any one or any combination of the steps executed by the splitting entity 33. Vice versa, the splitting entity 33 may be configured to execute any one or any combination of the sub-steps of the step 43.

Fig. 4b shows steps of a machine learning method 4 according to an embodiment of the present invention. In general, the machine learning method 4 comprises steps executed by the machine learning system 3 described herein. The machine learning method 4 is based on the machine learning method 4 of Fig. 4a. According to the embodiment of Fig. 4b, the machine learning method 4 further comprises a step 44 of generating new nodes 7_1, ..., 7_k associated with the node 7 of the decision stream, wherein each one of the new nodes 7_1, ..., 7_k is associated with one sample group 6_1', ..., 6_k of the set 6' of sample groups generated by the group merging entity 32 and/or in the step 42, and wherein each sample group 6_1', ..., 6_k of the set 6' of sample groups is associated with one new node 7_1, ..., 7_k. The step 44 of generating the new nodes 7_1, ..., 7_k is executed, for example, by the node generating entity 34 of the machine learning system 3. Thus, the step 44 may comprise as its sub-steps any one or any combination of the steps executed by the node generating entity 34. Vice versa, the node generating entity 34 may be configured to execute any one or any combination of the sub-steps of the step 44.
Fig. 4c shows steps of a machine learning method 4 according to an embodiment of the present invention. In general, the machine learning method 4 comprises steps executed by the machine learning system 3 described herein. The machine learning method 4 is based on the machine learning method 4 of Fig. 4a and of Fig. 4b.
According to the embodiment of Fig. 4c, the machine learning method 4 further comprises a step 45 of merging leaf nodes of the decision stream. The step 45 is executed, for example, by the node merging entity 35 of the machine learning system 3. Thus, the step 45 may comprise as its sub-steps any one or any combination of the steps executed by the node merging entity 35. Vice versa, the node merging entity 35 may be configured to execute any one or any combination of the sub-steps of the step 45.

In the following, use cases of the present invention will be explained. The use cases are based on the above-described embodiments. Additionally, it has to be pointed out that any other use case can also be implemented accordingly and that the present invention is not limited to the use cases described in the following. Further, any one or any combination of the features described below can be combined with the above-described embodiments. The below use cases explain the above embodiments in more detail.
The first use case is directed to an automatic inspection of a device or system. In the following, the automatic inspection of motor vehicles will be considered as an example. The prediction/decision to be made is when an inspection of a motor vehicle should be done. Input for the generation of a corresponding decision stream is data on different motor vehicles as samples 51_1, ..., 51_n. This data is, for example, historical data on different motor vehicles 51_1, ..., 51_n inspected once or more than once over a given period of time. Each motor vehicle 51_1, ..., 51_n is described by a set 52 of features 52_1, ..., 52_m and a target variable 52_0. The features 52_1, ..., 52_m comprise continuous features and/or categorical features. The continuous features comprise, for example, coolant temperature, engine oil pressure, accumulator voltage level, mileage after the last inspection etc. The categorical features comprise, for example, date (e.g., year) of manufacturing, emission and engine noise discrete levels, car type, manufacturer identity, etc. The target variable 52_0 comprises the normal status or the damage type of the motor vehicle mechanisms, which is proven during repair work.
Based on this, the machine learning system 3 and/or the machine learning method 4 will generate a decision stream for the prediction/decision on when an inspection of a motor vehicle should be done as follows.
Based on the set 51 of samples 51_1, ..., 51_n and the features 52_1, ..., 52_m and target variable 52_0 of the samples 51_1, ..., 51_n, a feature is selected by 31/in 41 that is closely associated with the prediction/decision value of the target variable 52_0. For this purpose, the association strength is calculated.
With regard to continuous features 52_1, ..., 52_m, the association strength is the coefficient of determination. Thus, for each of the continuous features 52_1, ..., 52_m, the corresponding coefficient of determination is calculated between the values of the target variable 52_0 and the continuous feature in the samples 51_1, ..., 51_n of the motor vehicle set 51. For example, the coefficient of determination is determined between the status of the motor vehicle mechanisms and the mileage of the motor vehicle after the last inspection.
With regard to categorical features 52_1, ..., 52_m, the association strength is the correlation ratio. Thus, for each one of the categorical features, the corresponding correlation ratio of the target variable 52_0 with respect to the categories of the categorical feature from the features 52_1, ..., 52_m is determined in the samples 51_1, ..., 51_n of the motor vehicle set 51. For example, the correlation ratio is determined between the status of the motor vehicle mechanisms and the discrete level of engine noise. For determining the coefficient of determination and the correlation ratio, any one of the corresponding well-known formulas can be used.
Both the coefficient of determination and the correlation ratio produce commensurate results and enable a high speed of splitting in comparison to greedy methods of decision tree training, which use computationally expensive iterations and consider all possible splits on every feature.
Thus, for each feature 52_1, ..., 52_m, which characterizes, i.e. is associated with, the samples/motor vehicles 51_1, ..., 51_n of the sample set 51, a corresponding association strength is computed in the selection step 41 and/or by the selector entity 31. For the computation of the association strength of a feature 52_1, ..., 52_m, the values of the feature 52_1, ..., 52_m in the sample/motor vehicle set 51 are considered.
Each sample/motor vehicle 51_1, ..., 51_n has a particular value of the feature 52_1, ..., 52_m. According to the present embodiment, the selected feature from the features 52_1, ..., 52_m, i.e. the feature with the maximal value of the association strength, is, for example, the mileage after the last inspection. For example, the selected feature - mileage after the last inspection - contains values in the sample/motor vehicle set 51 that are from 0 to 12·10⁴ km. The number of samples/motor vehicles 51_1, ..., 51_n in the sample/motor vehicle set 51 is, for example, n = 144. The number k of groups 6_1, 6_2, ..., 6_k, into which the samples/motor vehicles 51_1, ..., 51_n in the sample/motor vehicle set 51 should be divided in step 42 and/or by the group merging entity 32, is calculated by the following formula: k = √n. The square root of 144 is 12. Thus, after splitting the sample/motor vehicle set 51 into groups 6_1, 6_2, ..., 6_k, 12 split groups are obtained (i.e. k = 12). In the present example, one group per 10⁴ km (value of the selected feature of mileage after the last inspection) is obtained. This is shown in Fig. 8a. Then, a merging rule is applied to the split groups 6_1, 6_2, ..., 6_12. According to the present embodiment, the merging rule defines that groups nearest by mileage are merged, i.e., beginning with the groups with the nearest mean values of the selected feature (mileage after the last inspection), groups whose values of the target variable (status of motor vehicle) are similar according to an unpaired two-sample statistic are merged. The target variable is categorical, so parametric statistical tests cannot be used, but the similarity of the target variable can be estimated by the nonparametric Kolmogorov-Smirnov test, which compares the target variable 52_0 values (status of motor vehicle).
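A minimal Python sketch of this worked example follows, using synthetic data; the target values are fabricated for illustration, so the merged boundaries need not match those of Fig. 8b.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mileage = rng.uniform(0, 12e4, size=144)       # mileage in 0 .. 12*10^4 km
status = (mileage // 3e4).astype(int)          # synthetic categorical target

k = int(np.sqrt(len(mileage)))                 # k = sqrt(144) = 12 groups
edges = np.linspace(0, 12e4, k + 1)
groups = [status[(mileage >= lo) & (mileage < hi)]
          for lo, hi in zip(edges[:-1], edges[1:])]

merged = [groups[0]]
for g in groups[1:]:
    # merge neighbouring groups whose target values are similar (KS test)
    if stats.ks_2samp(merged[-1], g).pvalue > 0.05:
        merged[-1] = np.concatenate([merged[-1], g])
    else:
        merged.append(g)
print(len(merged))                             # -> 4 (one group per status band)
```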
According to the present embodiment, 4 groups are derived after merging: at a mileage of 0 - 3·10⁴ km, the status of the motor vehicle is normal; at 3·10⁴ - 7·10⁴ km, the suspension parts have to be checked; at 7·10⁴ - 10·10⁴ km, the hydraulic mechanisms have to be checked; and at 10·10⁴ - 12·10⁴ km, all mechanisms of the motor vehicle require a detailed inspection. Fig. 8b visualizes the merging of the split groups 6_1, 6_2, ..., 6_12. As shown in Fig. 8b, 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' are obtained after the merging of the initially split groups 6_1, 6_2, ..., 6_12.
Here, it has to be pointed out that the merging can be executed more than once. Generally, the merging is executed until the merging rule indicates that no further merging of the groups is possible. Thus, if the merging rule indicates that at least two of the 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' can be merged, the corresponding groups indicated according to the merging rule as groups to be merged are merged in a further merging step.
According to the present example, no further merging is executed.
According to the present embodiment, a splitting/decision rule is generated in the splitting step 43 and/or by the splitting entity 33 for dividing the sample/motor vehicle set 51 into the 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4'. The generated splitting/decision rule is assigned in the splitting step 43 and/or by the splitting entity 33 to the node 7, at which the sample/motor vehicle set 51 has been considered. Then, in step 44 and/or by the node generating entity 34, corresponding new nodes 7_1, 7_2, 7_3, 7_4 are generated and associated with the corresponding 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4', as shown in Fig. 8c.
Thus, the sample/motor vehicle set 51, associated with the node 7 that is currently considered during the decision stream generation, is divided into 4 sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' for determining the new nodes 7_1, 7_2, 7_3, 7_4 to be generated in the decision stream. The sample/motor vehicle group 6_1' is then associated with the new node 7_1, the sample/motor vehicle group 6_2' is associated with the new node 7_2, the sample/motor vehicle group 6_3' is associated with the new node 7_3, and the sample/motor vehicle group 6_4' is associated with the new node 7_4. The processes of selecting by 31/in 41 a sample feature, generating by 32/in 42 a set of sample groups, generating by 33/in 43 a splitting rule, and subsequently generating by 34/in 44 corresponding new nodes of the decision stream are then continued at each previously generated new node 7_1, 7_2, 7_3, 7_4. At each one of the previously generated new nodes 7_1, 7_2, 7_3, 7_4, the respective associated sample/motor vehicle group 6_1', 6_2', 6_3', 6_4' is considered for executing the selection by 31/in 41 of the sample feature, the generation by 32/in 42 of the set of sample groups, the generation by 33/in 43 of the splitting rule, and subsequently the generation by 34/in 44 of the corresponding new nodes of the decision stream.
When the generation of the new nodes 7_1, 7_2, 7_3, 7_4 in a given level of the decision stream has been finished, according to the present embodiment, the new nodes 7_1, 7_2, 7_3, 7_4 are merged for simplifying the prediction model or decision stream. In this way, new nodes 7_1, 7_2, 7_3, 7_4 that represent or are associated with similar sample/motor vehicle groups 6_1', 6_2', 6_3', 6_4' are merged. In this way, prediction/decision redundancies are eliminated, and more accurate predictions/decisions are enabled.
Fig. 8d shows an exemplary merging of new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 after their generation by considering the nodes 7, 7', and 7" in the previous level of the decision stream. Because the new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 are nodes of the last level of the decision stream at the time of their merging, they represent leaf nodes in the decision stream. The new nodes 7_1 to 7_4, 7'_1 to 7'_3, 7"_1, 7"_2 to be merged are connected in Fig. 8d by dashed lines. The node merging is executed in step 45 and/or by the node merging entity 35.
Fig. 8e shows an exemplary result of the node merging by 35/in 45, said result comprising new nodes 8_1 to 8_5. If further splitting of the new nodes 8_1 to 8_5 is possible by selecting by 31/in 41 a sample feature, generating by 32/in 42 a set of sample groups, and generating by 33/in 43 a splitting rule, the above described procedure is continued with regard to the new nodes 8_1 to 8_5. If no further splitting of the new nodes 8_1 to 8_5 is possible, the decision stream can be considered as being generated, i.e. the generation of the decision stream is completed.
The second use case is directed to the generation of a decision stream for predicting an amount of money which a client is ready to spend on a product or project. Input for the generation of the decision stream is historical data on investments of different clients. Every data set concerning a particular client, or every sample 51_1, ..., 51_n respectively, contains values of the target variable 52_0 and features 52_1, ..., 52_m characterizing the particular client or sample 51_1, ..., 51_n respectively. As discussed above, the input set 51 of clients/samples 51_1, ..., 51_n is associated with a set 52 of features 52_1, ..., 52_m and a target variable 52_0, and the features 52_1, ..., 52_m of the feature set 52 together with the target variable 52_0 provide a framework for specifying each one of the clients/samples 51_1, ..., 51_n. The features 52_1, ..., 52_m comprise, for example, age, gender, marital status, number of previous purchases, sum of previous payments, and time period after the last purchase. Every data set is labeled with the sum of the last investment.

According to the present embodiment, the decision stream is generated by 3/in 4 as follows. At first, one sample feature from the features 52_1, ..., 52_m is selected by 31/in 41. The feature selected by 31/in 41 from the features 52_1, ..., 52_m is one that is closely associated with the prediction or decision 52_0 - the sum of investment. For each one of the features 52_1, ..., 52_m of the feature set 52, the association strength is calculated. Thus, with regard to each continuous feature 52_1, ..., 52_m (e.g., age, sum of previous investments, and time period after the last investment), a respective coefficient of determination is calculated as the association strength. With regard to each categorical feature 52_1, ..., 52_m (e.g., gender, marital status, number of previous investments), a respective correlation ratio is calculated as the association strength. Also here, for calculating the coefficient of determination and the correlation ratio, any of the known calculation formulas can be used. The coefficient of determination is calculated, according to the present embodiment, between the sum of investment 52_0 and the continuous features 52_1, ..., 52_m (e.g., age, sum of previous payments, and time period after the last purchase). Further, the correlation ratio is calculated according to the present embodiment for the sum of investment with respect to the categories of the categorical features 52_1, ..., 52_m (e.g., gender, marital status, number of previous purchases). It is important that the coefficient of determination and the correlation ratio produce commensurate results, as well as provide a high speed of splitting.
Subsequently, one feature from the features 52_1, ..., 52_m with the maximal value of the association strength, for example, the time period after the previous investment, is selected by 31/in 41.
Based on the selected feature from the features 52_1, ..., 52_m, being the time period after the previous investment according to the present embodiment, the samples/customers 51_1, ..., 51_n are divided by 32/in 42 into groups 6_1', ..., 6_k. For example, the time period after the last investment contains values in the range of 0 - 12 months. The number n of samples 51_1, ..., 51_n is 144, i.e. n = 144. The number of groups k is determined by the formula
k = √n
The square root of 144 is 12. So, after splitting, 12 sample/customer groups 6_1 to 6_12 are obtained, one group for every month of the period after the previous investment. This is shown in Fig. 9a. Then, a merging of the groups 6_1 to 6_12 nearest by time is executed according to the investment sums. The merging rule is: beginning from the groups 6_1 to 6_12 with the nearest mean values of the target variable (i.e. prediction/decision), merge the groups 6_1 to 6_12 whose values of the target variable are similar according to an unpaired two-sample statistic. For example, according to the Kolmogorov-Smirnov test, all groups have a normal distribution, and, as the number of examples in every group is less than 30, the unpaired two-sample Student's t-test is used for a comparison of the mean sum of investment in the groups 6_1 to 6_12. According to the present embodiment, the result of merging comprises 4 groups 6_1', 6_2', 6_3', 6_4': a first group 6_1' of 0 - 3 months - $ 4.0; a second group 6_2' of 3 - 7 months - $ 5.5; a third group 6_3' of 7 - 10 months - $ 6.0; and a fourth group 6_4' of 10 - 12 months - $ 5.5. This is shown exemplarily in Fig. 9b.
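As a hedged illustration of this merging rule, the following Python sketch repeatedly merges the pair of groups with the nearest mean target values while an unpaired two-sample Student's t-test rates them as similar; the 0.05 threshold, the demo data, and the simplification of stopping at the first dissimilar nearest pair are assumptions.

```python
import itertools
import numpy as np
from scipy import stats

def merge_similar_groups(groups, threshold=0.05):
    """Merge groups with the nearest mean target values while a Student's
    t-test finds their target values similar (illustrative sketch)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    while len(groups) > 1:
        # pair of groups with the nearest mean values of the target variable
        i, j = min(itertools.combinations(range(len(groups)), 2),
                   key=lambda p: abs(groups[p[0]].mean() - groups[p[1]].mean()))
        if stats.ttest_ind(groups[i], groups[j]).pvalue <= threshold:
            break                       # even the nearest pair differs -> stop
        groups[i] = np.concatenate([groups[i], groups[j]])
        del groups[j]                   # j > i, so deletion is safe
    return groups

rng = np.random.default_rng(1)
demo = [rng.normal(m, 1.0, size=12) for m in (4.0, 5.5, 6.0, 5.5)]
print(len(merge_similar_groups(demo)))  # groups left after merging (data-dependent)
```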
In the periods of 3 - 7 and 10 - 12 months, i.e. in the groups 6_2' and 6_4', the average sum of investment is the same - $ 5.5. So, applying the merging rule anew to all groups 6_1', 6_2', 6_3', 6_4' leads to the merging of these two groups, i.e. of the second and the fourth group 6_2', 6_4', into one. The result of this merging is exemplarily shown in Fig. 9c and comprises three groups 6_1", 6_2", 6_3". The groups 6_2' and 6_4' to be merged are connected in Fig. 9c with dashed lines. As the merging of the groups 6_1", 6_2", 6_3" referring to one node 7 does not reduce the quantity of leaves, the sample feature selection by 31/in 41 is repeated, and for each one of the three groups 6_1", 6_2", 6_3" a corresponding sample feature 52_1, ..., 52_m is selected for splitting the samples of the group 6_1", 6_2", 6_3" into sub-groups. For example, for the first group 6_1" the age feature is selected, for the second group 6_2" the marital status feature is selected, and for the third group 6_3" the feature of the number of previous investments is selected. Within each one of the groups 6_1", 6_2", 6_3", the respective sub-groups are split by the above-provided formula; then, the distribution of investment values is estimated by executing the Kolmogorov-Smirnov test, and it is found that in all cases the distribution is not normal. Thus, the sub-groups are merged by applying the merging rule based on the Mann-Whitney U test, and as a result nine sub-groups 9_1 to 9_9 are obtained. The result is shown exemplarily in Fig. 9d.
In the first group 6_1" (depending on the age of the client), the 4 sub-groups 9_1 to 9_4 have average values of investment of $ 1, 3, 5 and 7. In the second group 6_2" (depending on the marital status), the 2 sub-groups 9_5, 9_6 have average values of investment of $ 5 and 6. In the third group 6_3" (depending on the number of previous purchases), the 3 sub-groups 9_7, 9_8, 9_9 have average values of investment of $ 4, 6 and 7. Because the average values of investment in some sub-groups are similar, the application of the merging rule to these 9 sub-groups 9_1 to 9_9 (associated with corresponding leaf nodes in the decision stream) leads to their merging into 4 sub-groups and thus into 4 corresponding leaf nodes of the decision stream, as shown exemplarily in Fig. 9e, where the sub-groups 9_1 to 9_9 to be merged are connected with dashed lines. Thus, the prediction model, or the decision stream respectively, is simplified, as shown exemplarily in Fig. 9f, where the four merged sub-groups 9_1' to 9_4' are shown. In this way, the machine learning system 3 and/or method 4 generates a simple and precise prediction model, i.e. a decision stream, for the client investment willingness.

Fig. 10 shows exemplarily a decision stream with leaf nodes in different levels according to an embodiment of the present invention. Fig. 10 visualizes that the splitting possibility can end at different levels of the decision stream.
Depending on the type of data and the peculiarities of the distribution, it is reasonable to estimate the similarity of groups by different test statistics: the Kolmogorov-Smirnov test, the Mann-Whitney U test, the Chi-squared test, the ANOVA F-test, or a pair of statistics - the Z-test if the size of both groups is greater than or equal to 30, and Student's t-test otherwise. So, every mentioned statistical test can be used in the group merging step 42 and/or by the group merging entity 32. For example, if there are only categorical data, the best choice can be the Chi-squared test. If the nature of the distribution of the predicted value is unknown, the Kolmogorov-Smirnov test can be used. If a normal distribution of the label (i.e. the value to be predicted or decided respectively) on different features is given, it is efficient to use a pair of statistics - the Z-test (size of both groups greater than or equal to 30) and Student's t-test (size of at least one of the groups lower than 30). According to another embodiment, different test statistics are applied to the selected feature depending on the type of the feature and/or the nature of the distribution of the labels for this feature.
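Extending the earlier sketch to the options listed here, a hedged Python dispatcher might look as follows; the Chi-squared branch builds a contingency table of category counts per group, and all names and the argument conventions are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

def group_similarity_p(a, b, target_is_categorical=False, distribution="unknown"):
    a, b = np.asarray(a), np.asarray(b)
    if target_is_categorical:
        # contingency table of category counts per group -> Chi-squared test
        cats = np.union1d(a, b)
        table = [[np.sum(a == c) for c in cats],
                 [np.sum(b == c) for c in cats]]
        return stats.chi2_contingency(table)[1]      # p-value
    if distribution == "unknown":
        return stats.ks_2samp(a, b).pvalue           # distribution-free test
    if min(len(a), len(b)) >= 30:
        return ztest(a, b)[1]                        # normal labels, large groups
    return stats.ttest_ind(a, b).pvalue              # normal labels, a small group
```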
Summarizing, according to an embodiment, a machine learning system 3 is provided that generates a decision stream as a prediction model level by level, beginning from the root node. The node generating entity 34 creates the root node for the input group of samples and sends this node and the associated samples to the selector entity 31. The selector entity 31 calculates the coefficient of determination between the sum of payment and the continuous features and the correlation ratio between the sum of payment and the categorical features. Then the selector entity 31 compares the calculated values and finds the closest association (maximal value of the coefficient of determination or correlation ratio). The selector entity 31 sends an indication of the selected feature, e.g., an identifier of the selected feature, the samples, and the root node to the group merging entity 32. The group merging entity 32 splits the samples on the basis of the selected feature and merges the resulting groups of samples according to a merging rule. The result of the merging, i.e. the sample groups, and the root node are sent to the splitting entity 33. The splitting entity 33 finds that the number of groups is higher than one and assigns to the root node the rule of sample splitting into the sample groups. The splitting entity 33 sends every group to the node generating entity 34, receives from the node generating entity 34 corresponding new nodes, and associates them with the respective splitting rule.

Then, the node generating entity 34 sends the new nodes and the corresponding sample groups to the selector entity 31. In every sample group, the selector entity 31 calculates for all features the strength of association by calculating a coefficient of determination for continuous features and a correlation ratio for categorical features. According to the maximal strength of association, the selector entity 31 selects for each one of the groups a corresponding feature. For every group, the selector entity 31 sends an identifier of the selected feature, the respective samples and nodes to the group merging entity 32. The group merging entity 32 performs the splitting on the basis of the selected feature and performs the merging. For every group, the splitting entity 33 and the node generating entity 34 assign a corresponding splitting rule and generate a corresponding new node. In the next cycle of splitting, the group merging entity 32 merges every input group into one output group, and so the splitting entity 33 sends all groups to the node merging entity 35. The node merging entity 35 adds to each one of the leaf nodes the respective sample group which can't be merged, and sends to the node generating entity 34 all groups which are merged. As soon as all threads of merging are stopped, a decision stream prediction model is generated.
In view of the aforesaid, the proposed methodology comprises steps repeated while new nodes are generated, wherein said steps and their different levels of concretization are as follows (a code sketch of the merging sub-steps follows this list):

1. Select a feature for splitting by executing the following:

A) for every feature of an object/sample, estimate the strength of its association with the target variable (e.g., prediction/decision): for a continuous feature - by calculating a coefficient of determination; for a categorical feature - by calculating a correlation ratio; and

B) select the feature that is most closely associated with the target variable (i.e. prediction/decision), such that the selected feature has the maximal strength of association.

2. Split labeled objects/samples by executing the following:

A) if the selected feature is categorical, use the categories as separate groups; otherwise (i.e. the selected feature is continuous), split the objects/samples into sample groups and merge the sample groups while possible by executing the following:

a) for every sample group, select a pair with the nearest mean values of the feature;

b) estimate the significance of sample group similarity by the target variable by executing the following:

- for every sample group, estimate (e.g., by the Kolmogorov-Smirnov test) the normality of the label distribution;

- for every pair of groups, estimate the difference between the values of the target variable: if at least one distribution is non-normal, use the Mann-Whitney U test statistic; else, if the size of both groups is greater than or equal to 30, use the unpaired two-sample Z-test statistic, otherwise use the unpaired two-sample Student's t-test statistic;

c) merge pairs of groups similar by the values of the target variable by executing the following:

- select the pair with the highest significance level;

- apply the Bonferroni correction to the significance level; if the significance level is above a predefined threshold, then merge the selected pair.

B) merge the groups (output of step A) while possible by executing the following:

a) select pairs of groups with the nearest mean values of the target variable;

b) merge the selected pairs according to steps b - c of item 2A.

3. Merge leaves while possible by executing the following:

A) merge leaves by groups of objects as in 2B;

B) replace merged leaves with new nodes (i.e. the result of merging).
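A minimal sketch of the merging sub-steps 2A-b and 2A-c referenced above, reusing the similarity_p_value helper from the earlier sketch; selecting the most significant pair among all candidate pairs (rather than only nearest-mean pairs) is a simplification, and the threshold is an assumed value.

```python
import itertools
import numpy as np

def merge_while_possible(groups, threshold=0.05):
    """Merge pairs of groups while their Bonferroni-corrected significance of
    similarity stays above the threshold (steps 2A-b and 2A-c, simplified)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    while len(groups) > 1:
        pairs = list(itertools.combinations(range(len(groups)), 2))
        # significance of similarity for every candidate pair (step 2A-b);
        # similarity_p_value is the helper sketched earlier in this document
        p_values = {p: similarity_p_value(groups[p[0]], groups[p[1]]) for p in pairs}
        i, j = max(p_values, key=p_values.get)                # highest significance
        corrected = min(1.0, p_values[(i, j)] * len(pairs))   # Bonferroni correction
        if corrected <= threshold:
            break                       # no pair is similar enough -> stop merging
        groups[i] = np.concatenate([groups[i], groups[j]])
        del groups[j]
    return groups
```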
Here, deviations from the above-presented methodology may also occur. The differences or deviations comprise the use of different statistics for estimating the significance of similarity of groups in step 2A-b. The following statistics can be used: the Kolmogorov-Smirnov test, the Mann-Whitney U test, the Chi-squared test, and the ANOVA F-test. One more alternative is the pair of tests for significance estimation, wherein the Z-test is used if the size of both groups is greater than or equal to 30, and Student's t-test is used otherwise.
After the generation of a decision tree, its opportunities for further training are restricted due to the big quantity of branches and the small amount of data in the leaves. Oppositely, the decision stream, generated as described herein, provides the ability to do continuous training on the basis of additional data, by generation of new nodes and their merging with existing leaves. According to the strategy of continuous training, the prediction model can be adjusted every time the number of accumulated important corrections is significant.

The present invention relates also to an autonomously operating device, or to an autonomously decision determining computing device respectively, arranged to autonomously determine a decision by use of a decision stream generated by the machine learning system 3 and/or by execution of the machine learning method 4.
Fig. 11 visualizes an exemplary operation of an autonomously operating device 1 according to an embodiment of the present invention. The embodiment of Fig. 11 is exemplarily related to the first use case discussed above. Particularly, the autonomously operating device of Fig. 11 is a system 1 for automatic inspection of motor vehicles. It has to be pointed out that the automatic inspection of motor vehicles is just one of plenty of different possible use cases and that the present invention is not limited by this use case.
The system 1 for automatic inspection of motor vehicles uses functional parameters of cars from their built-in sensors. The system 1 takes as input continuous parameters, for example, coolant temperature, engine oil pressure, accumulator voltage level, as well as categorical parameters, for example, emission and engine noise discrete levels, car type and manufacturer identity. The system 1 comprises, according to the present embodiment, one or more diagnostic entities 11, a data entity 12, a decision stream prediction entity 14, and a training entity 15.
Each one of the one or more diagnostic entities comprises a sensor parameter reader 111, an output 112 for the automatic diagnostic summary and an input 113 for the updated summary. The sensor parameter reader 111 is arranged to read parameters from built-in sensors of cars of different types. For example, the sensor parameter reader 111 has one or more adapters each configured to receive parameters from one or more cars, wherein the cars may be of different types. The sensor parameter reader 111 sends the car parameters to the data entity 12. The sending of the car parameters is done, for example, via a communication network 13. According to the present embodiment, the one or more diagnostic entities 11, the data entity 12, the decision stream prediction entity 14, and the training entity 15 are connected to the communication network 13 for communication purposes.

The data entity 12 saves the received car parameters. For this purpose, the data entity 12 comprises, for example, a database 121. The data entity 12 sends the car parameters to the decision stream prediction entity 14. The decision stream prediction entity 14 holds/comprises a pre-trained decision stream as prediction model, which, on the basis of the car parameters transmitted by the data entity 12 to the decision stream prediction entity 14, generates a prediction summary comprising a type and a probability of defects in a given car with regard to which the prediction is executed. The decision stream prediction entity 14 sends the prediction summary to the output 112 of the diagnostic entity 11, which presents it to specialists.
According to the automatic recommendations of the diagnostic system 1, i.e. according to the prediction summary provided by the output 112, the repair work can be carried out.
The result of the car analysis during repairing, e.g. an updated summary of the diagnostic, is sent to the diagnostic entity 11 via the input 113. The result of the car analysis is then provided by the diagnostic entity 11 to the data entity 12. The data entity 12 saves the updated summary, e.g. the data entity 12 stores the updated summary in its database 121.
If the number of accumulated important corrections of predicted results is significant (e.g., is above a given threshold), the data entity 12 sends corresponding data on the values of the features and the corrected results of prediction to the decision stream training entity 15. The decision stream training entity 15 reads from the decision stream prediction entity 14 the prediction model, i.e. the decision stream, and performs a further training of the decision stream in the following way. The decision stream training entity 15 marks all leaf nodes of the decision stream as non-leaf for training, passes the data on the samples (i.e. the data on the cars as samples) through the decision stream, splits the data on the samples in the nodes of the decision stream where possible, generates new nodes and merges them according to the above-described method 4 for decision stream generation. When the prediction model training is finished, the decision stream training entity 15 sends the newly trained decision stream to the decision stream prediction entity 14. The decision stream prediction entity 14 updates the decision stream and uses the updated decision stream for the next predictions or decisions. If the size of the decision stream reaches a critical value, e.g. if the storage space required for storing the decision stream is larger than a corresponding predetermined size threshold or the inference time of a prediction is longer than a predetermined time threshold, the decision stream training entity 15 is configured to train the decision stream from scratch, i.e. to generate a new decision stream.
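As a hedged sketch of this retraining trigger logic (all names and threshold values below are illustrative assumptions, not part of the described system):

```python
def retraining_action(num_corrections, model_size_bytes, inference_ms,
                      corrections_threshold=1000,
                      size_threshold=50_000_000, time_threshold_ms=100):
    """Decide between keeping, extending, or rebuilding the decision stream."""
    if model_size_bytes > size_threshold or inference_ms > time_threshold_ms:
        return "train_from_scratch"    # stream reached a critical size/latency
    if num_corrections > corrections_threshold:
        return "continue_training"     # extend the stream with corrected data
    return "keep_model"                # not enough corrections accumulated yet

# example: 1500 accumulated corrections, small and fast model
print(retraining_action(1500, 10_000_000, 20))   # -> "continue_training"
```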
As shown above, the machine learning system 3 and the machine learning method 4 not only generate an entirely new decision stream but also correct, supplement, and/or update an existing decision stream. Thus, the generation of a decision stream comprises both the generation of a new decision stream and the update of an existing decision stream. The steps of the machine learning method 4 and the actions of the machine learning system 3, as described herein, are executed in the same way in both cases.
Thus, the present invention relates to a machine learning system and a machine learning method. At first, from a sample set, a sample feature is selected that has the closest relation to the target variable representing prediction/decision values. The sample set comprises a plurality of samples, based on which the prediction is met and which are considered at a node of the decision stream. The samples have one target variable and one or more sample features. Then, a set of sample groups is generated by splitting the samples into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule. Subsequently, a splitting rule is generated for splitting the sample set into the set of sample groups and assigned to the node of the decision stream.
The invention has been described in conjunction with various embodiments herein. However, other variations to the enclosed embodiments can be understood and effected by those skilled in the art and practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A machine learning system arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning system comprises:
a selector entity configured to select from a sample set a sample feature of samples of the sample set that has a closest relation to a target variable, wherein the sample set comprises a plurality of samples, based on which the prediction is met, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable;
a group merging entity configured to generate a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule;
a splitting entity configured to generate a splitting rule for splitting the sample set into the set of sample groups and to assign the splitting rule to the node of the decision stream.
2. The machine learning system according to claim 1, wherein the selector entity is configured to select the sample feature that has a closest relation to the target variable by:
calculating, for each sample feature of the one or more sample features that is a continuous feature, a corresponding coefficient of determination between values of the sample feature and target variable of samples in the sample set; calculating, for each sample feature of the one or more sample features that is a categorical feature, a corresponding correlation ratio of target variable with respect to categories of the sample feature of samples in the sample set; and selecting the sample feature from the one or more sample features that has the largest coefficient of determination or that has the largest correlation ratio.
3. The machine learning system according to any one of the preceding claims, wherein the group merging entity is configured to: split the samples of the sample set into the at least two sample groups according to categories of the selected sample feature if the selected sample feature is a categorical feature; and
split the samples of the sample set into the at least two sample groups according to values of the selected sample feature if the selected sample feature is a continuous feature.
4. The machine learning system according to any one of the preceding claims, wherein the group merging entity is configured to execute the merging of the sample groups that meet the merging rule by:
determining, among the at least two sample groups, at least one pair of similar sample groups; and
executing a merging process on the pair of similar sample groups.
5. The machine learning system according to claim 4, wherein the group merging entity is configured to determine the pair of similar sample groups by determining a group similarity probability for a pair of sample groups, and deciding that the pair of sample groups is a pair of similar sample groups if the probability of similarity between values of target variable in two sample groups is above a predetermined probability threshold.
6. The machine learning system according to claim 4 or 5, wherein the group merging entity is configured to execute the merging process by:
estimating a significance of similarity of values of target variable of the samples of the pair of similar sample groups; and
merging the pair of similar sample groups if the estimated significance of similarity is above a predetermined threshold.
7. The machine learning system according to claim 6, wherein:
the group merging entity is configured to estimate the significance of similarity by Kolmogorov-Smirnov test statistic method;
the group merging entity is configured to estimate the significance of similarity by Mann-Whitney U test statistic method; the group merging entity is configured to estimate the significance of similarity by:
determining, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and
determining, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in the at least one sample group of pair of similar sample groups is smaller than 30; and/or
the group merging entity is configured to estimate the significance of similarity by:
determining, for each sample group of the pair of similar sample groups, a corresponding normality of distribution of values of target variable of the samples of the sample group;
determining, for the pair of similar sample groups, at least one statistical method, to be used for the estimating the significance of similarity, according to the corresponding normality of distribution of the values of target variable of the samples in the sample groups of the pair of similar sample groups; and
estimating the significance of similarity of values of target variable of the pair of similar sample groups by executing the corresponding determined at least one statistical method.
8. The machine learning system according to claim 7, wherein the group merging entity is configured to:
determine, for the pair of similar sample groups, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as non-normal;
determine, for the pair of similar sample groups, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as normal and if a number of samples in every group of the pair of similar sample groups is larger than or equal to 30; and determine, for the pair of similar sample groups, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of sample groups of the pair of similar sample groups is determined as normal and if a number of samples in the at least one sample group of pair of similar sample groups is smaller than 30.
9. The machine learning system according to any one of claims 6 to 8, wherein the group merging entity is configured to merge the pair of similar sample groups by: applying a Bonferroni correction to the significance of similarity of the values of target variable of the pair of similar sample groups with regard to the quantity of compared pairs of sample groups; and
if the corrected significance of similarity is above the predetermined threshold, merging the pair of similar sample groups.
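For illustration only: one plausible reading of the Bonferroni correction of claim 9, in which each raw p-value is scaled down by the number of compared pairs, so that a merge becomes harder to justify as more pairs are tested simultaneously; the claims do not spell out the exact form of the correction, so this is an assumption.

```python
# Illustrative merging pass with a Bonferroni correction (claim 9).
# `similarity_p_value` is the selection helper sketched above.
from itertools import combinations

def merge_pass(groups, threshold=0.05):
    """groups: list of 1-D arrays of target values. Returns (i, j, p)
    triples whose corrected significance of similarity is above the
    threshold, i.e. the merge candidates of this pass."""
    pairs = list(combinations(range(len(groups)), 2))
    n_pairs = max(len(pairs), 1)
    candidates = []
    for i, j in pairs:
        p = similarity_p_value(groups[i], groups[j])
        corrected = p / n_pairs  # assumption: stricter with more comparisons
        if corrected > threshold:
            candidates.append((i, j, corrected))
    return candidates
```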
10. The machine learning system according to any one of the preceding claims, wherein the machine learning system further comprises:
a node merging entity configured to merge leaf nodes of the decision stream; and/or
a node generating entity configured to generate new nodes associated with the node of the decision stream, wherein each one of the new nodes is associated with one sample group of the set of sample groups generated by the group merging entity, and wherein each sample group of the set of sample groups is associated with one new node.
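For illustration only: a hypothetical data structure for the node generating entity of claim 10 — all names and fields are illustrative, not taken from the application — showing the one-to-one association between new nodes and sample groups.

```python
# Hypothetical node structure (claim 10): each sample group produced by the
# group merging entity becomes exactly one new child node, and vice versa.
from dataclasses import dataclass, field

@dataclass
class StreamNode:
    samples: list                  # samples considered at this node
    splitting_rule: object = None  # rule assigned by the rule generating entity
    children: list = field(default_factory=list)

def generate_new_nodes(parent: StreamNode, sample_groups: list) -> None:
    parent.children = [StreamNode(samples=group) for group in sample_groups]
```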
11. The machine learning system according to claim 10, wherein the node merging entity is configured to merge the leaf nodes of the decision stream by:
determining, among the leaf nodes of the decision stream, at least one pair of similar leaf nodes, wherein a pair of leaf nodes is considered as a pair of similar leaf nodes if an average value of target variable in samples of a first leaf node of the pair of leaf nodes is equal to an average value of target variable in samples of a second leaf node of the pair of leaf nodes and/or if a probability of similarity between the values of target variable in the first and the second leaf nodes is above a predetermined leaf node similarity threshold; and
merging the pair of similar leaf nodes if an estimated significance of leaf node similarity is above a predetermined leaf node similarity threshold.
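For illustration only: a sketch of the similar-leaf detection of claim 11, reusing the `similarity_p_value` helper sketched above; the tolerance used to compare average target values is an assumption. The significance estimation and Bonferroni correction of claims 12 to 14 mirror claims 7 to 9 for leaf nodes and can reuse the same helpers.

```python
# Illustrative detection of pairs of similar leaf nodes (claim 11):
# equal average target values (up to an assumed tolerance) or a similarity
# probability above the leaf node similarity threshold.
import numpy as np
from itertools import combinations

def similar_leaf_pairs(leaves, prob_threshold=0.05, mean_tol=1e-9):
    """leaves: dict mapping leaf id -> 1-D array of target values."""
    pairs = []
    for a, b in combinations(leaves, 2):
        means_equal = abs(np.mean(leaves[a]) - np.mean(leaves[b])) <= mean_tol
        prob_similar = similarity_p_value(leaves[a], leaves[b]) > prob_threshold
        if means_equal or prob_similar:
            pairs.append((a, b))
    return pairs
```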
12. The machine learning system according to claim 11, wherein:
the node merging entity is configured to estimate the significance of leaf node similarity by Kolmogorov-Smirnov test statistic method;
the node merging entity is configured to estimate the significance of leaf node similarity by Mann-Whitney U test statistic method;
the node merging entity is configured to estimate the significance of leaf node similarity by:
determining, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and
determining, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if a number of samples in at least one leaf node of the pair of similar leaf nodes is smaller than 30; and/or
the node merging entity is configured to estimate the significance of leaf node similarity by:
determining, for each leaf node of the pair of similar leaf nodes, a corresponding normality of distribution of the values of target variable of the samples of the leaf node;
determining, for the pair of similar leaf nodes, at least one statistical method, to be used for estimating the significance of similarity, according to the corresponding normality of distributions of the values of target variable of the samples of the leaf nodes of the pair of similar leaf nodes; and
estimating the significance of leaf node similarity of the values of target variable of the samples of leaf nodes of the pair of similar leaf nodes by executing the corresponding determined at least one statistical method.
13. The machine learning system according to claim 12, wherein the node merging entity is configured to:
determine, for the pair of similar leaf nodes, Kolmogorov-Smirnov test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as non-normal;
determine, for the pair of similar leaf nodes, unpaired two-sample Z-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in every node of the pair of similar leaf nodes is larger than or equal to 30; and
determine, for the pair of similar leaf nodes, unpaired two-sample Student's t-test statistic method as the statistical method to be used if the distribution of the values of target variable of the samples of at least one of the leaf nodes of the pair of similar leaf nodes is determined as normal and if a number of samples in at least one leaf node of the pair of similar leaf nodes is smaller than 30.
14. The machine learning system according to any one of claims 11 to 13, wherein the node merging entity is configured to merge the pair of similar leaf nodes by:
applying a Bonferroni correction to the significance of leaf node similarity with regard to the quantity of compared pairs of leaf nodes; and
if the corrected significance of leaf node similarity is above the predetermined leaf node similarity threshold, merging the pair of similar leaf nodes.
15. A machine learning method arranged to generate a decision stream to be used by an autonomously decision determining computing device, wherein the machine learning method comprises:
selecting from a sample set a sample feature of samples of the sample set that has a closest relation to a target variable, wherein the sample set comprises a plurality of samples, based on which a prediction is made, wherein the samples of the sample set are samples considered at a node of the decision stream, and wherein the samples of the sample set have one or more sample features and one target variable;
generating a set of sample groups by splitting the samples of the sample set into at least two sample groups according to the selected sample feature, and by executing a merging of sample groups that meet a merging rule;
generating a splitting rule for splitting the sample set into the set of sample groups and assigning the splitting rule to the node of the decision stream.
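For illustration only: an end-to-end sketch of the method of claim 15 under stated assumptions — "closest relation to the target variable" is approximated by the absolute Pearson correlation (the claim fixes no measure), and splitting by distinct feature values stands in for the generic split.

```python
# Illustrative single node-splitting step of the claimed method: select the
# most target-related feature, split by its values, greedily merge groups
# whose target values are statistically similar (helpers sketched above).
import numpy as np

def split_node(X, y, threshold=0.05):
    """X: (n_samples, n_features) array; y: (n_samples,) target values.
    Returns the selected feature index and the merged groups of sample
    indices; the splitting rule of the node is then 'route a sample to the
    group that matches its value of the selected feature'."""
    corr = np.nan_to_num([abs(np.corrcoef(X[:, f], y)[0, 1])
                          for f in range(X.shape[1])])
    feature = int(np.argmax(corr))
    groups = [list(np.where(X[:, feature] == v)[0])
              for v in np.unique(X[:, feature])]
    while len(groups) > 1:
        candidates = merge_pass([y[g] for g in groups], threshold)
        if not candidates:
            break
        i, j, _ = max(candidates, key=lambda c: c[2])  # most similar pair
        groups[i] = groups[i] + groups.pop(j)          # i < j, so i is stable
    return feature, groups
```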
16. An autonomously operating device arranged to autonomously determine a decision by use of a decision stream generated by a machine learning system according to any one of the preceding claims and/or by an execution of a machine learning method according to claim 15.
PCT/RU2017/000171 2017-03-27 2017-03-27 Machine learning system and method for generating a decision stream and automonously operating device using the decision stream WO2018182442A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2017/000171 WO2018182442A1 (en) 2017-03-27 2017-03-27 Machine learning system and method for generating a decision stream and automonously operating device using the decision stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2017/000171 WO2018182442A1 (en) 2017-03-27 2017-03-27 Machine learning system and method for generating a decision stream and automonously operating device using the decision stream

Publications (1)

Publication Number Publication Date
WO2018182442A1 true WO2018182442A1 (en) 2018-10-04

Family

ID=58772623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2017/000171 WO2018182442A1 (en) 2017-03-27 2017-03-27 Machine learning system and method for generating a decision stream and automonously operating device using the decision stream

Country Status (1)

Country Link
WO (1) WO2018182442A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7660705B1 (en) * 2002-03-19 2010-02-09 Microsoft Corporation Bayesian approach for learning regression decision graph models and regression models for time series analysis
US8639446B1 (en) * 2008-06-24 2014-01-28 Trigeminal Solutions, Inc. Technique for identifying association variables
US20150134576A1 (en) * 2013-11-13 2015-05-14 Microsoft Corporation Memory facilitation using directed acyclic graphs

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Data Mining and Knowledge Discovery Handbook", 2005, SPRINGER-VERLAG, New York, ISBN: 978-0-387-24435-8, article LIOR ROKACH ET AL: "Decision Trees", pages: 165 - 192, XP055431218, DOI: 10.1007/0-387-25465-X_9 *
ANONYMOUS: "Kolmogorov-Smirnov test - Wikipedia", 21 March 2017 (2017-03-21), XP055429919, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Kolmogorov-Smirnov_test&oldid=771495751> [retrieved on 20171129] *
ANONYMOUS: "Student's t-test - Wikipedia", 18 March 2017 (2017-03-18), XP055429925, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Student's_t-test&oldid=770891723> [retrieved on 20171129] *
ANONYMOUS: "Z-test - Wikipedia", 26 December 2016 (2016-12-26), XP055429923, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Z-test&oldid=756697585> [retrieved on 20171129] *
JOOST DE NIJS: "Decision DAGS - A new approach", 26 August 2013 (2013-08-26), XP055429428, Retrieved from the Internet <URL:https://cs.brown.edu/research/pubs/theses/masters/2011/denijs.pdf> [retrieved on 20171128] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361701A (en) * 2018-12-07 2019-02-19 北京知道创宇信息技术有限公司 Network security detection method, device and server
US10965611B2 (en) 2019-01-10 2021-03-30 International Business Machines Corporation Scheduler utilizing normalized leaves of a weighted tree
CN111340281A (en) * 2020-02-20 2020-06-26 支付宝(杭州)信息技术有限公司 Prediction model training method and device
CN111340281B (en) * 2020-02-20 2022-06-03 支付宝(杭州)信息技术有限公司 Prediction model training method and device
CN116502255A (en) * 2023-06-30 2023-07-28 杭州金智塔科技有限公司 Feature extraction method and device based on secret sharing
CN116502255B (en) * 2023-06-30 2023-09-19 杭州金智塔科技有限公司 Feature extraction method and device based on secret sharing

Similar Documents

Publication Publication Date Title
US20170330078A1 (en) Method and system for automated model building
WO2018182442A1 (en) Machine learning system and method for generating a decision stream and automonously operating device using the decision stream
US20200034750A1 (en) Generating artificial training data for machine-learning
CN110263869B (en) Method and device for predicting duration of Spark task
US11055210B2 (en) Software test equipment and software testing method
KR102453582B1 (en) Method and system for ai demand forcasting
CN115048370B (en) Artificial intelligence processing method for big data cleaning and big data cleaning system
US20180330024A1 (en) Risk evaluation
US9324026B2 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
CN110855477A (en) Link log monitoring method and device, computer equipment and storage medium
CN111324827A (en) Method, device, equipment and storage medium for intelligently recommending goods source order information
CN112884569A (en) Credit assessment model training method, device and equipment
US20220366315A1 (en) Feature selection for model training
CN116756298B (en) Cloud database-oriented AI session information optimization method and big data optimization server
CN112784008B (en) Case similarity determining method and device, storage medium and terminal
CN111106953B (en) Method and device for analyzing abnormal root cause
CN111090401A (en) Storage device performance prediction method and device
Li et al. Type-1 assembly line balancing considering uncertain task time
CN115168509A (en) Processing method and device of wind control data, storage medium and computer equipment
JP7464115B2 (en) Learning device, learning method, and learning program
CN114841664A (en) Method and device for determining multitasking sequence
CN113779116A (en) Object sorting method, related equipment and medium
CN114139727A (en) Feature processing method, feature processing device, computing equipment and medium
CN115409168A (en) Neural network optimization method and device
CN114548463A (en) Line information prediction method, line information prediction device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 17725795; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 17725795; Country of ref document: EP; Kind code of ref document: A1