EP1384199A2

EP1384199A2 - Method for determining competing risks

Info

Publication number: EP1384199A2
Application number: EP01999919A
Authority: EP
Inventors: Ronald E. Kates; Nadia Harbeck
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-12-07
Filing date: 2001-12-07
Publication date: 2004-01-28
Also published as: WO2002047026A3; AU2002216080A1; US20040073096A1; WO2002047026A2; US7395248B2

Abstract

The invention relates to a method for determining competing risks for objects after an initial event based on already measured or otherwise objectifiable training data sets, in which a number of signals obtained from an adaptive system are combined in an objective function in such a manner that the adaptive system can identify or predict the underlying probabilities of the respective competing risks.

Description

Method for determining competing risks

Field of the Invention

The invention relates to a method for determining competing risks after an initial event with the aid of systems capable of learning on the basis of data that has already been measured or can otherwise be objectified (training data).

State of the art

Systems capable of learning, such as neural networks, are increasingly being used for risk assessment because they are able to recognize and represent complex, previously unknown relationships between recorded factors and outcomes. This capability enables them to provide more reliable or more precise estimates of risk probabilities than conventional methods, which must assume a specific form of the relationship, such as a linear dependency.

In medical applications, for example in cancer treatment, it is known to use learning systems such as neural networks, or recursive partitioning (such as the known method CART, "Classification and Regression Trees", see for example: L. Breiman et al., "Classification and Regression Trees", Chapman and Hall, New York (1984)), to determine the risk probability of an event even from censored data. (A data set is said to be censored if the event has not yet occurred by the last observation time.) In what follows, the determination of the risk probability (for example, of a new occurrence of the disease, i.e. a recurrence) serves as an example of the use of learning systems to support therapy decisions in primary cancer treatment.

The factors of the data sets comprise a number of objectifiable parameters whose values cannot be influenced by the person operating the learning system. In the case of primary breast cancer, these parameters include, for example, age at the time of surgery, number of affected lymph nodes, laboratory value of the uPA factor, laboratory value of the PAI-1 factor, a characteristic value for tumor size, laboratory value of the estrogen receptor, and laboratory value of the progesterone receptor.

The type of therapy actually administered can also be recorded as an input, so that the relationship between therapy and outcome is recognized as well.

The values are temporarily stored on a suitable storage medium and fed to the system capable of learning. However, the individual values are usually subject to uncertainty, analogous to signal noise. From these noisy individual signals, the task of the learning system is to form refined signals that can lead to a risk assessment within a suitable probabilistic representation.

The ability of a neural network to learn even nonlinear relationships is a consequence of its architecture and mode of operation. A so-called "multilayer perceptron" (in the technical literature usually abbreviated as "MLP") contains, for example, an input layer, a hidden layer and an output layer. The "hidden nodes" of the neural network generate signals representing the probability of complex internal processes. They can therefore provide information about underlying biological processes that cannot be observed directly but that ultimately determine the course of a disease.

Internal biological processes can take place in parallel at different rates and can also interact with each other. Systems capable of learning can also recognize and represent such internal processes that cannot be observed directly; the quality of this recognition becomes apparent only indirectly, through the quality of the predictions of the events actually observed. Recursive partitioning (such as CART) produces assignments whose ability to represent complex internal relationships is analogous to that of neural networks.

The course of a disease can lead to different critical events, whose prevention may require different therapeutic approaches. In the case of the first recurrence in breast cancer, for example, the findings can be classified unambiguously into the mutually exclusive manifestations

1. "distant metastases in bone tissue",

2. "distant metastases, but no finding in the bone tissue",

3. "loco-regional" recurrence.

However, since the further course of the disease after one of these findings can also affect the probabilities of the other forms of recurrence, it is often sensible in the statistical treatment of such data to examine only the first recurrence. For example, for a breast cancer patient who has a local recurrence 24 months after primary surgery and the finding "bone metastasis" after 48 months, only category 3 is relevant with regard to "first recurrence". The follow-up observation of the bone metastasis is not used in this context; i.e. the patient is regarded as "censored" with regard to finding 1 as soon as another finding (here local recurrence) has been made.
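The first-recurrence rule just described can be sketched as a small function that turns a patient's follow-up findings into an end point t_j and indicators δ_jk. This is an illustrative sketch, not the patent's implementation: the function name, the input format (a list of (month, k) pairs plus the time of the last follow-up) and the tie-break toward the smaller k for findings in the same month are assumptions.

```python
from typing import List, Tuple

def first_recurrence_record(
    findings: List[Tuple[float, int]],  # (month, manifestation k), k = 1..K
    last_followup: float,               # time of the last follow-up observation
    n_manifestations: int = 3,
) -> Tuple[float, List[int]]:
    """Return (t_j, [delta_j1, ..., delta_jK]) based on the first recurrence only."""
    delta = [0] * n_manifestations
    if not findings:
        # No failure observed: the patient is censored at the last follow-up.
        return last_followup, delta
    # Earliest finding wins; tuple ordering breaks same-month ties by smaller k
    # (an assumption of this sketch).
    t_first, k_first = min(findings)
    delta[k_first - 1] = 1
    # Later findings are ignored: the patient counts as censored for the other k.
    return t_first, delta

# Example from the text: local recurrence (k = 3) at 24 months, bone metastasis
# (k = 1) at 48 months; the last follow-up at 60 months is hypothetical.
print(first_recurrence_record([(48, 1), (24, 3)], last_followup=60))  # (24, [0, 0, 1])
```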

Competing risks can also arise when, for example, a patient dies of a completely different illness or of a side effect of treatment, so that the risk of the manifestation of interest to the physician remains hidden.

It is fairly obvious to those skilled in the art that a mutually exclusive classification combined with a censoring rule can map the training data such that, for each possible outcome, a separate neural network, or a classification tree obtained by recursive partitioning, can be trained according to the prior art. In the example with outcomes 1 to 3, one would have to train three completely independent neural networks or three different decision trees.

A problem with this use of the prior art is that whatever is learned about the informative value of an internal node with respect to one disease outcome is lost when assessing its informative value with respect to the other outcomes. In reality, however, an internal biological process recognized by internal nodes of a neural network could contribute to several observable outcomes, albeit with different weightings. For example, the biological "invasiveness" of a tumor is of different but significant importance for distant metastases and for local recurrences. Independently trained networks must each "discover" separately the significance of the internal relationship represented by such a node.

Naturally, the number of actual events available to a system capable of learning, like the sample size in a statistical analysis, also determines the quality of recognition. In medical applications this number is usually limited. As a result, there is a relatively high probability that an internal process is only barely detectable on one of the outputs and not detectable on the others. In this case, the potential of the internal nodes to discriminate between the factors, as well as their biological explanatory potential with regard to the other outputs, is lost.

Since therapies also have side effects, reducing one disease risk at the expense of increasing another is typical of medical decision-making. In this context, the need under the prior art to train a completely new neural network for each individual risk is unsatisfactory.

According to the prior art, factors whose effect on the output probabilities varies over time can be represented by different nodes in the output layer, to which different time dependencies are assigned (for example by the known technique of "fractional polynomials"). A time-varying event density is therefore possible in the prior art, but competing risks cannot be formulated in such a way that the determination of a time-varying statement is not impaired.

In view of the disadvantages of the prior art, the object of the invention is to provide a method with which competing risks can be detected, identified and represented in their logical or causal context, in particular in such a way that the determination of a time-varying statement is not impaired.

Description of the invention

This object is achieved by the method according to claim 1.

With the method according to the invention, the system capable of learning can assign suitable characteristic values to the competing risks. These characteristic values are intended to enable calculation of the conditional probability per unit of time of the occurrence of the respective event (provided that none of the possible end events has occurred so far). “Suitable” characteristic values in the sense of the invention can, for example, be those that aim at a maximum of the statistical “likelihood” over all outputs jointly.

It is understood that this method can be used in various fields, e.g. engineering, economics, biology or medicine. In the field of medicine, the objects can be patients who, after a first illness (the initial event), are subject to competing risks of suffering another illness.

Advantageously, the training data sets comprise data on the initial event and on follow-up observation up to a predetermined time, or data recorded objectively in some other way.

It is advantageous if the time of the last follow-up is used explicitly in the training data records. The method according to the invention can thus also make it possible to use other characteristic values in the context of a trained learning system, as long as these characteristic values can be formed from the follow-up observations in a manner analogous to the statistical likelihood.

In an advantageous embodiment, when one manifestation of failure is observed at a given time, the other manifestations are excluded at that time. In this way, one manifestation of failure can be given priority.

The objective function L is advantageously given as a function of a function P:

$$L(\mu; \{x_j, t_j, \delta_{jk}\}) = \prod_{j} P\left(\{f_{LS(k)}(t_j \mid x_j),\ S_{LS(k)}(t_j \mid x_j)\}_{k=1,\dots,K}\right)$$

Here, μ denotes the parameters of the system capable of learning ("LS" stands for "learnable system"). f_LS(k)(t | x) denotes the failure rate of manifestation k, and S_LS(k)(t | x) denotes the expected value of the proportion of objects with observed characteristics x that have not suffered a failure of manifestation k by time t. P is determined from δ_jk on the basis of the logical relationship, where δ_jk = 1 if object j suffered a failure of manifestation k at time t_j, and otherwise δ_jk = 0.

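Advantageously,

$$L(\mu; \{x_j, t_j, \delta_{jk}\}) = \prod_{j} \prod_{k=1}^{K} \left[f_{LS(k)}(t_j \mid x_j)\right]^{\varepsilon_{jk}} \left[S_{LS(k)}(t_j \mid x_j)\right]^{\psi_{jk}}$$

is used as the objective function, where ε_jk and ψ_jk are determined from δ_jk on the basis of the logical relationship.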

It is an advantage if

is used as an objective function.

In a preferred alternative, a neural network is used as the learning system. In this case, the objective function L can have the above form as a function of P.

It is particularly advantageous if a neural network of the MLP (multilayer perceptron) architecture is used.

In another preferred alternative, the learning system performs recursive partitioning, wherein

each object is assigned to a node,

the frequencies or probabilities of all manifestations are assigned to each node, and

the partitioning is carried out such that an objective function that takes these frequencies or probabilities into account statistically is optimized.

The learnable system is preferably used in the context of a decision-making aid.

It is advantageous if values are assigned to the various probability functions of the competing risks in order to determine a strategy. In a medical application of the present invention, for example, a therapy strategy can thus be determined.

The method according to the invention for determining competing risks is described further below with reference to the drawing, which shows:

FIG. 1 a representation of a neural network implemented as an MLP,

FIG. 2 a Venn diagram of competing risks, and

FIG. 3 an illustration of a trained neural network with three competing risks.

The embodiments described below relate to medical applications, but this should not be understood as a limitation. The following description uses the terminology of neural networks with MLP architecture. However, application to other neural network architectures and to regression trees is analogous and can be understood by those skilled in the art without further description.

In particular, according to the invention an additional dimension of the output layer of the learning system is introduced, wherein

• the additional dimension of the output layer comprises at least two nodes,

• the nodes of this additional dimension correspond to the different end events,

• each output node is assigned a signal,

• the individual signals are each assigned to a risk function with regard to the possible events,

• the signals of the output nodes are combined to form an overall signal, and

• the learning system is trained by using the values of the overall signals for all data sets as the objective function of the system.

A system trained in this way supports the attending physician and the patient, for example, in choosing between several different therapeutic approaches by determining against which of the possible manifestations of the risk of recurrence the therapy should be directed.

Problem description and overview

The goal of individualized patient prognosis with competing risks can be understood mathematically as approximating several functions f_1(x), f_2(x), f_3(x), ... with the system capable of learning, here with neural network outputs NN_1(x), NN_2(x), .... More precisely, the neural network estimates the expected value E(y_k | x) of the stochastic variable y_k for observed features x:

$$NN_k(x) \approx f_k(x) = E(y_k \mid x).$$

In the exemplary embodiment, the neural network is implemented as an MLP and can be represented schematically as in FIG. 1.

All squares represent neurons. The neurons shown at the top of the figure deliver either

• raw patient characteristics (for primary breast cancer, for example, uPA, PAI-1, number of affected lymph nodes, etc.) or

• quantities already prepared from these characteristics (e.g. values adjusted for mean or median and standardized by standard deviation of the value distribution) or

• quantities derived from prior knowledge or from other statistical methods. Together, these neurons form the input layer.

The neurons in the middle form the internal (hidden) layer. Several internal layers can also be provided. Each internal neuron processes the signals from the input neurons and passes on a signal. The mathematical relationship between the "inputs" to the internal neurons and their "outputs" is controlled by adjusting synaptic weights.

The neurons at the bottom provide estimates of the desired quantities (e.g. expected value of survival) and form the output layer.

To teach the network the assumed relationships f_1(x), f_2(x), f_3(x), ..., data from m patients are available. A data pattern (x, y) is assigned to each patient, where for competing risks the output variables y are to be understood as "vectors" (y = [y_1, y_2, y_3, ...]). From the set of data patterns {(x^1, y^1), ..., (x^m, y^m)}, the network must therefore learn the underlying dynamics; the superscript index refers to the patient. During learning, the synaptic weights are adjusted.

The architecture used in the embodiment is a classic multilayer feedforward network. Neurons are organized in layers as described above. In the embodiment, connectors exist as follows:

• input layer -> hidden layer

• input layer -> output layer

• hidden layer -> output layer

The use of connectors input layer -> output layer is expedient but not mandatory for the function of the invention, because they are not strictly necessary for representing a mapping NN(x).

Function of neural networks

Neurons as functions

Each neuron receives a stimulation signal S, processes it according to a predetermined activation function F(S), and outputs a corresponding response signal A = F(S), which is fed to all subsequent neurons still connected to it. In the embodiment, the activation function of the hidden layer is the hyperbolic tangent. The invention can also be used with other activation functions, such as the logistic function.

Transformations and input neurons

The factors are first transformed univariately so that their values lie in an interval of the order of 1.

To this end, the median x_Median is subtracted and the values are scaled with a factor x_Q: values above the median are scaled with the 75% quantile, values below the median with the 25% quantile. The tanh function is then applied:

$$X = \tanh\left(\frac{x - x_{Median}}{x_Q}\right) \quad (1a)$$

The input neurons have a static function and are therefore implemented as arrays that pass on the transformed values. Conceptually, the tanh function of equation (1a) can be regarded as the activation function of the input layer.
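The univariate input transformation can be sketched as follows. The text does not say whether x_Q is the quantile itself or its distance from the median; this sketch assumes the distance (q75 minus the median above the median, the median minus q25 below it). It uses Python's `statistics.quantiles` with its default method, which may differ from the quantile definition used in the embodiment; the function name is hypothetical.

```python
import math
import statistics
from typing import List

def transform_factor(values: List[float]) -> List[float]:
    """Median-centre one factor, scale by a quartile-based x_Q, then apply tanh (Eq. 1a)."""
    median = statistics.median(values)
    q25, _, q75 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    upper = q75 - median  # assumed x_Q for values above the median
    lower = median - q25  # assumed x_Q for values at or below the median
    out = []
    for v in values:
        d = v - median
        scale = upper if d > 0 else lower
        # A zero-width half of the distribution cannot be scaled; map to 0.0.
        out.append(math.tanh(d / scale) if scale > 0 else 0.0)
    return out
```

Using separate scales above and below the median keeps skewed laboratory values roughly symmetric in the transformed range (-1, 1).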

Hidden neurons

We seek the output of hidden node h for patient j. First, it is checked whether hidden node h is still active. If it is, the input signals are multiplied by the associated weights to form the sum w_h · x. More precisely, the signal of hidden node h for pattern j is a weighted sum of the inputs of the form

$$z_h(j) = \sum_i w_{ih} X_i(j),$$

where w_ih is the weight of the connector from input neuron i to hidden neuron h, and X_i(j) is the (scaled) response of the i-th input neuron. The response of hidden neuron h is

$$r_h(j) = F_h\left(z_h(j) - b_h\right). \quad (2.a)$$

Here b_h is the bias of hidden neuron h, which is optimized mathematically like any other weight of the network. In the exemplary embodiment, the nonlinear activation function F_h is the hyperbolic tangent.
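The hidden-layer computation z_h(j) = Σ_i w_ih X_i(j), r_h(j) = tanh(z_h(j) − b_h) can be sketched for one pattern j as below. Representing a pruned (inactive) node by a zero output is an assumption of this sketch, as are the function and parameter names.

```python
import math
from typing import List, Sequence

def hidden_outputs(
    x: Sequence[float],                  # scaled inputs X_i(j) for one pattern j
    weights: Sequence[Sequence[float]],  # weights[h][i] = w_ih
    biases: Sequence[float],             # biases[h] = b_h
    active: Sequence[bool],              # active[h] is False if node h was pruned
) -> List[float]:
    """r_h(j) = tanh(z_h(j) - b_h) with z_h(j) = sum_i w_ih * X_i(j)."""
    out = []
    for w_h, b_h, is_active in zip(weights, biases, active):
        if not is_active:
            out.append(0.0)  # assumption: a pruned node passes on no signal
            continue
        z_h = sum(w * xi for w, xi in zip(w_h, x))  # weighted sum of inputs
        out.append(math.tanh(z_h - b_h))            # F_h = tanh, Eq. (2.a)
    return out
```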

Output neurons

We seek the output of output node o for patient j. First, it is checked whether output node o is still active. Connectors from the hidden layer as well as from the input layer are possible. For each connector that is still active, the associated input signal is multiplied by the corresponding weight.

These products are summed to form the signal z_o(j); the bias b_o of the neuron is subtracted, and the activation function of output neuron o is applied to the result. The output O_o(j) is thus

$$O_o(j) = F_o\left(z_o(j) - b_o\right)$$

In the exemplary embodiment, the activation function of the output layer is the identity function.

In the exemplary embodiment, unlike in the hidden layer, the bias is not freely optimized but is chosen such that the median signal of all output neurons is zero. This is possible without loss of generality of the model. The number of parameters to be optimized is thus reduced by the number of bias parameters.
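A sketch of one output node with identity activation and the median rule for its bias follows. It assumes that "median signal is zero" means the median over all patients of that output neuron's signal; that reading, and the names used, are assumptions.

```python
import statistics
from typing import List, Sequence

def output_signals(
    signals: Sequence[Sequence[float]],  # signals[j]: inputs reaching output node o for patient j
    weights: Sequence[float],            # weights of the active connectors into output node o
) -> List[float]:
    """O_o(j) = z_o(j) - b_o with identity activation; b_o = median of z_o over patients."""
    z = [sum(w * s for w, s in zip(weights, s_j)) for s_j in signals]
    b_o = statistics.median(z)  # fixed by the median rule, not optimized
    return [z_j - b_o for z_j in z]
```

Fixing b_o this way loses nothing, since a constant shift in the output can be absorbed by the time scale λ_0k of the hazard model.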

Survival analysis for competing, time-varying risks in the framework of models capable of learning

Relation to the system capable of learning

Consider a patient population with covariates (prognostic factors) x_j, measured at an initial time t = 0 (e.g. at the time of primary surgery), and end points t_j. If patient j suffers a failure of manifestation k at time t_j, δ_jk = 1 (k = 1, 2, 3, ...) is defined; if the patient is censored (further course after t = t_j unknown), δ_jk = 0 is defined.

Let S_k(t) be the expected value of the proportion of patients who have not experienced a failure of manifestation k by time t, where S_k(∞) = 0 and S_k(0) = 1. It is convenient to define a failure rate f_k(t) and a "hazard function" λ_k(t) according to

$$f_k(t) = -\frac{dS_k(t)}{dt}, \qquad \lambda_k(t) = \frac{f_k(t)}{S_k(t)} \quad (3.a)$$

such that

$$\lambda_k(t) = -\frac{d}{dt}\left[\log S_k(t)\right] \quad (3.b)$$

holds.

The interpretation of these individual failure rates is as follows: if it were possible to avoid the other manifestations without influencing manifestation k, f_k(t) would approximate the observed failure rate. In reality, f_k(t) is not observed directly. The use of the invention in the context of a decision-making aid, however, requires knowledge of all f_k(t), so that the effectiveness of reducing one manifestation can be assessed with regard to the overall well-being of the patient.

If the course of the hazard functions λ_k(t) is known, S_k(t) is obtained by integrating Eq. (3.b) with the initial condition S_k(0) = 1:

$$S_k(t) = \exp\left(-\int_0^t \lambda_k(s)\,ds\right)$$

At a time t after primary surgery, for a patient with covariates x, the neural network yields a "hazard function" λ_k(t | x) that now depends on the covariates x. For given covariates x, the hazard function is modeled as

$$\lambda_k(t \mid x) = \lambda_{0k}(t)\, h_k(t \mid x) \quad (4.)$$

with

$$h_k(t \mid x) = \exp\left[\sum_{l=1}^{L} NN_{kl}(x)\, B_l(t)\right]. \quad (5.)$$

The functions B_l(t) are chosen to suit the problem; spline functions, for example, are possible. In the exemplary embodiment, fractional polynomials are preferred for B_l(t), i.e. B_l(t) = t^{(l-1)/2}.

This gives

$$\lambda_k(t \mid x) = \lambda_{0k} \exp\left[\sum_{l=1}^{L} NN_{kl}(x)\, B_l(t)\right] = -\frac{d}{dt}\log S_k(t \mid x). \quad (6.)$$

Here, λ_0k is regarded as a constant; the time dependence lies in the functions B_l(t). This model is a proportional hazards model if B_1 = 1 and all other B_l vanish. Deviations from "proportional hazards" can be modeled by including terms B_l with l > 1.
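The hazard model of Eq. (6) can be sketched as follows, assuming fractional polynomials B_l(t) = t^((l-1)/2), so that B_1 = 1 gives the proportional hazards case. The network outputs NN_kl(x) are passed in as plain numbers here; the function names are hypothetical.

```python
import math
from typing import Sequence

def basis(l: int, t: float) -> float:
    """Fractional-polynomial time function B_l(t) = t**((l - 1) / 2), l = 1, 2, ..."""
    return t ** ((l - 1) / 2)

def hazard(t: float, lambda0_k: float, nn_k: Sequence[float]) -> float:
    """lambda_k(t | x) = lambda_0k * exp(sum_l NN_kl(x) * B_l(t)).

    nn_k[l - 1] holds the network output NN_kl(x) for manifestation k.
    """
    return lambda0_k * math.exp(sum(c * basis(l, t) for l, c in enumerate(nn_k, start=1)))
```

With only the l = 1 coefficient nonzero the hazard no longer depends on t, which is exactly the proportional hazards special case; nonzero coefficients for l > 1 let the covariate effect change over follow-up time.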

In a wide class of applications, an objective function of the form

$$L(\mu; \{x_j, t_j, \delta_{jk}\}) = \prod_{j} P\left(\{f_{NN(k)}(t_j \mid x_j),\ S_{NN(k)}(t_j \mid x_j)\}_{k=1,\dots,K}\right) \quad (7.)$$

is optimized, where the notation expresses that P (initially in a manner not yet specified) may depend on the respective survival or failure probabilities. This dependency is problem-related and emerges from a logical model for the occurrence of the different manifestations. A preferred class of objective functions of the form (7.) can be understood as statistical likelihood functions; for the embodiment,

$$L(\mu; \{x_j, t_j, \delta_{jk}\}) = \prod_{j} \prod_{k=1}^{K} \left[f_{NN(k)}(t_j \mid x_j)\right]^{\varepsilon_{jk}} \left[S_{NN(k)}(t_j \mid x_j)\right]^{\psi_{jk}} \quad (8.)$$

is chosen. The two arguments f_NN(k, x) and S_NN(k, x) are uniquely determined provided that the neural network or the other model capable of learning supplies the corresponding values at the output nodes. This is always the case in the embodiment.

Here ε_jk and ψ_jk are to be determined from δ_jk on the basis of the logical relationship, where δ_jk = 1 if patient j has suffered a failure of manifestation k at time t_j, and otherwise δ_jk = 0. Censored data records correspond to patients who have not suffered any failure at all, so that δ_jk = 0 for all k = 1, 2, 3, .... The functional dependence on the model is symbolized by the variable parameters μ. An example of the determination of ε_jk and ψ_jk is given below.

In the embodiment, the parameters denoted by μ are the survival time scales λ_0k and the weights of the neural network. The index j denotes the patient record.

In the embodiment, the time integral required to solve equation (6) is evaluated by the standard method of Romberg integration. Arbitrary time dependencies of the functions B_l(t) can thus be taken into account.
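A self-contained sketch of this step: the hazard is integrated from 0 to t with Romberg integration (trapezoid rule refined by Richardson extrapolation) and exponentiated, which solves Eq. (3.b) with S_k(0) = 1. The fixed number of refinement levels is an assumption of this sketch; a production version would check convergence. The names are hypothetical.

```python
import math
from typing import Callable

def romberg(g: Callable[[float], float], a: float, b: float, levels: int = 8) -> float:
    """Romberg integration of g over [a, b]."""
    h = b - a
    R = [[0.5 * h * (g(a) + g(b))]]  # plain trapezoid rule
    for i in range(1, levels):
        h /= 2.0
        # Refine the previous trapezoid estimate with the new midpoints.
        mid = sum(g(a + (2 * m - 1) * h) for m in range(1, 2 ** (i - 1) + 1))
        row = [0.5 * R[i - 1][0] + h * mid]
        for k in range(1, i + 1):
            # Richardson extrapolation removes the leading error term.
            row.append(row[k - 1] + (row[k - 1] - R[i - 1][k - 1]) / (4 ** k - 1))
        R.append(row)
    return R[-1][-1]

def survival(t: float, hazard_k: Callable[[float], float]) -> float:
    """S_k(t | x) = exp(-integral_0^t lambda_k(s | x) ds)."""
    return math.exp(-romberg(hazard_k, 0.0, t))
```

For integrands containing t^(1/2) terms the derivative is unbounded at t = 0, so the extrapolation gains less accuracy there than for smooth hazards.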

At a time t, let S(t) be the expected value of the proportion of patients who have not suffered a failure of any of the manifestations k = 1, ..., K. In the embodiment, this quantity is given by the product of the individual probabilities:

$$S(t) = \prod_{k=1}^{K} S_k(t)$$

Specification of the embodiment for an example

For a complete specification of the embodiment, the quantities ψ_jk and ε_jk must now be specified. In the following, two cases of the embodiment that are typical of applications of the invention to competing risks are fully specified with regard to these quantities.

Consider a disease in which failure has three manifestations. The patient is observed in month t (t = 1, 2, ...). In month t, any combination of the three manifestations may be observed, or no failure at all, in which case the patient is "censored". The situation is illustrated as a Venn diagram in FIG. 2. In the case of breast cancer, the three manifestations could be bone metastases (B for "bone", k = 1), other distant metastases (D for "distant", k = 2) or local/regional recurrence (L for "local", k = 3). All three manifestations may occur simultaneously in observation month t. However, for clinical, pharmacological or data-related reasons, the follow-up in month t may be recorded according to the following logic:

• Bone metastases (yes/no)?
    o If yes: ε_j1 = 1, ε_j2 = 0, ε_j3 = 0; ψ_j1 = 0, ψ_j2 = 0, ψ_j3 = 0
    o If no: other distant metastases (yes/no)?
        ▪ If yes: ε_j1 = 0, ε_j2 = 1, ε_j3 = 0; ψ_j1 = 1, ψ_j2 = 0, ψ_j3 = 0
        ▪ If no: local/regional recurrence (yes/no)?
            • If yes: ε_j1 = 0, ε_j2 = 0, ε_j3 = 1; ψ_j1 = 1, ψ_j2 = 1, ψ_j3 = 0
            • If no: ε_j1 = 0, ε_j2 = 0, ε_j3 = 0; ψ_j1 = 1, ψ_j2 = 1, ψ_j3 = 1

In other words:

In this assignment, priority is given, for example, to the observation "bone metastases", so that it is not asked whether the other manifestations occur at time t. Therefore, in the case "bone metastases: yes", the contribution of the j-th patient to the likelihood function (8) is, according to this logic, given solely by the term f_NN(1, j) (there is no S_NN term).

In the case "no bone metastases, but other distant metastases", the assignment yields a contribution f_NN(2, j) × S_NN(1, j).

In the case "neither bone nor other distant metastases, but local/regional disease", the assignment yields a contribution f_NN(3, j) × S_NN(1, j) × S_NN(2, j).

In the "censored" case, the assignment yields a contribution S_NN(1, j) × S_NN(2, j) × S_NN(3, j).
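The hierarchical assignment of ε_jk and ψ_jk and the resulting factor of patient j in Eq. (8) can be sketched as follows. The function names and the boolean inputs are hypothetical; the values reproduce the priority logic bone > other distant > local/regional described in this example, and only that case.

```python
from typing import List, Tuple

def eps_psi(bone: bool, distant: bool, local: bool) -> Tuple[List[int], List[int]]:
    """Return ([eps_j1, eps_j2, eps_j3], [psi_j1, psi_j2, psi_j3]) for month t."""
    if bone:      # highest priority: other manifestations are not asked about
        return [1, 0, 0], [0, 0, 0]
    if distant:   # bone ruled out, so S for k = 1 enters
        return [0, 1, 0], [1, 0, 0]
    if local:     # bone and other distant ruled out
        return [0, 0, 1], [1, 1, 0]
    return [0, 0, 0], [1, 1, 1]  # censored

def contribution(f: List[float], S: List[float], eps: List[int], psi: List[int]) -> float:
    """Patient j's factor in Eq. (8): prod_k f_k**eps_k * S_k**psi_k."""
    value = 1.0
    for f_k, S_k, e_k, p_k in zip(f, S, eps, psi):
        value *= f_k ** e_k * S_k ** p_k
    return value
```

For example, a patient with only a local/regional finding contributes f_NN(3, j) × S_NN(1, j) × S_NN(2, j), matching the case list in the text.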

The invention can also be applied to measurements in which the presence or absence of all manifestations is always observed and taken into account at time t, if the above equations are replaced by corresponding equations for the probability of simultaneously observing several manifestations, evaluated with the estimated failure probabilities.

Building a neural network to identify competing risks

FIG. 1 shows the structure of a neural network with the MLP architecture. In this case, the neural network comprises

• an input layer with a plurality of input neurons N_i (i for "input neuron"),

• at least one intermediate layer with intermediate neurons N_h (h for "hidden neuron"),

• an output layer with a plurality of output neurons N_o (o for "output neuron"),

• a large number of connectors, each connecting two neurons in different layers.

In the embodiment according to FIG. 1, a two-dimensional output layer is shown in order to illustrate the possibility of simultaneously representing time-varying and competing risks. The simplified representation of time-invariant risks is the special case in which only the manifestation dimension is needed.

The number N_i of input neurons initially used is usually chosen according to the number of objectifiable items of information available for the patient population. According to the prior art, methods are available that either automatically reduce the number of input neurons in advance to a level acceptable for the respective computer system, or automatically remove unnecessary input neurons in the course of optimization, so that in both cases the input neurons ultimately used are determined without intervention by the operator.

In the embodiment according to FIG. 1, the original number of hidden neurons is determined by the original number of input neurons, i.e.

N_h = N_i (10.a)

For this case, prior-art methods are available that allow the connectors to be preassigned favorably.

In the embodiment according to FIG. 1, the neurons of the output layer are analogously arranged in a two-dimensional matrix with indices

J_time = 1, ..., N_time (10.b)

J_key = 1, ..., N_key (10.c)

the number of originally active neurons of the output layer being given by

N_o = N_time × N_key (10.d)

The index J_key designates the signals of the respective manifestation, while the index J_time designates the signals relating to the respective time function (for example "fractional polynomials" or spline functions). An output neuron designated by the two indices J_time, J_key is accordingly used to determine the coefficient of time function J_time for the risk of manifestation J_key. In the embodiment, the indices J_key and J_time correspond to the indices k and l of equations 4 to 7, and N_key and N_time correspond to the quantities K and L of these equations.
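The two-dimensional output layer can be sketched as a reshaping of N_o = N_time × N_key output signals into a coefficient matrix NN_kl(x). The flat, key-major ordering used here, with neuron (J_time, J_key) at position (J_key − 1) · N_time + (J_time − 1), is an assumption for illustration; the source does not fix a storage order.

```python
from typing import List, Sequence

def to_coefficient_matrix(outputs: Sequence[float], n_time: int, n_key: int) -> List[List[float]]:
    """Arrange N_o = N_time * N_key output signals as coef[k][l] = NN_kl(x).

    Assumes a key-major flat layout (one row of N_time coefficients per manifestation).
    """
    if len(outputs) != n_time * n_key:
        raise ValueError("expected N_time * N_key output signals")
    return [list(outputs[k * n_time:(k + 1) * n_time]) for k in range(n_key)]
```

Each row can then be passed as the coefficient list of one manifestation's hazard λ_k(t | x).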

End nodes, which are usually arranged in a one-dimensional row, are also available in the context of recursive partitioning. According to the prior art, each patient is assigned to one such node, and the node is assigned a risk that can be regarded as a (scalar) signal. Instead of a scalar, the invention assigns each end node a vector with N_key components.
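A sketch of such an end-node vector follows, assuming (this is not stated in the source) a constant cause-specific hazard within each node. Under that assumption the maximum-likelihood rate of manifestation k is the number of k-failures divided by the total follow-up time of the node's patients. The function name is hypothetical.

```python
from typing import List, Sequence

def end_node_vector(times: Sequence[float], deltas: Sequence[Sequence[int]], n_key: int) -> List[float]:
    """Per-manifestation event rates for the patients assigned to one end node.

    times[j]: end point t_j; deltas[j][k]: delta_jk (1 if failure of manifestation k).
    """
    exposure = sum(times)  # total follow-up time in the node
    if exposure <= 0:
        raise ValueError("total follow-up time must be positive")
    counts = [sum(d[k] for d in deltas) for k in range(n_key)]
    return [c / exposure for c in counts]
```

A split criterion could then compare the summed log-likelihood of the child nodes with that of the parent, using all N_key components rather than a single scalar risk.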

Learning

For the exemplary embodiment, the aim of learning (training) is to locate the highest possible value of this likelihood function in parameter space while avoiding superfluous parameters as far as possible. In the embodiment, learning proceeds through initialization, optimization steps and complexity reduction as follows:

Initialization by univariate analysis

Before the entire network is trained with all weights, it is advantageous to carry out a univariate analysis for each factor. This analysis has several uses:

• The univariate strength of the factors or their individual prognostic quality is available for comparison with the complete network.

• Univariate analysis is used to determine a ranking of the factors in the event that there are fewer input nodes than factors.

• The univariate analyses can be used to preset the weights in a way that favors, or at least does not disadvantage, nonlinear configurations (see below).

First, an exponential survival model with the single parameter λ_0 is determined. This model is used for initialization and also as a reference in the subsequent analysis.
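For censored data, the one-parameter exponential model S(t) = exp(−λ_0 t) has a closed-form maximum-likelihood estimate: maximizing Σ_j [δ_j log λ_0 − λ_0 t_j] gives λ_0 = (number of failures) / (total observation time). A minimal sketch, with a hypothetical function name:

```python
from typing import Sequence

def fit_exponential(times: Sequence[float], events: Sequence[int]) -> float:
    """MLE of lambda_0 in S(t) = exp(-lambda_0 * t) from censored data.

    times[j]: end point t_j; events[j]: 1 if a failure was observed at t_j, else 0.
    """
    total_time = sum(times)
    if total_time <= 0:
        raise ValueError("total observation time must be positive")
    return sum(events) / total_time
```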

Linear univariate models

According to Eq. (1a), the transformed value X_j of the j-th factor is regarded as the single input of a "network" consisting of exactly one linear connector from this input neuron to an output node (i.e. without hidden nodes). The time dependence of this output node corresponds to the "proportional hazards model" (K = 1) for censored data. The resulting model has only two free parameters: the time parameter (λ_0) and the weight of the connector. These are optimized and stored in a table, together with the quality (likelihood) and significance, for later use.
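A sketch of this two-parameter linear univariate model, λ(t | x) = λ_0 exp(w x), with an exponential (constant) baseline. Profiling out λ_0 in closed form and using Newton's method on the remaining concave one-dimensional problem are implementation choices of this sketch, not taken from the source; the significance test is omitted and the names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fit_linear_univariate(
    x: Sequence[float],      # transformed factor X_j for each patient
    times: Sequence[float],  # end points t_j
    events: Sequence[int],   # 1 if a failure was observed at t_j, else 0 (censored)
    iters: int = 50,
) -> Tuple[float, float]:
    """Fit lambda(t | x) = lambda_0 * exp(w * x); return (lambda_0, w)."""
    d = sum(events)
    w = 0.0
    for _ in range(iters):
        e = [t * math.exp(w * xi) for t, xi in zip(times, x)]
        s0 = sum(e)
        s1 = sum(ei * xi for ei, xi in zip(e, x)) / s0
        s2 = sum(ei * xi * xi for ei, xi in zip(e, x)) / s0
        # Derivative of the profile log-likelihood (lambda_0 maximized out).
        score = sum(di * xi for di, xi in zip(events, x)) - d * s1
        info = d * (s2 - s1 * s1)  # minus its second derivative, >= 0
        if info <= 0:
            break  # no curvature (e.g. constant x or no events): stop
        step = score / info
        w += step  # Newton step
        if abs(step) < 1e-12:
            break
    lambda0 = d / sum(t * math.exp(w * xi) for t, xi in zip(times, x))
    return lambda0, w
```

With two groups (x = 0 and x = 1), the fitted exp(w) equals the ratio of the groups' event rates, which is a convenient check.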

Nonlinear univariate models

Next, a nonlinear univariate model with four parameters is fitted for each factor. Here the transformed j-th factor X_j is regarded as an "input neuron". The univariate "network" now consists of this one input neuron, a single hidden neuron and a single output neuron (without a linear connector between the input and output neurons). As above, the time dependence of this output node corresponds to the "proportional hazards model" (K = 1) for censored data.

The four parameters correspond to the time constant (λ_0), the weight and bias to the hidden layer, and the weight to the output layer. These are optimized and stored in a table, together with the quality (likelihood) and significance, for later use.

Ranking of the input variables

After the univariate models have been determined for each factor, the univariately significant factors are ranked according to the absolute values of their linear weights. The numbering of the input nodes for the subsequent analysis follows this ranking. If fewer input nodes than factors are available, this procedure allows an objective preselection of the "most important" factors.
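The ranking step can be sketched as a sort by absolute linear weight restricted to univariately significant factors. The factor names in the usage note come from the list of breast cancer parameters above, but the weights and significance flags are hypothetical.

```python
from typing import Dict, List

def rank_factors(weights: Dict[str, float], significant: Dict[str, bool], n_inputs: int) -> List[str]:
    """Order univariately significant factors by |linear weight|; keep the top n_inputs."""
    candidates = [name for name, ok in significant.items() if ok]
    candidates.sort(key=lambda name: abs(weights[name]), reverse=True)
    return candidates[:n_inputs]
```

For instance, with hypothetical weights {"uPA": 0.8, "PAI-1": -1.1, "age": 0.05, "nodes": 0.6}, "age" not significant and two input nodes, the preselected factors are "PAI-1" and "uPA".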

Presetting the weights

To optimize (train) the network, initial values must first be assigned to the weights. Presetting all weights to zero is not desired. In the embodiment, the weights of the linear connectors are initially filled with small values, as usual. The time parameter is preset with the value λ₀ determined from the one-parameter model. The number of hidden nodes H is chosen equal to the number of input nodes J. The connector from input neuron j to the hidden neuron with the same index h = j is then preset with the weight determined under "Nonlinear univariate models", and the corresponding bias is preset analogously with the bias determined there. These two quantities are then offset by a small random amount. The output of each hidden node therefore initially approximates the optimal univariate nonlinear value.

For each hidden node h, the weight to the first neuron of the output layer obtained from the univariate optimization, which we denote w_h1, is also available. To initialize the weights to the output layer, the quantities w_h1, h = 1, ..., H, are weighted with H random numbers. In the embodiment, H numbers are drawn from a uniform distribution on [0, 1] and each is divided by their sum. These weights and all other connectors (i.e., the weights from the hidden layer to the output neurons with k = 2, etc.) are then offset by a small random amount.
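The initialization just described can be sketched as follows for the first output neuron only. The argument layout, the function name and the jitter size are hypothetical; the other output neurons (k = 2, ...) and the linear connectors would simply receive small random values.

```python
import random


def init_weights(univariate_nonlinear, output_weights, jitter=0.01, rng=random):
    """Initial weights for H = J hidden nodes taken from the univariate fits.

    univariate_nonlinear[j] = (w_jj, b_j): input->hidden weight and bias of the
    4-parameter fit for factor j; output_weights[j] = w_j1, its hidden->output
    weight. All names and the jitter size are illustrative assumptions.
    """
    H = len(univariate_nonlinear)
    hidden_w = [w + rng.uniform(-jitter, jitter) for w, _ in univariate_nonlinear]
    hidden_b = [b + rng.uniform(-jitter, jitter) for _, b in univariate_nonlinear]
    # H uniform random numbers on [0, 1], normalized so that they sum to one.
    r = [rng.random() for _ in range(H)]
    total = sum(r)
    out_w = [w_h1 * r_h / total + rng.uniform(-jitter, jitter)
             for w_h1, r_h in zip(output_weights, r)]
    return hidden_w, hidden_b, out_w
```

Normalizing the random numbers keeps the summed contribution of the hidden layer to the first output of the same order as a single univariate fit, rather than H times larger.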

A second way of initialization, more common for neural networks, is to assign small random weights to all connectors. At the beginning of the optimization, all connections, including those via the hidden layer, are then in the linear range, since the "activation function" is almost linear for small arguments, e.g., tanh(x) ≈ x for small x.

Linear statistics of the input factors

In the embodiment, the covariance matrix of all input factors is calculated and stored. A linear regression of each factor on each of the other factors, X_j ≈ A·X_i + B, is also determined. The eigenvectors and eigenvalues of the covariance matrix are calculated and recorded as well. In the embodiment, these linear relationships are used in the various thinning (pruning) procedures.
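The covariance matrix and the pairwise regression coefficients need only the standard library. The sketch below computes both. The eigen-decomposition is left out; a routine such as `numpy.linalg.eigh` could provide it. The function names are illustrative.

```python
def column_means(columns):
    """Mean of each factor, given as a list of equal-length columns."""
    return [sum(c) / len(c) for c in columns]


def covariance_matrix(columns):
    """Sample covariance matrix (divisor n - 1) of factors given as columns."""
    n = len(columns[0])
    means = column_means(columns)
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def pairwise_regression(cov, means, i, j):
    """Least-squares coefficients A, B of X_j ~ A * X_i + B."""
    slope = cov[i][j] / cov[i][i]
    return slope, means[j] - slope * means[i]
```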

Division of patient data into training and validation sets

For a system capable of learning, it is customary to divide the available patterns randomly into training, validation and generalization sets. For example, the user can specify percentages (possibly zero) of all patterns to be reserved for validation or generalization. The generalization set is not used at all in training, so that the quality can subsequently be checked completely without bias. The quality on the validation set, if available, is used several times during the optimization: it provides an independent measure of the progress of the optimization on the training set and also serves to avoid overfitting.
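A minimal sketch of such a random split by user-specified percentages. The function name and the rounding rule are assumptions.

```python
import random


def split_patterns(n_patterns, pct_validation, pct_generalization, seed=None):
    """Randomly assign pattern indices to training, validation and generalization sets.

    Percentages may be zero; every pattern not reserved goes to training.
    """
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)
    n_val = round(n_patterns * pct_validation / 100)
    n_gen = round(n_patterns * pct_generalization / 100)
    validation = idx[:n_val]
    generalization = idx[n_val:n_val + n_gen]
    training = idx[n_val + n_gen:]
    return training, validation, generalization


train, val, gen = split_patterns(1000, 20, 10, seed=1)
print(len(train), len(val), len(gen))  # 700 200 100
```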

Choice of factors

In the embodiment, it is possible to use subsets of the factors, for example to obtain models for future patterns for which only this subset of the factors is available.

Network optimization

Simplex Optimization

The optimization searches for a maximum of the likelihood function on the data of the training set. The search space consists of the network weights that are still active together with the global time constants λ_0k, k = 1, ..., K. This yields an n-dimensional space in which the search takes place.

The search method implemented in the embodiment constructs a simplex in this space according to the known method of Nelder and Mead (1965). A simplex in the n-dimensional parameter space is determined by n + 1 non-degenerate corners, i.e., the edges emanating from one corner are linearly independent. It therefore spans an n-dimensional point cloud in the parameter space. The search proceeds in epochs. During each epoch, the quality function on the training set is evaluated at various points in the parameter space, namely at the current location and at n further corners, which are defined by combining operations such as reflection and expansion or contraction in one direction. The directions of these operations are selected automatically from the values of the quality function at the corners defined in the previous epoch. In the embodiment, the quality function (the negative log-likelihood) decreases monotonically, and the search always ends at an (at least local) minimum.
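For readers who want to experiment, here is a compact, textbook Nelder-Mead minimizer with the usual coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5). It is a generic sketch, not the embodiment's implementation. The initial step size and the stopping tolerance on the spread of function values are assumptions. The best value found never increases, which matches the monotonic decrease noted above.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000):
    """Minimize f (e.g. a negative log-likelihood) with the Nelder-Mead simplex method."""
    n = len(x0)
    # n + 1 non-degenerate corners: x0 plus one step along each coordinate axis.
    simplex = [list(x0)] + [[xi + (step if i == k else 0.0) for i, xi in enumerate(x0)]
                            for k in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def toward(c):
            # centroid + c * (centroid - worst): c = 1 reflect, 2 expand, +/-0.5 contract
            return [ci + c * (ci - wi) for ci, wi in zip(centroid, worst)]

        xr = toward(1.0)
        fr = f(xr)
        if fr < values[0]:
            xe = toward(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            outside = fr < values[-1]
            xc = toward(0.5 if outside else -0.5)
            fc = f(xc)
            if (fc <= fr) if outside else (fc < values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:  # shrink all corners towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (xi - b) for b, xi in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i = min(range(n + 1), key=values.__getitem__)
    return simplex[i], values[i]
```

Usage: `nelder_mead(lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 2) ** 2, [0.0, 0.0])` returns a point close to (1, −2).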

Consideration of the validation set

The validation set described above, if available, is used to control the progress of the optimization and to avoid overfitting.

In the embodiment, the negative log-likelihood per sample is continuously calculated for both sets and output as a key figure of the current quality of the optimization with regard to the training and validation sets. While this key figure must decrease monotonically on the training set, temporary fluctuations of the corresponding figure on the validation set are possible without overfitting having occurred. However, a monotonic increase of the key figure on the validation set should stop further optimization and lead to a complexity reduction. This type of termination acts as an emergency brake against overfitting.

One possible termination criterion that can be applied automatically is based on tracking an exponentially smoothed quality on the validation set. If this smoothed quantity exceeds its previous minimum within the current optimization step by a fixed percentage (deterioration in quality), the optimization is terminated. A tolerance of about 1% was found empirically to be suitable for typical training set sizes of around 300 or more data records. With this tolerance and training and validation sets of roughly equal size, training is stopped more often by reaching a minimum on the training set than by a deterioration in quality on the validation set. This "normal" termination is preferred, because an (almost) monotonic improvement in quality on the validation set indicates that the neural network has recognized real underlying structure rather than merely noise.
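The termination rule can be sketched as a small stateful check. The 1% tolerance follows the text; the smoothing factor of 0.1 is an assumption. The comparison uses the absolute value of the minimum so that it also behaves sensibly if the loss per sample happens to be negative.

```python
class SmoothedValidationStop:
    """Early stopping on an exponentially smoothed validation loss (sketch)."""

    def __init__(self, smoothing=0.1, tolerance=0.01):
        self.smoothing = smoothing      # weight of the newest value (assumed)
        self.tolerance = tolerance      # relative deterioration allowed (about 1 %)
        self.smoothed = None
        self.best = float("inf")

    def update(self, val_loss):
        """Add one validation loss (-log-likelihood per sample); True means stop."""
        if self.smoothed is None:
            self.smoothed = val_loss
        else:
            self.smoothed += self.smoothing * (val_loss - self.smoothed)
        self.best = min(self.best, self.smoothed)
        return self.smoothed > self.best + self.tolerance * abs(self.best)
```

With `smoothing=1.0` (no smoothing), the sequence 1.0, 0.9, 0.905, 0.92 triggers a stop only at the last value, because 0.92 exceeds the minimum 0.9 by more than 1%.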

No validation set was used in the example of the embodiment. The termination is therefore based solely on reaching the minimum on the training set.

Structure optimization and complexity reduction

The simplex optimization described for the embodiment yields a set of weights {w_[1], ..., w_[n]} and other parameters that determine a local minimum of the negative log-likelihood. (The numbering [1], ..., [n] of the weights in this context does not reflect the topological order of the weights.) This minimum refers to the fixed number n of weights and a fixed topology. To avoid overfitting, it is desirable to reduce complexity by thinning the weights as far as this is possible without a significant loss of quality.

Thinning (pruning) means deactivating connectors. For this purpose, their weights are "frozen" at a fixed value (zero in the embodiment, where one can also speak of "removing" them). In principle, it is possible to remove individual weights or even entire nodes. In the latter case, all weights that either enter or leave the node to be removed are deactivated.

In the embodiment, a phase of complexity reduction is carried out after each optimization phase (simplex method). The first step is the thinning of individual connectors. Subsequently, combinations of different connectors are tested for redundancy. Finally, the consistency of the topology is checked and, if necessary, connectors or nodes are removed that can no longer contribute to the output because other connectors and nodes have already been removed. Although this procedure is not the subject of the invention, it is part of good practice according to the prior art.

To reduce complexity, various statistical hypotheses are automatically formed in the embodiment and checked by means of a likelihood ratio test at a predetermined significance level. Certain weights or parameters are considered mandatory, i.e., they are never removed. These include the global time parameters λ_0k.

Ranking of the connectors

To determine the order in which the connectors are checked, the test statistic log(likelihood ratio) is first formed in the embodiment. For each weight w_[A], two networks are considered:

• The network with all current weights (n degrees of freedom), including w_[A].

• The network with all current weights except w_[A], which is deactivated (n − 1 degrees of freedom).

When w_[A] is deactivated, the other weights are frozen at their currently optimized values.
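The ranking can be sketched with a callback that evaluates the negative log-likelihood for a given set of deactivated weights while all other weights stay frozen. That callback (`nll` below) is hypothetical and stands for evaluating the trained network. Sorting the least important connectors first is an assumption consistent with testing them for removal in that order.

```python
def rank_connectors(active, nll):
    """Rank active weights by the log likelihood ratio of removing each one alone.

    nll(removed) returns the negative log-likelihood with the weights in
    `removed` set to zero and all other weights frozen at their optimized
    values. Weights whose removal costs the least likelihood come first.
    """
    base = nll(frozenset())
    # log(likelihood ratio) = NLL without w - NLL with w
    log_ratio = {w: nll(frozenset([w])) - base for w in active}
    return sorted(active, key=log_ratio.__getitem__)
```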

Testing

Once a ranking {w_[1], ..., w_[n]} of the weights according to the likelihood ratio is known, the weights are tested in this order for thinning in the embodiment, until at most G_max weights have been removed. Assuming that A − 1 weights have already been removed, two hypotheses can be tested for the A-th weight in the order, w_[A]:

• Test statistic for the hypothesis H_{A−1}: likelihood ratio for the network with the weights {w_[1], ..., w_[A−1]} deactivated (n − A + 1 degrees of freedom)

• Test statistic for the hypothesis H_A: likelihood ratio for the network with the weights {w_[1], ..., w_[A]} deactivated (n − A degrees of freedom)

The hypothesis H_A is then tested twice:

• H_A versus H_{A−1}, and

• H_A versus H_0, the network with all current weights.

Significance is assessed with the chi-square test applied to the likelihood ratio. If either of the two comparisons rejects H_A (i.e., removing w_[A] results in a significant deterioration), connector A is not removed and the thinning step is ended.
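A sketch of this sequential test. Twice the difference in negative log-likelihood is compared with a chi-square distribution: with 1 degree of freedom for H_A versus H_{A−1}, and with A degrees of freedom for H_A versus H_0. The chi-square tail is computed from the standard recursion for the regularized upper incomplete gamma function, so no external library is needed. The `nll` callback, the level `alpha` and the default `g_max` are assumptions.

```python
import math


def chi2_sf(x, k):
    """P(chi2_k > x) for integer k, via Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1)."""
    y = max(x, 0.0) / 2.0
    a, q = (0.5, math.erfc(math.sqrt(y))) if k % 2 else (1.0, math.exp(-y))
    while a < k / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def thin(ranked, nll, alpha=0.05, g_max=None):
    """Remove weights in ranked order while neither likelihood-ratio test is significant.

    nll(removed) is a hypothetical callback returning the negative log-likelihood
    with the weights in `removed` deactivated and all others frozen.
    """
    g_max = len(ranked) if g_max is None else g_max
    nll_full = nll(frozenset())                      # H_0: nothing removed
    removed, nll_prev = [], nll_full                 # H_{A-1}
    for a, w in enumerate(ranked[:g_max], start=1):
        nll_a = nll(frozenset(removed + [w]))        # H_A
        p_step = chi2_sf(2.0 * (nll_a - nll_prev), 1)    # H_A versus H_{A-1}
        p_total = chi2_sf(2.0 * (nll_a - nll_full), a)   # H_A versus H_0
        if min(p_step, p_total) < alpha:
            break                                    # significant deterioration: keep w
        removed.append(w)
        nll_prev = nll_a
    return removed
```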

When a connector is deactivated, it is removed from the list of active connectors and the associated weight is frozen (usually at zero).

During a thinning phase in the embodiment, the number G of removed connectors is limited to a maximum number G_max that depends on n, the number of connectors remaining.

Further complexity reduction

In the embodiment, further connectors are removed by analyzing the weights in pairs with regard to the likelihood of the data, taking into account various correlation properties. However, this step is not strictly necessary for the function of the learning model and can therefore be omitted. The invention can also be combined with other complexity-reduction techniques that may already be implemented in various systems capable of learning.

Checking the topology

Thinning or removal of individual connectors can isolate a node from input signals, from output signals or (in the case of a hidden neuron) from both. In this case, a deactivation flag is set for the node in the embodiment. For neurons of the output layer, for example, "isolation" means that there are no active connectors from either the input layer or the hidden layer. If all connectors from an input neuron to the hidden and the output layer have been removed, the bias of the linear connectors must also be deactivated.

A hidden neuron that has been isolated from all inputs may still be connected to outputs. The "frozen" contributions of such a hidden neuron to the outputs are then redundant, because in principle they only shift the bias values of the remaining active connectors. Such neurons are therefore deactivated, and any remaining connectors to the output layer are removed.

The various checks can lead to further isolation of nodes. Therefore, the procedure is iterated until the topology remains constant.
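The iteration to a stable topology can be sketched on a set of (source, target) connector pairs for a network with one hidden layer. The data representation is an assumption, and the deactivation of input biases is not modelled. A hidden node with no incoming or no outgoing connector is treated as dead, and its remaining connectors are dropped until nothing changes.

```python
def prune_topology(connectors, hidden):
    """Iteratively drop hidden nodes that are cut off from inputs or from outputs.

    connectors: set of (source, target) pairs; hidden: names of hidden nodes.
    Returns the remaining active connectors once the topology is constant.
    """
    connectors = set(connectors)
    while True:
        sources = {s for s, _ in connectors}
        targets = {t for _, t in connectors}
        # No incoming connector (only a frozen constant) or no outgoing one.
        dead = {h for h in hidden if h not in sources or h not in targets}
        pruned = {(s, t) for s, t in connectors if s not in dead and t not in dead}
        if pruned == connectors:
            return connectors
        connectors = pruned
```

Every node that no longer appears in any returned connector would receive the deactivation flag.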

Termination of training and output

In the embodiment, training ends when no further complexity reduction is possible after the last simplex optimization. All weights and other parameters then receive their final values, which are saved in files created for this purpose.

The trained neural network is thus uniquely determined. By reading in these stored values, either immediately afterwards or at a later time, the trained neural network can be used as described above to generate the output values, and thus the functions f_k(t), λ_k(t) and S_k(t) defined above, for any data containing the independent factors ("covariates") x. These functions determine the probability model.

In particular, it is possible to calculate the course of these functions as a function of the selected factors. Such a determination is useful for evaluating the expected effect of a therapy concept if the therapies to be evaluated were used as "factors" in training.

Example

Data

To illustrate how the invention works in the embodiment, 1000 fictitious patient data sets with 9 factors (covariates) were first generated by means of a random generator. The first 7 factors were generated as realizations of a multivariate Gaussian distribution. For this purpose, the following means and variances of the factors and a covariance matrix were specified in the exemplary embodiment:

Factor xlypo xer xpr xage xtum xupa xpai

Mean ÖΪ50 0.45 0.45 5.5Ö 0.51 0.50 0.50

Variance 0.071 0.087 0.097 0.083 0.083 0.084 0.083

The assumed correlation matrix (the covariance matrix normalized to unit variances) was:

Factor xlypo xer xpr xage xtum xupa xpai

xlypo 1.00 -0.06 -0.09 0.03 0.42 0.02 0.05

xer -0.06 1.00 0.54 0.29 -0.07 -0.18 -0.19

xpr -0.09 0.54 1.00 0.03 -0.06 -0.07 -0.14

xage 0.03 0.29 0.03 1.00 0.04 0.02 0.00

xtum 0.42 -0.07 -0.06 0.04 1.00 0.03 0.06

xupa 0.02 -0.18 -0.07 0.02 0.03 1.00 0.54

xpai 0.05 -0.19 -0.14 0.00 0.06 0.54 1.00
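Generating factors of this kind is a standard construction. Scale the unit-diagonal matrix above by the standard deviations to get Σ_ij = sd_i·sd_j·ρ_ij, factor Σ = L·Lᵀ by a Cholesky decomposition, and map independent standard normals z to mean + L·z. This is a minimal sketch of that construction, not the random generator actually used for the example. The function names are illustrative.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_factors(means, variances, corr, n, seed=None):
    """Draw n vectors from N(means, Sigma) with Sigma_ij = sd_i * sd_j * corr_ij."""
    rng = random.Random(seed)
    sd = [math.sqrt(v) for v in variances]
    cov = [[sd[i] * sd[j] * corr[i][j] for j in range(len(sd))] for i in range(len(sd))]
    low = cholesky(cov)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in means]
        rows.append([m + sum(low[i][k] * z[k] for k in range(i + 1))
                     for i, m in enumerate(means)])
    return rows
```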

In order to represent a situation that is as realistic as possible, these values were chosen in the order of magnitude known for certain factors in the scientific literature in the case of breast cancer. However, the precise assumptions and the interpretation of the factors are completely irrelevant to the function of the invention.

In addition to the seven named factors, two further binary factors ("therapies"), "et" and "ht", were generated randomly. For ht, 50% of the records were assigned the value 1 and 50% the value 0. For et, in the exemplary embodiment only 1% were assigned the value 1 and 99% the value 0. It is therefore to be expected that the neural network will not recognize et as an influencing factor.

The first ten of the resulting records are as follows:

Patient number xlypo xer xpr xage xtum xupa xpai et ht

1 0.07 0.89 1.41 0.36 0.49 0.31 0.22 0 1

2 0.25 0.23 0.98 0.15 0.10 0.31 0.05 0 0

3 0.56 0.52 0.79 0.09 0.22 -0.22 -0.07 0 1

4 0.61 0.83 1.10 0.73 0.56 0.21 0.44 0 1

5 0.97 0.38 0.70 0.61 0.51 0.97 0.72 0 0

6 0.44 0.22 0.07 0.90 0.80 0.60 0.55 0 1

7 0.46 0.24 0.47 0.14 0.60 0.57 0.31 0 0

8 0.42 0.60 0.41 0.36 0.54 0.23 0.47 0 0

9 -0.01 0.22 0.80 0.52 0.38 -0.13 0.41 0 0

10 0.80 0.41 0.19 0.11 0.45 0.40 0.51 0 0

Initially, three independent risks risk(i), i = 1, ..., 3, were generated to model the influence of the factors on the course of the disease. The following model was assumed:

$$\mathrm{risk}(1) = \exp(r_1 + r_2 + r_3 + r_4 - r_h)$$

$$\mathrm{risk}(2) = \exp(r_1 + r_3 + r_4)$$

$$\mathrm{risk}(3) = \exp(r_1)$$

with

$$r_1 = 2\,(\mathrm{xlypo} - \mathrm{median}(\mathrm{xlypo})), \quad r_2 = 0.5\,(\mathrm{xtum} - \mathrm{median}(\mathrm{xtum})),$$

$$r_3 = 0.75\,(\mathrm{xupa} - \mathrm{median}(\mathrm{xupa})), \quad r_4 = 1.5\,(\mathrm{xpai} - \mathrm{median}(\mathrm{xpai})),$$

and r_h = 1 if ht = 1.

From these risk values, the actual failure times of the three variants were generated as random realizations of an exponential distribution, or of a modified exponential distribution, with a time constant of 200 months. For the third variant, it was additionally assumed that failure is possible only up to 24 months at the latest, in order to create a competing-risk situation similar to local recurrence in breast cancer. These data were censored according to a simulated "study", and an "observation" was simulated according to the priority scheme shown in Figure 1.
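The data-generating process can be approximated as follows. The risk model (r_1 to r_4, r_h) and the 200-month time constant are taken from the text. Several other choices are assumptions of this sketch, because Figure 1 and the exact "modified" distribution are not reproduced here:

- plain exponential failure times;
- failure 3 possible only within the first 24 months (one reading of the text);
- uniform censoring over a hypothetical 120-month follow-up;
- the earliest event taken as the one observed.

```python
import math
import random
import statistics


def simulate_observations(rows, seed=None, time_constant=200.0, window3=24.0, follow_up=120.0):
    """Competing-risk observations (time, k) for the three risks of the example model.

    rows: dicts with keys xlypo, xtum, xupa, xpai, ht. k = 0 means censored.
    window3 and follow_up are assumptions, as is 'earliest event is observed'.
    """
    rng = random.Random(seed)
    med = {f: statistics.median(r[f] for r in rows) for f in ("xlypo", "xtum", "xupa", "xpai")}
    out = []
    for r in rows:
        r1 = 2.0 * (r["xlypo"] - med["xlypo"])
        r2 = 0.5 * (r["xtum"] - med["xtum"])
        r3 = 0.75 * (r["xupa"] - med["xupa"])
        r4 = 1.5 * (r["xpai"] - med["xpai"])
        rh = 1.0 if r["ht"] == 1 else 0.0
        risks = [math.exp(r1 + r2 + r3 + r4 - rh), math.exp(r1 + r3 + r4), math.exp(r1)]
        # Exponential failure times with mean time_constant / risk.
        times = [rng.expovariate(risk / time_constant) for risk in risks]
        if times[2] > window3:
            times[2] = math.inf          # failure 3 no longer possible
        censor = rng.uniform(0.0, follow_up)
        out.append(min(zip(times + [censor], (1, 2, 3, 0))))
    return out
```

Because only the earliest event is recorded, a high risk for failures 1 or 2 lowers the chance of ever observing failure 3, which is the indirect dependence discussed below.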

Under the model assumed in the exemplary embodiment, only the factor "xlypo" is causally decisive for failure of the third variant. Nevertheless, the other factors are indirectly related to the observations of the third variant, because elevated risks from the other factors can reduce the probability of observing a failure of the third variant. This property of the assumed model is irrelevant to the function of the invention, but it illustrates a typical benefit.

Trained neural network

The neurons of the output layer are assigned according to equations 4 to 7 and 10 with N_time = 1 and N_key = 3, so that 3 neurons of the output layer are initially active. In the embodiment, 9 neurons each of the input and the hidden layer are initially activated. The neural network trained according to the described method is illustrated in FIG. 3 ("xpai" and "xpail" are identical). Note that there is only one connector to the output "O3", namely from the "xlypo" node (neuron). The outputs O1 to O3 are assigned to the risks "risk(1)" to "risk(3)".

The trained neural network is fully and uniquely specified by the remaining connectors with their associated weights and bias values, together with the survival time scales. For this purpose, Table 2a lists, for each neuron to which an active connector leads ("tgt"), all sources ("src") with the corresponding weights ("wt"). Note that many connectors are no longer active.

tgt src wt [src wt ...]

h1 ht 13.5

h6 xlypo 0.53 xupa -1.78 xtum 1.02

h7 xer 1.98 xpr -1.37

h8 xage 1.70

h9 xpr 2.31

o1 h1 -1.70 h6 0.30 h 1.10 xlypo 0.19 xpai 0.72 xupa 0.63 xtum 0.22

o2 h1 2.03 h6 -0.68 h7 -0.86 h8 0.33 h9 -0.64 xlypo 0.64 xpail 0.91 xer 0.56 xage -0.42

o3 xlypo 2.39

Table 2a

The bias values are as given in Table 2b:

Input neurons: ht 0.17, xlypo 0.16, xpai 0, xupa 0, xtum (5, et 0, xer 0, xage 0, xpr 0

Hidden neurons: h1 -0.94, h2 0, h3 0, h4 0, h5 0, h6 0.86, h7 1.31, h8 0, h9 2.07

Output neurons: o1 1.03, o2 0.66, o3 -0.11

Table 2b: Bias values (automatically 0 for inactive neurons)

The values of the survival time scales λ_0k needed to specify the model of equation 6 are given in Table 2c (the units correspond to the time constant of 200 months used above):

λ_01 λ_02 λ_03

0.53/200 0.13/200 0.27/200

Table 2c

Temporal variation

To use time-varying output neurons, a value higher than the N_time = 1 used here could be chosen. The number of output neurons then follows from equation 10d; for N_key = 3 and N_time = 2, for example, N_o = 6. Training would then be carried out as described above. The possible temporal variations of the different failure characteristics could be determined independently of one another within the model of equations 4 to 7; in particular, the task of capturing competing risks would not be affected.

Claims


1. Method for determining competing risks for objects after an initial event on the basis of already measured or otherwise objectifiable training data sets, in which several signals obtained from a system capable of learning are combined in an objective function in such a way that the system capable of learning can recognize or predict the underlying probabilities of the respective competing risks.

2. The method of claim 1, in which data of the initial event and of a follow-up up to a predetermined point in time, measured or otherwise objectively recorded for the training data sets, are used.

3. The method according to claim 2, in which the last point in time of the follow-up is used explicitly in the training data records.

4. The method according to any one of the preceding claims, in which the observation of one failure characteristic at a time excludes the other characteristics.

5. The method according to any one of the preceding claims, in which the objective function L is given as a function of a function P:

where μ denotes the parameters of the system capable of learning, f_k(t_j) the failure rate of characteristic k at time t_j for features x_j, and S_k(t_j) the expected proportion of objects j with observed features x_j that have not suffered a failure of characteristic k by time t_j; P is determined from δ_jk on the basis of the logical relationship, with δ_jk = 1 if object j has suffered a failure of characteristic k at time t_j and δ_jk = 0 otherwise.

6. The method of claim 5, in which an objective function is used in which ε_jk and ψ_jk are determined from δ_jk on the basis of the logical relationship.

7. The method according to claim 6, in which

$$L(\mu; \{x_j, t_j, \delta_{jk}\}) = \prod_{j} \prod_{k} \left[f_k(t_j)\right]^{\varepsilon_{jk}} \left[S_k(t_j)\right]^{\psi_{jk}}$$

is used as the objective function.

8. The method according to any one of the preceding claims, in which a neural network is used as a learning system.

9. The method according to claim 8, in which a neural network of the MLP (multilayer perceptron) architecture is used.

10. The method according to any one of claims 1-7, in which the adaptive system performs a recursive partitioning, wherein

a node is assigned to each object,

11. The method according to any one of the preceding claims, in which the learning system is used in the context of a decision aid.

12. The method according to any one of the preceding claims, in which the various probability functions of the competing risks are assigned values for determining a strategy.