WO2022028687A1 - Latent variable decorrelation - Google Patents

Latent variable decorrelation

Info

Publication number
WO2022028687A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
neural network
predictive model
data values
autoencoder
Application number
PCT/EP2020/072019
Other languages
French (fr)
Inventor
Márton KAJÓ
Janne Tapio ALI-TOLPPA
Stephen MWANJE
Original Assignee
Nokia Solutions And Networks Oy
Technische Universitaet Muenchen
Application filed by Nokia Solutions And Networks Oy and Technische Universitaet Muenchen
Priority to PCT/EP2020/072019
Publication of WO2022028687A1

Classifications

    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks
    • G06N 20/00: Machine learning
    • H04L 41/142: Network analysis or design using statistical or mathematical methods
    • H04L 41/16: Maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • H04L 41/5009: Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L 43/08: Monitoring or testing of data switching networks based on specific metrics, e.g. QoS, energy consumption or environmental parameters


Abstract

A method for a neural network comprising an input layer, an output layer and one or more intermediate layers is described. The neural network is arranged to generate an output vector of data values at the output layer corresponding to a learned representation of an input vector of data values that is input to the neural network. The method comprises accessing a set of data variables that are determined according to respective entries of output vectors, the output vectors generated on the basis of the evaluation of the neural network on input vectors of data values selected from a training dataset of input vectors, evaluating a predictive model over the set of data variables to determine a subset of data variables and modifying the predictive model and the neural network on the basis of the evaluation whereby the evaluation of the subset of data variables for subsequent input vectors of data values that are input to the neural network generates output vectors of data values that are grouped, according to a measure of similarity, into at least two substantially disjoint subsets.

Description

LATENT VARIABLE DECORRELATION
TECHNICAL FIELD
The present disclosure relates to a method and system for a neural network. More specifically, the method and system described herein may be used in conjunction with a neural network to determine latent variables in a dataset.
BACKGROUND
Increasing flexibility and complexity of mobile networks are generating demand for more intelligence and autonomy in network Operations, Administration and Management (OAM). Cognitive Autonomous Networks (CAN) are mobile networks that implement one or more Cognitive Functions (CFs). CFs use machine learning and artificial intelligence to perform OAM functions in networks. Increasingly, CANs are replacing rule-based Self Organizing Network (SON). CFs are able to contextualize operating conditions and learn optimal behavior fitting to a specific environment and context. The knowledge built from the learned information increases the autonomy and performance of OAM functions.
SUMMARY
It is an object of the invention to provide a method for a neural network.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, a method for a neural network comprising an input layer, an output layer and one or more intermediate layers, wherein the neural network is arranged to generate an output vector of data values at the output layer corresponding to a learned representation of an input vector of data values that is input to the neural network is provided. The method comprises accessing a set of data variables that are determined according to respective entries of output vectors, the output vectors generated on the basis of the evaluation of the neural network on input vectors of data values selected from a training dataset of input vectors, evaluating a predictive model over the set of data variables to determine a subset of data variables and modifying the predictive model and the neural network on the basis of the evaluation whereby the evaluation of the subset of data variables for subsequent input vectors of data values that are input to the neural network generates output vectors of data values that are grouped, according to a measure of similarity, into at least two substantially disjoint subsets.
According to a second aspect, a neural network is provided to implement the method according to the first aspect.
According to a third aspect, a system is provided comprising at least one processor and at least one memory including program code which, when executed by the at least one processor, provides instructions to implement the method and neural network according to the first and second aspects.
In a first implementation form the predictive model is an adversarial neural network.
In a second implementation form evaluating the predictive model over the set of data variables to determine a subset of data variables comprises generating a feature vector of data values on the basis of an output vector of the neural network and data values that are selected according to a pre-determined probability distribution, computing an output of the predictive model on the basis of the feature vector whereby to determine whether data values of the feature vector correspond to data values of the output vector of the neural network or data values selected according to the pre-determined probability distribution and evaluating a first loss function to determine an error between the output of the predictive model and a target vector.
In a third implementation form modifying the predictive model and the neural network on the basis of the evaluation comprises modifying one or more parameters of the predictive model and/or the neural network to minimise the error between the output of the predictive model and the target vector.
In a fourth implementation form the first loss function is a binary cross-entropy function.
In a fifth implementation form the neural network is an encoder in an autoencoder network comprising an encoder and decoder.
In a sixth implementation form the method comprises evaluating the autoencoder over each input vector in the training dataset, evaluating a second loss function to determine an error between the output of the autoencoder and the input vector, modifying one or more parameters of the autoencoder on the basis of the evaluation to minimise the error between the output of the autoencoder and the input vector.
In a seventh implementation form the second loss function is a mean-squared error function.
In an eighth implementation form the method comprises evaluating a further predictive model over the output of the autoencoder and modifying the further predictive model and autoencoder on the basis of the evaluation whereby to enforce separation, determined according to the measure of similarity, between data values of the at least two substantially disjoint subsets.
In a ninth implementation form evaluating the further predictive model over the output of the autoencoder network comprises generating a feature vector of data values on the basis of an input vector to the autoencoder and data values that are selected according to a further pre-determined probability distribution, computing an output of the further predictive model on the basis of the feature vector whereby to determine whether data values of the feature vector correspond to data values of an input vector to the autoencoder or data values selected according to the further pre-determined probability distribution and evaluating a third loss function to determine an error between the output of the further predictive model and a target vector.
In a tenth implementation form modifying the further predictive model and the autoencoder on the basis of the evaluation comprises modifying one or more parameters of the further predictive model and/or the autoencoder to minimise the error between the output of the further predictive model and the target vector.
In an eleventh implementation form the third loss function is a binary cross-entropy function.
In a twelfth implementation form the further predictive model is an adversarial neural network.
These and other aspects of the invention will be apparent from the embodiment(s) described below.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram showing a state-transition graph of a radio network, according to an example;
Figure 2 is a schematic diagram showing data graphs from a radio network, according to an example;
Figure 3 is a schematic diagram showing clusters of data from a radio network, according to an example;
Figure 4 is a schematic diagram showing an autoencoder neural network, according to an example;
Figure 5 is a schematic diagram showing an autoencoder network, according to an example;
Figure 6 is a schematic diagram showing decision logic for a neural network, according to an example;
Figure 7 is a schematic diagram of a neural network topology, according to an example;
Figure 8 is a schematic diagram of a neural network topology, according to an example;
Figure 9 is a schematic diagram showing decision logic for a neural network, according to an example;
Figure 10 is a schematic diagram showing a cognitive autonomous network, according to an example;
Figure 11 is a schematic diagram showing a state-transition graph of a radio network, according to an example;
Figure 12 is an illustrative diagram showing examples of user paths in an environment;
Figure 13 is a block diagram showing a method for a neural network, according to an example.
DETAILED DESCRIPTION
Example embodiments are described below in sufficient detail to enable those of ordinary skill in the art to embody and implement the systems and processes herein described. It is important to understand that embodiments can be provided in many alternate forms and should not be construed as limited to the examples set forth herein.
Accordingly, while embodiments can be modified in various ways and take on various alternative forms, specific embodiments thereof are shown in the drawings and described in detail below as examples. There is no intent to limit to the particular forms disclosed. On the contrary, all modifications, equivalents, and alternatives falling within the scope of the appended claims should be included. Elements of the example embodiments are consistently denoted by the same reference numerals throughout the drawings and detailed description where appropriate.
The terminology used herein to describe embodiments is not intended to limit the scope. The articles “a,” “an,” and “the” are singular in that they have a single referent, however the use of the singular form in the present document should not preclude the presence of more than one referent. In other words, elements referred to in the singular can number one or more, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, items, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, items, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein are to be interpreted as is customary in the art. It will be further understood that terms in common usage should also be interpreted as is customary in the relevant art and not in an idealized or overly formal sense unless expressly so defined herein.
In examples described herein radio network behavior is modelled through discrete, quantized states. In these models, the network moves from one state to another, traversing a graph, where states are graph nodes, and possible state transitions correspond to graph edges. Figure 1 is a schematic diagram showing an example of a state-transition graph 100 for network load in a radio network. State A 110 represents a state of normal operation of the radio network. State B 120 represents a spike in the downlink (DL) load of the network. State C 130 represents a state in which there is congestion on the network. In Figure 1, the network may transition back from a spike to a state of normal operation, as represented by the arrow from State B 120 to State A 110.
In general, states may be difficult to predefine in a generic fashion during network function development. State definitions may depend on contextual parameters and change from one deployment to another. As such, network states may instead be defined automatically. In many cases, the network already behaves in a fashion that produces inherent states. These inherent states correspond to clusters in measured data from the network. If the learned states do not fit these inherent states, functions using the learned states will run into problems when trying to define control logic based on these states. Clustering algorithms take the distribution of data into account when forming states and aim to find inherent groups in the data.
Although networks are characterized by a large set of Key Performance Indicators (KPIs) or other measurements, network behavior may be described by a reduced set of latent variables, which, through complex interactions, produce the observed behavior represented in KPIs. These latent variables are hidden because they are not explicitly measurable, such as certain user behavior, or are not directly measured by the OAM.
Figure 2 is a schematic diagram showing graphs 200 of data from a radio network. In Figure 2, the graph 210 shows clusters in measured data representing latent variables and the graph 220 shows measured KPIs.
Inherent states are clearly observed when behavior is described through these latent variables. However, although some latent variables show the inherent groups, other variables are usually applicable to all groups uniformly. These variables describe global features that can be observed in any inherent group.
In mobile networks, the majority of the observed behavior is controlled by globally relevant variables, with only a select few latent variables being useful in the discovery of inherent states. Latent variables which help in the identification of inherent states are referred to herein as locally relevant variables. Having an overwhelming number of globally relevant latent variables and few locally relevant latent variables in a dataset makes clustering challenging. Clustering algorithms may not distinguish between important and unimportant features and may try to take into account the data distribution equally among all input features. If many of the input features are irrelevant for clustering, this behaviour causes the clustering algorithms to pay comparatively little attention to the important, clustering-relevant features, and clusters may be created that are not aligned with the inherent groups in a dataset.
Figure 3 is a schematic diagram showing an example 300 of clusters that are output by a clustering algorithm. In Figure 3 the clusters are not aligned with the locally relevant latent variables representing the inherent groups in the dataset.
While pre-processing may be used to remove unwanted globally relevant variables from a training dataset in certain use cases or when dealing with certain kinds of data, pre-processing is not possible on mobile network OAM data. This is because mobile networks are a more complex, less structured and unintuitive domain that contains an overwhelmingly large number of globally relevant latent variables.
The methods and systems described herein may be used to identify latent variables in a dataset and distinguish locally relevant latent variables that are useful from the standpoint of clustering, from globally relevant variables that are not useful for clustering.
The method of latent variable extraction described herein may be used to identify network behaviour-relevant states. The methods described herein may be used by Cognitive Functions (CFs) to accomplish network automation tasks in radio networks. For example, the methods described may be used to determine network states for detection and prediction of anomalies in radio networks, for network environment modelling and for user behaviour.
The methods described herein may be used in conjunction with autoencoder neural networks. Autoencoder neural networks are neural networks that learn to encode observations into a latent space with a reduced number of dimensions, while simultaneously learning to decode the original observations from the latent space.
Figure 4 is a schematic diagram showing an autoencoder neural network 400 according to an example. The autoencoder 400 comprises an encoder subnet 410 and a decoder subnet 420 which are coupled back-to-back. In the autoencoder network 400, a constraint of lower dimensions in the latent space is achieved through the topology of the autoencoder network 400, which forces the autoencoder 400 to disregard irrelevant parts of the data in order to minimize information loss. Both the encoding and decoding are learned together, after which the decoder 420 is discarded. The encoder 410 is used to translate the input observations into a latent space. After encoding into the latent space, clustering may be applied to the latent variables which represent the output of the encoder 410.
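By way of illustration, a minimal autoencoder of this kind might be sketched as follows in PyTorch. The layer widths and the 8-dimensional latent space are assumptions chosen for the sketch and are not taken from this disclosure.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder and decoder subnets coupled back-to-back; the narrow
    latent layer constrains the dimensionality of the encoding."""
    def __init__(self, n_features: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the input from its latent encoding.
        return self.decoder(self.encoder(x))

# After training, the decoder is discarded and only the encoder is
# used to translate observations into the latent space.
```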
The methods described herein are implemented in conjunction with an autoencoder network to separate globally relevant latent variables, which are applicable to all inherent groups in a dataset, from locally relevant latent variables when learning the encoding of the data. The separation is achieved by splitting the encoded latent variables into two sets: G, comprising globally relevant variables that contain no information relevant for clustering, and L, comprising locally relevant variables that are relevant for clustering.
Figure 5 is a schematic diagram showing an autoencoder network 500 according to an example. The autoencoder 500 is arranged to decorrelate globally and locally relevant variables into the two sets G and L. The autoencoder 500 is described by the following parameters and components:
• Meta-parameters dG and dL, specifying the number of variables in sets G and L. These meta-parameters are specified in the design phase, during the specification of the neural network topology.
• The reference distribution Dref, specified prior to training. According to examples, Dref may be a Gaussian distribution.
• Decorrelator network 510. The decorrelator 510 is attached to enforce decorrelation between sets G and L.
• Noise Sn, specified prior to training. Sn may be Gaussian noise.
• Separator network 520. The separator 520 enhances the separation of inherent groups in L.
The decorrelator 510 receives as input both variable sets G and L, and is arranged to distinguish latent space encodings from artificially generated encodings, where G is replaced by the reference distribution Dref. This forces the autoencoder network 500 to create a latent representation where features in G follow the reference distribution Dref, and are not correlated to any other feature in G or L.
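A minimal sketch of how the decorrelator's inputs might be assembled, assuming the latent vector is split so that the first dG entries form G and the remainder form L, and assuming a standard Gaussian for Dref; the function name and shapes are illustrative.

```python
import torch

def make_decorrelator_inputs(z: torch.Tensor, d_g: int):
    """z: a batch of latent encodings of shape (batch, dG + dL).

    Real inputs are the encodings as produced by the encoder;
    artificial inputs have their G part replaced by samples drawn
    from the reference distribution Dref (a standard Gaussian here).
    """
    g, l = z[:, :d_g], z[:, d_g:]
    g_ref = torch.randn_like(g)               # samples from Dref
    real = torch.cat([g, l], dim=1)           # original encodings
    artificial = torch.cat([g_ref, l], dim=1) # artificial encodings
    inputs = torch.cat([real, artificial], dim=0)
    # Label 1 for real encodings, 0 for artificial ones.
    labels = torch.cat([torch.ones(len(real)), torch.zeros(len(artificial))])
    return inputs, labels
```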
The separator 520 acts on the reconstructed output from the decoder. The separator 520 is arranged to distinguish reconstructions of original encodings from reconstructions generated from artificial encodings, where noise Sn is added to the features of L.
After training, the output of the encoder is separated into feature sets G and L, and may be used in downstream tasks. For clustering, an algorithm such as k-Means, or a more advanced algorithm, such as a form of Regularized Information Maximization, may be used on the feature set L.
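Downstream clustering might then look like the following, assuming the encodings have been exported as a NumPy array, and assuming an illustrative split point dG = 8 and a cluster count of 4 (both hypothetical values).

```python
import numpy as np
from sklearn.cluster import KMeans

z = np.load("encodings.npy")   # latent encodings, shape (n, dG + dL); assumed file
d_g = 8                        # assumed size of the set G
l_features = z[:, d_g:]        # keep only the locally relevant set L
states = KMeans(n_clusters=4, n_init=10).fit_predict(l_features)
```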
Figure 6 is a schematic diagram showing decorrelator decision logic 600 for the decorrelator 510, according to an example.
There are three ways to distinguish the two categories and differentiate the original latent observations from artificial latent observations:
• If the observations in G do not follow the reference distribution Dref, a simple rule can identify regions in the latent space where it is more likely that observations are from the reference distribution, and vice versa.
• As the reference distribution Dref is noise-like, there is no correlation between its features. If a general variable in G is correlated to another general variable, this is easily detected.
• As the reference distribution Dref is noise-like, there is no correlation between it and the variables in L. If a general variable in G is correlated to a variable in L, this is easily detected.
The decorrelator 510 is an adversary, trying to learn these rules and to separate real observations from artificial ones. The autoencoder tries to counteract this by creating a latent representation where such rules are not present.
For every observation, the output of the decorrelator 510 is a single value, representing whether the observation is thought to be real or artificial. The number of artificial observations should be kept the same as the number of real observations. This ensures that even when the training converges, the decorrelator 510 is only able to achieve 50% accuracy, equivalent to randomly guessing.
According to examples, latent variable extraction is performed through the training of an autoencoder deep neural network. When setting up such a neural network, care may be required to set the neural network topology correctly, as this governs extraction capability.
Figure 7 shows an example of a deep neural network topology 700 for an autoencoder. The autoencoder comprises an encoder subnet 710 and a decoder subnet 720. Each layer of the topology 700 comprises a fully connected sublayer. In some cases, a layer may further comprise a batch normalization sublayer and/or a rectified linear unit (ReLU) sublayer. In the topology 700, each sublayer has a multiple of 16 or 32 neurons.
For latent variable extraction, the middle layer is small and may comprise a handful of neurons. In the example shown in Figure 7, 16 neurons are present in this layer. A lower number of neurons may compromise reconstruction capabilities of the autoencoder, but enforces generalization and simplification, which helps the correct extraction of latent variables. The compromised reconstruction capability is not an issue, as the end goal for the autoencoder network is to encode into a latent representation, and reconstruction is not needed after training.
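The described layer pattern, a fully connected sublayer optionally followed by batch normalization and a ReLU, with widths in multiples of 16 or 32 tapering to a 16-neuron middle layer, could be expressed as in the sketch below; the specific widths are assumed for illustration and are not taken from Figure 7.

```python
import torch.nn as nn

def fc_block(n_in: int, n_out: int) -> nn.Sequential:
    """Fully connected sublayer + batch normalization + ReLU."""
    return nn.Sequential(nn.Linear(n_in, n_out),
                         nn.BatchNorm1d(n_out),
                         nn.ReLU())

def build_encoder(n_features: int, latent_dim: int = 16) -> nn.Sequential:
    # Widths shrink in multiples of 32 towards the 16-neuron
    # middle layer (assumed values).
    return nn.Sequential(fc_block(n_features, 128),
                         fc_block(128, 64),
                         fc_block(64, 32),
                         nn.Linear(32, latent_dim))

def build_decoder(n_features: int, latent_dim: int = 16) -> nn.Sequential:
    # Mirror image of the encoder.
    return nn.Sequential(fc_block(latent_dim, 32),
                         fc_block(32, 64),
                         fc_block(64, 128),
                         nn.Linear(128, n_features))
```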
Figure 8 shows a schematic diagram of a decorrelator topology 800, according to an example. The topology 800 comprises a few narrow fully-connected layers. Since the decorrelator 510 is working in a small-dimensional space, and is meant to learn relatively simple rules, it does not need to have a high level of complexity.
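Such a decorrelator might be no more than a few narrow fully connected layers ending in a single logit; the widths below are assumptions for the sketch.

```python
import torch.nn as nn

def build_decorrelator(latent_dim: int) -> nn.Sequential:
    # A shallow, narrow subnet suffices, since the decorrelator
    # operates in the low-dimensional latent space.
    return nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                         nn.Linear(32, 32), nn.ReLU(),
                         nn.Linear(32, 1))  # one real-vs-artificial logit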
The separator subnet 520, shown in Figure 5, is attached to the reconstructed representation during autoencoder training. The separator 520 differentiates between observations reconstructed from original latent observations, and observations reconstructed from artificial latent observations, where the locally relevant variables L have a small amount of added noise Sn.
Figure 9 is a schematic diagram showing separator decision logic 900 for the separator 520, according to an example. Intuitively, if the noisy artificial latent observations reconstruct into unbelievable or malformed observations, these are easy to distinguish from real reconstructions. The separator 520 is an adversary, trying to identify the malformed observations. The autoencoder tries to counteract this by creating a latent representation, where small changes do not produce huge differences in the reconstructed observations. This in turn enforces a latent representation in L where clusters are well separated.
Distinguishing between real and malformed observations is a complex task. For this, a relatively deep subnet may be used. As the complexity is on par with the task of encoding, for simplicity one can use the same topology as for the encoder 710 shown in Figure 7, with the only addition being an averaging layer at the end, since the output needs to be a single value for every observation.
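Reusing the build_encoder sketch from above, a separator of this shape might look as follows; averaging over the 16 encoder outputs is one assumed way of realizing the final averaging layer.

```python
import torch
import torch.nn as nn

class Separator(nn.Module):
    """Encoder-like subnet whose outputs are averaged into a single
    real-vs-malformed score per observation."""
    def __init__(self, n_features: int):
        super().__init__()
        # Same topology as the encoder sketched earlier.
        self.body = build_encoder(n_features, latent_dim=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x).mean(dim=1)  # averaging layer at the end
```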
During training, the decorrelator 510 and separator 520 enforce the learning of a latent representation which encodes global variables and clustering-relevant variables into their respective sets. The learning is governed by the backpropagation of five losses:
• Reconstruction loss: this affects the encoder and decoder weights and is measured as the mean-squared error between the original and reconstructed observations.
• Decorrelator precision loss: this affects the decorrelator weights and is measured as the binary cross-entropy of correct classification of original and artificial latent observations by the decorrelator.
• Decorrelator adversary loss: this affects the encoder weights and is measured as the binary cross-entropy of incorrect classification of original and artificial latent observations by the decorrelator.
• Separator precision loss: this affects the separator weights and is measured as the binary cross-entropy of correct classification of reconstructions of original and artificial latent observations by the separator.
• Separator adversary loss: this affects the encoder and decoder weights and is measured as the binary cross-entropy of incorrect classification of reconstructions of original and artificial latent observations by the separator.
According to examples, training may be performed in batches using, for example, stochastic gradient descent. The number of artificial observations sampled from the reference distribution Dref and the number of noisy points with added noise Sn is set equal to the batch size. The decorrelator adversary loss is not backpropagated towards the neurons in set L, only towards set G. This avoids degenerate solutions. In the case of the separator 520, the separator adversary loss is only backpropagated for the noisy observations, but not the original observations, once again to avoid degenerate solutions.
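A single training step combining the five losses might then be organized as below. This is a sketch under assumptions: enc, dec, decor and sep follow the earlier sketches; detach() is used to stop the decorrelator adversary loss from reaching set L and to restrict the separator adversary loss to the noisy observations; and the flipped labels in the adversary losses are one common way of expressing "incorrect classification". Each loss would then be stepped with an optimizer restricted to the parameters it is said to affect.

```python
import torch
import torch.nn.functional as F

def training_step(x, enc, dec, decor, sep, d_g, noise_std=0.1):
    """Compute the five losses for one batch (illustrative sketch)."""
    z = enc(x)
    g, l = z[:, :d_g], z[:, d_g:]
    x_rec = dec(z)

    # 1. Reconstruction loss (encoder and decoder weights).
    loss_rec = F.mse_loss(x_rec, x)

    # Artificial latent observations: G replaced by Dref samples,
    # one artificial observation per real one (batch-size matched).
    z_art = torch.cat([torch.randn_like(g), l.detach()], dim=1)

    # 2. Decorrelator precision loss (decorrelator weights only).
    lr_, la_ = decor(z.detach()), decor(z_art)
    loss_dec_prec = (F.binary_cross_entropy_with_logits(lr_, torch.ones_like(lr_))
                     + F.binary_cross_entropy_with_logits(la_, torch.zeros_like(la_)))

    # 3. Decorrelator adversary loss (encoder weights; l is detached so
    #    the loss is not backpropagated towards set L, only set G).
    adv = decor(torch.cat([g, l.detach()], dim=1))
    loss_dec_adv = F.binary_cross_entropy_with_logits(adv, torch.zeros_like(adv))

    # Noisy artificial encodings: noise Sn added to the features of L.
    z_noisy = torch.cat([g, l + noise_std * torch.randn_like(l)], dim=1)
    x_rec_noisy = dec(z_noisy)

    # 4. Separator precision loss (separator weights only).
    sr_, sn_ = sep(x_rec.detach()), sep(x_rec_noisy.detach())
    loss_sep_prec = (F.binary_cross_entropy_with_logits(sr_, torch.ones_like(sr_))
                     + F.binary_cross_entropy_with_logits(sn_, torch.zeros_like(sn_)))

    # 5. Separator adversary loss (encoder and decoder weights;
    #    backpropagated only for the noisy observations).
    sa_ = sep(x_rec_noisy)
    loss_sep_adv = F.binary_cross_entropy_with_logits(sa_, torch.ones_like(sa_))

    return loss_rec, loss_dec_prec, loss_dec_adv, loss_sep_prec, loss_sep_adv
```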
Figure 10 is a schematic diagram showing a first example 1000 of an application of the method described herein in a mobile radio network. In the context of the CAN framework, an Environment-state Modelling and Abstraction (EMA) module fulfils the role of environment-state modelling through automated state definition. The EMA is arranged to extract latent variables from the input data, and quantize (i.e. cluster) the latent space to a fine resolution. If this is achieved, an abstraction module can then learn and store different mappings of these clusters, which correspond to various output measures that are later used by cognitive functions in the CAN.
Although the EMA module does not strictly undertake clustering, it is critical for it to fit inherent states correctly, in order for the further mapping to be able to map quanta that only contain a single inherent state from the network. Mixed quanta result in a mapping where some of the observations are mislabelled, resulting in invalid control decisions by the cognitive functions.
Latent variable decorrelation greatly helps in this setting, as separating the clustering-relevant variables allows for the precise fit of quanta to inherent states. These quanta are then mapped to different actions undertaken by the cognitive functions, to realize network automation.
Figure 11 is a schematic diagram showing a second example 1100 of an application of the method described herein in a mobile radio network. Mobile network cell performance can degrade due to misconfiguration, software bugs, hardware failure, as well as environmental effects such as weather damage to the antennas. Anomalously behaving cells may go unnoticed by simple threshold-based alarms, because the problems are only visible in the transitional behaviour of the cell. These anomalies can be detected by looking at the state-transitions of the cells, where the states are defined automatically by an autoencoder and decorrelator network as described herein. Out-of-ordinary sequences can raise alarms, which can then trigger automated self-healing actions, or operator supervision.
Figure 12 is a schematic diagram showing an example 1200 of a further application of the method described herein. The prediction of user mobility enhances the robustness and reliability of handover procedures between cells. As users usually move on similar paths, user movement is predictable, and clusters around a finite number of similar paths in each cell. These common user paths, either measured directly through user localization methods, or indirectly through radio environment measurements such as RSRP and SINR, may be clustered using the autoencoder and decorrelator described herein. User movement is also governed by many global latent variables, which do not help in distinguishing between different paths. Hence, the decorrelator is particularly useful in this example.
After the clusters are formed, users are assigned to the most likely path based on their movement history, which in turn predicts their future movement. This prediction can be used to set handover parameters to avoid too-late, too-early, or ping-pong handover situations.
Figure 13 is a block diagram of a method 1300 for a neural network comprising an input layer, an output layer and one or more intermediate layers, wherein the neural network is arranged to generate an output vector of data values at the output layer corresponding to a learned representation of an input vector of data values that is input to the neural network, according to an example. The neural network may be an encoder in an autoencoder network as previously described herein.
At block 1310 the method comprises accessing a set of data variables that are determined according to respective entries of output vectors. The output vectors are generated on the basis of the evaluation of the neural network on input vectors of data values selected from a training dataset of input vectors.
At block 1320 the method 1300 comprises evaluating a predictive model over the set of data variables to determine a subset of data variables. The predictive model may be an adversarial neural network.
At block 1330 the method 1300 comprises modifying the predictive model and the neural network on the basis of the evaluation whereby the evaluation of the subset of data variables for subsequent input vectors of data values that are input to the neural network generates output vectors of data values that are grouped, according to a measure of similarity, into at least two substantially disjoint subsets. According to examples, the disjoint subsets are the sets of globally and locally relevant variables G and L as described herein.
It should be appreciated that one or more steps of the embodiment methods provided herein may be performed by corresponding units or modules. The respective units or modules may be hardware, software, or a combination thereof. For instance, one or more of the units or modules may be an integrated circuit, such as field programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs).
Although the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.
The present inventions can be embodied in other specific apparatus and/or methods. The described embodiments are to be considered in all respects as illustrative and not restrictive. In particular, the scope of the invention is indicated by the appended claims rather than by the description and figures herein. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method for a neural network comprising an input layer, an output layer and one or more intermediate layers, wherein the neural network is arranged to generate an output vector of data values at the output layer corresponding to a learned representation of an input vector of data values that is input to the neural network, the method comprising:
accessing a set of data variables that are determined according to respective entries of output vectors, the output vectors generated on the basis of the evaluation of the neural network on input vectors of data values selected from a training dataset of input vectors;
evaluating a predictive model over the set of data variables to determine a subset of data variables; and
modifying the predictive model and the neural network on the basis of the evaluation, whereby the evaluation of the subset of data variables for subsequent input vectors of data values that are input to the neural network generates output vectors of data values that are grouped, according to a measure of similarity, into at least two substantially disjoint subsets.
2. The method of claim 1, wherein the predictive model is an adversarial neural network.
3. The method of claim 1 or 2, wherein evaluating the predictive model over the set of data variables to determine a subset of data variables comprises:
generating a feature vector of data values on the basis of an output vector of the neural network and data values that are selected according to a pre-determined probability distribution;
computing an output of the predictive model on the basis of the feature vector whereby to determine whether data values of the feature vector correspond to data values of the output vector of the neural network or data values selected according to the pre-determined probability distribution; and
evaluating a first loss function to determine an error between the output of the predictive model and a target vector.
4. The method of claim 3, wherein modifying the predictive model and the neural network on the basis of the evaluation comprises:
modifying one or more parameters of the predictive model and/or the neural network to minimise the error between the output of the predictive model and the target vector.
5. The method of claim 3 or 4, wherein the first loss function is a binary cross-entropy function.
6. The method of claims 1 to 5, wherein the neural network is an encoder in an autoencoder network comprising an encoder and a decoder.
7. The method of claim 6, comprising:
evaluating the autoencoder over each input vector in the training dataset;
evaluating a second loss function to determine an error between the output of the autoencoder and the input vector; and
modifying one or more parameters of the autoencoder on the basis of the evaluation to minimise the error between the output of the autoencoder and the input vector.
8. The method of claim 7, wherein the second loss function is a mean-squared error function.
9. The method of claim 7 or 8, comprising:
evaluating a further predictive model over the output of the autoencoder; and
modifying the further predictive model and autoencoder on the basis of the evaluation whereby to enforce separation, determined according to the measure of similarity, between data values of the at least two substantially disjoint subsets.
10. The method of claim 9, wherein evaluating the further predictive model over the output of the autoencoder network comprises:
generating a feature vector of data values on the basis of an input vector to the autoencoder and data values that are selected according to a further pre-determined probability distribution;
computing an output of the further predictive model on the basis of the feature vector whereby to determine whether data values of the feature vector correspond to data values of an input vector to the autoencoder or data values selected according to the further pre-determined probability distribution; and
evaluating a third loss function to determine an error between the output of the further predictive model and a target vector.
11. The method of claim 9 or 10, wherein modifying the further predictive model and the autoencoder on the basis of the evaluation comprises:
modifying one or more parameters of the further predictive model and/or the autoencoder to minimise the error between the output of the further predictive model and the target vector.
12. The method of claim 10 or 11, wherein the third loss function is a binary cross-entropy function.
13. The method of claims 9 to 12, wherein the further predictive model is an adversarial neural network.
14. A neural network to implement any one of claims 1 to 13.
15. A system comprising at least one processor and at least one memory including program code which, when executed by the at least one processor, provides instructions to implement any one of claims 1 to 14.
PCT/EP2020/072019 2020-08-05 2020-08-05 Latent variable decorrelation WO2022028687A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/072019 WO2022028687A1 (en) 2020-08-05 2020-08-05 Latent variable decorrelation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/072019 WO2022028687A1 (en) 2020-08-05 2020-08-05 Latent variable decorrelation

Publications (1)

Publication Number Publication Date
WO2022028687A1

Family

ID=71950651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/072019 WO2022028687A1 (en) 2020-08-05 2020-08-05 Latent variable decorrelation

Country Status (1)

Country Link
WO (1) WO2022028687A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019092459A1 (en) * 2017-11-13 2019-05-16 Cambridge Bio-Augmentation Systems Limited Time invariant classification
WO2020015831A1 (en) * 2018-07-19 2020-01-23 Nokia Technologies Oy Environment modeling and abstraction of network states for cognitive functions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIACHUN LIAO ET AL: "Learning Generative Adversarial RePresentations (GAP) under Fairness and Censoring Constraints", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 28 September 2019 (2019-09-28), XP081544622 *
ZENGJIE SONG ET AL: "Towards A Controllable Disentanglement Network", ARXIV.ORG, 20 June 2020 (2020-06-20), XP081682017 *

Similar Documents

Publication Publication Date Title
Van Amersfoort et al. Uncertainty estimation using a single deep deterministic neural network
Usama et al. Unsupervised machine learning for networking: Techniques, applications and research challenges
Pratama et al. Automatic construction of multi-layer perceptron network from streaming examples
Ren et al. Knowledge-maximized ensemble algorithm for different types of concept drift
CN111967343B (en) Detection method based on fusion of simple neural network and extreme gradient lifting model
JP6902645B2 (en) How to manage a system that contains multiple devices that provide sensor data
Wang et al. Multi-sensor control for multi-object Bayes filters
Maeda A Bayesian encourages dropout
KR102320706B1 (en) Method for setting model threshold of facility monitoring system
CN110378430B (en) Network intrusion detection method and system based on multi-model fusion
Šourek et al. Lifted relational neural networks
WO2019106418A1 (en) Short depth circuits as quantum classifiers
Disabato et al. Learning convolutional neural networks in presence of concept drift
CN114841296A (en) Device clustering method, terminal device and storage medium
Kim et al. A variational autoencoder for a semiconductor fault detection model robust to process drift due to incomplete maintenance
Chowdhury et al. Internet of Things resource monitoring through proactive fault prediction
Thomopoulos Decision and evidence fusion in sensor integration
WO2022028687A1 (en) Latent variable decorrelation
Hagg et al. An analysis of phenotypic diversity in multi-solution optimization
Abolkarlou et al. Ensemble imbalance classification: Using data preprocessing, clustering algorithm and genetic algorithm
CN116996272A (en) Network security situation prediction method based on improved sparrow search algorithm
Wang et al. Fault detection based on variational autoencoders for complex nonlinear processes
Kumar et al. An energy-gain bounding approach to robust fuzzy identification
Maske et al. Sensor selection via observability analysis in feature space
KR102258206B1 (en) Anomaly precipitation detection learning device, learning method, anomaly precipitation detection device and method for using heterogeneous data fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20751548

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20751548

Country of ref document: EP

Kind code of ref document: A1