CN115803751A - Training models for performing tasks on medical data - Google Patents

Training models for performing tasks on medical data

Info

Publication number
CN115803751A
CN115803751A (application CN202180049170.7A)
Authority
CN
China
Prior art keywords
model
training
local
data
clinical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180049170.7A
Other languages
Chinese (zh)
Inventor
R. B. Patil
C. Kulkarni
D. Mysore Siddu
M. Y. Pandya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips NV filed Critical Koninklijke Philips NV
Publication of CN115803751A

Classifications

    • G06N 3/098: Distributed learning, e.g. federated learning
    • G16H 30/40: ICT specially adapted for the handling or processing of medical images, for processing medical images, e.g. editing
    • G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 20/20: Ensemble learning
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G16H 50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Radiology & Medical Imaging (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

According to one aspect, a method of training a model is provided for performing tasks on medical data using a distributed machine learning process, whereby a global model is updated based on training performed on local copies of the model at a plurality of clinical sites. The method comprises the following steps: a) Information is sent (302) to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train the respective local copy of the model according to training data at the respective clinical site. Then, the method comprises b) receiving (304) from each clinical site of the plurality of clinical sites: i) Local updates to parameters in the model, the local updates obtained by training a local copy of the model from training data at the respective clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site; and c) updating (306) the parameters in the global model based on the received local updates to the parameters and the received metadata.

Description

Training models for performing tasks on medical data
Technical Field
Embodiments herein relate to training a model using a distributed machine learning process.
Background
Learning from a large volume of patient data can greatly increase the ability to generate and test hypotheses about healthcare. To capture and use the knowledge contained in large volumes of patient data, predictive models are used. A machine learning process may be used to train a model on a large amount of data from patients who have previously received treatment. Models trained in this way have the potential to be used for prediction in many medical fields, such as image segmentation and diagnosis. Such a model may be used to better personalize healthcare.
One of the main obstacles to achieving personalized medicine using models trained with machine learning processes is obtaining sufficient patient data to train the models. Data from only one hospital may not be sufficient to develop a model that can be used for a wide variety of patients (e.g., patients spread across the globe). However, it can take a long time to obtain data from different hospitals and patient populations, which increases the time from planning to deployment of the model. In the field of deep learning, the performance of a model improves as the number of training data samples increases. Thus, to ensure the best possible model to assist the physician, the performance of the model can be proactively improved with more data. However, combining data from multiple clinical sites (e.g., hospitals, doctors' operating rooms, etc.) can be difficult due to the ethical, legal, political, and administrative hurdles associated with data sharing. One way to alleviate this problem is to train the model using a distributed machine learning process, such as, for example, the federated learning process described in the 2019 paper by Bonawitz et al. entitled "Towards Federated Learning at Scale: System Design". Distributed learning enables the model to be trained using data from different clinical sites without the data leaving the sites.
Disclosure of Invention
As noted above, a distributed machine learning process can be used to train models (alternatively referred to as "machine learning models") on training data located at different sites without the need to move the training data from the respective sites. Those skilled in the art will be familiar with distributed learning and distributed learning processes such as federated machine learning; however, this is briefly illustrated in fig. 1, which shows a central server 102 in communication with a plurality of clinical sites 104-112. The central server uses a distributed learning process to coordinate the training of the model using training data located at each clinical site 104 to 112. The central server maintains a "global" or central copy of the model, and can send 114 information about the global model to each clinical site, such as, for example, parameters that enable the creation of a local copy of the model. Each clinical site may then create a local copy of the model and train its local copy on the training data at the respective clinical site. Each clinical site 104 to 112 may then send 116 updates to one or more parameters of the model back to the central server. The central server combines the updates from the respective clinical sites, for example by averaging, to update the global model. This allows the global model at the central server 102 to be trained, e.g. updated and improved, based on training data at the plurality of clinical sites 104 to 112 without the data having to leave the respective clinical sites. It is an object of embodiments herein to improve such a process for training a model to perform tasks on medical data using a distributed machine learning process.
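As an illustration of this loop, the following sketch (in Python with NumPy; the function names and the stubbed local-training step are assumptions for illustration only, not part of the described embodiments) shows one round of the scheme of fig. 1: the server broadcasts the global parameters 114, each site trains its local copy, and the server combines the returned updates 116 by simple averaging.

    import numpy as np

    def train_local_copy(global_weights, local_data):
        """Hypothetical site-side step: start from the global weights and return
        locally updated weights after training on the site's own data."""
        local_weights = global_weights.copy()
        # ... gradient-based training on local_data would happen here ...
        return local_weights

    def federated_round(global_weights, site_datasets):
        """One round of plain distributed (federated) learning: broadcast the
        global parameters, train locally at each site, then combine the
        returned parameters by simple averaging."""
        local_updates = [train_local_copy(global_weights, data)  # send 114
                         for data in site_datasets]              # receive 116
        return np.mean(local_updates, axis=0)                    # update global model

The embodiments below refine the final combination step so that each site's contribution is weighted by metadata describing the quality of its training.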
Thus, according to a first aspect, a method of training a model is provided for performing a task on medical data using a distributed machine learning process, whereby a global model is updated based on training performed on local copies of the model at a plurality of clinical sites. The method comprises the following steps: a) Sending information to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train a respective local copy of the model according to training data at the respective clinical site; b) Receiving, from each of a plurality of clinical sites, i) local updates to parameters in the model, the local updates obtained by training a local copy of the model according to training data at the respective clinical site, and ii) metadata relating to a quality of training performed at the respective clinical site; and c) updating the parameters in the global model based on the received local updates to the parameters and the received metadata.
Thus, metadata relating to the quality of training performed at each site may be used when combining local updates into updates for the global model. In this way, different local updates may be given different importance (e.g., by using weighting) depending on the quality of the training performed at the respective clinical site. This may lead to improved training, resulting in an improved model and improved clinical results for the clinical procedure using the model. Since the model is trained on data from different sites, irregularities in the data may exist, and this may lead to bias and model drift. By considering appropriate metadata while combining weights, model drift can be avoided, resulting in a better quality model.
According to a second aspect, a method at a clinical site for training a model is provided for performing tasks on medical data using a distributed machine learning process, whereby a global model at a central server is updated based on training performed on local copies of the model at the clinical site. The method comprises the following steps: receiving information from a central server such that a local copy of the model can be created and trained from training data at the clinical site; training a local copy of the model according to the information; and sending to the central server i) an update to the model, the update being based on training of the local copy of the model according to the training data at the clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site.
According to a third aspect, the model trained according to the first or second aspect is used to perform a task on medical data.
According to a fourth aspect, an apparatus for training a model is provided for performing tasks on medical data using a distributed machine learning process, whereby a global model is updated based on training performed at a plurality of clinical sites. The device includes: a memory including instruction data representing a set of instructions; and a processor configured to communicate with the memory and configured to execute the set of instructions. The set of instructions, when executed by the processor, cause the processor to: a) Sending information to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train a respective local copy of the model according to training data at the respective clinical site; b) Receiving, from each of a plurality of clinical sites, i) local updates to parameters in the model, the local updates obtained by training a local copy of the model according to training data at the respective clinical site, and ii) metadata relating to a quality of training performed at the respective clinical site; and c) updating the parameters in the global model based on the received local updates to the parameters and the received metadata.
According to a fifth aspect, there is provided a computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the methods of the first and second aspects.
These and other aspects will be apparent from, and elucidated with reference to, the embodiments described hereinafter.
Drawings
Example embodiments will now be described, by way of example only, with reference to the following drawings, in which:
FIG. 1 illustrates a distributed learning process for training a model;
FIG. 2 illustrates an apparatus according to some embodiments herein;
FIG. 3 illustrates a method according to some embodiments herein;
FIG. 4 illustrates a method of determining model drift according to some embodiments herein;
FIG. 5 illustrates an apparatus according to some embodiments herein;
fig. 6 illustrates a method according to some embodiments herein;
FIG. 7 illustrates a system according to some embodiments herein; and
fig. 8 illustrates segmentation of an image of a liver according to a model trained according to embodiments herein.
Detailed Description
As described above, embodiments herein are directed to improving methods for training clinical models to perform tasks on medical data using a distributed machine learning process.
Turning to fig. 2, in some embodiments, there is an apparatus 200 according to some embodiments herein, the apparatus 200 for training a model to perform tasks on medical data using a distributed machine learning process. In general, the apparatus may form part of a computer apparatus or system, such as a laptop computer, desktop computer, or other computing device, for example. In some embodiments, the apparatus 200 may form part of a distributed computing arrangement or cloud.
The apparatus includes a memory 204 and a processor 202 (e.g., processing circuitry or logic), the memory 204 including instruction data representing a set of instructions, the processor 202 configured to communicate with the memory and execute the set of instructions. In general, the set of instructions, when executed by the processor, may cause the processor to perform any embodiment of the method 300 as described below.
Embodiments of the apparatus 200 may be used to train a model to perform tasks on medical data using a distributed machine learning process, thereby updating a global model based on training performed on local copies of the model at multiple clinical sites. More specifically, the set of instructions, when executed by the processor 202, cause the processor to: a) Sending information to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train a respective local copy of the model according to training data at the respective clinical site; b) Receiving from each clinical site of the plurality of clinical sites i) local updates to parameters in the model obtained by training a local copy of the model according to training data at the respective clinical site, and ii) metadata relating to a quality of training performed at the respective clinical site; and c) updating the parameters in the global model based on the received local updates to the parameters and the received metadata.
Processor 202 may include one or more processors, processing units, multi-core processors, or modules configured or programmed to control apparatus 200 in the manner described herein. In particular implementations, processor 202 may include a plurality of software and/or hardware modules, each configured to perform, or for performing, a single or multiple steps of the methods described herein. In some implementations, for example, processor 202 may include multiple (e.g., interoperating) processors, processing units, multi-core processors, and/or modules configured for distributed processing. Those skilled in the art will appreciate that such processors, processing units, multi-core processors, and/or modules may be located at different locations and may perform different steps and/or different portions of a single step of the methods described herein.
The memory 204 is configured to store program code that can be executed by the processor 202 to perform the methods described herein. Alternatively or additionally, one or more memories 204 may be located external to apparatus 200 (e.g., separate from apparatus 200 or remote from apparatus 200). For example, one or more memories 204 may be part of another device. The memory 204 may be used to store the global model, the received local updates, the received metadata, and/or any other information or data received, calculated, or determined by the processor 202 of the apparatus 200 or from any interface, memory, or device external to the apparatus 200. Processor 202 may be configured to control memory 204 to store global models, received local updates, received metadata, and/or any other information or data described herein.
In some embodiments, memory 204 may include multiple sub-memories, each capable of storing a piece of instruction data. For example, at least one of the sub-memories may store instruction data representing at least one instruction of the set of instructions, while at least one other of the sub-memories may store instruction data representing at least one other instruction of the set of instructions.
It should be appreciated that fig. 2 shows only the components necessary to illustrate this aspect of the disclosure, and in a practical implementation, the apparatus 200 may include additional components than those shown. For example, the apparatus 200 may also include a display. The display may for example comprise a computer screen and/or a screen on a mobile phone or tablet. The apparatus may also include a user input device (such as a keyboard, mouse, or other input device that enables a user to interact with the apparatus), for example, to provide initial input parameters for use in the methods described herein. The device 200 may include a battery or other power source for powering the device 200 or components for connecting the device 200 to a mains power supply.
Turning to fig. 3, there is a computer-implemented method 300 for training a model to perform a task (e.g., processing) on medical data using a distributed machine learning process, whereby a global model is updated based on training performed on local copies of the model at multiple clinical sites. Embodiments of method 300 may be performed, for example, by an apparatus such as apparatus 200 described above.
Briefly, in step a), the method 300 comprises: information is sent 302 to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train a respective local copy of the model according to training data at the respective clinical site. In step b), the method 300 comprises receiving 304 from each clinical site of the plurality of clinical sites: i) Local updates to parameters in the model, the local updates obtained by training a local copy of the model according to training data at the respective clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site. In step c), the method comprises updating 306 parameters in the global model based on the received local updates to the parameters and the received metadata.
As noted above, since the model is trained on data from different sites, irregularities may exist in the data between sites, and this may result in bias and model drift, i.e. differences between training sessions in the decision boundaries used to perform the task (e.g. classification, segmentation, etc.). In general, bias describes how well a model matches the training set. A model with high bias will not closely match the data set, while a model with low bias will match the data set very closely. Bias arises when a model is too simple to capture the trends that exist in the data set. Model drift can be classified into two broad categories. The first type is called "concept drift", which means that the statistical properties of the target variables that the model attempts to predict change over time in an unpredictable manner. This causes problems because over time the predictions become less accurate. The second type is "data drift", which occurs when the statistical properties of the underlying predictor variables change; if the underlying variables change, the model may be considered to fail.
By considering appropriate metadata while combining weights, model drift can be avoided, which results in a better quality model. Thus, metadata relating to the quality of training performed at each site may be used when combining local updates into updates for the global model. In this way, different local updates may be given different importance (e.g., by using weighting) depending on the quality of the training performed at the respective clinical site.
In more detail, the model may include any type of model that may be trained using a machine learning process. Examples of models include, but are not limited to, neural networks, deep neural networks such as F-nets, U-nets, and convolutional neural networks, random forest models, and Support Vector Machine (SVM) models.
Those skilled in the art are familiar with machine learning and machine learning models, but in short, machine learning can be used to find a prediction function for a given data set; the data set is typically a mapping between given inputs and outputs. The prediction function (or mapping function) is generated in a training phase, which involves providing example inputs and ground truth (e.g., correct) outputs to the model. A testing phase involves predicting the output for a given input. Applications of machine learning include, for example, curve fitting, facial recognition, and spam filtering.
In some embodiments herein, the model comprises a neural network model, such as a deep neural network model. Those skilled in the art will be familiar with neural networks, but in short, a neural network is a machine learning model that can be trained to predict an expected output for given input data. The neural network is trained by providing training data that includes example input data and the corresponding "correct" or ground truth results that are expected. The neural network includes a plurality of layers of neurons, each neuron representing a mathematical operation applied to the input data. The output of each layer in the neural network is fed into the next layer to produce an output. For each piece of training data, the weights associated with the neurons are adjusted (e.g., using processes such as backpropagation and/or gradient descent) until optimal weights are found that produce predictions reflecting the ground truth of the corresponding training examples.
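As a minimal sketch of the weight-adjustment step just described (illustrative only; the tiny network, loss function, and learning rate are assumptions, and PyTorch is used purely as an example framework):

    import torch
    from torch import nn

    # A tiny illustrative network: layers of neurons, each applying a
    # mathematical operation to its input.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    def training_step(inputs, ground_truth):
        """One backpropagation / gradient-descent step on a batch of examples."""
        optimizer.zero_grad()
        predictions = model(inputs)              # forward pass through the layers
        loss = loss_fn(predictions, ground_truth)
        loss.backward()                          # backpropagate the error
        optimizer.step()                         # adjust the weights
        return loss.item()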
As noted above, the methods and systems herein relate to training a model, such as any of the models described above, using a distributed learning process. The distributed learning process is described above with reference to fig. 1, and details therein will be understood to apply to embodiments of the apparatus 200 and method 300. Examples of distributed learning processes include, but are not limited to, joint learning and distributed data parallel methods.
In some embodiments, the apparatus 200 may include a server that coordinates training performed by servers at multiple clinical sites, in other words, a "central server. Herein, method 300 may be performed or initiated by a user, a company, or any other designer or orchestrator of a training process, for example, using apparatus 200. Using terminology commonly associated with distributed learning schemes, a central server (e.g., such as the apparatus 200) may comprise the "master" of the scheme, and multiple clinical sites may comprise "workers" or nodes.
The central server (e.g., apparatus 200) may store and/or maintain (e.g., update) the global model. The global model (or a global copy of the model) includes a master copy or a central copy of the model. As described in more detail below, the results of the training performed at each of the plurality of clinical sites (e.g., local updates) are transmitted to a central server and incorporated into the global model. Thus, the global model represents the current "combined" results of all the exercises performed at the multiple clinical sites.
In this context, a clinical site may include a hospital, an operating room, a clinic, and/or a data center or other computing site adapted to store medical data originating from such a clinical site.
As indicated above, the model is used to perform tasks on medical data. In this context, medical data may include any type of data that may be used, generated and/or obtained in a medical environment, including but not limited to: clinical diagnostic data (such as patient vital signs or physiological parameters), medical images, medical files (such as patient records, for example), and/or output of a medical machine (from operating or diagnostic data of a medical device, for example).
The model may take as input one or more types of medical data as described above and perform tasks on the medical data. The task may include, for example, a classification task or a segmentation task. For example, the model may predict a classification of the medical data and/or provide an output classification. In embodiments herein, the model may output, for example, a patient diagnosis based on the input medical data. In embodiments where the medical data comprises a medical image, the model may output, for example, a segmentation of the medical image, a location of a feature of interest in the medical image, or a diagnosis based on the medical image. However, those skilled in the art will appreciate that these are merely examples, and that the model may take as input different types of medical data, and provide different types of output (e.g., perform different tasks) to the examples provided above.
Returning to method 300, as noted above, method 300 includes: a) Information is sent (302) to a plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model, and a respective local copy of the model is trained according to training data at the respective clinical site.
For example, the information may include model information indicating a type of the model and/or parameter values in the model. For example, in embodiments where the model includes a neural network, the information may include parameters including, but not limited to, the number of layers in the neural network model, the input and output channels of the model, and the values of weights and biases in the neural network model. Typically, the information sent in step a) is sufficient for each of the plurality of clinical sites to create a local copy of the model.
The information may also include instructions for how each clinical site trains the model. For example, the information may indicate, for example, the number of training sessions to be performed, the number of training data that should be used to train the model, the type of data to be used to train the model, and so on.
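Purely for illustration, the information sent in step a) might be serialized as a small structure such as the following (the field names and values here are assumptions, not defined by the embodiments):

    # Hypothetical payload sent from the central server to each clinical site.
    model_info = {
        "model_type": "neural_network",
        "num_layers": 12,                  # architecture description
        "input_channels": 1,
        "output_channels": 2,
        "weights": None,                   # current global weights/biases, if any
        "training_instructions": {
            "epochs": 5,                   # number of training sessions to perform
            "min_samples": 100,            # amount of training data to use
            "data_type": "CT_liver",       # type of data to train on
        },
    }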
In step b), the method 300 comprises receiving (304) from each clinical site of the plurality of clinical sites: i) Local updates to parameters in the model, the local updates obtained by training a local copy of the model from training data at the respective clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site.
The local update to the parameters in the model may comprise the result of training a local copy of the model on the training data at the respective clinical site, such as the changes in model parameters resulting from that training. In embodiments where the model comprises a neural network, the parameters may comprise the weights or biases in the neural network, or the changes in weights or biases that should be applied to the neural network. Thus, in some embodiments, step b) comprises receiving updated values w_i of one or more weights or biases in the neural network model (or the changes in value Δw_i).
The metadata relates to the quality of the training performed at the respective clinical site. In some embodiments, the metadata provides an indication of the performance of the respective local copy of the model after training. For example, an indication of the accuracy of the local model at the respective clinical site.
In some embodiments, the metadata provides an indication of the performance of the respective local copy of the model after training for one or more subsets of the training data having a common characteristic that is expected to affect model error. A characteristic can be expected to affect model error, for example, by making it easier (or, conversely, more difficult) for the model to perform the task (e.g. classification/segmentation) on the medical data. For example, the metadata may include an indication of the performance of the respective local model when classifying medical data having different quality levels or different integrity levels (e.g. a full image versus a partial image).
In another embodiment, the metadata may include medical statistics that may affect the training error. In other words, the metadata may include statistics relating to features of the training data at the respective medical site that may affect the accuracy of the respective local model, for example the number of high-quality training data samples compared to the number of low-quality training data samples.
In some embodiments, the metadata provides an indication of the quality of the training data at the respective clinical site. For example, the metadata may provide an indication of the distribution of training data at the clinical site between different output classes of the model. In this sense, the output classification may include a label or category output by the model. For example, the metadata may describe whether the training data is evenly distributed between different output classes, or whether the training data is skewed towards a particular class (e.g., more training data is associated with some labels than other labels).
For example, consider a classification problem with 5 classes (or labels), where each clinical site has a different proportion of data in each class and the available training data changes as the distributed learning proceeds. The returned metadata may include the number of samples of each class present at each node at the time of the weight update. This may provide an indication of how balanced the training data used to train the respective local model is (e.g. between the different classes). Local updates produced from a more balanced training data set may be given more weight than local updates produced from a less balanced training data set.
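One way such per-class sample counts could be turned into a balance score for use as metadata is sketched below (an assumption for illustration; the embodiments do not prescribe this particular formula). It uses the normalized entropy of the reported class distribution, so a perfectly balanced site scores close to 1 and a heavily skewed site scores close to 0.

    import math

    def balance_score(class_counts):
        """Normalized entropy of the class distribution reported by one site.

        class_counts: e.g. [120, 90, 110, 100, 95] for the 5-class example above.
        Returns a value in [0, 1]: 1 means perfectly balanced, 0 means all of
        the site's data falls in a single class."""
        total = sum(class_counts)
        probs = [c / total for c in class_counts if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        return entropy / math.log(len(class_counts))

    print(balance_score([120, 90, 110, 100, 95]))  # close to 1.0 (well balanced)
    print(balance_score([500, 1, 1, 1, 1]))        # close to 0.0 (highly skewed)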
In step c) of the method 300, the method comprises updating (306) parameters in the global model based on the received local updates to the parameters and the received metadata.
Typically, the metadata is used to perform parameter merging at the central server. Thus, the merged parameters may comprise a function of the parameters received from the clinical sites and the corresponding metadata. In other words, in some embodiments:
merged parameters = function(metadata, parameters received from the clinical sites)
Mathematically, this can be expressed as follows. Suppose N clinical sites n1, n2, n3, ... are considered, with parameters W1, W2, W3, ..., and that each clinical site has a quality measure α1, α2, α3, ..., where each α value lies between 0 and 1 and is calculated from the metadata sent from the clinical sites to the central server. The merged parameters can thus be calculated as:
merged parameters = (α1·W1 + α2·W2 + α3·W3 + ...) / (α1 + α2 + α3 + ...)
Stated differently, in some embodiments, the step of combining the local updates to the parameters to determine an update to the global model comprises determining the parameters for the global model according to:
global parameters = (α1·W1 + α2·W2 + α3·W3 + ... + αN·WN) / (α1 + α2 + α3 + ... + αN)
where WN comprises the local update to the parameters in the model determined by the Nth clinical site, and αN comprises a real number in the range 0 ≤ αN ≤ 1. The value of αN is determined from the metadata associated with the update to the parameters in the model determined by the Nth clinical site. For the avoidance of doubt, other parameters may also be used in calculating the α values. For example, for batch training, one way of calculating αi is:
αi = (number of relevant samples in the ith node) / (global batch size)
where the available samples in the ith node can be obtained from the scan metadata information (slice thickness, resolution, etc.). In some embodiments, step c) may comprise combining the local updates to the parameters by weighting each local update according to the respective metadata to determine the update to the global model, such that local updates associated with metadata indicative of high-quality training results have a higher weighting (e.g. a higher α value, as described above) than updates associated with metadata indicative of low-quality training results. For example, local updates associated with a more accurate local model may, in general, be given a higher weight than local updates associated with a less accurate local model.
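The metadata-weighted merge of step c) can be sketched as follows (illustrative Python; in practice the α values would be derived from the received metadata, for example from an accuracy or balance score as discussed above).

    import numpy as np

    def merge_parameters(local_updates, alphas):
        """Weighted combination of per-site parameter updates:
        (a1*W1 + ... + aN*WN) / (a1 + ... + aN).

        local_updates: list of N weight arrays W1..WN received from the sites.
        alphas:        list of N quality measures in [0, 1], derived from the
                       metadata sent by each site."""
        return np.average(np.asarray(local_updates), axis=0, weights=alphas)

    # Example: the site whose training quality metadata gave alpha = 0.9 dominates.
    w_site1 = np.array([0.10, 0.20, 0.30])
    w_site2 = np.array([0.50, 0.60, 0.70])
    print(merge_parameters([w_site1, w_site2], alphas=[0.9, 0.3]))

Plain averaging, as in the basic scheme of fig. 1, is recovered when all α values are equal.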
In one embodiment, the medical data comprises computed tomography (CT) scan data. In such embodiments, the metadata may provide an indication of the performance of the respective local copy of the model when classifying CT images of different radiation doses; for example, the metadata may indicate the performance of the model when classifying high-dose CT scans and/or low-dose CT scans (or when comparing the two). In such an example, the model is expected to classify CT images acquired with a high radiation dose more accurately than CT images acquired with a low radiation dose. In this embodiment, in step c) of the method 300, such metadata may be used to prioritize updates received from a first clinical site (having a local model with higher performance on high-dose CT scans) over updates received from a second clinical site (having a local model with lower performance on high-dose CT scans), even if, for example, the first model performs relatively poorly on low-dose CT scans.
In another embodiment, the metadata may describe the number of training data samples at low or high dose, or with contrast enhancement. As noted above, if the model makes errors on low-dose CT images, those errors are given less weight than errors made on high-dose CT images (since the algorithm is expected to perform very well on high-dose CT images, and a few errors on low-dose CT images will be acceptable).
In another example, the metadata may include an indication of model performance when classifying training data of different integrity levels. For example, in embodiments where the model is trained to segment an anatomical feature in medical imaging data, the metadata may comprise an indication of model performance when segmenting full images of the anatomical feature and/or partial images of the anatomical feature. In this embodiment, in step c) of the method 300, such metadata may be used to prioritize updates received from a first clinical site (having a local model with higher performance when segmenting full images of the anatomical feature) over updates received from a second clinical site (having a local model with lower performance when segmenting full images of the anatomical feature), even if, for example, the first model performs relatively poorly when segmenting partial images of the anatomical feature.
In embodiments where the medical data comprises CT scan data and the model is used for segmentation of the liver in the CT scan data, the metadata may comprise, for example, the following information:
1. Error on low-dose CT and error on high-dose CT.
2. Error based on the segmented region.
When viewing a CT volume, the liver will not be visible in its entirety on every slice of the volume. If the model makes an error when the liver is only partially visible, this should be more acceptable than an error made when the whole liver is visible. In other words, during distributed learning of liver segmentation, if errors are made at one node only on image slices in which the liver is partially visible, while errors are made at a second node both when the liver is partially visible and when the liver is fully visible, then the algorithm should give more weight to the updates from the former node.
Thus, in the manner described above, the global model may be updated using metadata that provides a deeper understanding of the quality of the local updates determined by the multiple clinical sites in a distributed learning scheme. As described above, where the training data originates from different clinical sites, there is a possibility that bias and/or model drift will affect the model. The methods herein provide a means to reduce this effect and to counter model bias and data heterogeneity.
Turning now to other embodiments, in some embodiments herein the method 300 may be further refined by detecting whether the global model drifts during the training process, based on an analysis of the visual output. For example, if the region of interest activated/considered by the model keeps changing when determining a classification or label, the associated drift can be ascertained. A variance value may be calculated based on reference training data fed through the global model at different points in time during the training process (e.g. the change in variance between time point t0 and time point t1). The change may be calculated in terms of coordinate values, or of the area under a bounding box of the region of interest activated/considered by the model.
For example, model drift may be determined according to the following expression:
model drift: |(coordinates at t0) - (coordinates at t1)| > dynamic threshold
In other words, prior to steps a), b) and c) (e.g. at time t0), the method 300 may comprise: for a test medical image, determining a first region of the test image used by the global model to perform the task on the test medical image. The method may then further comprise, after steps a), b) and c): determining a second region of the test image used by the updated global model to perform the task on the test medical image; and comparing the first region of the test image to the second region of the test image to determine a measure of model drift.
The comparing step may include, for example, comparing coordinates associated with the first and second regions (e.g., coordinates at the center or edge of a region or bounding box), or comparing regions within the first and second regions and determining whether the regions have changed, for example, by a statistically significant amount or by more than a threshold amount.
In this context, the dynamic threshold may be determined based on the current content and the different models and types of models in question. Thus, it is not static for all application/model types.
This is illustrated in fig. 4, which shows an image of a liver 402 comprising a lesion 404. The model is used to classify (e.g. localize) the lesion, and is trained according to the method 300 described above. Prior to steps a), b) and c), at time t0, the model classifies the lesion based on region 406a of the image. At time t1, the model classifies the same lesion based on region 406b of the image. A difference in the position and size of region 406a and region 406b may indicate that the model has drifted. Thus, by comparing and monitoring changes in the region between different training sessions/updates, drift of the model can be determined.
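A sketch of such a drift check is given below (illustrative only; how the region of interest, e.g. region 406a or 406b, is obtained from the model is left abstract, and the bounding-box representation is an assumption).

    def bounding_box_drift(region_t0, region_t1):
        """Compare the region used by the model before (t0) and after (t1) an
        update. Each region is a bounding box (x_min, y_min, x_max, y_max),
        e.g. regions 406a and 406b in fig. 4. Returns the shift of the box
        centre and the change in box area."""
        def centre_and_area(box):
            x0, y0, x1, y1 = box
            return ((x0 + x1) / 2, (y0 + y1) / 2), (x1 - x0) * (y1 - y0)

        c0, a0 = centre_and_area(region_t0)
        c1, a1 = centre_and_area(region_t1)
        centre_shift = ((c0[0] - c1[0]) ** 2 + (c0[1] - c1[1]) ** 2) ** 0.5
        return centre_shift, abs(a1 - a0)

    def has_drifted(region_t0, region_t1, dynamic_threshold):
        """Model drift if |coordinates(t0) - coordinates(t1)| exceeds the
        (application-dependent) dynamic threshold."""
        centre_shift, _ = bounding_box_drift(region_t0, region_t1)
        return centre_shift > dynamic_threshold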
In other embodiments, steps a), b) and c) may be repeated, for example, to provide a sequence of training periods. For example, steps a), b) and c) may be repeated periodically, or whenever new training data becomes available at the clinical site.
Turning to other embodiments, in some embodiments, the method may be enhanced by using active learning. One skilled in the art will be familiar with active learning, but in short, active learning focuses on training data that has been previously misclassified by the model or classified with a low probability of accuracy. Thus, effectively, the training is concentrated on weak areas in the model.
In some embodiments, the method may thus comprise repeating steps a), b) and c) for a subset of the training data at each respective clinical site, namely the training data that is classified by the model with a certainty below a threshold level of certainty. For example, the confidence level output by the model may be used to measure certainty. In other embodiments, a measure of entropy may be used to calculate how certain the model is when classifying the data. The measure of entropy reflects the amount of information in the data set: the higher the entropy, the more information the data set contains. Thus, for example, if a data set has high entropy, its contents are diverse.
In general, a fuzzy region may be defined that includes training data whose classification is uncertain. The training data in such fuzzy regions may be used for subsequent periods of model training. Note that the fuzzy region may be dynamic and change from epoch to epoch as the (global) model improves.
Furthermore, in these embodiments, where optimized (e.g. active) distributed learning is performed and the model is trained each time on the misclassified training samples, the metadata, and thus the quality metric value (α) described above, may change for each training period.
In this way, the most relevant training data is considered in each training period, with each period adding more value to the model. In this way, "optimized" distributed learning may be performed, which considers only the misclassified examples for updating the weights in subsequent epochs. The proposed concept captures changes in the data set while ensuring that the data does not leave the hospital site. Furthermore, the newly designed concept ensures that the trained model gives high performance with less data.
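A minimal sketch of this selection step at a clinical site is shown below (illustrative; the scikit-learn-style predict_proba interface and the threshold values are assumptions). It keeps for the next training period only the samples the current global model is uncertain about, measured by confidence or, optionally, by the entropy of the predicted class probabilities.

    import numpy as np

    def prediction_entropy(probs):
        """Entropy of a predicted class-probability vector (higher = more uncertain)."""
        probs = np.clip(probs, 1e-12, 1.0)
        return float(-np.sum(probs * np.log(probs)))

    def select_uncertain_samples(model, samples, confidence_threshold=0.95,
                                 entropy_threshold=None):
        """Return the local samples falling in the 'fuzzy region' for the next
        training period: those the current model classifies with low confidence
        (or, optionally, with high entropy)."""
        selected = []
        for x in samples:
            probs = model.predict_proba([x])[0]   # assumed sklearn-style interface
            uncertain = probs.max() < confidence_threshold
            if entropy_threshold is not None:
                uncertain = uncertain or prediction_entropy(probs) > entropy_threshold
            if uncertain:
                selected.append(x)
        return selected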
In general, one problem with distributed learning processes is that, as the model predictions improve, the parameter updates become smaller (the weight updates become ineffective when simple averaging or weighted averaging is applied). Sometimes this makes a distributed learning model perform worse than a centralized model. The use of active learning as described above helps to overcome this problem by considering for retraining only images that were misclassified (the nature of active learning) or that were not properly segmented or classified. This has various advantages: the amount of training data in each round is reduced, so training is faster; and since the algorithm is trained only on misclassified data, the loss function is more focused and the gradients used to update the model may be better.
In general, it is utopian to assume that the data from all clinical sites are very similar. Therefore, considering the quality of the misclassified data (according to active learning principles) as metadata, and using this information when merging the weights, will help to build a better global model.
The combined ideas of distributed learning and active learning follow the philosophy of "think globally, act locally". The global model can be taught using training data from different hospitals located around the globe through distributed learning, while model performance is improved at the various nodes through active learning. Distributed learning captures data variation across a population of people who may be located anywhere in the world, while active learning uses less data to improve performance at the local nodes.
Turning now to a view of a clinical site performing local training as described above, fig. 5 illustrates an apparatus 500 according to some embodiments herein, the apparatus 500 for training a model in a clinical site to perform tasks on medical data using a distributed machine learning process. In general, the apparatus may form part of a computer apparatus or system, such as a laptop computer, desktop computer, or other computing device, for example. In some embodiments, the apparatus 500 may form part of a distributed computing arrangement or cloud.
The apparatus includes a memory 504 and a processor 502 (e.g., processing circuitry or logic), the memory 504 including instruction data representing a set of instructions, the processor 502 configured to communicate with the memory and execute the set of instructions. In general, the set of instructions, when executed by the processor, may cause the processor to perform any of the embodiments of the method 600 as described below.
Embodiments of the apparatus 500 may be used to train a model in a clinical site to perform tasks on medical data using a distributed machine learning process, whereby a global model at a central server is updated based on the training performed on local copies of the model at the clinical site. More specifically, the set of instructions, when executed by the processor, cause the processor to: receiving information from a central server enabling a local copy of the model to be created and trained from training data at the clinical site; training a local copy of the model according to the information; and sending to the central server i) an update to the model, the update being based on training of the local copy of the model according to the training data at the clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site.
Processor 502 may include one or more processors, processing units, multi-core processors, or modules configured or programmed to control apparatus 500 in the manner described herein. In particular implementations, processor 502 may include a plurality of software and/or hardware modules, each configured to perform, or for performing, a single or multiple steps of the methods described herein. In some implementations, for example, the processor 502 may include multiple (e.g., interoperating) processors, processing units, multi-core processors, and/or modules configured for distributed processing. Those skilled in the art will appreciate that such processors, processing units, multi-core processors, and/or modules may be located at different locations and may perform different steps and/or different portions of a single step of the methods described herein.
The memory 504 is configured to store program code that can be executed by the processor 502 to perform the methods described herein. Alternatively or additionally, the one or more memories 504 may be located external to the apparatus 500 (e.g., separate from the apparatus 500 or remote from the apparatus 500). For example, the one or more memories 504 may be part of another device. The memory 504 may be used to store the global model, the received local updates, the received metadata, and/or any other information or data received, calculated, or determined by the processor 502 of the apparatus 500 or from any interface, memory, or device external to the apparatus 500. The processor 502 may be configured to control the memory 504 to store a local copy of the model, training data, trained output, and/or any other information or data generated by the method 600 described below or used at the method 600.
In some embodiments, memory 504 may include multiple sub-memories, each capable of storing one piece of instruction data. For example, at least one sub-memory may store instruction data representing at least one instruction of the set of instructions, while at least one other sub-memory may store instruction data representing at least one other instruction of the set of instructions.
It should be appreciated that fig. 5 shows only the components necessary to illustrate this aspect of the disclosure, and in a practical implementation, the apparatus 500 may include additional components than those shown. For example, the apparatus 500 may also include a display. The display may, for example, comprise a computer screen and/or a screen on a mobile phone or tablet. The apparatus may also include a user input device (such as a keyboard, mouse, or other input device that enables a user to interact with the apparatus), for example, to provide initial input parameters for use in the methods described herein. The apparatus 500 may comprise a battery or other power source for powering the apparatus 500 or means for connecting the apparatus 500 to a mains power supply.
Turning to FIG. 6, there is a computer-implemented method 600 for training a model at a clinical site to perform tasks on medical data using a distributed machine learning process, whereby a global model at a central server is updated based on training performed on a local copy of the model at the clinical site. Embodiments of method 600 may be performed, for example, by an apparatus such as apparatus 500 described above.
Briefly, in a first step 602, the method 600 includes: information is received from a central server, enabling a local copy of the model to be created and trained from training data at the clinical site. In a second step 604, the method comprises: a local copy of the model is trained based on this information. In a third step 606, the method comprises: sending to the central server i) an update to the model, the update being based on training of a local copy of the model according to training data at the clinical site, and ii) metadata relating to a quality of training performed at the respective clinical site.
The method and apparatus corresponding to the central server is described above with reference to fig. 2 and 3, and the details therein will be understood to apply equally to the method in the clinical site.
In this context, the clinical site 500 may include a server (e.g., a "clinical server") or data center associated with a hospital, operating room, clinic, or any other medical facility. The clinical site may comprise, for example, a data center, such as a Hospital Data Center (HDC) or any other computing site suitable for storing medical data.
The information received in step 602 is described above with respect to figs. 2 and 3, and the details therein will be understood to apply equally to the apparatus 500 and method 600. Using this information, the clinical site creates a local copy of the model and trains the local copy of the model on the training data at the clinical site (e.g. in accordance with the information received from the central server).
Those skilled in the art will be familiar with methods of training machine learning models, for example, using methods including, but not limited to, gradient descent and back propagation.
The clinical site obtains metadata relating to the quality of the training performed on the local model at that clinical site and, in step 606, sends to the central server i) an update to the model based on the training (e.g. the training results) of the local copy of the model on the training data at the clinical site, and ii) the metadata. The metadata is described in detail above with respect to the apparatus 200 and method 300, and the details therein will be understood to apply equally to the apparatus 500 and method 600.
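Pulling steps 602 to 606 together, the site-side behaviour might be sketched as follows (illustrative Python; the server interface, the helper callables, and the metadata fields are assumptions, not defined by the embodiments).

    def clinical_site_round(server, local_training_data, build_local_copy, train_fn):
        """One round of method 600 at a clinical site. The helper callables are
        injected so that this sketch stays agnostic about the model library used."""
        # Step 602: receive information enabling a local copy of the model.
        model_info = server.get_model_info()          # assumed server interface
        local_model = build_local_copy(model_info)

        # Step 604: train the local copy; the training data never leaves the site.
        weights, training_report = train_fn(local_model, local_training_data)

        # Step 606: send back the local update together with quality metadata.
        metadata = {
            "validation_accuracy": training_report.get("val_accuracy"),
            "samples_per_class": training_report.get("class_counts"),
        }
        server.send_update(weights, metadata)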
Turning now to another embodiment, fig. 7 illustrates a method of training a model using a distributed learning process, according to some embodiments herein. In this embodiment, there is a researcher (or other user) with a computer or server 700, a central server 702, and a plurality of clinical sites (or nodes) 704. For clarity, only one clinical site 704 is shown in fig. 7. In this embodiment, the model includes a neural network. The method proceeds as follows.
The researcher develops the model and places it on the server along with pre-initialized weights 708. The following procedure is then performed:
710. The researcher sends the model and the initialized weights to the server 702. This starts the server. The server then waits for nodes 704 to connect.
712. Once a node is connected to the server, the deep learning model is passed to the node; the connection between the server and the node is encrypted. The node 704 receives 714 the model.
716. The node creates a local copy of the model and performs training on the local copy of the model. The training is done using an active learning approach, whereby the initialized model is used to perform prediction (or classification) on the training data at that node. If a prediction has a confidence less than a certain threshold confidence value (assigned by the researcher, e.g. a Dice score less than 0.95), the corresponding data is used for further training of the model. The model is trained for a number of epochs, as specified in the model file. In general, the training includes receiving the weight values 718, fitting over different epochs 720, and obtaining the final weights and metadata 722.
724. The weights are returned to the central server 702 along with metadata relating to the quality of the training performed.
726. With the help of the metadata returned from the nodes, the returned weights are combined using an average, a weighted average, or another statistical method deemed appropriate by the researcher, and are then used to update the global model (e.g., the version of the model stored on the central server 702). Information describing the updated global model is then sent 726 back to the node 704 for retraining with the new merged weights.
This process is performed in an iterative manner until the model converges. The transfers between the central server 702 and the node 704 may be recorded in a database (they may also be recorded on a blockchain so that records cannot be deleted). This step may be used to maintain privacy.
Once the model has converged, the final weights may be sent 730 to the researcher.
Note that if the central server has local training data stored thereon, the central server may also perform training 732 on its own local copy of the model (e.g., in some embodiments, the central server may include a server at the clinical site that trains its own local copy of the model while also coordinating the distributed training process among multiple clinical sites).
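The following Python sketch (not part of the original disclosure) illustrates the loop of fig. 7 under simplifying assumptions: the model is reduced to a single weight matrix, predict_confidence is a hypothetical stand-in for the node's confidence (or Dice) estimate, and the server merges the returned weights with a metadata-driven weighted average as in step 726. The particular choice of weighting each node by the number of samples it selected is illustrative only.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.95  # assigned by the researcher

def predict_confidence(weights, X):
    # Hypothetical confidence estimate: softmax over a linear model's
    # outputs; a real node could instead use a Dice score against
    # reference segmentations.
    logits = X @ weights
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def node_round(global_weights, X, y, lr=0.01, epochs=3):
    # Steps 714-722: active-learning selection followed by local training
    # on the low-confidence samples only.
    conf = predict_confidence(global_weights, X)
    selected = conf < CONFIDENCE_THRESHOLD
    n_selected = max(int(selected.sum()), 1)
    W = global_weights.copy()
    for _ in range(epochs):
        grad = X[selected].T @ (X[selected] @ W - y[selected]) / n_selected
        W -= lr * grad
    metadata = {"n_selected": int(selected.sum()),
                "mean_confidence": float(conf.mean())}
    return W, metadata   # step 724: returned to the central server

def server_merge(weight_list, metadata_list):
    # Step 726: metadata-weighted average of the returned weights.
    alphas = np.array([m["n_selected"] for m in metadata_list], dtype=float)
    alphas = np.maximum(alphas, 1e-9)
    stacked = np.stack(weight_list)
    shape = (-1,) + (1,) * (stacked.ndim - 1)
    return (alphas.reshape(shape) * stacked).sum(axis=0) / alphas.sum()
```

The iterative process of fig. 7 then alternates between node_round at each clinical site and server_merge at the central server until the global weights converge.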
Turning now to another embodiment, in some embodiments a model trained according to any method or apparatus herein (e.g., method 300, method 600 or method 700, or apparatus 200 or apparatus 500) is used to perform a task on medical data. This use may be made in addition to, or separately from, the methods herein. Examples of uses include, but are not limited to: segmenting an image (such as a CT scan of the liver) using a model trained according to any of the methods herein; and classifying medical records (e.g., producing a diagnosis or some other classification) using a model trained according to any of the methods herein.
Turning now to fig. 8, an output segmentation of a liver 802 produced by a model trained using a conventional distributed learning process is illustrated and compared to a segmentation of a liver 804 output by a model trained using the methods 300 and 600 described above.
In another embodiment, a computer program product is provided comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform one or more of the methods herein.
Thus, it should be understood that the present disclosure also applies to computer programs, particularly computer programs on or in a carrier, adapted for putting the embodiments into practice. The program may be in the form of source code, object code, a code intermediate source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the method according to the embodiments described herein.
It should also be understood that such programs may have many different architectural designs. For example, program code implementing the functionality of the method or system may be subdivided into one or more subroutines. Many different ways of distributing functionality between these subroutines will be apparent to those skilled in the art. The subroutines may be stored together in one executable file to form a self-contained program. Such an executable file may include computer-executable instructions, such as processor instructions and/or interpreter instructions (e.g., java interpreter instructions). Alternatively, one or more or all of the subroutines may be stored in at least one external library file and linked with a main program, either statically or dynamically, for example at run-time. The main program contains at least one call to at least one subroutine. The subroutines may also include function calls to each other.
The carrier of the computer program may be any entity or device capable of carrying the program. For example, the carrier may comprise a data store such as a ROM (e.g. a CD-ROM or a semiconductor ROM), or a magnetic recording medium (e.g. a hard disk). Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the carrier being adapted for performing, or for use in the performance of, the relevant method.
Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the principles and techniques described herein, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored or distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with other hardware as part of the other hardware, but may also be distributed in other forms, such as via the internet or other wired or wireless telecommunication systems. Any reference signs in the claims shall not be construed as limiting the scope.
Appendix 1
Experimental data
Experiment 1: sample size: number of samples at each node (class imbalance)
Model type: a neural network trained on the Modified National Institute of Standards and Technology database (MNIST). MNIST contains handwritten images of the digits 0 to 9. The model is trained to classify each image according to the digit it contains.
In the experiment, training data was available at 2 nodes. The training data set contains 10 different classes. At the first node, the first 9 classes are similarly prevalent and only a small number of samples belong to the 10th class. At the second node, the first 9 classes are very sparse and the 10th class is prevalent.
In the above example, two methods are used to merge the models.
1) Simple average merging: the models are merged without using the metadata information. The merged model achieved an accuracy of 20%.
2) Weighted average merging: the metadata information (the class populations at each node) is used to merge the models. The accuracy of the merged model on the training and test data reached 90% and 88.9%, respectively.
In summary, simple average merging achieves an accuracy of 20%, whereas metadata-weighted merging achieves 90% accuracy on the training data and 88.9% on the test data.
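A minimal sketch of the two merging strategies compared in Experiment 1, assuming two nodes whose metadata reports per-class sample counts; the use of a normalized-entropy balance score to derive the merging weights is an illustrative assumption, as the exact weighting scheme is not spelled out above.

```python
import numpy as np

def merge_simple(weight_list):
    # 1) Simple average merging: no metadata, every node counts equally.
    return np.mean(np.stack(weight_list), axis=0)

def class_balance_score(class_counts):
    # Illustrative heuristic: score a node by the normalized entropy of its
    # class counts, so nodes with better class coverage get a larger weight.
    p = class_counts / class_counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(class_counts)))

def merge_weighted(weight_list, class_counts_list):
    # 2) Weighted average merging: alphas derived from the class-population
    #    metadata reported by each node.
    alphas = np.array([class_balance_score(c) for c in class_counts_list])
    stacked = np.stack(weight_list)
    shape = (-1,) + (1,) * (stacked.ndim - 1)
    return (alphas.reshape(shape) * stacked).sum(axis=0) / alphas.sum()

# Class distributions matching the imbalance described above:
counts_node1 = np.array([1000] * 9 + [10])   # classes 0-8 prevalent, class 9 rare
counts_node2 = np.array([10] * 9 + [1000])   # classes 0-8 rare, class 9 prevalent
```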
Experiment 2: acquisition/image scanner setup
Model: neural network
Assume that there is data available at 2 nodes. The data sets at node 1 and node 2 are acquired at two different locations using different CT machines.
In the above example, data is collected from two different locations. The first data set has an average intensity of 60 HU, and the second data set has an average intensity of 100 HU. To perform federated learning across these sites, advanced pre-processing would be needed, informed by the metadata. For example, if the average intensity at a site differs slightly from the expected value, the data may still be used, but with a lower weight for that site (e.g., scaled according to the degree of difference from the expected value); that is, the metadata indicates that weight updates from that site should be given a lower priority because the distribution is not as expected. If the data is on an entirely different scale, the weight can be set to zero so that the global model is not corrupted. In general, the model is more likely to fail if the data from the two sites differs in nature. Thus, the weights assigned to the sites can be varied based on their statistical heterogeneity in order to obtain the best model from the given sites. This improves the accuracy of the resulting global model.
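A sketch of how such metadata-driven down-weighting could look, assuming each site reports its mean HU intensity and the researcher specifies an expected value and a tolerance; the specific numbers and the linear fall-off are illustrative assumptions.

```python
def site_alpha(mean_hu, expected_hu=60.0, tolerance=80.0):
    # Weight in [0, 1]: 1 when the site's mean intensity matches the
    # expected value, decreasing linearly with the deviation, and 0 once
    # the deviation exceeds the tolerance (data on an entirely different
    # scale should not corrupt the global model).
    deviation = abs(mean_hu - expected_hu)
    return max(0.0, 1.0 - deviation / tolerance)

# Site 1 (60 HU) matches the expectation; site 2 (100 HU) is down-weighted:
alphas = [site_alpha(60.0), site_alpha(100.0)]   # -> [1.0, 0.5]
```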
Experiment 3: Image quality (CT scan)
Model: neural network
Suppose that there is data available at 2 nodes. The data sets at node 1 and node 2 are acquired at two different locations using different CT machines, and the quality of the data collected at the two centres differs considerably. Based on the intensity histograms, weighted merging may be performed, whereby higher weights are assigned to sites whose data is more similar to the overall sample data set.
This allows low-quality data to be penalized by assigning it lower weights, so that it does not introduce errors into the model. This improves the accuracy of the resulting global model.
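A minimal sketch of such histogram-based weighting, assuming each site reports an intensity histogram over a common set of bins; the use of histogram intersection as the similarity measure is an illustrative choice, not specified above.

```python
import numpy as np

def histogram_similarity(site_hist, reference_hist):
    # Histogram intersection between normalized histograms: 1 for identical
    # intensity distributions, smaller values for sites whose data differs
    # from the overall sample data set.
    p = site_hist / site_hist.sum()
    q = reference_hist / reference_hist.sum()
    return float(np.minimum(p, q).sum())

def site_alphas(site_hists, reference_hist):
    # Higher similarity to the reference -> higher merging weight.
    return np.array([histogram_similarity(h, reference_hist)
                     for h in site_hists])
```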

Claims (13)

1. A computer-implemented method of training a model for performing a task on medical data using a distributed machine learning process, whereby a global model is updated based on training performed on local copies of the model at a plurality of clinical sites, wherein the model is for use in predicting a classification for the medical data, or wherein the medical data comprises a medical image and the model is for use in segmenting the medical image, the method comprising:
a) Sending (302) information to the plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train the respective local copy of the model according to training data at the respective clinical site;
b) Receiving (304), from each clinical site of the plurality of clinical sites: i) Local updates to parameters in the model, the local updates obtained by training the local copy of the model in accordance with the training data at the respective clinical site, and ii) metadata relating to the quality of the training performed at the respective clinical site; and
c) Updating (306) the parameters in the global model based on the received local updates to the parameters and the received metadata by: combining the local updates to the parameters by weighting each local update according to the respective metadata to determine updates to the global model such that local updates associated with metadata indicating high quality training results have a higher weighting than updates associated with metadata indicating low quality training results.
2. The method of claim 1, wherein combining the local updates to the parameters to determine the update to the global model comprises:
determining parameters for the global model according to:
global parameter = (α_1·W_1 + α_2·W_2 + α_3·W_3 + … + α_N·W_N) / (α_1 + α_2 + α_3 + … + α_N);
wherein W_n comprises the local update to the parameters in the model determined by the nth clinical site, and α_n is a real number in the range 0 ≤ α_n ≤ 1; and
wherein said α_n is determined from the metadata associated with the update to the parameters in the model determined by the nth clinical site.
3. A method according to any one of the preceding claims, wherein the metadata provides an indication of the performance, after the training, of the respective local copy of the model on one or more subsets of the training data at the respective clinical site, the subsets having common characteristics expected to affect model error.
4. The method of claim 3, wherein the medical data comprises a Computed Tomography (CT) scan; and
wherein the metadata comprises an indication of the performance of the local copy of the model when classifying CT scans of different radiation doses.
5. The method of claim 3, wherein the medical data comprises a medical image and the model is for use in segmenting the medical image to obtain anatomical features in the medical imaging data; and wherein the metadata comprises an indication of the performance of the model when segmenting a full image of the anatomical feature and/or a partial image of the anatomical feature.
6. The method of any preceding claim, wherein the metadata provides an indication of the quality of the training data at the respective clinical site.
7. The method of claim 6, wherein the metadata provides an indication of a distribution of the training data at the clinical site between different output classifications of the model.
8. The method of any preceding claim, wherein the medical data comprises a medical image, the method further comprising:
prior to steps a), b) and c):
for a test medical image, determining a first region of the test image used by the global model to perform the task on the test medical image; and after steps a), b) and c):
determining, for the test medical image, a second region of the test image used by the updated global model to perform the task on the test medical image; and
comparing the first region of the test image to the second region of the test image to determine a measure of model drift.
9. The method of any preceding claim, further comprising:
repeating steps a), b) and c) for a subset of the training data at each respective clinical site, the subset of training data being classified by the model with a level of certainty below a threshold level of certainty.
10. A computer-implemented method at a clinical site for training a model for performing tasks on medical data using a distributed machine learning process, whereby a global model at a central server is updated based on training performed on local copies of the model at the clinical site, wherein the model is for use in predicting a classification for the medical data, or wherein the medical data comprises a medical image and the model is for use in segmenting the medical image, the method comprising:
receiving information from a central server such that a local copy of the model can be created and trained from training data at the clinical site;
training a local copy of the model according to the information; and
sending, to the central server, i) an update to the model, the update being based on training of the local copy of the model in accordance with the training data at the clinical site, and ii) metadata relating to a quality of the training performed at the respective clinical site.
11. The method of any preceding claim, wherein the model comprises a neural network model and the parameters comprise weights or biases in the neural network model.
12. An apparatus for training a model for performing a task on medical data using a distributed machine learning process, whereby a global model is updated based on training performed on local copies of the model at a plurality of clinical sites, wherein the model is for use in predicting a classification for the medical data, or wherein the medical data comprises a medical image and the model is for use in segmenting the medical image, the apparatus comprising:
a memory including instruction data representing a set of instructions; and
a processor configured to communicate with the memory and configured to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to:
a) Sending information to the plurality of clinical sites to enable each clinical site of the plurality of clinical sites to create a local copy of the model and train the respective local copy of the model according to training data at the respective clinical site;
b) Receiving from each clinical site of the plurality of clinical sites i) local updates to parameters in the model obtained by training the local copy of the model in accordance with the training data at the respective clinical site, and ii) metadata relating to a quality of the training performed at the respective clinical site; and
c) Updating the parameters in the global model based on the received local updates to the parameters and the received metadata by: combining the local updates to the parameters by weighting each local update according to the respective metadata to determine updates to the global model such that local updates associated with metadata indicating high quality training results have a higher weighting than updates associated with metadata indicating low quality training results.
13. A computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method of any of claims 1 to 11.
CN202180049170.7A 2020-07-10 2021-07-08 Training models for performing tasks on medical data Pending CN115803751A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP20185311.6 2020-07-10
EP20185311.6A EP3937084A1 (en) 2020-07-10 2020-07-10 Training a model to perform a task on medical data
PCT/EP2021/068922 WO2022008630A1 (en) 2020-07-10 2021-07-08 Training a model to perform a task on medical data

Publications (1)

Publication Number Publication Date
CN115803751A true CN115803751A (en) 2023-03-14

Family

ID=71575212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180049170.7A Pending CN115803751A (en) 2020-07-10 2021-07-08 Training models for performing tasks on medical data

Country Status (5)

Country Link
US (1) US20230252305A1 (en)
EP (2) EP3937084A1 (en)
JP (1) JP2023533188A (en)
CN (1) CN115803751A (en)
WO (1) WO2022008630A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222945B (en) * 2022-09-15 2022-12-06 深圳市软盟技术服务有限公司 Deep semantic segmentation network training method based on multi-scale self-adaptive course learning
WO2024071845A1 (en) * 2022-09-28 2024-04-04 주식회사 메디컬에이아이 Method, program, and device for constructing medical artificial intelligence model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3528179A1 (en) * 2018-02-15 2019-08-21 Koninklijke Philips N.V. Training a neural network

Also Published As

Publication number Publication date
EP3937084A1 (en) 2022-01-12
EP4179467A1 (en) 2023-05-17
JP2023533188A (en) 2023-08-02
WO2022008630A1 (en) 2022-01-13
US20230252305A1 (en) 2023-08-10

Similar Documents

Publication Publication Date Title
Zuo et al. R2AU‐Net: attention recurrent residual convolutional neural network for multimodal medical image segmentation
Ricciardi et al. Assessing cardiovascular risks from a mid-thigh CT image: a tree-based machine learning approach using radiodensitometric distributions
US20190197368A1 (en) Adapting a Generative Adversarial Network to New Data Sources for Image Classification
US20190198156A1 (en) Medical Image Classification Based on a Generative Adversarial Network Trained Discriminator
Heidari et al. A new lung cancer detection method based on the chest CT images using Federated Learning and blockchain systems
Czolbe et al. Is segmentation uncertainty useful?
Viji et al. RETRACTED ARTICLE: An improved approach for automatic spine canal segmentation using probabilistic boosting tree (PBT) with fuzzy support vector machine
US20230351204A1 (en) Selecting a training dataset with which to train a model
US11790492B1 (en) Method of and system for customized image denoising with model interpretations
CN115803751A (en) Training models for performing tasks on medical data
Wankhade et al. A novel hybrid deep learning method for early detection of lung cancer using neural networks
US20210145389A1 (en) Standardizing breast density assessments
Sethanan et al. Double AMIS-ensemble deep learning for skin cancer classification
Luo et al. Rethinking annotation granularity for overcoming shortcuts in deep learning–based radiograph diagnosis: A multicenter study
CN113240699B (en) Image processing method and device, model training method and device, and electronic equipment
CN112488178B (en) Network model training method and device, image processing method and device, and equipment
Nematzadeh et al. Ensemble-based genetic algorithm explainer with automized image segmentation: A case study on melanoma detection dataset
Suganyadevi et al. Deep recurrent learning based qualified sequence segment analytical model (QS2AM) for infectious disease detection using CT images
Thilagavathy et al. Digital transformation in healthcare using eagle perching optimizer with deep learning model
Mahima et al. Deep learning-based lung cancer detection
US20220391760A1 (en) Combining model outputs into a combined model output
Chen et al. Deep learning-based tooth segmentation methods in medical imaging: A review
Srinivasan et al. To pretrain or not? A systematic analysis of the benefits of pretraining in diabetic retinopathy
de Souza-Filho et al. Deep learning and artificial intelligence in nuclear cardiology
Somasundaram et al. Automatic detection of inadequate pediatric lateral neck radiographs of the airway and soft tissues using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination