WO2020185973A1 - System and method with federated learning model for medical research applications - Google Patents


Info

Publication number
WO2020185973A1
Authority
WO
WIPO (PCT)
Prior art keywords
end user
tensor
federated
model
data
Application number
PCT/US2020/022200
Other languages
French (fr)
Inventor
Walter Adolph DE BROUWER
Srivatsa Akshay Sharma
Neerajshyam Rangan KASHYAP
Kartik THAKORE
Philip Joseph DOW
Original Assignee
doc.ai incorporated
Application filed by doc.ai incorporated
Publication of WO2020185973A1


Classifications

    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06F18/2413 Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06N20/20 Ensemble learning
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/945 User interactive design; Environments; Toolboxes
    • G06V10/95 Hardware or software architectures for image or video understanding structured as a network, e.g. client-server architectures
    • G06V10/96 Management of image or video recognition tasks
    • G16H30/40 ICT specially adapted for processing medical images, e.g. editing
    • G16H50/20 ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G06N3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/063 Physical realisation of neural networks using electronic means
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N7/023 Learning or tuning the parameters of a fuzzy system
    • G16H70/60 ICT specially adapted for handling or processing of medical references relating to pathologies

Definitions

  • the disclosed system and method are in the field of machine learning.
  • federated machine learning utilizing computation capability of edge devices and a federated learning (“FL”) aggregator, which is typically cloud-based, relative to the edge devices.
  • edge devices typically are mobile devices, but also can include nodes that aggregate data from multiple users.
  • The year 2019 leads us to the next deep dive in intelligent and/or neuromorphic computing: federated learning technology.
  • Deep learning applies multi-layered networks to data. While training can be automated, there remains the problem of assembling training data in the right formats and sending data to a central node of computation with sufficient storage and compute power. In many fields, sending personally identifiable, private data to any central authority causes worries about data privacy, including data security, data ownership, privacy protection and proper authorization and use of data.
  • the technology disclosed includes systems and methods for federated learning.
  • a crowd of end users runs application programs on mobile devices that collect data, train, compute, and evaluate data stored on the mobile devices.
  • the original data, which is used to compute an updated model, does not leave the device where it is stored.
  • Devices later federate data globally by sending "derived insights" in the form of updated model parameters, sometimes called tensors, to an FL aggregator where all these derived insights are combined.
  • Devices then receive from the FL aggregator an updated matrix or model which can improve local prediction of these devices. This is repeated in cycles.
  • With federated learning, a device on the edge can send de-identified updates to a model, which are then used to update the model, instead of sending raw data such as images or audio.
  • federated learning greatly reduces privacy concerns, since the raw data never leaves these devices.
  • Federated learning reduces data ownership concerns, as end users are enabled to opt in or out of sharing raw data and parameter updates created on their devices. Federated learning further greatly reduces security concerns, because there is no single point at which a security breach can compromise a large body of data - hackers cannot hack millions of mobile devices that store the raw data.
  • the machine learning process can be described as five steps.
  • a cost function is chosen, e.g., a measure of how well the network solves the problem, which the system should strive to minimize.
  • the network is run to see how it does, as measured by the cost function.
  • the values of the network parameters are adjusted, and the network is run again.
  • the difference between successive results gives the direction or slope in which the result of applying the network moved between the trials. This slope is called a gradient.
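  • As an illustration of these steps, the following minimal sketch (a hypothetical one-parameter linear model with a squared-error cost, using numpy; not the disclosed model) runs the network, measures the cost, and nudges the parameter along the negative gradient:

        import numpy as np

        # Hypothetical toy data and a one-parameter linear "network": y_hat = w * x
        x = np.array([1.0, 2.0, 3.0, 4.0])
        y = np.array([2.1, 3.9, 6.2, 7.8])        # ground truth

        def cost(w):
            # Squared-error cost function the system strives to minimize
            return np.mean((w * x - y) ** 2)

        w, lr = 0.0, 0.01                          # initial parameter and learning rate
        for step in range(200):
            # Run the network, measure how it does, and adjust the parameter
            grad = np.mean(2 * (w * x - y) * x)    # slope (gradient) of the cost w.r.t. w
            w -= lr * grad                         # step downhill and run again
        print(round(w, 3), round(cost(w), 4))      # w converges near 1.99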
  • the technology disclosed includes a system for federated learning utilizing computation capability of edge devices in communication with an FL aggregator.
  • the system comprises multiple edge devices of end users, one or more federated learner update repositories, and one or more FL aggregators.
  • Each edge device comprises a federated learner model, configured to send tensors to at least one FL aggregator or federated learner update repository.
  • An FL aggregator includes a federated learner, which may be part of the FL aggregator or a separate module.
  • the FL aggregator and/or federated learner is configured to send tensors to the federated learner update repository.
  • The federated learner update repository comprises a back-end configuration, configured to send model updates to edge devices.
  • the technology disclosed includes a method of federated learning utilizing computation capability of edge devices.
  • the method comprises sending out tensors by multiple edge devices with federated learning models, receiving the tensors from the edge devices at an FL aggregator that includes a federated learning update repository, distributing updated models from the federated learning update repository to the edge devices, and the edge devices using the updated models.
  • the technology disclosed includes a federated learning system comprising multiple federated learners, wherein each federated learner is configured to be an end user side library, built for an edge device environment.
  • Such federated learners on edge devices update model parameters based on raw data and ground truth collected in the edge device.
  • the edge devices perform model post-processing and share updated parameters with a central federated learner update repository.
  • the edge devices can download updated models. They can evaluate the updated models against locally held data, preferably data withheld from training, and report evaluations to the repository or FL aggregator.
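  • A minimal sketch of this edge-device flow follows, assuming a hypothetical dict-of-arrays linear model; the function names and repository interactions are placeholders for illustration, not the disclosed API:

        import numpy as np

        def local_update(model, features, labels, lr=0.01):
            """Update model parameters from raw data and ground truth held on the device."""
            preds = features @ model["w"]
            grad = features.T @ (preds - labels) / len(labels)
            return {"w": model["w"] - lr * grad}

        def evaluate(model, features, labels):
            """Evaluate against locally held data withheld from training."""
            return float(np.mean((features @ model["w"] - labels) ** 2))

        # Hypothetical round on one edge device
        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
        X_holdout, y_holdout = rng.normal(size=(8, 4)), rng.normal(size=8)

        base_model = {"w": np.zeros(4)}                   # downloaded from the update repository
        updated = local_update(base_model, X, y)          # train on on-device data
        report = evaluate(updated, X_holdout, y_holdout)  # metric reported to the repository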
  • the technology disclosed includes a federated learner update repository, sometimes described as a component of an FL aggregator, comprising a federated learning back-end that collects model updates and evaluations from Flea end users.
  • the FL aggregator can be a high availability system. It organizes models that can be updated based on data from end user edge device updates and performs operations required to make these updates, such as admitting or rejecting proposed updates from end users. Such determination can be based on criteria and metadata sent by the end user.
  • the FL aggregator combines admissible end user updates into an overall update and redistributes the updated model to edge devices.
  • FIG. 1 is a flow chart illustrating an example core template of machine learning workflow.
  • FIG. 2 is a diagram illustrating an example federated learning model with multiple edge devices and a central FL aggregator.
  • FIG. 3A is a diagram illustrating an example use case of a federated learner system, comprising one-to-many tensors for distributed clinical trials.
  • FIG. 3B is a diagram illustrating an example use case of a federated learner system, comprising Fleas for distributed clinical trials.
  • FIG. 4 is a diagram illustrating an example FL aggregator.
  • FIG. 5 is a diagram illustrating an example use case of tensor globalization of a federated learner system.
  • FIG. 6A and FIG. 6B are diagrams illustrating an example use case of a federated learner system in a linear training trial and in an adaptive and continuously learning distributed trial, comprising federated learners and FL aggregator for application of data trial.
  • FIG. 7 is a diagram illustrating an example use case of a federated learner system, comprising simulated control arms for trials.
  • FIG. 8 is a diagram illustrating centralized data collection and training, leading to deployment to edge devices.
  • FIG. 9 is a diagram illustrating edge device update training followed by centralized aggregation of the updated models.
  • FIG. 10 is a diagram illustrating more detail of data at edge devices during update training.
  • FIG. 11 is a graphic user interface illustrating use of a selfie to estimate age, height and weight, from which body mass index (BMI) can be calculated.
  • FIG. 12 is a simplified message diagram depicting exchanges between four edge devices and an FL aggregator, over three cycles of model updating.
  • FIGS. 13-14 are scatter plots from edge device training on small samples and a model centrally trained on a large sample.
  • FIG. 15 is a conceptual diagram illustrating updating a global model from local models, applying update averaging.
  • FIG. 16 is an example convolutional neural network.
  • FIG. 17 is a block diagram illustrating training of the convolutional neural network of FIG. 16.
  • FIG. 18 is a simplified block diagram of a computer system that can be used to implement the technology disclosed.
  • the technology disclosed includes demonstrated image processing applications for frontal face images and meal images, as well as an anticipated clinical platform.
  • Applicant applied federated learning to its mobile device app that estimates age, sex, height, and weight, then calculates BMI, all from a selfie, a frontal face photograph of the mobile device user.
  • Clinical can be taken in a broad sense to include collection of health related data, such as mood or general health, which might be assessed against a voice or photographic sample.
  • Clinical can also be taken in a pharmaceutical sense, providing a tool for contract research organizations to collect data occasionally or periodically during a traditional clinical trial. Collection of data that is partially or completely anonymized can be complemented with a so-called synthetic control arm, in lieu of giving part of the trial participants a placebo. Anonymized data can encourage frequent reporting. Receiving test doses, instead of being at risk of receiving a placebo, is further encouraging.
  • Mobile machine learning, in this disclosure, refers to inference on device, training on device, and federated learning, which can be applied to health care.
  • Theoretical and practical challenges need to be faced and overcome to demonstrate a practical federated learning application, especially in a sensitive area such as health care.
  • A typical machine learning workflow is illustrated by FIG. 8. Having identified a problem space and a learning task, one finds a large body of data 811, 853 to train a model at a central repository 857, in a centralized manner. After being satisfied with the model, one deploys it to edge devices or to a cloud-based compute resource 859 for prediction.
  • Typical model training involves centrally collecting the data and centrally training the model even when it is deployed in a distributed manner. This involves bringing the data 811 to a central repository 853 to gain control over how it's used in training 857.
  • FIG. 1 is a high level flow chart of machine learning workflow.
  • a core template of machine learning workflow comprises four steps.
  • Step 1 is data collection, to procure raw data.
  • Step 2 is data re-formatting, to prepare the data in the right format.
  • Step 3 is modeling, to choose and apply a learning algorithm.
  • Step 4 is predictive analytics, to make a prediction.
  • Variables that are likely to influence future events are predicted.
  • Parameters used to make the prediction are represented in multi-dimensional matrices, called tensors.
  • a multi-dimensional matrix, or tensor, has certain features that commend this data representation to machine learning.
  • Linear algebra operations are efficiently applied by GPUs and other parallel processors on computers. Linearization or differentiation makes it feasible to frame optimization problems as linear algebra problems. Big data is difficult to process at scale without tensors, so many software tools have come onto the market that simplify tensor computing, e.g., TensorLab, the Matlab package, Google TensorFlow, etc. Hardware is following software.
  • tensor processing accelerator chips, e.g., NVIDIA GPUs, Google TPUs, Apple A11, Amazon Inferentia, Graviton and Echo-chip, Facebook Glow, and a whole range of technology companies that make Application-Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs) and coarse-grained reconfigurable arrays (CGRAs) adapted to calculate tensors with tensor calculation software.
  • FIG. 2 is a diagram illustrating an example federated learning model with multiple edge devices and a central FL aggregator.
  • a federated learner can be implemented as an end user side library, built for an edge device environment, to perform local model update calculations using data collected in the edge device environment.
  • the Flea can perform post-processing after model updating, including applying perturbations (e.g., encryption and introduction of noise for privacy purposes), sharing the model update with a central update repository (i.e., an FL aggregator), optionally downloading updated models, evaluating updated models, and sharing evaluation metrics across platforms, e.g., Flea-iOS (for iPhones), Flea-Android (for Android phones), Flea-kubernetes (for node clients), etc.
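  • One way such post-processing perturbation might look is sketched below: the update is norm-clipped and Gaussian noise is added before upload. The clip norm and noise scale are illustrative assumptions, not values from the disclosure:

        import numpy as np

        def perturb_update(update, clip_norm=1.0, noise_std=0.1, seed=None):
            """Clip a model update and add Gaussian noise for privacy before sharing it."""
            rng = np.random.default_rng(seed)
            flat = np.concatenate([v.ravel() for v in update.values()])
            scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
            return {k: v * scale + rng.normal(0.0, noise_std, size=v.shape)
                    for k, v in update.items()}

        noisy = perturb_update({"w": np.array([3.0, 4.0])}, clip_norm=1.0, noise_std=0.05)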
  • In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices. The federated workflow aggregates the local updates into a new global model 959, which becomes the next base model 951 used for inference and for additional rounds 915 of training in the federated loop. Again, updating via the federated loop 915 does not require centrally collecting data. Instead, we send the model to the data for training, rather than bringing the data to the model. This is a decentralized workflow instead of a centralized workflow.
  • an individual may understand the research value of sharing information, but doesn't trust the organization that they're being asked to share with. The individual may wonder which third parties could gain access to their data.
  • On the B2B side there are intellectual property issues that thwart companies that want to collaborate, but are unable to share their raw data for IP reasons. The technology disclosed can enable collaboration without necessarily sharing data. Also on the B2B side, some companies have internal data policies that prevent even intra-company, cross-division sharing of data. These companies would benefit from collaboration without data sharing.
  • the technology disclosed applies federated learning to an environment where it's difficult to share underlying data due to data sensitivity concerns.
  • One of the priority applications addresses so-called vertical learning.
  • This application focuses on so-called horizontal federated learning, in which devices have a different sample space for the same feature space, as opposed to vertical learning, which can be applied to the same sample space with different feature spaces.
  • Horizontal learning applies well to a mobile environment, where a model can be completely shared.
  • a data set in the form of a table 1015. This data can be visualized as a matrix with samples across rows and features down columns.
  • the rows of data may correspond to samples used with a neural network for training. They also may correspond to a SQL-returned table and may have unique identifiers (IDs) across rows and, again, columns of features.
  • the dataset 1015 is divided horizontally among devices 953. In this horizontally partitioned dataset, each subset of the data has access to the same feature space, but has its own sample space, as one can imagine for data collected and trained on mobile phones.
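  • A small sketch of this horizontal partitioning, splitting a table row-wise so every device keeps the same feature columns but its own samples (array shapes are illustrative only):

        import numpy as np

        table = np.arange(12 * 3).reshape(12, 3)   # 12 samples (rows) x 3 features (columns)
        num_devices = 4
        # Same feature space on every device, disjoint sample spaces
        partitions = np.array_split(table, num_devices, axis=0)
        for device_id, part in enumerate(partitions):
            print(device_id, part.shape)           # each device holds a (3, 3) slice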
  • Each edge device can start with the same base model.
  • An FL aggregator or federated learning repository or some other central authority or compute resource sends the base model to the edge device for update training, to produce updated models 957.
  • the edge devices 953 train using respective partitions of the data 1015, producing the updated models 957, which are aggregated 959 into an updated model which can be distributed as a new base model 951.
  • the base model resides locally on each device.
  • Each device trains locally on data that is available on device.
  • the federated loop aggregates the local updates to produce a new global model.
  • FIG. 11 depicts a graphic user interface for medical selfies. At one time, most of the information in 1153 is collapsed, and the frontal face image is visible. When estimates are given, the user is invited to correct the system’s estimates of age, sex, height and weight. (BMI is calculated from the other values.) At another time, the user can expand some or all of the information panels, as in 1157, and reveal further information.
  • This model can be trained in a federated manner, beginning with a base model 951 trained conventionally on millions of images to produce a model that performs relatively well.
  • This base model is sent to an edge device where it's first used to perform inference on new images collected by the user, such as selfies. The user will be given the option to correct the inferences made by the model, so that accurate age, sex, height and weight are known. With this ground truth, the base model is trained to produce an updated model.
  • Each of the participating edge devices similarly produces local updates to the current model. Those local updates are centrally aggregated into a new base model and the process repeats. Aggregation can be performed using a federated average algorithm, applying the averaging formula 1577 in FIG. 15.
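  • A minimal federated-averaging sketch in the spirit of formula 1577, weighting each local update by its sample count; the dict-of-arrays model format is an assumption for illustration:

        import numpy as np

        def federated_average(local_models, sample_counts):
            """Weighted average of local models, weighted by each device's sample count."""
            total = float(sum(sample_counts))
            keys = local_models[0].keys()
            return {k: sum(m[k] * (n / total) for m, n in zip(local_models, sample_counts))
                    for k in keys}

        locals_ = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 0.0])}]
        new_base = federated_average(locals_, sample_counts=[30, 10])   # -> w = [1.5, 1.5]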
  • the base convolution model can be a MobileNet V2 model with supplemental training that builds on transfer learning of facial images. Transfer learning can leverage training on an ImageNet classification problem. For age, sex, height and weight, custom layers can be stacked on top of an ImageNet or MobileNet V2 model.
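  • One plausible way to stack custom heads on an ImageNet-pretrained MobileNetV2 backbone is sketched below using tf.keras; the layer sizes and loss choices are illustrative assumptions, not the disclosed model:

        import tensorflow as tf

        # ImageNet-pretrained backbone, frozen for transfer learning
        backbone = tf.keras.applications.MobileNetV2(
            input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
        backbone.trainable = False

        inputs = tf.keras.Input(shape=(224, 224, 3))
        features = backbone(inputs)
        hidden = tf.keras.layers.Dense(256, activation="relu")(features)

        # Custom output heads stacked on top of the shared backbone
        age = tf.keras.layers.Dense(1, name="age")(hidden)
        sex = tf.keras.layers.Dense(1, activation="sigmoid", name="sex")(hidden)
        height = tf.keras.layers.Dense(1, name="height")(hidden)
        weight = tf.keras.layers.Dense(1, name="weight")(hidden)

        model = tf.keras.Model(inputs, [age, sex, height, weight])
        model.compile(optimizer="adam",
                      loss={"age": "mse", "sex": "binary_crossentropy",
                            "height": "mse", "weight": "mse"})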
  • Initial training of the base model can be offline. Then, the trained base model can be distributed to edge devices, which produce updates that are processed by the federated loop, as illustrated in FIG. 9.
  • the horizontal axis is time.
  • Devices are depicted on the vertical axis, including a coordinating server 1221 that manages training tasks and performs model aggregation.
  • the figure illustrates four edge devices 953 that perform training using local data to produce local updates of a base model.
  • messages travel down and up between the coordinating server 1221 and individual devices 953, each represented by a horizontal line.
  • the stream of communications reflects asynchronous messaging, with simplifications involving just a handful of devices and grouping of communications back and forth that would likely be interleaved or multiplexed.
  • the edge device 953 will train on local data, update its local model and send the updated version of the model back to the server.
  • Communications between devices 953 and server 1221 are asynchronous, over network connections, and sometimes unreliable.
  • an edge device or client makes a request for a training task, but does not receive a response from the server. This can be represented by an upward arrow, for instance near the beginning of cycle 1223, without a responsive downward arrow.
  • the client might request and receive an assignment and current model version, but never upload an updated model.
  • a client may participate multiple times during a given training cycle.
  • the server 1221 checks to make sure that updates received apply to a current version of the base model, that the edge device is not updating a deprecated base model version.
  • a cycle such as 1213, 1215 or 1217, eventually reaches a predetermined threshold.
  • This threshold could be expressed as a number of clients that have participated in the round, as a number of training samples processed in the updated models, or as an elapsed amount of time.
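  • A small sketch of how such a round-completion check might be expressed; the threshold values are illustrative, not values from the disclosure:

        import time

        def round_complete(num_clients, num_samples, started_at,
                           min_clients=100, min_samples=10_000, max_seconds=3600):
            """Close a training cycle once enough clients, samples, or time have accumulated."""
            return (num_clients >= min_clients
                    or num_samples >= min_samples
                    or time.time() - started_at >= max_seconds)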
  • Each of the cycles corresponds to one round of the federated loop 915 that produces a new global model (959, which becomes 951), and to distribution to the edge devices of the updated, new model.
  • the edge devices can use the new model for predictions and training as additional data is collected.
  • the edge devices do not repeatedly train using old data that previously was used to train an updated model that was forwarded to the server 1221 for aggregation.
  • the process repeats, as depicted for three cycles in FIG. 12
  • the engineering challenges are significant.
  • One challenge arises from networking issues and latency as devices join and leave the network.
  • Another challenge is that the mobile model is unquantized and includes on the order of 20 megabytes of model parameters. It is useful to make sure that the model is not updated too often over cellular data connections. Updating also strains the mobile device's power budget, as training on a mobile phone is resource intensive and, therefore, power hungry. In some implementations, training is limited to times when the phone is plugged in, has a Wi-Fi connection and is not otherwise in use by the user.
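  • In code, that training gate might look like the following sketch; the device-state fields are hypothetical names, not a real mobile API:

        def may_train(device_state):
            """Only train when plugged in, on Wi-Fi, and idle, respecting power and data limits."""
            return (device_state.get("is_charging", False)
                    and device_state.get("on_wifi", False)
                    and device_state.get("is_idle", False))

        may_train({"is_charging": True, "on_wifi": True, "is_idle": False})   # -> False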
  • asynchronous task management requires record keeping, tracking all of the training tasks and local updates in process across numerous edge devices. It also involves periodically performing aggregation and redeploying updated models. In addition to these engineering challenges, there are theoretical concerns, arising from classical statistics, that can only be overcome by empirical investigation.
  • FIGS. 13-14 illustrate a scatter plot of data from a so-called In‘n Out model that was trained to distinguish between photographs taken indoors and out of doors.
  • FIG. 13 plots a loss function, for which a lower value is better, except in case of overtraining.
  • FIG. 14 plots a likelihood of correct binary classification, for which a higher value is better.
  • the scatterplot in FIG. 13 graphs local losses versus global training loss for a binary classification test model that was deployed internally by Applicant.
  • dot 1353 is the global training loss of the original base model.
  • Other dots clumped to the left 1351 and scattered to the right 1355, 1359, are the local losses of that model trained on individual devices, which sent their models to a server (e.g., 1221).
  • the graph shows two sorts of bad results from training on end devices with small sample sets. First, in some cases, e.g., 1359, the local losses explode off to the right. This suggests that something has gone badly in training that caused the gradient descent to jump out of a local minimum found during the initial global training.
  • FIG. 14 depicts the corollary accuracy of the original base model.
  • the accuracy 1453 of the initial base model was roughly 90 percent for a binary classification problem.
  • the local accuracy 1455 is clustered near 50 percent.
  • the updates to the models that are sent back to the server for aggregation, when tested against samples held back for validation, have an accuracy that hovers around 50 percent, between 40 and 60 percent, which suggests that the local updates to the models are no better than random guesses at binary classification.
  • FIG. 15 is a conceptual depiction, appealing to intuition, of why averaging the bad models might work well.
  • a base model in a two dimensional space 1513 with a good decision boundary that accurately classifies gray dots, below the line, against black dots, above the line.
  • We send this global model to two devices and train it on those devices, producing the decision boundaries illustrated in the upper and lower graphs 1515 and 1575.
  • two yellow dots 1536 and three new gray dots 1534 are added in the bottom half of 1515. The new dots have pulled the decision boundary down and to the right, separating the new dots.
  • the technology disclosed can be enhanced by putting a filter in front of model training, to try to constrain the sample population collected and used for training on edge devices, bringing it closer to the intended target population.
  • a filter at the FL aggregator to threshold out some updates that appear to have been trained on bad data.
  • the first filter can limit training to images of selfies, instead of exposing it to all kinds of images.
  • a face detector in front of the model does not treat sunrises or house plants as faces.
  • the edge devices are training on any image that has a face in it, which is mostly selfies but could include some other images. That brings us closer to the target population.
  • the technology disclosed can be enhanced by filtering out some of the updates that appear to be very bad.
  • the training that produced wildly divergent updates potentially resulted from being exposed to bad training data or training data that has been mislabeled, such as a typo in a person’s weight or height.
  • some of these losses e.g., 1359, explode off to the right.
  • a second filter can eliminate those updates from being averaged into the model, where it appears that the local update has jumped too far outside of our original local minimum.
  • this corresponds to updated models that have very badly malformed decision boundaries, which could result from bad training data, such as mislabeled training data.
  • One measure would be a simple Euclidean distance across all the weights, relative to the distribution of distances among local updates in a batch.
  • the distribution can be used to filter out updated models that are very bad or divergent. This should allow us to restrict our aggregation by federated averaging to updated models that have been trained on a population of data that is similar to our target population.
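  • A hedged sketch of that second filter: compute each update's Euclidean distance from the base model weights, then drop updates that are far outside the batch's distribution. The z-score cutoff is an illustrative choice, not a value from the disclosure:

        import numpy as np

        def filter_divergent_updates(base, updates, z_cutoff=2.0):
            """Keep only updates whose distance from the base model is typical for the batch."""
            def flatten(model):
                return np.concatenate([v.ravel() for v in model.values()])

            dists = np.array([np.linalg.norm(flatten(u) - flatten(base)) for u in updates])
            mean, std = dists.mean(), dists.std() + 1e-12
            return [u for u, d in zip(updates, dists) if (d - mean) / std <= z_cutoff]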
  • Empirical results have been good. Internal research by Applicant shows good results, and actual test deployments also show good federated learning performance. Despite the loss of the classical IID assumption and its strong statistical guarantees, we end up with empirical results that are good. Of course, this depends on class size and sample size as well as hyper-parameters of the model. It also would be impacted by implementation of the filters described. The inventors have concluded that federated learning works and is a viable approach to machine learning for a range of health space tasks.
  • homomorphic encryption can be considered. This approach applies a series of computations to a cipher text and then deciphers the results, ending up with the same results as if that series of computations had been applied to the original text.
  • homomorphic encryption may only work with linear transformations and linear approximations of non-linear transformations.
  • a convolutional neural network is a special type of neural network.
  • the fundamental difference between a densely connected layer and a convolution layer is this: dense layers learn global patterns in their input feature space, whereas convolution layers learn local patterns: in the case of images, patterns found in small 2D windows of the inputs.
  • This key characteristic gives convolutional neural networks two interesting properties: (1) the patterns they learn are translation invariant and (2) they can learn spatial hierarchies of patterns.
  • a convolution layer can recognize it anywhere: for example, in the upper-left corner.
  • a densely connected network would have to learn the pattern anew if it appeared at a new location. This makes convolutional neural networks data efficient because they need fewer training samples to learn representations that have generalization power.
  • a convolutional neural network learns highly non-linear mappings by interconnecting layers of artificial neurons arranged in many different layers with activation functions that make the layers dependent. It includes one or more convolutional layers, interspersed with one or more sub-sampling layers and non-linear layers, which are typically followed by one or more fully connected layers. Each element of the convolutional neural network receives inputs from a set of features in the previous layer.
  • the convolutional neural network learns concurrently because the neurons in the same feature map have identical weights. These local shared weights reduce the complexity of the network such that when multi-dimensional input data enters the network, the convolutional neural network avoids the complexity of data reconstruction in feature extraction and regression or classification process.
  • Convolutions operate over 3D tensors, called feature maps, with two spatial axes (height and width) as well as a depth axis (also called the channels axis).
  • the dimension of the depth axis is 3, because the image has three color channels: red, green, and blue.
  • the depth is 1 (levels of gray).
  • the convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an output feature map.
  • This output feature map is still a 3D tensor: it has a width and a height.
  • Its depth can be arbitrary, because the output depth is a parameter of the layer, and the different channels in that depth axis no longer stand for specific colors as in RGB input; rather, they stand for filters. Filters encode specific aspects of the input data: at a high level, a single filter could encode the concept "presence of a face in the input," for instance.
  • the first convolution layer takes a feature map of size (28, 28, 1) and outputs a feature map of size (26, 26, 32): it computes 32 filters over its input.
  • Each of these 32 output channels contains a 26 x 26 grid of values, which is a response map of the filter over the input, indicating the response of that filter pattern at different locations in the input. That is what the term feature map means: every dimension in the depth axis is a feature (or filter), and the 2D tensor output [:, :, n] is the 2D spatial map of the response of this filter over the input.
  • Convolutions are defined by two key parameters: (1) size of the patches extracted from the inputs - these are typically 1 x 1, 3 x 3 or 5 x 5 and (2) depth of the output feature map - the number of filters computed by the convolution. Often these start with a depth of 32, continue to a depth of 64, and terminate with a depth of 128 or 256.
  • a convolution works by sliding these windows of size 3 x 3 or 5 x 5 over the 3D input feature map, stopping at every location, and extracting the 3D patch of surrounding features (shape (window height, window width, input depth)).
  • Each such 3D patch is then transformed (via a tensor product with the same learned weight matrix, called the convolution kernel) into a 1D vector of shape (output depth,). All of these vectors are then spatially reassembled into a 3D output map of shape (height, width, output depth). Every spatial location in the output feature map corresponds to the same location in the input feature map (for example, the lower-right corner of the output contains information about the lower-right corner of the input). For instance, with 3 x 3 windows, the vector output[i, j, :] comes from the 3D patch input[i-1 : i+1, j-1 : j+1, :]. The full process is detailed in FIG. 11.
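  • A naive numpy sketch of that sliding-window process, extracting each 3D patch and contracting it with the kernel via a tensor product; shapes follow the 28 x 28 example above, and the random data is purely illustrative:

        import numpy as np

        def conv2d_valid(feature_map, kernel, bias=0.0):
            """Slide a (kh, kw, in_depth, out_depth) kernel over a (H, W, in_depth) feature map."""
            h, w, in_depth = feature_map.shape
            kh, kw, _, out_depth = kernel.shape
            out = np.zeros((h - kh + 1, w - kw + 1, out_depth))
            for i in range(out.shape[0]):
                for j in range(out.shape[1]):
                    patch = feature_map[i:i + kh, j:j + kw, :]                 # extract 3D patch
                    out[i, j, :] = np.tensordot(patch, kernel, axes=3) + bias  # product with kernel
            return out

        x = np.random.rand(28, 28, 1)      # a grayscale input feature map
        k = np.random.rand(3, 3, 1, 32)    # 32 filters of size 3 x 3
        print(conv2d_valid(x, k).shape)    # -> (26, 26, 32)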
  • the convolutional neural network comprises convolution layers which perform the convolution operation between the input values and convolution filters (matrix of weights) that are learned over many gradient update iterations during the training.
  • Let (m, n) be the filter size and W be the matrix of weights.
  • a convolution layer performs a convolution of the W with the input X by calculating the dot product W · x + b, where x is an instance of X and b is the bias.
  • the step size by which the convolution filters slide across the input is called the stride, and the filter area (m x n) is called the receptive field.
  • a same convolution filter is applied across different positions of the input, which reduces the number of weights learned. It also allows location invariant learning, i.e., if an important pattern exists in the input, the convolution filters learn it no matter where it is in the sequence.

  Training a Convolutional Neural Network
  • FIG. 17 depicts a block diagram of training a convolutional neural network in accordance with one implementation of the technology disclosed.
  • the convolutional neural network is adjusted or trained so that the input data leads to a specific output estimate.
  • the convolutional neural network is adjusted using back propagation based on a comparison of the output estimate and the ground truth until the output estimate progressively matches or approaches the ground truth.
  • the convolutional neural network is trained by adjusting the weights between the neurons based on the difference between the ground truth and the actual output. This is mathematically described as: Δw_i = x_i · δ, where δ = (ground truth) − (actual output).
  • the training rule is defined as: w_nm ← w_nm + α · (t_m − φ_m) · a_n, where the arrow indicates an update of the value; t_m is the target value of neuron m; φ_m is the computed current output of neuron m; a_n is input n; and α is the learning rate.
  • the intermediary step in the training includes generating a feature vector from the input data using the convolution layers.
  • the gradient with respect to the weights in each layer, starting at the output, is calculated. This is referred to as the backward pass, or going backwards.
  • the weights in the network are updated using a combination of the negative gradient and previous weights.
  • the convolutional neural network uses a stochastic gradient update algorithm (such as ADAM) that performs backward propagation of errors by means of gradient descent.
  • a sigmoid function based back propagation algorithm is described below: φ = f(h) = 1 / (1 + e^(−h)), where h is the weighted sum computed by a neuron.
  • the algorithm includes computing the activation of all neurons in the network, yielding an output for the forward pass. The activation of neuron m in the hidden layers is described as: φ_m = 1 / (1 + e^(−h_m)), with h_m = Σ_k a_k · w_mk summed over the n inputs k of neuron m.
  • the error at the output layer is computed as: δ_ok = (t_k − φ_k) · φ_k · (1 − φ_k), and the error in the hidden layers is calculated as: δ_hm = φ_m · (1 − φ_m) · Σ_k w_mk · δ_ok.
  • the weights of the output layer are updated as: v_mk ← v_mk + α · δ_ok · φ_m.
  • the weights of the hidden layers are updated using the learning rate α as: w_nm ← w_nm + α · δ_hm · a_n.
  • the convolutional neural network uses a gradient descent optimization to compute the error across all the layers.
  • the loss function is defined as ℓ, the cost of predicting ŷ when the target is y, i.e. ℓ(ŷ, y).
  • the predicted output ŷ is transformed from the input feature vector x using the function f.
  • the function f is parameterized by the weights of the convolutional neural network, i.e. ŷ = f_w(x).
  • the gradient is calculated using only selected data pairs fed to a Nesterov’s accelerated gradient and an adaptive gradient to inject computation efficiency.
  • the convolutional neural network uses a stochastic gradient descent (SGD) to calculate the cost function.
  • an SGD approximates the gradient with respect to the weights in the loss function by computing it from only one, randomized, data pair (x_t, y_t), described as: v_(t+1) = μ · v_t − α · ∇_w ℓ(f_w(x_t), y_t) and w_(t+1) = w_t + v_(t+1), where α is the learning rate, μ is the momentum, and w_t is the current weight state before updating.
  • the convergence speed of SGD is approximately O(1/t) when the learning rate α is reduced both fast enough and slow enough.
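  • A small numpy sketch of that momentum-SGD update rule; the toy squared-error loss and constants are illustrative only:

        import numpy as np

        def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
            """One stochastic update: v_(t+1) = mu*v_t - alpha*grad, w_(t+1) = w_t + v_(t+1)."""
            v_next = momentum * v - lr * grad
            return w + v_next, v_next

        w, v = np.zeros(2), np.zeros(2)
        x_t, y_t = np.array([1.0, 2.0]), 3.0   # one randomized data pair
        grad = 2 * (w @ x_t - y_t) * x_t       # gradient of squared error for this pair
        w, v = sgd_momentum_step(w, v, grad)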
  • the convolutional neural network uses different loss functions such as Euclidean loss and softmax loss.
  • an Adam stochastic optimizer is used by the convolutional neural network.
  • Flea end users communicate and collaborate with one another to build and update models, effecting a lateral tensor ensemble of user models, in a one-to-one manner.
  • the end users could also laterally organize their own trials and choose a central FL aggregator to which to send the gradients and get the averaged gradients back in a distributed fashion.
  • tensors are configured to function as tensorial handshakes, with one-to-one tensors for distributed clinical trials.
  • Flea end users communicate and collaborate with one another to build and update models of computation in a tensor economy, in a many-to-one manner.
  • Tensors for distributed clinical trials: each end user can be called upon by several sponsors to conduct several trials at the same time and can use the same underlying data to create new tensors.
  • Flea end users communicate and collaborate with one another to build and update models in autonomous tensor ensembles, in a many-to-many manner.
  • devices, without human intervention, will start to exchange information with each other. They will behave like many insect species, including ants and bees, which work together in colonies, where cooperative behavior determines the survival of the entire group.
  • the group operates like a single organism, with each individual in a colony acting like a cell in the body, and the group becomes a "superorganism."
  • Federated deep learning only needs these small players, like insects, ants, critters and bees, to create big and smart things with immense, complex and adaptive social power and increasingly ambitious missions.
  • Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles, in a one-to-many manner.
  • In federated learning, a global protocol is sent from one central authority to many participants who collect information on their edge devices, label the information and compute it locally, after which they send the tensors to the central FL aggregator of the sponsor. The aggregator combines all the tensors and then reports the updated and averaged tensors back to each of the participants.
  • FIG. 3A is a diagram illustrating an example use case of a traditional clinical trial where the one-to-many tensors for distributed clinical trials could be applied.
  • tensor ensembles are vertical in a one-to-many structure, called Vertical Tensor Ensembles.
  • Most clinical trials are centralized; they consist of one sponsor who centrally produces the protocol and uses several sites where many end users can go for physical exams and laboratory tests. This procedure is time consuming and costly, and is mostly outsourced to Contract Research Organizations (CROs).
  • With Federated Learning a global protocol is sent from one central authority to many end users who collect information on their edge devices, e.g. smartphones, label the information and compute it locally, after which the outcome tensors are sent to the central FL aggregator of the sponsor.
  • the central authority aggregates all the tensors and then reports the updated and averaged tensors back to each of the end users. These one-to-many tensors are configured to conduct distributed clinical trials.
  • FIG. 3B is a diagram illustrating an example of using a federated learner system to conduct one-to-many tensor exchanges for distributed clinical trials, using so-called Fleas.
  • A sponsor of a digital clinical trial, typically a data trial, announces the data trial directly to end users via an application program installed on the end users' devices.
  • Each end user device includes a federated learner.
  • the federated learners are configured to share tensors with a centralized FL aggregator.
  • the centralized FL aggregator is configured to share with the sponsor only a global model, not data or model updates from individual end users.
  • A sponsor of a data trial announces the trial directly to end users. End users are free to choose from many specific sites to participate in the data trial. Each of these specific sites is configured to be connected with a CRO which holds an FL aggregator. Similarly, federated learners on devices are configured to share tensors on data with the CRO FL aggregator. The CRO's centralized FL aggregator is configured to share with the sponsor only a global model, not data or model updates from individual end users.
  • FIG. 4 is a diagram illustrating an example FL aggregator.
  • Flea is configured to be embedded in various edge devices belonging to end users.
  • Such edge devices can be, but are not limited to, any electronic device capable of connecting to the internet or a similar network.
  • An FL aggregator is designed as a federated learning back-end, which requires high availability, responsible for collecting model updates and evaluations sent from Flea end users, organizing models that can be updated from end user side updates along with the operations required to perform these updates, and admitting or rejecting proposed updates from each end user based on criteria such as the history of the end user's submissions (e.g. an end user's credibility score) as well as end user sent metadata.
  • the FL aggregator aggregates admissible end user updates into a single update to each model and redistributes updated models to the end user side.
  • the FL aggregator reports aggregations of model evaluations based on similar admissibility criteria as those used for updates. It conducts tensorial handshakes, which are protocols that govern the exchange of information between federated learners running on end user devices and the FL aggregator, or amongst collectives of federated learners, on the initiative of end users themselves.
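  • A hedged sketch of that aggregator logic follows: admit or reject proposed updates on criteria such as a credibility score and client-sent metadata, then combine the admissible updates into one update for redistribution. The field names and thresholds are invented for illustration:

        import numpy as np

        def admit(submission, min_credibility=0.5, expected_version="v3"):
            """Admit an end-user update based on submission history and metadata."""
            meta = submission["metadata"]
            return (submission["credibility_score"] >= min_credibility
                    and meta.get("base_model_version") == expected_version
                    and meta.get("num_samples", 0) > 0)

        def aggregate(submissions):
            """Combine admissible updates into a single update and return it for redistribution."""
            admissible = [s for s in submissions if admit(s)]
            weights = np.array([s["metadata"]["num_samples"] for s in admissible], dtype=float)
            weights /= weights.sum()
            return sum(w * s["update"] for w, s in zip(weights, admissible))

        subs = [{"credibility_score": 0.9, "update": np.array([1.0, 0.0]),
                 "metadata": {"base_model_version": "v3", "num_samples": 20}},
                {"credibility_score": 0.2, "update": np.array([9.0, 9.0]),   # rejected
                 "metadata": {"base_model_version": "v3", "num_samples": 5}}]
        print(aggregate(subs))    # -> [1. 0.]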
  • FIG. 5 is a diagram illustrating an example use case of tensor globalization of a federated learner system.
  • a biotech company that has a federated learner model trained for Parkinson’s disease.
  • most clinical trials are centralized. They consist of one sponsor who centrally produces the protocol and uses several sites where the many participants can go for exams and tests. This procedure is time consuming and costly and mostly outsourced to Clinical Research Organizations (CROs).
  • New alternatives are now becoming available as the technologies disclosed, which manipulate tensors as proxies for data, evolve.
  • the distributed structure of a clinical trial, instead of being flat, can be curved into an n-dimensional manifold or surface. This also changes the nature of models. Models themselves are simply tensor ensembles. As edge computational units become more powerful, each computational unit on the edge can house its own model.
  • the FL aggregator is configured to be provided at least a federated learner model and a multidimensional matrix.
  • the tensors coming out of that model are to be averaged with the tensors of the biotech model.
  • the biotech company gets the global model back.
  • Another example use case applies the technology disclosed to an application program used by millions of members who regularly use the application for a function, leaving digital traces that reveal the members’ interests in a data trail. For instance, someone may look for restaurants.
  • the tech company requires user feedback in order to improve the quality of its prediction model to serve users better.
  • the tech company gives this input to the FL aggregator and gets the tensors back, asynchronously or synchronously. Doing so, the raw data of end users is not used, and the privacy of end users is not invaded.
  • the tech company only gets a global model of the interests of the entire population and a more precise model in different behavioral segments that enables them to target specific predicted actions.
  • the company can also share either the global tensors or the precision tensors, should they want to. No data is transported; inferences can be drawn by applying the tensors, without access to underlying user data.
  • FIGS. 6A-6B are diagrams illustrating example use case of a federated learner system in a linear training trial and in an adaptive and continuously learning distributed trial, comprising federated learners and FL aggregator applied to collection and analysis of data trial.
  • An end user can use a site of their choice, provided the site is also chosen with the trial.
  • the data on end user’s phone is used for training the model relevant to the end point of the trial. Since the analytics and model are not an after-trial completion artifact but living and real-time with the federated learner, administrators of the trial can quickly adapt to issues of bias, confounding influences, etc. This speeds up trials. End users can be virtual or onsite. Additionally, trials can collect real world data from user devices that provides more dimensions for training.
  • FIG. 7 is a diagram illustrating an example use case of a federated learner system, including one or more simulated control arms for the application of data trial.
  • So-called synthetic control arms are configured to operate via collected data at large scale over an existing population. See, e.g., Goldsack, Synthetic control arms can save time and money in clinical trials, Feb.
  • synthetic control arms model those comparators using real-world data that has previously been collected from sources such as health data generated during routine care, including electronic health records, administrative claims data, patient-generated data from fitness trackers or home medical equipment, disease registries, and historical clinical trial data, etc. This can be done via a federated learning model with edge devices sending up gradients to at least one FL aggregator.
  • Synthetic control arms bring clear benefits to the pharmaceutical industry and its applications. By reducing or eliminating the need to enroll control end users, a synthetic control arm can improve efficiency, efficacy and consistency, reduce delays, lower trial costs, and speed life-saving therapies to market.
  • This kind of hybrid trial design presents a less risky way for sponsors to introduce real-world data elements into regulatory trials and can also reduce the risk of late stage failures by informing go or no-go development decisions. Placebo-fear is one of the top-reasons patients choose not to participate in clinical trials. This concern is amplified when an individual’s prognosis is poor and when current care is of limited effectiveness.
  • Using a synthetic control arm instead of a standard control arm ensures that all participants receive the active treatment, eliminating concerns about treatment/placebo assignment.
  • Use of a synthetic control arm addresses an important participant concern and removes an important barrier to recruitment.
  • the use of simulated control arms can also eliminate the risk of unblinding when patients lean on their disease support social networks, posting details of their treatment, progress, and side effects that could harm the integrity of the trial.
  • the federated learner system can be utilized for tensorial twins.
  • the tensorial twin represents the nearest-neighbor patient, derived from algorithmic matching of the maximal proportion of data points using a subtype of AI known as nearest-neighbor analysis.
  • the nearest neighbor is identified using AI analytics for approximating a facsimile, another human being as close as possible to an exact copy according to the patient’s characteristics to help inform best treatment, outcomes, and even prevention.
  • the Perturbed Subspace Method employs a predicted probability of group membership, e.g., treatment or control group, based on observed predictors, usually obtained from logistic regression, to create a counterfactual group. Propensity scores may also be used for matching or as covariates, alone or with other matching variables or covariates.
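The nearest-neighbor matching described in the preceding bullets can be sketched roughly as follows, assuming scikit-learn is available. The predictors, labels, and variable names are hypothetical illustrations, not the patented implementation: a logistic regression supplies the propensity score (predicted probability of group membership), and each treated participant is matched to the closest untreated record on that score.

```python
# Illustrative sketch only: propensity-score nearest-neighbor matching to
# assemble a counterfactual (synthetic control) group. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # observed predictors (e.g., age, labs, vitals)
treated = rng.integers(0, 2, size=200)     # 1 = treatment arm, 0 = real-world data pool

# Predicted probability of group membership (the propensity score),
# obtained from logistic regression on the observed predictors.
propensity = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

treated_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]

# For each treated participant, find the nearest untreated record on the
# propensity score; the matched records form the counterfactual group.
nn = NearestNeighbors(n_neighbors=1).fit(propensity[control_idx].reshape(-1, 1))
_, match = nn.kneighbors(propensity[treated_idx].reshape(-1, 1))
matched_controls = control_idx[match.ravel()]
print(matched_controls[:10])
```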
  • every cohort can be configured to be adaptive in a very complex way because members with a federated learner can send up deltas. This continuously makes the relationship between the members and the cohort tenuous, to the point that they redefine normality and start to act as patients in silico, preparing for a stochastic forward model of precision medicine.
  • the federated learner system may use a fuzzy tensor swarm. Devices which used to be responsible only for the gathering of data are to be configured to run downstream computations. Such configuration can be applied to various scenarios, for example, heart rate monitors, automatic blood pressure pumps, weather micro-stations, etc. Computational capacity as well as speed are increased drastically. With the advent of higher-bandwidth connectivity between such devices (due, for example, to 5G), the old paradigm of requiring these devices to send data to a central location where an archaic batch runner produces an updated data processor and ships it back to each device individually is becoming outmoded. Incurring a system-wide overhead when a heart rate monitor can update its own data processing algorithms no longer makes sense.
  • each device is to be deployed with its own adaptive data processing module, placed within a network mesh of devices, and equipped with an ontology (e.g., protocol-driven) describing to it the kind of information it can derive from each of its neighbors in the mesh.
  • Each device in the mesh is configured to make available to its neighbors any of its primitives, as well as data-derived updates to itself.
  • an ensemble of interconnected devices, each with an intelligent data processing module and an ontological protocol, forms a fuzzy tensor swarm.
  • in this fuzzy tensor swarm, the emergent behavior is configured to be, at a minimum, equivalent in functionality to what is possible with a centralized model building workflow, although it may not be optimal in terms of latency and overhead.
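As a loose, purely illustrative sketch of the mesh just described, the snippet below gives each device a set of primitives, a list of neighbors, and an ontology saying what it may pull from each neighbor. The class and method names (SwarmDevice, advertise, pull_from_neighbors) are invented for this sketch and do not appear in the specification.

```python
# Hypothetical sketch of devices in a fuzzy tensor swarm exchanging primitives
# over a mesh according to a simple ontology.
from dataclasses import dataclass, field

@dataclass
class SwarmDevice:
    name: str
    primitives: dict = field(default_factory=dict)   # data this device can derive locally
    ontology: dict = field(default_factory=dict)     # neighbor name -> primitives it offers
    neighbors: list = field(default_factory=list)

    def advertise(self):
        """Make local primitives (and, in principle, model deltas) visible to neighbors."""
        return dict(self.primitives)

    def pull_from_neighbors(self):
        """Gather whatever the ontology says each neighbor can provide."""
        gathered = {}
        for nb in self.neighbors:
            wanted = self.ontology.get(nb.name, [])
            offered = nb.advertise()
            gathered[nb.name] = {k: offered[k] for k in wanted if k in offered}
        return gathered

hr = SwarmDevice("hr_monitor", {"heart_rate": 62})
bp = SwarmDevice("bp_cuff", {"systolic": 118, "diastolic": 76},
                 ontology={"hr_monitor": ["heart_rate"]}, neighbors=[hr])
print(bp.pull_from_neighbors())   # {'hr_monitor': {'heart_rate': 62}}
```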
  • each device can be connected, either physically or not, and stream data to millions of other smart data capture devices that can create live models of their vertical worlds.
  • the enriched information from millions of graphics processing units can be fed back to other objects or their carbon, silicon or neuron users.
  • Passive collection can be monetized and become the service industry of virtual reality (VR) which can create parallel existential dimensions as a service.
  • a federated learner model can be applied to federated learning and adversarial rapid testing of clinical data and standards.
  • Data training done on the device close to the data mitigates privacy concerns.
  • the trained models basically try to predict when symptoms will happen, and the user can be enabled to verify.
  • These generative adversarial networks (GANs) can then be used to generate Real World Evidence (RWE) backed patient simulations to validate clinical trials, data, and anomaly detection.
  • A pharmaceutical company can be enabled to license these models out as a new revenue stream. End users' simulated data is predicted or inferred on probabilistic risk calculators, based on their genetics, exposome, pharmacome and other omics data. Once these models are built, the pharmaceutical company can also use the models in other data trials to do groundwork analysis.
  • A clinical trial can go out with consumer health care mobile devices, e.g., an Apple Watch, where participants can confirm or deny when the GAN predicts they may soon experience a symptom.
  • the model gets trained on end user devices and only the model is sent back to the servers. The models are then tested in other patients and verified over and over.
  • This model of symptoms can be used to simulate existing clinical trials around similar drugs. If it can reproduce the study results, then these models can be used in dashboards around these types of drugs.
  • the federated learning model can be applied to automatic qualification of participants for clinical trials and remove the expensive human verification process.
  • the federated learning model can be applied to decentralized patient registries. Such a registry is on the edge and fragmented, but comes together on an "ask" command by authorized personnel, e.g., the end user.
  • the federated learning model can be applied to configure a peer-to-peer health data comparator, comparing the health condition of one end user against another without sharing any personal data.
  • the federated learning model can be applied to distributed second opinions.
  • One end user can be enabled to share his or her personal model with a new doctor or citizen scientist without giving away any data. Tensors are compared and not the real data.
  • the federated learning model can be applied to health anomaly detection via model anomaly detection. Tensors can be configured to indicate that there is an out-of-bounds anomaly with respect to the population. Once issues are identified, the condition can be escalated to a doctor.
  • the federated learning model can be applied to health fingerprint.
  • the model built on end user data can be a unique signature of the end user. It evolves as the health condition of the end user evolves.
  • the model can be used as an identity in time.
  • FIG. 18 is a simplified block diagram of a computer system 1800 that can be used to implement the technology disclosed.
  • Computer system typically includes at least one processor 1872 that communicates with a number of peripheral devices via bus subsystem 1855.
  • peripheral devices can include a storage subsystem 1810 including, for example, memory subsystem 1822 and a file storage subsystem 1836, user interface input devices 1838, user interface output devices 1876, and a network interface subsystem 1874.
  • the input and output devices allow user interaction with computer system.
  • Network interface subsystem provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.
  • User interface input devices 1838 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices.
  • pointing devices such as a mouse, trackball, touchpad, or graphics tablet
  • audio input devices such as voice recognition systems and microphones
  • use of the term“input device” is intended to include all possible types of devices and ways to input information into computer system.
  • User interface output devices 1876 can include a display subsystem, a printer, a fax machine, or nonvisual displays such as audio output devices.
  • the display subsystem can include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image.
  • the display subsystem can also provide a non-visual display such as audio output devices.
  • output device is intended to include all possible types of devices and ways to output information from computer system to the user or to another machine or computer system.
  • Storage subsystem 1810 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by processor alone or in combination with other processors.
  • Memory used in the storage subsystem can include a number of memories including a main random access memory (RAM) 1832 for storage of instructions and data during program execution and a read only memory (ROM) 1834 in which fixed instructions are stored.
  • the file storage subsystem 1836 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges.
  • the modules implementing the functionality of certain implementations can be stored by file storage subsystem in the storage subsystem, or in other machines accessible by the processor.
  • Bus subsystem 1855 provides a mechanism for letting the various components and subsystems of computer system communicate with each other as intended. Although bus subsystem is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
  • Computer system itself can be of varying types including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely -distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system depicted in FIG. 18 is intended only as a specific example for purposes of illustrating the technology disclosed. Many other configurations of computer system are possible having more or less components than the computer system depicted in FIG. 18.
  • the computer system 1800 includes GPUs or FPGAs 1878. It can also include machine learning processors hosted by machine learning cloud platforms such as Google Cloud Platform, Xilinx, and Cirrascale. Examples of deep learning processors include Google's Tensor Processing Unit (TPU), rackmount solutions like GX4 Rackmount Series, GX8 Rackmount Series, NVIDIA DGX-1, Microsoft's Stratix V FPGA, Graphcore's Intelligent Processor Unit (IPU), NVIDIA's DRIVE PX, NVIDIA's JETSON TX1/TX2 module, Intel's Nirvana, Movidius VPU, Fujitsu DPI, ARM's DynamicIQ, IBM TrueNorth, and others.
  • One disclosed implementation includes a system for federated learning.
  • the system includes multiple edge devices of end users, coupled to a communication network.
  • the edge devices include a memory that stores program instructions for a federated learner, recorded user data, and a tensor of model parameters of a deep neural network, a“DNN”.
  • the federated learner executes on a processor of the edge device.
  • the federated learner is configured to record end user data, predict characteristics of the end user from the recorded end user data by applying the DNN, and receive updates from the end user that correct the predicted end user characteristics.
  • the federated learner is further configured to perform update training of the DNN using the recorded user data and the corrected user characteristics, thereby producing a modified tensor of updated model parameters and send at least a modified part of the modified tensor to an FL aggregator.
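A minimal sketch of this edge-side loop is given below, with a toy linear model standing in for the DNN so the example stays self-contained. The class and method names are assumptions made for illustration, not the claimed interface.

```python
# Illustrative edge-side federated learner: predict, accept corrections,
# retrain locally, and emit only the parameter delta (never the raw data).
import numpy as np

class EdgeFederatedLearner:
    def __init__(self, base_tensor):
        self.base = base_tensor.copy()     # tensor of model parameters from the FL aggregator
        self.tensor = base_tensor.copy()
        self.records = []                  # (recorded end user data, corrected characteristics)

    def predict(self, features):
        # Stand-in for applying the DNN to recorded end user data.
        return features @ self.tensor

    def record_correction(self, features, corrected):
        # The end user corrects the predicted characteristics (ground truth).
        self.records.append((features, corrected))

    def update_training(self, lr=0.01, epochs=20):
        # Update training on locally recorded data only.
        for _ in range(epochs):
            for x, y in self.records:
                grad = np.outer(x, (x @ self.tensor) - y)   # gradient of squared error
                self.tensor -= lr * grad
        return self.tensor - self.base     # modified part (delta) sent to the FL aggregator

base = np.zeros((4, 3))                    # 4 input features -> 3 characteristics (e.g., age, height, weight)
learner = EdgeFederatedLearner(base)
x = np.array([1.0, 0.2, -0.5, 0.3])
print("predicted characteristics:", learner.predict(x))
learner.record_correction(x, np.array([64.0, 1.88, 82.0]))
delta = learner.update_training()
print("norm of delta sent to the FL aggregator:", np.linalg.norm(delta))
```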
  • the system further includes a base model tensor of model parameters for the DNN running on the edge devices, trained to predict characteristics of the end users from the recorded end user data, provided to the edge devices.
  • the FL aggregator is coupled to a communication network and includes a federated learner.
  • the federated learner is configured to receive modified tensors from at least some of the edge devices, aggregate the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor, and distribute the new version of the base model tensor to the edge devices.
  • the federated learner can be implemented in the FL aggregator as in-line code, in a separate module, or in some combination of the two coding strategies.
  • This system implementation and other systems disclosed optionally include one or more of the following features.
  • System can also include features described in connection with methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
  • the recorded end user data can include a picture captured by the edge device, an audio recording of the end user captured by the edge device, or both.
  • the predicted end user characteristics include age, height and weight. Sex also can be predicted and BMI calculated from a combination of predicted features.
  • the predicted end user characteristics can include mood.
  • a face detector can be applied to determine whether a face appears in the picture and to limit update training of a facial interpretation model, avoiding, for instance, training on cat or sunset pictures.
  • the federated learner can be configured to filter out spurious updates by calculating a distance measure that compares each modified tensor received from the edge devices to the base model tensor, constructing a distribution of distance measures in an updating cycle, and rejecting outlier modified tensors from aggregation with the current version. That is, production of the new base model version will not be based on rejected tensors whose distance measures are outliers from the distribution.
  • An outlier can be determined using a statistical measure such as three standard deviations or the like.
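A compact sketch of such a filter is shown below; the three-standard-deviation cutoff and the function name are illustrative choices, not the only possible statistical measure.

```python
# Hedged sketch: reject modified tensors whose distance from the base model
# tensor is an outlier (here, more than three standard deviations from the
# mean of the cycle's distance distribution).
import numpy as np

def filter_spurious_updates(base_tensor, modified_tensors, n_sigma=3.0):
    distances = np.array([np.linalg.norm(t - base_tensor) for t in modified_tensors])
    mean, std = distances.mean(), distances.std()
    return [t for t, d in zip(modified_tensors, distances)
            if std == 0 or abs(d - mean) <= n_sigma * std]

base = np.zeros(10)
updates = [base + np.random.default_rng(i).normal(scale=0.1, size=10) for i in range(20)]
updates.append(base + 50.0)                       # a spurious, out-of-distribution update
admitted = filter_spurious_updates(base, updates)
print(f"admitted {len(admitted)} of {len(updates)} updates")   # the outlier is rejected
```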
  • implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform actions of the system described above.
  • the technology disclosed presents methods of operating the edge devices, the server or FL aggregator device, or both.
  • One method implementation disclosed involves federated learning utilizing computation capability of edge devices.
  • the edge devices used to practice this method include a memory, storing program instructions for a federated learner, recorded user data and a tensor of model parameters of a deep neural network, a "DNN".
  • the federated learner executes on a processor of the edge device, and is configured to: record end user data; predict characteristics of the end user from the recorded end user data by applying the DNN; receive updates from the end user that correct the predicted end user characteristics, and perform update training of the DNN using the recorded user data and the corrected user characteristics.
  • This method implementation includes sending a current base model tensor of the model parameters to the edge devices and receiving modified tensors from at least some of the edge devices, based on at least user data recorded by the edge devices and corrected user characteristics received by the edge devices from end users. It can involve checking to determine that the modified tensors received apply to the current version of the base model tensor, not to a prior, outdated version. Because updating is an asynchronous process and user behavior is not under the system’s control, it is expected that some users will not participate in a cycle, some edge devices will not receive the current version of the base model tensor, and some edge devices will offer updates to an expired or outdated version of the base model tensor.
  • This method further includes aggregating the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor and distributing the new version of the base model tensor to the edge devices.
  • the receiving, aggregating and distributing actions are repeated for at least ten cycles. These actions may be repeated 50 or 100 or 1000 times or more. The cycles of the FL aggregator and its components will repeat more times than most users participate in collecting data and retraining base models.
  • the recorded end user data includes a frontal face picture captured by the edge device, and the predicted end user characteristics can include age, height and weight.
  • the method can further include constructing an initial current version of the base model from a generic face recognition model, with additional layers added and training applied with ground truth for the age, height and weight of persons in at least some frontal face pictures. This initial current version is prepared before the edge devices make available any recorded images or corrected user characteristics.
  • the method can include predicting the end user’s mood.
  • the method can further include filtering before aggregating, such as by calculating a distance measure that compares each modified tensor received from the edge devices to the base model tensor and constructing a distribution of distance measures in an updating cycle. As described in more detail above, this distribution can be used to reject at least one modified tensor from aggregation, as an outlier from the distribution.
  • Another method implementation of the technology disclosed is presented from the perspective of an edge device contributing to federated learning.
  • the edge device cooperates with an FL aggregator that is configured to receive modified tensors from a plurality of edge devices, aggregate the modified tensors with a current version of a base model tensor by federated learning to produce a new version of the base model tensor, and distribute the new version of the base model tensor to the edge devices.
  • This method includes the edge device receiving a version of the base model, including a tensor of model parameters of a deep neural network, a“DNN” and recording end user data.
  • the method includes predicting characteristics of the end user from the recorded end user data by applying the DNN and causing display of the predicted characteristics to the end user. Responsive to the display, the method includes receiving updates from the end user that correct the predicted end user characteristics.
  • the edge device performs update training of the DNN on the edge device, using the recorded user data and the corrected user characteristics, to produce a modified tensor of updated model parameters.
  • the method further includes sending at least a modified part of the modified tensor to an FL aggregator and receiving a new version of the base model tensor from the FL aggregator, after the FL aggregator has aggregated modified tensors from a plurality of edge devices with the base model by federated learning.
  • the recording, predicting, receiving updates, performing, and sending actions are repeated by the edge device in at least five cycles.
  • the actions can be repeated in at least 10 or 50 cycles or even 100 cycles.
  • An edge device, such as a mobile phone carried by an end user, is unlikely to participate in all of the cycles managed by the FL aggregator, unless data is being relayed automatically to and processed by the edge device, or an app collects data from the user on a daily basis.
  • Examples of personal devices that are capable of automatically relaying data to an edge device include a blood glucose monitor, a pacemaker, a heart rate monitor, an exercise monitor, a fall monitor, a pulse oximeter, a scale (with or without body fat estimation), and a breathing assistance device. Use of such devices can result in more frequent participation by the edge device in training cycles, even in 1,000 cycles or more.
  • Examples of applications that collect data from the user on a daily basis include diet or consumption logging applications, exercise applications and meditation applications.
  • the predicted end user characteristics can include age, height and weight.
  • the predicted end user characteristics can include mood.
  • the method can further include filtering of images before using the images for update training.
  • a face detector can be applied to determine whether a face appears in the picture, before performing update training using the picture. This can prevent training against pictures of kittens and sunsets, when the system is designed to interpret human faces.
  • the technology disclosed can be practiced as a system, method, or article of manufacture.
  • One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable.
  • One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the user of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections - these recitations are hereby incorporated forward by reference into each of the following implementations.
  • One disclosed implementation may include a tangible non-volatile computer readable storage media loaded with computer program instructions that, when executed on a server, cause a computer to implement any of the methods described earlier.
  • Another disclosed implementation may include a server system including one or more processors and memory coupled to the processors, the memory loaded with instructions that, when executed on the processors, cause the server system to perform any of the methods described earlier.
  • This system implementation and other systems disclosed optionally can also include features described in connection with methods disclosed.
  • alternative combinations of system features are not individually enumerated.
  • Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.


Abstract

Method and system with federated learning model for health care applications are disclosed. The system for federated learning comprises multiple edge devices of end users, one or more federated learner update repositories, and one or more clouds. Each edge device comprises a federated learner model, configured to send tensors to a federated learner update repository. The cloud comprises a federated learner model, configured to send tensors to the federated learner update repository. The federated learner update repository comprises a back-end configuration, configured to send model updates to the edge devices and the cloud.

Description

SYSTEM AND METHOD WITH FEDERATED LEARNING MODEL
FOR MEDICAL RESEARCH APPLICATIONS
PRIORITY APPLICATIONS
[0001] This application claims priority to or the benefit of US Patent Application No. 16/816,153, titled, “SYSTEM AND METHOD WITH FEDERATED LEARNING MODEL FOR MEDICAL RESEARCH
APPLICATIONS,” filed March 11, 2020 (Attorney Docket No. DCAI 1008-2), which claims the benefit of US Provisional Patent Application No. 62/816,880 titled,“SYSTEM AND METHOD WITH FEDERATED
LEARNING MODEL FOR MEDICAL RESEARCH APPLICATIONS," filed March 11, 2019 (Attorney Docket No. DCAI 1008-1); and US Provisional Patent Application No. 62/942,644 titled, "SYSTEMS AND METHODS OF TRAINING PROCESSING ENGINES," filed December 2, 2019 (Attorney Docket No. DCAI 1002-1). The above applications are hereby incorporated by reference for all purposes.
INCORPORATIONS
[0002] The following materials are incorporated by reference as if fully set forth herein:
[0003] U.S. Provisional Patent Application No. 62/883,639, titled“FEDERATED CLOUD LEARNING SYSTEM AND METHOD,” filed on August 6, 2019 (Atty. Docket No. 396892-991101);
[0004] U.S. Provisional Patent Application No. 62/481,691, titled“A METHOD OF BODY MASS INDEX PREDICTION BASED ON SELFIE IMAGES,” filed on April 5, 2017;
[0005] U.S. Provisional Patent Application No. 62/671,823, titled“SYSTEM AND METHOD FOR
MEDICAL INFORMATION EXCHANGE ENABLED BY CRYPTO ASSET,” filed on May 15, 2018;
[0006] Chinese Patent Application No. 201910235758.60, titled“SYSTEM AND METHOD WITH
FEDERATED LEARNING MODEL FOR MEDICAL RESEARCH APPLICATIONS,” filed on March 27, 2019;
[0007] Japanese Patent Application No. 2019-097904, titled“SYSTEM AND METHOD WITH
FEDERATED LEARNING MODEL FOR MEDICAL RESEARCH APPLICATIONS,” filed on May 24, 2019; and
[0008] U.S. Nonprovisional Patent Application No. 15/946,629, titled“IMAGE-BASED SYSTEM AND METHOD FOR PREDICTING PHYSIOLOGICAL PARAMETERS,” filed on April 5, 2018.
TECHNICAL FIELD
[0009] The disclosed system and method are in the field of machine learning. To be more specific, in the field of federated machine learning utilizing computation capability of edge devices and a federated learning (“FL”) aggregator, which is typically cloud-based, relative to the edge devices. In this context, edge devices typically are mobile devices, but also can include nodes that aggregate data from multiple users.
BACKGROUND
[0010] Traditional software (1.0) uses declarative inputs and follows deterministic trees of logic, but machine learning (2.0) deals with noisy inputs and uses probabilities. Since the beginning of epistemology, there have been two theories, top-down (Plato theory) and bottom-up (Aristotle theory). Top-down deep learning starts from a theory, not from the data. Bayesian logic combines generative models and probability theory to calculate just how likely it is that the particular answer is true given the data. Bottom-up deep learning starts from the data, not the theory. It consists of labeling large amounts of data (both "right" and "wrong" data) to determine associations and build a foundation for pattern recognition. It can even learn unsupervised, detecting patterns in data with no labels at all and identifying clusters (factor analysis). [0011] The years 2013 to 2016, the era of renewed interest in machine learning technology, were followed by the era of deep learning technology, spanning 2016 to the priority filing of this application in 2019.
2019 leads us to the next deep dive of intelligent and/or neuromorphic computing, the federated learning technology.
[0012] With machine learning, humans enter input examples and desired output, sometimes called ground truth, and a system learns. Thereafter, output comes from a trained classifier or network. The classifier or network does not have to be programmed directly, but the semantics by which it is generated are programmed. This way, humans train a classifier or network to encode complex behavior with parameters that can be thought of as rules of low complexity. Although the algorithm does not need to be programmed, these neural networks still need to be trained by humans. They need the input data to be presented in a structured way. Hence, there is a lot of human-aided labor involved in collecting, cleaning, and labeling data. Human talent also is applied to evaluating a model and steering its training in the right direction.
[0013] Deep learning applies multi-layered networks to data. While training can be automated, there remains the problem of assembling training data in the right formats and sending data to a central node of computation with sufficient storage and compute power. In many fields, sending personally identifiable, private data to any central authority causes worries about data privacy, including data security, data ownership, privacy protection and proper authorization and use of data.
[0014] In the following discussion, the technology disclosed includes systems and methods for federated learning.
SUMMARY
[0015] In one application of the technology disclosed, a crowd of end users runs application programs on mobile devices that collect data, train, compute, and evaluate data stored on the mobile devices. The original data, which is used to compute an updated model, does not leave the device where it is stored. Devices later federate data globally by sending "derived insights" in the form of updated model parameters, sometimes called tensors, to an FL aggregator where all these derived insights are combined. Devices then receive from the FL aggregator an updated matrix or model which can improve local prediction of these devices. This is repeated in cycles.
[0016] With federated learning, a device on the edge can send de-identified updates to a model, instead of sending over raw data such as images or audio that would then be used to update the model. As a result, federated learning greatly reduces privacy concerns, since the raw data never leaves these devices. Federated learning reduces data ownership concerns, as end users are enabled to opt in or out to share raw data and parameter updates created in their devices. Federated learning further greatly reduces security concerns, because there is no single point at which a security breach can compromise a large body of data - hackers cannot hack millions of mobile devices that store the raw data.
[0017] The machine learning process can be described as five steps. First, a cost function, e.g., how well the network solves the problem, which the system should strive to minimize, is defined. Second, the network is run to see how it does, as measured by the cost function. Third, the values of the network parameters are adjusted, and the network is run again. Fourth, the difference between successive results is the direction or slope in which the result of applying the network moved between the trials. This process is called a gradient. Fifth, if the slope is downhill, the parameters are adjusted to move the result in the downhill direction; if the slope is uphill, the parameters are changed to move the result in the opposite direction. Steps three to five are repeated. They may be repeated a fixed number of times or until there is limited or no improvement.
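A toy, one-parameter illustration of this five-step loop follows; the quadratic cost, learning rate, and stopping tolerance are arbitrary choices made only for the sketch.

```python
# Minimal gradient descent sketch of the five steps described above.
def cost(w):                        # step 1: define the cost to minimize
    return (w - 3.0) ** 2

w, lr, prev = 0.0, 0.1, None
for step in range(1000):
    current = cost(w)               # step 2: run and measure the cost
    grad = 2.0 * (w - 3.0)          # steps 3-4: the slope (gradient) between trials
    w -= lr * grad                  # step 5: adjust parameters to move downhill
    if prev is not None and abs(prev - current) < 1e-12:
        break                       # repeat until there is limited or no improvement
    prev = current
print(round(w, 4))                  # approaches 3.0, the minimizer of the cost
```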
[0018] The technology disclosed includes a system for federated learning utilizing computation capability of edge devices in communication with an FL aggregator. The system comprises multiple edge devices of end users, one or more federated learner update repositories, and one or more FL aggregators. Each edge device comprises a federated learner model, configured to send tensors to at least one FL aggregator or federated learner update repository.
[0019] An FL aggregator includes a federated learner, which may be part of the FL aggregator or a separate module. The FL aggregator and/or federated learner is configured to send tensors to the federated learner update repository. Federated learner update repository comprises a back-end configuration, configured to send model updates to edge devices. Of course, description of constituent parts of the FL aggregator is for purposes of explanation and not to constrain the configuration or scope of the technology disclosed.
[0020] The technology disclosed includes a method of federated learning utilizing computation capability of edge devices. The method comprises sending out tensors by multiple edge devices with federated learning models, receiving tensors by an FL aggregator including a federated learning update repository from the edge devices, distributing updated models from the federated learning update repository to the edge devices, and the edge devices using the updated models.
[0021] The technology disclosed includes a federated learning system comprising multiple federated learners, wherein each federated learner is configured to be an end user side library, built for an edge device environment. Such federated learners on edge devices update model parameters based on raw data and ground truth collected in the edge device. The edge devices perform model post-processing and share updated parameters with a central federated learner update repository. The edge devices can download updated models. They can evaluate the updated models against locally held data, preferably data withheld from training, and report evaluations to the repository or FL aggregator.
[0022] The technology disclosed includes a federated learner update repository, sometimes described as a component of an FL aggregator, comprising a federated learning back-end that collects model updates and evaluations from Flea end users. The FL aggregator can be a high availability system. It organizes models that can be updated based on data from end user edge device updates and performs operations required to make these updates, such as admitting or rejecting proposed updates from end users. Such a determination can be based on criteria and metadata sent by the end user. The FL aggregator combines admissible end user updates into an overall update and redistributes the updated model to edge devices.
[0023] This summary is provided to efficiently present the general concept of the technology disclosed and should not be interpreted as limiting the scope of the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] For purpose of facilitating understanding of the embodiments, the accompanying drawings and description illustrate embodiments thereof, its various structures, construction, method of operation, and many advantages that may be understood and appreciated. According to common practice, the various features of the drawings are not drawn to scale. To the contrary, the dimensions of the various features are expanded or reduced for the purpose of explanation and clarity.
[0025] FIG. 1 is a flow chart illustrating an example core template of machine learning workflow.
[0026] FIG. 2 is a diagram illustrating an example federated learning model with multiple edge devices and a central FL aggregator.
[0027] FIG. 3A is a diagram illustrating an example use case of a federated learner system, comprising one- to-many tensors for distributed clinical trials.
[0028] FIG. 3B is a diagram illustrating an example use case of a federated learner system, comprising Fleas for distributed clinical trials. [0029] FIG. 4 is a diagram illustrating an example FL aggregator.
[0030] FIG. 5 is a diagram illustrating an example use case of tensor globalization of a federated learner system.
[0031] FIG. 6A and FIG. 6B are diagrams illustrating an example use case of a federated learner system in a linear training trial and in an adaptive and continuously learning distributed trial, comprising federated learners and FL aggregator for application of data trial.
[0032] FIG. 7 is a diagram illustrating an example use case of a federated learner system, comprising simulated control arms for trials.
[0033] FIG. 8 is a diagram illustrating centralized data collection and training, leading to deployment to edge devices.
[0034] FIG. 9 is a diagram illustrating edge device update training followed by centralized aggregation of the updated models.
[0035] FIG. 10 is a diagram illustrating more detail of data at edge devices during update training.
[0036] FIG. 11 is a graphic user interface illustrating use of a selfie to estimate age, height and weight, from which body mass index (BMI) can be calculated.
[0037] FIG. 12 is a simplified message diagram depicting exchanges between four edge devices and an FL aggregator, over three cycles of model updating.
[0038] FIGS. 13-14 are scatter plots from edge device training on small samples and a centrally model trained on a large sample.
[0039] FIG. 15 is a conceptual diagram illustrating updating a global model from local models, applying update averaging.
[0040] FIG. 16 is an example convolutional neural network.
[0041] FIG. 17 is a block diagram illustrating training of the convolutional neural network of FIG. 16.
[0042] FIG. 18 is a simplified block diagram of a computer system that can be used to implement the technology disclosed.
DETAILED DESCRIPTION
[0043] Many alternative embodiments of the present aspects may be appropriate and are contemplated, including as described in these detailed embodiments, though also including alternatives that may not be expressly shown or described herein but as obvious variants or obviously contemplated according to one of ordinary skill based on reviewing the totality of this disclosure in combination with other available information. For example, it is contemplated that features shown and described with respect to one or more embodiments may also be included in combination with another embodiment even though not expressly shown and described in that specific combination.
[0044] For purpose of efficiency, reference numbers may be repeated between figures where they are intended to represent similar features between otherwise varied embodiments, though those features may also incorporate certain differences between embodiments if and to the extent specified as such or otherwise apparent to one of ordinary skill, such as differences clearly shown between them in the respective figures.
Introduction
[0045] The technology disclosed includes demonstrated image processing applications for frontal face images and meal images, as well as an anticipated clinical platform. Between the provisional filing of this application and the non-provisional conversion, Applicant applied federated learning to its mobile device app that estimates age, sex, height, and weight, then calculates BMI, all from a selfie, a frontal face photograph of the mobile device user.
See, e.g., Pat. App. No. 15/946,629, filed April 5, 2018, entitled“Image-based system and method for predicting physiological parameters”, which is hereby incorporated by reference. Estimated age, sex, height, weight are calculated from the selfie and reported to the user. The user corrects the estimated values. The user’s edge device updates model parameters to take into account the ground truth provided by the user. For instance, age might change from 55 to 64 years, weight from 176 to 182, and height from 5’8” to 6’2”. This ground truth is backward propagated through a network on the edge device, producing parameter adjustments. Occasionally, the updated parameters are returned to an FL aggregator. The FL aggregator periodically updates and redistributes an updated model.
[0046] An anticipated clinical platform also is disclosed. Clinical can be taken in a broad sense to include collection of health related data, such as mood or general health, which might be assessed against a voice or photographic sample. Clinical can also be taken in a pharmaceutical sense for providing a tool for contract research organizations to collect data occasionally or periodically during a traditional clinical trial. Collection of data that is partially or completely anonymized can be complemented with a so-called synthetic control arm, in lieu of giving part of the trial participants a placebo. Anonymized data can encourage frequent reporting. Receiving test doses, instead of being at risk of receiving a placebo, is further encouraging.
[0047] Mobile machine learning, in this disclosure, refers to inference on device, training on device, and federated learning, which can be applied to health care. Theoretical and practical challenges need to be faced and overcome to demonstrate a practical federated learning application, especially in a sensitive area such as health care.
[0048] A typical machine learning workflow is illustrated by FIG. 8. Having identified a problem space and a learning task, one finds a large body of data 811, 853 to train a model at a central repository 857, in a centralized manner. After being satisfied with the model, one deploys it to edge devices or to a cloud-based compute resource 859 for prediction. Typical model training involves centrally collecting the data and centrally training the model even when it is deployed in a distributed manner. This involves bringing the data 811 to a central repository 853 to gain control over how it's used in training 857.
Federated Learning
[0049] FIG. 1 is a high level flow chart of machine learning workflow.
[0050] In some embodiments, a core template of machine learning workflow comprises four steps. Step 1 is data collection, to procure raw data. Step 2 is data re-formatting, to prepare the data in the right format. Step 3 is modeling, to choose and apply a learning algorithm. Step 4 is predictive analytics, to make a prediction.
Variables that are likely to influence future events are predicted. Parameters used to make the prediction are represented in multi-dimensional matrices, called tensors.
[0051] A multi-dimensional matrix, or tensor, has certain features that commend this data representation to machine learning. Linear algebra operations are efficiently applied by GPUs and other parallel processors on computers. Linearization or differentiation make it feasible to frame optimization problems as linear algebra problems. Big data is difficult to process at scale without tensors, so many software tools have come onto the market that simplify tensor computing, e.g., Tensorlab (a Matlab package), Google TensorFlow, etc. Hardware is following software. Groups of engineers are working on tensor processing accelerator chips, e.g., NVIDIA GPUs, Google TPUs, Apple A11, Amazon Inferentia, Graviton and Echo-chip, Facebook Glow, and a whole range of technology companies that make Application-Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs) and coarse-grained reconfigurable arrays (CGRAs) adapted to calculate tensors with tensor calculation software.
[0052] FIG. 2 is a diagram illustrating an example federated learning model with multiple edge devices and a central FL aggregator.
[0053] A federated learner (Flea) can be implemented as an end user side library, built for an edge device environment, to perform local model update calculations using data collected in the edge device environment. The Flea can perform post-processing after model updating, including applying perturbations (e.g., encryption and introduction of noise for privacy purposes), sharing the model update with a central update repository (i.e., an FL aggregator), optionally downloading updated models, evaluating updated models, and sharing evaluation metrics across platforms, e.g., Flea-iOS (for iPhones), Flea-Android (for Android phones), Flea-kubernetes (for node clients), etc.
[0054] In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices. The federated workflow aggregates the local updates into a new global model 959 which will become our next base model 951 that will be used for inference and additional rounds 915 of training a federated loop. Again, updating via the federated loop 915 does not require centrally collecting data. Instead, we're sending the model to the data for training, not bringing data to the model for training. This is a decentralized workflow instead of a centralized workflow.
Health Care Space
[0055] This can be particularly helpful when dealing with sensitive data, such as medical information in the health care space. In this space, there are a number of issues around data sensitivity. It is crucial to address privacy, both to attract participation of individuals who are reluctant to share sensitive medical information and to comply with regulations.
[0056] In some circumstances, an individual may understand the research value of sharing information, but doesn't trust the organization that they're being asked to share with. The individual may wonder what third parties could gain access to their data. On the B2B side, there are intellectual property issues that thwart companies that want to collaborate, but are unable to share their raw data for IP reasons. The technology disclosed can enable collaboration without necessarily sharing data. Also on the B2B side, some companies have internal data policies that prevent even intra-company, cross-division sharing of data. These companies would benefit from collaboration without data sharing.
[0057] In the health care space, regulatory concerns can be paramount. The United States has the federal Health Insurance Portability and Accountability Act, HIPAA. The Eurozone has the GDPR. Both impose strict rules around how medical data is handled and shared.
[0058] The technology disclosed applies federated learning to an environment where it's difficult to share underlying data due to data sensitivity concerns. One of the priority applications addresses so-called vertical learning. This application focuses on so-called horizontal federated learning, in which devices have a different sample space for the same feature space, as opposed to vertical learning, which can be applied to the same sample space with different feature spaces. Horizontal learning applies well to a mobile environment, where a model can be completely shared.
[0059] Consider, with reference to FIG. 10, a data set in the form of a table 1015. This data can be visualized as a matrix with samples across rows and features down columns. The rows of data may correspond to samples used with a neural network for training. They also may correspond to a SQL-returned table and may have unique identifiers, IDs, across rows and again have columns of features. In FIG. 10, the dataset 1015 is divided horizontally among devices 953. In this horizontally partitioned dataset, each subset of the data has access to the same feature space, but has its own sample space, as one can imagine of data collected or trained on mobile phones.
[0060] Consider an image processing application and a tensor applied to images that are, for example,
224x224 pixels, prior to being sent to a neural network for inference and training by backward propagation. Images on different devices have the same feature space, but they're different images, belonging to different sample spaces. Each edge device can start with the same base model. An FL aggregator or federated learning repository or some other central authority or compute resource sends the base model to the edge device for update training, to produce updated models 957. The edge devices 953 train using respective partitions of the data 1015, producing the updated models 957, which are aggregated 959 into an updated model which can be distributed as a new base model 951. In this process, the base model resides locally on each device. Each device trains locally on data that is available on device. The federated loop aggregates the local updates to produce a new global model.
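The horizontal partitioning just described can be illustrated with a small sketch: every device keeps the full feature space (columns) but only its own samples (rows). The array sizes and device count below are arbitrary illustrations.

```python
# Illustrative horizontal (sample-wise) partitioning of a dataset across devices.
import numpy as np

rng = np.random.default_rng(7)
dataset = rng.normal(size=(12, 4))                # 12 samples (rows) x 4 shared features (columns)
partitions = np.array_split(dataset, 3, axis=0)   # split across rows, never across columns

for i, part in enumerate(partitions):
    # each device sees all 4 features, but only its own subset of samples
    print(f"device {i}: {part.shape[0]} samples, {part.shape[1]} features")
```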
[0061] One working example of horizontal learning executed in a mobile environment is the medical selfie. The medical selfie model infers the user's age, sex, height and weight from a frontal image of a user's face. This data can be used to calculate the user's body mass index, BMI, which is a data point in health care statistics. FIG. 11 depicts a graphic user interface for medical selfies. At one time, most of the information in 1153 is collapsed, and the frontal face image is visible. When estimates are given, the user is invited to correct the system's estimates of age, sex, height and weight. (BMI is calculated from the other values.) At another time, the user can expand some or all of the information panels, as in 1157, and reveal further information.
[0062] This model can be trained in a federated manner, beginning with a base model 951 trained conventionally on millions of images to produce a model that performs relatively well. This base model is sent to an edge device where it's first used to perform inference on new images collected by the user, such as selfies. The user will be given the option to correct the inferences made by the model, so that accurate age, sex, height and weight are known. With this ground truth, the base model is trained to produce an updated model. Each of the participating edge devices similarly produces local updates to the current model. Those local updates are centrally aggregated into a new base model and the process repeats. Aggregation can be performed using a federated average algorithm, applying the averaging formula 1577 in FIG. 15. This is a weighted average of the updates to the model, weighted according to the number of samples used by an edge device to produce its update. Alternatively, only updates based on a threshold number of samples would be aggregated and the aggregation could be un-weighted. In practice, the base convolution model can be a MobileNet V2 model with supplemental training that builds on transfer learning of facial images. Transfer learning can leverage training on an ImageNet classification problem. For age, sex, height and weight, custom layers can be stacked on top of an ImageNet or MobileNet V2 model.
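The sample-weighted averaging step can be sketched as below: each client update is weighted by n_k / n, where n_k is the number of samples that client trained on and n is the total across participating clients. This is an assumed-equivalent rendering of the weighted average described above, not a reproduction of formula 1577.

```python
# Illustrative sample-weighted federated averaging of client model tensors.
import numpy as np

def federated_average(client_tensors, client_sample_counts):
    weights = np.asarray(client_sample_counts, dtype=float)
    weights /= weights.sum()                           # n_k / n for each client
    stacked = np.stack(client_tensors)                 # shape: (clients, *tensor_shape)
    return np.tensordot(weights, stacked, axes=1)      # weighted mean over the client axis

clients = [np.full(3, 1.0), np.full(3, 2.0), np.full(3, 4.0)]
counts = [10, 30, 60]
print(federated_average(clients, counts))              # -> [3.1 3.1 3.1]
```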
[0063] Initial training of the base model can be offline. Then, the trained base model can be distributed to edge devices, which produce updates that are processed by the federated loop, as illustrated in FIG. 9.
Asynchronous distribution of base models and receipt of proposed updates present significant engineering challenges, which can be explained by flattening the federated loop into a message flow diagram, FIG. 12.
[0064] In FIG. 12, the horizontal axis is time. Devices are depicted on the vertical axis, including a coordinating server 1221 that manages training tasks and performs model aggregation. Below the coordinating server, the figure illustrates four edge devices 953 that perform training using local data to produce local updates of a base model. In the figure, messages travel down and up between the coordinating server 1221 and individual devices 953, each represented by a horizontal line. The stream of communications reflects asynchronous messaging, with simplifications involving just a handful of devices and grouping of communications back and forth that would likely be interleaved or multiplexed. Each of the devices 953, at unassigned times, makes a request to the server 1221, indicating their availability for training tasks, either expressly or implicitly. Supposing there are tasks available, the server 1221 responds in the affirmative and sends an updated, latest version of the model to the device, if it has not already done so. The edge device 953 will train on local data, update its local model and send the updated version of the model back to the server.
[0065] Communications between devices 953 and server 1221 are asynchronous, over network connections, and sometimes unreliable. In some cases, an edge device or client makes a request for a training task, but does not receive a response from the server. This can be represented by an upward arrow, for instance near the beginning of cycle 1223, without a responsive downward arrow. In other cases, the client might request and receive an assignment and current model version, but never upload an updated model. In other cases, a client may participate multiple times during a given training cycle. The server 1221 checks to make sure that updates received apply to a current version of the base model, that the edge device is not updating a deprecated base model version. A cycle, such as 1213, 1215 or 1217, eventually reaches a predetermined threshold. This threshold could be expressed as a number of clients that have participated in the round, as a number of training samples processed in the updated models, or as an elapsed amount of time. Each of the cycles corresponds to one round of the federated loop 915 that produces a new global model (959, which becomes 951), and to distribution to the edge devices of the updated, new model. The edge devices can use the new model for predictions and training as additional data is collected.
Preferably, the edge devices do not repeatedly train using old data that previously was used to train an updated model that was forwarded to the server 1221 for aggregation. The process repeats, as depicted for three cycles in FIG. 12.
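A rough sketch of the coordinating server's bookkeeping described for FIG. 12 appears below: version-tagged model distribution, rejection of updates against a deprecated version, and closing a cycle once a simple participation threshold is reached. The class name, threshold, and aggregation rule are illustrative assumptions rather than the deployed implementation.

```python
# Hypothetical coordinating-server cycle: assign tasks, check versions,
# aggregate when enough clients have reported, and advance the model version.
import numpy as np

class CoordinatingServer:
    def __init__(self, base_tensor, clients_per_cycle=3):
        self.version = 1
        self.base = base_tensor
        self.pending = []                      # (tensor, sample_count) received this cycle
        self.clients_per_cycle = clients_per_cycle

    def assign_task(self):
        # Reply to a device announcing availability: current version and base model.
        return self.version, self.base.copy()

    def receive_update(self, version, tensor, sample_count):
        if version != self.version:
            return False                       # update against a deprecated base model: rejected
        self.pending.append((tensor, sample_count))
        if len(self.pending) >= self.clients_per_cycle:
            self._aggregate()                  # threshold reached: close the cycle
        return True

    def _aggregate(self):
        tensors, counts = zip(*self.pending)
        weights = np.asarray(counts, float) / sum(counts)
        self.base = np.tensordot(weights, np.stack(tensors), axes=1)
        self.version += 1
        self.pending = []

server = CoordinatingServer(np.zeros(3))
v, model = server.assign_task()
server.receive_update(v, model + 0.1, 5)
server.receive_update(v, model + 0.3, 5)
server.receive_update(v, model + 0.2, 10)      # third update closes the cycle
print(server.version, server.base)             # -> 2 [0.2 0.2 0.2]
```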
[0066] The engineering challenges are significant. One challenge arises from networking issues and latency as devices join and leave the network. Another challenge is that the mobile model is unquantized and includes on the order of 20 megabytes of model parameters. It is useful to make sure that the model is not updated too often over cellular data connections. Updating also hits the mobile device's power constraints, as training on a mobile phone is resource intensive and, therefore, power hungry. In some implementations, training is limited to times when the phone is plugged in, has a Wi-Fi connection and is not otherwise in use by the user.
[0067] On the server side, asynchronous task management requires record keeping and tracking of all of the training tasks and local updates in process across numerous edge devices. It also involves periodically performing aggregation and redeploying updated models. In addition to these engineering challenges, there are theoretical concerns, arising from classical statistics, that can only be overcome by empirical investigation.
[0068] In experiments performed thus far, federated training actually has worked. FIGS. 13-14 illustrate a scatter plot of data from a so-called In‘n Out model that was trained to distinguish between photographs taken indoors and out of doors. FIG. 13 plots a loss function, for which a lower value is better, except in case of overtraining. FIG. 14 plots a likelihood of correct binary classification, for which a higher value is better.
[0069] The scatterplot in FIG. 13 graphs local losses versus global training loss for a binary classification test model that was deployed internally by Applicant. Towards the left, dot 1353 is the global training loss of the original base model. Other dots clumped to the left 1351 and scattered to the right 1355, 1359, are the local losses of that model trained on individual devices, which sent their models to a server (e.g., 1221). The graph shows two sorts of bad results resulting from training on end devices with small sample sets. First, in some cases, e.g., 1359, the local losses explode off to the right. This suggests that something has gone badly in training that caused the gradient descent to jump out of a local minimum found during the initial global training. Second, the local loss sometimes dives towards zero, which indicates overfitting of the local training data. This is a recognized issue with small sample sizes, relative to the initial sample size that we used to produce the global model. FIG. 14 depicts the corollary accuracy of the original base model. The accuracy 1453 of the initial base model was roughly 90 percent for a binary classification problem. The local accuracy 1455 is clustered near 50 percent. The updates to the models that are sent back to the server for aggregation, when tested against samples held back for validation, have an accuracy that hovers around 50 percent, between 40 and 60 percent, which suggests that the local updates to the models are no better than random guesses at binary classification.
[0070] With excitement, these inventors determined that the federated average of the updated models actually produced a model that was slightly better than the base model before aggregation. The aggregated model accuracy is represented by a blue dot 1459 just to the right of the red dot 1453. Again, the average of worse models produced an improved, better model. That is extraordinarily counterintuitive, given the position near random chance of cluster 1455.
[0071] FIG. 15 is a conceptual depiction, appealing to intuition, of why the averaged bad models might work well. Imagine beginning with a base model in a two dimensional space 1513 with a good decision boundary that accurately classifies gray dots, below the line, against black dots, above the line. Then, we send this global model to two devices and train it on those devices, producing the decision boundaries illustrated in upper and lower graphs 1515 and 1575. In the upper graph, two yellow dots 1536 and three new gray dots 1534 are added in the bottom half of 1515. The new dots have pulled the decision boundary down and to the right, separating the new dots. In the lower graph 1575, we've added four purple dots 1564 in the top left corner, representing new black samples. The added samples have pulled the decision boundary up and to the left, in the direction of the new dots. In both cases, the result is a decision boundary that is actually worse, resulting in misclassification of some of the original samples. Counterintuitively, when we average the 1515 and 1575 decision boundaries to produce 1517, which corresponds to averaging the weights 1577 that describe those decision boundaries, we end up with a boundary that is close to the original one and that accurately classifies both the original and added samples.
[0072] Conceptually, this is what federated averaging is doing. It is able to take updated models, in which decision boundaries have been pulled in opposite directions, and average them into something closer to the center that improves on a base model. Of course, this is happening in very high dimensional space, with on the order of four million parameters for a MobileNet model. Projecting what happens onto a two-dimensional space helps us develop an intuition for how this might work.
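As an illustration only, the weight averaging that paragraph [0072] describes can be sketched in a few lines of Python with NumPy. This is a minimal sketch of sample-weighted federated averaging over flattened parameter tensors, not the deployed aggregation code; the function name and arguments are hypothetical.

import numpy as np

def federated_average(update_tensors, sample_counts):
    """Average updated parameter tensors, weighted by local sample counts.

    update_tensors: list of 1D arrays, one per edge device, all the same shape.
    sample_counts:  number of local training samples behind each update.
    """
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()                       # normalize to sum to 1
    stacked = np.stack(update_tensors)             # shape: (num_devices, num_params)
    return np.einsum('d,dp->p', weights, stacked)  # weighted average per parameter

# Two divergent local updates average back toward a central boundary.
new_global = federated_average([np.array([0.9, -1.2]), np.array([-1.1, 1.0])], [30, 50])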
[0073] Classical statistics pose additional theoretical challenges to federated learning, as user device data collection and training defeats any guarantee or assumption that training data is independent and identically distributed, IID. This is a distinguishing feature of federated learning. The strong statistical guarantee is what allows a system with high dimensionality to make inferences about a wider population of data; that guarantee is put at risk by including in our training set samples collected by edge devices. Consider the medical selfie example again. We're training the initial model on a library of selfies and sending it to an edge device for training on more selfies, including performing inference on the new selfies. When we send the model to the edge device for training, we are potentially exposing it to any image that a user can take on a mobile phone. We are no longer training the model on just selfies, but also on kittens and houseplants and sunsets and so on. Exposing the model to a different population than our target population for actual training and inference means that we've lost the strong statistical guarantee that our training will produce results that generalize. To address this, beyond training users, we can filter image capture and updating, before and after data leaves the edge device.
[0074] First, the technology disclosed can be enhanced by putting a filter in front of model training, to try to constrain the sample population collected and used for training on edge devices, bringing it closer to the intended target population. Second, we can put a filter at the FL aggregator to threshold out some updates that appear to have been trained on bad data.
[0075] The first filter can limit training to images of selfies, instead of exposing the model to all kinds of images. A face detector in front of the model does not treat sunrises or house plants as faces. Then, the edge devices are training on any image that has a face in it, which is mostly selfies but could include some other images. That brings us closer to the target population.
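A minimal sketch of such a pre-training gate follows, as an illustration only. The detect_faces helper is a hypothetical stand-in for whatever on-device face detector an implementation uses; only images that pass the gate are added to the local training set.

def filter_training_images(candidate_images, detect_faces):
    """Keep only images containing at least one face for local update training.

    candidate_images: iterable of images captured on the edge device.
    detect_faces:     hypothetical detector returning a list of face boxes.
    """
    kept = []
    for image in candidate_images:
        if len(detect_faces(image)) > 0:  # sunsets and houseplants are dropped
            kept.append(image)
    return kept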
[0076] On the back end, the technology disclosed can be enhanced by filtering out some of the updates that appear to be very bad. The training that produced wildly divergent updates potentially resulted from being exposed to bad training data, or training data that has been mislabeled, such as a typo in a person’s weight or height. Consider again our local losses versus our global training loss graph in FIG. 13. Recall that some of these losses, e.g., 1359, explode off to the right. A second filter can eliminate those updates from being averaged into the model, where it appears that the local update has jumped too far outside of the original local minimum.
[0077] Intuitively, this corresponds to updated models that have very badly malformed decision boundaries, which could result from bad training data, such as mislabeled training data. In any case, we want to measure some kind of distance between local updates and the original model. One measure would be a simple Euclidean distance across all the weights, evaluated relative to the distribution of distances among local updates in a batch. The distribution can be used to filter out updated models that are very bad or divergent. This should allow us to restrict our aggregation by federated averaging to updated models that have been trained on a population of data that is similar to our target population.
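A minimal sketch of this second filter, under the assumptions just stated, follows as an illustration only. The Euclidean distance to the base tensor and outlier judgment against the batch's own distribution of distances come from the paragraph above; the three-standard-deviation cutoff is an assumed example value, not a required one.

import numpy as np

def filter_divergent_updates(base_tensor, update_tensors, z_cutoff=3.0):
    """Reject updates whose distance from the base model is an outlier in the batch."""
    distances = np.array([np.linalg.norm(u - base_tensor) for u in update_tensors])
    mean, std = distances.mean(), distances.std()
    if std == 0:
        return update_tensors  # all updates equally distant; keep them all
    keep = np.abs(distances - mean) <= z_cutoff * std
    return [u for u, ok in zip(update_tensors, keep) if ok]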
[0078] Empirical results have been good. Internal research by Applicant, and actual test deployments, are showing good federated learning results. Despite the loss of the classical, strong IID statistical guarantee, we end up with empirical results that are good. Of course, this depends on class size and sample size, as well as hyperparameters of the model. It also would be impacted by implementation of the filters described. The inventors have concluded that federated learning works and is a viable approach to machine learning for a range of health space tasks.
[0079] Another theoretical issue is the kind of privacy guarantees that can be made when federated learning is implemented. Does this approach leak any information about the training data? Can the input be reconstructed from the updates? Two approaches to ensuring privacy during horizontal federated training bear consideration. First is the practice of adding noise to a statistic to mask its true value. Research has shown that this technique can be applied in both federated and non-federated contexts to mask the participation of a sample or even an entire client in a training round.
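As an illustration only, the noise-addition idea can be sketched as follows. This is a minimal sketch, assuming Gaussian noise added to a norm-clipped update before it leaves the device; the clipping norm and noise scale are hypothetical parameters, and a real deployment would calibrate them to a formal privacy budget.

import numpy as np

def noisy_update(update_tensor, clip_norm=1.0, noise_scale=0.1, rng=np.random.default_rng()):
    """Clip an update's norm, then add Gaussian noise to mask the client's contribution."""
    norm = np.linalg.norm(update_tensor)
    clipped = update_tensor * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_scale * clip_norm, size=clipped.shape)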
[0080] Second, homomorphic encryption can be considered. This approach applies a series of computations to a cipher text and then deciphers the results, ending up with the same results as if that series of computations had been applied to the original plain text. However, homomorphic encryption may only work with linear transformations and linear approximations of non-linear transformations.
Overall Approach
[0081] With this example in mind, we return to describing the overall approach. As shown in FIG. 2, Flea end users can communicate and collaborate with one another (potentially in tandem with one or more FL aggregator backends) to build and update models of computation in multiple ways. These configurations are described in the context of medical research use cases. A general discussion regarding convolutional neural networks, CNNs, and training by gradient descent is facilitated by FIGS. 16-17.
CNNs
[0082] A convolutional neural network is a special type of neural network. The fundamental difference between a densely connected layer and a convolution layer is this: Dense layers learn global patterns in their input feature space, whereas convolution layers learn local patterns: in the case of images, patterns found in small 2D windows of the inputs. This key characteristic gives convolutional neural networks two interesting properties: (1) the patterns they learn are translation invariant and (2) they can learn spatial hierarchies of patterns.
[0083] Regarding the first, after learning a certain pattern in the lower-right corner of a picture, a convolution layer can recognize it anywhere: for example, in the upper-left corner. A densely connected network would have to learn the pattern anew if it appeared at a new location. This makes convolutional neural networks data efficient because they need fewer training samples to learn representations that have generalization power.
[0084] Regarding the second, a first convolution layer can learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layers, and so on. This allows convolutional neural networks to efficiently learn increasingly complex and abstract visual concepts.

[0085] A convolutional neural network learns highly non-linear mappings by interconnecting layers of artificial neurons arranged in many different layers with activation functions that make the layers dependent. It includes one or more convolutional layers, interspersed with one or more sub-sampling layers and non-linear layers, which are typically followed by one or more fully connected layers. Each element of the convolutional neural network receives inputs from a set of features in the previous layer. The convolutional neural network learns concurrently because the neurons in the same feature map have identical weights. These local shared weights reduce the complexity of the network such that, when multi-dimensional input data enters the network, the convolutional neural network avoids the complexity of data reconstruction in the feature extraction and regression or classification process.
[0086] Convolutions operate over 3D tensors, called feature maps, with two spatial axes (height and width) as well as a depth axis (also called the channels axis). For an RGB image, the dimension of the depth axis is 3, because the image has three color channels: red, green, and blue. For a black-and-white picture, the depth is 1 (levels of gray). The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an output feature map. This output feature map is still a 3D tensor: it has a width and a height. Its depth can be arbitrary, because the output depth is a parameter of the layer, and the different channels in that depth axis no longer stand for specific colors as in RGB input; rather, they stand for filters. Filters encode specific aspects of the input data: at a high level, a single filter could encode the concept “presence of a face in the input,” for instance.
[0087] For example, the first convolution layer takes a feature map of size (28, 28, 1) and outputs a feature map of size (26, 26, 32): it computes 32 filters over its input. Each of these 32 output channels contains a 26 x 26 grid of values, which is a response map of the filter over the input, indicating the response of that filter pattern at different locations in the input. That is what the term feature map means: every dimension in the depth axis is a feature (or filter), and the 2D tensor output [:, :, n] is the 2D spatial map of the response of this filter over the input.
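The shape arithmetic in paragraph [0087] can be checked with a few lines of Keras, shown here as an illustration only; any deep learning framework with a 2D convolution layer would do, and the 3 x 3 window size is assumed from the following paragraphs.

from tensorflow import keras

# A (28, 28, 1) feature map convolved by 32 filters of size 3 x 3,
# without padding, yields a (26, 26, 32) output feature map.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=32, kernel_size=(3, 3)),
])
print(model.output_shape)  # (None, 26, 26, 32)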
[0088] Convolutions are defined by two key parameters: (1) size of the patches extracted from the inputs - these are typically 1 x 1, 3 x 3 or 5 x 5 and (2) depth of the output feature map - the number of filters computed by the convolution. Often these start with a depth of 32, continue to a depth of 64, and terminate with a depth of 128 or 256.
[0089] A convolution works by sliding these windows of size 3 x 3 or 5 x 5 over the 3D input feature map, stopping at every location, and extracting the 3D patch of surrounding features (shape (window height, window width, input depth)). Each such 3D patch is then transformed (via a tensor product with the same learned weight matrix, called the convolution kernel) into a 1D vector of shape (output depth,). All of these vectors are then spatially reassembled into a 3D output map of shape (height, width, output depth). Every spatial location in the output feature map corresponds to the same location in the input feature map (for example, the lower-right corner of the output contains information about the lower-right corner of the input). For instance, with 3 x 3 windows, the vector output [i, j, :] comes from the 3D patch input [i-1 : i+1, j-1 : j+1, :]. The full process is detailed in FIG. 16.
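As an illustration of the patch-extraction-and-tensor-product description above, here is a minimal NumPy sketch of a valid (no padding, stride 1) convolution over a 3D feature map. It is written for clarity rather than speed and is not the implementation used by any particular layer.

import numpy as np

def conv2d_valid(feature_map, kernel, bias):
    """feature_map: (H, W, C_in); kernel: (kh, kw, C_in, C_out); bias: (C_out,)."""
    H, W, _ = feature_map.shape
    kh, kw, _, c_out = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i:i + kh, j:j + kw, :]                 # 3D patch of surrounding features
            out[i, j, :] = np.tensordot(patch, kernel, axes=3) + bias  # tensor product -> 1D vector
    return out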
[0090] The convolutional neural network comprises convolution layers which perform the convolution operation between the input values and convolution filters (matrices of weights) that are learned over many gradient update iterations during the training. Let (m, n) be the filter size and W be the matrix of weights; then a convolution layer performs a convolution of W with the input X by calculating the dot product W · x + b, where x is an instance of X and b is the bias. The step size by which the convolution filters slide across the input is called the stride, and the filter area (m x n) is called the receptive field. A same convolution filter is applied across different positions of the input, which reduces the number of weights learned. It also allows location invariant learning, i.e., if an important pattern exists in the input, the convolution filters learn it no matter where it is in the sequence.

Training a Convolutional Neural Network
[0091] FIG. 17 depicts a block diagram of training a convolutional neural network in accordance with one implementation of the technology disclosed. The convolutional neural network is adjusted or trained so that the input data leads to a specific output estimate. The convolutional neural network is adjusted using back propagation based on a comparison of the output estimate and the ground truth until the output estimate progressively matches or approaches the ground truth.
[0092] The convolutional neural network is trained by adjusting the weights between the neurons based on the difference between the ground truth and the actual output. This is mathematically described as:

\Delta w_i = x_i \, \delta , \quad \text{where } \delta = (\text{ground truth}) - (\text{actual output})

[0093] In one implementation, the training rule is defined as:

w_{nm} \leftarrow w_{nm} + \alpha \, (t_m - \varphi_m) \, a_n

[0094] In the equation above: the arrow indicates an update of the value; t_m is the target value of neuron m; \varphi_m is the computed current output of neuron m; a_n is input n; and \alpha is the learning rate.
[0095] The intermediary step in the training includes generating a feature vector from the input data using the convolution layers. The gradient with respect to the weights in each layer, starting at the output, is calculated. This is referred to as the backward pass, or going backwards. The weights in the network are updated using a combination of the negative gradient and previous weights.
[0096] In one implementation, the convolutional neural network uses a stochastic gradient update algorithm (such as ADAM) that performs backward propagation of errors by means of gradient descent. One example of a sigmoid function based back propagation algorithm is described below:
\varphi = \frac{1}{1 + e^{-h}}

[0097] In the sigmoid function above, h is the weighted sum computed by a neuron. The sigmoid function has the following derivative:

\frac{\partial \varphi}{\partial h} = \varphi \, (1 - \varphi)

[0098] The algorithm includes computing the activation of all neurons in the network, yielding an output for the forward pass. The activation of neuron m in the hidden layers is described as:

\varphi_m = \frac{1}{1 + e^{-h_m}}, \quad h_m = \sum_{n=1}^{N} a_n \, w_{nm}

[0099] This is done for all the hidden layers to get the activation described as:

\varphi_k = \frac{1}{1 + e^{-h_k}}, \quad h_k = \sum_{m=1}^{M} \varphi_m \, v_{mk}

[00100] Then, the error and the correct weights are calculated per layer. The error at the output is computed as:

\delta_{ok} = (t_k - \varphi_k) \, \varphi_k \, (1 - \varphi_k)

[00101] The error in the hidden layers is calculated as:

\delta_{hm} = \varphi_m \, (1 - \varphi_m) \sum_{k=1}^{K} v_{mk} \, \delta_{ok}

[00102] The weights of the output layer are updated as:

v_{mk} \leftarrow v_{mk} + \alpha \, \delta_{ok} \, \varphi_m

[00103] The weights of the hidden layers are updated using the learning rate \alpha as:

w_{nm} \leftarrow w_{nm} + \alpha \, \delta_{hm} \, a_n
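The update equations in paragraphs [0097]-[00103] can be illustrated with a small NumPy sketch of one forward and backward pass through a single hidden layer. This is offered as an illustration of the equations only, with hypothetical array names, not as code from the disclosed system.

import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def backprop_step(a, t, W, V, alpha=0.1):
    """One training step of a sigmoid network: inputs a, targets t,
    input-to-hidden weights W (N x M), hidden-to-output weights V (M x K)."""
    phi_m = sigmoid(a @ W)                           # hidden activations, paragraph [0098]
    phi_k = sigmoid(phi_m @ V)                       # output activations, paragraph [0099]
    delta_ok = (t - phi_k) * phi_k * (1 - phi_k)     # output error, paragraph [00100]
    delta_hm = phi_m * (1 - phi_m) * (V @ delta_ok)  # hidden error, paragraph [00101]
    V += alpha * np.outer(phi_m, delta_ok)           # output weight update, paragraph [00102]
    W += alpha * np.outer(a, delta_hm)               # hidden weight update, paragraph [00103]
    return W, V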
[00104] In one implementation, the convolutional neural network uses a gradient descent optimization to compute the error across all the layers. In such an optimization, for an input feature vector x and the predicted output \hat{y}, the loss function is defined as l for the cost of predicting \hat{y} when the target is y, i.e., l(\hat{y}, y). The predicted output \hat{y} is transformed from the input feature vector x using function f. Function f is parameterized by the weights of the convolutional neural network, i.e., \hat{y} = f_w(x). The loss function is described as l(\hat{y}, y) = l(f_w(x), y), or Q(z, w) = l(f_w(x), y), where z is an input and output data pair (x, y). The gradient descent optimization is performed by updating the weights according to:

v_{t+1} = \mu \, v_t - \alpha \, \nabla_w \frac{1}{n} \sum_{i=1}^{n} Q(z_i, w_t)

w_{t+1} = w_t + v_{t+1}

[00105] In the equations above, \alpha is the learning rate. Also, the loss is computed as the average over a set of n data pairs. The computation is terminated when the learning rate \alpha is small enough upon linear convergence. In other implementations, the gradient is calculated using only selected data pairs fed to a Nesterov’s accelerated gradient and an adaptive gradient to inject computation efficiency.

[00106] In one implementation, the convolutional neural network uses a stochastic gradient descent (SGD) to calculate the cost function. A SGD approximates the gradient with respect to the weights in the loss function by computing it from only one, randomized, data pair, z_t, described as:

v_{t+1} = \mu \, v_t - \alpha \, \nabla_w Q(z_t, w_t)

w_{t+1} = w_t + v_{t+1}

[00107] In the equations above: \alpha is the learning rate; \mu is the momentum; and w_t is the current weight state before updating. The convergence speed of SGD is approximately O(1/t) when the learning rates are reduced both fast and slow enough. In other implementations, the convolutional neural network uses different loss functions such as Euclidean loss and softmax loss. In a further implementation, an Adam stochastic optimizer is used by the convolutional neural network.
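As an illustration only, the momentum SGD update in paragraphs [00106]-[00107] can be written directly in NumPy. This is a minimal sketch assuming a generic gradient function; it is not tied to any particular loss or network.

import numpy as np

def sgd_momentum_step(w, v, grad_fn, z_t, alpha=0.01, mu=0.9):
    """One stochastic gradient descent step with momentum.

    w:       current weights (1D array)
    v:       current velocity (1D array)
    grad_fn: returns the gradient of Q(z, w) with respect to w
    z_t:     a single randomized (input, target) data pair
    """
    v_next = mu * v - alpha * grad_fn(z_t, w)  # v_{t+1} = mu*v_t - alpha*grad Q(z_t, w_t)
    w_next = w + v_next                        # w_{t+1} = w_t + v_{t+1}
    return w_next, v_next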
Model Exchange in Federated Learning
[00108] In some embodiments, Flea end users communicate and collaborate with one another to build and update models, effecting a lateral tensor ensemble of user models, in a one-to-one manner. The end users could also laterally organize their own trials and choose a central FL aggregator to which to send the gradients and from which to get the averaged gradients back in a distributed fashion.

[00109] In yet some other embodiments of the disclosure, tensors are configured to function as tensorial handshakes, with one-to-one tensors for distributed clinical trials. End users can also laterally organize their own trials and choose a central FL aggregator to send the gradients to and get the averaged gradients back in a distributed fashion.
[00110] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in a tensor economy in a many-to-one manner, with tensors for distributed clinical trials. Each end user can be called upon by several sponsors to conduct several trials at the same time and can use the same underlying data to create new tensors.
[00111] In yet some other embodiments of the disclosure, there are many-to-one tensors for distributed clinical trials. Each end user can be called upon by several sponsors to conduct several data trials during the same period of time.
[00112] In some embodiments, Flea end users communicate and collaborate with one another to build and update models in autonomous tensor ensembles, in a many-to-many manner. Just as algorithms start to write themselves, devices without human intervention will start to collect information from each other. These devices will behave like many insect species, including ants and bees, who work together in colonies, and whose cooperative behavior determines the survival of the entire group. The group operates like a single organism, with each individual in a colony acting like a cell in the body, and the group becomes a "superorganism". Federated deep learning only needs these small players, like insects, ants, critters and bees, to create big and smart things with immense, complex and adaptive social power and ambitious missions.
[00113] In yet some other embodiments of the disclosure, there are many-to-many tensors for distributed clinical trials. Just as algorithms start to write themselves, devices are configured to collect information from each other without human intervention. Cheap micro-computer units (MCUs) can soon be deployed anywhere, without mains power, docking, or battery replacement. MCUs can be configured to behave like many insect species, including ants and bees, who work together in colonies. The cooperative behavior of the group of MCUs determines the survival of the entire group. The group operates like a single organism, with each individual in a colony acting like a cell in the body, and the group becomes a "superorganism.” A federated deep learning algorithm only requires these small players, like insects, ants, critters and bees, to create big and smart things with immense, complex and adaptive social power and ambitious missions.
[00114] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles, in a one-to-many manner. With federated learning, a global protocol is sent from one central authority to many participants, who collect information on their edge devices, label the information and compute it locally, after which they send the tensors to the central FL aggregator of the sponsor. The FL aggregator aggregates all the tensors and then reports the updated and averaged tensors back to each of the participants.
Clinical Trials
[00115] FIG. 3A is a diagram illustrating an example use case of a traditional clinical trial where the one-to-many tensors for distributed clinical trials could be applied.
[00116] In some embodiments, tensor ensembles are vertical in a one-to-many structure, called Vertical Tensor Ensembles. Most clinical trials are centralized and consist of one sponsor who centrally produces the protocol and uses several sites where many end users can go for physical exams and laboratory tests. This procedure is time consuming and costly and mostly outsourced to Contract Research Organizations (CROs). With federated learning, a global protocol is sent from one central authority to many end users, who collect information on their edge devices, e.g. smartphones, label the information and compute it locally, after which the outcome tensors are sent to the central FL aggregator of the sponsor. The central authority aggregates all the tensors and then reports the updated and averaged tensors back to each of the end users. These one-to-many tensors are configured to conduct distributed clinical trials.
[00117] FIG. 3B is a diagram illustrating an example of using a federated learner system to conduct one-to-many tensor exchanges for distributed clinical trials, using so-called Fleas.
[00118] In some embodiments, a sponsor of a digital clinical trial, typically a data trial, announces the data trial directly to end users via an application program installed on end users’ devices. Each end user device includes a federated learner. The federated learners are configured to share tensors with a centralized FL aggregator. The centralized FL aggregator is configured to share with the sponsor only a global model, not data or model updates from individual end users.
[00119] In some embodiments, a sponsor of a data trial announces the trial directly to end users. End users are free to choose from many specific sites to participate in the data trial. Each of these specific sites is configured to be connected with a CRO which hosts an FL aggregator. Similarly, federated learners of devices are configured to share tensors on data with the CRO FL aggregator. The CRO centralized FL aggregator is configured to share with the sponsor only a global model, not data or model updates from individual end users.
[00120] Both of these embodiments, compared to a traditional clinical trial procedure involving an Institutional Review Board (IRB), improve the efficiency of clinical trials drastically. End users enjoy far better flexibility in participating in clinical trials. The one-to-many trials reduce the need for a CRO from a data management perspective for a pharmaceutical company. End users are not sharing data, just trained models’ weights. End users have the option to go to a preferred site of their choice, instead of being limited to a site chosen and assigned to them. This also means more virtual trials are possible without introducing data quality issues. The FL aggregator intermediary, either a centralized FL aggregator or a CRO having a licensed FL aggregator, can do the global averaging of the weights. A sponsor, such as a pharmaceutical company, doesn’t do the global averaging of the weights, thus removing doubts of any bias by the sponsor. The audits are on the weights and algorithms, thus removing most human bias in checking data quality.
[00121] FIG. 4 is a diagram illustrating an example FL aggregator. In this example, a Flea is configured to be embedded in various edge devices belonging to end users. Such edge devices can be, but are not limited to, any electronic device capable of connecting to the internet or a similar network, for example, mobile phones, smart watches, sensor modules in a car or home, or a cloud server.
[00122] An FL aggregator is designed as a federated learning back-end, which requires high availability, responsible for collecting model updates and evaluations sent from Flea end users; organizing models that can be updated from end user side updates, along with the operations required to perform these updates; and admitting or rejecting proposed updates from each end user based on criteria such as the history of the end user’s submissions (e.g. an end user’s credibility score) as well as end user sent metadata. The FL aggregator aggregates admissible end user updates into a single update to each model and redistributes updated models to the end user side. The FL aggregator reports aggregations of model evaluations based on similar admissibility criteria as those used in updates. It conducts tensorial handshakes, which are protocols that govern the exchange of information between federated learners running on end user devices and the FL aggregator, or amongst collectives of federated learners, on the initiative of end users themselves.
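As an illustration only, the admission-and-aggregation responsibilities just listed can be sketched as a small server-side loop. The class and method names, the credibility threshold, and the simple mean aggregation are hypothetical, shown only to make the division of responsibilities concrete.

import numpy as np

class FLAggregator:
    """Minimal sketch of the back-end responsibilities described in [00122]."""
    def __init__(self, base_tensor, expected_version, min_credibility=0.5):
        self.base_tensor = base_tensor
        self.expected_version = expected_version
        self.min_credibility = min_credibility
        self.admitted = []

    def submit(self, update_tensor, credibility_score, metadata):
        # Admit or reject based on end user credibility and sent metadata.
        if credibility_score < self.min_credibility:
            return False
        if metadata.get("base_version") != self.expected_version:
            return False
        self.admitted.append(update_tensor)
        return True

    def aggregate_and_redistribute(self):
        # Aggregate admissible updates into a single update and publish the new model.
        if self.admitted:
            self.base_tensor = np.mean(np.stack(self.admitted), axis=0)
            self.admitted = []
        return self.base_tensor  # redistributed to the end user side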
[00123] FIG. 5 is a diagram illustrating an example use case of tensor globalization of a federated learner system. Consider the example of a biotech company that has a federated learner model trained for Parkinson’s disease. Traditionally, most clinical trials are centralized. They consist of one sponsor who centrally produces the protocol and uses several sites where the many participants can go for exams and tests. This procedure is time consuming and costly and mostly outsourced to Clinical Research Organizations (CROs).

[00124] New alternatives are now becoming available as the technologies disclosed, which manipulate tensors as proxies for data, evolve. The distributed structure of a clinical trial, instead of being flat, can be curved into an n-dimensional manifold or surface. This also changes the nature of models. Models themselves are simply tensor ensembles. As edge computational units become more powerful, each computational unit on the edge can house its own model.
[00125] Between edge units, both data-derived tensors and model ensembles can be freely exchanged.
[00126] The FL aggregator is configured to be provided at least a federated learner model and a multidimensional matrix. The tensors coming out of that model are to be averaged with the tensors of the biotech model. The biotech company gets the global model back.
[00127] Another example use case applies the technology disclosed to an application program used by millions of members who regularly use the application for a function, leaving digital traces that reveal the members’ interests in a data trail. For instance, someone may look for restaurants. In this example, the tech company requires user feedback in order to improve the quality of its prediction model to serve users better. The tech company gives this input to the FL aggregator and gets the tensors back, asynchronously or synchronously. In doing so, the raw data of end users is not used, and the privacy of end users is not invaded. The tech company only gets a global model of the interests of the entire population and a more precise model in different behavioral segments that enables it to target specific predicted actions. The company can also share either the global tensors or the precision tensors, should it want to. No data is transported; inferences can be drawn by applying the tensors, without access to underlying user data.
[00128] FIGS. 6A-6B are diagrams illustrating example use cases of a federated learner system in a linear training trial and in an adaptive and continuously learning distributed trial, comprising federated learners and an FL aggregator applied to collection and analysis of a data trial.
[00129] With a federated learner and FL aggregator, clinical trials do not require site visits. In a site visit model, CROs receive the data from the sites, which is an arduous data collection process that takes significant time. The CROs analyze the data once the trial is complete, which takes a significant amount of time and money. Correcting model errors is expensive, especially if a part of the trial has to be reevaluated. With a federated learner, trials are in real-time, especially because end points of the trials are already being built as prediction models or analytics. Administrators can control the data training and frequency behind the scenes, and it is the algorithms that are adaptive, instead of humans in a CRO. Trials are more streamlined and parallelized. The speed of a trial is significantly improved, even though it may possibly mean failing fast. Feedback loops are much faster, and the sponsors or CROs get a much better idea whether the trial is even working correctly from early on.
[00130] An end user can use a site of their choice, provided the site is also chosen with the trial. The data on end user’s phone is used for training the model relevant to the end point of the trial. Since the analytics and model are not an after-trial completion artifact but living and real-time with the federated learner, administrators of the trial can quickly adapt to issues of bias, confounding influences, etc. This speeds up trials. End users can be virtual or onsite. Additionally, trials can collect real world data from user devices that provides more dimensions for training.
[00131] FIG. 7 is a diagram illustrating an example use case of a federated learner system, including one or more simulated control arms for the application of a data trial. So-called synthetic control arms are configured to operate via collected data at large scale over an existing population. See, e.g., Goldsack, Synthetic control arms can save time and money in clinical trials (Feb. 5, 2019) <accessed at dub dub dub at statnews.com 2019/2/5/synthetic-control-arms-clinical-trials/>; Medidata, De-risk Go/No Go Product Development Decisions by Reusing Patient Trial Data: MEDS Synthetic Control Arms & Synthetic Control Data (2019) <accessed at dub dub dub dot medidata.com/en/white-paper/de-risk-go-no-go-product-development-decisions-by-reusing-patient-trial-data-meds-synthetic-control-arms-synthetic-control-data-2/>. The same populations can be used to train generative models for similar populations. These generative models can cause a many-fold increase in the utility of the population based on its simulated characteristics.
[00132] Instead of collecting data from patients recruited for a trial who have been assigned to the control or standard-of-care arm, synthetic control arms model those comparators using real-world data that has previously been collected from sources such as health data generated during routine care, including electronic health records, administrative claims data, patient-generated data from fitness trackers or home medical equipment, disease registries, and historical clinical trial data, etc. This can be done via a federated learning model with edge devices sending up gradients to at least one FL aggregator.
[00133] Synthetic control arms bring clear benefits to the pharmaceutical industry and its applications. They can reduce or even eliminate the need to enroll control end users, and improve efficiency, efficacy and consistency. By reducing or eliminating the need to enroll control end users, a synthetic control arm can increase efficiency, reduce delays, lower trial costs, and speed life-saving therapies to market. This kind of hybrid trial design presents a less risky way for sponsors to introduce real-world data elements into regulatory trials and can also reduce the risk of late stage failures by informing go or no-go development decisions. Placebo-fear is one of the top reasons patients choose not to participate in clinical trials. This concern is amplified when an individual’s prognosis is poor and when current care is of limited effectiveness. Using a synthetic control arm instead of a standard control arm ensures that all participants receive the active treatment, eliminating concerns about treatment/placebo assignment. Use of a synthetic control arm addresses an important participant concern and removes an important barrier to recruitment. The use of simulated control arms can also eliminate the risk of unblinding when patients lean on their disease support social networks, posting details of their treatment, progress, and side effects that could harm the integrity of the trial.
[00134] The federated learner system can be utilized for tensorial twins. The tensorial twin represents the nearest-neighbor patient, derived from algorithmic matching of the maximal proportion of data points using a subtype of AI known as nearest-neighbor analysis. The nearest neighbor is identified using AI analytics for approximating a facsimile, another human being as close as possible to an exact copy according to the patient’s characteristics to help inform best treatment, outcomes, and even prevention.
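As an illustration of the nearest-neighbor matching just described, here is a minimal NumPy sketch that finds, for a given patient representation, the closest other patient in a cohort. The feature encoding and the Euclidean metric are assumptions made for illustration; a deployed system could match on model-derived tensors rather than raw features.

import numpy as np

def tensorial_twin(patient_vector, cohort_matrix):
    """Return the index of the nearest-neighbor patient in the cohort.

    patient_vector: (D,) feature or tensor representation of one patient.
    cohort_matrix:  (N, D) representations of N candidate patients.
    """
    distances = np.linalg.norm(cohort_matrix - patient_vector, axis=1)
    return int(np.argmin(distances))  # index of the closest "twin"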
[00135] We can use information that comprehensively characterizes each individual for demographics, biologic omics, physiology, anatomy, and environment, along with treatment and outcomes for medical conditions.
[00136] Propensity score matching (PSM) employs a predicted probability of group membership, e.g., treatment or control group, based on observed predictors, usually obtained from logistic regression, to create a counterfactual group. Propensity scores may also be used for matching or as covariates, alone or with other matching variables or covariates. With federated learning, every cohort can be configured to be adaptive in a very complex way because members with federated learners could send up deltas. In this case, it continuously makes the relationship between the members and the cohort tenuous, to the point that they redefine normality and start to act as patients in silico, preparing for a stochastic forward model of precision medicine.
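As an illustration only, the logistic-regression propensity scoring described above can be sketched with scikit-learn. The arrays and the greedy one-to-one matching are hypothetical simplifications; real matching procedures typically add calipers and covariate balance checks.

import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(X, treated):
    """X: (N, D) observed predictors; treated: (N,) 0/1 group membership.
    Returns, for each treated unit, the index of its matched control unit."""
    scores = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    treated_idx = np.where(treated == 1)[0]
    control_idx = np.where(treated == 0)[0]
    matches = {}
    for i in treated_idx:
        # Greedy nearest-propensity control match for the counterfactual group.
        j = control_idx[np.argmin(np.abs(scores[control_idx] - scores[i]))]
        matches[int(i)] = int(j)
    return matches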
[00137] The federated learner system may use a fuzzy tensor swarm. Devices which used to be responsible only for the gathering of data are to be configured to run downstream computations. Such a configuration can be applied to various scenarios, for example, heart rate monitors, automatic blood pressure pumps, weather micro-stations, etc. Computational capacity as well as speed are increased drastically. With the advent of higher-bandwidth connectivity between such devices (due, for example, to 5G), the old paradigm of requiring these devices to send data to a central location, where an archaic batch runner produces an updated data processor and ships it back to each device individually, is becoming outmoded. Incurring a system-wide overhead when a heart rate monitor can update its own data processing algorithms makes no sense any more. Such a heart rate monitor system only requires access to the blood pressure pump and weather micro-station. As in the case of the heart rate monitor, the capability of a system updating its own data processing algorithm by itself is especially valuable for mission-critical functionality, where seconds could make a difference between life and death. To make use of this additional computational capacity and bandwidth, each device is to be deployed with its own adaptive data processing module, placed within a network mesh of devices, and equipped with an ontology (e.g., protocol-driven) describing to it the kind of information it can derive from each of its neighbors in the mesh. Each device in the mesh is configured to make available to its neighbors any of its primitives, as well as data-derived updates to itself. Taken together, an ensemble of interconnected devices, each with an intelligent data processing module and an ontological protocol, forms a fuzzy tensor swarm. In this fuzzy tensor swarm, the emergent behavior is configured to be at a minimum equivalent in functionality, although perhaps not optimal in terms of latency and overhead, to what is possible with a centralized model building workflow. Empowered by 5G and Internet-of-Things technologies, each device can be connected, either physically or not, and stream data to millions of other smart data capture devices that can create live models of their vertical worlds. The enriched information from millions of graphics processing units can be fed back to other objects or their carbon, silicon or neuron users. Passive collection can be monetized and become the service industry of virtual reality (VR), which can create parallel existential dimensions as a service.
[00138] In some embodiments of the disclosure, a federated learner model can be applied to federated learning and adversarial rapid testing of clinical data and standards. Data training done on the device, close to the data, mitigates privacy concerns. The trained models basically try to predict when symptoms happen, and the user can be enabled to verify. These generative adversarial models (GANs) can then be used to generate Real World Evidence (RWE) backed patient simulations to validate clinical trials, data, and anomaly detection. A pharmaceutical company can be enabled to license these models out as new revenue. End users’ simulated data is predicted or inferred on probabilistic risk calculators, based on their genetics, exposome, pharmacome and other omics data. Once these models are built, a pharmaceutical company can also use the models in other data trials to do groundwork analysis.
[00139] A clinical trial can go out with consumer health care mobile devices, e.g., an Apple Watch, where participants can confirm or deny when the GAN predicts they may have a symptom happen soon. The model gets trained on end user devices and only the model is sent back to the servers. The models are then tested in other patients and verified over and over.
[00140] This model of symptoms can be used to simulate an existing clinical trial around a similar drug. If it can reproduce the study results, then these models can be used in dashboards around these types of drugs.
[00141] The federated learning model can be applied to automatic qualification of participants for clinical trials, removing the expensive human verification process.
[00142] The federated learning model can be applied to decentralized patient registries. Such a registry is on the edge and fragmented, but comes together on an “ask” command by authorized personnel, e.g., the end user.
[00143] The federated learning model can be applied to configure a peer-to-peer health data comparator that compares the health condition of one end user against another without sharing any personal data.
[00144] The federated learning model can be applied to distributed second opinions. One end user can be enabled to share his or her personal model with a new doctor or citizen scientist without giving away any data. Tensors are compared, and not the real data.
[00145] The federated learning model can be applied to health anomaly detection via model anomaly detection. Tensors can be configured to indicate that there is an out of bounds anomaly with respect to the population. Once issues are identified, they can be escalated to a doctor.

[00146] The federated learning model can be applied to a health fingerprint. The model built on end user data can be a unique signature of the end user. It evolves as the health condition of the end user evolves. The model can be used as an identity in time.
Computer System
[00147] FIG. 18 is a simplified block diagram of a computer system 1800 that can be used to implement the technology disclosed. Computer system typically includes at least one processor 1872 that communicates with a number of peripheral devices via bus subsystem 1855. These peripheral devices can include a storage subsystem 1810 including, for example, memory subsystem 1822 and a file storage subsystem 1836, user interface input devices 1838, user interface output devices 1876, and a network interface subsystem 1874. The input and output devices allow user interaction with computer system. Network interface subsystem provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.
[00148] User interface input devices 1838 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computer system.
[00149] User interface output devices 1876 can include a display subsystem, a printer, a fax machine, or nonvisual displays such as audio output devices. The display subsystem can include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display such as audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system to the user or to another machine or computer system.
[00150] Storage subsystem 1810 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by processor alone or in combination with other processors.
[00151] Memory used in the storage subsystem can include a number of memories including a main random access memory (RAM) 1832 for storage of instructions and data during program execution and a read only memory (ROM) 1834 in which fixed instructions are stored. The file storage subsystem 1836 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by file storage subsystem in the storage subsystem, or in other machines accessible by the processor.
[00152] Bus subsystem 1855 provides a mechanism for letting the various components and subsystems of computer system communicate with each other as intended. Although bus subsystem is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
[00153] Computer system itself can be of varying types including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely-distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system depicted in FIG. 18 is intended only as a specific example for purposes of illustrating the technology disclosed. Many other configurations of computer system are possible having more or fewer components than the computer system depicted in FIG. 18.
[00154] The computer system 1800 includes GPUs or FPGAs 1878. It can also include machine learning processors hosted by machine learning cloud platforms such as Google Cloud Platform, Xilinx, and Cirrascale. Examples of deep learning processors include Google’s Tensor Processing Unit (TPU), rackmount solutions like GX4 Rackmount Series, GX8 Rackmount Series, NVIDIA DGX-1, Microsoft’s Stratix V FPGA, Graphcore’s Intelligent Processor Unit (IPU), Qualcomm’s Zeroth platform with Snapdragon processors, NVIDIA’s Volta, NVIDIA’s DRIVE PX, NVIDIA’s JETSON TX1/TX2 MODULE, Intel’s Nirvana, Movidius VPU, Fujitsu DPI, ARM’s DynamicIQ, IBM TrueNorth, and others.
Some Particular Implementations
[00155] We disclose use of federated learning in a variety of healthcare applications that typically involve sensitive, private data.
[00156] One disclosed implementation includes a system for federated learning. The system includes multiple edge devices of end users, coupled to a communication network. The edge devices include a memory that stores program instructions for a federated learner, recorded user data, and a tensor of model parameters of a deep neural network, a “DNN”. The federated learner executes on a processor of the edge device. The federated learner is configured to record end user data, predict characteristics of the end user from the recorded end user data by applying the DNN, and receive updates from the end user that correct the predicted end user characteristics. The federated learner is further configured to perform update training of the DNN using the recorded user data and the corrected user characteristics, thereby producing a modified tensor of updated model parameters, and send at least a modified part of the modified tensor to an FL aggregator.
[00157] The system further includes a base model tensor of model parameters for the DNN running on the edge devices, trained to predict characteristics of the end users from the recorded end user data, provided to the edge devices.
[00158] The FL aggregator is coupled to a communication network and includes a federated learner. The federated learner is configured to receive modified tensors from at least some of the edge devices, aggregate the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor, and distribute the new version of the base model tensor to the edge devices. The federated learner can be implemented in the FL aggregator as in-line code, can be implemented in a separate module, or can be some combination of the two coding strategies.
[00159] This system implementation and other systems disclosed optionally include one or more of the following features. System can also include features described in connection with methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
[00160] The recorded end user data can include a picture captured by the edge device, an audio recording of the end user captured by the edge device, or both. When the recorded end user data includes a frontal face picture captured by the edge device, the predicted end user characteristics include age, height and weight. Sex also can be predicted and BMI calculated from a combination of predicted features. When the recorded end user data includes an audio recording of the end user captured by the edge device, with or without a face image, the predicted end user characteristics can include mood.
[00161] On the edge device, a face detector can be applied to determine whether a face appears in the picture, limiting update training of a facial interpretation model and avoiding, for instance, training on cat or sunset pictures.
[00162] On the FL aggregator side, the federated learner can be configured to filter out spurious updates by calculating a distance measure that compares each modified tensor received from the edge devices to the base model tensor, constructing a distribution of distance measures in an updating cycle, and rejecting outlier modified tensors from aggregation with the current version. That is, production of the new base model version will not be based on rejected tensors having distance measures that are outliers from the distribution. An outlier can be determined using a statistical measure such as three standard deviations or the like.
[00163] Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform actions of the system described above. Each of the features discussed in the particular implementation section for other implementations apply equally to this implementation. As indicated above, all the other features are not repeated here and should be considered repeated by reference.
[00164] In other implementations, the technology disclosed presents methods of operating the edge devices, the server or FL aggregator device, or both.
[00165] One method implementation disclosed involves federated learning utilizing the computation capability of edge devices. The edge devices used to practice this method include a memory, storing program instructions for a federated learner, recorded user data and a tensor of model parameters of a deep neural network, a “DNN”. The federated learner executes on a processor of the edge device, and is configured to: record end user data; predict characteristics of the end user from the recorded end user data by applying the DNN; receive updates from the end user that correct the predicted end user characteristics; and perform update training of the DNN using the recorded user data and the corrected user characteristics.
[00166] This method implementation includes sending a current base model tensor of the model parameters to the edge devices and receiving modified tensors from at least some of the edge devices, based on at least user data recorded by the edge devices and corrected user characteristics received by the edge devices from end users. It can involve checking to determine that the modified tensors received apply to the current version of the base model tensor, not to a prior, outdated version. Because updating is an asynchronous process and user behavior is not under the system’s control, it is expected that some users will not participate in a cycle, some edge devices will not receive the current version of the base model tensor, and some edge devices will offer updates to an expired or outdated version of the base model tensor.
[00167] This method further includes aggregating the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor and distributing the new version of the base model tensor to the edge devices. The receiving, aggregating and distributing actions are repeated for at least ten cycles. These actions may be repeated 50 or 100 or 1000 times or more. The cycles of the FL aggregator and its components will repeat more times than most users participate in collecting data and retraining base models.
[00168] Features described above for the system and described throughout the application for systems and methods can be combined with this method, cast as it is from the server’s perspective. In the interest of conciseness, not every combination of features is enumerated.
[00169] When the recorded end user data includes a frontal face picture captured by the edge device, the predicted end user characteristics can include age, height and weight. The method can further include constructing an initial current version of the base model from a generic face recognition model, with additional layers added and training applied with ground truth for the age, height and weight of persons in at least some frontal face pictures. This initial current version is prepared before the edge devices make available any recorded images or corrected user characteristics.
[00170] When the recorded end user data includes an audio recording of the end user captured by the edge device, the method can include predicting the end user’s mood.
[00171] The method can further include filtering before aggregating, such as by calculating a distance measure that compares each modified tensor received from the edge devices to the base model tensor and constructing a distribution of distance measures in an updating cycle. As described in more detail above, this distribution can be used to reject at least one modified tensor from aggregation, as an outlier from the distribution. [00172] Another method implementation of the technology disclosed is presented from the perspective of an edge device contributing to federated learning. The edge device cooperates with an FL aggregator that is configured to receive modified tensors from a plurality of edge devices, aggregate the modified tensors with a current version of a base model tensor by federated learning to produce a new version of the base model tensor, and distribute the new version of the base model tensor to the edge devices.
[00173] This method includes the edge device receiving a version of the base model, including a tensor of model parameters of a deep neural network, a “DNN”, and recording end user data. The method includes predicting characteristics of the end user from the recorded end user data by applying the DNN and causing display of the predicted characteristics to the end user. Responsive to the display, the method includes receiving updates from the end user that correct the predicted end user characteristics. The edge device performs update training of the DNN on the edge device, using the recorded user data and the corrected user characteristics, to produce a modified tensor of updated model parameters. The method further includes sending at least a modified part of the modified tensor to an FL aggregator and receiving a new version of the base model tensor from the FL aggregator, after the FL aggregator has aggregated modified tensors from a plurality of edge devices with the base model by federated learning. The recording, predicting, receiving updates, performing, and sending actions are repeated by the edge device in at least five cycles. The actions can be repeated in at least 10 or 50 cycles or even 100 cycles. An edge device, such as a mobile phone carried by an end user, is unlikely to participate in all of the cycles managed by the FL aggregator, unless data is being relayed automatically to and processed by the edge device, or an app collects data from the user on a daily basis. Examples of personal devices that are capable of automatically relaying data to a personal device include a blood glucose monitor, a pacemaker, a heart rate monitor, an exercise monitor, a fall monitor, a pulse oximeter, a scale (with or without body fat estimation), and a breathing assistance device. Use of such devices can result in more frequent participation by the edge device in training cycles, even in 1,000 cycles or more. Examples of applications that collect data from the user on a daily basis include diet or consumption logging applications, exercise applications and meditation applications.
[00174] Features described above for the system and described throughout the application for systems and methods can be combined with this method, cast as it is from the edge device’s perspective. In the interest of conciseness, not every combination of features is enumerated.
[00175] When the recorded end user data includes a frontal face picture captured by the edge device, the predicted end user characteristics can include age, height and weight. When the recorded end user data includes an audio recording of the end user, with or without a face image, the predicted end user characteristics can include mood.
[00176] The method can further include filtering images before they are used for update training. A face detector can be applied to determine whether a face appears in a picture before performing update training using the picture. This can prevent training against pictures of kittens and sunsets when the system is designed to interpret human faces.
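As one possible realization of this gate, an off-the-shelf detector such as OpenCV’s bundled Haar-cascade frontal-face classifier could be consulted before any update training; the choice of detector and its parameters here are assumptions, not part of the disclosed method.

```python
import cv2

# OpenCV's bundled Haar-cascade detector is one convenient, widely available choice.
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(image_bgr):
    """Return True only if at least one face is detected in the picture."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def maybe_update_train(image_bgr, corrected_labels, train_fn):
    # Skip update training for kittens, sunsets and other pictures without a face.
    if contains_face(image_bgr):
        return train_fn(image_bgr, corrected_labels)
    return None
```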
[00177] The technology disclosed can be practiced as a system, method, or article of manufacture. One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable. One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the user of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections - these recitations are hereby incorporated forward by reference into each of the following implementations.
[00178] One disclosed implementation may include a tangible non-volatile computer readable storage medium loaded with computer program instructions that, when executed on a server, cause a computer to implement any of the methods described earlier.
[00179] Another disclosed implementation may include a server system including one or more processors and memory coupled to the processors, the memory loaded with instructions that, when executed on the processors, cause the server system to perform any of the methods described earlier.
[00180] This system implementation and other systems disclosed optionally can also include features described in connection with methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.
[00181] While the technology disclosed is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the innovation and the scope of the following claims.

Claims

What is claimed:
1. A system for federated learning, comprising:
multiple edge devices of end users, coupled to a communication network, each comprising
a memory, that stores program instructions for a federated learner, recorded user data and a tensor of model parameters of a deep neural network, a “DNN”; and
the federated learner, that executes on a processor of the edge device, configured to:
record end user data,
predict characteristics of the end user from the recorded end user data by applying the DNN, receive updates from the end user that correct the predicted end user characteristics,
perform update training of the DNN using the recorded user data and the corrected user characteristics, thereby producing a modified tensor of updated model parameters, and
send at least a modified part of the modified tensor to an FL aggregator;
a base model tensor of model parameters for the DNN running on the edge devices, trained to predict characteristics of the end users from the recorded end user data, provided to the edge devices;
the FL aggregator, coupled to a communication network, comprising a federated learner, configured to
receive modified tensors from at least some of the edge devices,
aggregate the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor, and
distribute the new version of the base model tensor to the edge devices.
2. The system of claim 1, wherein the recorded end user data is a picture captured by the edge device.
3. The system of any of the preceding claims, wherein the recorded end user data includes an audio recording of the end user captured by the edge device.
4. The system of any of the preceding claims, wherein the recorded end user data includes a frontal face picture captured by the edge device and the predicted end user characteristics include age, height and weight.
5. The system of any of the preceding claims, wherein the recorded end user data includes an audio recording of the end user captured by the edge device and the predicted end user characteristics include mood.
6. The system of any of the preceding claims, wherein the recorded end user data includes a frontal face picture and an audio recording of the end user captured by the edge device and the predicted end user characteristics include mood.
7. The system of any of the preceding claims, wherein the recorded end user data includes a picture captured by the edge device, each edge device further comprising a face detector that determines whether a face appears in the picture and proceeds to perform update training only upon detection of a face in the picture.
8. The system of any of the preceding claims, the federated learner further configured to: calculate a distance measure that compares each modified tensor received from the edge devices to the base model tensor;
construct a distribution of distance measures in an updating cycle; and
reject at least one modified tensor from aggregation with the current version of the base model tensor to produce the new version, based on the rejected tensor having a distance measure that is an outlier from the distribution.
9. A method for federated learning utilizing computation capability of edge devices that include:
a memory, storing program instructions for a federated learner, recorded user data and a tensor of model parameters of a deep neural network, a “DNN”; and
the federated learner, executing on a processor of the edge device, configured to:
record end user data,
predict characteristics of the end user from the recorded end user data by applying the DNN, receive updates from the end user that correct the predicted end user characteristics, and
perform update training of the DNN using the recorded user data and the corrected user characteristics; the method comprising:
sending a current base model tensor of the model parameters to the edge devices;
receiving modified tensors from at least some of the edge devices, based on at least user data recorded by the edge devices and corrected user characteristics received by the edge devices from end users;
aggregating the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor;
distributing the new version of the base model tensor to the edge devices; and
repeating the receiving, aggregating and distributing actions in at least ten cycles.
10. The method of claim 9, wherein the recorded end user data includes a frontal face picture captured by the edge device and the predicted end user characteristics include age, height and weight, further including constructing an initial current version of the base model from a generic face recognition model with additional layers added and training applied with ground truth for the age, height and weight of persons in at least some frontal face pictures, before recorded images and corrected user characteristics are available from the edge devices.
11. The method of any of claims 9-10, wherein the recorded end user data includes an audio recording of the end user captured by the edge device and the predicted end user characteristics include mood.
12. The method of any of claims 9-11, further including:
calculating a distance measure that compares each modified tensor received from the edge devices to the base model tensor;
constructing a distribution of distance measures in an updating cycle; and
rejecting at least one modified tensor from aggregation with the current version of the base model tensor to produce the new version, based on the rejected tensor having a distance measure that is an outlier from the distribution.
13. A method of contributing to federated learning, FL, applied by an FL aggregator utilizing computation capability of an edge device, wherein the FL aggregator is configured to:
receive modified tensors from a plurality of edge devices,
aggregate the modified tensors with a current version of a base model tensor by federated learning to produce a new version of the base model tensor, and
distribute the new version of the base model tensor to the edge devices;
the method comprising the edge device:
receiving a version of the base model, including a tensor of model parameters of a deep neural network, a “DNN”;
recording end user data;
predicting characteristics of the end user from the recorded end user data by applying the DNN and causing display of the predicted characteristics to the end user;
receiving updates from the end user that correct the predicted end user characteristics;
performing update training of the DNN on the edge device using the recorded user data and the corrected user characteristics, thereby producing a modified tensor of updated model parameters;
sending at least a modified part of the modified tensor to an FL aggregator;
receiving a new version of the base model tensor from the FL aggregator, after the FL aggregator aggregated modified tensors from a plurality of edge devices with the base model by federated learning; and repeating the recording, predicting, receiving updates, performing, and sending actions in at least five cycles.
14. The method of claim 13, wherein the recorded end user data includes a frontal face picture captured by the edge device and the predicted end user characteristics include age, height and weight.
15. The method of any of claims 13-14, wherein the recorded end user data includes an audio recording of the end user captured by the edge device and the predicted end user characteristics include mood.
16. The method of any of claims 13-15, wherein the recorded end user data includes a frontal face picture and an audio recording of the end user captured by the edge device and the predicted end user characteristics include mood.
17. The method of any of claims 13-16, wherein the recorded end user data includes a picture captured by the edge device, each edge device further comprising a face detector that determines whether a face appears in the picture and proceeds to perform update training only upon detection of a face in the picture.
18. Software embodying the system or method of any of claims 1 through 17, including instructions that, when loaded onto one or more devices, create the device or system claimed or carry out the method claimed.
19. Non-transitory memory including computer instructions corresponding to the device, system or method of any of claims 1 through 17, including instructions that, when loaded onto one or more devices, create the device or system claimed or carry out the method claimed.
PCT/US2020/022200 2019-03-11 2020-03-11 System and method with federated learning model for medical research applications WO2020185973A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201962816880P 2019-03-11 2019-03-11
US62/816,880 2019-03-11
US201962942644P 2019-12-02 2019-12-02
US62/942,644 2019-12-02
US16/816,153 2020-03-11
US16/816,153 US11853891B2 (en) 2019-03-11 2020-03-11 System and method with federated learning model for medical research applications

Publications (1)

Publication Number Publication Date
WO2020185973A1 true WO2020185973A1 (en) 2020-09-17

Family

ID=72423741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/022200 WO2020185973A1 (en) 2019-03-11 2020-03-11 System and method with federated learning model for medical research applications

Country Status (2)

Country Link
US (1) US11853891B2 (en)
WO (1) WO2020185973A1 (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11838034B2 (en) * 2017-10-30 2023-12-05 AtomBeam Technologies Inc. System and method for blockchain data compaction
US11763950B1 (en) 2018-08-16 2023-09-19 Clarify Health Solutions, Inc. Computer network architecture with machine learning and artificial intelligence and patient risk scoring
US11625789B1 (en) 2019-04-02 2023-04-11 Clarify Health Solutions, Inc. Computer network architecture with automated claims completion, machine learning and artificial intelligence
US11308618B2 (en) 2019-04-14 2022-04-19 Holovisions LLC Healthy-Selfie(TM): a portable phone-moving device for telemedicine imaging using a mobile phone
US12014500B2 (en) 2019-04-14 2024-06-18 Holovisions LLC Healthy-Selfie(TM): methods for remote medical imaging using a conventional smart phone or augmented reality eyewear
US11621085B1 (en) 2019-04-18 2023-04-04 Clarify Health Solutions, Inc. Computer network architecture with machine learning and artificial intelligence and active updates of outcomes
US11238469B1 (en) 2019-05-06 2022-02-01 Clarify Health Solutions, Inc. Computer network architecture with machine learning and artificial intelligence and risk adjusted performance ranking of healthcare providers
US11270785B1 (en) 2019-11-27 2022-03-08 Clarify Health Solutions, Inc. Computer network architecture with machine learning and artificial intelligence and care groupings
KR102482374B1 (en) * 2019-12-10 2022-12-29 한국전자통신연구원 Device for ensembling data received from prediction devices and operating method thereof
US12088719B2 (en) * 2020-01-23 2024-09-10 Subash Sundaresan Method and system for incremental training of machine learning models on edge devices
CN112241537B (en) * 2020-09-23 2023-02-10 易联众信息技术股份有限公司 Longitudinal federated learning modeling method, system, medium and equipment
CN112201342B (en) * 2020-09-27 2024-04-26 博雅正链(北京)科技有限公司 Medical auxiliary diagnosis method, device, equipment and storage medium based on federal learning
US11893030B2 (en) * 2020-09-29 2024-02-06 Cerner Innovation, Inc. System and method for improved state identification and prediction in computerized queries
WO2022073765A1 (en) * 2020-10-08 2022-04-14 Koninklijke Philips N.V. Decentralized training method suitable for disparate training sets
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
CN112162959B (en) * 2020-10-15 2023-10-10 深圳技术大学 Medical data sharing method and device
CN112148437B (en) * 2020-10-21 2022-04-01 深圳致星科技有限公司 Calculation task acceleration processing method, device and equipment for federal learning
US12039012B2 (en) * 2020-10-23 2024-07-16 Sharecare AI, Inc. Systems and methods for heterogeneous federated transfer learning
CN112231768B (en) * 2020-10-27 2021-06-18 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN112231756B (en) * 2020-10-29 2022-05-27 湖南科技学院 FL-EM-GMM medical user privacy protection method and system
CN114529005A (en) * 2020-11-03 2022-05-24 华为技术有限公司 Machine learning model management method, device and system
CN112364908B (en) * 2020-11-05 2022-11-11 浙江大学 Longitudinal federal learning method oriented to decision tree
WO2022100861A1 (en) * 2020-11-16 2022-05-19 Huawei Technologies Co., Ltd. Device and method for classifying input data
CN113807157B (en) * 2020-11-27 2024-07-19 京东科技控股股份有限公司 Method, device and system for training neural network model based on federal learning
EP4009220A1 (en) * 2020-12-03 2022-06-08 Fujitsu Limited Method and apparatus for decentralized supervised learning in nlp applications
CN113724117A (en) * 2020-12-28 2021-11-30 京东城市(北京)数字科技有限公司 Model training method and device for house abnormal use recognition
CN112700010B (en) * 2020-12-30 2024-08-23 深圳前海微众银行股份有限公司 Feature completion method, device, equipment and storage medium based on federal learning
CN112669216B (en) * 2021-01-05 2022-04-22 华南理工大学 Super-resolution reconstruction network of parallel cavity new structure based on federal learning
CN112686385B (en) * 2021-01-07 2023-03-07 中国人民解放军国防科技大学 Multi-site three-dimensional image oriented federal deep learning method and system
CN112768056A (en) * 2021-01-14 2021-05-07 新智数字科技有限公司 Disease prediction model establishing method and device based on joint learning framework
US20220237508A1 (en) * 2021-01-28 2022-07-28 Kiarash SHALOUDEGI Servers, methods and systems for second order federated learning
US11017322B1 (en) 2021-01-28 2021-05-25 Alipay Labs (singapore) Pte. Ltd. Method and system for federated learning
CN112765898B (en) * 2021-01-29 2024-05-10 上海明略人工智能(集团)有限公司 Multi-task joint training model method, system, electronic equipment and storage medium
US20220261697A1 (en) * 2021-02-15 2022-08-18 Devron Corporation Federated learning platform and methods for using same
CN112560105B (en) * 2021-02-19 2021-09-07 支付宝(杭州)信息技术有限公司 Joint modeling method and device for protecting multi-party data privacy
US11711348B2 (en) * 2021-02-22 2023-07-25 Begin Ai Inc. Method for maintaining trust and credibility in a federated learning environment
CN112949760B (en) * 2021-03-30 2024-05-10 平安科技(深圳)有限公司 Model precision control method, device and storage medium based on federal learning
CN113222211B (en) * 2021-03-31 2023-12-12 中国科学技术大学先进技术研究院 Method and system for predicting pollutant emission factors of multi-region diesel vehicle
US20220366220A1 (en) * 2021-04-29 2022-11-17 Nvidia Corporation Dynamic weight updates for neural networks
CN113240018B (en) * 2021-05-19 2023-02-03 哈尔滨医科大学 Hand-drawn graph classification method and system based on error back propagation algorithm
US20220384040A1 (en) * 2021-05-27 2022-12-01 Disney Enterprises Inc. Machine Learning Model Based Condition and Property Detection
WO2022265948A1 (en) * 2021-06-14 2022-12-22 Meta Platforms, Inc. Systems and methods for machine learning serving
CN113516249B (en) * 2021-06-18 2023-04-07 重庆大学 Federal learning method, system, server and medium based on semi-asynchronization
CN113468521B (en) * 2021-07-01 2022-04-05 哈尔滨工程大学 Data protection method for federal learning intrusion detection based on GAN
CN113469371B (en) * 2021-07-01 2023-05-02 建信金融科技有限责任公司 Federal learning method and apparatus
CN113343280B (en) * 2021-07-07 2024-08-23 时代云英(深圳)科技有限公司 Private cloud algorithm model generation method based on joint learning
US20230016827A1 (en) * 2021-07-08 2023-01-19 Rakuten Mobile, Inc. Adaptive offloading of federated learning
CN113571203B (en) * 2021-07-19 2024-01-26 复旦大学附属华山医院 Multi-center federal learning-based brain tumor prognosis survival prediction method and system
US12081541B2 (en) 2021-08-05 2024-09-03 Paypal, Inc. Device-side federated machine learning computer system architecture
CN113673476B (en) * 2021-09-02 2023-11-07 京东科技控股股份有限公司 Face recognition model training method and device, storage medium and electronic equipment
US11934555B2 (en) 2021-09-28 2024-03-19 Siemens Healthineers Ag Privacy-preserving data curation for federated learning
CN113947210B (en) * 2021-10-08 2024-05-10 东北大学 Cloud edge end federation learning method in mobile edge calculation
CN114266293A (en) * 2021-12-07 2022-04-01 浙江网商银行股份有限公司 Federated learning method and federated learning system
CN114429223B (en) * 2022-01-26 2023-11-07 上海富数科技有限公司 Heterogeneous model building method and device
CN114202397B (en) * 2022-02-17 2022-05-10 浙江君同智能科技有限责任公司 Longitudinal federal learning backdoor defense method based on neuron activation value clustering
CN114638357B (en) * 2022-02-28 2024-05-31 厦门大学 Edge computing system based on automatic federal learning and learning method thereof
CN114785608B (en) * 2022-05-09 2023-08-15 中国石油大学(华东) Industrial control network intrusion detection method based on decentralised federal learning
CN117093859A (en) * 2022-05-10 2023-11-21 中国移动通信有限公司研究院 Model training or reasoning method and device and communication equipment
JP2023173559A (en) 2022-05-26 2023-12-07 株式会社日立製作所 Analysis device, analysis method, and analysis program
CN115148379B (en) * 2022-06-06 2024-05-31 电子科技大学 System and method for realizing intelligent health monitoring of solitary old people by utilizing edge calculation
CN117150369B (en) * 2023-10-30 2024-01-26 恒安标准人寿保险有限公司 Training method of overweight prediction model and electronic equipment
CN117556381B (en) * 2024-01-04 2024-04-02 华中师范大学 Knowledge level depth mining method and system for cross-disciplinary subjective test questions
US12079230B1 (en) 2024-01-31 2024-09-03 Clarify Health Solutions, Inc. Computer network architecture and method for predictive analysis using lookup tables as prediction models

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920644A (en) 1996-06-06 1999-07-06 Fujitsu Limited Apparatus and method of recognizing pattern through feature selection by projecting feature vector on partial eigenspace
EP1693801A3 (en) 2005-02-16 2006-11-29 David Schaufele Biometric-based systems and methods for identity verification
EP1910977B1 (en) 2005-07-29 2016-11-30 Telecom Italia S.p.A. Automatic biometric identification based on face recognition and support vector machines
US8112293B2 (en) 2006-03-24 2012-02-07 Ipventure, Inc Medical monitoring system
JP5150542B2 (en) 2009-03-26 2013-02-20 株式会社東芝 Pattern recognition apparatus, pattern recognition method, and program
US8543428B1 (en) 2009-12-10 2013-09-24 Humana Inc. Computerized system and method for estimating levels of obesity in an insured population
US10194800B2 (en) 2010-01-08 2019-02-05 Koninklijke Philips N.V. Remote patient management system adapted for generating an assessment content element
US8547232B2 (en) 2010-05-28 2013-10-01 Nokia Corporation Method and apparatus for transferring data via radio frequency (RF) memory tags
US8655029B2 (en) 2012-04-10 2014-02-18 Seiko Epson Corporation Hash-based face recognition system
US10593426B2 (en) 2012-09-13 2020-03-17 Parkland Center For Clinical Innovation Holistic hospital patient care and management system and method for automated facial biological recognition
US9232247B2 (en) 2012-09-26 2016-01-05 Sony Corporation System and method for correlating audio and/or images presented to a user with facial characteristics and expressions of the user
CN104871164B (en) 2012-10-24 2019-02-05 南托米克斯有限责任公司 Processing and the genome browser system that the variation of genomic sequence data nucleotide is presented
WO2014115362A1 (en) 2013-01-28 2014-07-31 日本電気株式会社 Discriminator learning device and discriminator learning method
EP2854059A3 (en) 2013-09-27 2015-07-29 Orbicule BVBA Method for storage and communication of personal genomic or medical information
JP6156126B2 (en) 2013-12-19 2017-07-05 富士通株式会社 SEARCH METHOD, SEARCH PROGRAM, AND SEARCH DEVICE
JP6269186B2 (en) 2014-03-07 2018-01-31 富士通株式会社 Classification method, classification device, and classification program
US10430985B2 (en) 2014-03-14 2019-10-01 Magic Leap, Inc. Augmented reality systems and methods utilizing reflections
JP6362085B2 (en) 2014-05-21 2018-07-25 キヤノン株式会社 Image recognition system, image recognition method and program
US20160253549A1 (en) 2015-02-27 2016-09-01 Leo Ramic Estimating personal information from facial features
US9839376B1 (en) 2015-04-20 2017-12-12 Massachusetts Mutual Life Insurance Systems and methods for automated body mass index calculation to determine value
US10452989B2 (en) 2015-05-05 2019-10-22 Kyndi, Inc. Quanton representation for emulating quantum-like computation on classical processors
US9792492B2 (en) 2015-07-07 2017-10-17 Xerox Corporation Extracting gradient features from neural networks
US11736756B2 (en) 2016-02-10 2023-08-22 Nitin Vats Producing realistic body movement using body images
US11501139B2 (en) 2017-05-03 2022-11-15 Intel Corporation Scaling half-precision floating point tensors for training deep neural networks
US20180330061A1 (en) 2017-05-10 2018-11-15 Pinscriptive, Inc. Treatment Recommendation System And Method
WO2019018732A1 (en) * 2017-07-21 2019-01-24 Pearson Education, Inc. Systems and methods for automated feature-based alert triggering
US12033079B2 (en) 2018-02-08 2024-07-09 Cognizant Technology Solutions U.S. Corporation System and method for pseudo-task augmentation in deep multitask learning
US11789699B2 (en) 2018-03-07 2023-10-17 Private Identity Llc Systems and methods for private authentication with helper networks
US10938852B1 (en) 2020-08-14 2021-03-02 Private Identity Llc Systems and methods for private authentication with helper networks
US11216541B2 (en) 2018-09-07 2022-01-04 Qualcomm Incorporated User adaptation for biometric authentication
JP7205148B2 (en) * 2018-10-04 2023-01-17 カシオ計算機株式会社 ROBOT, CONTROL METHOD AND PROGRAM
US10764656B2 (en) 2019-01-04 2020-09-01 International Business Machines Corporation Agglomerated video highlights with custom speckling
US10423773B1 (en) * 2019-04-12 2019-09-24 Coupang, Corp. Computerized systems and methods for determining authenticity using micro expressions
US11258813B2 (en) 2019-06-27 2022-02-22 Intel Corporation Systems and methods to fingerprint and classify application behaviors using telemetry
US20210057056A1 (en) 2019-08-19 2021-02-25 Apricity Health LLC System and Method for Developing Artificial Intelligent Digital Therapeutics with Drug Therapy for Precision and Personalized Care Pathway
US20210110417A1 (en) * 2019-10-11 2021-04-15 Live Nation Entertainment, Inc. Dynamic bidding determination using machine-learning models
CN114930347A (en) * 2020-02-03 2022-08-19 英特尔公司 System and method for distributed learning of wireless edge dynamics
US11321447B2 (en) 2020-04-21 2022-05-03 Sharecare AI, Inc. Systems and methods for generating and using anthropomorphic signatures to authenticate users
EP3940604A1 (en) * 2020-07-09 2022-01-19 Nokia Technologies Oy Federated teacher-student machine learning
US11968541B2 (en) * 2020-09-08 2024-04-23 Qualcomm Incorporated Spectrum sharing with deep reinforcement learning (RL)
US11551109B2 (en) 2020-12-16 2023-01-10 Ro5 Inc. System and method for patient health data prediction using knowledge graph analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130046761A1 (en) * 2010-01-08 2013-02-21 Telefonaktiebolaget L M Ericsson (Publ) Method and Apparatus for Social Tagging of Media Files
US20150324686A1 (en) * 2014-05-12 2015-11-12 Qualcomm Incorporated Distributed model learning
US20180289334A1 (en) * 2017-04-05 2018-10-11 doc.ai incorporated Image-based system and method for predicting physiological parameters
JP2019097904A (en) 2017-12-04 2019-06-24 医療法人智徳会 Sleep respiration securing tube

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"De-risk Go/No Go Product Development Decisions by Reusing Patient Trial Data: MEDS Synthetic Control Arms & Synthetic Control Data", 2019, MEDIDATA
GOLDSACK, SYNTHETIC CONTROL ARMS CAN SAVE TIME AND MONEY IN CLINICAL TRIALS, 5 February 2019 (2019-02-05)
H. BRENDAN MCMAHAN ET AL: "Communication-Efficient Learning of Deep Networks from Decentralized Data", 28 February 2017 (2017-02-28), pages 1 - 11, XP055538798, Retrieved from the Internet <URL:https://arxiv.org/pdf/1602.05629.pdf> [retrieved on 20190107] *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220060235A1 (en) * 2020-08-18 2022-02-24 Qualcomm Incorporated Federated learning for client-specific neural network parameter generation for wireless communication
US11909482B2 (en) * 2020-08-18 2024-02-20 Qualcomm Incorporated Federated learning for client-specific neural network parameter generation for wireless communication
WO2022068575A1 (en) * 2020-09-30 2022-04-07 腾讯科技(深圳)有限公司 Calculation method for vertical federated learning, apparatus, device, and medium
CN113033652A (en) * 2021-03-23 2021-06-25 电子科技大学 Image recognition system and method based on block chain and federal learning
CN113033652B (en) * 2021-03-23 2023-03-24 电子科技大学 Image recognition system and method based on block chain and federal learning
EP4080388A1 (en) 2021-04-19 2022-10-26 Privately SA Multimodal, dynamic, privacy preserving age and attribute estimation and learning methods and systems
CN113206887A (en) * 2021-05-08 2021-08-03 武汉理工大学 Method for accelerating federal learning aiming at data and equipment isomerism under edge calculation
CN113392919A (en) * 2021-06-24 2021-09-14 长沙理工大学 Federal attention DBN cooperative detection system based on client selection
CN113392919B (en) * 2021-06-24 2023-04-28 长沙理工大学 Deep belief network DBN detection method of attention mechanism

Also Published As

Publication number Publication date
US11853891B2 (en) 2023-12-26
US20200293887A1 (en) 2020-09-17

Similar Documents

Publication Publication Date Title
US11853891B2 (en) System and method with federated learning model for medical research applications
US20210225463A1 (en) System and Method with Federated Learning Model for Medical Research Applications
US12028452B2 (en) Establishing a trained machine learning classifier in a blockchain network
US11694122B2 (en) Distributed machine learning systems, apparatus, and methods
US20220344049A1 (en) Decentralized artificial intelligence (ai)/machine learning training system
US11829510B2 (en) Secure messaging in a machine learning blockchain network
US11544535B2 (en) Graph convolutional networks with motif-based attention
US20210125732A1 (en) System and method with federated learning model for geotemporal data associated medical prediction applications
Liu et al. A collaborative privacy-preserving deep learning system in distributed mobile environment
Qayyum et al. Making federated learning robust to adversarial attacks by learning data and model association
Rashid et al. Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and MapReduce perspectives
JP2020149656A (en) System having combined learning model for medical research applications, and method
CN116340793A (en) Data processing method, device, equipment and readable storage medium
Supriya et al. A Hybrid Federated Learning Model for Insurance Fraud Detection
CN115879564A (en) Adaptive aggregation for joint learning
Atitallah Intelligent Microservices-based Approach to Support Data Analytics for IoT Applications
Ganapathy An Introduction to Federated Learning and Its Analysis
Bhavani et al. An iterative genetic algorithm based source code plagiarism detection approach using NCRR similarity measure
Bucur et al. Federated Learning and Explainable AI in Healthcare
Girosi et al. Using Artificial Intelligence to Generate Synthetic Health Data
Govindwar et al. An Approach of Federated Learning in Artificial Intelligence for Healthcare Analysis
RAO et al. Protected Shot-Based Federated Learning for Facial Expression Recognition
Jafarigol Uncovering the Potential of Federated Learning: Addressing Algorithmic and Data-driven Challenges under Privacy Restrictions
Mpembele Differential Privacy-Enabled Federated Learning for 5G-Edge-Cloud Framework in Smart Healthcare
Lakhanotra et al. Trusted Federated Learning Solutions for Internet of Medical Things

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20718426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20718426

Country of ref document: EP

Kind code of ref document: A1