KR20230005900A

KR20230005900A - Systems and methods for providing a private multimodal artificial intelligence platform

Info

Publication number: KR20230005900A
Application number: KR1020227040910A
Authority: KR
Inventors: 그렉 스톰; 가립 가리비; 리디만 다스
Original assignee: 트리플블라인드, 인코퍼레이티드
Priority date: 2020-05-06
Filing date: 2021-05-06
Publication date: 2023-01-10
Also published as: WO2021226302A1; EP4147178A1; JP2023524289A; CA3177679A1; EP4147178A4

Abstract

Systems and methods for providing a private multimodal artificial intelligence platform are disclosed. A method includes splitting a neural network into a first client-side network, a second client-side network, and a server-side network, and sending the first client-side network to a first client. The first client-side network processes first data from the first client, the first data having a first type. The method includes sending the second client-side network to a second client. The second client-side network processes second data from the second client, the second data having a second type. The first type and the second type have a common association. Forward and back propagation occurs between the client-side networks, with their disparate data types, and the server-side network to train the neural network.

Description

Systems and methods for providing a private multimodal artificial intelligence platform

Priority claim

This application claims priority to U.S. Provisional Application No. 63/020,930, filed May 6, 2020 (Docket No. 213-0104P), the contents of which are incorporated herein by reference. This application is a continuation-in-part of U.S. Application No. 16/828,085, filed March 24, 2020 (Docket No. 213-0100), which claims priority to U.S. Provisional Application No. 62/948,105, filed December 13, 2019, the contents of which are incorporated herein by reference. This application is a continuation-in-part of U.S. Application No. 16/828,216, filed March 24, 2020 (Docket No. 213-0101), which claims priority to U.S. Provisional Application No. 62/948,105, filed December 13, 2019, the contents of which are incorporated herein by reference. This application is a continuation-in-part of U.S. Application No. 17/176,530, filed February 16, 2021 (Docket No. 213-0102-CON), which is a continuation of U.S. Application No. 16/828,354, filed March 24, 2020 (Docket No. 213-0102), now U.S. Patent No. 10,924,460, issued February 16, 2021, which claims priority to U.S. Provisional Application No. 62/948,105, filed December 13, 2019, the contents of which are incorporated herein by reference. This application is a continuation-in-part of U.S. Application No. 16/828,420, filed March 24, 2020 (Docket No. 213-0103), which claims priority to U.S. Provisional Application No. 62/948,105, filed December 13, 2019, the contents of which are incorporated herein by reference.

Technical field

The present disclosure relates generally to training neural networks and introduces new techniques for training and deploying neural networks or other trained models in a way that protects the training data from the various sources from being discoverable.

Existing approaches to training neural networks use either a federated training approach or a centralized training approach. Each of these approaches is based on data received from different clients. The process of sharing data in this context can cause the data to leak or become discoverable.

For example, in the medical context, deep learning or machine learning may require large datasets to produce trained models accurate enough to be meaningful for diagnoses. Such data may include X-rays or MRIs for multiple patients or categories of patients. One approach to training these models is to pool the raw data and then let analysts access a central repository of the pooled data to perform machine learning on the large dataset. However, this approach raises ethical and privacy issues, as well as other issues such as a single point of failure and storage requirements.

To describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above is given with reference to specific embodiments illustrated in the accompanying drawings. With the understanding that these drawings depict only exemplary embodiments of the disclosure and are therefore not to be considered limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
FIG. 1 illustrates a federated learning model training approach;
FIG. 2 illustrates a split learning centralized model training approach;
FIG. 3 illustrates a split learning peer-to-peer approach;
FIG. 4 illustrates a federated split learning approach;
FIG. 5 illustrates an embodiment related to blind learning;
FIG. 7 illustrates how blind correlation works across multiple clients;
FIG. 8 illustrates a method embodiment;
FIGS. 9-9A illustrate a method embodiment;
FIGS. 10-10A illustrate a method embodiment; and
FIG. 11 illustrates a system embodiment.

Introduction

Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently, and some of them may be applied in combination, as would be apparent to those of skill in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the application. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

The ensuing description provides example embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the example embodiments will provide those skilled in the art with an enabling description for implementing an example embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.

Brief summary

What is needed in the art is a method and system that combines known approaches to training neural network models while keeping the data on which the model is trained private. With previous approaches, training data could leak or be discovered as part of the training process. The improved approach disclosed herein can also address other problems, such as storage issues, and can eliminate the single point of failure. This disclosure first discusses the known approaches and then introduces the new approaches. In one aspect, a particular platform is used to enable federated development or training of neural network models. Use of the disclosed platform to train a model in this manner is disclosed as another embodiment herein. In yet another embodiment, data is encrypted as it passes between the server and one or more client devices. Various types of federated learning (shown in FIG. 1), split learning (shown in FIG. 2), and split learning peer-to-peer (shown in FIG. 3) are disclosed herein. This disclosure provides several new improvements over the existing approaches.

Typical federated learning involves passing the entire model from the server to a client device for training on the client's data. The process can use multiple clients, each with its own data, for training purposes. The approach is typically performed in a linear and iterative fashion: the entire model is sent to a first client that holds data, the model is trained at the first client, and the entire model is then received back at the server for "averaging". The entire updated model is then sent to a second client that holds data for further processing, after which the updated model is sent back to the server for further "averaging", and so on. In the split learning approach, the model is split and a portion is sent to each client, but the training process is still linear, interactive, and inefficient. The split learning peer-to-peer approach is likewise performed linearly, because the peer clients share data in a linear process. Improvements are needed both in preserving the privacy of the data and in the efficiency of the training process.

This disclosure describes two main improvements over federated learning and split learning. The first is a federated split learning approach (see FIGS. 4-5) in which client-side processing occurs in parallel with, and independently of, other clients. The second disclosed approach (shown in FIGS. 6-10) relates to a multimodal artificial intelligence (MMAI) training approach for handling different types of data from different clients.

As noted above, the federated split learning approach is disclosed as a variation on the typical federated learning approach described above. In this regard, a method includes splitting a neural network at a server into a first portion and a second portion, and sending the second portion separately to a first client and a second client. The clients can hold data (MRIs, patient data, banking data for customers, and so on), and each can receive a portion of the neural network (a certain number of layers of the network, up to a cut layer). The method includes performing the following operations until a threshold is met: (1) performing, at the first client and at the second client simultaneously, a forward step on the second portion to generate data SA1 and SA2 (see FIGS. 1-4); (2) transmitting SA1 and SA2 from the first client and the second client to the server; (3) calculating, at the server, a loss value for the first client and for the second client; (4) calculating, at the server, an average loss across the first client and the second client; (5) performing, at the server, backpropagation using the average loss and calculating gradients; and (6) sending the gradients from the server to the first client and the second client. This approach improves on the federated learning approach and the split learning approach by making the processing on the client side (or "data server" side) operate independently and in parallel. It also differs from the split learning peer-to-peer approach. The independent data servers send their activations to the server side, which aggregates, averages, or otherwise processes the data according to the network requirements to obtain the final trained model.
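The six operations listed above can be illustrated with a toy sketch. It is not the disclosed implementation: each network portion is reduced to a single scalar weight, the loss is a squared error, the labels are assumed to be available to the server, and names such as `Client`, `train`, `LR`, and `THRESHOLD` are hypothetical. A thread pool stands in for clients that compute their forward steps concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

LR = 0.05          # hypothetical learning rate
THRESHOLD = 1e-4   # hypothetical stopping threshold on the average loss
MAX_ITERS = 2000   # safety bound for this toy example


class Client:
    """Holds private data and its own copy of the client-side portion."""

    def __init__(self, w_a, x, label):
        self.w_a, self.x, self.label = w_a, x, label

    def forward(self):
        # Forward step on the client-side portion; the result is SA_i.
        return self.w_a * self.x

    def backward(self, cut_grad):
        # Finish backpropagation locally using the gradient from the server.
        self.w_a -= LR * cut_grad * self.x


def train(clients, w_b):
    history = []
    n = len(clients)
    with ThreadPoolExecutor(max_workers=n) as pool:
        for _ in range(MAX_ITERS):
            # (1)-(2) clients run forward steps concurrently and send SA_i
            activations = list(pool.map(lambda c: c.forward(), clients))
            # (3) per-client loss at the server (squared error, assumed)
            errors = [w_b * sa - c.label for sa, c in zip(activations, clients)]
            losses = [0.5 * e * e for e in errors]
            # (4) average loss across clients
            avg_loss = sum(losses) / n
            history.append(avg_loss)
            if avg_loss < THRESHOLD:
                break
            # (5) backpropagate the average loss through the server portion
            cut_grads = [e * w_b / n for e in errors]
            w_b -= LR * sum(e * sa for e, sa in zip(errors, activations)) / n
            # (6) send each client the gradient at its cut layer
            for client, grad in zip(clients, cut_grads):
                client.backward(grad)
    return w_b, history


clients = [Client(0.5, 1.0, 2.0), Client(0.5, 2.0, 4.0)]
final_w_b, history = train(clients, w_b=0.5)
print(f"iterations: {len(history)}, final average loss: {history[-1]:.6f}")
```

Note that neither client ever sees the other's data or weights; only activations and cut-layer gradients cross the client/server boundary in this sketch.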

Another aspect of this disclosure relates to an improvement in developing artificial intelligence models in which multiple different modes of data, or data types, can be used for training. For example, different clients may hold different types of data. One client may have X-ray or MRI images, and another client may have text describing a patient's health condition. In this regard, a method can include splitting a neural network into a first client-side network, a second client-side network, and a server-side network, and sending the first client-side network to a first client. The first client-side network is configured to process first data from the first client, the first data having a first type. The first client-side network can include at least one first client-side layer. The method includes sending the second client-side network to a second client. The second client-side network is configured to process second data from the second client, the second data having a second type. The second client-side network can include at least one second client-side layer, wherein the first type and the second type have a common association.

The method can further include receiving, at the server-side network, first activations from training of the first client-side network on the first data from the first client; receiving, at the server-side network, second activations from training of the second client-side network on the second data from the second client; training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients; and sending the gradients from the server-side network to the first client-side network and the second client-side network. In this manner, a model can be trained using multiple types of data that share a common relationship, such as a single patient or a single type or category of patient.
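A minimal sketch of this multimodal flow follows, under loud assumptions: each client-side network is a single linear layer, the server-side network is a two-weight linear layer over the two activations, the labels are assumed to reside at the server, and records are aligned by a shared patient identifier as the "common association". `ClientNet`, `text_features`, `VOCAB`, and the toy records are all hypothetical.

```python
from typing import Dict, List

LR = 0.1                                 # hypothetical learning rate
VOCAB = ["fracture", "pain", "healthy"]  # hypothetical toy vocabulary


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ClientNet:
    """One-layer client-side network; its weights stay on the client."""

    def __init__(self, n_inputs: int) -> None:
        self.w = [0.1] * n_inputs
        self.last_x: List[float] = []

    def forward(self, x: List[float]) -> float:
        self.last_x = x
        return dot(self.w, x)            # activation sent to the server

    def backward(self, grad: float) -> None:
        # Apply the gradient the server sent back for this activation.
        self.w = [w - LR * grad * x for w, x in zip(self.w, self.last_x)]


def text_features(note: str) -> List[float]:
    words = note.split()
    return [float(words.count(term)) for term in VOCAB]


# First client holds images, second client holds text; both are keyed
# by the same patient id, which is the common association.
images: Dict[str, List[float]] = {
    "p1": [0.9, 0.8, 0.1, 0.0],
    "p2": [0.1, 0.0, 0.2, 0.1],
    "p3": [0.8, 0.9, 0.0, 0.2],
}
notes: Dict[str, str] = {
    "p1": "fracture pain",
    "p2": "healthy",
    "p3": "pain fracture swelling",
}
labels: Dict[str, float] = {"p1": 1.0, "p2": 0.0, "p3": 1.0}

image_net = ClientNet(4)
text_net = ClientNet(len(VOCAB))
server_w = [0.5, 0.5]                    # server-side layer

epoch_losses: List[float] = []
for _ in range(200):
    total = 0.0
    for pid in sorted(labels):
        a1 = image_net.forward(images[pid])               # first activation
        a2 = text_net.forward(text_features(notes[pid]))  # second activation
        err = server_w[0] * a1 + server_w[1] * a2 - labels[pid]
        total += 0.5 * err * err
        g1, g2 = err * server_w[0], err * server_w[1]     # gradients to clients
        server_w[0] -= LR * err * a1                      # train server layer
        server_w[1] -= LR * err * a2
        image_net.backward(g1)
        text_net.backward(g2)
    epoch_losses.append(total)

print(f"first epoch loss {epoch_losses[0]:.4f}, last {epoch_losses[-1]:.4f}")
```

The server only ever receives one number per client per record here, which is the point of the split: raw images and raw text never leave their respective clients.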

This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and accompanying drawings.

Detailed description

Disclosed herein is a new system, platform, computing environment, cloud environment, marketplace, or any other characterization of a system that enables an improved approach to training neural networks. In one aspect, the approach is called a federated split learning approach; it combines features of known approaches while providing a training process that preserves the privacy of the data from the various client devices that is used to train the model. This disclosure first discusses the federated learning approach in more detail, followed by the split learning approach and the split learning peer-to-peer approach, and then introduces the new federated split learning approach. A multimodal artificial intelligence (MMAI) learning approach for different types of data is also introduced. The new federated split learning approach and the MMAI approach build on several models, including those mentioned above. This application first reviews these earlier approaches in more detail and then introduces the two new learning techniques.

Federated learning

FIG. 1 illustrates the federated learning approach 100. This is the approach currently used by major companies. A downside of this approach is that it proceeds "linearly", one data provider at a time, rather than in parallel. The example neural network shown is a fully connected feed-forward neural network that is trained using the federated learning approach. The training process in this case involves a server 102 that creates a model 104 and shares the model 106A, 108A, 110A with the respective clients 106, 108, 110 in a linear fashion. The clients, each in turn as they receive the model, train their respective models 106A, 108A, 110A separately and send the trained model data back to the server 102, as shown. The server 102 averages the models and produces a new model 104 with updated weights (also known as the trained model). The server 102 sends the new model or weights to the respective clients 106, 108, 110 in a linear fashion. The process is repeated for a number of iterations or until a specific accuracy is achieved.

In each iteration, the server 102 averages all participating models to create a trained model B. Thus, the server has a fully trained model 104 at any point in time. The term "global model" refers to the model that results from the training process. The global model is the trained object that is used for an inference task. An inference task might be evaluating a medical image to classify whether a patient has cancer, a broken bone, or some other medical condition.
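The averaging step can be sketched as follows. The source says only that the server averages the participating models; the unweighted element-wise mean used here, the `Weights` layout, and the function name `average_models` are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_models(client_models: List[Weights]) -> Weights:
    """Return the element-wise mean of each named parameter across clients."""
    if not client_models:
        raise ValueError("need at least one client model")
    averaged: Weights = {}
    for name in client_models[0]:
        # Group the i-th value of this parameter from every client together.
        columns = zip(*(model[name] for model in client_models))
        averaged[name] = [sum(col) / len(client_models) for col in columns]
    return averaged


# Toy models returned by three clients after local training.
models = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
global_model = average_models(models)
print(global_model)  # {'w': [3.0, 4.0], 'b': [1.0]}
```

A real deployment might weight each client by its number of training examples; that choice is outside what the text specifies.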

As an example of this approach in use, devices such as an electronic watch or a mobile device, for example a device that is charging overnight and connected to a Wi-Fi network, can have their processors used to train neural network models. Thus, client 1 106 could be an Apple Watch, client 2 108 could be another person's iPhone, and so forth. An example of the model is the Siri speech processing service offered by Apple. Every device trains the same model; the only difference is that each client trains on the data local to it. The model or data is transmitted back to the server 102, and the server averages the models together. The downside is that individual clients, such as client 1 106, could be tricked into sharing something about the data used to train the model. This would be a leak of private data and would raise the issues outlined above. One problem with the federated learning approach is that there is no model privacy, because the entire model is passed from client to client. Computational cost is high because each client processes the entire model, and communication overhead is heavy because the entire model is transmitted many times. Reconstruction attacks can also make the training data vulnerable.

Split learning

FIG. 2 illustrates a split learning centralized approach. The model (neural network) 204 is split into two parts: one part (206A, 208A, 210A) resides on the respective client side (206, 208, 210) and includes the input layer of the model and, optionally, further layers up to a cut layer; the other part (B) resides on the server side 202 and often includes the output layer. The split layer (S) refers to the layer (the cut layer) at which A and B are split. In FIG. 2, SA denotes the split layer or data sent from A to B, and SB denotes the split layer sent from B to A.

In one example, the neural network between B 204 and client 1 206 is the B portion 204 plus the A1 portion 206A, with the communication of data SB1 206C and SA1 206B completing the entire neural network. The training process in this model is as follows. The server 202 creates A and B and sends the respective model A (206A, 208A, 210A) to the respective client (206, 208, 210). For every client, the operations include repeating the following, in a linear or iterative fashion across the group of clients, until some condition occurs. The respective client 206, 208, 210, in its turn, downloads the most recent model A from the server 202 (note that this step is where the approaches shown in FIG. 2 and FIG. 3 differ). The clients 206, 208, 210, in their respective turns, perform a forward step on model A and send the output of A (i.e., the activations at S only, or SA1 206B, SA2 208B, SAN 210B), plus the required labels, to the server 202. The server 202 performs a forward step on B using the SAs received from the respective clients 206, 208, 210. The server 202 calculates the loss function, performs backpropagation, and calculates the gradients at the S layer. The server 202 sends the gradients of S only (i.e., SB1 206C, SB2 208C, SBN 210C) to the respective clients 206, 208, 210. This process is performed linearly across the different clients, such that the operations occur first for client 206, then for client 208, and then for client 210. The clients 206, 208, 210 perform backpropagation using the SB gradients received from the server 202, and the clients 206, 208, 210 share their updated A (SA1 206B, SA2 208B, SAN 210B) with the server 202.
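The round-robin exchange described above can be sketched with a deliberately tiny model. The assumptions are significant: parts A and B are each a single scalar weight, the loss is a squared error, and `Server`, `client_turn`, `LR`, and the toy data are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

LR = 0.05  # hypothetical learning rate


@dataclass
class Server:
    w_b: float         # server-side part B (one weight in this toy)
    latest_w_a: float  # most recent client-side part A, held for download

    def step(self, sa: float, label: float) -> Tuple[float, float]:
        """Forward on B, compute the loss, backpropagate to the cut layer."""
        err = self.w_b * sa - label
        loss = 0.5 * err * err
        sb = err * self.w_b          # gradient at S, sent back to the client
        self.w_b -= LR * err * sa    # update B
        return loss, sb


def client_turn(server: Server, x: float, label: float) -> float:
    w_a = server.latest_w_a            # download the most recent A
    sa = w_a * x                       # forward step on A (activation at S)
    loss, sb = server.step(sa, label)  # send SA plus the label to the server
    w_a -= LR * sb * x                 # backpropagate through A using SB
    server.latest_w_a = w_a            # share the updated A with the server
    return loss


# Each client holds one private (input, label) pair; the target is y = 2x.
client_data: List[Tuple[float, float]] = [(1.0, 2.0), (0.5, 1.0), (1.5, 3.0)]

server = Server(w_b=0.5, latest_w_a=0.5)
losses: List[float] = []
for _ in range(40):                    # rounds
    for x, label in client_data:       # clients take turns, one at a time
        losses.append(client_turn(server, x, label))

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.8f}")
```

The inner loop makes the sequential nature visible: a client cannot start its turn until the previous client's updated A has been shared, which is the inefficiency the text attributes to this approach.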

The horizontal axis in FIG. 2 represents time, with processing occurring in a round-robin fashion from client to client.

In one example, network A1 206A on client 1 can include a convolution layer and an activation layer. Having processed its data, client 1 206 sends the result of that layer (SA1 206B) forward to the next layer of the network, which is at the server 202, and the server calculates the backpropagation and so forth as described above. The B network repeatedly (in round-robin fashion) processes the different data from the different clients 206, 208, 210. It ultimately arrives at an averaged reflection of the network. It never trains the network on all of the data from all of the clients 206, 208, 210 at the same time. This has the advantage that the data can be processed faster and that B, as it is built, is averaged across the data. The final algorithm has not seen all of the data. Because model B has never been trained on all of the data, it cannot be tricked into revealing that data.

Split learning in a peer-to-peer environment

FIG. 3 illustrates a split learning peer-to-peer approach. The model (neural network) is split into two parts: one part (A) resides on the client side and includes the input layer, and the other part (B) resides on the server side and often includes the output layer. In FIG. 3, the client-side part (A) is shown respectively as A1 306A at client 306, A2 308A at client 308, and AN 310A at client 310. The split layer (S) refers to the layer at which A and B are split. In FIG. 3, SA denotes the split layer sent from A to B, and SB denotes the split layer sent from B to A.

In one example, the neural network between B and client 1 306 is the B portion plus the A1 portion 306A, with the communication of data SB1 306C and SA1 306B completing the entire neural network. The training process in this model is as follows. The server 302 creates A and B and sends A to the clients 306, 308, 310. For every client, the process includes repeating the following until some condition occurs. First, the process includes downloading the most recent A from a previous client.

Note that this step differs from the approach shown in the other figures. The process then includes performing a forward step on A and sending the output of A (i.e., the activations at S only), plus the required labels, to the server 302. The server 302 performs a forward step on B using the SA received from the respective client 306, 308, 310. The server 302 calculates the loss function, performs backpropagation, and calculates the gradients at S. The server 302 sends the gradients of S only (i.e., SB) to the respective clients 306, 308, 310. The client performs backpropagation using the SB gradients received from the server 302. The client shares its updated A with the server 302.

The peer-to-peer approach generally involves the respective client updating its A model by downloading it directly from the last trained client or, more broadly, from a previously trained client. In this regard, the process of training clients can occur in a round-robin fashion in which the clients are trained sequentially. For example, if client 1 (306) is trained first, then in a peer-to-peer model, rather than client 2 (308) updating its client-side model A2 from the server 302 or another trusted server, client 2 (308) updates its client model A2 by downloading the client-side model A1 from client 1 (306). The previously trained model can be the last trained client model, or it can be a model from some other previously trained client, chosen based on some criterion. For example, client 1 (306) and client 2 (308) may each train their respective models. Client 3 (310) then needs a client-side model update and can implement an algorithm or process to determine which client-side model to download, that of client 1 (306) or that of client 2 (308). Note that the disclosure below implements a multi-modal artificial intelligence training process that can be applied here. If client 1 (306) processes images and its model A1 focuses on image processing, client 2 (308) processes text and its model A2 focuses on text processing, and client 3 (310) processes images, then the algorithm or process can cause, in the peer-to-peer environment, the client-side model A1 to be downloaded to client 3 (310) as an update.
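The peer selection described above can be sketched as follows. This is a minimal illustration only: the `PeerModel` record, its `modality` and `trained_at` fields, and the rule "prefer the most recently trained peer with a matching data type" are assumptions introduced here, not details given in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeerModel:
    """Hypothetical record describing a client-side model A_i offered by a peer."""
    client_id: str
    modality: str    # data type the model was trained on, e.g. "image" or "text"
    trained_at: int  # training round in which this peer last finished


def choose_peer_model(my_modality: str, peers: List[PeerModel]) -> Optional[PeerModel]:
    """Pick the peer model to download as an update.

    Assumed rule: prefer the most recently trained peer with the same
    modality; otherwise fall back to the most recently trained peer.
    """
    if not peers:
        return None
    same_modality = [p for p in peers if p.modality == my_modality]
    candidates = same_modality or peers
    return max(candidates, key=lambda p: p.trained_at)


peers = [
    PeerModel("client-1", "image", trained_at=1),
    PeerModel("client-2", "text", trained_at=2),
]
# Client 3 processes images, so it takes A1 even though client 2 trained later.
chosen = choose_peer_model("image", peers)
```

With the two peers shown, an image-processing client 3 would download the model of client 1, which matches the example in the text.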

In one scenario, the information from split learning is not sufficient to achieve proper training of the neural network. In this model, it is assumed that a good training approach may be for A and B to be aggregated at the server 302 in plain text by simply stacking them (A and B).

Federated Split Learning

FIG. 4 illustrates the improvement to training neural networks disclosed herein. This improvement can be characterized as a federated split learning approach, and it addresses some of the deficiencies of the approaches described above. FIG. 4 introduces a parallel processing approach. Because the processing is parallel and independent, model training proceeds faster than in the other models described above.

The federated split learning approach does not perform the round-robin processing described above. The server 402 splits the network at the "split layer", which is a user parameter inserted into the network definition code. The "top portion" of the network is kept at the server 402, and the "bottom portion" is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably herein). Training starts at the lowest network layer, which is the layer closest to the data. Each layer reads either the data (in the case of the first layer) or the output of the previous layer (in the case of all other layers).
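As a minimal sketch of splitting at a user-supplied split layer, assume the network is represented as an ordered list of layer names with the data-side layer first. The function name `split_network` and the list representation are illustrative assumptions, not the platform's actual interface.

```python
from typing import List, Tuple


def split_network(layers: List[str], split_layer: int) -> Tuple[List[str], List[str]]:
    """Split an ordered layer list (data side first) at index `split_layer`.

    Layers [0, split_layer) form the bottom portion sent to the clients;
    layers [split_layer, end) form the top portion kept at the server.
    """
    if not 0 < split_layer < len(layers):
        raise ValueError("split_layer must leave at least one layer on each side")
    return layers[:split_layer], layers[split_layer:]


# Hypothetical five-layer network; the split layer is a user parameter.
layers = ["conv1", "relu1", "flatten", "dense1", "softmax"]
bottom, top = split_network(layers, split_layer=3)
```

Here `bottom` would be distributed to every client and `top` would stay on the server.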

The layers can compute their output (called "activations" because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side (406, 408, 410) has computed its activations (i.e., its output), that output is sent to the first layer "on the other side of the split", that is, the first layer on the server side (402).

The following approach involves splitting the model as before. A model is split into two parts: (A) is on the client side and includes the input layer, and (B) is on the server side and often includes the output layer. (S) is the split layer. The clients or data providers 406, 408, 410 run independently and send back their responses when they are available. Code on the server 402 processes the data and sends its output back, identically, to all clients as SB (406C, 408C, 410C).

An example training process is as follows. The server 402 creates A and B and sends portion A (406A, 408A, 410A) to the clients 406, 408, 410. The following steps are repeated until a condition (e.g., accuracy) is met. All clients 406, 408, 410 perform the forward step on A simultaneously. Up to this point, all computations on the clients 406, 408, 410 are performed on independent servers, and no data server depends on another. This approach highlights one of the innovations disclosed herein. All of these computations by the clients/data providers 406, 408, 410 can run at the same time, in parallel. This contrasts with the linear or "round-robin" approach discussed above.

The clients 406, 408, 410 each run their A portion (406A, 408A, 410A) of the neural network, generate the respective output of A (i.e., SA (406B, 408B, 410B)), and send that output to the server 402. The server 402 receives three different "versions" of the activations (one each from SA1, SA2, and SA3). At this point, the server 402 processes those activations "appropriately", which can mean that the server 402 performs different operations depending on the case. For example, the server 402 calculates a loss value for each client 406, 408, 410, and then calculates the average loss across all clients. The server 402 performs backpropagation using the average loss and computes the gradients at S. The server 402 sends the gradients at S (i.e., SB (406C, 408C, 410C)) to all clients 406, 408, 410.

In other words, training on the server side 402 proceeds much as described above. Once the first layer on the server side 402 is "complete" (by averaging or aggregating what was received from the data providers 406, 408, 410), forward propagation continues until the "top" of the network is reached. A further innovation described in this disclosure lies in managing the activations coming from the data providers 406, 408, 410 and how they are averaged, aggregated, or otherwise processed. Once the system reaches the top of the model, the server 402 calculates the gradients needed for backpropagation and sends them back down and across the split networks, as shown in FIG. 4.

As noted above, the processing and management of the activations by the server 402 can vary depending on different factors. For example, assume that all three data providers 406, 408, 410 supply the same kind of data (X-rays). In this case, the data is combined horizontally, which conceptually means that the data is "stacked" one file on top of another. The incoming activations will then most likely be averaged, and the "average of each activation" is sent forward into the "top half" of the network.

In other cases, the data can be stacked "vertically", so that client 1 (406) has the first 40 columns of data (e.g., blood tests), client 2 (408) has the next 60 columns (e.g., an electronic health record containing data such as age, weight, etc.), and client 3 (410) has the last 100 columns (e.g., insurance information such as previous claims). In this example, the three clients can be regarded as establishing a combined "record" of 200 columns (aggregated vertically across the page). In this case, the activations are "combined vertically" and sent forward into the server network. This and other approaches to combining data can be implemented. Note that the multi-modal artificial intelligence model described in more detail below builds on the concept just described of combining activations vertically. More detail on this concept is provided below.
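The two ways of combining activations described above can be sketched as follows. This is an illustration under simplifying assumptions: each client's activations are represented as a flat list of floats for a single record, and the function names `combine_horizontal` and `combine_vertical` are introduced here for the example.

```python
from typing import List

Vector = List[float]


def combine_horizontal(activations: List[Vector]) -> Vector:
    """Average same-shaped activations element-wise (clients hold the same kind of data)."""
    width = len(activations[0])
    if any(len(a) != width for a in activations):
        raise ValueError("horizontal combination needs activations of equal width")
    n = len(activations)
    return [sum(a[j] for a in activations) / n for j in range(width)]


def combine_vertical(activations: List[Vector]) -> Vector:
    """Concatenate activations in client order (clients hold different columns)."""
    return [value for a in activations for value in a]


averaged = combine_horizontal([[1.0, 2.0], [3.0, 4.0]])
stacked = combine_vertical([[1.0], [2.0, 3.0]])
```

In the horizontal case the server-side input width equals each client's output width; in the vertical case it equals the sum of the client output widths.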

As mentioned above, the clients 406, 408, 410 run in parallel in this embodiment. Because all processing is performed in parallel, the time needed to train the model is reduced. In addition, this data is delivered over a particular platform. The patent applications incorporated above provide examples of the particular platforms that can be used to deliver the data as disclosed herein. This is discussed further below.

The global model in federated split learning can be aggregated as follows. Once training is complete, the system uses one of the following approaches to aggregate a global model that will be used for inference tasks. In a first approach, the server selects one of the models, Ai, and combines it with its model B to form the global model. The selection of Ai can be achieved using one of the following methods. For example, random selection can be used, in which the server selects the model Ai of any client 406, 408, 410 at random. This random selection can be influenced by other factors, such as which clients are currently available online, the types of data each client processes (text data, image data, temporal data), or the transmission speed or network latency between the two entities. The server then stacks the Ai and B portions to create the global model.

In another example, weighted client selection can be used. Under this selection criterion, the server 402 assigns each client a weight (i.e., a numerical value) that reflects its importance based on its data, its computational capabilities, and other valuable assets that it owns and contributes during the training process. For example, a particular set of models (e.g., data for a particular language, data related to a type of image, data related to a set of patients, or data from a particular country or region) can be weighted heavily in the development of the model. Thus, if a country is selected, client devices from that country can be weighted more heavily than clients from other countries. For example, Japan-based client devices can be used for 80% of the model data, with Australia at 10% and Canada at the remaining 10%. In another example, data from a particular clinic associated with an outbreak of flu or COVID can be weighted more heavily. In yet another example, the type of data can be weighted more heavily as well: image data can be used for 70% of the model, text data for 20%, and temporal data for 10%.

Yet another model can be accuracy-based selection. In this case, the server 402 can test the accuracy produced by each client model Ai and then select the model that produces the "best" accuracy. What counts as "best" can be determined by stakeholders, through a machine learning approach, or otherwise. These are all models of the first approach.
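The three selection methods of the first approach can be sketched as follows. This is a minimal illustration with assumed names (`select_random`, `select_weighted`, `select_by_accuracy`). Treating the weights as sampling probabilities is one possible reading of weighted client selection, not something the disclosure specifies, and "best" accuracy is taken here to mean simply the highest value.

```python
import random
from typing import Dict, Sequence


def select_random(client_ids: Sequence[str], rng: random.Random) -> str:
    """Random selection: pick the model A_i of any client uniformly at random."""
    return rng.choice(list(client_ids))


def select_weighted(weights: Dict[str, float], rng: random.Random) -> str:
    """Weighted selection (assumed reading): sample a client in proportion to its weight."""
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


def select_by_accuracy(accuracies: Dict[str, float]) -> str:
    """Accuracy-based selection: pick the client whose A_i gave the highest accuracy."""
    return max(accuracies, key=accuracies.get)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = {"japan": 0.8, "australia": 0.1, "canada": 0.1}
picked = select_weighted(weights, rng)
best = select_by_accuracy({"client-1": 0.91, "client-2": 0.87, "client-3": 0.93})
```

Whichever client is selected, the server would then stack that client's Ai with B to form the global model.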

A second approach can be to aggregate the global model by averaging the models Ai (i = 1, ..., N) of all clients. Each client first encrypts its model using homomorphic encryption and then sends the encrypted Ai' data to the server 402. The server 402 adds all of the encrypted models, decrypts the result of the addition, and then computes the average. The averaged A is then stacked with B to create the global model. One approach can be the default approach, and optional approaches can be offered as well. The decryption and averaging processes can also be spread across different servers, for example with one process occurring on the client side and another performed by the server 402 to arrive at the global model.
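The arithmetic of the second approach can be sketched as follows. This illustration deliberately omits the homomorphic encryption: it works on plain parameter lists and shows only the add-then-average order and the stacking with B. The function names and the flat-list model representation are assumptions made for the example. In the disclosed approach, the summation would be carried out on encrypted models.

```python
from typing import Dict, List


def average_client_models(client_models: List[List[float]]) -> List[float]:
    """Add all client models parameter by parameter, then divide by the client count.

    NOTE: plain values are used here; encryption and decryption are omitted.
    """
    if not client_models:
        raise ValueError("at least one client model is required")
    n = len(client_models)
    summed = [sum(params) for params in zip(*client_models)]  # the addition step
    return [s / n for s in summed]                            # the averaging step


def build_global_model(client_models: List[List[float]],
                       server_model: List[float]) -> Dict[str, List[float]]:
    """Stack the averaged A with B to form the global model."""
    return {"A": average_client_models(client_models), "B": list(server_model)}


global_model = build_global_model([[1.0, 2.0], [3.0, 6.0]], server_model=[0.5])
```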

The approaches can vary over the course of a model's development. For example, training of the model can begin using the default approach and then be adjusted so that a weighted approach is used to complete the training.

An example method is shown in FIG. 5 and can include splitting, at a server, a neural network into a first portion and a second portion (502), sending the second portion separately to a first client and a second client (504), and performing the following operations until a threshold is met:

(1) performing, at the first client and the second client simultaneously, a forward step on the second portion to generate data SA1 and SA2;

(2) transmitting SA1 and SA2 from the first client and the second client to the server;

(3) calculating, at the server, a loss value for the first client and for the second client;

(4) calculating, at the server, an average loss across the first client and the second client;

(5) performing, at the server, backpropagation using the average loss and calculating gradients; and

(6) sending the gradients from the server to the first client and the second client (506).
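Operations (1) through (6) can be traced in a deliberately tiny sketch. The assumptions here are not from the disclosure: each client holds a single (x, y) sample, each client-side portion is one scalar weight, the server-side portion is one scalar weight, the loss is squared error, and the clients are simulated sequentially in one process rather than in parallel. The function name and hyperparameters are illustrative.

```python
from typing import List, Tuple


def train_federated_split(
    data: List[Tuple[float, float]],  # one (x, y) sample per client (toy assumption)
    lr: float = 0.05,
    threshold: float = 1e-3,
    max_rounds: int = 500,
) -> Tuple[List[float], float, List[float]]:
    a = [1.0 for _ in data]  # client-side weights, one scalar per client
    b = 0.5                  # server-side weight
    n = len(data)
    history: List[float] = []
    for _ in range(max_rounds):
        # (1)-(2) every client runs its forward step and sends SA_i to the server
        sa = [a_i * x for a_i, (x, _) in zip(a, data)]
        # (3) the server computes one loss value per client
        errors = [b * h - y for h, (_, y) in zip(sa, data)]
        losses = [e * e for e in errors]
        # (4) the server averages the losses
        avg_loss = sum(losses) / n
        history.append(avg_loss)
        if avg_loss < threshold:  # the stopping condition ("threshold is met")
            break
        # (5) backpropagate the average loss: gradient for b and gradients at S
        grad_b = sum(2.0 * e * h for e, h in zip(errors, sa)) / n
        sb = [2.0 * e * b / n for e in errors]  # d(avg_loss)/d(SA_i)
        b -= lr * grad_b
        # (6) the server sends SB_i; each client finishes backpropagation locally
        a = [a_i - lr * g * x for a_i, g, (x, _) in zip(a, sb, data)]
    return a, b, history


a, b, history = train_federated_split([(1.0, 2.0), (2.0, 4.0)])
```

The returned `history` holds the average loss per round, so it can be inspected to confirm that the loss falls until the threshold stops the loop.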

The computing device or devices that perform the above operations can also be covered, as can a computer-readable storage device storing instructions which, when executed, cause a processor to perform these operations. The operations can be performed in any order, and the method can include one or more of the operations.

In another aspect of this disclosure, the platforms described in the patent applications incorporated above can provide the basis for communicating data back and forth in any of the federated models. For example, each of the clients and/or the server may be required to log on to the platform, or to one of the versions of the platform referenced in the applications incorporated herein. Accordingly, delivering this functionality over a platform or exchange configured as disclosed in those applications is also covered as an aspect of this disclosure.

In another aspect, a customer can choose the SA, SB lines (vectors and numbers) that represent the weights that need to be propagated. If a client wants to lock down its data so that the server knows nothing about it, that data can be homomorphically encrypted. The encryption process (which can include any encryption process) can be used in any of the approaches described above.

The patent applications incorporated above provide example platforms to which client devices and/or servers can log in, or may be required to log in, to perform the federated split learning approach disclosed herein.

Note that in one aspect, the steps disclosed herein can be practiced by a "system". The system can include the server and one or more clients together, or it can be just the functionality performed by the server. The system can also be a client or a group of clients, such as the clients in a particular geographic area or client groupings that perform the client-based functions disclosed herein. In one aspect, the "server" can also be a computing device (physical or virtual) on the server side as well as a computing device (physical or virtual) on the client side. In one example, a server can be on the client side, can receive the backpropagation output of the respective client-side models Ai, and can synchronize a client-side global model across a round of training.

Accordingly, each of the server-side system and the client-side system can perform any one or more of the operations disclosed herein. Claims can be included that describe the steps that occur from the standpoint of any device disclosed herein. For example, the steps of transmitting, calculating, and receiving data can be claimed from the standpoint of a server device, a client device, or a group of client devices, depending on which embodiment is being covered. All such communications from the standpoint of an individual component or device can be included within the scope of a particular embodiment focused on that device.

In another aspect, the system can include the platform disclosed in the patent applications incorporated by reference, which can also perform steps in coordination with the concepts disclosed above. Accordingly, the platform used to provide the federated split learning process described herein is also an embodiment of this disclosure, and steps can be recited in connection with the use of that platform for training models in a manner that maintains the privacy of the data as described herein.

Typically, the training of a neural network is performed on similar data types. For example, a neural network trained to identify cancer from a patient image of a kidney is trained on images of kidneys that are and are not cancerous. The following describes a new approach to training that uses different types of training data together to train a neural network, using the federated split learning approaches disclosed herein.

Multi-modal Artificial Intelligence Approach

As mentioned above, the MMAI innovations build on the "vertical aggregation" idea described in the federated split learning example. That example involved all three clients 406, 408, 410 providing the same type of data, either images (for stacking) or tabular data to be combined vertically. When the inventors were considering the vertical aggregation concept, they realized that it could be applied to different types of data. For example, client 1 could provide images, client 2 could provide blood tests, and client 3 could provide doctors' text notes. The significant difference is that all of these data types require different network architectures. In this case, the developers of the system cannot define a single network and then have the server "split" it. Thus, part of the solution is to let users define the "before the split" network for each data provider, and then define the network and the aggregation technique on the server. This approach is illustrated in FIGS. 6 to 10.

FIG. 6 illustrates a multi-modal artificial intelligence (MMAI) platform or machine learning (ML) platform 600. The MMAI approach reduces the computational requirements and communication overhead of other approaches. In addition, training is much faster, and the process maintains a much higher level of privacy for the data, including the fact that the model is kept private as well.

The MMAI platform 600 applies AI/ML techniques to multiple data types in one large AI model. Typically, different data types require different AI network architectures to yield accurate results. Images, for example, typically require special filters (convolutions), whereas text or speech requires different, "time-series-like" treatment, and tabular data frequently works best with ML or feed-forward architectures. The issue is that images are best understood by looking at all of the pixels together and "convolving" them in various ways, whereas speech is best understood in the context of what came before and after a certain sound (i.e., in a manner similar to time-series data), and so on. Because of these differences in processing, "state of the art" systems today typically process one data type (i.e., images, text, speech, tabular, etc.).

Most AI researchers recognize that "next generation" breakthroughs in accuracy can be achieved by adding more unique data to their models. This essentially amounts to giving the model more data, and therefore more context with which to discover interesting differences between cases. An example of this concept is a model that diagnoses atrial fibrillation (A-fib) by examining electrocardiogram (ECG) data. The model can reach a certain level of accuracy based on the ECG data alone, but when the researchers add age, sex, height, and weight to the ECG data, the model becomes far more accurate. The increase in accuracy arises because the four additional data types help the model better understand ECGs that would otherwise look "equivalent". Adding the four items, or characteristics of the data, makes the data more granular.

The MMAI platform 600 shown in FIG. 6 introduces a new generation of cryptographic tools to improve the training on, and the protection of, private data. The MMAI platform 600 provides the model with more data than is typically used to train AI/ML models, and it expands the data. This approach adds a significant amount of data by combining different data types (e.g., images and tabular data).

FIG. 6 illustrates a first outside source of data 602, shown as Wells Fargo bank. The Wells Fargo data 602a is encrypted (602b), and the package of encrypted data 602c is transmitted to a private AI infrastructure 603. A second outside source of data 604 is shown as Citibank. The Citibank data 604a is encrypted (604b), and the package of encrypted data 604c is transmitted to the private AI infrastructure 603. A third outside source of data 606 is shown as Bank of America. The Bank of America data 606a is encrypted (606b), and the package of encrypted data 606c is transmitted to the private AI infrastructure 603. The AI infrastructure 603 includes a first module 608 that privately explores, selects, and preprocesses all of the data 610 from the disparate sources 602, 604, 606. In this example, all of the sources are identified as banks, but they will have different structures, and their respective data can be disparate as well. Of course, the outside sources of data 602, 604, 606 need not all be of the same type, i.e., banks; the use of banks is only an example. The outside sources 602, 604, 606 could be, for example, a hospital, a clinic, a university, and so forth. The basic concept is that the data types can differ across the various outside sources 602, 604, 606.

The private AI infrastructure 603 can include a component that privately explores, selects, and preprocesses the relevant features from all of the data 602c, 604c, 606c that it receives for training. Feature 612 represents the subset of the data 610 that can result from the processing of that component in the private AI infrastructure 603. In operations 614 and 616, the AI infrastructure 603 privately trains new deep and statistical models on the selected data 612, and in operation 618 it predicts on any private and sensitive data, which can include images, video, text, and/or other data types. The AI infrastructure 603 can then sell or grant access to the new models, as presented in operation 620.

FIG. 7 illustrates another variation on the split learning technique 700. This approach offers low compute requirements and low communication overhead, and it improves the training of models by using a blind correlation process for training based on disparate types of data. Building on the A-fib model example above, another source of additional data for the model would be a chest X-ray for each case the model considers. Unfortunately, the typical processing of X-ray images is not compatible with the typical processing of tabular ECG data. With a few minor engineering additions, the split federated learning tool described above can be used to address this incompatibility. Namely, new instructions can be provided to the tool so that different data types can be processed in the existing pipeline.

In this case, rather than an "automatic" split of the network architecture, this variation on the idea lets the network designer (i.e., the data scientist developing the algorithm) specify the particular network components desired for each data type. Each data type needs network architecture layers suited to that data type (i.e., convolutional layers for images, recurrent layers or long short-term memory layers for speech, feed-forward layers for tabular data, and so on). These disparate layers, each specific to the data type in question, are designated to run on the "data server" side (almost like independent networks in their own right). The last layer of each "independent network" (one per data type) sends its activations "across the split" to the "server side". The algorithm server side has one coherent "network" that appropriately processes the incoming activations (from the data server side). In some respects, this approach resembles an "ensemble of networks" (on the data server side) being aggregated into one final network on the algorithm server side, which ultimately produces the final "answer" from the "ensemble" of networks.

Split learning is a collaborative deep learning technique in which a deep learning network or neural network (NN) can be split into two portions, a client-side network A and a server-side network B, as discussed above. The NN includes weights, biases, and hyperparameters. In FIG. 7, the clients 702, 704, 706, where the data resides, commit only to the client-side portion of the network, and the server 710 commits only to the server-side portion of the network. The client-side and server-side portions collectively form the full network NN.

The training of the network is performed by a sequence of distributed training processes. Forward propagation and backpropagation can occur as follows. With its raw data, a client (say client 702) trains the client-side network 702A up to a certain layer of the network, which can be called the cut layer or the split layer, and sends the activations of the cut layer to the server 710. The server 710 trains the remaining layers of the NN with the activations that it received from the client 702. This completes a single forward propagation step. A similar process occurs in parallel for the second client 704, its client-side network 704A, its data, and the generated activations that are transmitted to the server 710. A further similar process occurs in parallel for the third client 706, its client-side network 706A, its data, and the generated activations that are transmitted to the server 710.

Next, the server 710 carries out backpropagation up to the cut layer and sends the gradients of the activations to the respective clients 702, 704, 706. With the gradients, each respective client 702, 704, 706 performs backpropagation on its remaining network 702A, 704A, 706A. This completes a single pass of backpropagation between a client 702, 704, 706 and the server 710.
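The forward and backward passes described above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the `Client` and `Server` classes, the single linear layer on each side, the squared-error loss, and the learning rate are all hypothetical choices made for brevity. The point it shows is that only cut-layer activations travel to the server and only activation gradients travel back.

```python
# Minimal split-learning loop in pure Python (illustrative assumptions only).

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

class Client:
    """Holds the client-side network up to the cut layer (one linear layer here)."""
    def __init__(self, W):
        self.W = W
        self.x = None

    def forward(self, x):
        self.x = x                      # raw data never leaves the client
        return matvec(self.W, x)        # cut-layer activations sent to the server

    def backward(self, grad_act, lr):
        # Finish backpropagation locally using the gradients from the server.
        for i, g in enumerate(grad_act):
            for j, xj in enumerate(self.x):
                self.W[i][j] -= lr * g * xj

class Server:
    """Holds the remaining layers (one linear output unit here)."""
    def __init__(self, w):
        self.w = w

    def train_step(self, act, target, lr):
        y = sum(w * a for w, a in zip(self.w, act))
        err = y - target                         # dL/dy for L = 0.5 * (y - target)^2
        grad_act = [err * w for w in self.w]     # gradient at the cut layer
        self.w = [w - lr * err * a for w, a in zip(self.w, act)]
        return 0.5 * err ** 2, grad_act

client = Client([[0.5, -0.2], [0.1, 0.3]])
server = Server([0.4, -0.6])
x, target = [1.0, 2.0], 1.0

losses = []
for _ in range(50):
    activations = client.forward(x)                           # forward up to the cut layer
    loss, grads = server.train_step(activations, target, lr=0.05)
    client.backward(grads, lr=0.05)                           # backward below the cut layer
    losses.append(loss)
```

With several clients, the same loop would simply be run once per client, each with its own `Client` instance.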

This process of forward propagation and backpropagation continues until the network has been trained with all available clients 702, 704, 706 and reaches convergence. In split learning, the architectural configurations are assumed to be carried out by a trusted party that has direct access to the main server 710. This authorized party selects the ML model (based on the application) and the network split (finding the cut layer) at the beginning of the learning.

As noted above, a concept introduced in this disclosure relates to the clients 702, 704, 706 each providing a different type of data, where the different types of data nevertheless have a common association. Accordingly, the selection of the machine learning model can be based on the types of data processed on the client side, and the process of finding the cut layer can also depend on the types of data or on how disparate the different types of data are. For example, for widely disparate data types across the clients 702, 704, 706, the cut layer may be chosen so that the client-side networks 702A, 704A, 706A have more or fewer layers. In another aspect, the number of layers before the cut layer or split layer can vary across clients. Client 702 may process images and need 8 layers before the cut layer, while client 704 may process text and need only 4 layers before the cut layer. In this regard, as long as the vectors, activations, or activation layer at the cut layer are consistent across the different clients 702, 704, 706 having different types of data, there is no requirement that the number of layers in the client-side networks 702A, 704A, 706A be the same.
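The point that client-side depth may differ while the cut-layer output stays consistent can be made concrete with a small sketch. Everything here is an assumption for illustration: the layer widths, the input sizes, the shared cut width of 4, and the helper names `make_dense`, `build_client`, and `forward` are hypothetical. Only the 8-layer and 4-layer depths come from the example above.

```python
import random

def make_dense(n_in, n_out, rng):
    """Create a dense layer as a list of weight rows (no bias, for brevity)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def build_client(widths, rng):
    """widths = [input, hidden..., cut]; the network has len(widths) - 1 layers."""
    return [make_dense(a, b, rng) for a, b in zip(widths, widths[1:])]

def forward(layers, x):
    """Run dense + ReLU layers and return the cut-layer activations."""
    for W in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]
    return x

rng = random.Random(0)
CUT_WIDTH = 4  # hypothetical width agreed with the server-side network

# An 8-layer client-side network for image-like input of 16 values.
image_client = build_client([16, 12, 12, 10, 10, 8, 8, 6, CUT_WIDTH], rng)
# A 4-layer client-side network for text-like input of 10 values.
text_client = build_client([10, 8, 6, 5, CUT_WIDTH], rng)

image_act = forward(image_client, [1.0] * 16)
text_act = forward(text_client, [1.0] * 10)
```

Both clients hand the server a vector of the same width, so the server-side network does not need to know how many layers produced it.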

The synchronization of the learning process with multiple clients 702, 704, 706 can be done in either a centralized mode or a peer-to-peer mode. In the centralized mode, before starting training with the server 710, a client 702, 704, 706 updates its client-side model 702A, 704A, 706A by downloading the model parameters from a trusted third-party server 710, which retains the updated client-side model uploaded by the last trained client. In the peer-to-peer mode, by contrast, the client 702, 704, 706 updates its client-side model by downloading it directly from the last trained client. As noted above, previously trained models may have a data type similarity to the current client that needs to update its model. For example, the similarity may be based on the data being images, textual data, speech data, video data, temporal data, and so forth. Thus, there can be an intelligent selection of which previously trained client model to download from a peer. The processing by the server 710 can also, in some cases, be split between some processing on the server side and other processing at a federated server on the client side.
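The two synchronization modes can be sketched as follows. This is a hedged illustration only: the `TrustedStore` and `Client` classes, the dictionary parameter format, and the selection rule (prefer the most recently trained peer with the same data type, otherwise the most recently trained peer) are assumptions, and the disclosure does not specify how the "intelligent selection" is actually made.

```python
import copy

class TrustedStore:
    """Stands in for the trusted third-party server used in centralized mode."""
    def __init__(self):
        self._latest = None

    def upload(self, params):
        self._latest = copy.deepcopy(params)

    def download(self):
        return copy.deepcopy(self._latest)

class Client:
    def __init__(self, name, data_type, params):
        self.name = name
        self.data_type = data_type
        self.params = params

    def sync_centralized(self, store):
        """Fetch the parameters uploaded by the last trained client."""
        latest = store.download()
        if latest is not None:
            self.params = latest

    def sync_peer_to_peer(self, trained_history):
        """Copy parameters from a previously trained peer.

        trained_history lists clients in the order they finished training.
        """
        peers = [c for c in trained_history if c is not self]
        if not peers:
            return None
        same_type = [c for c in peers if c.data_type == self.data_type]
        source = (same_type or peers)[-1]
        self.params = copy.deepcopy(source.params)
        return source.name

# Hypothetical clients and parameters.
a = Client("hospital_a", "image", {"w": [0.1]})
b = Client("lab_b", "text", {"w": [0.2]})
c = Client("clinic_c", "image", {"w": [0.0]})

chosen = c.sync_peer_to_peer([a, b])   # b trained last, but a shares c's data type

store = TrustedStore()
store.upload(b.params)
d = Client("clinic_d", "text", {"w": [0.0]})
d.sync_centralized(store)
```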

As introduced above, client 1 (702), client 2 (704), and client 3 (706) can have different data types. The server 710 creates two portions of the network and sends one portion 702A, 704A, 706A to each of the clients 702, 704, 706. The system repeats certain steps until an accuracy condition or another condition is met: for example, all of the clients pass their data through the portion of the network they hold and send the output to the server 710. The server 710 calculates the loss value for each client and the average loss across all the clients. The server 710 can update its model using a weighted average of the gradients that it computes during backpropagation, and sends the gradients back to all the clients 702, 704, 706. The clients 702, 704, 706 receive the gradients from the server 710, and each client 702, 704, 706 performs backpropagation on its client-side network 702A, 704A, 706A and computes the respective gradients for that client-side network. The respective gradients from the client-side networks 702A, 704A, 706A can then be transmitted back to the server 710, which conducts an averaging of the client-side updates and sends the global result back to all the clients 702, 704, 706.
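The averaging steps in this round can be illustrated with a short sketch. The weighting scheme is an assumption: the text says only that a weighted average of gradients is used, so weighting by each client's sample count is a hypothetical choice, as are the function names and the numbers.

```python
def average_loss(per_client_losses):
    """Mean of the loss values the server computed for each client."""
    return sum(per_client_losses) / len(per_client_losses)

def weighted_average(gradients, weights):
    """Element-wise weighted average of equally sized gradient vectors."""
    total = sum(weights)
    width = len(gradients[0])
    return [
        sum(w * g[i] for g, w in zip(gradients, weights)) / total
        for i in range(width)
    ]

# Hypothetical values for three clients.
losses = [0.25, 0.75, 0.5]
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
samples = [1, 1, 2]          # assumed weights: number of samples per client

mean_loss = average_loss(losses)
server_update = weighted_average(grads, samples)
```

Here the third client counts twice as much as each of the others, so the averaged gradient is pulled toward its values.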

Note that the server 710 functionality can also be broken into several servers that each perform different operations (such as updating the model by one server and averaging the local client updates by another server, each located in a different area). In the case of FIG. 7, the clients 702, 704, 706 all process disparate types of data that normally cannot be processed together to develop an AI model.

For the sake of example, the A-fib model above can be used to illustrate the process. Client 1 (702) could have ECG data, client 2 (704) could have X-ray data, and client 3 (706) could have genetic data. For example, client 1 (702) could be a hospital, client 2 (704) could be a medical diagnostics imaging company, and client 3 (706) could be a bank or financial institution, in the manner depicted in FIG. 6. One of the clients could also have time-based data, such as progressive information about a patient in connection with weekly visits to the hospital for checkups.

The approach shown in FIG. 7 illustrates how the system can implement new user instructions that allow a user to bring different data types together with the "correct" processing before the split or cut layer, or as shown in the blind decorrelation block 708. Each portion of the model can be independent and operates independently. In one aspect, the processing performed by the blind correlation block 708 results in an activation layer or activations being transmitted to the server 710. This approach is similar to the approach described above, with the addition of the differences in data types among the clients 702, 704, 706.

The server 710 combines these activation layers in one of a number of ways. The server 710 can average them (which is also described above), but it could also concatenate them into one long activation layer. In another aspect, the server 710 could apply any mathematical function to achieve the desired combination of the activation layers. The server 710 can then process the combined activation layers further using any appropriate network architecture. In one aspect, a server on the client side can receive the gradients, average the gradients to generate a global model of the various clients 702, 704, 706, and send the global model to the server 710 for concatenation or further processing.
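The two combination strategies named above, averaging and concatenation, can be sketched as follows. The function names and sample activations are hypothetical. The sketch also makes one practical difference visible: averaging needs every client to emit the same width, while concatenation does not.

```python
def combine_average(activations):
    """Element-wise mean; requires every client's activations to have equal width."""
    widths = {len(a) for a in activations}
    if len(widths) != 1:
        raise ValueError("averaging needs equal-width activations")
    return [sum(col) / len(activations) for col in zip(*activations)]

def combine_concat(activations):
    """Join the activations end to end into one long activation layer."""
    return [v for a in activations for v in a]

# Hypothetical cut-layer activations from three clients.
acts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

averaged = combine_average(acts)     # width stays 2
joined = combine_concat(acts)        # width becomes 6

# Concatenation also accepts clients that emit different widths.
ragged = combine_concat([[1.0], [2.0, 3.0]])
```

The choice affects the server-side network: its first layer must accept the shared width when averaging, or the sum of all client widths when concatenating.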

The ideas shown in FIGS. 6 and 7 represent an extension and application of the split federated learning tool set and provide a platform of off-the-shelf tools for bringing disparate data types together into a superset AI model. The processing can all be done privately, and the offering can also be included in a marketplace, as described in the incorporated patent applications referenced above.

Not only can the system combine different data types, it can also combine different AI/ML techniques. For example, client 1 (702) can be a CNN (convolutional neural network), client 2 (704) can be an ML routine (e.g., XGBoost), and client 3 (706) can apply yet another technique. In this regard, although the AI/ML techniques differ from one another, as long as the resulting data at the cut layer is consistent and properly configured, forward propagation and backpropagation can occur and the models can be trained.

To help one of skill in the art understand how the MMAI approach works, the following is an example of the actual commands, by data type, for data coming from the three data providers 702, 704, 706. The code uses Python numbering conventions, so it starts with builder0 (tabular data from data provider 1 (702)). In this example, builder1 is for CT scan or image data; the commands would be similar for X-rays, MRIs, and/or any other picture. builder2 (from data provider 704) is for text data. Note the "lstm" command, which is short for "long short-term memory". The "server" builder commands define the network that aggregates the other three at the "top", on the other side of the split.

# builder0: tabular data. The tb module is used as in the source; its import is not shown there.
builder0 = tb.NetworkBuilder()
builder0.add_dense_layer(100, 120)
builder0.add_relu()
builder0.add_dense_layer(120, 160)
builder0.add_relu()
builder0.add_dropout(0.25)
builder0.add_dense_layer(160, 200)
builder0.add_relu()
builder0.add_split()

# builder1: CT scan or image data (convolutional layers).
builder1 = tb.NetworkBuilder()
builder1.add_conv2d_layer(1, 32, 3, 1)
builder1.add_batchnorm2d(32)
builder1.add_relu()
builder1.add_max_pool2d_layer(2, 2)
builder1.add_conv2d_layer(32, 64, 3, 1)
builder1.add_batchnorm2d(64)
builder1.add_relu()
builder1.add_max_pool2d_layer(2, 2)
builder1.add_flatten_layer()
builder1.add_split()

# builder2: text data (LSTM layer).
builder2 = tb.NetworkBuilder()
builder2.add_lstm_layer(39, 100, batch_first=True)
builder2.add_dense_layer(100, 39)
builder2.add_split()

# server_builder: the network on the other side of the split that aggregates the three above.
server_builder = tb.NetworkBuilder()
server_builder.add_dense_layer(60000, 8000)
server_builder.add_relu()
server_builder.add_dense_layer(8000, 1000)
server_builder.add_relu()
server_builder.add_dense_layer(1000, 128)
server_builder.add_relu()
server_builder.add_dense_layer(128, 1)
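The width of what each client-side builder emits at its split determines what the server-side network must accept. As a worked arithmetic sketch, and assuming (this is not confirmed by the source) that `add_conv2d_layer(in, out, kernel, stride)` and `add_max_pool2d_layer(kernel, stride)` follow the usual unpadded convolution and pooling size formulas, the flattened width of builder1 can be computed for a hypothetical 28x28 single-channel image. The helper names and the input size are illustrative assumptions, and the result is not claimed to match the 60000-wide server input in the listing.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool2d_out(size, kernel, stride):
    """Output length of one spatial dimension after max pooling."""
    return (size - kernel) // stride + 1

def builder1_flat_width(height, width):
    """Flattened width after two conv(3, stride 1) + max-pool(2, 2) stages."""
    channels = 64  # output channels of the second convolution in the listing
    for _ in range(2):
        height = pool2d_out(conv2d_out(height, 3, 1), 2, 2)
        width = pool2d_out(conv2d_out(width, 3, 1), 2, 2)
    return channels * height * width

# Hypothetical 28x28 input: 28 -> 26 -> 13 -> 11 -> 5 per dimension.
flat = builder1_flat_width(28, 28)
```

Under these assumptions the image branch would emit 64 x 5 x 5 values at its split; the tabular branch ends in a 200-wide dense layer and the text branch in a 39-wide dense layer.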

FIG. 8 illustrates an example method 800 for providing the MMAI concept from the perspective of the clients. The method includes receiving a first set of data from a first data source, the first set of data having a first data type (802); training a first client-side network on the first set of data and generating first activations (804); receiving a second set of data from a second data source, the second set of data having a second data type (806); and training a second client-side network on the second set of data and generating second activations (808).

The method can further include transmitting the first activations and the second activations to a server-side network, wherein the server-side network is trained based on the first activations and the second activations to generate gradients (810), and receiving the gradients at the first client-side network and the second client-side network (812). The first data type and the second data type can be different data types, such as one being image-based and the other being textual or time-based, as in speech.

FIG. 9 illustrates an example method 900 from the perspective of both the server 710 and one or more clients 702, 704, 706. The method can include splitting a neural network into a first client-side network, a second client-side network, and a server-side network (902); sending the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data has a first type, and the first client-side network includes at least one first client-side layer (904); and sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data has a second type, the second client-side network includes at least one second client-side layer, and the first type and the second type have a common association (906).

The method can further include training the first client-side network on the first data from the first client and generating first activations (908); transmitting the first activations from the first client-side network to the server-side network (910); training the second client-side network on the second data from the second client and generating second activations (912); transmitting the second activations from the second client-side network to the server-side network (914); training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients (916); and transmitting the gradients from the server-side network to the first client-side network and the second client-side network (918).

The common association between the disparate types of data can include at least one of a device, a person, a consumer, a patient, a business, a concept, a medical condition, a group of people, a process, a product, and/or a service. Any concept, device, or person can be the common association or theme for the various disparate types of data that come from different clients and are processed by different, independent client-side networks up to the cut or split layer. The server-side network can include a global machine learning model. The neural network can include weights, biases, and hyperparameters. Hyperparameters typically relate to a parameter whose value is used to control the learning process, such as a topology parameter or the size of the neural network. For example, a learning rate, a mini-batch size, a number of layers on the client side, or any parameter related to controlling the process that might impact or relate to the different data types can represent a hyperparameter.
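The hyperparameters named above could be grouped into a single configuration object. The following is a minimal sketch only: the `SplitTrainingConfig` class, its field names, its default values, and the client labels are hypothetical and are not part of the disclosed tool. The 8-layer and 4-layer depths reuse the image and text example given earlier.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SplitTrainingConfig:
    """Hypothetical container for hyperparameters that control the learning process."""
    learning_rate: float = 0.01
    mini_batch_size: int = 32
    # Number of client-side layers before the cut layer, keyed by client label.
    client_layers: dict = field(default_factory=dict)

    def validate(self):
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.mini_batch_size < 1:
            raise ValueError("mini_batch_size must be at least 1")
        if any(n < 1 for n in self.client_layers.values()):
            raise ValueError("each client needs at least one client-side layer")
        return self

config = SplitTrainingConfig(
    learning_rate=0.001,
    mini_batch_size=64,
    client_layers={"client_702": 8, "client_704": 4},
).validate()
```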

The at least one first client-side layer and the at least one second client-side layer can each include the same number of layers or a different number of layers. Because they operate independently, the client-side networks can have different numbers of layers, as long as they process their data to generate activations or vectors in the proper format to pass to the server-side network for further training. A cut layer can exist between the server-side network and the first client-side network and the second client-side network.

FIG. 10 illustrates an example method 1000 from the perspective of the server 710. The method can include splitting a neural network into a first client-side network, a second client-side network, and a server-side network (1002); sending the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data has a first type, and the first client-side network includes at least one first client-side layer (1004); and sending the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data has a second type, the second client-side network includes at least one second client-side layer, and the first type and the second type have a common association (1006).

The method can further include receiving, at the server-side network, first activations from a training of the first client-side network on the first data from the first client (1008); receiving, at the server-side network, second activations from a training of the second client-side network on the second data from the second client (1010); training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients (1012); and transmitting the gradients from the server-side network to the first client-side network and the second client-side network (1014).
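The server-side steps of this method can be sketched as one function call: receive two sets of activations, train a server-side layer, and return the gradient slice that belongs to each client. This is an illustrative sketch under assumptions, not the claimed implementation. The `ServerSide` class, the concatenation of the two activation vectors, the single linear output unit, and the squared-error loss are all hypothetical choices.

```python
class ServerSide:
    """Hypothetical server-side network with a single linear output unit."""
    def __init__(self, weights):
        self.w = list(weights)

    def step(self, act1, act2, target, lr=0.1):
        joined = act1 + act2                       # combine by concatenation (assumed)
        y = sum(w * a for w, a in zip(self.w, joined))
        err = y - target                           # dL/dy for L = 0.5 * (y - target)^2
        grad_joined = [err * w for w in self.w]    # gradient w.r.t. the joined activations
        self.w = [w - lr * err * a for w, a in zip(self.w, joined)]
        # Route to each client only the gradients for its own activations.
        return grad_joined[:len(act1)], grad_joined[len(act1):]

server = ServerSide([1.0, 0.0, 2.0])
first_activations = [1.0, 1.0]    # from the first client-side network
second_activations = [0.5]        # from the second client-side network

grad_first, grad_second = server.step(first_activations, second_activations, target=1.0)
```

In this example the prediction is 2.0 against a target of 1.0, so the error is 1.0 and each client receives the server weights that multiplied its own activations.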

Note that in each case, part of the process of the server 710, in terms of training, can be performed by the server 710, while other parts, such as averaging values across the various clients, can be performed by a different server (not shown) that could be at a client site, at a separate location, or spread across several clients.

This approach enables the federated split learning tool set to be used in a new way: when the system splits the neural network at the blind correlation 708, it becomes harder to take the resulting trained model, break it, and apply a training inference attack. The system can break the neural network in half (or into two portions), and in the manner described above, all that is exchanged between the neural network portions 702A, 704A, 706A is a string or array of numbers, also described as activation layer numbers. Because these are only arrays of numbers or characters, what happens in the first neural network portion 702A can differ from what happens in the second neural network portion 704A. For example, the first neural network portion 702A could be 2 layers deep and the second neural network portion 704A could be 90 layers deep. As long as each output resolves to a string of numbers that is properly structured for transmission to the top portion of the neural network 710, forward propagation and backpropagation can work and training can be achieved. This understanding paves the way for a new concept disclosed herein: different types of data handled across the different portions 702A, 704A, 706A of the neural network can be properly received and processed to train the models. If the system can create a different bottom half 702A, 704A, 706A for each of the different clients, then the clients 702, 704, 706 do not have to produce or process the same type of data (for example, text versus images); rather, the properly formatted neural network portions 702A, 704A, 706A can process the disparate data and produce structured output that can be sent to the server 710.

In one example, client 1 (702) could provide a person's ECG, client 2 (704) could provide a chest X-ray of the heart, and client 3 (706) could provide a genetic profile of the four most interesting proteins in the patient's blood. If the neural network portions 702A, 704A, 706A can each process their respective data type down to the right vector structure for output and provide the disparate types of data to the server 710, then the server 710 can be configured with the proper neural network to combine all of that information and train a model to be used to make a diagnosis that draws on the different and disparate types of data.

In one aspect, while the neural network portions 702A, 704A, 706A each process a different type of data, there is some correlating factor associated with the data. In the example above, all of the data may relate to the same person: some of the data is ECG data and other data is a genetic profile, but it all concerns the same person. Thus, one aspect of this disclosure is that the data has a common association. In another aspect, the data may not relate to the same person, and the common association could instead relate to an age, gender, race, project, concept, the weather, the stock market, or other factors. For example, all of the data might relate to women between the ages of 30 and 35. The common association therefore has some flexibility in how it is applied.
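One way to picture the common association is as a shared key that lets records of different types be lined up before training. The sketch below is a plain, non-private illustration of that idea only. It is not the blind correlation process of this disclosure, which is not detailed here, and the patient identifiers, record contents, and function names are hypothetical.

```python
def common_subjects(*client_records):
    """Return the sorted keys that appear in every client's records."""
    keys = set(client_records[0])
    for records in client_records[1:]:
        keys &= set(records)
    return sorted(keys)

def aligned_rows(subject_ids, *client_records):
    """For each shared subject, gather that subject's record from every client."""
    return [tuple(records[s] for records in client_records) for s in subject_ids]

# Hypothetical records keyed by a shared patient identifier.
ecg = {"p1": [0.8, 0.9], "p2": [0.7, 0.6], "p3": [0.5, 0.4]}
xray = {"p2": "xray_p2.png", "p3": "xray_p3.png", "p4": "xray_p4.png"}
genes = {"p3": "profile_p3", "p2": "profile_p2", "p5": "profile_p5"}

shared = common_subjects(ecg, xray, genes)
rows = aligned_rows(shared, ecg, xray, genes)
```

Each row then holds one ECG record, one X-ray, and one genetic profile for the same person, which is what allows the activations from the three client-side networks to be combined per subject.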

In another example, one stream of data may be images from a camera on a jet engine, another stream may be sensor data, and another may be flight characteristics from the airplane, with the airplane as the common association. In another aspect, the common association may be a consumer: one type of data may be the consumer's purchasing habits, another web-surfing patterns, another the emails the user sends, another audio from Siri or other speech-processing tools, and another the physical stores the consumer frequents or the user's current location. The output of the server may be an advertisement to present to the user based on an analysis of the disparate types of input. Thus, the common association can be any concept with which the disparate types of data can be associated.

FIG. 11 illustrates an example computer device that can be used in connection with any of the systems disclosed herein. In this example, FIG. 11 illustrates a computing system (1100) including components in electrical communication with each other using a connection (1105), such as a bus. The system (1100) includes a processing unit (CPU or processor) (1110) and a system connection (1105) that couples various system components, including the system memory (1115), such as read-only memory (ROM) (1120) and random access memory (RAM) (1125), to the processor (1110). The system (1100) can include a cache of high-speed memory connected directly with, in close proximity to, or integrated as part of the processor (1110). The system (1100) can copy data from the memory (1115) and/or the storage device (1130) to the cache (1112) for quick access by the processor (1110). In this way, the cache can provide a performance boost that avoids processor (1110) delays while waiting for data. These and other modules can control or be configured to control the processor (1110) to perform various actions. Other system memory (1115) may be available for use as well. The memory (1115) can include multiple different types of memory with different performance characteristics. The processor (1110) can include any general-purpose processor and a hardware or software service or module, such as service (module) 1 (1132), service (module) 2 (1134), and service (module) 3 (1136) stored in the storage device (1130), configured to control the processor (1110), as well as a special-purpose processor in which software instructions are incorporated into the actual processor design. The processor (1110) may be a completely self-contained computing system containing multiple cores or processors, a bus, a memory controller, a cache, and so forth. A multi-core processor may be symmetric or asymmetric.

To enable user interaction with the device (1100), an input device (1145) can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, speech, and so forth. An output device (1135) can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems can enable a user to provide multiple types of input to communicate with the device (1100). The communications interface (1140) can generally govern and manage the user input and system output. There is no restriction on operating on any particular hardware arrangement, and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

The storage device (1130) is a non-volatile memory and can be a hard disk or another type of computer-readable medium that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, solid-state memory devices, digital versatile disks, cartridges, random access memories (RAMs) (1125), read-only memory (ROM) (1120), and hybrids thereof.

The storage device (1130) can include services or modules (1132, 1134, 1136) for controlling the processor (1110). Other hardware or software modules are contemplated. The storage device (1130) can be connected to the system connection (1105). In one aspect, a hardware module that performs a particular function can include a software component stored in a computer-readable medium in connection with the necessary hardware components, such as the processor (1110), connection (1105), output device (1135), and so forth, to carry out the function.

In some cases, such a computing device or apparatus may include a processor, microprocessor, microcomputer, or other component of a device that is configured to carry out the steps of the methods disclosed above. In some examples, such a computing device or apparatus may include one or more antennas for sending and receiving RF signals. In some examples, such a computing device or apparatus may include an antenna and a modem for sending, receiving, modulating, and demodulating RF signals, as previously described.

The components of the computing device can be implemented in circuitry. For example, the components can include and/or be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), central processing units (CPUs), and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein. The computing device may further include a display (as an example of the output device or in addition to the output device), a network interface configured to communicate and/or receive data, any combination thereof, and/or other component(s). The network interface may be configured to communicate and/or receive Internet Protocol (IP) based data or other types of data.

The methods discussed above are illustrated as a logical flow diagram, the operations of which represent a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the methods disclosed herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disk (CD) or digital versatile disk (DVD), flash memory, memory, or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like may be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, network transmission, or the like.

In some embodiments, the computer-readable storage devices, media, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

Specific details are provided in the description above to provide a thorough understanding of the embodiments and examples provided herein. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. Additional components may be used other than those shown in the figures and/or described herein. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Individual embodiments may be described above as a process or method that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

Processes and methods according to the examples described above can be implemented using computer-executable instructions that are stored on or otherwise available from computer-readable media. Such instructions can include, for example, instructions and data that cause or otherwise configure a general-purpose computer, special-purpose computer, or processing device to perform a certain function or group of functions. Portions of the computer resources used can be accessible over a network. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to the described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.

Devices implementing processes and methods according to this disclosure can include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and can take any of a variety of form factors. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the necessary tasks (e.g., a computer program product) may be stored in a computer-readable or machine-readable medium. A processor or processors may perform the necessary tasks. Typical examples of form factors include laptops, smartphones, mobile phones, tablet devices or other small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. The functionality described herein can also be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.

The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are example means for providing the functions described in this disclosure.

In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the application is not limited thereto. Thus, while illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the above-described application may be used individually or jointly. Further, embodiments can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternative embodiments, the methods may be performed in a different order than that described.

One of ordinary skill will appreciate that the less-than ("<") and greater-than (">") symbols or terminology used herein can be replaced with less-than-or-equal-to ("≤") and greater-than-or-equal-to ("≥") symbols, respectively, without departing from the scope of this description.

Where components are described as being "configured to" perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof.

The phrase "coupled to" refers to any component that is physically connected to another component, either directly or indirectly, and/or any component that is in communication with another component (e.g., connected to the other component over a wired or wireless connection and/or other suitable communication interface), either directly or indirectly.

Claim language or other language reciting "at least one of" a set and/or "one or more" of a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting "at least one of A and B" or "at least one of A or B" means A, B, or A and B. In another example, claim language reciting "at least one of A, B, and C" or "at least one of A, B, or C" means A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language "at least one of" a set and/or "one or more" of a set does not limit the set to the items listed in the set. For example, claim language reciting "at least one of A and B" or "at least one of A or B" can mean A, B, or A and B, and can additionally include items not listed in the set of A and B.

Although a variety of examples and other information was used to explain aspects within the scope of the appended claims, no limitation of the claims should be implied based on particular features or arrangements in such examples, as one of ordinary skill would be able to use these examples to derive a wide variety of implementations. Further, although some subject matter may have been described in language specific to examples of structural features and/or method steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. For example, such functionality can be distributed differently or performed in components other than those identified herein. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.

Claim language reciting "at least one of" a set indicates that one member of the set or multiple members of the set satisfy the claim. For example, claim language reciting "at least one of A and B" means A, B, or A and B.

Claims

1. A method comprising:
dividing a neural network into a first client-side network, a second client-side network, and a server-side network;
forwarding the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data has a first type, and the first client-side network includes at least one first client-side layer;
forwarding the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data has a second type, the second client-side network includes at least one second client-side layer, and the first type and the second type have a common association;
training the first client-side network on first data from the first client and generating first activations;
sending the first activations from the first client-side network to the server-side network;
training the second client-side network on second data from the second client and generating second activations;
sending the second activations from the second client-side network to the server-side network;
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients; and
sending the gradients from the server-side network to the first client-side network and the second client-side network.

2. The method of claim 1, wherein the common association includes at least one of a device, person, consumer, patient, business, concept, medical condition, group of people, process, product and/or service.

3. The method of claim 1, wherein the server-side network comprises a global machine learning model.

4. The method of claim 1, wherein the neural network includes weights, biases and hyperparameters.

5. The method of claim 1, wherein the at least one first client-side layer and the at least one second client-side layer include the same number of layers or different numbers of layers.

6. The method of claim 1, wherein a cut layer exists between the server-side network and the first client-side network and the second client-side network.

7. The method of claim 1, wherein the first type comprises text data and the second type comprises image data.

8. The method of claim 7, wherein the first client-side network and the second client-side network are independent and operate independently.

9. The method of claim 1, wherein the first type comprises tabular data and the second type comprises image data.

10. A system comprising:
a processor; and
a computer-readable storage device storing instructions that, when executed by the processor, cause the processor to perform operations comprising:
dividing a neural network into a first client-side network, a second client-side network, and a server-side network;
forwarding the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data having a first type; the first client-side network includes at least one first client-side layer;
forwarding the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data having a second type; the second client-side network includes at least one second client-side layer, and the first type and the second type have a common association;
receiving, at the server-side network, first activations from training of the first client-side network on first data from the first client;
receiving, at the server-side network, second activations from training of the second client-side network on second data from the second client;
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients; and
transmitting the gradients from the server-side network to the first client-side network and the second client-side network.

11. The system of claim 10, wherein the common association includes at least one of a device, person, consumer, patient, business, concept, medical condition, group of people, process, product and/or service.

12. The system of claim 10, wherein the server-side network includes a global machine learning model.

13. The system of claim 10, wherein the neural network includes weights, biases and hyperparameters.

14. The system of claim 10, wherein the at least one first client-side layer and the at least one second client-side layer include the same number of layers or different numbers of layers.

15. The system of claim 10, wherein a cut layer exists between the server-side network and the first client-side network and the second client-side network.

16. The system of claim 10, wherein the first type and the second type are different types of data.

17. The system of claim 10, wherein the first type includes tabular data or time series data, and the second type includes image data.

18. A method comprising:
dividing a neural network into a first client-side network, a second client-side network, and a server-side network;
forwarding the first client-side network to a first client, wherein the first client-side network is configured to process first data from the first client, the first data having a first type and the first client-side network includes at least one first client-side layer;
forwarding the second client-side network to a second client, wherein the second client-side network is configured to process second data from the second client, the second data having a second type and the second client-side network includes at least one second client-side layer, and the first type and the second type have a common association;
receiving, at the server-side network, first activations from training of the first client-side network on first data from the first client;
receiving, at the server-side network, second activations from training of the second client-side network on second data from the second client;
training at least one server-side layer of the server-side network based on the first activations and the second activations to generate gradients; and
sending the gradients from the server-side network to the first client-side network and the second client-side network.

19. The method of claim 18, wherein the first type and the second type are different types of data.

20. The method of claim 18, wherein the first type includes tabular data or time series data, and the second type includes image data.