CN113704777A

CN113704777A - Data processing method based on isomorphic machine learning framework

Info

Publication number: CN113704777A
Application number: CN202110803159.7A
Authority: CN
Inventors: 林博; 张豫元; 王涛; 董科雄; 王德健
Original assignee: Hangzhou Yikang Huilian Technology Co ltd
Current assignee: Hangzhou Yikang Huilian Technology Co ltd
Priority date: 2021-07-15
Filing date: 2021-07-15
Publication date: 2021-11-26

Abstract

The application discloses a data processing method based on an isomorphic machine learning framework, which comprises the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to perform linear regression training of a machine learning model; in one iteration, each training node participating in training sends gradient information to a forwarding node, then obtains the gradient information of the other nodes from the forwarding node, and updates its local gradient information by calculation; the training node updates the model weights of the local node using the updated gradient information; and the training node judges whether the machine learning model has converged, and exits the iteration if it has. The beneficial effect of the application is that it provides a data processing method based on an isomorphic machine learning framework in which the training nodes can effectively exchange intermediate data by way of a forwarding node.

Description

Data processing method based on isomorphic machine learning framework

Technical Field

The application relates to the field of data processing, and in particular to a data processing method based on an isomorphic machine learning framework.

Background

In the near future, the medical industry will incorporate more advanced technologies such as artificial intelligence and sensing technology, making medical services intelligent in a real sense and promoting the prosperity and development of the medical industry. Against the background of China's new healthcare reform, smart healthcare is entering the lives of ordinary people. Data in the medical industry requires privacy protection, so when artificial intelligence is applied to research, model training, and data prediction in the medical field, multiple medical institutions are often required to carry out this work through networking and data collaboration.

In the prior art, when a machine learning model is trained by federated learning, the data generated during training cannot be exchanged well between the participants, so the model fails to converge, which in turn reduces the efficiency of model training on the platform.

Disclosure of Invention

In order to overcome the defects of the prior art, the application provides a data processing method based on an isomorphic machine learning framework, which comprises the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to perform linear regression training of a machine learning model; in one iteration, each training node participating in training sends gradient information to a forwarding node, then obtains the gradient information of the other nodes from the forwarding node, and updates its local gradient information by calculation; the training node updates the model weights of the local node using the updated gradient information; and the training node judges whether the machine learning model has converged, and exits the iteration if it has.

Further, each training node participating in federated learning performs training of the machine learning model locally.

Furthermore, after each iteration, each training node participating in federated learning encrypts intermediate data generated by training the machine learning model and sends the intermediate data to the forwarding node.

Further, the forwarding node distributes the encrypted intermediate data to each of the training nodes.

Further, the training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data, and then proceeds to the next iteration.

Further, the training nodes include an initiating node and a participating node for federated learning.

Further, an initiating node of the training nodes selects a participating node that participates in federated learning.

Further, the encryption method of the intermediate data is a hash encryption algorithm.

Further, the training data is a data set whose labels are floating-point numbers.

Further, the training data comprises medical data.

The advantage of the application is that it provides a data processing method based on an isomorphic machine learning framework in which the training nodes can effectively exchange intermediate data by way of a forwarding node.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a schematic diagram of the steps of a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application;

FIG. 2 is a schematic illustration of an operator interface of a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a node architecture in a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other where no conflict arises. The present application is described in detail below with reference to the embodiments and the accompanying drawings.

Referring to FIGS. 1 and 3, the data processing method based on the isomorphic machine learning framework includes the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to perform linear regression training of a machine learning model; in one iteration, each training node participating in training sends gradient information to the forwarding node, then obtains the gradient information of the other nodes from the forwarding node, and updates its local gradient information by calculation; the training node updates the model weights of the local node using the updated gradient information; and the training node judges whether the machine learning model has converged, and stops iterating if it has.
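The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the text does not say how the local gradient is "updated and calculated", so the sketch assumes a plain average over all nodes' gradients; the forwarding node is reduced to an in-memory dictionary; and the names `local_gradient`, `train`, and `demo_nodes`, as well as the learning rate and tolerance, are hypothetical.

```python
def local_gradient(weights, rows):
    """Gradient of the mean squared error of y ~ w.x over one node's private rows."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += 2.0 * error * x / len(rows)
    return grad


def train(node_data, lr=0.1, tol=1e-6, max_iters=5000):
    """Run federated linear regression; returns (weights per node, iterations used)."""
    n_features = len(next(iter(node_data.values()))[0][0])
    weights = {node: [0.0] * n_features for node in node_data}
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Each node sends its gradient to the forwarding node
        # (here just a dictionary standing in for that node).
        forwarded = {
            node: local_gradient(weights[node], rows)
            for node, rows in node_data.items()
        }
        # Each node obtains the other nodes' gradients and combines them with
        # its own. ASSUMPTION: the combination rule is a plain average.
        combined = [
            sum(g[j] for g in forwarded.values()) / len(forwarded)
            for j in range(n_features)
        ]
        # Each node updates its local model weights with the combined gradient.
        for node in node_data:
            weights[node] = [w - lr * g for w, g in zip(weights[node], combined)]
        # Convergence test: stop once the combined gradient is negligible.
        if max(abs(g) for g in combined) < tol:
            break
    return weights, iteration


# Two nodes holding disjoint, noise-free samples of y = 2*x + 1.
# Each row is ([x, 1.0], y); the constant 1.0 is the bias feature.
demo_nodes = {
    "node_a": [([0.0, 1.0], 1.0), ([0.5, 1.0], 2.0), ([1.0, 1.0], 3.0)],
    "node_b": [([1.5, 1.0], 4.0), ([2.0, 1.0], 5.0)],
}
```

Calling `train(demo_nodes)` returns one weight vector per node. Because every node starts from the same weights and applies the same combined gradient, the vectors stay identical, and the raw training rows never leave the node that holds them.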

As a preferred scheme, in addition to the computers of the training parties, the system is provided with a server for data interaction and storage, which provides data storage, interaction, and calculation functions. The server and each computer may be connected by wired or wireless communication.

As a specific scheme, the training data are medical data, which can only be stored locally at each training node to avoid privacy disclosure. Through the system, a training node can learn the index or data profile of the data held by other nodes, but not the specific data content. Therefore, as shown in FIG. 2, the user of a training node can select other training nodes as participating nodes in federated learning by selecting the required range of training data. That is, each training node participating in federated learning trains the machine learning model locally; the training nodes comprise an initiating node and participating nodes of the federated learning; and the initiating node among the training nodes selects the participating nodes that take part in the federated learning.
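The selection step can be illustrated as a filter over data profiles. The application does not specify what an index or data profile contains, so the `DataProfile` fields (`category`, `record_count`) and the function `select_participants` below are hypothetical; the sketch only shows that the initiating node can choose participants without seeing any record content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Summary a node publishes about its data; contains no patient records."""
    node_id: str
    category: str      # hypothetical: kind of records held by the node
    record_count: int  # hypothetical: number of records available


def select_participants(profiles, initiator_id, category, min_records):
    """Return the IDs of nodes whose profile matches the requested data range."""
    return [
        p.node_id
        for p in profiles
        if p.node_id != initiator_id
        and p.category == category
        and p.record_count >= min_records
    ]


# Example profiles as the initiating node might see them through the system.
profiles = [
    DataProfile("hospital_a", "blood_glucose", 1200),
    DataProfile("hospital_b", "blood_glucose", 80),
    DataProfile("hospital_c", "imaging", 5000),
    DataProfile("hospital_d", "blood_glucose", 640),
]
```

Here `select_participants(profiles, "hospital_a", "blood_glucose", 500)` keeps only nodes other than the initiator that hold at least 500 records of the requested category.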

As a specific scheme, after each iteration, each training node participating in federated learning encrypts the intermediate data generated by training the machine learning model and sends it to the forwarding node. The forwarding node distributes the encrypted intermediate data to each training node. Each training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data, and then proceeds to the next iteration.
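The relay role of the forwarding node can be sketched as a per-round mailbox. The class `ForwardingNode` and its methods `submit`, `is_complete`, and `fetch` are hypothetical names, and the rule that a round is released only after every node has submitted is an assumption; the application states only that the forwarding node distributes the encrypted intermediate data to each training node. Payloads are treated as opaque bytes, so the relay never needs to interpret them.

```python
from collections import defaultdict


class ForwardingNode:
    """Collects one opaque payload per training node per round and relays them."""

    def __init__(self, node_ids):
        self.node_ids = set(node_ids)
        self._rounds = defaultdict(dict)  # round number -> {sender: payload}

    def submit(self, round_no, sender, payload):
        """Store the payload a training node sends after an iteration."""
        if sender not in self.node_ids:
            raise ValueError(f"unknown training node: {sender}")
        self._rounds[round_no][sender] = payload

    def is_complete(self, round_no):
        """True once every registered training node has submitted for the round."""
        return set(self._rounds[round_no]) == self.node_ids

    def fetch(self, round_no, receiver):
        """Return the other nodes' payloads for the round.

        ASSUMPTION: nothing is released until the round is complete.
        """
        if not self.is_complete(round_no):
            raise RuntimeError("round is not complete yet")
        return {
            sender: payload
            for sender, payload in self._rounds[round_no].items()
            if sender != receiver
        }
```

A training node would call `submit` after its local iteration and `fetch` before the next one, combining what it receives with its own intermediate data.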

As a more specific scheme, the server may act as the forwarding node and perform functions such as encrypting the data exchanged and distributing data. As a preferred scheme, the encryption method of the intermediate data is a hash encryption algorithm. The training data is a data set whose labels are floating-point numbers.
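The application does not name a particular hash algorithm or say how the hash is applied. Because a cryptographic hash is one-way, a receiving node could not recover a gradient from a digest alone, so the sketch below assumes one possible reading: a SHA-256 digest travels with the serialized intermediate data so the receiver can detect tampering in transit. The functions `seal` and `open_sealed` and the message layout are hypothetical, and this sketch provides integrity checking only, not confidentiality.

```python
import hashlib
import hmac
import json


def seal(gradient):
    """Serialize a gradient and attach its SHA-256 digest.

    ASSUMPTION: the hash is used as an integrity check on the payload.
    """
    payload = json.dumps(gradient).encode("utf-8")
    return {"payload": payload, "digest": hashlib.sha256(payload).hexdigest()}


def open_sealed(message):
    """Verify the digest and return the gradient; raise if the payload changed."""
    expected = hashlib.sha256(message["payload"]).hexdigest()
    if not hmac.compare_digest(expected, message["digest"]):
        raise ValueError("digest mismatch: payload was altered in transit")
    return json.loads(message["payload"].decode("utf-8"))
```

A training node would call `seal` before sending intermediate data to the forwarding node and `open_sealed` on each message it receives back.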

As a further scheme, if the machine learning model has not converged, the method proceeds to the next iteration.

As a preferred scheme, even after the machine learning model has converged, training can continue with the selected participants while the training initiator is not using the machine learning model, so as to improve the model. As a further scheme, the training participants can be selected dynamically according to data conditions set by the initiator, and the model training of the above method is performed when the conditions are satisfied.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A data processing method based on an isomorphic machine learning framework, characterized in that:

the data processing method based on the isomorphic machine learning framework comprises the following steps:

each training node participating in federated learning inputs training data;

the training node performs feature processing on the training data to obtain feature data;

the training node uses the feature data to perform linear regression training of a machine learning model;

in one iteration, each training node participating in training sends gradient information to a forwarding node, then obtains the gradient information of the other nodes from the forwarding node, and updates its local gradient information by calculation;

the training node updates the model weights of the local node using the updated gradient information;

and the training node judges whether the machine learning model has converged, and exits the iteration if it has.

2. The isomorphic machine learning framework-based data processing method of claim 1, wherein:

each training node participating in federated learning trains the machine learning model locally.

3. The isomorphic machine learning framework-based data processing method of claim 2, wherein:

after each iteration, each training node participating in federated learning encrypts intermediate data generated by the machine learning model and sends the intermediate data to the forwarding node.

4. The isomorphic machine learning framework-based data processing method according to claim 3, wherein:

the forwarding node distributes the encrypted intermediate data to each training node.

5. The isomorphic machine learning framework-based data processing method according to claim 4, wherein:

the training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data, and then proceeds to the next iteration.

6. The isomorphic machine learning framework-based data processing method according to claim 5, wherein:

the training nodes comprise an initiating node and a participating node of federated learning.

7. The isomorphic machine learning framework-based data processing method according to claim 6, wherein:

an initiating node of the training nodes selects a participating node that participates in federated learning.

8. The isomorphic machine learning framework-based data processing method according to claim 7, wherein:

the encryption method of the intermediate data is a hash encryption algorithm.

9. The isomorphic machine learning framework-based data processing method according to claim 8, wherein:

the training data is a data set whose labels are floating-point numbers.

10. The isomorphic machine learning framework-based data processing method according to claim 9, wherein:

the training data comprises medical data.