CN113704777A - Data processing method based on isomorphic machine learning framework - Google Patents

Data processing method based on isomorphic machine learning framework

Info

Publication number
CN113704777A
Authority
CN
China
Prior art keywords
training
node
machine learning
data
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110803159.7A
Other languages
Chinese (zh)
Inventor
林博
张豫元
王涛
董科雄
王德健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yikang Huilian Technology Co ltd
Original Assignee
Hangzhou Yikang Huilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yikang Huilian Technology Co ltd
Priority to CN202110803159.7A
Publication of CN113704777A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The application discloses a data processing method based on an isomorphic machine learning framework, comprising the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to carry out linear regression training of a machine learning model; in one iteration, every training node participating in training sends its gradient information to a forwarding node, obtains the gradient information of the other nodes from the forwarding node, and recalculates its local gradient information; the training node updates its local model weights using the updated gradient information; and the training node judges whether the machine learning model has converged and exits the iteration if it has. The beneficial effect of the method is that the training nodes can effectively exchange intermediate data by way of the forwarding node.

Description

Data processing method based on isomorphic machine learning framework
Technical Field
The application relates to the field of data processing, and in particular to a data processing method based on an isomorphic machine learning framework.
Background
In the near future, the medical industry will incorporate more advanced technologies such as artificial intelligence and sensing technology, making medical services intelligent in a real sense and promoting the development of the industry. Against the background of China's new healthcare reform, smart healthcare is entering the lives of ordinary people. Medical data must be protected for privacy, so when artificial intelligence is applied to research, model training, and data prediction in the medical field, multiple medical institutions often need to carry out this work through networking and data collaboration.
In the prior art, when a machine learning model is trained through federated learning, the data generated during training cannot be exchanged well between nodes, so the model may fail to converge, which in turn reduces the efficiency of model training on the platform.
Disclosure of Invention
To overcome the defects of the prior art, the application provides a data processing method based on an isomorphic machine learning framework, comprising the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to carry out linear regression training of a machine learning model; in one iteration, every training node participating in training sends its gradient information to a forwarding node, obtains the gradient information of the other nodes from the forwarding node, and recalculates its local gradient information; the training node updates its local model weights using the updated gradient information; and the training node judges whether the machine learning model has converged and exits the iteration if it has.
Further, each training node participating in federated learning performs training of the machine learning model locally.
Furthermore, after each iteration, each training node participating in federated learning encrypts intermediate data generated by training the machine learning model and sends the intermediate data to the forwarding node.
Further, the forwarding node distributes the encrypted intermediate data to each of the training nodes.
Further, the training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data and then carries out the next iteration.
Further, the training nodes include an initiating node and a participating node for federated learning.
Further, an initiating node of the training nodes selects a participating node that participates in federated learning.
Further, the encryption method of the intermediate data is a hash encryption algorithm.
Further, the training data is a data set whose labels are floating-point numbers.
Further, the training data comprises medical data.
The advantage of the application is that it provides a data processing method based on an isomorphic machine learning framework in which the training nodes can effectively exchange intermediate data by way of a forwarding node.
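The iteration summarized above can be written out as a short derivation. This is a sketch under assumptions the application does not state: a mean-squared-error loss, a plain average of the gradients of the K training nodes, a fixed learning rate \eta, and a gradient-norm threshold \varepsilon as the convergence test. The symbols X_k, y_k, n_k (local features, labels, and sample count of node k) and w (model weights) are introduced here for illustration only.

```latex
\begin{aligned}
g_k^{(t)} &= \frac{2}{n_k}\, X_k^{\top}\bigl(X_k w^{(t)} - y_k\bigr)
  && \text{(local gradient at node } k\text{)} \\
\bar{g}^{(t)} &= \frac{1}{K}\sum_{k=1}^{K} g_k^{(t)}
  && \text{(combined after exchange through the forwarding node)} \\
w^{(t+1)} &= w^{(t)} - \eta\, \bar{g}^{(t)}
  && \text{(local weight update)} \\
\text{stop when } & \bigl\lVert \bar{g}^{(t)} \bigr\rVert < \varepsilon
  && \text{(assumed convergence test)}
\end{aligned}
```

Because every node applies the same combined gradient to the same starting weights, all nodes hold the same model after each iteration without any node seeing another node's raw data.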
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a schematic diagram of steps of a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application;
FIG. 2 is a schematic illustration of an operator interface of a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a node architecture in a data processing method based on an isomorphic machine learning framework according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to FIG. 1 and FIG. 3, the data processing method based on the isomorphic machine learning framework includes the following steps: each training node participating in federated learning inputs training data; the training node performs feature processing on the training data to obtain feature data; the training node uses the feature data to carry out linear regression training of a machine learning model; in one iteration, every training node participating in training sends its gradient information to the forwarding node, obtains the gradient information of the other nodes from the forwarding node, and recalculates its local gradient information; the training node updates its local model weights using the updated gradient information; and the training node judges whether the machine learning model has converged and, if so, stops iterating.
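The steps above can be sketched as a small single-process simulation. It is not the implementation described in the application: the class names `ForwardingNode` and `TrainingNode`, the plain averaging of gradients, the learning rate, the tolerance, and the synthetic data in `demo` are all assumptions made for illustration, and encryption is left out.

```python
import random


def local_gradient(w, X, y):
    """Gradient of the mean squared error for a linear model y ~ X.w."""
    n = len(y)
    grad = [0.0] * len(w)
    for row, target in zip(X, y):
        residual = sum(wj * xj for wj, xj in zip(w, row)) - target
        for j, xj in enumerate(row):
            grad[j] += 2.0 * residual * xj / n
    return grad


class ForwardingNode:
    """Relays gradients between training nodes; it never receives raw training data."""

    def __init__(self):
        self.inbox = {}

    def submit(self, node_id, gradient):
        self.inbox[node_id] = list(gradient)

    def fetch_others(self, node_id):
        return [g for nid, g in self.inbox.items() if nid != node_id]


class TrainingNode:
    def __init__(self, node_id, X, y, lr=0.1):
        self.node_id, self.X, self.y, self.lr = node_id, X, y, lr
        self.w = [0.0] * len(X[0])

    def step(self, relay):
        """Combine own and peer gradients (assumed: plain average), then update weights."""
        grads = [relay.inbox[self.node_id]] + relay.fetch_others(self.node_id)
        combined = [sum(col) / len(grads) for col in zip(*grads)]
        self.w = [wj - self.lr * gj for wj, gj in zip(self.w, combined)]
        return max(abs(gj) for gj in combined)


def train(nodes, relay, max_iters=1000, tol=1e-8):
    """Run iterations until the combined gradient is small (assumed convergence test)."""
    for iteration in range(1, max_iters + 1):
        for node in nodes:  # every node first sends its local gradient
            relay.submit(node.node_id, local_gradient(node.w, node.X, node.y))
        largest = max(node.step(relay) for node in nodes)  # then fetches and updates
        if largest < tol:
            return iteration
    return max_iters


def demo(seed=0):
    """Three nodes with private synthetic data drawn from the same linear model."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    nodes = []
    for node_id in ("node_a", "node_b", "node_c"):
        X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
        y = [sum(wj * xj for wj, xj in zip(true_w, row)) for row in X]
        nodes.append(TrainingNode(node_id, X, y))
    iterations = train(nodes, ForwardingNode())
    return true_w, nodes, iterations
```

Calling `demo()` returns the true weights, the trained nodes, and the number of iterations used; each node's `w` can then be compared with the true weights. Only gradients pass through the relay, which is the property the method relies on for keeping training data local.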
As a preferred scheme, in addition to the computers of the training parties, the system is provided with a server for data interaction and storage, which provides data storage, interaction, and calculation functions. The server and each computer can be connected by wired or wireless communication.
As a specific scheme, the training data are medical data, which can only be stored locally at each training node to avoid privacy disclosure. A training node can learn the index or profile of the other nodes' data through the system but cannot learn the specific data content. Therefore, as shown in FIG. 2, the user of a training node can select other training nodes as participating nodes in federated learning by selecting the required training data range. That is, each training node participating in federated learning trains the machine learning model locally; the training nodes comprise an initiating node and participating nodes of federated learning; and the initiating node selects the participating nodes that take part in federated learning.
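A minimal sketch of this selection step follows. The application does not describe the format of the data index or profile, so the `DataProfile` fields (`category`, `num_records`) and the filter in `select_participants` are hypothetical and serve only to show that the initiating node works from metadata rather than from record contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    """Metadata a node publishes about its local data (hypothetical fields)."""
    node_id: str
    category: str      # e.g. a clinical department or data type
    num_records: int   # size of the local data set


def select_participants(profiles, initiator_id, category, min_records):
    """Return the ids of nodes whose published profile matches the requested range."""
    return [
        p.node_id
        for p in profiles
        if p.node_id != initiator_id
        and p.category == category
        and p.num_records >= min_records
    ]
```

For example, an initiator could pass the profiles of all registered nodes together with a category and a minimum record count, and invite only the returned nodes to the training round.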
As a specific scheme, after each iteration, each training node participating in federated learning encrypts the intermediate data generated by training the machine learning model and sends it to the forwarding node. The forwarding node distributes the encrypted intermediate data to each training node. Each training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data and then carries out the next iteration.
As a more specific scheme, the server may act as the forwarding node and perform functions such as encrypting the exchanged data and distributing data. As a preferred scheme, the intermediate data are encrypted with a hash encryption algorithm. The training data is a data set whose labels are floating-point numbers.
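The application does not name a specific hash function or explain how a receiving node recovers values from hashed data. Because a cryptographic hash is one-way, the sketch below assumes that the hash acts as an integrity digest attached to the serialized intermediate data, using SHA-256 from Python's standard `hashlib`. The message layout and the function names `pack` and `unpack` are hypothetical, and confidentiality would need a separate cipher that is not shown.

```python
import hashlib
import json


def pack(node_id, gradient):
    """Serialize intermediate data and attach a SHA-256 digest (assumed scheme)."""
    body = json.dumps(
        {"node": node_id, "gradient": [float(v) for v in gradient]},
        sort_keys=True,
    )
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"body": body, "sha256": digest}


def unpack(message):
    """Verify the digest before trusting the payload, then return (node_id, gradient)."""
    expected = hashlib.sha256(message["body"].encode("utf-8")).hexdigest()
    if expected != message["sha256"]:
        raise ValueError("digest mismatch: payload was altered in transit")
    payload = json.loads(message["body"])
    return payload["node"], payload["gradient"]
```

A training node would call `pack` before sending to the forwarding node, and each receiving node would call `unpack` on what the forwarding node distributes, rejecting any message whose digest does not match.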
As a further scheme, if the machine learning model has not converged, training proceeds to the next iteration.
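The application does not say how convergence is judged. One common choice, shown here purely as an assumption, is to stop when the relative change in the training loss between two consecutive iterations falls below a tolerance. The function name `has_converged` and the default tolerance are illustrative.

```python
def has_converged(prev_loss, curr_loss, tol=1e-6):
    """Return True when the loss changed by less than tol relative to its previous value."""
    if prev_loss is None:  # first iteration: nothing to compare against
        return False
    return abs(prev_loss - curr_loss) <= tol * max(1.0, abs(prev_loss))
```

A training node would evaluate this after each weight update and either exit the loop or start the next iteration accordingly.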
As a preferred scheme, even after the machine learning model has converged, training can continue with the selected participants while the training initiator is not using the model, so as to improve it. As a further scheme, the training participants can be selected dynamically according to data conditions set by the initiator, and the model training of the above method is carried out when the conditions are satisfied.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A data processing method based on an isomorphic machine learning framework, characterized in that:
the data processing method based on the isomorphic machine learning framework comprises the following steps:
each training node participating in federated learning inputs training data;
the training node performs feature processing on the training data to obtain feature data;
the training node uses the feature data to carry out linear regression training of a machine learning model;
in one iteration, every training node participating in training sends its gradient information to a forwarding node, obtains the gradient information of the other nodes from the forwarding node, and recalculates its local gradient information;
the training node updates its local model weights using the updated gradient information;
the training node judges whether the machine learning model has converged and exits the iteration if it has.
2. The isomorphic machine learning framework-based data processing method of claim 1, wherein:
each training node participating in federated learning carries out the training of the machine learning model locally.
3. The isomorphic machine learning framework-based data processing method of claim 2, wherein:
after each iteration, each training node participating in federated learning encrypts the intermediate data generated by training the machine learning model and sends the intermediate data to the forwarding node.
4. The isomorphic machine learning framework-based data processing method of claim 3, wherein:
the forwarding node distributes the encrypted intermediate data to each training node.
5. The isomorphic machine learning framework-based data processing method of claim 4, wherein:
the training node performs a calculation on the received encrypted intermediate data together with its locally generated intermediate data and then carries out the next iteration.
6. The isomorphic machine learning framework-based data processing method of claim 5, wherein:
the training nodes comprise an initiating node and a participating node of federated learning.
7. The isomorphic machine learning framework-based data processing method of claim 6, wherein:
an initiating node of the training nodes selects a participating node that participates in federated learning.
8. The isomorphic machine learning framework-based data processing method of claim 7, wherein:
the encryption method of the intermediate data is a hash encryption algorithm.
9. The isomorphic machine learning framework-based data processing method of claim 8, wherein:
the training data is a data set whose labels are floating-point numbers.
10. The isomorphic machine learning framework-based data processing method of claim 9, wherein:
the training data comprises medical data.
CN202110803159.7A 2021-07-15 2021-07-15 Data processing method based on isomorphic machine learning framework Pending CN113704777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110803159.7A CN113704777A (en) 2021-07-15 2021-07-15 Data processing method based on isomorphic machine learning framework

Publications (1)

Publication Number Publication Date
CN113704777A true CN113704777A (en) 2021-11-26

Family

ID=78648723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110803159.7A Pending CN113704777A (en) 2021-07-15 2021-07-15 Data processing method based on isomorphic machine learning framework

Country Status (1)

Country Link
CN (1) CN113704777A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189825A (en) * 2018-08-10 2019-01-11 深圳前海微众银行股份有限公司 Lateral data cutting federation learning model building method, server and medium
CN111507481A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Federated learning system

Similar Documents

Publication Publication Date Title
CN110263936B (en) Horizontal federal learning method, device, equipment and computer storage medium
Xu et al. Trust-aware service offloading for video surveillance in edge computing enabled internet of vehicles
CN105074685B (en) The multi-tenant that the social business of enterprise is calculated supports method, computer-readable medium and system
Wang et al. A novel reputation-aware client selection scheme for federated learning within mobile environments
CN105868231A (en) Cache data updating method and device
WO2022016964A1 (en) Vertical federated modeling optimization method and device, and readable storage medium
CN107889082A (en) A kind of D2D method for discovering equipment using social networks between user
Duarte et al. Improved heuristics for the regenerator location problem
Huang et al. Improving Quality of Experience in multimedia Internet of Things leveraging machine learning on big data
CN110008402A (en) A kind of point of interest recommended method of the decentralization matrix decomposition based on social networks
Zhang et al. Multiaccess edge integrated networking for Internet of Vehicles: A blockchain-based deep compressed cooperative learning approach
Usman et al. Channel allocation schemes for permanent user channel assignment in wireless cellular networks
CN114372516A (en) XGboost-based federal learning training and prediction method and device
Hsu et al. A genetic algorithm for the maximum edge-disjoint paths problem
CN116703553B (en) Financial anti-fraud risk monitoring method, system and readable storage medium
CN113055902A (en) Intelligent mobile communication network system
CN113704777A (en) Data processing method based on isomorphic machine learning framework
CN108156194A (en) A kind of form data processing method
CN113704776A (en) Machine learning method based on federal learning
CN106209984A (en) A kind of information processing method and Smart Home open platform
CN114492849B (en) Model updating method and device based on federal learning
CN111784078B (en) Distributed prediction method and system for decision tree
CN114168295A (en) Hybrid architecture system and task scheduling method based on historical task effect
Yang et al. Asynchronous Wireless Federated Learning with Probabilistic Client Selection
CN104200354B (en) A kind of information processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination