CN117076918A - Model training system and model training method based on federal learning - Google Patents
- Publication number: CN117076918A (application CN202310723674.3A)
- Authority: CN (China)
- Prior art keywords: data, calculation, model, unit, layer
- Prior art date: 2023-06-16
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention provides a model training system and a model training method based on federal learning. The model training system comprises a data acquisition layer, for acquiring data, encrypting it, acquiring the components required for calculation, and managing the acquired data and components; a calculation layer, for performing joint modeling on the data sources of all parties in a federal computing mode and then performing model training to generate a model; a model layer, for managing the models generated by the calculation layer; a state monitoring layer, for monitoring the calculation process of the calculation layer and monitoring the server; and a basic data layer, for managing data generated in the system. The invention also provides a model training method based on the federal-learning model training system. The invention effectively lowers the threshold for users to develop federation algorithms, simplifies the steps for importing a federation algorithm into a business scenario, and makes the models more convenient to use.
Description
Technical Field
The invention relates to a model training system, in particular to a model training system based on federal learning, and further relates to a model training method based on the model training system.
Background
Federal learning (Federated Learning) is a new branch of artificial intelligence that opens the door to a new era of machine learning.
First, what is machine learning? It uses computers to solve problems involving uncertainty the way people do, such as recognizing an acquaintance and recalling their name under different lighting conditions (face recognition), or deciding whether to lend money to someone based on an evaluation of their past behavior (risk-control admission modeling) and how much to lend (credit-line modeling). People learn by continuously accumulating experience from books, teachers and practical exploration, thereby becoming individuals with 'wisdom'. Machine learning differs in that its experience is derived from large amounts of data: given data from a certain field, it can be trained into an intelligent agent for that field. For example, a large number of face images can be used to train a face recognition or identity authentication system.
The process of using data to obtain experience is called modeling; the process of using that experience to make estimates or predictions for new data is referred to as reasoning. More colloquially, modeling is like a person acquiring knowledge by reading a large number of books (for example, that the natural number 100 is greater than 10), and reasoning is like a person judging new things with the knowledge already formed (for example, judging that 100 yuan is worth more than 10 yuan).
In real life, machine learning can solve many problems in the digital economy. In the age of big data, machine learning has more comprehensive data and experience available to it. Intelligent services can be realized without manual intervention, which greatly improves production efficiency.
The biggest constraint on machine learning is that machines must be fed with data, and with a large amount of high-quality data.
However, in real life most enterprises have small data volumes and poor data quality, which is insufficient to support artificial intelligence technology. At the same time, the data owned by commercial companies often has tremendous potential value, from the perspective of both users and enterprises. Because of conflicting interests between companies, and even between departments within the same company, these institutions rarely aggregate their data with other companies, so data frequently exists in the form of islands, even inside a single company.
Traditional modeling is usually one-sided: a bank, for example, can only build models with the data it owns and cannot use data held by other parties. These three factors (data that is insufficient to support the goal, data that cannot be exchanged freely, and owners unwilling to contribute its value) have produced the large number of data islands and privacy protection problems that exist today, and federal learning emerged in response.
In essence, federal learning is a distributed machine learning technology, or machine learning framework, whose goal is joint modeling on the basis of data privacy, security and legal compliance. Summarized in one sentence, federal learning makes data 'usable but invisible'.
In the federal computing scenario, data is integrated through privacy-preserving computation and the models are trained jointly; because more data dimensions are available, the resulting model outperforms a model trained on traditional, single-party data.
Therefore, how to turn the theory of federal learning into practice as an operable system platform is the main direction of current research.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a model training system based on federal learning and a model training method based on the model training system.
The model training system based on federal learning of the invention comprises:
data acquisition layer: acquiring data, encrypting it, acquiring the components required for calculation, and managing the acquired data and components;
calculation layer: used for performing checks and task management on the data before calculation, performing calculation based on the checked data, distributing data tasks, performing joint modeling on the data sources of all parties in a federal computing mode, and then performing model training to generate a model;
model layer: used for managing the models generated by the calculation layer;
state monitoring layer: used for monitoring the calculation process of the calculation layer and monitoring the server;
basic data layer: used for managing data generated in the system.
In a further improvement of the invention, the data acquisition layer includes a data import module and a component management module. The data import module is used for importing the data used for calculation; the data used for calculation includes uploaded CSV data, database data and imported platform data.
The component management module includes a component configuration unit, a component uploading unit, a parameter configuration unit and a component testing unit. The component configuration unit is used for controlling each party to import algorithm components and to configure them; after configuration is completed, the parameter configuration unit sets the calculation parameters in the system, each component having an independent parameter configuration file; the component testing unit then performs component testing.
In a further improvement of the invention, after the data is imported by the data import module, the field that uses a unique identity as the ID is encrypted, and the encrypted data is then processed and stored.
In a further improvement of the invention, the parameter configuration unit adds a check function to the configuration, which compares the input parameters of a calculation against the configured parameter types; the system then imports the components into the calculation layer for verification and tests whether the calculation logic, parameter checking and component configuration of the components are problematic.
In a further improvement of the invention, the calculation layer includes a checking module and a task management module, the checking module including:
a data authentication unit: used for checking data authority before data calculation and judging whether the other node has calculation authority;
a parameter checking unit: used for checking the calculation parameters and preventing calculation risks caused by missing parameters;
a resource management unit: used for counting server resources before calculation and initiating resource deduction on all participating nodes; if the resource application fails, task creation also fails;
a task analysis unit: used for parsing all modeling tasks into a combination of different components and performing calculation under the scheduling of the task management module;
the task management module comprises a task control unit, a task scheduling unit, a task list unit and a model training unit, wherein,
the task control unit is used for controlling the synchronization of the calculation processes of all the parties, the task scheduling unit is used for scheduling the calculation processes, and the task list unit is used for displaying the calculation tasks to be executed by all the parties; the model training unit is used for training a model.
In the task control unit, before controlling each party to start a subtask, the system imports the corresponding components, performs parameter checks on them and then queries the computing resources; if resources are insufficient, the task is terminated. After all checks are completed, the calculation layer starts a process, imports the algorithm components and completes the task calculation.
In a further improvement of the invention, the model layer comprises an online service module and a model evaluation module, wherein the online service module comprises a data importing unit, a model importing unit and a model reasoning unit, and the model evaluation module comprises a data result output unit, a model result output unit and an evaluation result output unit.
In a further improvement of the invention, the state monitoring layer comprises a calculation monitoring module and a service monitoring module, wherein the calculation monitoring module comprises a task monitoring unit and a task log management unit, and the service monitoring module is used for monitoring the state of the service and comprises, but is not limited to, an IO monitoring unit, a disk monitoring unit, a CPU monitoring unit, a memory monitoring unit and an operation-log monitoring unit.
The invention also provides a model training method based on the above model training system, executed by one or more clients and a server, the clients and the server each being provided with the model training system based on federal learning. The model training method comprises the following steps:
S1: a client initiates a calculation signal, and the data acquisition layer then performs an initialization operation;
S2: the server receives the calculation signal, starts its calculation process and performs an initialization operation; after receiving the client start signal, it calculates and generates a key pair, and then sends the public key to each client for storage;
S3: the clients and the server start to read data, and the calculation layer calls the components to calculate; during the calculation, each client synchronizes its state and calculation with the server, and the client transmits the encrypted gradient factor to the server;
S4: the client and the server each calculate their current gradients from the gradient factor; meanwhile, the client calculates the current loss and transmits the current loss and the calculated client gradient to the server; the server updates the model parameters based on the gradient it calculated and the received client gradient, the client synchronously updates its model parameters based on the decrypted client gradient, and model training is performed based on the updated model parameters to generate a model;
S5: during the calculation of steps S3-S4, the state monitoring layer of each party monitors the server state in real time; if one party fails in calculation due to any abnormality, all parties synchronize the calculation-failure state and the calculation tasks of each client are closed to return the calculation resources;
S6: the model layer evaluates the model to form evaluation-model dimensions and displays them;
S7: after the model effect is confirmed, the model is packaged, an externally callable API is generated, and the model service is provided.
In step S3, state synchronization is performed using gRPC, and the data is transmitted by data chunk streaming. The data transmission method is as follows:
the transmitted data is serialized into a binary file and then split according to the maximum RPC data volume to form blocks, which are the block data; the block data is converted into batch packet data, and the batch packet data is stored in a struct data format.
Compared with the prior art, the invention has the following beneficial effects: risk-controllable modeling between data sources can be realized in an encrypted state, so enterprises can break data islands without causing data leakage and can effectively use data results to prevent risk; dynamic hot loading allows the relevant algorithm components to be tested directly at runtime, improving modeling efficiency; through the data acquisition layer, component configuration is flexible and the calculation process and state can be displayed visually; a parallel computing mode between client and server uses fewer computing resources, reducing the use of server resources; and the threshold for users to develop federation algorithms is effectively lowered, the steps for importing a federation algorithm into a business scenario are simplified, and the models are more convenient to use.
Drawings
FIG. 1 is a system business flow diagram of the present invention;
FIG. 2 is a schematic diagram of the structure of the present invention;
FIG. 3 is a schematic diagram of data import according to the present invention;
FIG. 4 is a flow chart of a method for model training by client-side interaction with a server-side.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples.
As shown in fig. 1, the model training system of this example is named the IGN federal computing framework (IGN for short). Its main business process starts with the data import module, which imports data into the framework. The data information is stored in a database (such as a MySQL or Hive database); the data source is not limited to any particular format, and the collected data is converted into a unified format for storage. Data import is a special component under component management; the other components are divided into algorithm components, feature engineering components and the like. Different components are combined into a DAG (directed acyclic graph) computation flow chart to complete the federal computing task. A state monitoring service is provided while tasks are being calculated; after all tasks complete normally, model evaluation is carried out, the model effect is measured, and a model is generated. The trained model can be packaged and brought online as an online service that provides API calls to the outside; model export is also supported.
As shown in fig. 2, the model training system based on federal learning is in a four-layer structure in the overall function, which is respectively a data acquisition layer, a calculation layer, a model layer and a basic data layer, and is further provided with a state monitoring layer for monitoring the working states of all the IGN layers and ensuring the smooth and safe running of model training.
Specifically, the data acquisition layer is responsible for managing data and comprises a data import module and a component management module. The data comprises imported CSV data, database data and other data used for calculation, as well as data from imported calculation components.
The core layer is the calculation layer, which comprises checking and task management. Data authentication checks the data authority before data calculation and judges whether the other node has calculation authority. Parameter checking is performed on the calculation parameters to prevent calculation risks caused by missing parameters. The resource management function counts server resources before calculation and initiates resource deduction on all participating nodes; if the resource application fails, task creation also fails.
The main part of the calculation layer is task management. Task control manages successfully created tasks and opens or closes them. Task scheduling schedules multiple tasks, analyses task priority and subtask state, and controls the task flow. Model training adopts the principle of secure multiparty computation, uses homomorphic encryption, transmits intermediate data with gRPC and caches data with LMDB, thereby achieving federal modeling. The task list displays all task states and the list of tasks.
The state monitoring layer is divided into a calculation monitoring module and a service monitoring module. Task monitoring checks the current state of scheduled tasks at any time. The task log imports the running log and the calculation log into a panel for display and provides a download function.
Service monitoring is divided into IO monitoring, disk monitoring, CPU monitoring, operation-log monitoring and memory monitoring, which together monitor the service and prevent calculation problems caused by server abnormalities.
The model layer is used for managing the models generated by the calculation layer, mainly through the model evaluation and online service functions. Data result output exports the result data used in the model calculation; model result output exports the calculated model result.
The base data layer manages data generated in the system, including intermediate cache data, result data, task parameters, and the like.
Taking a logistic regression federal calculation method as an example, the invention establishes a credit model, with bank data (Guest side) and government data (Host side) as training data, to elaborate the overall IGN calculation flow; IGN is installed on both sides. It should be understood that the invention is not limited to financial risk-control scenarios and can also be applied to federal modeling in various industries, achieving a balance between data privacy protection and data sharing and analysis.
1. Data importation
In the data import process, bank A, acting as the Guest side, uploads its signaling data to the IGN federal computing framework; the field whose ID is the identity-card number is hash-encrypted inside the framework, ensuring that the data ID is not leaked in subsequent use.
The government, acting as the Host side, uploads credit information data to the IGN federal computing framework; the field whose ID is the identity-card number is likewise hash-encrypted inside the framework, ensuring that the data ID is not leaked in subsequent use, as shown in fig. 3.
The framework supports various ways of uploading data, such as uploading CSV files or binding databases. Data obtained through different upload methods is converted into a uniform data format and stored with a uniform data frame. Before storage, basic data cleaning is performed, such as filling null values and removing duplicate values.
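For illustration, the sketch below combines the ID hashing from the data import step with the basic cleaning and unified storage described here. The column name, file paths and the use of pandas and Parquet are assumptions made for the example, not details taken from the patent.

```python
# Minimal data-import sketch: hash the ID field, clean, store in a unified format.
# "id_card", the CSV path and the Parquet output are illustrative assumptions.
import hashlib
import pandas as pd

def import_csv(path: str, id_column: str = "id_card") -> pd.DataFrame:
    df = pd.read_csv(path)
    # Hash-encrypt the unique-identity column so the raw ID never leaves this party
    df[id_column] = df[id_column].astype(str).map(
        lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest())
    # Basic cleaning: fill null values, remove duplicate rows
    df = df.fillna(0).drop_duplicates()
    return df

guest_df = import_csv("bank_data.csv")
guest_df.to_parquet("unified/bank_data.parquet")   # unified storage format
```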
2. Component management
Component import: the Guest side and the Host side each import the logistic regression algorithm component. The IGN federal computing framework is configured in a flexible, component-based manner; both algorithms and feature-data processing steps are treated as components. Therefore, both sides need to upload the logistic regression algorithm component synchronously so that they can calculate synchronously.
First, the traditional logistic regression algorithm is a classification algorithm based on linear regression. The general form of linear regression is Y = aX + b, where a and b are parameters, X is the independent variable and Y is the dependent variable; the range of Y is from negative to positive infinity, which is not suitable for classification problems. Logistic regression applies a sigmoid function so that the model output has a probabilistic meaning; the model parameters are learned with an optimization algorithm such as gradient descent, and feeding data into the learned model directly yields a prediction. Both logistic regression and Poisson regression are generalized linear regressions, and the emphasis in the federal setting is on calculating each party's gradients. Because homomorphic encryption does not support the logarithmic and exponential operations in the gradient formula, a Taylor expansion is performed on the loss function of logistic regression to obtain a polynomial, from which a shared gradient factor d is abstracted.
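The sketch below reconstructs, in LaTeX, the standard Taylor-approximation argument implied by this paragraph; the expansion order and constants are the usual textbook choices and are assumptions here, since the patent only states that a polynomial and a shared gradient factor d are obtained.

```latex
% Logistic loss for label y in {-1,+1} and joint score u = w_A^T x_A + w_B^T x_B
\ell(y,u) = \log\bigl(1 + e^{-y u}\bigr)
% Second-order Taylor expansion around u = 0 (log and exp are not
% homomorphic-friendly, so they are replaced by a polynomial):
\ell(y,u) \approx \log 2 - \tfrac{1}{2}\, y u + \tfrac{1}{8}\, u^{2}
% Gradient with respect to each party's parameters, via the shared factor d:
d = \tfrac{1}{4}\,(u_A + u_B) - \tfrac{1}{2}\, y, \qquad
\frac{\partial \ell}{\partial w_A} \approx d\, x_A, \qquad
\frac{\partial \ell}{\partial w_B} \approx d\, x_B
```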
The logistic regression algorithm component here, however, mainly contains the calculation logic of federal logistic regression; unlike traditional logistic regression, the algorithm is split between the Guest side and the Host side, and the final result is computed jointly.
After the component is uploaded, it must be configured so that IGN can identify it and import it into the framework. When the configuration is finished, the component name (for example, 'logistic regression calculation'), the component description, the upload time and so on are recorded in a database.
After the component configuration is completed, the logistic regression calculation parameters are set in the framework. Each component has an independent parameter configuration file; the logistic regression calculation component, for example, sets parameters specific to the logistic regression algorithm, such as the penalty (regularization) selection parameter, whose input type is specified as string. Other basic algorithm parameters use the framework's default settings.
A check function is added to the parameter configuration, which compares the calculation's input parameters against the configured parameter types. Because machine learning parameters are complex, some parameters, such as the optimizer, are coupled with other parameters: modifying the optimizer parameter also changes the types of other parameters. At present, a recursive approach is adopted to verify the parameters layer by layer.
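A minimal sketch of such a layer-by-layer recursive parameter check follows. The schema layout (a nested dict of expected types) and the example logistic regression schema are assumptions for illustration, not the framework's actual configuration format.

```python
# Recursive, layer-by-layer validation of calculation parameters against a
# configured schema. Nested dicts are validated recursively; leaf entries are
# expected Python types.
def check_params(params: dict, schema: dict, path: str = "") -> list:
    errors = []
    for key, expected in schema.items():
        full = f"{path}.{key}" if path else key
        if key not in params:
            errors.append(f"missing parameter: {full}")
        elif isinstance(expected, dict):                      # nested group -> recurse
            if isinstance(params[key], dict):
                errors.extend(check_params(params[key], expected, full))
            else:
                errors.append(f"{full}: expected a parameter group")
        elif not isinstance(params[key], expected):           # leaf -> type check
            errors.append(f"{full}: expected {expected.__name__}, "
                          f"got {type(params[key]).__name__}")
    return errors

# Example schema for a logistic regression component (illustrative only)
lr_schema = {"penalty": str, "max_iter": int,
             "optimizer": {"name": str, "learning_rate": float}}
print(check_params({"penalty": "l2", "max_iter": 100,
                    "optimizer": {"name": "sgd", "learning_rate": 0.1}}, lr_schema))
```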
After all parameter configuration is completed, the logistic regression calculation component can be tested individually. The IGN framework imports the component into the computation flow for verification and tests whether the component's calculation logic, parameter checking and configuration are problematic.
3. Calculation layer and state monitoring layer processing
In the computing-task stage, the IGN federal computing framework is generally divided into task analysis, task scheduling, task control and model training.
In the task parsing function, the core computing task is designed as a DAG (directed acyclic graph), so all modeling tasks are parsed into a combination of different components and computed under the scheduling of IGN.
For example, if a logistic regression algorithm is used as the modeling algorithm, the modeling flow is broken down into a data_io component, an interaction component, a scale component, a logistic regression component, and an eval evaluation component.
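The sketch below shows one way such a modeling flow could be parsed into a DAG and executed in dependency order. The component names follow the example above; the dict representation and topological execution are illustrative assumptions rather than IGN internals.

```python
# A modeling task parsed into a DAG of components; each entry lists the
# components whose outputs it consumes. Executed in topological order.
from graphlib import TopologicalSorter   # Python 3.9+

credit_model_dag = {
    "data_io": [],
    "interaction": ["data_io"],
    "scale": ["interaction"],
    "logistic_regression": ["scale"],
    "eval": ["logistic_regression"],
}

def run_task(dag: dict, run_component) -> None:
    for name in TopologicalSorter(dag).static_order():
        run_component(name)   # schedule the subtask; both parties stay in sync here

run_task(credit_model_dag, lambda name: print(f"running subtask: {name}"))
```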
In the task scheduling function, the credit model calculation, for example, yields 5 subtasks after task parsing, and the scheduler of the IGN federal computing framework controls the computing flow of these 5 subtasks. The data_io component used in the first subtask needs to synchronize state after both the Guest side and the Host side have completed it. The initiator, the Guest side, integrates the states of the two parties and compares the completion of the first subtask on the Guest side and the Host side. If both are in the success state, the Guest-side computing framework schedules the second subtask, which uses the interaction component, and at the same time synchronizes the Host-side computing flow, scheduling the Host side into the computing flow of the second subtask. This continues until the last eval component completes its calculation. IGN schedules the communication between the Guest and Host sides and the computing flow of both parties to complete the final computing task.
In the task control function, before a subtask is opened, the IGN federal computing framework imports the corresponding components, performs parameter checks on them and queries the computing resources within the framework; if resources are insufficient, the task is terminated. After all checks are completed, IGN starts a process, imports the algorithm component and completes the task calculation.
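A sketch of this pre-flight sequence is given below: import the component, check its parameters (reusing the recursive check sketched earlier), query resources and only then start the computing process. The module name, the resource figures and the component's run entry point are hypothetical.

```python
# Task control pre-flight checks before a subtask is opened (illustrative only).
import importlib
import multiprocessing

def open_subtask(component_module: str, params: dict, schema: dict,
                 free_memory_mb: int, required_mb: int = 512):
    component = importlib.import_module(component_module)   # import the component
    errors = check_params(params, schema)                    # recursive parameter check
    if errors:
        raise ValueError(f"parameter check failed: {errors}")
    if free_memory_mb < required_mb:                         # resource query
        raise RuntimeError("insufficient resources, task terminated")
    proc = multiprocessing.Process(target=component.run, args=(params,))
    proc.start()                                             # start the computing process
    return proc
```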
In this example there may be multiple Guest sides, with the Host side scheduling and calculating in a unified manner.
4. Model layer processing
(1) In the model evaluation stage, the IGN federal computing framework mainly uses various evaluation coefficients provided by the eval component to form the dimension of an evaluation model, and the dimension is displayed on a front-end page to support the output of results of all subtasks.
(2) After confirming the model effect, the IGN federation computing framework packages the model, generates an API which can be called by the outside, and provides model service.
As shown in fig. 4, in the model training method based on the model training system of the invention, for convenience of explanation this example is executed with Guest B as the client and Host A as the server. The model training function takes the logistic regression calculation as an example; the business processing flow is the same as the per-layer processing described above. The federal calculation flow is as follows:
(1) The initiator of IGN federal computing framework B (Guest B for short) performs an initialization operation after starting the computing process: it imports the framework's basic parameters, checks the input configuration parameters, initializes the gRPC service, establishes a long-connection channel and configures logging. After the initialization is completed, IGN actively calls the transfer function, uses the gRPC protocol to synchronize its state to the Host side, and the main computing process enters a waiting state until the Host sends a completion signal;
(2) After receiving the calculation signal, the Host side's IGN federal computing framework A (Host A) starts its computing process and, like Guest B, performs the initialization procedure. After initialization is completed, it waits for the Guest B start signal, then generates a key pair $(D_{pri}, E_{pub})$ and sends the public key $E_{pub}$ to initiator B; the transmitted intermediate data is stored using LMDB;
(3) Host A starts to read data; the data imported by the fourth component in the credit-model flow is the data output by the previously completed Scale component (initializing $w_A$). After task parsing is completed, the IGN framework controls the data flow and determines the input and output of each component. During the calculation, the intermediate information $U_A = W_A X_A$ is computed; the original intermediate information is encrypted and random information $U_r$ is generated, giving the new intermediate information $U_{Ar} = U_A + U_r$, which is sent to Guest B. The IGN framework uses the Paillier encryption algorithm, $[[m]] = g^{m} \cdot r^{n} \bmod n^{2}$, with the encryption mode the same as in (2), where $n, g$ form the Paillier public key and $r$ is the random number defined by the Paillier encryption algorithm;
(4) IGN uses gRPC directly for state synchronization, but because the intermediate result data is relatively large, data chunk streaming is applied on the data side to prevent data loss. The transmitted data is serialized into a binary file and then split according to the maximum RPC data volume, forming blocks (the block data). The block data is converted into batches (packet data), and during this conversion the batches are stored in a struct data format. This three-layer data processing and packaging prevents stream-data packet loss caused by communication abnormalities. With this data scheme, the data can be compressed at different levels, the latency and cost of data transmission are greatly reduced, and transmission efficiency is improved by at least 20%.
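The following sketch illustrates the chunking and packing scheme just described: serialize, split at an assumed maximum RPC message size, and prefix each batch with a struct-packed header. The 4 MB limit, the header layout and the use of pickle are assumptions, not values taken from the patent.

```python
# Serialize -> split into blocks no larger than the RPC limit -> pack each
# block as a batch with a struct header (block index, block count, payload length).
import pickle
import struct

MAX_RPC_BYTES = 4 * 1024 * 1024          # assumed gRPC message-size limit
HEADER = struct.Struct("!III")           # block index, block count, payload length

def to_batches(obj):
    blob = pickle.dumps(obj)                                  # serialize to binary
    blocks = [blob[i:i + MAX_RPC_BYTES]
              for i in range(0, len(blob), MAX_RPC_BYTES)]    # block data
    for idx, block in enumerate(blocks):                      # batch packet data
        yield HEADER.pack(idx, len(blocks), len(block)) + block

def from_batches(batches):
    parts, total = {}, None
    for batch in batches:
        idx, total, length = HEADER.unpack(batch[:HEADER.size])
        parts[idx] = batch[HEADER.size:HEADER.size + length]
    return pickle.loads(b"".join(parts[i] for i in range(total)))

payload = {"gradient_factor": list(range(100000))}
assert from_batches(to_batches(payload)) == payload
```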
Guest B reads the data transmitted by the Scale component through the framework (initializing $w_B$). Guest B then computes $[[U]] = [[U_{Ar} + U_{B}]]$ and the encrypted gradient factor $[[d]]$, and finally sends $[[d]]$ to Host A using the data chunk streaming described above;
(5) Host A and Guest B each calculate their current gradients $[[G_A]]$ and $[[G_B]]$ from $[[d]]$. Meanwhile, Guest B calculates the current loss $[[L]]$ and then sends $[[G_B]]$ and $[[L]]$ to Host A. Host A uses the private key to decrypt the encrypted gradients $[[G_A]]$ and $[[G_B]]$, updates its model parameters $W_A$ by gradient descent, and sends the decrypted $G_B$ to Guest B, which updates its model parameters $w_B$ accordingly.
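The sketch below illustrates this encrypted gradient-factor exchange, using the open-source phe (python-paillier) library as a stand-in for the patent's homomorphic-encryption layer. The toy data, variable names, gradient-factor formula and learning rate are assumptions, and the gRPC transport and LMDB caching are omitted.

```python
# Encrypted gradient-factor exchange between Host A and Guest B (illustrative).
import numpy as np
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)  # held by Host A

# Toy vertically partitioned data: Host A and Guest B hold different feature columns
host_X = np.array([[0.5, 1.2], [1.1, -0.3]])   # Host A features
guest_X = np.array([[0.7], [-0.4]])            # Guest B features
y = np.array([1.0, -1.0])                      # labels held by Guest B
w_A, w_B = np.zeros(2), np.zeros(1)

def enc_dot(enc_vec, plain_vec):
    """Dot product of an encrypted vector with a plaintext vector."""
    total = enc_vec[0] * float(plain_vec[0])
    for e, x in zip(enc_vec[1:], plain_vec[1:]):
        total = total + e * float(x)
    return total

# Host A: intermediate information U_A = X_A · w_A, sent to Guest B encrypted
U_A = host_X @ w_A
enc_U_A = [public_key.encrypt(float(u)) for u in U_A]

# Guest B: U = U_A + U_B and gradient factor d = 1/4·U - 1/2·y (Taylor-approximated loss)
U_B = guest_X @ w_B
enc_d = [(ua + float(ub)) * 0.25 + (-0.5) * float(yi)
         for ua, ub, yi in zip(enc_U_A, U_B, y)]

# Each party forms its encrypted gradient; Host A holds the private key and decrypts
G_A = np.array([private_key.decrypt(enc_dot(enc_d, host_X[:, j]))
                for j in range(host_X.shape[1])])
G_B = np.array([private_key.decrypt(enc_dot(enc_d, guest_X[:, j]))
                for j in range(guest_X.shape[1])])   # decrypted G_B is returned to Guest B

lr = 0.1
w_A -= lr * G_A / len(y)   # Host A updates W_A
w_B -= lr * G_B / len(y)   # Guest B updates w_B
```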
During the calculation, the IGN frameworks of both parties monitor the state of the server, such as the remaining CPU and memory capacity. If one party fails in calculation due to any abnormality, the two parties synchronize the calculation-failure state, close the other party's calculation task and return the computing resources, preventing resource overflow or lingering resource occupation; the state is displayed on a visual page.
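A minimal sketch of such a server-state monitoring loop follows, using the psutil library; the thresholds, polling interval and failure callback are illustrative assumptions.

```python
# Poll CPU, memory, disk and IO and report an abnormality so both parties can
# synchronize the failure state and release computing resources.
import time
import psutil

def monitor_server(report_failure, cpu_limit=95.0, mem_limit=95.0, interval=5):
    while True:
        cpu = psutil.cpu_percent(interval=1)          # CPU utilisation (%)
        mem = psutil.virtual_memory().percent         # memory utilisation (%)
        disk = psutil.disk_usage("/").percent         # disk utilisation (%)
        io = psutil.disk_io_counters()                # cumulative IO counters
        print(f"cpu={cpu}% mem={mem}% disk={disk}% "
              f"read={io.read_bytes} write={io.write_bytes}")
        if cpu > cpu_limit or mem > mem_limit:
            report_failure("resource exhausted")      # trigger failure-state sync
            break
        time.sleep(interval)
```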
(6) After the calculation is completed, the IGN frameworks of both parties store the data from the completed logistic regression subtask, using LMDB to reduce the size of the stored data.
(7) The IGN frameworks then enter task scheduling; through state synchronization, the frameworks on both sides re-enter the preparation state, start the subtask formed by the last eval component, and start a computing process to complete the final calculation.
After every subtask is completed, model evaluation begins; if the model effect reaches expectations, the IGN federal computing framework packages the model, generates an externally callable API and provides the model service.
In this example, the key technologies preferably used in the IGN calculation process are gRPC transmission and homomorphic encryption.
For data encryption, Paillier is a homomorphic encryption algorithm: a probabilistic public-key encryption scheme invented by Paillier in 1999. The principle of the Paillier encryption algorithm is based on several number-theoretic foundations: prime numbers, modular arithmetic, coprimality and modular inverses. The principle and implementation of the encryption algorithm are as follows:
(1) The Host first randomly selects two distinct primes $p$ and $q$ satisfying $\gcd(pq, (p-1)(q-1)) = 1$; this condition is guaranteed when $p$ and $q$ have equal length;
(2) Compute the product $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where lcm denotes the least common multiple;
(3) Take $g = n + 1$ (with $g \in \mathbb{Z}^{*}_{n^{2}}$), and compute $\mu = (L(g^{\lambda} \bmod n^{2}))^{-1} \bmod n$, where $L(x) = (x-1)/n$;
(4) Finally, $n$ and $g$ are packaged into the public key $(n, g)$, and $\lambda$ and $\mu$ are packaged into the private key $(\lambda, \mu)$ used for decryption.
The formula for calculating the ciphertext is $c = g^{m} \cdot r^{n} \bmod n^{2}$, where $m$ is the plaintext message and $r$ is a random number drawn from $\mathbb{Z}^{*}_{n}$.
Decryption formula: $m = L(c^{\lambda} \bmod n^{2}) \cdot \mu \bmod n$.
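The following minimal, insecure textbook sketch implements exactly the key generation, encryption and decryption formulas above, with $g = n + 1$ and $L(x) = (x-1)/n$; the key size and the use of sympy for prime generation are illustrative assumptions.

```python
# Textbook Paillier sketch (not production-grade): keygen, encrypt, decrypt,
# plus a check of the additive homomorphism used by the federal gradient exchange.
import math
import random
from sympy import randprime   # assumed available for prime generation

def keygen(bits=512):
    while True:
        p, q = randprime(2**(bits-1), 2**bits), randprime(2**(bits-1), 2**bits)
        if p != q and math.gcd(p * q, (p - 1) * (q - 1)) == 1:
            break
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    L = lambda x: (x - 1) // n
    mu = pow(L(pow(g, lam, n * n)), -1, n)       # modular inverse of L(g^λ mod n²)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    while True:                                   # random r coprime with n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)    # c = g^m · r^n mod n²

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    L = lambda x: (x - 1) // n
    return (L(pow(c, lam, n * n)) * mu) % n       # m = L(c^λ mod n²) · μ mod n

pub, priv = keygen(256)
c1, c2 = encrypt(pub, 7), encrypt(pub, 35)
assert decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)) == 42    # additive homomorphism
```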
For data transmission, gRPC streaming replaces traditional HTTP transmission, which guarantees the data transmission rate. gRPC is a high-performance, open-source, general-purpose RPC framework developed on top of the Protobuf serialization protocol and supporting many development languages. Its HTTP/2-based design brings features such as bidirectional streaming, flow control, header compression and the multiplexing of requests over a single TCP connection. These characteristics make it perform better on mobile devices and save power and space. The multiplexing characteristic makes it practical to handle many requests at a time; requests in the same TCP channel are neither ordered nor blocking, giving true full-duplex communication. HTTP/2 introduces the concepts of Stream and Frame: after the TCP channel is established, all subsequent operations are sent as Streams, and the binary Frame is the smallest unit making up a Stream, which amounts to streaming at the protocol layer.
From the above description, the invention has the following innovative points:
(1) Dynamic hot loading: the relevant algorithm components are tested directly at runtime, which improves modeling efficiency;
(2) Through the data acquisition layer, the configuration of the components can be flexible, and the calculation process and the calculation state can be visually displayed;
(3) The method adopts a parallel computing mode of a client and a server, uses fewer computing resources, and achieves the purpose of reducing the use of server resources;
(4) The threshold for users to develop federation algorithms is effectively lowered, the steps for importing a federation algorithm into a business scenario are simplified, and the models are more convenient to use.
The above embodiments are preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, which includes but is not limited to the embodiments, and equivalent modifications according to the present invention are within the scope of the present invention.
Claims (10)
1. A model training system based on federal learning, comprising:
data acquisition layer: acquiring data, encrypting, acquiring a component required to be calculated, and managing the acquired data and the component;
calculation layer: used for performing checks and task management on the data before calculation, calculating based on the checked data, distributing data tasks, performing joint modeling on the data sources of all parties in a federal calculation mode, and performing model training to generate a model;
model layer: the model is used for managing the model generated by the calculation layer;
state monitoring layer: the system is used for monitoring the calculation process of the calculation layer and monitoring the server;
basic data layer: for managing data generated in the system.
2. The federal learning-based model training system according to claim 1, wherein: the data acquisition layer comprises a data importing module and a component management module, wherein the data importing module is used for importing data used for calculation, and the data used for calculation comprises uploaded CSV data, database data and imported platform data;
the component management module includes: the system comprises a component configuration unit, a component uploading unit, a parameter configuration unit and a component testing unit, wherein the component configuration unit is used for controlling all parties to import algorithm components, configuring the components through the component configuration unit, setting calculation parameters in the system by the parameter configuration unit after the configuration is completed, wherein each component is provided with an independent parameter configuration file, and then adopting the component testing unit to conduct component testing.
3. The federal learning-based model training system according to claim 2, wherein: after the data is imported by the data importing module, the field taking the unique identity as the ID in the data is encrypted, and then the encrypted data is processed and stored.
4. The federal learning-based model training system according to claim 2, wherein: the parameter configuration unit adds a check function in the configuration, checks the comparison and judgment of the calculated input parameters and the set parameter types, the system guides the components into the calculation layer for verification, and tests whether the calculation logic, the parameter check and the component configuration of the components have problems.
5. The federal learning-based model training system according to claim 1, wherein: the computing layer comprises a checking module and a task management module, wherein the checking module comprises:
a data authentication unit: used for checking data authority before data calculation and judging whether the other node has calculation authority;
a parameter checking unit: used for checking the calculation parameters and preventing calculation risks caused by missing parameters;
a resource management unit: used for counting server resources before calculation and initiating resource deduction on all participating nodes; if the resource application fails, task creation also fails;
task analysis unit: for parsing all modeling tasks into a combination of different components, and performing calculations under the scheduling of the task management module,
the task management module comprises a task control unit, a task scheduling unit, a task list unit and a model training unit, wherein,
the task control unit is used for controlling the synchronization of the calculation processes of all the parties, the task scheduling unit is used for scheduling the calculation processes, and the task list unit is used for displaying the calculation tasks to be executed by all the parties; the model training unit is used for training a model.
6. The federally-learning-based model training system according to claim 5, wherein: in the task control unit, the system introduces corresponding components before controlling each party to start subtasks, performs parameter inspection on the components, inquires about computing resources, terminates the tasks if insufficient resources appear, and starts a process after all inspection is completed, introduces algorithm components and completes task calculation.
7. The federal learning-based model training system according to claim 1, wherein: the model layer comprises an online service module and a model evaluation module, wherein the online service module comprises a data importing unit, a model importing unit and a model reasoning unit, and the model evaluation module comprises a data result output unit, a model result output unit and an evaluation result output unit.
8. The federal learning-based model training system according to claim 1, wherein: the state monitoring layer comprises a calculation monitoring module and a service monitoring module, wherein the calculation monitoring module comprises a task monitoring unit and a task log management unit, and the service monitoring module is used for monitoring the state of service and comprises, but is not limited to, an IO monitoring unit, a disk monitoring unit, a CPU monitoring unit, a memory monitoring unit and an operation log monitoring unit.
9. A model training method based on federal learning, using the model training system according to any of claims 1-8, executed by one or more clients and a server, the clients and the server each being provided with the model training system based on federal learning, characterized in that the model training method comprises the following steps:
S1: a client initiates a calculation signal, and the data acquisition layer then performs an initialization operation;
S2: the server receives the calculation signal, starts its calculation process and performs an initialization operation; after receiving the client start signal, it calculates and generates a key pair, and then sends the public key to each client for storage;
S3: the clients and the server start to read data, and the calculation layer calls the components to calculate; during the calculation, each client synchronizes its state and calculation with the server, and the client transmits the encrypted gradient factor to the server;
S4: the client and the server each calculate their current gradients from the gradient factor; meanwhile, the client calculates the current loss and transmits the current loss and the calculated client gradient to the server; the server updates the model parameters based on the gradient it calculated and the received client gradient, the client synchronously updates its model parameters based on the decrypted client gradient, and model training is performed based on the updated model parameters to generate a model;
S5: during the calculation of steps S3-S4, the state monitoring layer of each party monitors the server state in real time; if one party fails in calculation due to any abnormality, all parties synchronize the calculation-failure state and the calculation tasks of each client are closed to return the calculation resources;
S6: the model layer evaluates the model to form evaluation-model dimensions and displays them;
S7: after the model effect is confirmed, the model is packaged, an externally callable API is generated, and the model service is provided.
10. The model training method of claim 9, wherein in step S3 the state synchronization is performed using gRPC and the data is transmitted by data chunk streaming, the data transmission method being as follows:
the transmitted data is serialized into a binary file and then split according to the maximum RPC data volume to form blocks, which are the block data; the block data is converted into batch packet data, and the batch packet data is stored in a struct data format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723674.3A CN117076918A (en) | 2023-06-16 | 2023-06-16 | Model training system and model training method based on federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723674.3A CN117076918A (en) | 2023-06-16 | 2023-06-16 | Model training system and model training method based on federal learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117076918A true CN117076918A (en) | 2023-11-17 |
Family
ID=88718230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310723674.3A Pending CN117076918A (en) | 2023-06-16 | 2023-06-16 | Model training system and model training method based on federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117076918A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117932685A (en) * | 2024-03-22 | 2024-04-26 | 智慧眼科技股份有限公司 | Privacy data processing method and related equipment based on longitudinal federal learning |
- 2023-06-16 CN CN202310723674.3A patent/CN117076918A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021115480A1 (en) | Federated learning method, device, equipment, and storage medium | |
US20230028606A1 (en) | Method and apparatus for vertical federated learning | |
US20230078061A1 (en) | Model training method and apparatus for federated learning, device, and storage medium | |
CN112949760B (en) | Model precision control method, device and storage medium based on federal learning | |
CN108491267B (en) | Method and apparatus for generating information | |
JP2020017256A (en) | System for performing verification in block chain | |
CN110084377A (en) | Method and apparatus for constructing decision tree | |
US11494342B1 (en) | Computer-based systems configured to utilize an event log to pre-authorize distributed network events and methods of use thereof | |
CN111506909B (en) | Method and system for interaction of tax data | |
CN109815344B (en) | Network model training system, method, apparatus and medium based on parameter sharing | |
CN105260292B (en) | A kind of log recording method, apparatus and system | |
US20220092056A1 (en) | Technologies for providing prediction-as-a-service through intelligent blockchain smart contracts | |
CN117076918A (en) | Model training system and model training method based on federal learning | |
CN113836809B (en) | Cross-industry data joint modeling method and system based on block chain and federal learning | |
CN111951363A (en) | Cloud computing chain-based rendering method and system and storage medium | |
CN112767113A (en) | Account checking data processing method, device and system based on block chain | |
CN115297008B (en) | Collaborative training method, device, terminal and storage medium based on intelligent computing network | |
US20240232269A1 (en) | Hybrid metaverse using edge nodes to support a soft repository for event processing | |
CN114817739A (en) | Industrial big data processing system based on artificial intelligence algorithm | |
Wang et al. | zkfl: Zero-knowledge proof-based gradient aggregation for federated learning | |
CN117521102A (en) | Model training method and device based on federal learning | |
CN113807157A (en) | Method, device and system for training neural network model based on federal learning | |
CN113887740B (en) | Method, device and system for jointly updating model | |
CN110119315A (en) | Rendering method, relevant device and system based on block chain | |
CN115292144A (en) | Credibility evaluation method, device and equipment for multi-party model and multi-party financial model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |