CN115169594A - Privacy-preserving distributed machine learning debugging method and debugging system - Google Patents

Privacy-preserving distributed machine learning debugging method and debugging system

Info

Publication number
CN115169594A
CN115169594A
Authority
CN
China
Prior art keywords
debugging
training
server
metadata
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211100671.6A
Other languages
Chinese (zh)
Inventor
刘川意
段少明
何田雨
韩培义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202211100671.6A
Publication of CN115169594A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/362 Software debugging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the field of distributed machine learning debugging and provides a privacy-preserving distributed machine learning debugging method and debugging system. The debugging method comprises the following steps: step S1: a data analyst builds a distributed machine learning pipeline; step S2: the clients train the model locally; step S3: the server receives the locally trained models and the computed debugging intermediate values; step S4: based on the debugging metadata collected by the server and the debugging intermediate values transmitted by each client, the server debugging module checks the current round of federated training according to a distributed machine learning debugging algorithm; step S5: after the federated training finishes, a debugging report of the training is output. The invention aims to solve the technical problems described in the background art.

Description

Privacy-preserving distributed machine learning debugging method and debugging system
Technical Field
The invention belongs to the field of distributed machine learning debugging, and in particular relates to a privacy-preserving distributed machine learning debugging method and debugging system.
Background
Regarding federated learning training, patent CN 114638377A provides a model training method, apparatus, and electronic device based on federated learning, wherein the method comprises: processing external intermediate data and local feature data with a key to obtain initial secret-state data, where the key is obtained from a central node and the external intermediate data is obtained from other training nodes; sending an initial secret-state difference value to the central node and receiving a screened index sent by the central node, where the initial secret-state data is used to determine the screened index; determining a secret-state gradient value using the updated secret-state difference value and its corresponding training subset; and sending the secret-state gradient value to the central node, receiving the target gradient value sent by the central node, and updating the current model with the target gradient value. However, this federated learning algorithm only considers how to train the model in a federated environment; it does not consider how to debug the model when training problems arise, nor how to improve the model's effectiveness.
Patent CN 111796811A provides a flow-control engine system supporting breakpoint debugging in federated learning, which includes a flow-graph generation module that generates a logic flow graph at the back end; a scheduling queue module that traverses the nodes of the logic flow graph; a running-container module that creates a task ID for each task and passes it to the lifecycle manager; a breakpoint management module that sets the state of the running container and triggers the breakpoint's pause and resume states; a lifecycle manager that manages the node running state in the container and controls the complete lifecycle of the container's run; and a context manager that saves the currently running intermediate results in the node state. With this flow-control engine system, execution pauses when it reaches a breakpoint and the current intermediate results can be inspected, which is convenient for federated learning modelers. However, this style of federated learning debugging can only pause training at a breakpoint so that a debugger directly inspects static data during model training; on the one hand it cannot automatically detect training problems, and on the other hand there is no way to analyze and trace a model's problems after training has finished.
Regarding non-independent and identically distributed (non-IID) data in distributed training, an embodiment of patent CN 114580651A discloses a federated learning method, apparatus, device, system, and computer-readable storage medium, where the method includes: first, each second device obtains data distribution information and sends it to the first device. Then, the first device receives the data distribution information sent by the multiple second devices participating in federated learning and selects a matching federated learning strategy accordingly. Next, the first device sends the parameter-reporting policy corresponding to that strategy to at least one of the second devices. A second device receiving the parameter-reporting policy obtains second gain information according to the policy and the current training sample, and the second gain information is used to obtain the second device's second model. The method reduces the interference caused by non-IID data and can obtain a better-performing second model. It reduces the interference of non-IID data on federated model training by acquiring the data distribution and selecting a suitable federated learning strategy; however, the problems arising in federated learning debugging go far beyond non-IID data, and this method cannot detect other problems in data preprocessing, feature engineering, or the training process.
Regarding machine learning debugging, patent CN 114503132A discloses a method, system, and computer-readable medium for debugging and profiling of machine learning model training. A machine learning analysis system receives data associated with the training of a machine learning model; the data is collected by a machine learning training cluster. The analysis system analyzes the data, detects one or more conditions associated with the training based at least in part on the analysis, and generates one or more alerts describing those conditions. This patent automatically detects problems and raises alarms by collecting relevant data during machine learning model training, but it cannot be applied directly to federated learning.
Regarding data collection, patent CN 114066314A provides a data tracking method, apparatus, server, and storage medium. The method includes: when a configuration-start operation is detected, receiving the index to be tracked and its index type as input by the user; displaying the main interactive interface corresponding to that index type; receiving early-warning information for the index entered by the user on the main interactive interface; acquiring multiple target data items according to the early-warning information; computing over the target data according to their corresponding target operation relationship to obtain an operation result; determining an early-warning trigger state from the result; and, when the trigger state indicates that an early warning is needed, tracking the index in the early-warning manner specified in the early-warning information. The method tracks the specified index through information configuration, improving tracking efficiency; it realizes tracking, collecting, and analyzing variables according to a user-input specification, but it is not designed for federated learning.
Disclosure of Invention
The invention aims to provide a privacy-preserving distributed machine learning debugging method and debugging system, so as to solve the technical problems described in the background art.
The invention is realized as follows. A privacy-preserving distributed machine learning debugging method comprises the following steps:
step S1: first, a data analyst builds a distributed machine learning pipeline and distributes it to all clients in the federation to be trained with their local data;
step S2: during local client training, the client's non-intrusive metadata collection module automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging; the client debugging module asynchronously reads the debugging metadata, computes intermediate values according to a distributed machine learning debugging algorithm, and, after a round of local training, transmits the intermediate values to the server along with the model;
step S3: the server receives the locally trained models and the computed debugging intermediate values; on the one hand it aggregates all client models using the federated learning algorithm and tests the effectiveness of the current round's global model, and on the other hand it collects debugging metadata during this process using the server non-intrusive metadata collection module;
step S4: based on the debugging metadata collected by the server and the debugging intermediate values transmitted by each client, the server debugging module checks the current round of federated training according to the distributed machine learning debugging algorithm; if a training problem is detected, a warning is issued to the data analyst through the debugging feedback module and the analyst debugs the distributed machine learning pipeline according to the warning information; if no training problem is detected, the next round of federated training begins;
step S5: after the federated training finishes, a debugging report of the training is output.
In a further technical scheme of the invention, the client non-intrusive metadata collection module in step S2 automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging therein through the following steps:
step S21: a code analysis stage;
step S22: a metadata acquisition stage.
In a further technical scheme of the invention, step S21 further comprises the steps of:
step S211: obtaining an abstract syntax tree from the source code;
step S212: traversing the abstract syntax tree, obtaining the program's function calls and variable information, and forming a data flow graph.
In a further technical scheme of the invention, step S22 further comprises the steps of:
step S221: identifying target data;
step S222: acquiring target data.
In a further technical scheme of the invention, step S211 specifically comprises representing the code through the structure of an abstract syntax tree and obtaining the abstract syntax tree object through an interface of the programming language.
In a further technical scheme of the invention, step S212 specifically comprises traversing the abstract syntax tree object through the corresponding interface; after the traversal, the function calls and variable information in the program are obtained, the variables and functions have input-output relationships, and the functions and variables are interleaved and connected according to these relationships to form a data flow graph.
In a further technical scheme of the invention, step S221 specifically comprises dividing the target data into two types: one type is the model parameters, inputs, and outputs closely related to the specific machine learning framework, termed framework-related metadata; the other type is customized by the model developer.
In a further technical scheme of the invention, step S222 specifically comprises: a variable has a unique variable name while the code runs; if the variable name is in the target metadata list, the variable is added to the metadata collection dictionary, and after multiple rounds of training the dictionary will have collected multiple rounds of the variable's data values.
Another aim of the invention is to provide a debugging system for the privacy-preserving distributed machine learning debugging method, the debugging system comprising a server, a plurality of clients connected to the server, and a debugging feedback module connected to the server.
In a further technical scheme of the invention, the server comprises a model aggregation module, a server non-intrusive metadata collection module connected to the model aggregation module, a server metadata store connected to the server non-intrusive metadata collection module, and a server debugging module connected to the server metadata store, with the debugging feedback module connected to the server debugging module; each client comprises a local training module connected to the model aggregation module, a client non-intrusive metadata collection module connected to the local training module, a client metadata store connected to the client non-intrusive metadata collection module, and a client debugging module connected respectively to the client metadata store and the server debugging module.
The beneficial effects of the invention are as follows: the method automatically detects training problems in privacy-preserving distributed machine learning training processes such as federated learning, helping the data analyst debug problems and improve model effectiveness; through the non-intrusive metadata collection method, no data collection code needs to be embedded in the machine learning training script; and, targeting common distributed training problems, a series of distributed machine learning debugging rules is provided for detecting and debugging a machine learning model during distributed training and improving model effectiveness.
Drawings
FIG. 1 is a block diagram of a system provided by an embodiment of the invention;
FIG. 2 is an overall flow diagram of the non-intrusive metadata collection provided by an embodiment of the present invention;
FIG. 3 is example code for deep learning provided by embodiments of the present invention;
FIG. 4 is an exemplary code data flow diagram provided by an embodiment of the present invention;
FIG. 5 is a distributed machine learning debugging diagram provided by an embodiment of the present invention.
Detailed Description
Data, as a new factor of production, has great value, and data security has become a major roadblock to releasing that value. Federated learning has been studied extensively because it can release the value of data while protecting data privacy. Federated learning is essentially a distributed machine learning framework: multiple client nodes cooperatively train a deep learning model under the coordination of a central server, the model is trained locally at each participant, and only model gradients or parameters are shared after local training, without sharing data, thereby realizing data value mining under privacy protection. However, federated learning, with its philosophy that data remains usable but invisible, brings new problems while protecting data: during machine learning training, when the model's test performance drops or training fails to converge, how to debug the machine learning under privacy protection becomes a challenge, since neither the training data nor the training process can be touched.
A data analyst trains machine learning tasks on a distributed machine learning training system such as federated learning, and the machine learning pipeline is distributed to the clients to be trained on real data. When a client performs local training, the non-intrusive metadata collection module parses the machine learning pipeline script, and training metadata (model weights, gradients, inputs, outputs, and so on) are obtained while the machine learning training script runs. The local client debugging module computes debugging intermediate values from the collected metadata, and these intermediate results are sent to the server for aggregation. When the server aggregates and tests the global model, its non-intrusive metadata collection module collects the metadata in this process (model weights or gradients), and the server debugging module performs debugging-problem detection on this round of distributed training based on the collected metadata and each client's intermediate values. If a training problem is found, a warning is issued to the data analyst, and the analyst debugs the machine learning pipeline according to the reported problem.
Fig. 1 shows a debugging system for the privacy-preserving distributed machine learning debugging method according to the present invention. The debugging system comprises a server, a plurality of clients connected to the server, and a debugging feedback module connected to the server.
The server comprises a model aggregation module, a server non-intrusive metadata collection module connected to the model aggregation module, a server metadata store connected to the server non-intrusive metadata collection module, and a server debugging module connected to the server metadata store, with the debugging feedback module connected to the server debugging module; each client comprises a local training module connected to the model aggregation module, a client non-intrusive metadata collection module connected to the local training module, a client metadata store connected to the client non-intrusive metadata collection module, and a client debugging module connected respectively to the client metadata store and the server debugging module.
A privacy-preserving distributed machine learning debugging method comprises the following steps:
Step S1: first, a data analyst builds a distributed machine learning pipeline and distributes it to all clients in the federation to be trained with their local data; the clients' local training process is entirely invisible to the data analyst.
Step S2: during local client training, the client non-intrusive metadata collection module automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging; the client debugging module asynchronously reads the debugging metadata, computes intermediate values according to the distributed machine learning debugging algorithm, and transmits the intermediate values to the server along with the model after a round of local training.
In step S2, the client non-intrusive metadata collection module automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging therein through the following steps:
step S21: a code analysis stage;
the step S21 further includes the steps of:
step S211: obtaining an abstract syntax tree according to a source code; the specific steps are that the codes are presented through the structure of the abstract syntax tree, and abstract syntax tree objects are obtained through an interface of a programming language.
Step S212: traversing the abstract syntax tree, acquiring function call and variable information of a program, and forming a data flow graph; the method specifically comprises the steps of traversing an abstract syntax tree object through a corresponding interface, obtaining function call and variable information in a program after traversing, wherein the variables and the functions have input and output relations, and connecting the functions and the variables in a staggered manner according to the relation to form a data flow graph.
Step S22: and a metadata acquisition stage.
The step S22 further includes the steps of:
step S221: identifying target data; the specific steps are that target data are divided into two types: one is the model parameters and inputs and outputs that are closely related to the specific framework of machine learning, becoming framework-related metadata; another type is customized by the model developer.
Step S222: target data is acquired. The method specifically comprises the steps that a variable has a unique variable name in the running process of the code, if the variable name is in a target metadata list, the variable is added into a metadata collection dictionary, and the dictionary collects multiple rounds of data values of variable data after multiple rounds of training.
And step S3: the server receives the models from local training and the calculated debugging intermediate values, on one hand, the client models are aggregated by using a federal learning algorithm, and the global model effect of the training in the current round is tested, and on the other hand, the server collects debugging metadata in the process by using a server non-invasive metadata collection model.
And step S4: and the server debugging module detects the federal training of the current round according to the distributed machine learning debugging algorithm based on debugging metadata collected by the server and debugging intermediate values transmitted by each client, if the training problem is detected, the debugging feedback module gives an alarm to a data analyst, the data analyst debugs the distributed machine learning pipeline according to the alarm information, and otherwise, the next round of federal training is started.
Step S5: and after the federal training is finished, outputting a debugging report of the training.
Take as an example the task in which a data analyst uses federated learning to train on tuberculosis gene data and predict, from a patient's gene data, whether the patient has tuberculosis:
1. The data analyst writes a deep learning training script using a deep learning framework such as PyTorch, then trains it on a federated learning system together with the gene data of multiple hospitals.
2. During training, the debugging feedback module raises alarms: it detects that the global model has an overfitting error, that several client models also have overfitting errors, and that the data distribution across participants has a non-IID (non-independent and identically distributed) error.
3. The data analyst stops the federated training, handles the detected problems, adjusts the machine learning pipeline, and runs the distributed training again.
Non-intrusive metadata collection. To let data analysts concentrate on machine learning without embedding any debugging-information collection code in the machine learning code, the invention provides a non-intrusive metadata collection method that automatically collects the target debugging metadata (such as model weights, gradients, inputs, and outputs) while the machine learning training script runs. The non-intrusive metadata collection method consists of two parts, code analysis and metadata acquisition: first the machine learning training code is analyzed, with the source code parsed by the syntax-analysis module to build an abstract syntax tree, which is traversed to generate the machine learning training data flow graph; the target metadata is then located in the data flow graph using the grammar of the corresponding machine learning framework; finally, the values of these metadata are collected while the program runs.
The overall flow of non-intrusive metadata collection is shown in Fig. 2.
In the code analysis phase:
1. Obtain the abstract syntax tree from the source code. Modern programming languages represent code through the structure of an abstract syntax tree during compilation and execution, and the abstract syntax tree object can be obtained through an interface of the programming language.
2. Traverse the abstract syntax tree, obtain the program's function calls and variable information, and form a data flow graph. The abstract syntax tree object can be traversed through the corresponding interface; after the traversal, the function calls and variable information in the program are obtained. The variables and functions have input-output relationships, and the functions and variables are interleaved and connected according to these relationships to form the data flow graph. Both steps are sketched below.
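A minimal Python sketch of these two code-analysis steps, assuming the training script is itself Python and using the standard library's ast module; the file name train_script.py and the DataFlowGraph/DataFlowBuilder names are illustrative assumptions, not identifiers from the patent:

```python
import ast

class DataFlowGraph:
    """Bipartite graph whose nodes are function calls and variables."""
    def __init__(self):
        self.edges = []                        # (source, target) pairs

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

class DataFlowBuilder(ast.NodeVisitor):
    """For each assignment whose right-hand side is a function call, record
    edges from the call's argument variables into the call, and from the
    call to the assigned variable."""
    def __init__(self, graph):
        self.graph = graph

    def visit_Assign(self, node):
        if isinstance(node.value, ast.Call):
            call_name = ast.unparse(node.value.func)   # e.g. "self.conv"
            for arg in node.value.args:                # inputs feed the call
                if isinstance(arg, ast.Name):
                    self.graph.add_edge(arg.id, call_name)
            for target in node.targets:               # the call feeds the output
                if isinstance(target, ast.Name):
                    self.graph.add_edge(call_name, target.id)
        self.generic_visit(node)

source = open("train_script.py").read()    # the machine learning pipeline script
tree = ast.parse(source)                   # step 1: abstract syntax tree
graph = DataFlowGraph()
DataFlowBuilder(graph).visit(tree)         # step 2: traversal -> data flow graph
```

Interleaving the call nodes and variable nodes through these edges yields exactly the function-variable connection described above.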
In the metadata acquisition phase:
1. Identify the target data. Target data falls into two categories: one is closely related to the specific machine learning framework, e.g., model parameters, inputs, and outputs, and is termed framework-related metadata; the other is metadata customized by the model developer, e.g., training accuracy. The following process identifies the names of the data to be collected within these two categories: (1) build a search list for the data flow graph according to a depth-first search algorithm; (2) in search-list order, judge whether each data node in the data flow graph is metadata the developer has designated for collection, or is the output data of a corresponding framework function, and if so add it to the target metadata list; (3) return the target metadata list.
2. Acquire the target data. A variable has a unique variable name while the code runs; if the variable name is in the target metadata list, the variable is added to the metadata collection dictionary, and after multiple rounds of training the dictionary will have collected multiple rounds of the variable's data values. Both steps of this phase are sketched below.
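A minimal Python sketch of this metadata acquisition phase, under assumed encodings: the data flow graph is an adjacency dictionary, producers maps each variable to the function call that produced it, and the concrete name sets and the record() helper are illustrative assumptions rather than an API from the patent:

```python
from collections import defaultdict

FRAMEWORK_FUNCS = {"conv", "fc", "loss_fn"}   # framework calls whose outputs matter
USER_SPECIFIED = {"output", "train_acc"}      # metadata designated by the developer

def identify_targets(adjacency, producers, start):
    """Step 1: (1) build the search list by depth-first search over the data
    flow graph; (2) keep developer-designated nodes and the outputs of
    framework functions; (3) return the target metadata list."""
    order, stack, seen = [], [start], set()
    while stack:                               # (1) depth-first search list
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(adjacency.get(node, []))
    return [n for n in order                   # (2) filter, (3) return
            if n in USER_SPECIFIED or producers.get(n) in FRAMEWORK_FUNCS]

metadata_store = defaultdict(list)             # variable name -> value per round

def record(name, value, targets):
    """Step 2: while the code runs, a variable whose unique name is in the
    target metadata list is appended to the metadata collection dictionary."""
    if name in targets:
        metadata_store[name].append(value)
```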
Take, as an example, collecting metadata from a convolutional neural network training script written in PyTorch:
As shown in Fig. 3, a simplest convolutional neural network comprises a convolutional layer, a fully connected layer, and an activation layer; the process in which data entering the neural network passes through these layers in a fixed order to produce an output is called forward propagation. Besides forward propagation, a loss function is needed to drive model training, and the loss function is customized by the model developer. A sketch of such a network appears below.
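The following PyTorch sketch shows such a network; the layer sizes, the 28×28 single-channel input, and the cross-entropy loss are illustrative assumptions consistent with the description, not values from the patent:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """One convolutional layer, one activation layer, one fully connected layer."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3)      # convolutional layer
        self.act = nn.ReLU()                            # activation layer
        self.fc = nn.Linear(8 * 26 * 26, num_classes)   # fully connected layer

    def forward(self, x):
        conv_out = self.act(self.conv(x))   # fixed-order forward propagation
        return self.fc(conv_out.flatten(1))

model = SimpleCNN()
loss_fn = nn.CrossEntropyLoss()             # loss function chosen by the developer
output = model(torch.randn(4, 1, 28, 28))   # forward propagation on a dummy batch
```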
In the code analysis phase:
A syntax tree is built for the source code of the neural network part, and the programming language's built-in interface is called to traverse the syntax tree and form a data flow graph; the data flow graph of this simple deep convolutional neural network is shown in Fig. 4. In Fig. 4, black blocks correspond to function calls in the code, white blocks correspond to variables, and the function calls are linked together through the variables to form the complete neural network workflow.
In the metadata acquisition phase:
The result of the convolution has an important impact on the model's training effect, so the model developer designates the convolutional-layer output and the neural network output as metadata to be inspected. The metadata identification process is started to determine the names of the data to acquire during training: a search list is built for the data flow graph according to a depth-first search algorithm, the data nodes in the data flow graph are traversed in list order, the output data node is found to be metadata designated by the developer, and the output variable's information is added to the target metadata list. Meanwhile, the convolutional layer is a user-designated target function, so its output data (which does not have a real variable name) is also added to the target data list. After the target metadata list is obtained by traversal, training of the convolutional neural network starts, and whenever the convolutional layer runs and produces an output during training, the program adds the corresponding data to the corresponding position of the metadata collection dictionary, as sketched below.
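Continuing the SimpleCNN sketch above, PyTorch forward hooks are one plausible way to realize this run-time collection without modifying the training script; the hook mechanism and all names here are assumptions, not necessarily the patent's exact implementation:

```python
# metadata_store has one slot per entry of the target metadata list
metadata_store = {"conv_out": [], "output": []}

def make_hook(name):
    def hook(module, inputs, output):
        # append this run's value at the corresponding dictionary position
        metadata_store[name].append(output.detach().cpu())
    return hook

model.conv.register_forward_hook(make_hook("conv_out"))  # convolutional output
model.register_forward_hook(make_hook("output"))         # network output
_ = model(torch.randn(4, 1, 28, 28))   # each forward pass adds one entry per key
```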
In distributed machine learning training, the machine learning model is trained with private data at each client, and the model parameters are then transmitted to the server for aggregation. To effectively debug the various problems that arise in the machine learning pipeline during distributed training, the method mainly comprises a client debugging module and a server debugging module. Restricted by privacy requirements, the debugging metadata collected at a client cannot be gathered directly at the server for analysis; therefore the local debugging metadata must be analyzed at the client, a set of intermediate values computed, and these privacy-free intermediate values sent to the server for training-problem detection. The server debugging module detects training problems of the global model by combining the server's debugging metadata with the clients' debugging intermediate values; example problems are shown in Table 1. When a training problem is detected, it is fed back to the data analyst.
The distributed debugging method is shown in Fig. 5. The client debugging module mainly analyzes the local debugging metadata and computes privacy-free debugging intermediate values, such as the local model's training accuracy and training loss value. These debugging intermediate values are sent to the server; the server debugging module contains built-in and pre-customized distributed debugging rules and computes, from the global model's debugging metadata and the intermediate values transmitted by the clients, whether any debugging problem in Table 1 is matched. If a debugging problem is matched, it is fed back to the data analyst through the debugging feedback module, and the data analyst then adjusts the machine learning pipeline according to the reported problem. A sketch of such a rule engine follows.
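A minimal Python sketch of the rule engine; the two rules, their thresholds, and the dictionary message formats are illustrative assumptions:

```python
def overfitting_rule(global_md, client_vals, eps=0.05):
    """Built-in rule: training accuracy exceeds test accuracy by a threshold."""
    if global_md["train_acc"] - global_md["test_acc"] > eps:
        return "global model overfitting"

def non_iid_rule(global_md, client_vals, spread=0.3):
    """Built-in rule: widely diverging client accuracies hint at non-IID data."""
    accs = [c["acc"] for c in client_vals]
    if max(accs) - min(accs) > spread:
        return "participant data may be non-IID"

def run_debug_rules(rules, global_md, client_vals):
    """Evaluate every built-in or customized rule; matched problems are
    forwarded to the debugging feedback module."""
    return [msg for rule in rules
            if (msg := rule(global_md, client_vals)) is not None]

warnings = run_debug_rules(
    [overfitting_rule, non_iid_rule],
    {"train_acc": 0.97, "test_acc": 0.78},     # server-side debugging metadata
    [{"acc": 0.99}, {"acc": 0.52}],            # clients' intermediate values
)
```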
TABLE 1 Examples of debugging problems (rendered as an image in the original publication)
Take the debugging of the overfitting problem during distributed training as an example:
the overfitting of the machine learning model means that the training error is much lower than the testing error, and the formalization is expressed as:
Figure 118888DEST_PATH_IMAGE002
or
Figure 836309DEST_PATH_IMAGE003
. Wherein
Figure 903622DEST_PATH_IMAGE004
The value of the training loss is represented,
Figure 655677DEST_PATH_IMAGE005
representing training accuracy, i.e. the training loss value is much lower than the test loss value, or the training test accuracy exceeds the test accuracy by a certain threshold
Figure 212561DEST_PATH_IMAGE006
. In the distributed training process, the model is trained at each client, and the model is tested at the server, so that the working flow of the distributed debugging method provided by the invention is as follows:
1. The client debugging module of each client $k$ computes the local training accuracy $Acc_{train}^{(k)}$ and the local training loss value $L_{train}^{(k)}$, counts the number of local data samples $n_k$, and sends these intermediate results to the server;
2. The server computes the model's test loss value $L_{test}$ and test accuracy $Acc_{test}$; meanwhile, the training loss value and training accuracy are computed from the intermediate values transmitted by the clients, weighted by sample count: $L_{train} = \sum_k \frac{n_k}{N} L_{train}^{(k)}$ and $Acc_{train} = \sum_k \frac{n_k}{N} Acc_{train}^{(k)}$, where $N = \sum_k n_k$;
3. The server debugging module checks whether the current round of training is overfitting according to the overfitting rule: $L_{train} \ll L_{test}$ or $Acc_{train} - Acc_{test} > \epsilon$. If the condition is satisfied, the global model is overfitted and this is reported to the data analyst; otherwise, the next round of distributed training begins.
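A worked sketch of this three-step check, with the clients' intermediate values aggregated by sample count as above; the threshold eps, the margin standing in for "much lower", and the sample numbers are illustrative assumptions:

```python
def detect_overfitting(client_vals, test_loss, test_acc, eps=0.05):
    """Steps 1-3: aggregate the per-client intermediate values by sample
    count, then apply the overfitting rule."""
    n_total = sum(c["n"] for c in client_vals)
    train_loss = sum(c["loss"] * c["n"] for c in client_vals) / n_total
    train_acc = sum(c["acc"] * c["n"] for c in client_vals) / n_total
    # L_train much lower than L_test, or Acc_train exceeds Acc_test by eps
    if test_loss - train_loss > eps or train_acc - test_acc > eps:
        return "global model overfitting: report to the data analyst"
    return None                     # no problem: enter the next training round

clients = [{"loss": 0.08, "acc": 0.97, "n": 120},   # step-1 values from client 1
           {"loss": 0.10, "acc": 0.95, "n": 80}]    # step-1 values from client 2
print(detect_overfitting(clients, test_loss=0.45, test_acc=0.78))
```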
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (10)

1. A privacy-preserving distributed machine learning debugging method, characterized by comprising the following steps:
step S1: first, a data analyst builds a distributed machine learning pipeline and distributes it to all clients in the federation to be trained with their local data;
step S2: during local client training, a client non-intrusive metadata collection module automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging; a client debugging module asynchronously reads the debugging metadata, computes intermediate values according to a distributed machine learning debugging algorithm, and, after a round of local training finishes, transmits the intermediate values to the server along with the model;
step S3: the server receives the locally trained models and the computed debugging intermediate values; on the one hand it aggregates the client models using a federated learning algorithm and tests the effectiveness of the current round's global model, and on the other hand it collects debugging metadata during this process using a server non-intrusive metadata collection module;
step S4: based on the debugging metadata collected by the server and the debugging intermediate values transmitted by each client, a server debugging module checks the current round of federated training according to the distributed machine learning debugging algorithm; if a training problem is detected, a debugging feedback module warns the data analyst, who debugs the distributed machine learning pipeline according to the warning information; if no training problem is detected, the next round of federated training begins;
step S5: after the federated training finishes, outputting a debugging report of the training.
2. The debugging method of claim 1, wherein in step S2 the client non-intrusive metadata collection module automatically parses the distributed machine learning pipeline script and collects the metadata related to debugging therein through the following steps:
step S21: a code analysis stage;
step S22: a metadata acquisition stage.
3. The debugging method of claim 2, wherein step S21 further comprises the steps of:
step S211: obtaining an abstract syntax tree from the source code;
step S212: traversing the abstract syntax tree, obtaining the program's function calls and variable information, and forming a data flow graph.
4. The debugging method of claim 2, wherein step S22 further comprises the steps of:
step S221: identifying target data;
step S222: acquiring target data.
5. The debugging method of claim 3, wherein step S211 specifically comprises representing the code through the structure of an abstract syntax tree and obtaining the abstract syntax tree object through an interface of the programming language.
6. The debugging method of claim 5, wherein step S212 specifically comprises traversing the abstract syntax tree object through the corresponding interface; after the traversal, the function calls and variable information in the program are obtained, the variables and functions have input-output relationships, and the functions and variables are interleaved and connected according to these relationships to form the data flow graph.
7. The debugging method of claim 4, wherein step S221 specifically comprises dividing the target data into two types: one type is the model parameters, inputs, and outputs closely related to the specific machine learning framework, termed framework-related metadata; the other type is customized by the model developer.
8. The debugging method of claim 7, wherein step S222 specifically comprises: a variable has a unique variable name while the code runs; if the variable name is in the target metadata list, the variable is added to the metadata collection dictionary, and after multiple rounds of training the dictionary will have collected multiple rounds of the variable's data values.
9. A debugging system for the debugging method of any one of claims 1 to 8, characterized in that the debugging system comprises a server, a plurality of clients connected to the server, and a debugging feedback module connected to the server.
10. The debugging system of claim 9, wherein the server comprises a model aggregation module, a server non-intrusive metadata collection module connected to the model aggregation module, a server metadata store connected to the server non-intrusive metadata collection module, and a server debugging module connected to the server metadata store, the debugging feedback module being connected to the server debugging module; and each client comprises a local training module connected to the model aggregation module, a client non-intrusive metadata collection module connected to the local training module, a client metadata store connected to the client non-intrusive metadata collection module, and a client debugging module connected respectively to the client metadata store and the server debugging module.
CN202211100671.6A 2022-09-09 2022-09-09 Privacy-preserving distributed machine learning debugging method and debugging system Pending CN115169594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211100671.6A CN115169594A (en) Privacy-preserving distributed machine learning debugging method and debugging system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211100671.6A CN115169594A (en) Privacy-preserving distributed machine learning debugging method and debugging system

Publications (1)

Publication Number Publication Date
CN115169594A true CN115169594A (en) 2022-10-11

Family

ID=83482416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211100671.6A Pending CN115169594A (en) Privacy-preserving distributed machine learning debugging method and debugging system

Country Status (1)

Country Link
CN (1) CN115169594A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204787A (en) * 2021-05-06 2021-08-03 广州大学 Block chain-based federated learning privacy protection method, system, device and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204787A (en) * 2021-05-06 2021-08-03 广州大学 Block chain-based federated learning privacy protection method, system, device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANRAN LI et al.: "Efficient Federated-Learning Model Debugging", 2021 IEEE 37th International Conference on Data Engineering *
LUO Pengshuai: "Research on Key Technologies of Neural Network Model Debugging under Privacy Protection", China Masters' Theses Full-text Database, Information Science and Technology Series *

Similar Documents

Publication Publication Date Title
Di Nucci et al. A developer centered bug prediction model
CN110008288A (en) The construction method in the knowledge mapping library for Analysis of Network Malfunction and its application
CN105095048B (en) A kind of monitoring system alarm association processing method based on business rule
CN111985561A (en) Fault diagnosis method and system for intelligent electric meter and electronic device
CN109800127A (en) A kind of system fault diagnosis intelligence O&M method and system based on machine learning
CN107111540A (en) Dynamic telemetry message is dissected and adjusted
CN111162949A (en) Interface monitoring method based on Java byte code embedding technology
CN111782460A (en) Large-scale log data anomaly detection method and device and storage medium
Gutiérrez‐Madroñal et al. Evolutionary mutation testing for IoT with recorded and generated events
CN116681250A (en) Building engineering progress supervisory systems based on artificial intelligence
CN110020687A (en) Abnormal behaviour analysis method and device based on operator's Situation Awareness portrait
CN110011990A (en) Intranet security threatens intelligent analysis method
CN115640159A (en) Micro-service fault diagnosis method and system
CN107111609A (en) Lexical analyzer for neural language performance identifying system
CN115801369A (en) Data processing method and server based on cloud computing
CN116996325A (en) Network security detection method and system based on cloud computing
Pan et al. Refactoring packages of object–oriented software using genetic algorithm based community detection technique
CN115145751A (en) Method, device, equipment and storage medium for positioning fault root cause of micro-service system
CN115169594A (en) Privacy-preserving distributed machine learning debugging method and debugging system
CN115643153A (en) Alarm correlation analysis method based on graph neural network
CN111221704B (en) Method and system for determining running state of office management application system
JI et al. Log Anomaly Detection Through GPT-2 for Large Scale Systems
CN113242213A (en) Power communication backbone network node vulnerability diagnosis method
CN110780660A (en) Tobacco production industry control system fault diagnosis method based on production state
Khoshgoftaar et al. Data mining of software development databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221011