CN113254943A - Model contribution degree evaluation system based on longitudinal federal learning - Google Patents

Model contribution degree evaluation system based on longitudinal federal learning

Info

Publication number
CN113254943A
Authority
CN
China
Prior art keywords
sample
participant
disturbance
matching degree
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110571771.6A
Other languages
Chinese (zh)
Inventor
戴夫
王湾湾
何浩
姚明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dongjian Intelligent Technology Co ltd
Original Assignee
Shenzhen Dongjian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dongjian Intelligent Technology Co ltd
Priority to CN202110571771.6A
Publication of CN113254943A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577: Assessing vulnerabilities and evaluating computer system security
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

The model contribution degree evaluation system based on longitudinal federal learning provided by the embodiment of the invention belongs to the field of information technology and is provided to solve the problem of how to evaluate the contribution degree of each participant's data and distribute benefits in a federal learning scenario. The system extracts target samples from the training data set; randomly generates disturbance samples based on the target samples; calculates the matching degree between the target samples and the disturbance samples; preprocesses the disturbance samples through a preset function; inputs the disturbance samples into a trained federal learning model to obtain the label information corresponding to the disturbance samples; and calculates, through a weighted linear regression function, the contribution degree corresponding to each feature of the plurality of participants according to the matching degree scores, the label information and the processed disturbance samples, thereby realizing reasonable distribution of the benefits of the federal learning model among different participants.

Description

Model contribution degree evaluation system based on longitudinal federal learning
Technical Field
The invention relates to the technical field of information, in particular to a model contribution degree evaluation system based on longitudinal federal learning.
Background
At present, with the promulgation of data security laws and the growing awareness of data privacy and security, more and more enterprises adopt federal learning technology to complete secure joint modeling without data leaving their private domains. Federal learning allows each participant to contribute data to train a model while ensuring the data security of each participant, and the trained model is shared among the participants, thereby realizing joint modeling.
In model feature evaluation and profit allocation, the prior art often calculates the contribution degree of each participant according to the accumulated number of times each feature is used for classification, and then allocates model profits according to the contribution degree data. However, in the actual joint modeling process, the data characteristics and distributions of the participants' data differ (for example, the sample size and the value range distribution under a certain feature), so calculating each participant's contribution degree from accumulated feature use counts lacks fairness and rationality.
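For reference, the split-count style importance that the prior art relies on can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: it assumes the xgboost Python package and toy data, neither of which is named in this application, and counts how often each feature is used for splits in a trained tree ensemble.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)                        # toy data standing in for one party's features
y = (X[:, 0] + X[:, 1] > 1).astype(int)           # toy labels

booster = xgb.train({"objective": "binary:logistic"},
                    xgb.DMatrix(X, label=y), num_boost_round=20)

# importance_type="weight" counts how many times each feature appears in a split,
# i.e. the "accumulated classification use times" criticized above
print(booster.get_score(importance_type="weight"))
```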
Disclosure of Invention
The embodiment of the invention aims to provide a model contribution degree evaluation system based on longitudinal federal learning so as to realize reasonable calculation of contribution degrees of all participants. The specific technical scheme is as follows:
in a first aspect of embodiments of the present application, a model contribution degree evaluation system based on longitudinal federated learning is provided, where the system includes a first participant and one or more second participants;
the first participant and the second participant are configured to: extract a target sample from their own training data sets, wherein the training data set is a data set used for training a federal learning model; perform disturbance processing on the extracted target sample to obtain a disturbance sample; calculate the matching degree between their own target sample and disturbance sample to obtain a matching degree score; process their own disturbance sample by a data preprocessing method to obtain a processed disturbance sample; and input their own disturbance sample into the federal learning model to obtain label information corresponding to their own disturbance sample;
the second participant is further configured to: send its own matching degree score, label information and processed disturbance sample to the first participant;
the first participant is further configured to: receive the matching degree score, the label information and the processed disturbance sample sent by the second participant; calculate, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant; and calculate the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, the first participant is specifically configured to: calculate an average value of the contribution degrees corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, so as to obtain the global contribution degree corresponding to each feature.
Optionally, the second participant is specifically configured to: set the weights of the label information and the processed disturbance sample according to its own matching degree score.
Optionally, the first participant and the second participant are specifically configured to: collect statistics on their own multi-dimensional features; and preprocess their own disturbance samples according to their own multi-dimensional features to obtain processed disturbance samples.
Optionally, the first participant and the second participant are specifically configured to: collect statistics on their own multi-dimensional features; and perform standardization, outlier handling and one-hot encoding on their own disturbance samples according to their own multi-dimensional features to obtain processed disturbance samples.
Optionally, the first participant is further configured to: send the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant to the second participant.
Optionally, the first participant is further configured to: send its own public key to the second participant;
the second participant is specifically configured to: encrypt its matching degree score, label information and processed disturbance sample with the public key of the first participant; and send the encrypted matching degree score, label information and processed disturbance sample to the first participant;
the first participant is further configured to: decrypt the encrypted matching degree score, label information and processed disturbance sample sent by the second participant with its own private key.
In a second aspect of the embodiments of the present application, there is provided a method for evaluating a model contribution degree based on longitudinal federated learning, which is applied to a first participant in a system for evaluating a model contribution degree based on longitudinal federated learning, where the system includes the first participant and one or more second participants, the method including:
extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
performing disturbance processing on the extracted target sample to obtain a disturbance sample; calculating the matching degree between its own target sample and the disturbance sample to obtain a matching degree score;
processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
inputting its own disturbance sample into the model based on federal learning to obtain label information corresponding to the disturbance sample;
receiving the matching degree score, the label information and the processed disturbance sample sent by the second participant;
and calculating, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculating the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
In a third aspect of the embodiments of the present application, there is provided a model contribution degree evaluation method based on longitudinal federated learning, which is applied to a second participant in a model contribution degree evaluation system based on longitudinal federated learning, where the system includes a first participant and one or more second participants, the method includes:
extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
performing disturbance processing on the extracted target sample to obtain a disturbance sample; calculating the matching degree between its own target sample and the disturbance sample to obtain a matching degree score;
processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
inputting its own disturbance sample into the federal learning model to obtain label information corresponding to the disturbance sample;
sending the matching degree score, the label information and the processed disturbance sample of the second participant to the first participant, so that the first participant receives the matching degree score, the label information and the processed disturbance sample sent by the second participant, calculates, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculates the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
In a fourth aspect of the embodiments of the present application, there is provided a longitudinal federal learning-based model contribution degree evaluation apparatus, applied to a first participant in a longitudinal federal learning-based model contribution degree evaluation system, where the system includes the first participant and one or more second participants, the apparatus including:
the first sample acquisition module, used for extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
the first disturbance processing module, used for carrying out disturbance processing on the extracted target sample to obtain a disturbance sample;
the first matching degree calculation module, used for calculating the matching degree between its own target sample and disturbance sample to obtain a matching degree score;
the first disturbance sample module, used for processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
the first label information acquisition module, used for inputting its own disturbance sample into the model based on federal learning to obtain label information corresponding to its own disturbance sample;
the information receiving module, used for receiving the matching degree score, the label information and the processed disturbance sample sent by the second participant;
and the contribution degree calculating module, used for calculating, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and for calculating the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
In a fifth aspect of the embodiments of the present application, there is provided a longitudinal federal learning-based model contribution degree evaluation apparatus, applied to a second participant in a longitudinal federal learning-based model contribution degree evaluation system, where the system includes a first participant and one or more second participants, the apparatus including:
the second sample acquisition module, used for extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
the second disturbance processing module, used for carrying out disturbance processing on the extracted target sample to obtain a disturbance sample;
the second matching degree calculation module, used for calculating the matching degree between its own target sample and disturbance sample to obtain a matching degree score;
the second preprocessing module, used for processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
the second label information acquisition module, used for inputting its own disturbance sample into the federal learning model to obtain the label information corresponding to its own disturbance sample;
and the information sending module, used for sending the matching degree score, the label information and the processed disturbance sample of the second participant to the first participant, so that the first participant receives the matching degree score, the label information and the processed disturbance sample sent by the second participant, calculates, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculates the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
In another aspect of this embodiment, an electronic device is further provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the model contribution degree evaluation methods based on longitudinal federal learning applied to the first participant when executing the program stored in the memory.
In another aspect of this embodiment, an electronic device is further provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the model contribution degree evaluation methods based on longitudinal federal learning applied to the second participant when executing the program stored in the memory.
In another aspect of the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and when executed by a processor, the computer program implements any of the above methods for evaluating model contribution based on longitudinal federal learning applied to a first participant.
In another aspect of the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and when executed by a processor, the computer program implements any of the above methods for evaluating model contribution based on longitudinal federal learning and applied to a second party.
In another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above-described methods for longitudinal federal learning based model contribution assessment as applied to a first party.
In another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above-described methods for model contribution assessment based on longitudinal federal learning for application to a second party.
The embodiment of the invention has the following beneficial effects:
the model contribution degree evaluation system based on longitudinal federal learning provided by the embodiment of the invention can extract target samples from training data sets of the system; performing disturbance processing on the extracted target sample to obtain a disturbance sample; calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score; processing the self disturbance sample by a data preprocessing method to obtain a processed disturbance sample; inputting the disturbance sample into a model based on federal learning to obtain label information corresponding to the disturbance sample; and receiving the matching degree score and the processed disturbance sample sent by the second participant, and summarizing the local feature contribution value of the sample to obtain the global contribution degree of the features. And reasonable calculation of the contribution degree is realized, so that benefits obtained by the model based on the federal learning are distributed to all the participants according to the calculated contribution degree, and reasonable distribution of benefits of the model based on the federal learning among different participants is realized.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings by referring to these drawings.
Fig. 1 is a schematic structural diagram of a model contribution degree evaluation system based on longitudinal federal learning according to an embodiment of the present application;
fig. 2 is a diagram illustrating an example of a model contribution degree evaluation method based on longitudinal federated learning according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a model contribution degree evaluation method based on longitudinal federal learning according to an embodiment of the present application;
fig. 4 is another schematic flow chart of a model contribution degree evaluation method based on longitudinal federal learning according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a model contribution degree evaluation device based on longitudinal federal learning according to an embodiment of the present application;
fig. 6 is another schematic structural diagram of a model contribution degree evaluation device based on longitudinal federal learning according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to solve the problem that in the prior art, the contribution calculation of each participant lacks fairness and rationality during joint modeling, a first aspect of the embodiments of the present application provides a model contribution evaluation system based on longitudinal federal learning.
In particular, referring to fig. 1, the system includes a first participant 101 and one or more second participants 102;
the first participant 101 and the second participant 102 are configured to: extract a target sample from their own training data sets, wherein the training data set is a data set used for training a federal learning model; perform disturbance processing on the extracted target sample to obtain a disturbance sample; calculate the matching degree between their own target sample and disturbance sample to obtain a matching degree score; process their own disturbance sample by a data preprocessing method to obtain a processed disturbance sample; and input their own disturbance sample into the federal learning model to obtain label information corresponding to their own disturbance sample;
the second participant 102 is further configured to: send its own matching degree score, label information and processed disturbance sample to the first participant 101;
the first participant 101 is further configured to: receive the matching degree score, the label information and the processed disturbance sample sent by the second participant 102; calculate, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant; and calculate the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
The federal learning-based model in the embodiment of the present application may be a classification model or a regression model, and may specifically be an XGBoost (Extreme Gradient Boosting) model, an SVM (Support Vector Machine) model, or the like. The first participant 101 and the second participant 102 may be different terminals in the process of training the model based on federal learning; specifically, the terminals may be computers, smart phones, servers, and the like.
After the first participant 101 and the second participant 102 select the target sample from their own training data sets, the collected target sample may be shared, for example, after the target sample is collected, the ID of the target sample is synchronized between different participants. In the actual use process, the first participant 101 and the second participant 102 select a target sample from their own training data sets, and the target sample may be collected by a random sampling method.
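As an illustration only (the application does not prescribe a particular sampling routine), the random selection and ID synchronization described above might look like the following minimal sketch, where the ID range, the sample count and the function name are made-up values.

```python
import random

def draw_target_ids(all_ids, k, seed=2021):
    # a shared seed (or an explicit exchange of the drawn IDs) keeps both parties aligned
    rng = random.Random(seed)
    return sorted(rng.sample(list(all_ids), k))

party_a_ids = range(10_000)                       # hypothetical local sample IDs
target_ids = draw_target_ids(party_a_ids, k=100)
# the first participant then synchronizes target_ids with the second participant,
# so both parties extract the same rows from their own training data sets
```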
The first participant 101 and the second participant 102 calculate the matching degree between their own target samples and disturbance samples to obtain a matching degree score, which may be calculated by a Gaussian kernel function.
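A minimal sketch of such a Gaussian-kernel matching degree is given below; the kernel width and the use of Euclidean distance are assumptions, since the application does not fix either choice.

```python
import numpy as np

def matching_score(target, perturbed, kernel_width=0.75):
    # Gaussian (RBF) kernel over the distance between the target and the disturbance sample
    distance = np.linalg.norm(target - perturbed)
    return float(np.exp(-(distance ** 2) / (kernel_width ** 2)))

target = np.array([0.2, 1.5, 3.0])
perturbed = target + np.random.normal(scale=0.1, size=3)   # a disturbance sample
print(matching_score(target, perturbed))                   # close to 1.0 for small perturbations
```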
Preprocessing the disturbance sample of the second participant 102 through the preset function to obtain the processed disturbance sample prevents data privacy leakage when the second participant 102 sends its own disturbance sample to the first participant 101. During data transmission, the transmitted data is perturbed, for example normalized and standardized, so that information about the real samples is not leaked.
The first participant 101 calculates, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant 101 and the matching degree score, the label information and the processed disturbance sample of the second participant 102, the contribution degree corresponding to each of the one or more features of the target sample of the first participant 101 and the contribution degree corresponding to each of the one or more features of the target sample of the second participant 102. The weights of the label information and the processed disturbance samples may first be set according to the matching degree scores, and the contribution degree corresponding to each feature is then calculated through the preset linear regression function.
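A minimal sketch of this weighted fit is shown below, assuming a plain weighted least-squares solution in NumPy; the application only requires "a preset linear regression function", so the closed-form solver and the toy data are illustrative choices.

```python
import numpy as np

def feature_contributions(Z, y, weights):
    """Z: processed disturbance samples (n x d), y: label information predicted by the
    federal learning model, weights: matching degree scores used as sample weights."""
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])         # add an intercept column
    W = np.diag(weights)
    beta = np.linalg.pinv(Z1.T @ W @ Z1) @ Z1.T @ W @ y   # weighted least squares
    return beta[1:]                                       # one contribution degree per feature

Z = np.random.rand(500, 6)                                # toy processed disturbance samples
y = Z @ np.array([0.5, 0.1, 0.0, 0.3, 0.0, 0.1]) + 0.05 * np.random.randn(500)
w = np.exp(-np.sum((Z - Z.mean(axis=0)) ** 2, axis=1))    # stand-in matching degree scores
print(feature_contributions(Z, y, w))
```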
In this embodiment of the application, the calculated contribution degree corresponding to each feature may be used to allocate the benefits obtained by the pre-trained federal learning-based model; for example, the earnings obtained by the trained model may be allocated according to the contribution degree of the data provided by each participant in the model training process.
Therefore, the model contribution degree evaluation system based on the longitudinal federal learning can realize reasonable calculation of the contribution degree, so that benefits obtained by the model based on the federal learning are distributed to all the participants according to the calculated contribution degree, and reasonable distribution of the benefits of the model based on the federal learning among different participants is realized.
Optionally, the first participant 101 is specifically configured to: calculate an average value of the contribution degrees corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, so as to obtain the global contribution degree corresponding to each feature.
When calculating the average value of the contribution degrees according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant 101 and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant 102, the contribution degrees corresponding to the features may first be aggregated in various ways, such as homogeneous weighted processing, heterogeneous processing or disturbance curve area processing, and the average value of the aggregated contribution degrees is then calculated to obtain the global contribution degree corresponding to each feature.
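As a simple illustration of the averaging step only (the aggregation modes named above are not spelled out in code here), per-sample contribution degrees can be averaged per feature; the matrix shape and values are made up.

```python
import numpy as np

# rows: contribution degrees computed for each target sample, columns: features (toy values)
per_sample_contributions = np.random.randn(100, 6)

# global contribution degree per feature as the plain average over the target samples
global_contributions = per_sample_contributions.mean(axis=0)
print(global_contributions)
```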
Optionally, the first participant 101 and the second participant 102 are specifically configured to: set the weights of the label information and the processed disturbance samples according to their own matching degree scores; and calculate, through a preset linear regression function and according to the weighted label information and processed disturbance samples, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant 101 and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant 102.
Optionally, the first participant 101 and the second participant 102 are specifically configured to: collect statistics on their own multi-dimensional features; and preprocess their own disturbance samples according to their own multi-dimensional features to obtain processed disturbance samples.
Optionally, the first participant 101 and the second participant 102 are specifically configured to: collect statistics on their own multi-dimensional features to obtain their own numerical features and categorical features; and preprocess their own disturbance samples according to their own numerical features and categorical features to obtain processed disturbance samples.
Optionally, the first participant 101 and the second participant 102 are specifically configured to: collect statistics on their own multi-dimensional features; and perform standardization, outlier handling and one-hot encoding on their own disturbance samples according to their own multi-dimensional features to obtain processed disturbance samples.
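A minimal sketch of this preprocessing chain is shown below using pandas; the column names, the 3-sigma clipping rule for outliers and the toy values are assumptions made only for illustration.

```python
import pandas as pd

def preprocess(df, numeric_cols, categorical_cols):
    out = df.copy()
    for col in numeric_cols:
        mean, std = out[col].mean(), out[col].std()
        out[col] = out[col].clip(mean - 3 * std, mean + 3 * std)   # outlier handling (3-sigma clip)
        out[col] = (out[col] - mean) / (std + 1e-12)               # standardization
    return pd.get_dummies(out, columns=categorical_cols)           # one-hot encoding

df = pd.DataFrame({"income": [3000.0, 5200.0, 99999.0, 4100.0],
                   "city": ["sz", "bj", "sz", "gz"]})
print(preprocess(df, numeric_cols=["income"], categorical_cols=["city"]))
```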
Optionally, the first participant 101 is further configured to: the contribution degree corresponding to each of the one or more features of the target sample of the second participant 102 is sent to the second participant 102.
Optionally, the first participant 101 is further configured to: send its own public key to the second participant 102;
the second participant 102 is specifically configured to: encrypt its matching degree score, label information and processed disturbance sample with the public key of the first participant; and send the encrypted matching degree score, label information and processed disturbance sample to the first participant;
the first participant 101 is further configured to: decrypt the encrypted matching degree score, label information and processed disturbance sample sent by the second participant with its own private key.
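The application does not name a specific cipher, so the sketch below uses RSA-OAEP from the Python `cryptography` package purely as an illustration of the exchange: the first participant shares its public key, the second participant encrypts its matching degree score, label information and processed disturbance sample, and the first participant decrypts with its own private key.

```python
import json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# First participant: generate a key pair and share the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Second participant: encrypt its matching degree score, label information and
# processed disturbance sample (toy values) with the first participant's public key.
payload = json.dumps({"score": 0.92, "label": 1, "sample": [0.1, 0.7]}).encode()
ciphertext = public_key.encrypt(payload, oaep)

# First participant: decrypt with its own private key.
print(json.loads(private_key.decrypt(ciphertext, oaep)))
```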
Therefore, the model contribution degree evaluation system based on the longitudinal federal learning can realize reasonable calculation of the contribution degree, so that benefits obtained by the model based on the federal learning are distributed to all the participants according to the calculated contribution degree, and reasonable distribution of the benefits of the model based on the federal learning among different participants is realized.
Referring to fig. 2, fig. 2 is a diagram illustrating an example of a model contribution degree evaluation method based on longitudinal federated learning according to an embodiment of the present application; a condensed code sketch of the A/B interaction shown there follows the two lists below.
the A party is used for:
1. calculating the mean value and variance of the local features;
2. sampling around the real sample to generate a disturbance sample, and calculating Distance (Distance) between the disturbance sample and the real sample;
3. loading a trained model (XGboost, NN (Neural Network, Neural Network model), LR (Logistic regression model), and the like);
4. predicting the disturbance sample by using the trained model to obtain a predicted value y, wherein the A party and the B party have interaction in the prediction process;
5. fitting the disturbance sample by using weighted linear regression according to the information sent by the party A and the party B;
6. obtaining respective feature importance;
7. and sending the B-party feature importance result to the B-party.
The B party is used for:
1. calculating the mean value and variance of its local features;
2. sampling around the real sample to generate disturbance samples, and calculating the distance between each disturbance sample and the real sample;
3. loading the trained model (XGBoost, NN, LR, or the like);
4. predicting the disturbance samples with the trained model to obtain predicted values y, where the A party and the B party interact during prediction;
5. encrypting the generated disturbance samples and the inverse-distance information and sending them to the A party;
6. receiving its own feature importance results sent by the A party.
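The following single-process sketch condenses the A/B flow above (no networking or encryption); the model stub, the column split between the two parties and the kernel width are all illustrative assumptions rather than part of the application.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, std, n=500):
    # step 2: sample around the real sample using the party's local feature statistics
    return x + rng.normal(0.0, std, size=(n, x.size))

# Each party holds its own feature columns of the same (ID-aligned) real sample.
x_a, std_a = np.array([0.3, 1.2]), np.array([0.2, 0.4])           # step 1: local stats (toy)
x_b, std_b = np.array([5.0, 0.7, 2.1]), np.array([1.0, 0.1, 0.6])

z = np.hstack([perturb(x_a, std_a), perturb(x_b, std_b)])          # joint disturbance samples

# step 2 (cont.): distances to the real sample and Gaussian-kernel weights
dist = np.linalg.norm(z - np.hstack([x_a, x_b]), axis=1)
weights = np.exp(-dist ** 2 / 0.75 ** 2)

# steps 3-4: stand-in for the jointly trained federal learning model's prediction
def trained_model_predict(samples):
    return 1.0 / (1.0 + np.exp(-(samples @ np.array([1.0, -0.5, 0.2, 0.8, 0.0]) - 3.0)))

y = trained_model_predict(z)

# step 5: weighted linear regression fit (performed at the A party on the pooled data)
Z1 = np.hstack([np.ones((z.shape[0], 1)), z])
W = np.diag(weights)
beta = np.linalg.pinv(Z1.T @ W @ Z1) @ Z1.T @ W @ y

# steps 6-7: split the coefficients back into each party's feature importance
importance_a, importance_b = beta[1:3], beta[3:]
print(importance_a, importance_b)
```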
In a second aspect of the embodiments of the present application, there is provided a longitudinal federal learning-based model contribution degree evaluation method, applied to a first participant in a longitudinal federal learning-based model contribution degree evaluation system, where the system includes the first participant and one or more second participants, and referring to fig. 3, the method includes:
step S31, extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
step S32, performing disturbance processing on the extracted target sample to obtain a disturbance sample;
step S33, calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
step S34, processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
step S35, inputting its own disturbance sample into the model based on federal learning to obtain label information corresponding to the disturbance sample;
step S36, receiving the matching degree score, the label information and the processed disturbance sample sent by the second participant;
step S37, calculating, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculating the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, calculating the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant includes:
calculating an average value of the contribution degrees corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, so as to obtain the global contribution degree corresponding to each feature.
Optionally, processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample includes:
counting its own multi-dimensional features;
and performing standardization, outlier handling and one-hot encoding on its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, counting its own multi-dimensional features includes:
counting its own multi-dimensional features;
and preprocessing its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, the method further includes: sending, by the first participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant to the second participant.
Optionally, the method further includes:
sending its own public key to the second participant, so that the second participant encrypts its matching degree score, label information and processed disturbance sample with the public key of the first participant and sends the encrypted matching degree score, label information and processed disturbance sample to the first participant;
and decrypting the encrypted matching degree score, label information and processed disturbance sample sent by the second participant with its own private key.
Therefore, the model contribution degree evaluation method based on the longitudinal federal learning can realize reasonable calculation of the contribution degree, so that benefits obtained based on the federal learning model are distributed to all participants according to the calculated contribution degree, and reasonable distribution of benefits of the model based on the federal learning among different participants is realized.
In a third aspect of the embodiments of the present application, there is provided a model contribution degree evaluation method based on longitudinal federated learning, which is applied to a second participant in a model contribution degree evaluation system based on longitudinal federated learning, where the system includes a first participant and one or more second participants, and referring to fig. 4, the method includes:
step S41, extracting a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model;
step S42, performing disturbance processing on the extracted target sample to obtain a disturbance sample;
step S43, calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
step S44, processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
step S45, inputting its own disturbance sample into the federal learning model to obtain label information corresponding to the disturbance sample;
step S46, sending the matching degree score, the label information and the processed disturbance sample of the second participant to the first participant, so that the first participant receives the matching degree score, the label information and the processed disturbance sample sent by the second participant, calculates, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculates the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, processing its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample includes:
counting its own multi-dimensional features;
and preprocessing its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, counting its own multi-dimensional features includes:
counting its own multi-dimensional features to obtain its own numerical features and categorical features;
and preprocessing its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample includes:
preprocessing its own disturbance sample according to its own numerical features and categorical features to obtain the processed disturbance sample.
Optionally, the step of preprocessing its own disturbance sample through a preset function to obtain a processed disturbance sample includes:
counting its own multi-dimensional features;
and performing standardization, outlier handling and one-hot encoding on its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, the method further includes:
receiving a public key sent by a first participant;
encrypting the matching degree score, the label information and the processed disturbance sample by the public key;
and sending the encrypted matching degree score, the label information and the processed disturbance sample to the first participant so that the first participant decrypts the encrypted matching degree score, the label information and the processed disturbance sample sent by the second participant according to a preset private key.
Therefore, the model contribution degree evaluation method based on the longitudinal federal learning can realize reasonable calculation of the contribution degree, so that benefits obtained by the model based on the federal learning are distributed to all participants according to the calculated contribution degree, and reasonable distribution of benefits of the model based on the federal learning among different participants is realized.
In a fourth aspect of the embodiments of the present application, there is provided a longitudinal federal learning-based model contribution degree evaluation apparatus, applied to a first participant in a longitudinal federal learning-based model contribution degree evaluation system, where the system includes the first participant and one or more second participants, and referring to fig. 5, the apparatus includes:
the first sample obtaining module 501, configured to extract a target sample from its own training data set, where the training data set is a data set used for training a federal learning model;
the first perturbation processing module 502, configured to perform perturbation processing on the extracted target sample to obtain a perturbation sample;
the first matching degree calculating module 503, configured to calculate the matching degree between its own target sample and disturbance sample, so as to obtain a matching degree score;
the first disturbance sample module 504, configured to process its own disturbance sample by using a data preprocessing method to obtain a processed disturbance sample;
the first tag information obtaining module 505, configured to input its own disturbance sample into the model based on federal learning and obtain tag information corresponding to its own disturbance sample;
the information receiving module 506, configured to receive the matching degree score, the label information and the processed disturbance sample sent by the second participant;
and the contribution degree calculating module 507, configured to calculate, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and to calculate the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, the apparatus further comprises:
and the overall contribution degree calculating module is used for calculating and obtaining an average value of the contribution degrees corresponding to the features according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, so as to obtain the overall contribution degree corresponding to each feature.
Optionally, the contribution degree calculating module 507 is specifically configured to set the weights of the label information and the processed disturbance samples according to their matching degree scores, and to calculate, through a preset linear regression function and according to the weighted label information and processed disturbance samples, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, the first disturbance sample module 504 includes:
the first feature statistics submodule, used for counting its own multi-dimensional features;
and the first disturbance sample processing submodule, used for performing standardization, outlier handling and one-hot encoding on its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, the first feature statistics submodule is specifically configured to count its own multi-dimensional features to obtain its own numerical features and categorical features;
the first disturbance sample processing submodule is specifically used for preprocessing its own disturbance sample according to its own numerical features and categorical features to obtain the processed disturbance sample.
Optionally, the first preprocessing module further includes:
the first feature statistics submodule, used for counting its own multi-dimensional features;
and the first feature processing submodule, used for performing standardization, outlier handling and one-hot encoding on its own disturbance sample according to its own multi-dimensional features to obtain the processed disturbance sample.
Optionally, the apparatus further comprises:
and the contribution degree sending module is used for sending the contribution degree corresponding to each feature in the one or more features of the target sample of the second party to the second party.
Optionally, the apparatus further comprises:
the public key sending module, used for sending its own public key to the second participant, so that the second participant encrypts its matching degree score, label information and processed disturbance sample with the public key of the first participant and sends the encrypted matching degree score, label information and processed disturbance sample to the first participant;
and the information decryption module, used for decrypting the encrypted matching degree score, label information and processed disturbance sample sent by the second participant with its own private key.
Therefore, the model contribution degree evaluation device based on the longitudinal federal learning can realize reasonable calculation of the contribution degree, so that benefits obtained by the model based on the federal learning are distributed to all the participants according to the calculated contribution degree, and reasonable distribution of benefits of the model based on the federal learning among different participants is realized.
In a fifth aspect of the embodiments of the present application, there is provided a longitudinal federal learning-based model contribution degree evaluation apparatus, applied to a second participant in a longitudinal federal learning-based model contribution degree evaluation system, where the system includes a first participant and one or more second participants, and referring to fig. 6, the apparatus includes:
a second sample obtaining module 601, configured to extract a target sample from its own training data set, where the training data set is a data set used for training a federal learning model;
a second perturbation processing module 602, configured to perform perturbation processing on the extracted target sample to obtain a perturbation sample;
a second matching degree calculating module 603, configured to calculate the matching degree between its own target sample and disturbance sample, so as to obtain a matching degree score;
a second preprocessing module 604, configured to process its own disturbance sample by using a data preprocessing method to obtain a processed disturbance sample;
a second tag information obtaining module 605, configured to input its own disturbance sample into the federal learning model to obtain tag information corresponding to its own disturbance sample;
an information sending module 606, configured to send the matching degree score, the label information and the processed disturbance sample of the second participant to the first participant, so that the first participant receives the matching degree score, the label information and the processed disturbance sample sent by the second participant, calculates, through a preset linear regression function and according to the matching degree score, the label information and the processed disturbance sample of the first participant and the matching degree score, the label information and the processed disturbance sample of the second participant, the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant, and calculates the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the target sample of the first participant and the contribution degree corresponding to each feature in the one or more features of the target sample of the second participant.
Optionally, the second perturbation processing module includes:
a second feature statistics submodule, used for counting the second participant's features of multiple dimensions;
and a second disturbance sample processing submodule, used for preprocessing the disturbance sample according to the second participant's features of the multiple dimensions to obtain the processed disturbance sample.
Optionally, the second feature statistics submodule is specifically configured to count the second participant's features of multiple dimensions to obtain the second participant's numerical features and categorical features;
and the second disturbance sample processing submodule is configured to preprocess the disturbance sample according to the second participant's numerical features and categorical features to obtain the processed disturbance sample.
Optionally, the second preprocessing module further includes:
a second feature statistics submodule, used for counting the second participant's features of multiple dimensions;
and a second feature processing submodule, used for performing standardization processing, outlier processing and one-hot encoding processing on the disturbance sample according to the second participant's features of the multiple dimensions to obtain the processed disturbance sample.
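The three processing steps just listed can be pictured with the following Python sketch; the pandas-based implementation, the 3-sigma clipping rule used for outlier (abnormal value) processing and the explicit column lists are assumptions, since the embodiment only names standardization, outlier processing and one-hot encoding.

import pandas as pd

def preprocess_perturbed_samples(perturbed, numerical_cols, categorical_cols, clip_sigma=3.0):
    # perturbed: DataFrame of a participant's disturbance samples.
    out = perturbed.copy()
    for col in numerical_cols:
        mean, std = out[col].mean(), out[col].std() or 1.0
        # Outlier processing: clip values lying beyond clip_sigma standard deviations.
        out[col] = out[col].clip(mean - clip_sigma * std, mean + clip_sigma * std)
        # Standardization: zero mean, unit variance.
        out[col] = (out[col] - mean) / std
    # One-hot encoding of the categorical features.
    return pd.get_dummies(out, columns=list(categorical_cols))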
Optionally, the apparatus further comprises:
the public key receiving module is used for receiving the public key sent by the first participant;
the information encryption module is used for encrypting the matching degree score, the label information and the processed disturbance sample with the first participant's public key;
and the information sending module is used for sending the encrypted matching degree score, label information and processed disturbance sample to the first participant, so that the first participant decrypts them according to its own private key.
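A matching sketch of the second participant's side is given below, again assuming the Paillier scheme from the phe package stands in for the unspecified public-key scheme; every numeric value is encrypted individually with the first participant's public key before being sent, and the rows of the processed disturbance samples are flattened purely for brevity.

def encrypt_payload(public_key, matching_scores, label_info, processed_samples):
    # public_key: the first participant's public key received by the second participant.
    return {
        "matching_scores": [public_key.encrypt(float(s)) for s in matching_scores],
        "label_info": [public_key.encrypt(float(y)) for y in label_info],
        "processed_samples": [public_key.encrypt(float(v))
                              for row in processed_samples for v in row],
    }

The resulting dictionary is what the information decryption module sketched above would decrypt entry by entry on the first participant's side.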
Therefore, the longitudinal federal learning-based model contribution degree evaluation apparatus can reasonably calculate the contribution degrees, so that the benefits obtained by the federal learning-based model are distributed to the participants according to the calculated contribution degrees, realizing a reasonable distribution of the model's benefits among the different participants.
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 communicate with one another through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
selecting a target sample from the first participant's own training data set, wherein the training data set is a data set used for training the federal learning-based model that has been trained in advance;
performing disturbance processing on the first participant's target sample to obtain a disturbance sample;
calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
inputting the disturbance sample into a model based on federal learning to obtain label information corresponding to the disturbance sample;
preprocessing its own disturbance sample through a preset function to obtain a processed disturbance sample;
receiving the matching degree score, the label information and the processed disturbance sample sent by the second participant;
and calculating, through a preset linear regression function and according to the first participant's matching degree score, label information and processed disturbance sample and the second participant's matching degree score, label information and processed disturbance sample, the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample.
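To spell out the regression step, the sketch below fits a weighted linear (ridge) model on the concatenated processed disturbance samples, using the matching degree scores as sample weights and the label information produced by the federal learning-based model as regression targets, and reads the per-feature contribution degrees off the coefficients; the ridge penalty, the use of scikit-learn and the exact function signature are assumptions, as the text only requires a preset weighted linear regression function.

import numpy as np
from sklearn.linear_model import Ridge

def feature_contributions(processed_a, processed_b, labels, match_scores, ridge_alpha=1.0):
    # processed_a: (n, d_a) processed disturbance features held by the first participant
    # processed_b: (n, d_b) processed disturbance features received from a second participant
    # labels:      (n,)  label information output by the federal learning-based model
    # match_scores:(n,)  matching degree scores, used as regression sample weights
    X = np.hstack([processed_a, processed_b])  # joint design matrix
    surrogate = Ridge(alpha=ridge_alpha, fit_intercept=True)
    surrogate.fit(X, labels, sample_weight=match_scores)
    d_a = processed_a.shape[1]
    # Coefficients = per-feature contribution degrees, split back per participant.
    return surrogate.coef_[:d_a], surrogate.coef_[d_a:]

This mirrors LIME-style local surrogate fitting, which the cited non-patent literature also discusses.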
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, which includes a processor 801, a communication interface 802, a memory 803 and a communication bus 804, where the processor 801, the communication interface 802 and the memory 803 communicate with one another through the communication bus 804,
a memory 803 for storing a computer program;
the processor 801 is configured to implement the following steps when executing the program stored in the memory 803:
selecting a target sample from the second participant's own training data set, wherein the training data set is a data set used for training the federal learning-based model that has been trained in advance;
performing disturbance processing on the second participant's target sample to obtain a disturbance sample;
calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
inputting the disturbance sample into a model based on federal learning to obtain label information corresponding to the disturbance sample;
preprocessing its own disturbance sample through a preset function to obtain a processed disturbance sample;
and sending the second participant's matching degree score, label information and processed disturbance sample to the first participant, so that the first participant calculates, through a preset linear regression function and according to the first participant's matching degree score, label information and processed disturbance sample and the second participant's matching degree score, label information and processed disturbance sample, the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample.
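The second participant's local perturbation and matching degree steps can likewise be sketched as follows; Gaussian noise around the target sample and an exponential kernel over the Euclidean distance are assumptions, since the text fixes neither a perturbation distribution nor a matching metric.

import numpy as np

def perturb_and_score(target, n_perturb=500, noise_scale=0.1, kernel_width=1.0, rng=None):
    # target: 1-D array holding one participant's features of the target sample.
    rng = rng or np.random.default_rng(0)
    # Disturbance processing: random perturbations around the target sample.
    perturbed = target + noise_scale * rng.standard_normal((n_perturb, target.size))
    # Matching degree score: perturbed samples closer to the target score nearer to 1.
    distances = np.linalg.norm(perturbed - target, axis=1)
    scores = np.exp(-(distances ** 2) / (kernel_width ** 2))
    return perturbed, scores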
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above steps of the method for evaluating model contribution based on longitudinal federal learning applied to a first participant.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above steps of the longitudinal federal learning based model contribution evaluation method applied to the second participant.
In yet another embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the above-described longitudinal federal learning-based model contribution degree evaluation methods applied to the first participant.
In yet another embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the above-described longitudinal federal learning-based model contribution degree evaluation methods applied to the second participant.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the method, apparatus, electronic device, storage medium, and computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A longitudinal federal learning-based model contribution degree evaluation system, characterized in that the system comprises a first participant and one or more second participants;
each of the first participant and the one or more second participants is configured to: extract a target sample from its own training data set, wherein the training data set is a data set used for training a federal learning model; perform disturbance processing on the extracted target sample to obtain a disturbance sample; calculate the matching degree between the target sample and the disturbance sample to obtain a matching degree score; process its own disturbance sample by a data preprocessing method to obtain a processed disturbance sample; and input the disturbance sample into the federal learning model to obtain label information corresponding to the disturbance sample;
the second participant is further configured to: send the second participant's matching degree score and processed disturbance sample to the first participant;
the first participant is further configured to: receive the matching degree score and the processed disturbance sample sent by the second participant; calculate, through a preset linear regression function and according to the first participant's matching degree score and processed disturbance sample and the second participant's matching degree score and processed disturbance sample, the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample; and calculate the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample.
2. The system of claim 1,
the first participant is specifically configured to: calculate an average value of the contribution degrees corresponding to the features according to the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample, to obtain the global contribution degree corresponding to each feature.
3. The system of claim 1,
the second participant is specifically configured to: set the weight of the label information and the processed disturbance sample according to their matching degree score.
4. The system of claim 1,
each of the first participant and the one or more second participants is specifically configured to: count its own features of multiple dimensions; and preprocess its own disturbance sample according to the features of the multiple dimensions to obtain the processed disturbance sample.
5. The system of claim 4,
each of the first participant and the one or more second participants is specifically configured to: count its own features of multiple dimensions; and perform standardization processing, outlier processing and one-hot encoding processing on its own disturbance sample according to the features of the multiple dimensions to obtain the processed disturbance sample.
6. The system of claim 1,
the first participant is further configured to: send the contribution degree corresponding to each feature in the one or more features of the second participant's target sample to the second participant.
7. The system of claim 1,
the first participant is further configured to: send the first participant's public key to the second participant;
the second participant is specifically configured to: encrypt the second participant's matching degree score, label information and processed disturbance sample with the first participant's public key; and send the encrypted matching degree score and processed disturbance sample to the first participant;
the first participant is further configured to: decrypt, according to the first participant's own private key, the encrypted matching degree score and processed disturbance sample sent by the second participant.
8. A model contribution degree evaluation method based on longitudinal federated learning is characterized in that the method is applied to a first participant in a model contribution degree evaluation system based on longitudinal federated learning, the system comprises the first participant and one or more second participants, and the method comprises the following steps:
extracting a target sample from the first participant's own training data set, wherein the training data set is a data set used for training a federal learning model;
performing disturbance processing on the extracted target sample to obtain a disturbance sample; calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
processing the first participant's own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
inputting the disturbance sample into the federal learning model to obtain label information corresponding to the disturbance sample;
receiving the matching degree score and the processed disturbance sample sent by the second participant;
calculating, through a preset linear regression function and according to the first participant's matching degree score and processed disturbance sample and the second participant's matching degree score and processed disturbance sample, the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample; and calculating the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample.
9. A model contribution degree evaluation method based on longitudinal federated learning is characterized in that the method is applied to a second participant in a model contribution degree evaluation system based on longitudinal federated learning, wherein the system comprises a first participant and one or more second participants, and the method comprises the following steps:
extracting a target sample from the second participant's own training data set, wherein the training data set is a data set used for training a federal learning model;
performing disturbance processing on the extracted target sample to obtain a disturbance sample; calculating the matching degree between the target sample and the disturbance sample to obtain a matching degree score;
processing the second participant's own disturbance sample by a data preprocessing method to obtain a processed disturbance sample;
inputting the disturbance sample into the federal learning model to obtain label information corresponding to the disturbance sample;
sending the second participant's matching degree score and processed disturbance sample to the first participant, so that the first participant receives the matching degree score and the processed disturbance sample sent by the second participant, calculates, through a preset linear regression function and according to the first participant's matching degree score and processed disturbance sample and the second participant's matching degree score and processed disturbance sample, the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample, and calculates the global contribution degree corresponding to each feature according to the contribution degree corresponding to each feature in the one or more features of the first participant's target sample and the contribution degree corresponding to each feature in the one or more features of the second participant's target sample.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 8 to 9 when executing a program stored in the memory.
CN202110571771.6A 2021-05-25 2021-05-25 Model contribution degree evaluation system based on longitudinal federal learning Pending CN113254943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571771.6A CN113254943A (en) 2021-05-25 2021-05-25 Model contribution degree evaluation system based on longitudinal federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571771.6A CN113254943A (en) 2021-05-25 2021-05-25 Model contribution degree evaluation system based on longitudinal federal learning

Publications (1)

Publication Number Publication Date
CN113254943A true CN113254943A (en) 2021-08-13

Family

ID=77184338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571771.6A Pending CN113254943A (en) 2021-05-25 2021-05-25 Model contribution degree evaluation system based on longitudinal federal learning

Country Status (1)

Country Link
CN (1) CN113254943A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188588A1 (en) * 2017-12-14 2019-06-20 Microsoft Technology Licensing, Llc Feature contributors and influencers in machine learned predictive models
US20210019753A1 (en) * 2019-07-18 2021-01-21 Visa International Service Association System, Method, and Computer Program Product for Determining a Reason for a Deep Learning Model Output
CN110717671A (en) * 2019-10-08 2020-01-21 深圳前海微众银行股份有限公司 Method and device for determining contribution degree of participants
CN111144718A (en) * 2019-12-12 2020-05-12 支付宝(杭州)信息技术有限公司 Risk decision method, device, system and equipment based on private data protection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
INFOQ 索信达AI实验室: "Post-hoc Attribution Analysis of Black-Box Models: The LIME Method", https://cloud.tencent.com/developer/news/617057 *
哈利法: "Variable Importance Evaluation: Participant Contribution Evaluation in Federated Learning / Secure Multi-Party Computation", https://zhuanlan.zhihu.com/p/372074722 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657996A (en) * 2021-08-26 2021-11-16 深圳市洞见智慧科技有限公司 Method and device for determining feature contribution degree in federated learning and electronic equipment
CN113807736A (en) * 2021-09-29 2021-12-17 河南星环众志信息科技有限公司 Data quality evaluation method, computer equipment and storage medium
CN113902134A (en) * 2021-09-29 2022-01-07 光大科技有限公司 Contribution evaluation processing method and device
CN114091690A (en) * 2021-11-25 2022-02-25 支付宝(杭州)信息技术有限公司 Method for training federated learning model, method for calling federated learning model and federated learning system
CN114358311A (en) * 2021-12-31 2022-04-15 中国电信股份有限公司 Longitudinal federal data processing method and device
CN114358311B (en) * 2021-12-31 2023-11-07 中国电信股份有限公司 Longitudinal federal data processing method and device
CN116384502A (en) * 2022-09-09 2023-07-04 京信数据科技有限公司 Method, device, equipment and medium for calculating contribution of participant value in federal learning
CN116384502B (en) * 2022-09-09 2024-02-20 京信数据科技有限公司 Method, device, equipment and medium for calculating contribution of participant value in federal learning
CN117540791A (en) * 2024-01-03 2024-02-09 支付宝(杭州)信息技术有限公司 Method and device for countermeasure training
CN117540791B (en) * 2024-01-03 2024-04-05 支付宝(杭州)信息技术有限公司 Method and device for countermeasure training

Similar Documents

Publication Publication Date Title
CN113254943A (en) Model contribution degree evaluation system based on longitudinal federal learning
CN110245510B (en) Method and apparatus for predicting information
US11127088B2 (en) Cross-blockchain interaction method, system, computer device, and storage medium
US10567439B2 (en) Data processing systems and methods for performing privacy assessments and monitoring of new versions of computer code for privacy compliance
Roeber et al. Personal data: how context shapes consumers’ data sharing with organizations from various sectors
CN111738844A (en) Resource allocation system, method and device based on block chain
CN112465627B (en) Financial loan auditing method and system based on block chain and machine learning
WO2020073727A1 (en) Risk forecast method, device, computer apparatus, and storage medium
CN112132676B (en) Method and device for determining contribution degree of joint training target model and terminal equipment
JP2016511891A (en) Privacy against sabotage attacks on large data
CN109816534B (en) Financing lease product recommendation method, financing lease product recommendation device, computer equipment and storage medium
Gomes et al. Determinants of worldwide software piracy losses
CA2910754A1 (en) Systems, methods and devices for modelling operational risk
CN112734050A (en) Text model training method, text model recognition device, text model equipment and storage medium
CN111402029B (en) Intelligent evaluation method and device based on blockchain and knowledge federation
CN113139869A (en) Credit investigation authorization query processing method and device
CN114338915A (en) Caller ID risk identification method, caller ID risk identification device, caller ID risk identification equipment and storage medium
CN113657996A (en) Method and device for determining feature contribution degree in federated learning and electronic equipment
US11086643B1 (en) System and method for providing request driven, trigger-based, machine learning enriched contextual access and mutation on a data graph of connected nodes
CN116932214A (en) Instruction sending method and device, electronic equipment and computer storage medium
CN115689571A (en) Abnormal user behavior monitoring method, device, equipment and medium
CN115525922A (en) Financial privacy data security decision-making method, device and equipment based on privacy calculation
CN114117428A (en) Method and device for generating detection model
US20200402161A1 (en) System and methods for transparent underwriting
CN113377625A (en) Method and device for data monitoring aiming at multi-party combined service prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813